29949ccfbb
https://source.android.com/docs/security/bulletin/2023-08-01
CVE-2023-21264
CVE-2020-29374

* tag 'ASB-2023-08-05_11-5.4' of https://android.googlesource.com/kernel/common:
UPSTREAM: media: dvb-core: Fix kernel WARNING for blocking operation in wait_event*()
ANDROID: ABI: Update allowed list for QCOM
UPSTREAM: usb: gadget: udc: renesas_usb3: Fix use after free bug in renesas_usb3_remove due to race condition
UPSTREAM: x86/mm: Avoid using set_pgd() outside of real PGD pages
UPSTREAM: net/sched: flower: fix possible OOB write in fl_set_geneve_opt()
Linux 5.4.249
xfs: verify buffer contents when we skip log replay
mm: make wait_on_page_writeback() wait for multiple pending writebacks
mm: fix VM_BUG_ON(PageTail) and BUG_ON(PageWriteback)
i2c: imx-lpi2c: fix type char overflow issue when calculating the clock cycle
x86/apic: Fix kernel panic when booting with intremap=off and x2apic_phys
drm/radeon: fix race condition UAF in radeon_gem_set_domain_ioctl
drm/exynos: fix race condition UAF in exynos_g2d_exec_ioctl
drm/exynos: vidi: fix a wrong error return
ARM: dts: Fix erroneous ADS touchscreen polarities
ASoC: nau8824: Add quirk to active-high jack-detect
s390/cio: unregister device when the only path is gone
usb: gadget: udc: fix NULL dereference in remove()
nfcsim.c: Fix error checking for debugfs_create_dir
media: cec: core: don't set last_initiator if tx in progress
arm64: Add missing Set/Way CMO encodings
HID: wacom: Add error check to wacom_parse_and_register()
scsi: target: iscsi: Prevent login threads from racing between each other
sch_netem: acquire qdisc lock in netem_change()
Revert "net: phy: dp83867: perform soft reset and retain established link"
netfilter: nfnetlink_osf: fix module autoload
netfilter: nf_tables: disallow element updates of bound anonymous sets
be2net: Extend xmit workaround to BE3 chip
net: dsa: mt7530: fix trapping frames on non-MT7621 SoC MT7530 switch
ipvs: align inner_mac_header for encapsulation
mmc: usdhi60rol0: fix deferred probing
mmc: sh_mmcif: fix deferred probing
mmc: sdhci-acpi: fix deferred probing
mmc: omap_hsmmc: fix deferred probing
mmc: omap: fix deferred probing
mmc: mvsdio: fix deferred probing
mmc: mvsdio: convert to devm_platform_ioremap_resource
mmc: mtk-sd: fix deferred probing
net: qca_spi: Avoid high load if QCA7000 is not available
xfrm: Linearize the skb after offloading if needed.
ieee802154: hwsim: Fix possible memory leaks
rcu: Upgrade rcu_swap_protected() to rcu_replace_pointer()
x86/mm: Avoid using set_pgd() outside of real PGD pages
cifs: Fix potential deadlock when updating vol in cifs_reconnect()
cifs: Merge is_path_valid() into get_normalized_path()
cifs: Introduce helpers for finding TCP connection
cifs: Get rid of kstrdup_const()'d paths
cifs: Clean up DFS referral cache
nilfs2: prevent general protection fault in nilfs_clear_dirty_page()
writeback: fix dereferencing NULL mapping->host on writeback_page_template
ip_tunnels: allow VXLAN/GENEVE to inherit TOS/TTL from VLAN
mmc: meson-gx: remove redundant mmc_request_done() call from irq context
cgroup: Do not corrupt task iteration when rebinding subsystem
PCI: hv: Fix a race condition bug in hv_pci_query_relations()
Drivers: hv: vmbus: Fix vmbus_wait_for_unload() to scan present CPUs
nilfs2: fix buffer corruption due to concurrent device reads
media: dvb-core: Fix use-after-free due to race at dvb_register_device()
media: dvbdev: fix error logic at dvb_register_device()
media: dvbdev: Fix memleak in dvb_register_device
tick/common: Align tick period during sched_timer setup
x86/purgatory: remove PGO flags
tracing: Add tracing_reset_all_online_cpus_unlocked() function
epoll: ep_autoremove_wake_function should use list_del_init_careful
list: add "list_del_init_careful()" to go with "list_empty_careful()"
mm: rewrite wait_on_page_bit_common() logic
nilfs2: reject devices with insufficient block count
Revert "neighbour: Replace zero-length array with flexible-array member"
Revert "neighbour: fix unaligned access to pneigh_entry"
Revert "tcp: deny tcp_disconnect() when threads are waiting"
Linux 5.4.248
mmc: block: ensure error propagation for non-blk
drm/nouveau/kms: Fix NULL pointer dereference in nouveau_connector_detect_depth
neighbour: delete neigh_lookup_nodev as not used
net: Remove unused inline function dst_hold_and_use()
neighbour: Remove unused inline function neigh_key_eq16()
afs: Fix vlserver probe RTT handling
selftests/ptp: Fix timestamp printf format for PTP_SYS_OFFSET
net: tipc: resize nlattr array to correct size
net: lapbether: only support ethernet devices
net/sched: cls_api: Fix lockup on flushing explicitly created chain
drm/nouveau: add nv_encoder pointer check for NULL
drm/nouveau/kms: Don't change EDID when it hasn't actually changed
drm/nouveau/dp: check for NULL nv_connector->native_mode
igb: fix nvm.ops.read() error handling
sctp: fix an error code in sctp_sf_eat_auth()
ipvlan: fix bound dev checking for IPv6 l3s mode
IB/isert: Fix incorrect release of isert connection
IB/isert: Fix possible list corruption in CMA handler
IB/isert: Fix dead lock in ib_isert
IB/uverbs: Fix to consider event queue closing also upon non-blocking mode
iavf: remove mask from iavf_irq_enable_queues()
RDMA/rxe: Fix the use-before-initialization error of resp_pkts
RDMA/rxe: Removed unused name from rxe_task struct
RDMA/rxe: Remove the unused variable obj
net/sched: cls_u32: Fix reference counter leak leading to overflow
ping6: Fix send to link-local addresses with VRF.
netfilter: nfnetlink: skip error delivery on batch in case of ENOMEM
spi: fsl-dspi: avoid SCK glitches with continuous transfers
spi: spi-fsl-dspi: Remove unused chip->void_write_data
usb: dwc3: gadget: Reset num TRBs before giving back the request
serial: lantiq: add missing interrupt ack
USB: serial: option: add Quectel EM061KGL series
Remove DECnet support from kernel
ALSA: hda/realtek: Add a quirk for Compaq N14JP6
net: usb: qmi_wwan: add support for Compal RXM-G1
RDMA/uverbs: Restrict usage of privileged QKEYs
nouveau: fix client work fence deletion race
powerpc/purgatory: remove PGO flags
kexec: support purgatories with .text.hot sections
nilfs2: fix possible out-of-bounds segment allocation in resize ioctl
nilfs2: fix incomplete buffer cleanup in nilfs_btnode_abort_change_key()
nios2: dts: Fix tse_mac "max-frame-size" property
ocfs2: check new file size on fallocate call
ocfs2: fix use-after-free when unmounting read-only filesystem
drm:amd:amdgpu: Fix missing buffer object unlock in failure path
xen/blkfront: Only check REQ_FUA for writes
mips: Move initrd_start check after initrd address sanitisation.
MIPS: Alchemy: fix dbdma2
parisc: Flush gatt writes and adjust gatt mask in parisc_agp_mask_memory()
parisc: Improve cache flushing for PCXL in arch_sync_dma_for_cpu()
btrfs: handle memory allocation failure in btrfs_csum_one_bio
power: supply: Fix logic checking if system is running from battery
irqchip/meson-gpio: Mark OF related data as maybe unused
regulator: Fix error checking for debugfs_create_dir
platform/x86: asus-wmi: Ignore WMI events with codes 0x7B, 0xC0
power: supply: Ratelimit no data debug output
ARM: dts: vexpress: add missing cache properties
power: supply: bq27xxx: Use mod_delayed_work() instead of cancel() + schedule()
power: supply: sc27xx: Fix external_power_changed race
power: supply: ab8500: Fix external_power_changed race
s390/dasd: Use correct lock while counting channel queue length
dasd: refactor dasd_ioctl_information
KEYS: asymmetric: Copy sig and digest in public_key_verify_signature()
test_firmware: fix a memory leak with reqs buffer
Revert "firmware: arm_sdei: Fix sleep from invalid context BUG"
Revert "PM: domains: Fix up terminology with parent/child"
Revert "PM: domains: Restore comment indentation for generic_pm_domain.child_links"
Revert "scripts/gdb: bail early if there are no generic PD"
Revert "uapi/linux/const.h: prefer ISO-friendly __typeof__"
Revert "netfilter: nf_tables: don't write table validation state without mutex"
Linux 5.4.247
Revert "staging: rtl8192e: Replace macro RTL_PCI_DEVICE with PCI_DEVICE"
mtd: spinand: macronix: Add support for MX35LFxGE4AD
btrfs: unset reloc control if transaction commit fails in prepare_to_relocate()
btrfs: check return value of btrfs_commit_transaction in relocation
rbd: get snapshot context after exclusive lock is ensured to be held
drm/atomic: Don't pollute crtc_state->mode_blob with error pointers
cifs: handle empty list of targets in cifs_reconnect()
cifs: get rid of unused parameter in reconn_setup_dfs_targets()
ext4: only check dquot_initialize_needed() when debugging
eeprom: at24: also select REGMAP
i2c: sprd: Delete i2c adapter in .remove's error path
bonding (gcc13): synchronize bond_{a,t}lb_xmit() types
usb: usbfs: Use consistent mmap functions
usb: usbfs: Enforce page requirements for mmap
pinctrl: meson-axg: add missing GPIOA_18 gpio group
rbd: move RBD_OBJ_FLAG_COPYUP_ENABLED flag setting
Bluetooth: Fix use-after-free in hci_remove_ltk/hci_remove_irk
ceph: fix use-after-free bug for inodes when flushing capsnaps
can: j1939: avoid possible use-after-free when j1939_can_rx_register fails
can: j1939: change j1939_netdev_lock type to mutex
can: j1939: j1939_sk_send_loop_abort(): improved error queue handling in J1939 Socket
drm/amdgpu: fix xclk freq on CHIP_STONEY
ALSA: hda/realtek: Add Lenovo P3 Tower platform
ALSA: hda/realtek: Add a quirk for HP Slim Desktop S01
Input: psmouse - fix OOB access in Elantech protocol
Input: xpad - delete a Razer DeathAdder mouse VID/PID entry
batman-adv: Broken sync while rescheduling delayed work
bnxt_en: Query default VLAN before VNIC setup on a VF
lib: cpu_rmap: Fix potential use-after-free in irq_cpu_rmap_release()
net: sched: fix possible refcount leak in tc_chain_tmplt_add()
net: sched: move rtm_tca_policy declaration to include file
rfs: annotate lockless accesses to RFS sock flow table
rfs: annotate lockless accesses to sk->sk_rxhash
netfilter: ipset: Add schedule point in call_ad().
netfilter: conntrack: fix NULL pointer dereference in nf_confirm_cthelper
Bluetooth: L2CAP: Add missing checks for invalid DCID
Bluetooth: Fix l2cap_disconnect_req deadlock
net: dsa: lan9303: allow vid != 0 in port_fdb_{add|del} methods
neighbour: fix unaligned access to pneigh_entry
neighbour: Replace zero-length array with flexible-array member
spi: qup: Request DMA before enabling clocks
i40e: fix build warnings in i40e_alloc.h
i40iw: fix build warning in i40iw_manage_apbvt()
block/blk-iocost (gcc13): keep large values in a new enum
blk-iocost: avoid 64-bit division in ioc_timer_fn
Linux 5.4.246
drm/edid: fix objtool warning in drm_cvt_modes()
wifi: rtlwifi: 8192de: correct checking of IQK reload
drm/edid: Fix uninitialized variable in drm_cvt_modes()
RDMA/bnxt_re: Remove the qp from list only if the qp destroy succeeds
RDMA/bnxt_re: Remove set but not used variable 'dev_attr'
scsi: dpt_i2o: Do not process completions with invalid addresses
scsi: dpt_i2o: Remove broken pass-through ioctl (I2OUSERCMD)
regmap: Account for register length when chunking
test_firmware: fix the memory leak of the allocated firmware buffer
fbcon: Fix null-ptr-deref in soft_cursor
ext4: add lockdep annotations for i_data_sem for ea_inode's
ext4: disallow ea_inodes with extended attributes
ext4: set lockdep subclass for the ea_inode in ext4_xattr_inode_cache_find()
ext4: add EA_INODE checking to ext4_iget()
tracing/probe: trace_probe_primary_from_call(): checked list_first_entry
selinux: don't use make's grouped targets feature yet
tty: serial: fsl_lpuart: use UARTCTRL_TXINV to send break instead of UARTCTRL_SBK
mmc: vub300: fix invalid response handling
wifi: rtlwifi: remove always-true condition pointed out by GCC 12
lib/dynamic_debug.c: use address-of operator on section symbols
treewide: Remove uninitialized_var() usage
kernel/extable.c: use address-of operator on section symbols
eth: sun: cassini: remove dead code
gcc-12: disable '-Wdangling-pointer' warning for now
ACPI: thermal: drop an always true check
x86/boot: Wrap literal addresses in absolute_pointer()
flow_dissector: work around stack frame size warning
ata: libata-scsi: Use correct device no in ata_find_dev()
scsi: stex: Fix gcc 13 warnings
misc: fastrpc: reject new invocations during device removal
misc: fastrpc: return -EPIPE to invocations on device removal
usb: gadget: f_fs: Add unbind event before functionfs_unbind
net: usb: qmi_wwan: Set DTR quirk for BroadMobi BM818
iio: dac: build ad5758 driver when AD5758 is selected
iio: dac: mcp4725: Fix i2c_master_send() return value handling
iio: light: vcnl4035: fixed chip ID check
HID: wacom: avoid integer overflow in wacom_intuos_inout()
HID: google: add jewel USB id
iio: adc: mxs-lradc: fix the order of two cleanup operations
mailbox: mailbox-test: fix a locking issue in mbox_test_message_write()
atm: hide unused procfs functions
ALSA: oss: avoid missing-prototype warnings
netfilter: conntrack: define variables exp_nat_nla_policy and any_addr with CONFIG_NF_NAT
wifi: b43: fix incorrect __packed annotation
scsi: core: Decrease scsi_device's iorequest_cnt if dispatch failed
arm64/mm: mark private VM_FAULT_X defines as vm_fault_t
ARM: dts: stm32: add pin map for CAN controller on stm32f7
wifi: rtl8xxxu: fix authentication timeout due to incorrect RCR value
media: dvb-core: Fix use-after-free due to race condition at dvb_ca_en50221
media: dvb-core: Fix kernel WARNING for blocking operation in wait_event*()
media: dvb-core: Fix use-after-free due on race condition at dvb_net
media: mn88443x: fix !CONFIG_OF error by drop of_match_ptr from ID table
media: ttusb-dec: fix memory leak in ttusb_dec_exit_dvb()
media: dvb_ca_en50221: fix a size write bug
media: netup_unidvb: fix irq init by register it at the end of probe
media: dvb-usb: dw2102: fix uninit-value in su3000_read_mac_address
media: dvb-usb: digitv: fix null-ptr-deref in digitv_i2c_xfer()
media: dvb-usb-v2: rtl28xxu: fix null-ptr-deref in rtl28xxu_i2c_xfer
media: dvb-usb-v2: ce6230: fix null-ptr-deref in ce6230_i2c_master_xfer()
media: dvb-usb-v2: ec168: fix null-ptr-deref in ec168_i2c_xfer()
media: dvb-usb: az6027: fix three null-ptr-deref in az6027_i2c_xfer()
media: dvb_demux: fix a bug for the continuity counter
ASoC: ssm2602: Add workaround for playback distortions
xfrm: Check if_id in inbound policy/secpath match
ASoC: dwc: limit the number of overrun messages
nbd: Fix debugfs_create_dir error checking
fbdev: stifb: Fix info entry in sti_struct on error path
fbdev: modedb: Add 1920x1080 at 60 Hz video mode
media: rcar-vin: Select correct interrupt mode for V4L2_FIELD_ALTERNATE
ARM: 9295/1: unwind:fix unwind abort for uleb128 case
mailbox: mailbox-test: Fix potential double-free in mbox_test_message_write()
watchdog: menz069_wdt: fix watchdog initialisation
mtd: rawnand: marvell: don't set the NAND frequency select
mtd: rawnand: marvell: ensure timing values are written
net: dsa: mv88e6xxx: Increase wait after reset deactivation
net/sched: flower: fix possible OOB write in fl_set_geneve_opt()
udp6: Fix race condition in udp6_sendmsg & connect
net/netlink: fix NETLINK_LIST_MEMBERSHIPS length report
ocfs2/dlm: move BITS_TO_BYTES() to bitops.h for wider use
net: sched: fix NULL pointer dereference in mq_attach
net/sched: Prohibit regrafting ingress or clsact Qdiscs
net/sched: Reserve TC_H_INGRESS (TC_H_CLSACT) for ingress (clsact) Qdiscs
net/sched: sch_clsact: Only create under TC_H_CLSACT
net/sched: sch_ingress: Only create under TC_H_INGRESS
tcp: Return user_mss for TCP_MAXSEG in CLOSE/LISTEN state if user_mss set
tcp: deny tcp_disconnect() when threads are waiting
af_packet: do not use READ_ONCE() in packet_bind()
mtd: rawnand: ingenic: fix empty stub helper definitions
amd-xgbe: fix the false linkup in xgbe_phy_status
af_packet: Fix data-races of pkt_sk(sk)->num.
netrom: fix info-leak in nr_write_internal()
net/mlx5: fw_tracer, Fix event handling
dmaengine: pl330: rename _start to prevent build error
iommu/amd: Don't block updates to GATag if guest mode is on
iommu/rockchip: Fix unwind goto issue
RDMA/bnxt_re: Fix return value of bnxt_re_process_raw_qp_pkt_rx
RDMA/bnxt_re: Refactor queue pair creation code
RDMA/bnxt_re: Enable SRIOV VF support on Broadcom's 57500 adapter series
RDMA/efa: Fix unsupported page sizes in device
Linux 5.4.245
netfilter: ctnetlink: Support offloaded conntrack entry deletion
ipv{4,6}/raw: fix output xfrm lookup wrt protocol
binder: fix UAF caused by faulty buffer cleanup
bluetooth: Add cmd validity checks at the start of hci_sock_ioctl()
io_uring: have io_kill_timeout() honor the request references
io_uring: don't drop completion lock before timer is fully initialized
io_uring: always grab lock in io_cancel_async_work()
cdc_ncm: Fix the build warning
net/mlx5: Devcom, serialize devcom registration
net/mlx5: devcom only supports 2 ports
fs: fix undefined behavior in bit shift for SB_NOUSER
power: supply: bq24190: Call power_supply_changed() after updating input current
power: supply: core: Refactor power_supply_set_input_current_limit_from_supplier()
power: supply: bq27xxx: After charger plug in/out wait 0.5s for things to stabilize
net: cdc_ncm: Deal with too low values of dwNtbOutMaxSize
cdc_ncm: Implement the 32-bit version of NCM Transfer Block
Linux 5.4.244
3c589_cs: Fix an error handling path in tc589_probe()
net/mlx5: Devcom, fix error flow in mlx5_devcom_register_device
net/mlx5: Fix error message when failing to allocate device memory
forcedeth: Fix an error handling path in nv_probe()
ASoC: Intel: Skylake: Fix declaration of enum skl_ch_cfg
x86/show_trace_log_lvl: Ensure stack pointer is aligned, again
xen/pvcalls-back: fix double frees with pvcalls_new_active_socket()
coresight: Fix signedness bug in tmc_etr_buf_insert_barrier_packet()
power: supply: sbs-charger: Fix INHIBITED bit for Status reg
power: supply: bq27xxx: Fix poll_interval handling and races on remove
power: supply: bq27xxx: Fix I2C IRQ race on remove
power: supply: bq27xxx: Fix bq27xxx_battery_update() race condition
power: supply: leds: Fix blink to LED on transition
ipv6: Fix out-of-bounds access in ipv6_find_tlv()
bpf: Fix mask generation for 32-bit narrow loads of 64-bit fields
selftests: fib_tests: mute cleanup error message
net: fix skb leak in __skb_tstamp_tx()
media: radio-shark: Add endpoint checks
USB: sisusbvga: Add endpoint checks
USB: core: Add routines for endpoint checks in old drivers
udplite: Fix NULL pointer dereference in __sk_mem_raise_allocated().
net: fix stack overflow when LRO is disabled for virtual interfaces
fbdev: udlfb: Fix endpoint check
debugobjects: Don't wake up kswapd from fill_pool()
x86/topology: Fix erroneous smp_num_siblings on Intel Hybrid platforms
parisc: Fix flush_dcache_page() for usage from irq context
selftests/memfd: Fix unknown type name build failure
x86/mm: Avoid incomplete Global INVLPG flushes
btrfs: use nofs when cleaning up aborted transactions
gpio: mockup: Fix mode of debugfs files
parisc: Allow to reboot machine after system halt
parisc: Handle kgdb breakpoints only in kernel context
m68k: Move signal frame following exception on 68020/030
ALSA: hda/realtek: Enable headset onLenovo M70/M90
ALSA: hda/ca0132: add quirk for EVGA X299 DARK
mt76: mt7615: Fix build with older compilers
spi: fsl-cpm: Use 16 bit mode for large transfers with even size
spi: fsl-spi: Re-organise transfer bits_per_word adaptation
watchdog: sp5100_tco: Immediately trigger upon starting.
s390/qdio: fix do_sqbs() inline assembly constraint
s390/qdio: get rid of register asm
vc_screen: reload load of struct vc_data pointer in vcs_write() to avoid UAF
vc_screen: rewrite vcs_size to accept vc, not inode
usb: gadget: u_ether: Fix host MAC address case
usb: gadget: u_ether: Convert prints to device prints
lib/string_helpers: Introduce string_upper() and string_lower() helpers
HID: wacom: add three styli to wacom_intuos_get_tool_type
HID: wacom: Add new Intuos Pro Small (PTH-460) device IDs
HID: wacom: Force pen out of prox if no events have been received in a while
netfilter: nf_tables: hold mutex on netns pre_exit path
netfilter: nf_tables: validate NFTA_SET_ELEM_OBJREF based on NFT_SET_OBJECT flag
netfilter: nf_tables: stricter validation of element data
netfilter: nf_tables: allow up to 64 bytes in the set element data area
netfilter: nf_tables: add nft_setelem_parse_key()
netfilter: nf_tables: validate registers coming from userspace.
netfilter: nftables: statify nft_parse_register()
netfilter: nftables: add nft_parse_register_store() and use it
netfilter: nftables: add nft_parse_register_load() and use it
nilfs2: fix use-after-free bug of nilfs_root in nilfs_evict_inode()
powerpc/64s/radix: Fix soft dirty tracking
tpm/tpm_tis: Disable interrupts for more Lenovo devices
ceph: force updating the msg pointer in non-split case
serial: Add support for Advantech PCI-1611U card
statfs: enforce statfs[64] structure initialization
KVM: x86: do not report a vCPU as preempted outside instruction boundaries
can: kvaser_pciefd: Disable interrupts in probe error path
can: kvaser_pciefd: Do not send EFLUSH command on TFD interrupt
can: kvaser_pciefd: Clear listen-only bit if not explicitly requested
can: kvaser_pciefd: Empty SRB buffer in probe
can: kvaser_pciefd: Call request_irq() before enabling interrupts
can: kvaser_pciefd: Set CAN_STATE_STOPPED in kvaser_pciefd_stop()
can: j1939: recvmsg(): allow MSG_CMSG_COMPAT flag
ALSA: hda/realtek: Add quirk for 2nd ASUS GU603
ALSA: hda/realtek: Add a quirk for HP EliteDesk 805
ALSA: hda: Add NVIDIA codec IDs a3 through a7 to patch table
ALSA: hda: Fix Oops by 9.1 surround channel names
usb: typec: altmodes/displayport: fix pin_assignment_show
usb: dwc3: debugfs: Resume dwc3 before accessing registers
USB: UHCI: adjust zhaoxin UHCI controllers OverCurrent bit value
usb-storage: fix deadlock when a scsi command timeouts more than once
USB: usbtmc: Fix direction for 0-length ioctl control messages
vlan: fix a potential uninit-value in vlan_dev_hard_start_xmit()
igb: fix bit_shift to be in [1..8] range
cassini: Fix a memory leak in the error handling path of cas_init_one()
wifi: iwlwifi: mvm: don't trust firmware n_channels
net: bcmgenet: Restore phy_stop() depending upon suspend/close
net: bcmgenet: Remove phy_stop() from bcmgenet_netif_stop()
net: nsh: Use correct mac_offset to unwind gso skb in nsh_gso_segment()
drm/exynos: fix g2d_open/close helper function definitions
media: netup_unidvb: fix use-after-free at del_timer()
net: hns3: fix reset delay time to avoid configuration timeout
net: hns3: fix sending pfc frames after reset issue
erspan: get the proto with the md version for collect_md
ip_gre, ip6_gre: Fix race condition on o_seqno in collect_md mode
ip6_gre: Make o_seqno start from 0 in native mode
ip6_gre: Fix skb_under_panic in __gre6_xmit()
serial: arc_uart: fix of_iomap leak in `arc_serial_probe`
vsock: avoid to close connected socket after the timeout
ALSA: firewire-digi00x: prevent potential use after free
net: fec: Better handle pm_runtime_get() failing in .remove()
af_key: Reject optional tunnel/BEET mode templates in outbound policies
cpupower: Make TSC read per CPU for Mperf monitor
ASoC: fsl_micfil: register platform component before registering cpu dai
btrfs: fix space cache inconsistency after error loading it from disk
btrfs: replace calls to btrfs_find_free_ino with btrfs_find_free_objectid
mfd: dln2: Fix memory leak in dln2_probe()
phy: st: miphy28lp: use _poll_timeout functions for waits
Input: xpad - add constants for GIP interface numbers
iommu/arm-smmu-v3: Acknowledge pri/event queue overflow if any
clk: tegra20: fix gcc-7 constant overflow warning
RDMA/core: Fix multiple -Warray-bounds warnings
recordmcount: Fix memory leaks in the uwrite function
sched: Fix KCSAN noinstr violation
mcb-pci: Reallocate memory region to avoid memory overlapping
serial: 8250: Reinit port->pm on port specific driver unbind
usb: typec: tcpm: fix multiple times discover svids error
HID: wacom: generic: Set battery quirk only when we see battery data
spi: spi-imx: fix MX51_ECSPI_* macros when cs > 3
HID: logitech-hidpp: Reconcile USB and Unifying serials
HID: logitech-hidpp: Don't use the USB serial for USB devices
staging: rtl8192e: Replace macro RTL_PCI_DEVICE with PCI_DEVICE
Bluetooth: L2CAP: fix "bad unlock balance" in l2cap_disconnect_rsp
wifi: iwlwifi: dvm: Fix memcpy: detected field-spanning write backtrace
wifi: iwlwifi: pcie: Fix integer overflow in iwl_write_to_user_buf
wifi: iwlwifi: pcie: fix possible NULL pointer dereference
samples/bpf: Fix fout leak in hbm's run_bpf_prog
f2fs: fix to drop all dirty pages during umount() if cp_error is set
ext4: Fix best extent lstart adjustment logic in ext4_mb_new_inode_pa()
ext4: set goal start correctly in ext4_mb_normalize_request
gfs2: Fix inode height consistency check
scsi: message: mptlan: Fix use after free bug in mptlan_remove() due to race condition
lib: cpu_rmap: Avoid use after free on rmap->obj array entries
scsi: target: iscsit: Free cmds before session free
net: Catch invalid index in XPS mapping
net: pasemi: Fix return type of pasemi_mac_start_tx()
scsi: lpfc: Prevent lpfc_debugfs_lockstat_write() buffer overflow
ext2: Check block size validity during mount
wifi: brcmfmac: cfg80211: Pass the PMK in binary instead of hex
ACPICA: ACPICA: check null return of ACPI_ALLOCATE_ZEROED in acpi_db_display_objects
ACPICA: Avoid undefined behavior: applying zero offset to null pointer
drm/tegra: Avoid potential 32-bit integer overflow
ACPI: EC: Fix oops when removing custom query handlers
firmware: arm_sdei: Fix sleep from invalid context BUG
memstick: r592: Fix UAF bug in r592_remove due to race condition
regmap: cache: Return error in cache sync operations for REGCACHE_NONE
drm/amd/display: Use DC_LOG_DC in the trasform pixel function
fs: hfsplus: remove WARN_ON() from hfsplus_cat_{read,write}_inode()
af_unix: Fix data races around sk->sk_shutdown.
af_unix: Fix a data race of sk->sk_receive_queue->qlen.
net: datagram: fix data-races in datagram_poll()
ipvlan:Fix out-of-bounds caused by unclear skb->cb
net: add vlan_get_protocol_and_depth() helper
net: tap: check vlan with eth_type_vlan() method
net: annotate sk->sk_err write from do_recvmmsg()
netlink: annotate accesses to nlk->cb_running
netfilter: conntrack: fix possible bug_on with enable_hooks=1
net: Fix load-tearing on sk->sk_stamp in sock_recv_cmsgs().
linux/dim: Do nothing if no time delta between samples
ARM: 9296/1: HP Jornada 7XX: fix kernel-doc warnings
drm/mipi-dsi: Set the fwnode for mipi_dsi_device
driver core: add a helper to setup both the of_node and fwnode of a device
Linux 5.4.243
drm/amd/display: Fix hang when skipping modeset
mm/page_alloc: fix potential deadlock on zonelist_update_seq seqlock
drm/exynos: move to use request_irq by IRQF_NO_AUTOEN flag
drm/msm/adreno: Fix null ptr access in adreno_gpu_cleanup()
firmware: raspberrypi: fix possible memory leak in rpi_firmware_probe()
drm/msm: Fix double pm_runtime_disable() call
PM: domains: Restore comment indentation for generic_pm_domain.child_links
printk: declare printk_deferred_{enter,safe}() in include/linux/printk.h
PCI: pciehp: Fix AB-BA deadlock between reset_lock and device_lock
PCI: pciehp: Use down_read/write_nested(reset_lock) to fix lockdep errors
drbd: correctly submit flush bio on barrier
serial: 8250: Fix serial8250_tx_empty() race with DMA Tx
tty: Prevent writing chars during tcsetattr TCSADRAIN/FLUSH
ext4: fix invalid free tracking in ext4_xattr_move_to_block()
ext4: remove a BUG_ON in ext4_mb_release_group_pa()
ext4: bail out of ext4_xattr_ibody_get() fails for any reason
ext4: add bounds checking in get_max_inline_xattr_value_size()
ext4: fix deadlock when converting an inline directory in nojournal mode
ext4: improve error recovery code paths in __ext4_remount()
ext4: fix data races when using cached status extents
ext4: avoid a potential slab-out-of-bounds in ext4_group_desc_csum
ext4: fix WARNING in mb_find_extent
HID: wacom: insert timestamp to packed Bluetooth (BT) events
HID: wacom: Set a default resolution for older tablets
drm/amdgpu: disable sdma ecc irq only when sdma RAS is enabled in suspend
drm/amdgpu/gfx: disable gfx9 cp_ecc_error_irq only when enabling legacy gfx ras
drm/amdgpu: fix an amdgpu_irq_put() issue in gmc_v9_0_hw_fini()
drm/panel: otm8009a: Set backlight parent to panel device
f2fs: fix potential corruption when moving a directory
ARM: dts: s5pv210: correct MIPI CSIS clock name
ARM: dts: exynos: fix WM8960 clock name in Itop Elite
remoteproc: st: Call of_node_put() on iteration error
remoteproc: stm32: Call of_node_put() on iteration error
sh: nmi_debug: fix return value of __setup handler
sh: init: use OF_EARLY_FLATTREE for early init
sh: math-emu: fix macro redefined warning
inotify: Avoid reporting event with invalid wd
platform/x86: touchscreen_dmi: Add info for the Dexp Ursus KX210i
cifs: fix pcchunk length type in smb2_copychunk_range
btrfs: print-tree: parent bytenr must be aligned to sector size
btrfs: don't free qgroup space unless specified
btrfs: fix btrfs_prev_leaf() to not return the same key twice
perf symbols: Fix return incorrect build_id size in elf_read_build_id()
perf map: Delete two variable initialisations before null pointer checks in sort__sym_from_cmp()
perf vendor events power9: Remove UTF-8 characters from JSON files
virtio_net: suppress cpu stall when free_unused_bufs
virtio_net: split free_unused_bufs()
net: dsa: mt7530: fix corrupt frames using trgmii on 40 MHz XTAL MT7621
ALSA: caiaq: input: Add error handling for unsupported input methods in `snd_usb_caiaq_input_init`
drm/amdgpu: add a missing lock for AMDGPU_SCHED
af_packet: Don't send zero-byte data in packet_sendmsg_spkt().
ionic: remove noise from ethtool rxnfc error msg
rxrpc: Fix hard call timeout units
net/sched: act_mirred: Add carrier check
writeback: fix call of incorrect macro
net: dsa: mv88e6xxx: add mv88e6321 rsvd2cpu
sit: update dev->needed_headroom in ipip6_tunnel_bind_dev()
net/sched: cls_api: remove block_cb from driver_list before freeing
net/ncsi: clear Tx enable mode when handling a Config required AEN
relayfs: fix out-of-bounds access in relay_file_read
kernel/relay.c: fix read_pos error when multiple readers
crypto: safexcel - Cleanup ring IRQ workqueues on load failure
crypto: inside-secure - irq balance
dm verity: fix error handling for check_at_most_once on FEC
dm verity: skip redundant verity_handle_err() on I/O errors
mailbox: zynqmp: Fix counts of child nodes
mailbox: zynq: Switch to flexible array to simplify code
tick/nohz: Fix cpu_is_hotpluggable() by checking with nohz subsystem
nohz: Add TICK_DEP_BIT_RCU
netfilter: nf_tables: deactivate anonymous set from preparation phase
debugobject: Ensure pool refill (again)
perf intel-pt: Fix CYC timestamps after standalone CBR
perf auxtrace: Fix address filter entire kernel size
dm ioctl: fix nested locking in table_clear() to remove deadlock concern
dm flakey: fix a crash with invalid table line
dm integrity: call kmem_cache_destroy() in dm_integrity_init() error path
dm clone: call kmem_cache_destroy() in dm_clone_init() error path
s390/dasd: fix hanging blockdevice after request requeue
btrfs: scrub: reject unsupported scrub flags
scripts/gdb: fix lx-timerlist for Python3
clk: rockchip: rk3399: allow clk_cifout to force clk_cifout_src to reparent
wifi: rtl8xxxu: RTL8192EU always needs full init
mailbox: zynqmp: Fix typo in IPI documentation
mailbox: zynqmp: Fix IPI isr handling
md/raid10: fix null-ptr-deref in raid10_sync_request
nilfs2: fix infinite loop in nilfs_mdt_get_block()
nilfs2: do not write dirty data after degenerating to read-only
parisc: Fix argument pointer in real64_call_asm()
afs: Fix updating of i_size with dv jump from server
dmaengine: at_xdmac: do not enable all cyclic channels
dmaengine: dw-edma: Fix to enable to issue dma request on DMA processing
dmaengine: dw-edma: Fix to change for continuous transfer
phy: tegra: xusb: Add missing tegra_xusb_port_unregister for usb2_port and ulpi_port
pwm: mtk-disp: Disable shadow registers before setting backlight values
pwm: mtk-disp: Adjust the clocks to avoid them mismatch
pwm: mtk-disp: Don't check the return code of pwmchip_remove()
dmaengine: mv_xor_v2: Fix an error code.
leds: TI_LMU_COMMON: select REGMAP instead of depending on it
ext4: fix use-after-free read in ext4_find_extent for bigalloc + inline
openrisc: Properly store r31 to pt_regs on unhandled exceptions
clocksource/drivers/davinci: Fix memory leak in davinci_timer_register when init fails
clocksource: davinci: axe a pointless __GFP_NOFAIL
clocksource/drivers/davinci: Avoid trailing '\n' hidden in pr_fmt()
RDMA/mlx5: Use correct device num_ports when modify DC
SUNRPC: remove the maximum number of retries in call_bind_status
Input: raspberrypi-ts - fix refcount leak in rpi_ts_probe
input: raspberrypi-ts: Release firmware handle when not needed
firmware: raspberrypi: Introduce devm_rpi_firmware_get()
firmware: raspberrypi: Keep count of all consumers
NFSv4.1: Always send a RECLAIM_COMPLETE after establishing lease
IB/hfi1: Fix SDMA mmu_rb_node not being evicted in LRU order
RDMA/siw: Remove namespace check from siw_netdev_event()
clk: add missing of_node_put() in "assigned-clocks" property parsing
power: supply: generic-adc-battery: fix unit scaling
rtc: meson-vrtc: Use ktime_get_real_ts64() to get the current time
RDMA/mlx4: Prevent shift wrapping in set_user_sq_size()
rtc: omap: include header for omap_rtc_power_off_program prototype
RDMA/rdmavt: Delete unnecessary NULL check
RDMA/siw: Fix potential page_array out of range access
perf/core: Fix hardlockup failure caused by perf throttle
powerpc/rtas: use memmove for potentially overlapping buffer copy
macintosh: via-pmu-led: requires ATA to be set
powerpc/sysdev/tsi108: fix resource printk format warnings
powerpc/wii: fix resource printk format warnings
powerpc/mpc512x: fix resource printk format warning
macintosh/windfarm_smu_sat: Add missing of_node_put()
spmi: Add a check for remove callback when removing a SPMI driver
staging: rtl8192e: Fix W_DISABLE# does not work after stop/start
serial: 8250: Add missing wakeup event reporting
tty: serial: fsl_lpuart: adjust buffer length to the intended size
firmware: stratix10-svc: Fix an NULL vs IS_ERR() bug in probe
usb: mtu3: fix kernel panic at qmu transfer done irq handler
usb: chipidea: fix missing goto in `ci_hdrc_probe`
sh: sq: Fix incorrect element size for allocating bitmap buffer
uapi/linux/const.h: prefer ISO-friendly __typeof__
spi: cadence-quadspi: fix suspend-resume implementations
mtd: spi-nor: cadence-quadspi: Handle probe deferral while requesting DMA channel
mtd: spi-nor: cadence-quadspi: Don't initialize rx_dma_complete on failure
mtd: spi-nor: cadence-quadspi: Provide a way to disable DAC mode
mtd: spi-nor: cadence-quadspi: Make driver independent of flash geometry
scripts/gdb: bail early if there are no generic PD
PM: domains: Fix up terminology with parent/child
scripts/gdb: bail early if there are no clocks
ia64: salinfo: placate defined-but-not-used warning
ia64: mm/contig: fix section mismatch warning/error
of: Fix modalias string generation
vmci_host: fix a race condition in vmci_host_poll() causing GPF
spi: fsl-spi: Fix CPM/QE mode Litte Endian
spi: qup: Don't skip cleanup in remove's error path
linux/vt_buffer.h: allow either builtin or modular for macros
ASoC: es8316: Handle optional IRQ
assignment ASoC: es8316: Use IRQF_NO_AUTOEN when requesting the IRQ genirq: Add IRQF_NO_AUTOEN for request_irq/nmi() PCI: imx6: Install the fault handler only on compatible match usb: gadget: udc: renesas_usb3: Fix use after free bug in renesas_usb3_remove due to race condition iio: light: max44009: add missing OF device matching fpga: bridge: fix kernel-doc parameter description usb: host: xhci-rcar: remove leftover quirk handling pstore: Revert pmsg_lock back to a normal mutex tcp/udp: Fix memleaks of sk and zerocopy skbs with TX timestamp. net: amd: Fix link leak when verifying config failed netlink: Use copy_to_user() for optval in netlink_getsockopt(). Revert "Bluetooth: btsdio: fix use after free bug in btsdio_remove due to unfinished work" ipv4: Fix potential uninit variable access bug in __ip_make_skb() netfilter: nf_tables: don't write table validation state without mutex bpf: Don't EFAULT for getsockopt with optval=NULL ixgbe: Enable setting RSS table to default values ixgbe: Allow flow hash to be set via ethtool wifi: iwlwifi: mvm: check firmware response size wifi: iwlwifi: make the loop for card preparation effective md/raid10: fix memleak of md thread md: update the optimal I/O size on reshape md/raid10: fix memleak for 'conf->bio_split' md/raid10: fix leak of 'r10bio->remaining' for recovery bpf, sockmap: Revert buggy deadlock fix in the sockhash and sockmap nvme-fcloop: fix "inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage" nvme: fix async event trace event nvme: handle the persistent internal error AER bpf, sockmap: fix deadlocks in the sockhash and sockmap scsi: lpfc: Fix ioremap issues in lpfc_sli4_pci_mem_setup() crypto: drbg - Only fail when jent is unavailable in FIPS mode crypto: drbg - make drbg_prepare_hrng() handle jent instantiation errors bpftool: Fix bug for long instructions in program CFG dumps wifi: rtlwifi: fix incorrect error codes in rtl_debugfs_set_write_reg() wifi: rtlwifi: fix incorrect error codes in 
rtl_debugfs_set_write_rfreg() rtlwifi: Replace RT_TRACE with rtl_dbg rtlwifi: Start changing RT_TRACE into rtl_dbg f2fs: handle dqget error in f2fs_transfer_project_quota() scsi: megaraid: Fix mega_cmd_done() CMDID_INT_CMDS scsi: target: iscsit: Fix TAS handling during conn cleanup net/packet: convert po->auxdata to an atomic flag net/packet: convert po->origdev to an atomic flag net/packet: annotate accesses to po->xmit vlan: partially enable SIOCSHWTSTAMP in container scm: fix MSG_CTRUNC setting condition for SO_PASSSEC wifi: rtw88: mac: Return the original error from rtw_mac_power_switch() wifi: rtw88: mac: Return the original error from rtw_pwr_seq_parser() tools: bpftool: Remove invalid \' json escape wifi: ath6kl: reduce WARN to dev_dbg() in callback wifi: ath5k: fix an off by one check in ath5k_eeprom_read_freq_list() wifi: ath9k: hif_usb: fix memory leak of remain_skbs wifi: ath6kl: minor fix for allocation size tick/common: Align tick period with the HZ tick. tick: Get rid of tick_period tick/sched: Optimize tick_do_update_jiffies64() further tick/sched: Reduce seqcount held scope in tick_do_update_jiffies64() tick/sched: Use tick_next_period for lockless quick check timekeeping: Split jiffies seqlock debugobject: Prevent init race with static objects arm64: kgdb: Set PSTATE.SS to 1 to re-enable single-step x86/ioapic: Don't return 0 from arch_dynirq_lower_bound() regulator: stm32-pwr: fix of_iomap leak media: rc: gpio-ir-recv: Fix support for wake-up media: rcar_fdp1: Fix refcount leak in probe and remove function media: rcar_fdp1: Fix the correct variable assignments media: rcar_fdp1: Make use of the helper function devm_platform_ioremap_resource() media: rcar_fdp1: fix pm_runtime_get_sync() usage count media: rcar_fdp1: simplify error check logic at fdp_open() media: saa7134: fix use after free bug in saa7134_finidev due to race condition media: dm1105: Fix use after free bug in dm1105_remove due to race condition x86/apic: Fix atomic update of offset 
in reserve_eilvt_offset() regulator: core: Avoid lockdep reports when resolving supplies regulator: core: Consistently set mutex_owner when using ww_mutex_lock_slow() drm/lima/lima_drv: Add missing unwind goto in lima_pdev_probe() mmc: sdhci-of-esdhc: fix quirk to ignore command inhibit for data drm/msm/adreno: drop bogus pm_runtime_set_active() drm/msm/adreno: Defer enabling runpm until hw_init() drm/msm: fix unbalanced pm_runtime_enable in adreno_gpu_{init, cleanup} firmware: qcom_scm: Clear download bit during reboot media: av7110: prevent underflow in write_ts_to_decoder() media: uapi: add MEDIA_BUS_FMT_METADATA_FIXED media bus format. media: bdisp: Add missing check for create_workqueue ARM: dts: qcom: ipq8064: Fix the PCI I/O port range ARM: dts: qcom: ipq8064: reduce pci IO size to 64K ARM: dts: qcom: ipq4019: Fix the PCI I/O port range EDAC/skx: Fix overflows on the DRAM row address mapping arrays arm64: dts: renesas: r8a774c0: Remove bogus voltages from OPP table arm64: dts: renesas: r8a77990: Remove bogus voltages from OPP table drm/probe-helper: Cancel previous job before starting new one drm/vgem: add missing mutex_destroy drm/rockchip: Drop unbalanced obj unref erofs: fix potential overflow calculating xattr_isize erofs: stop parsing non-compact HEAD index if clusterofs is invalid tpm, tpm_tis: Do not skip reset of original interrupt vector selinux: ensure av_permissions.h is built when needed selinux: fix Makefile dependencies of flask.h ubifs: Free memory for tmpfile name ubi: Fix return value overwrite issue in try_write_vid_and_data() ubifs: Fix memleak when insert_old_idx() failed Revert "ubifs: dirty_cow_znode: Fix memleak in error handling path" i2c: omap: Fix standard mode false ACK readings KVM: nVMX: Emulate NOPs in L2, and PAUSE if it's not intercepted reiserfs: Add security prefix to xattr name in reiserfs_security_write() ring-buffer: Sync IRQ works before buffer destruction pwm: meson: Fix g12a ao clk81 name pwm: meson: Fix axg ao mux 
parents kheaders: Use array declaration instead of char ipmi: fix SSIF not responding under certain cond. ipmi:ssif: Add send_retries increment MIPS: fw: Allow firmware to pass a empty env xhci: fix debugfs register accesses while suspended debugfs: regset32: Add Runtime PM support staging: iio: resolver: ads1210: fix config mode perf sched: Cast PTHREAD_STACK_MIN to int as it may turn into sysconf(__SC_THREAD_STACK_MIN_VALUE) USB: dwc3: fix runtime pm imbalance on unbind USB: dwc3: fix runtime pm imbalance on probe errors asm-generic/io.h: suppress endianness warnings for readq() and writeq() ASoC: Intel: bytcr_rt5640: Add quirk for the Acer Iconia One 7 B1-750 iio: adc: palmas_gpadc: fix NULL dereference on rmmod USB: serial: option: add UNISOC vendor and TOZED LT70C product bluetooth: Perform careful capability checks in hci_sock_ioctl() drm/fb-helper: set x/yres_virtual in drm_fb_helper_check_var wifi: brcmfmac: slab-out-of-bounds read in brcmf_get_assoc_ies() counter: 104-quad-8: Fix race condition between FLAG and CNTR reads Conflicts: drivers/firmware/qcom_scm.c drivers/md/dm-verity-target.c drivers/usb/dwc3/core.c drivers/usb/dwc3/debugfs.c drivers/usb/gadget/function/f_fs.c Change-Id: Iedad1fcca99a9b739e08ea6d60988800b3a7aefa
1086 lines
30 KiB
C
// SPDX-License-Identifier: GPL-2.0-only
/*
 * linux/mm/swap.c
 *
 * Copyright (C) 1991, 1992, 1993, 1994 Linus Torvalds
 */

/*
 * This file contains the default values for the operation of the
 * Linux VM subsystem. Fine-tuning documentation can be found in
 * Documentation/admin-guide/sysctl/vm.rst.
 * Started 18.12.91
 * Swap aging added 23.2.95, Stephen Tweedie.
 * Buffermem limits added 12.3.98, Rik van Riel.
 */

#include <linux/mm.h>
#include <linux/sched.h>
#include <linux/kernel_stat.h>
#include <linux/swap.h>
#include <linux/mman.h>
#include <linux/pagemap.h>
#include <linux/pagevec.h>
#include <linux/init.h>
#include <linux/export.h>
#include <linux/mm_inline.h>
#include <linux/percpu_counter.h>
#include <linux/memremap.h>
#include <linux/percpu.h>
#include <linux/cpu.h>
#include <linux/notifier.h>
#include <linux/backing-dev.h>
#include <linux/memcontrol.h>
#include <linux/gfp.h>
#include <linux/uio.h>
#include <linux/hugetlb.h>
#include <linux/page_idle.h>

#include "internal.h"

#define CREATE_TRACE_POINTS
#include <trace/events/pagemap.h>

/* How many pages do we try to swap or page in/out together? */
int page_cluster;

static DEFINE_PER_CPU(struct pagevec, lru_add_pvec);
static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
static DEFINE_PER_CPU(struct pagevec, lru_deactivate_file_pvecs);
static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
static DEFINE_PER_CPU(struct pagevec, lru_lazyfree_pvecs);
#ifdef CONFIG_SMP
static DEFINE_PER_CPU(struct pagevec, activate_page_pvecs);
#endif

/*
 * This path almost never happens for VM activity - pages are normally
 * freed via pagevecs. But it gets used by networking.
 */
static void __page_cache_release(struct page *page)
{
	if (PageLRU(page)) {
		pg_data_t *pgdat = page_pgdat(page);
		struct lruvec *lruvec;
		unsigned long flags;

		spin_lock_irqsave(&pgdat->lru_lock, flags);
		lruvec = mem_cgroup_page_lruvec(page, pgdat);
		VM_BUG_ON_PAGE(!PageLRU(page), page);
		__ClearPageLRU(page);
		del_page_from_lru_list(page, lruvec, page_off_lru(page));
		spin_unlock_irqrestore(&pgdat->lru_lock, flags);
	}
	__ClearPageWaiters(page);
}

static void __put_single_page(struct page *page)
{
	__page_cache_release(page);
	mem_cgroup_uncharge(page);
	free_unref_page(page);
}

static void __put_compound_page(struct page *page)
{
	compound_page_dtor *dtor;

	/*
	 * __page_cache_release() is supposed to be called for thp, not for
	 * hugetlb. This is because a hugetlb page never has PageLRU set
	 * (it is never placed on any LRU list) and no memcg routines should
	 * be called for hugetlb (it has a separate hugetlb_cgroup).
	 */
	if (!PageHuge(page))
		__page_cache_release(page);
	dtor = get_compound_page_dtor(page);
	(*dtor)(page);
}

void __put_page(struct page *page)
{
	if (is_zone_device_page(page)) {
		put_dev_pagemap(page->pgmap);

		/*
		 * The page belongs to the device that created pgmap. Do
		 * not return it to page allocator.
		 */
		return;
	}

	if (unlikely(PageCompound(page)))
		__put_compound_page(page);
	else
		__put_single_page(page);
}
EXPORT_SYMBOL(__put_page);

/**
 * put_pages_list() - release a list of pages
 * @pages: list of pages threaded on page->lru
 *
 * Release a list of pages which are strung together on page.lru. Currently
 * used by read_cache_pages() and related error recovery code.
 */
void put_pages_list(struct list_head *pages)
{
	while (!list_empty(pages)) {
		struct page *victim;

		victim = lru_to_page(pages);
		list_del(&victim->lru);
		put_page(victim);
	}
}
EXPORT_SYMBOL(put_pages_list);

/*
 * get_kernel_pages() - pin kernel pages in memory
 * @kiov:	An array of struct kvec structures
 * @nr_segs:	number of segments to pin
 * @write:	pinning for read/write, currently ignored
 * @pages:	array that receives pointers to the pages pinned.
 *		Should be at least nr_segs long.
 *
 * Returns the number of pages pinned. This may be fewer than the number
 * requested: pinning stops at the first segment whose length is not
 * PAGE_SIZE. If nr_segs is 0 or negative, returns 0. Each page returned
 * must be released with a put_page() call when it is finished with.
 */
int get_kernel_pages(const struct kvec *kiov, int nr_segs, int write,
		struct page **pages)
{
	int seg;

	for (seg = 0; seg < nr_segs; seg++) {
		if (WARN_ON(kiov[seg].iov_len != PAGE_SIZE))
			return seg;

		pages[seg] = kmap_to_page(kiov[seg].iov_base);
		get_page(pages[seg]);
	}

	return seg;
}
EXPORT_SYMBOL_GPL(get_kernel_pages);

/*
 * get_kernel_page() - pin a kernel page in memory
 * @start:	starting kernel address
 * @write:	pinning for read/write, currently ignored
 * @pages:	array that receives pointer to the page pinned.
 *		Must have room for at least one entry.
 *
 * Returns the number of pages pinned, which is 1 for this single-page
 * wrapper around get_kernel_pages(). The page returned must be released
 * with a put_page() call when it is finished with.
 */
int get_kernel_page(unsigned long start, int write, struct page **pages)
{
	const struct kvec kiov = {
		.iov_base = (void *)start,
		.iov_len = PAGE_SIZE
	};

	return get_kernel_pages(&kiov, 1, write, pages);
}
EXPORT_SYMBOL_GPL(get_kernel_page);

static void pagevec_lru_move_fn(struct pagevec *pvec,
	void (*move_fn)(struct page *page, struct lruvec *lruvec, void *arg),
	void *arg)
{
	int i;
	struct pglist_data *pgdat = NULL;
	struct lruvec *lruvec;
	unsigned long flags = 0;

	for (i = 0; i < pagevec_count(pvec); i++) {
		struct page *page = pvec->pages[i];
		struct pglist_data *pagepgdat = page_pgdat(page);

		if (pagepgdat != pgdat) {
			if (pgdat)
				spin_unlock_irqrestore(&pgdat->lru_lock, flags);
			pgdat = pagepgdat;
			spin_lock_irqsave(&pgdat->lru_lock, flags);
		}

		lruvec = mem_cgroup_page_lruvec(page, pgdat);
		(*move_fn)(page, lruvec, arg);
	}
	if (pgdat)
		spin_unlock_irqrestore(&pgdat->lru_lock, flags);
	release_pages(pvec->pages, pvec->nr);
	pagevec_reinit(pvec);
}

static void pagevec_move_tail_fn(struct page *page, struct lruvec *lruvec,
				 void *arg)
{
	int *pgmoved = arg;

	if (PageLRU(page) && !PageUnevictable(page)) {
		del_page_from_lru_list(page, lruvec, page_lru(page));
		ClearPageActive(page);
		add_page_to_lru_list_tail(page, lruvec, page_lru(page));
		(*pgmoved)++;
	}
}

/*
 * pagevec_move_tail() must be called with IRQ disabled.
 * Otherwise this may cause nasty races.
 */
static void pagevec_move_tail(struct pagevec *pvec)
{
	int pgmoved = 0;

	pagevec_lru_move_fn(pvec, pagevec_move_tail_fn, &pgmoved);
	__count_vm_events(PGROTATED, pgmoved);
}

/*
 * Writeback is about to end against a page which has been marked for immediate
 * reclaim. If it still appears to be reclaimable, move it to the tail of the
 * inactive list.
 */
void rotate_reclaimable_page(struct page *page)
{
	if (!PageLocked(page) && !PageDirty(page) &&
	    !PageUnevictable(page) && PageLRU(page)) {
		struct pagevec *pvec;
		unsigned long flags;

		get_page(page);
		local_irq_save(flags);
		pvec = this_cpu_ptr(&lru_rotate_pvecs);
		if (!pagevec_add(pvec, page) || PageCompound(page))
			pagevec_move_tail(pvec);
		local_irq_restore(flags);
	}
}

static void update_page_reclaim_stat(struct lruvec *lruvec,
				     int file, int rotated)
{
	struct zone_reclaim_stat *reclaim_stat = &lruvec->reclaim_stat;

	reclaim_stat->recent_scanned[file]++;
	if (rotated)
		reclaim_stat->recent_rotated[file]++;
}

static void __activate_page(struct page *page, struct lruvec *lruvec,
			    void *arg)
{
	if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
		int file = page_is_file_cache(page);
		int lru = page_lru_base_type(page);

		del_page_from_lru_list(page, lruvec, lru);
		SetPageActive(page);
		lru += LRU_ACTIVE;
		add_page_to_lru_list(page, lruvec, lru);
		trace_mm_lru_activate(page);

		__count_vm_event(PGACTIVATE);
		update_page_reclaim_stat(lruvec, file, 1);
	}
}

#ifdef CONFIG_SMP
static void activate_page_drain(int cpu)
{
	struct pagevec *pvec = &per_cpu(activate_page_pvecs, cpu);

	if (pagevec_count(pvec))
		pagevec_lru_move_fn(pvec, __activate_page, NULL);
}

static bool need_activate_page_drain(int cpu)
{
	return pagevec_count(&per_cpu(activate_page_pvecs, cpu)) != 0;
}

void activate_page(struct page *page)
{
	page = compound_head(page);
	if (PageLRU(page) && !PageActive(page) && !PageUnevictable(page)) {
		struct pagevec *pvec = &get_cpu_var(activate_page_pvecs);

		get_page(page);
		if (!pagevec_add(pvec, page) || PageCompound(page))
			pagevec_lru_move_fn(pvec, __activate_page, NULL);
		put_cpu_var(activate_page_pvecs);
	}
}

#else
static inline void activate_page_drain(int cpu)
{
}

void activate_page(struct page *page)
{
	pg_data_t *pgdat = page_pgdat(page);

	page = compound_head(page);
	spin_lock_irq(&pgdat->lru_lock);
	__activate_page(page, mem_cgroup_page_lruvec(page, pgdat), NULL);
	spin_unlock_irq(&pgdat->lru_lock);
}
#endif

static void __lru_cache_activate_page(struct page *page)
{
	struct pagevec *pvec = &get_cpu_var(lru_add_pvec);
	int i;

	/*
	 * Search backwards on the optimistic assumption that the page being
	 * activated has just been added to this pagevec. Note that only
	 * the local pagevec is examined as a !PageLRU page could be in the
	 * process of being released, reclaimed, migrated or on a remote
	 * pagevec that is currently being drained. Furthermore, marking
	 * a remote pagevec's page PageActive potentially hits a race where
	 * a page is marked PageActive just after it is added to the inactive
	 * list causing accounting errors and BUG_ON checks to trigger.
	 */
	for (i = pagevec_count(pvec) - 1; i >= 0; i--) {
		struct page *pagevec_page = pvec->pages[i];

		if (pagevec_page == page) {
			SetPageActive(page);
			break;
		}
	}

	put_cpu_var(lru_add_pvec);
}

/*
 * Mark a page as having seen activity.
 *
 * inactive,unreferenced	->	inactive,referenced
 * inactive,referenced		->	active,unreferenced
 * active,unreferenced		->	active,referenced
 *
 * When a newly allocated page is not yet visible, so safe for non-atomic ops,
 * __SetPageReferenced(page) may be substituted for mark_page_accessed(page).
 */
void mark_page_accessed(struct page *page)
{
	page = compound_head(page);
	if (!PageActive(page) && !PageUnevictable(page) &&
			PageReferenced(page)) {

		/*
		 * If the page is on the LRU, queue it for activation via
		 * activate_page_pvecs. Otherwise, assume the page is on a
		 * pagevec, mark it active and it'll be moved to the active
		 * LRU on the next drain.
		 */
		if (PageLRU(page))
			activate_page(page);
		else
			__lru_cache_activate_page(page);
		ClearPageReferenced(page);
		if (page_is_file_cache(page))
			workingset_activation(page);
	} else if (!PageReferenced(page)) {
		SetPageReferenced(page);
	}
	if (page_is_idle(page))
		clear_page_idle(page);
}
EXPORT_SYMBOL(mark_page_accessed);

static void __lru_cache_add(struct page *page)
{
	struct pagevec *pvec = &get_cpu_var(lru_add_pvec);

	get_page(page);
	if (!pagevec_add(pvec, page) || PageCompound(page))
		__pagevec_lru_add(pvec);
	put_cpu_var(lru_add_pvec);
}

/**
 * lru_cache_add_anon - add a page to the page lists
 * @page: the page to add
 */
void lru_cache_add_anon(struct page *page)
{
	if (PageActive(page))
		ClearPageActive(page);
	__lru_cache_add(page);
}

void lru_cache_add_file(struct page *page)
{
	if (PageActive(page))
		ClearPageActive(page);
	__lru_cache_add(page);
}
EXPORT_SYMBOL(lru_cache_add_file);

/**
 * lru_cache_add - add a page to a page list
 * @page: the page to be added to the LRU.
 *
 * Queue the page for addition to the LRU via pagevec. The decision on whether
 * to add the page to the [in]active [file|anon] list is deferred until the
 * pagevec is drained. This gives the caller of lru_cache_add() a chance to
 * have the page added to the active list using mark_page_accessed().
 */
void lru_cache_add(struct page *page)
{
	VM_BUG_ON_PAGE(PageActive(page) && PageUnevictable(page), page);
	VM_BUG_ON_PAGE(PageLRU(page), page);
	__lru_cache_add(page);
}

/**
 * __lru_cache_add_active_or_unevictable
 * @page: the page to be added to LRU
 * @vma_flags: flags of the vma in which the page is mapped, used for
 *	determining reclaimability
 *
 * Place @page on the active or unevictable LRU list, depending on its
 * evictability. An evictable page is marked active; a page in a locked
 * vma is accounted as mlocked instead. Either way the page is then
 * queued through lru_cache_add().
 */
void __lru_cache_add_active_or_unevictable(struct page *page,
					    unsigned long vma_flags)
{
	VM_BUG_ON_PAGE(PageLRU(page), page);

	if (likely((vma_flags & (VM_LOCKED | VM_SPECIAL)) != VM_LOCKED))
		SetPageActive(page);
	else if (!TestSetPageMlocked(page)) {
		/*
		 * We use the irq-unsafe __mod_zone_page_state because this
		 * counter is not modified from interrupt context, and the pte
		 * lock (a spinlock) is held, which implies preemption disabled.
		 */
		__mod_zone_page_state(page_zone(page), NR_MLOCK,
				    hpage_nr_pages(page));
		count_vm_event(UNEVICTABLE_PGMLOCKED);
	}
	lru_cache_add(page);
}

/*
 * If the page can not be invalidated, it is moved to the
 * inactive list to speed up its reclaim. It is moved to the
 * head of the list, rather than the tail, to give the flusher
 * threads some time to write it out, as this is much more
 * effective than the single-page writeout from reclaim.
 *
 * If the page is not mapped and is dirty or under writeback, it
 * can be reclaimed as soon as possible by setting PG_reclaim.
 *
 * 1. active, mapped page -> none
 * 2. active, dirty/writeback page -> inactive, head, PG_reclaim
 * 3. inactive, mapped page -> none
 * 4. inactive, dirty/writeback page -> inactive, head, PG_reclaim
 * 5. inactive, clean -> inactive, tail
 * 6. Others -> none
 *
 * In case 4 the page moves to the head of the inactive list because
 * the VM expects the flusher threads to write it out, which is much
 * more effective than the single-page writeout from reclaim.
 */
static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec,
				   void *arg)
{
	int lru, file;
	bool active;

	if (!PageLRU(page))
		return;

	if (PageUnevictable(page))
		return;

	/* Some processes are using the page */
	if (page_mapped(page))
		return;

	active = PageActive(page);
	file = page_is_file_cache(page);
	lru = page_lru_base_type(page);

	del_page_from_lru_list(page, lruvec, lru + active);
	ClearPageActive(page);
	ClearPageReferenced(page);

	if (PageWriteback(page) || PageDirty(page)) {
		/*
		 * Setting PG_reclaim can race with end_page_writeback(),
		 * which can confuse readahead. But the race window is
		 * _really_ small and the problem is non-critical.
		 */
		add_page_to_lru_list(page, lruvec, lru);
		SetPageReclaim(page);
	} else {
		/*
		 * The page's writeback completed while it sat in the
		 * pagevec. Move the page to the tail of the inactive list.
		 */
		add_page_to_lru_list_tail(page, lruvec, lru);
		__count_vm_event(PGROTATED);
	}

	if (active)
		__count_vm_event(PGDEACTIVATE);
	update_page_reclaim_stat(lruvec, file, 0);
}

static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec,
			      void *arg)
{
	if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
		int file = page_is_file_cache(page);
		int lru = page_lru_base_type(page);

		del_page_from_lru_list(page, lruvec, lru + LRU_ACTIVE);
		ClearPageActive(page);
		ClearPageReferenced(page);
		add_page_to_lru_list(page, lruvec, lru);

		__count_vm_events(PGDEACTIVATE, hpage_nr_pages(page));
		update_page_reclaim_stat(lruvec, file, 0);
	}
}

static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
			    void *arg)
{
	if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
	    !PageSwapCache(page) && !PageUnevictable(page)) {
		bool active = PageActive(page);

		del_page_from_lru_list(page, lruvec,
				       LRU_INACTIVE_ANON + active);
		ClearPageActive(page);
		ClearPageReferenced(page);
		/*
		 * Lazyfree pages are clean anonymous pages. They have the
		 * SwapBacked flag cleared to distinguish them from normal
		 * anonymous pages.
		 */
		ClearPageSwapBacked(page);
		add_page_to_lru_list(page, lruvec, LRU_INACTIVE_FILE);

		__count_vm_events(PGLAZYFREE, hpage_nr_pages(page));
		count_memcg_page_event(page, PGLAZYFREE);
		update_page_reclaim_stat(lruvec, 1, 0);
	}
}

/*
 * Drain pages out of the cpu's pagevecs.
 * Either "cpu" is the current CPU, and preemption has already been
 * disabled; or "cpu" is being hot-unplugged, and is already dead.
 */
void lru_add_drain_cpu(int cpu)
{
	struct pagevec *pvec = &per_cpu(lru_add_pvec, cpu);

	if (pagevec_count(pvec))
		__pagevec_lru_add(pvec);

	pvec = &per_cpu(lru_rotate_pvecs, cpu);
	if (pagevec_count(pvec)) {
		unsigned long flags;

		/* No harm done if a racing interrupt already did this */
		local_irq_save(flags);
		pagevec_move_tail(pvec);
		local_irq_restore(flags);
	}

	pvec = &per_cpu(lru_deactivate_file_pvecs, cpu);
	if (pagevec_count(pvec))
		pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL);

	pvec = &per_cpu(lru_deactivate_pvecs, cpu);
	if (pagevec_count(pvec))
		pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);

	pvec = &per_cpu(lru_lazyfree_pvecs, cpu);
	if (pagevec_count(pvec))
		pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL);

	activate_page_drain(cpu);
}

/**
 * deactivate_file_page - forcefully deactivate a file page
 * @page: page to deactivate
 *
 * This function hints the VM that @page is a good reclaim candidate,
 * for example if its invalidation fails due to the page being dirty
 * or under writeback.
 */
void deactivate_file_page(struct page *page)
{
	/*
	 * In a workload with many unevictable pages such as mprotect,
	 * unevictable page deactivation for accelerating reclaim is pointless.
	 */
	if (PageUnevictable(page))
		return;

	if (likely(get_page_unless_zero(page))) {
		struct pagevec *pvec = &get_cpu_var(lru_deactivate_file_pvecs);

		if (!pagevec_add(pvec, page) || PageCompound(page))
			pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL);
		put_cpu_var(lru_deactivate_file_pvecs);
	}
}

/*
 * deactivate_page - deactivate a page
 * @page: page to deactivate
 *
 * deactivate_page() moves @page to the inactive list if @page was on the active
 * list and was not an unevictable page. This is done to accelerate the reclaim
 * of @page.
 */
void deactivate_page(struct page *page)
{
	if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
		struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);

		get_page(page);
		if (!pagevec_add(pvec, page) || PageCompound(page))
			pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
		put_cpu_var(lru_deactivate_pvecs);
	}
}

/**
 * mark_page_lazyfree - make an anon page lazyfree
 * @page: page to deactivate
 *
 * mark_page_lazyfree() moves @page to the inactive file list.
 * This is done to accelerate the reclaim of @page.
 */
void mark_page_lazyfree(struct page *page)
{
	if (PageLRU(page) && PageAnon(page) && PageSwapBacked(page) &&
	    !PageSwapCache(page) && !PageUnevictable(page)) {
		struct pagevec *pvec = &get_cpu_var(lru_lazyfree_pvecs);

		get_page(page);
		if (!pagevec_add(pvec, page) || PageCompound(page))
			pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL);
		put_cpu_var(lru_lazyfree_pvecs);
	}
}

void lru_add_drain(void)
{
	lru_add_drain_cpu(get_cpu());
	put_cpu();
}

#ifdef CONFIG_SMP

static DEFINE_PER_CPU(struct work_struct, lru_add_drain_work);

static void lru_add_drain_per_cpu(struct work_struct *dummy)
{
	lru_add_drain();
}

/*
 * Doesn't need any cpu hotplug locking because we do rely on per-cpu
 * kworkers being shut down before our page_alloc_cpu_dead callback is
 * executed on the offlined cpu.
 * Calling this function with cpu hotplug locks held can actually lead
 * to obscure indirect dependencies via WQ context.
 */
void lru_add_drain_all(void)
{
	static DEFINE_MUTEX(lock);
	static struct cpumask has_work;
	int cpu;

	/*
	 * Make sure nobody triggers this path before mm_percpu_wq is fully
	 * initialized.
	 */
	if (WARN_ON(!mm_percpu_wq))
		return;

	mutex_lock(&lock);
	cpumask_clear(&has_work);

	for_each_online_cpu(cpu) {
		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);

		if (pagevec_count(&per_cpu(lru_add_pvec, cpu)) ||
		    pagevec_count(&per_cpu(lru_rotate_pvecs, cpu)) ||
		    pagevec_count(&per_cpu(lru_deactivate_file_pvecs, cpu)) ||
		    pagevec_count(&per_cpu(lru_deactivate_pvecs, cpu)) ||
		    pagevec_count(&per_cpu(lru_lazyfree_pvecs, cpu)) ||
		    need_activate_page_drain(cpu)) {
			INIT_WORK(work, lru_add_drain_per_cpu);
			queue_work_on(cpu, mm_percpu_wq, work);
			cpumask_set_cpu(cpu, &has_work);
		}
	}

	for_each_cpu(cpu, &has_work)
		flush_work(&per_cpu(lru_add_drain_work, cpu));

	mutex_unlock(&lock);
}
#else
void lru_add_drain_all(void)
{
	lru_add_drain();
}
#endif

/**
 * release_pages - batched put_page()
 * @pages: array of pages to release
 * @nr: number of pages
 *
 * Decrement the reference count on all the pages in @pages. If it
 * fell to zero, remove the page from the LRU and free it.
 */
void release_pages(struct page **pages, int nr)
{
	int i;
	LIST_HEAD(pages_to_free);
	struct pglist_data *locked_pgdat = NULL;
	struct lruvec *lruvec;
	unsigned long flags;
	unsigned int lock_batch;

	for (i = 0; i < nr; i++) {
		struct page *page = pages[i];

		/*
		 * Make sure the IRQ-safe lock-holding time does not get
		 * excessive with a continuous string of pages from the
		 * same pgdat. The lock is held only if pgdat != NULL.
		 */
		if (locked_pgdat && ++lock_batch == SWAP_CLUSTER_MAX) {
			spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags);
			locked_pgdat = NULL;
		}

		if (is_huge_zero_page(page))
			continue;

		if (is_zone_device_page(page)) {
			if (locked_pgdat) {
				spin_unlock_irqrestore(&locked_pgdat->lru_lock,
						       flags);
				locked_pgdat = NULL;
			}
			/*
			 * ZONE_DEVICE pages that return 'false' from
			 * put_devmap_managed_page() do not require special
			 * processing, and instead, expect a call to
			 * put_page_testzero().
			 */
			if (put_devmap_managed_page(page))
				continue;
		}

		page = compound_head(page);
		if (!put_page_testzero(page))
			continue;

		if (PageCompound(page)) {
			if (locked_pgdat) {
				spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags);
				locked_pgdat = NULL;
			}
			__put_compound_page(page);
			continue;
		}

		if (PageLRU(page)) {
			struct pglist_data *pgdat = page_pgdat(page);

			if (pgdat != locked_pgdat) {
				if (locked_pgdat)
					spin_unlock_irqrestore(&locked_pgdat->lru_lock,
							       flags);
				lock_batch = 0;
				locked_pgdat = pgdat;
				spin_lock_irqsave(&locked_pgdat->lru_lock, flags);
			}

			lruvec = mem_cgroup_page_lruvec(page, locked_pgdat);
			VM_BUG_ON_PAGE(!PageLRU(page), page);
			__ClearPageLRU(page);
			del_page_from_lru_list(page, lruvec, page_off_lru(page));
		}

		/* Clear Active bit in case of parallel mark_page_accessed */
		__ClearPageActive(page);
		__ClearPageWaiters(page);

		list_add(&page->lru, &pages_to_free);
	}
	if (locked_pgdat)
		spin_unlock_irqrestore(&locked_pgdat->lru_lock, flags);

	mem_cgroup_uncharge_list(&pages_to_free);
	free_unref_page_list(&pages_to_free);
}
EXPORT_SYMBOL(release_pages);

/*
 * The pages which we're about to release may be in the deferred lru-addition
 * queues. That would prevent them from really being freed right now. That's
 * OK from a correctness point of view but is inefficient - those pages may be
 * cache-warm and we want to give them back to the page allocator ASAP.
 *
 * So __pagevec_release() will drain those queues here. __pagevec_lru_add()
 * and __pagevec_lru_add_active() call release_pages() directly to avoid
 * mutual recursion.
 */
void __pagevec_release(struct pagevec *pvec)
{
	if (!pvec->percpu_pvec_drained) {
		lru_add_drain();
		pvec->percpu_pvec_drained = true;
	}
	release_pages(pvec->pages, pagevec_count(pvec));
	pagevec_reinit(pvec);
}
EXPORT_SYMBOL(__pagevec_release);

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
/* used by __split_huge_page_refcount() */
void lru_add_page_tail(struct page *page, struct page *page_tail,
		       struct lruvec *lruvec, struct list_head *list)
{
	const int file = 0;

	VM_BUG_ON_PAGE(!PageHead(page), page);
	VM_BUG_ON_PAGE(PageCompound(page_tail), page);
	VM_BUG_ON_PAGE(PageLRU(page_tail), page);
	lockdep_assert_held(&lruvec_pgdat(lruvec)->lru_lock);

	if (!list)
		SetPageLRU(page_tail);

	if (likely(PageLRU(page)))
		list_add_tail(&page_tail->lru, &page->lru);
	else if (list) {
		/* page reclaim is reclaiming a huge page */
		get_page(page_tail);
		list_add_tail(&page_tail->lru, list);
	} else {
		/*
		 * Head page has not yet been counted, as an hpage,
		 * so we must account for each subpage individually.
		 *
		 * Put page_tail on the list at the correct position
		 * so they all end up in order.
		 */
		add_page_to_lru_list_tail(page_tail, lruvec,
					  page_lru(page_tail));
	}

	if (!PageUnevictable(page))
		update_page_reclaim_stat(lruvec, file, PageActive(page_tail));
}
#endif /* CONFIG_TRANSPARENT_HUGEPAGE */

static void __pagevec_lru_add_fn(struct page *page, struct lruvec *lruvec,
				 void *arg)
{
	enum lru_list lru;
	int was_unevictable = TestClearPageUnevictable(page);

	VM_BUG_ON_PAGE(PageLRU(page), page);

	SetPageLRU(page);
	/*
	 * Page becomes evictable in two ways:
	 * 1) Within LRU lock [munlock_vma_page() and __munlock_pagevec()].
	 * 2) Before acquiring LRU lock to put the page to correct LRU and then
	 *   a) do PageLRU check with lock [check_move_unevictable_pages]
	 *   b) do PageLRU check before lock [clear_page_mlock]
	 *
	 * (1) & (2a) are ok as LRU lock will serialize them. For (2b), we need
	 * following strict ordering:
	 *
	 * #0: __pagevec_lru_add_fn		#1: clear_page_mlock
	 *
	 * SetPageLRU()				TestClearPageMlocked()
	 * smp_mb() // explicit ordering	// above provides strict
	 *					// ordering
	 * PageMlocked()			PageLRU()
	 *
	 *
	 * if '#1' does not observe setting of PG_lru by '#0' and fails
	 * isolation, the explicit barrier will make sure that page_evictable
	 * check will put the page in correct LRU. Without smp_mb(), SetPageLRU
	 * can be reordered after PageMlocked check and can make '#1' to fail
	 * the isolation of the page whose Mlocked bit is cleared (#0 is also
	 * looking at the same page) and the evictable page will be stranded
	 * in an unevictable LRU.
	 */
	smp_mb();

	if (page_evictable(page)) {
		lru = page_lru(page);
		update_page_reclaim_stat(lruvec, page_is_file_cache(page),
					 PageActive(page));
		if (was_unevictable)
			count_vm_event(UNEVICTABLE_PGRESCUED);
	} else {
		lru = LRU_UNEVICTABLE;
		ClearPageActive(page);
		SetPageUnevictable(page);
		if (!was_unevictable)
			count_vm_event(UNEVICTABLE_PGCULLED);
	}

	add_page_to_lru_list(page, lruvec, lru);
	trace_mm_lru_insertion(page, lru);
}

/*
 * Add the passed pages to the LRU, then drop the caller's refcount
 * on them. Reinitialises the caller's pagevec.
 */
void __pagevec_lru_add(struct pagevec *pvec)
{
	pagevec_lru_move_fn(pvec, __pagevec_lru_add_fn, NULL);
}
EXPORT_SYMBOL(__pagevec_lru_add);

/**
 * pagevec_lookup_entries - gang pagecache lookup
 * @pvec:	Where the resulting entries are placed
 * @mapping:	The address_space to search
 * @start:	The starting entry index
 * @nr_entries:	The maximum number of entries
 * @indices:	The cache indices corresponding to the entries in @pvec
 *
 * pagevec_lookup_entries() will search for and return a group of up
 * to @nr_entries pages and shadow entries in the mapping. All
 * entries are placed in @pvec. pagevec_lookup_entries() takes a
 * reference against actual pages in @pvec.
 *
 * The search returns a group of mapping-contiguous entries with
 * ascending indexes. There may be holes in the indices due to
 * not-present entries.
 *
 * pagevec_lookup_entries() returns the number of entries which were
 * found.
 */
unsigned pagevec_lookup_entries(struct pagevec *pvec,
				struct address_space *mapping,
				pgoff_t start, unsigned nr_entries,
				pgoff_t *indices)
{
	pvec->nr = find_get_entries(mapping, start, nr_entries,
				    pvec->pages, indices);
	return pagevec_count(pvec);
}

/**
 * pagevec_remove_exceptionals - pagevec exceptionals pruning
 * @pvec:	The pagevec to prune
 *
 * pagevec_lookup_entries() fills both pages and exceptional radix
 * tree entries into the pagevec. This function prunes all
 * exceptionals from @pvec without leaving holes, so that it can be
 * passed on to page-only pagevec operations.
 */
void pagevec_remove_exceptionals(struct pagevec *pvec)
{
	int i, j;

	for (i = 0, j = 0; i < pagevec_count(pvec); i++) {
		struct page *page = pvec->pages[i];
		if (!xa_is_value(page))
			pvec->pages[j++] = page;
	}
	pvec->nr = j;
}

/**
 * pagevec_lookup_range - gang pagecache lookup
 * @pvec:	Where the resulting pages are placed
 * @mapping:	The address_space to search
 * @start:	The starting page index
 * @end:	The final page index
 *
 * pagevec_lookup_range() will search for & return a group of up to PAGEVEC_SIZE
 * pages in the mapping starting from index @start and up to index @end
 * (inclusive). The pages are placed in @pvec. pagevec_lookup_range() takes
 * a reference against the pages in @pvec.
 *
 * The search returns a group of mapping-contiguous pages with ascending
 * indexes. There may be holes in the indices due to not-present pages. We
 * also update @start to index the next page for the traversal.
 *
 * pagevec_lookup_range() returns the number of pages which were found. If this
 * number is smaller than PAGEVEC_SIZE, the end of the specified range has been
 * reached.
 */
unsigned pagevec_lookup_range(struct pagevec *pvec,
		struct address_space *mapping, pgoff_t *start, pgoff_t end)
{
	pvec->nr = find_get_pages_range(mapping, start, end, PAGEVEC_SIZE,
					pvec->pages);
	return pagevec_count(pvec);
}
EXPORT_SYMBOL(pagevec_lookup_range);

unsigned pagevec_lookup_range_tag(struct pagevec *pvec,
		struct address_space *mapping, pgoff_t *index, pgoff_t end,
		xa_mark_t tag)
{
	pvec->nr = find_get_pages_range_tag(mapping, index, end, tag,
					PAGEVEC_SIZE, pvec->pages);
	return pagevec_count(pvec);
}
EXPORT_SYMBOL(pagevec_lookup_range_tag);

unsigned pagevec_lookup_range_nr_tag(struct pagevec *pvec,
		struct address_space *mapping, pgoff_t *index, pgoff_t end,
		xa_mark_t tag, unsigned max_pages)
{
	pvec->nr = find_get_pages_range_tag(mapping, index, end, tag,
		min_t(unsigned int, max_pages, PAGEVEC_SIZE), pvec->pages);
	return pagevec_count(pvec);
}
EXPORT_SYMBOL(pagevec_lookup_range_nr_tag);
/*
 * Perform any setup for the swap system
 */
void __init swap_setup(void)
{
	unsigned long megs = totalram_pages() >> (20 - PAGE_SHIFT);

	/* Use a smaller cluster for small-memory machines */
	if (megs < 16)
		page_cluster = 2;
	else
		page_cluster = 3;
	/*
	 * Right now other parts of the system mean that we
	 * _really_ don't want to cluster much more
	 */
}