android11-5.4
86 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Greg Kroah-Hartman
|
68fdd20442 |
This is the 5.4.229 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmPHzUQACgkQONu9yGCS aT7QohAAtxV33qGSKGUdKMZk1JzIYuc8tAa+CHZhTi6xjTsoy1a5MlQGrj8a9YQ7 /5VvwslGSn29h/ThO/ai04CfeOsWugMtnuo4mT4+198DgH0CNQMlfWq2c25cCvY6 dIrrMTA7B2YhpdbjM4vkX8QIAxBVCHOVkseSammhMnujP7d+k4LtC6rRV4uiF+lD cKtsIJn8h+pezBeo5+pjvcTwndaAoApVOES4uOjJcf9pYOOoHxyi+8StpiO+j2Pv sRvkbvvmpS+IWAH+DMa3SAFI3C3AihX2Fu0rIFzUZByAviB1NmyWluX5mU54wW3R P80fl0rQFwuygEBU1UqTXe4hQ8YYwpJGAQzbLR22a11IT2MSO+vMRINdqG1un2BE T9hHix5R0JMeIN9AP7nKGBLrEZ3V6DqxEBz6ZC1sOUIIVQv93twtiwb0rNM0e7pq PpkIXpwXPIgqFDGXrd0y5ksRT08jJUKCRttuRVWkcGX8adotngWnrl0WBI5zqSuo B+x8X9Dw7YblJ6yQ+8mAZGk0Mj3j+cb4uhuRaz/6rqHmFOrbHm+JDXvPzZY65xy3 k8Ebtq5CxINLDwahfb/o13MgbmzMPPNPPp0cz23zOhm88OmwVzB4hAoB/1CfHZvF XhSbZMVBhhP9hYr2gYl902EQeZGE5yjk5xhFT5Wrh7QoZaPW2XM= =as6n -----END PGP SIGNATURE----- Merge 5.4.229 into android11-5.4-lts Changes in 5.4.229 tracing/ring-buffer: Only do full wait when cpu != RING_BUFFER_ALL_CPUS udf: Discard preallocation before extending file with a hole udf: Fix preallocation discarding at indirect extent boundary udf: Do not bother looking for prealloc extents if i_lenExtents matches i_size udf: Fix extending file within last block usb: gadget: uvc: Prevent buffer overflow in setup handler USB: serial: option: add Quectel EM05-G modem USB: serial: cp210x: add Kamstrup RF sniffer PIDs USB: serial: f81232: fix division by zero on line-speed change USB: serial: f81534: fix division by zero on line-speed change igb: Initialize mailbox message for VF reset xen-netback: move removal of "hotplug-status" to the right place HID: ite: Add support for Acer S1002 keyboard-dock HID: ite: Enable QUIRK_TOUCHPAD_ON_OFF_REPORT on Acer Aspire Switch 10E HID: ite: Enable QUIRK_TOUCHPAD_ON_OFF_REPORT on Acer Aspire Switch V 10 HID: uclogic: Add HID_QUIRK_HIDINPUT_FORCE quirk Bluetooth: L2CAP: Fix u8 overflow net: loopback: use NET_NAME_PREDICTABLE for name_assign_type usb: musb: remove extra check in musb_gadget_vbus_draw ARM: dts: qcom: apq8064: fix coresight compatible arm64: dts: qcom: sdm845-cheza: fix AP suspend pin bias drivers: soc: ti: knav_qmss_queue: Mark knav_acc_firmwares as static arm: dts: spear600: Fix clcd interrupt soc: ti: knav_qmss_queue: Use pm_runtime_resume_and_get instead of pm_runtime_get_sync soc: ti: knav_qmss_queue: Fix PM disable depth imbalance in knav_queue_probe soc: ti: smartreflex: Fix PM disable depth imbalance in omap_sr_probe perf: arm_dsu: Fix hotplug callback leak in dsu_pmu_init() perf/smmuv3: Fix hotplug callback leak in arm_smmu_pmu_init() arm64: dts: mt2712e: Fix unit_address_vs_reg warning for oscillators arm64: dts: mt2712e: Fix unit address for pinctrl node arm64: dts: mt2712-evb: Fix vproc fixed regulators unit names arm64: dts: mt2712-evb: Fix usb vbus regulators unit names arm64: dts: mediatek: mt6797: Fix 26M oscillator unit name ARM: dts: dove: Fix assigned-addresses for every PCIe Root Port ARM: dts: armada-370: Fix assigned-addresses for every PCIe Root Port ARM: dts: armada-xp: Fix assigned-addresses for every PCIe Root Port ARM: dts: armada-375: Fix assigned-addresses for every PCIe Root Port ARM: dts: armada-38x: Fix assigned-addresses for every PCIe Root Port ARM: dts: armada-39x: Fix assigned-addresses for every PCIe Root Port ARM: dts: turris-omnia: Add ethernet aliases ARM: dts: turris-omnia: Add switch port 6 node arm64: dts: armada-3720-turris-mox: Add missing interrupt for RTC pstore/ram: Fix error return code in ramoops_probe() ARM: mmp: fix timer_read delay pstore: Avoid kcore oops by vmap()ing with VM_IOREMAP tpm/tpm_crb: Fix error message in __crb_relinquish_locality() cpuidle: dt: Return the correct numbers of parsed idle states alpha: fix syscall entry in !AUDUT_SYSCALL case PM: hibernate: Fix mistake in kerneldoc comment fs: don't audit the capability check in simple_xattr_list() selftests/ftrace: event_triggers: wait longer for test_event_enable perf: Fix possible memleak in pmu_dev_alloc() timerqueue: Use rb_entry_safe() in timerqueue_getnext() proc: fixup uptime selftest lib/fonts: fix undefined behavior in bit shift for get_default_font ocfs2: fix memory leak in ocfs2_stack_glue_init() MIPS: vpe-mt: fix possible memory leak while module exiting MIPS: vpe-cmp: fix possible memory leak while module exiting selftests/efivarfs: Add checking of the test return value PNP: fix name memory leak in pnp_alloc_dev() perf/x86/intel/uncore: Fix reference count leak in hswep_has_limit_sbox() irqchip: gic-pm: Use pm_runtime_resume_and_get() in gic_probe() EDAC/i10nm: fix refcount leak in pci_get_dev_wrapper() nfsd: don't call nfsd_file_put from client states seqfile display genirq/irqdesc: Don't try to remove non-existing sysfs files cpufreq: amd_freq_sensitivity: Add missing pci_dev_put() libfs: add DEFINE_SIMPLE_ATTRIBUTE_SIGNED for signed value lib/notifier-error-inject: fix error when writing -errno to debugfs file docs: fault-injection: fix non-working usage of negative values debugfs: fix error when writing negative value to atomic_t debugfs file ocfs2: ocfs2_mount_volume does cleanup job before return error ocfs2: rewrite error handling of ocfs2_fill_super ocfs2: fix memory leak in ocfs2_mount_volume() rapidio: fix possible name leaks when rio_add_device() fails rapidio: rio: fix possible name leak in rio_register_mport() clocksource/drivers/sh_cmt: Make sure channel clock supply is enabled ACPICA: Fix use-after-free in acpi_ut_copy_ipackage_to_ipackage() uprobes/x86: Allow to probe a NOP instruction with 0x66 prefix xen/events: only register debug interrupt for 2-level events x86/xen: Fix memory leak in xen_smp_intr_init{_pv}() x86/xen: Fix memory leak in xen_init_lock_cpu() xen/privcmd: Fix a possible warning in privcmd_ioctl_mmap_resource() PM: runtime: Improve path in rpm_idle() when no callback PM: runtime: Do not call __rpm_callback() from rpm_idle() platform/x86: mxm-wmi: fix memleak in mxm_wmi_call_mx[ds|mx]() MIPS: BCM63xx: Add check for NULL for clk in clk_enable MIPS: OCTEON: warn only once if deprecated link status is being used fs: sysv: Fix sysv_nblocks() returns wrong value rapidio: fix possible UAF when kfifo_alloc() fails eventfd: change int to __u64 in eventfd_signal() ifndef CONFIG_EVENTFD relay: fix type mismatch when allocating memory in relay_create_buf() hfs: Fix OOB Write in hfs_asc2mac rapidio: devices: fix missing put_device in mport_cdev_open wifi: ath9k: hif_usb: fix memory leak of urbs in ath9k_hif_usb_dealloc_tx_urbs() wifi: ath9k: hif_usb: Fix use-after-free in ath9k_hif_usb_reg_in_cb() wifi: rtl8xxxu: Fix reading the vendor of combo chips pata_ipx4xx_cf: Fix unsigned comparison with less than zero media: i2c: ad5820: Fix error path can: kvaser_usb: do not increase tx statistics when sending error message frames can: kvaser_usb: kvaser_usb_leaf: Get capabilities from device can: kvaser_usb: kvaser_usb_leaf: Rename {leaf,usbcan}_cmd_error_event to {leaf,usbcan}_cmd_can_error_event can: kvaser_usb: kvaser_usb_leaf: Handle CMD_ERROR_EVENT can: kvaser_usb_leaf: Set Warning state even without bus errors can: kvaser_usb_leaf: Fix improved state not being reported can: kvaser_usb_leaf: Fix wrong CAN state after stopping can: kvaser_usb_leaf: Fix bogus restart events can: kvaser_usb: Add struct kvaser_usb_busparams can: kvaser_usb: Compare requested bittiming parameters with actual parameters in do_set_{,data}_bittiming clk: renesas: r9a06g032: Repair grave increment error spi: Update reference to struct spi_controller drm/panel/panel-sitronix-st7701: Remove panel on DSI attach failure ima: Rename internal filter rule functions ima: Fix fall-through warnings for Clang ima: Handle -ESTALE returned by ima_filter_rule_match() media: vivid: fix compose size exceed boundary bpf: propagate precision in ALU/ALU64 operations mtd: Fix device name leak when register device failed in add_mtd_device() wifi: rsi: Fix handling of 802.3 EAPOL frames sent via control port media: camss: Clean up received buffers on failed start of streaming net, proc: Provide PROC_FS=n fallback for proc_create_net_single_write() rxrpc: Fix ack.bufferSize to be 0 when generating an ack drm/radeon: Add the missed acpi_put_table() to fix memory leak drm/mediatek: Modify dpi power on/off sequence. ASoC: pxa: fix null-pointer dereference in filter() regulator: core: fix unbalanced of node refcount in regulator_dev_lookup() amdgpu/pm: prevent array underflow in vega20_odn_edit_dpm_table() integrity: Fix memory leakage in keyring allocation error path ima: Fix misuse of dereference of pointer in template_desc_init_fields() wifi: ath10k: Fix return value in ath10k_pci_init() mtd: lpddr2_nvm: Fix possible null-ptr-deref Input: elants_i2c - properly handle the reset GPIO when power is off media: solo6x10: fix possible memory leak in solo_sysfs_init() media: platform: exynos4-is: Fix error handling in fimc_md_init() media: videobuf-dma-contig: use dma_mmap_coherent bpf: Move skb->len == 0 checks into __bpf_redirect HID: hid-sensor-custom: set fixed size for custom attributes ALSA: pcm: fix undefined behavior in bit shift for SNDRV_PCM_RATE_KNOT ALSA: seq: fix undefined behavior in bit shift for SNDRV_SEQ_FILTER_USE_EVENT regulator: core: use kfree_const() to free space conditionally clk: rockchip: Fix memory leak in rockchip_clk_register_pll() bonding: Export skip slave logic to function bonding: Rename slave_arr to usable_slaves bonding: fix link recovery in mode 2 when updelay is nonzero mtd: maps: pxa2xx-flash: fix memory leak in probe media: imon: fix a race condition in send_packet() clk: imx8mn: correct the usb1_ctrl parent to be usb_bus clk: imx: replace osc_hdmi with dummy pinctrl: pinconf-generic: add missing of_node_put() media: dvb-core: Fix ignored return value in dvb_register_frontend() media: dvb-usb: az6027: fix null-ptr-deref in az6027_i2c_xfer() media: s5p-mfc: Add variant data for MFC v7 hardware for Exynos 3250 SoC drm/tegra: Add missing clk_disable_unprepare() in tegra_dc_probe() ASoC: dt-bindings: wcd9335: fix reset line polarity in example ASoC: mediatek: mtk-btcvsd: Add checks for write and read of mtk_btcvsd_snd NFSv4.2: Clear FATTR4_WORD2_SECURITY_LABEL when done decoding NFSv4.2: Fix a memory stomp in decode_attr_security_label NFSv4.2: Fix initialisation of struct nfs4_label NFSv4: Fix a deadlock between nfs4_open_recover_helper() and delegreturn ALSA: asihpi: fix missing pci_disable_device() wifi: iwlwifi: mvm: fix double free on tx path. ASoC: mediatek: mt8173: Enable IRQ when pdata is ready drm/radeon: Fix PCI device refcount leak in radeon_atrm_get_bios() drm/amdgpu: Fix PCI device refcount leak in amdgpu_atrm_get_bios() ASoC: pcm512x: Fix PM disable depth imbalance in pcm512x_probe netfilter: conntrack: set icmpv6 redirects as RELATED bpf, sockmap: Fix repeated calls to sock_put() when msg has more_data bpf, sockmap: Fix data loss caused by using apply_bytes on ingress redirect bonding: uninitialized variable in bond_miimon_inspect() spi: spidev: mask SPI_CS_HIGH in SPI_IOC_RD_MODE wifi: cfg80211: Fix not unregister reg_pdev when load_builtin_regdb_keys() fails regulator: core: fix module refcount leak in set_supply() clk: qcom: clk-krait: fix wrong div2 functions hsr: Avoid double remove of a node. configfs: fix possible memory leak in configfs_create_dir() regulator: core: fix resource leak in regulator_register() bpf, sockmap: fix race in sock_map_free() media: saa7164: fix missing pci_disable_device() ALSA: mts64: fix possible null-ptr-defer in snd_mts64_interrupt xprtrdma: Fix regbuf data not freed in rpcrdma_req_create() SUNRPC: Fix missing release socket in rpc_sockname() NFSv4.x: Fail client initialisation if state manager thread can't run mmc: alcor: fix return value check of mmc_add_host() mmc: moxart: fix return value check of mmc_add_host() mmc: mxcmmc: fix return value check of mmc_add_host() mmc: pxamci: fix return value check of mmc_add_host() mmc: rtsx_usb_sdmmc: fix return value check of mmc_add_host() mmc: toshsd: fix return value check of mmc_add_host() mmc: vub300: fix return value check of mmc_add_host() mmc: wmt-sdmmc: fix return value check of mmc_add_host() mmc: atmel-mci: fix return value check of mmc_add_host() mmc: omap_hsmmc: fix return value check of mmc_add_host() mmc: meson-gx: fix return value check of mmc_add_host() mmc: via-sdmmc: fix return value check of mmc_add_host() mmc: wbsd: fix return value check of mmc_add_host() mmc: mmci: fix return value check of mmc_add_host() media: c8sectpfe: Add of_node_put() when breaking out of loop media: coda: Add check for dcoda_iram_alloc media: coda: Add check for kmalloc clk: samsung: Fix memory leak in _samsung_clk_register_pll() spi: spi-gpio: Don't set MOSI as an input if not 3WIRE mode wifi: rtl8xxxu: Add __packed to struct rtl8723bu_c2h wifi: brcmfmac: Fix error return code in brcmf_sdio_download_firmware() blktrace: Fix output non-blktrace event when blk_classic option enabled clk: socfpga: clk-pll: Remove unused variable 'rc' clk: socfpga: use clk_hw_register for a5/c5 clk: socfpga: Fix memory leak in socfpga_gate_init() net: vmw_vsock: vmci: Check memcpy_from_msg() net: defxx: Fix missing err handling in dfx_init() net: stmmac: selftests: fix potential memleak in stmmac_test_arpoffload() drivers: net: qlcnic: Fix potential memory leak in qlcnic_sriov_init() of: overlay: fix null pointer dereferencing in find_dup_cset_node_entry() and find_dup_cset_prop() ethernet: s2io: don't call dev_kfree_skb() under spin_lock_irqsave() net: farsync: Fix kmemleak when rmmods farsync net/tunnel: wait until all sk_user_data reader finish before releasing the sock net: apple: mace: don't call dev_kfree_skb() under spin_lock_irqsave() net: apple: bmac: don't call dev_kfree_skb() under spin_lock_irqsave() net: emaclite: don't call dev_kfree_skb() under spin_lock_irqsave() net: ethernet: dnet: don't call dev_kfree_skb() under spin_lock_irqsave() hamradio: don't call dev_kfree_skb() under spin_lock_irqsave() net: amd: lance: don't call dev_kfree_skb() under spin_lock_irqsave() net: amd-xgbe: Fix logic around active and passive cables net: amd-xgbe: Check only the minimum speed for active/passive cables can: tcan4x5x: Remove invalid write in clear_interrupts net: lan9303: Fix read error execution path ntb_netdev: Use dev_kfree_skb_any() in interrupt context Bluetooth: btusb: don't call kfree_skb() under spin_lock_irqsave() Bluetooth: hci_qca: don't call kfree_skb() under spin_lock_irqsave() Bluetooth: hci_ll: don't call kfree_skb() under spin_lock_irqsave() Bluetooth: hci_h5: don't call kfree_skb() under spin_lock_irqsave() Bluetooth: hci_bcsp: don't call kfree_skb() under spin_lock_irqsave() Bluetooth: hci_core: don't call kfree_skb() under spin_lock_irqsave() Bluetooth: RFCOMM: don't call kfree_skb() under spin_lock_irqsave() stmmac: fix potential division by 0 apparmor: fix a memleak in multi_transaction_new() apparmor: fix lockdep warning when removing a namespace apparmor: Fix abi check to include v8 abi apparmor: Use pointer to struct aa_label for lbs_cred RDMA/core: Fix order of nldev_exit call f2fs: fix normal discard process RDMA/siw: Fix immediate work request flush to completion queue RDMA/nldev: Return "-EAGAIN" if the cm_id isn't from expected port RDMA/siw: Set defined status for work completion with undefined status scsi: scsi_debug: Fix a warning in resp_write_scat() crypto: ccree - swap SHA384 and SHA512 larval hashes at build time crypto: ccree - Remove debugfs when platform_driver_register failed PCI: Check for alloc failure in pci_request_irq() RDMA/hfi: Decrease PCI device reference count in error path crypto: ccree - Make cc_debugfs_global_fini() available for module init function RDMA/rxe: Fix NULL-ptr-deref in rxe_qp_do_cleanup() when socket create failed scsi: hpsa: Fix possible memory leak in hpsa_init_one() crypto: tcrypt - Fix multibuffer skcipher speed test mem leak scsi: mpt3sas: Fix possible resource leaks in mpt3sas_transport_port_add() scsi: hpsa: Fix error handling in hpsa_add_sas_host() scsi: hpsa: Fix possible memory leak in hpsa_add_sas_device() scsi: fcoe: Fix possible name leak when device_register() fails scsi: ipr: Fix WARNING in ipr_init() scsi: fcoe: Fix transport not deattached when fcoe_if_init() fails scsi: snic: Fix possible UAF in snic_tgt_create() RDMA/nldev: Add checks for nla_nest_start() in fill_stat_counter_qps() f2fs: avoid victim selection from previous victim section crypto: omap-sham - Use pm_runtime_resume_and_get() in omap_sham_probe() RDMA/hfi1: Fix error return code in parse_platform_config() orangefs: Fix sysfs not cleanup when dev init failed crypto: img-hash - Fix variable dereferenced before check 'hdev->req' hwrng: amd - Fix PCI device refcount leak hwrng: geode - Fix PCI device refcount leak IB/IPoIB: Fix queue count inconsistency for PKEY child interfaces drivers: dio: fix possible memory leak in dio_init() tty: serial: tegra: Activate RX DMA transfer by request serial: tegra: Read DMA status before terminating class: fix possible memory leak in __class_register() vfio: platform: Do not pass return buffer to ACPI _RST method uio: uio_dmem_genirq: Fix missing unlock in irq configuration uio: uio_dmem_genirq: Fix deadlock between irq config and handling usb: fotg210-udc: Fix ages old endianness issues staging: vme_user: Fix possible UAF in tsi148_dma_list_add usb: typec: Check for ops->exit instead of ops->enter in altmode_exit usb: typec: tcpci: fix of node refcount leak in tcpci_register_port() serial: amba-pl011: avoid SBSA UART accessing DMACR register serial: pl011: Do not clear RX FIFO & RX interrupt in unthrottle. serial: pch: Fix PCI device refcount leak in pch_request_dma() tty: serial: clean up stop-tx part in altera_uart_tx_chars() tty: serial: altera_uart_{r,t}x_chars() need only uart_port serial: altera_uart: fix locking in polling mode serial: sunsab: Fix error handling in sunsab_init() test_firmware: fix memory leak in test_firmware_init() misc: ocxl: fix possible name leak in ocxl_file_register_afu() misc: tifm: fix possible memory leak in tifm_7xx1_switch_media() misc: sgi-gru: fix use-after-free error in gru_set_context_option, gru_fault and gru_handle_user_call_os cxl: fix possible null-ptr-deref in cxl_guest_init_afu|adapter() cxl: fix possible null-ptr-deref in cxl_pci_init_afu|adapter() counter: stm32-lptimer-cnt: fix the check on arr and cmp registers update usb: roles: fix of node refcount leak in usb_role_switch_is_parent() usb: gadget: f_hid: optional SETUP/SET_REPORT mode usb: gadget: f_hid: fix f_hidg lifetime vs cdev usb: gadget: f_hid: fix refcount leak on error path drivers: mcb: fix resource leak in mcb_probe() mcb: mcb-parse: fix error handing in chameleon_parse_gdd() chardev: fix error handling in cdev_device_add() i2c: pxa-pci: fix missing pci_disable_device() on error in ce4100_i2c_probe staging: rtl8192u: Fix use after free in ieee80211_rx() staging: rtl8192e: Fix potential use-after-free in rtllib_rx_Monitor() vme: Fix error not catched in fake_init() i2c: ismt: Fix an out-of-bounds bug in ismt_access() usb: storage: Add check for kcalloc tracing/hist: Fix issue of losting command info in error_log samples: vfio-mdev: Fix missing pci_disable_device() in mdpy_fb_probe() fbdev: ssd1307fb: Drop optional dependency fbdev: pm2fb: fix missing pci_disable_device() fbdev: via: Fix error in via_core_init() fbdev: vermilion: decrease reference count in error path fbdev: uvesafb: Fixes an error handling path in uvesafb_probe() HSI: omap_ssi_core: fix unbalanced pm_runtime_disable() HSI: omap_ssi_core: fix possible memory leak in ssi_probe() power: supply: fix residue sysfs file in error handle route of __power_supply_register() perf trace: Return error if a system call doesn't exist perf trace: Separate 'struct syscall_fmt' definition from syscall_fmts variable perf trace: Factor out the initialization of syscal_arg_fmt->scnprintf perf trace: Add the syscall_arg_fmt pointer to syscall_arg perf trace: Allow associating scnprintf routines with well known arg names perf trace: Add a strtoul() method to 'struct syscall_arg_fmt' perf trace: Use macro RAW_SYSCALL_ARGS_NUM to replace number perf trace: Handle failure when trace point folder is missed perf symbol: correction while adjusting symbol HSI: omap_ssi_core: Fix error handling in ssi_init() power: supply: fix null pointer dereferencing in power_supply_get_battery_info RDMA/siw: Fix pointer cast warning include/uapi/linux/swab: Fix potentially missing __always_inline rtc: snvs: Allow a time difference on clock register read rtc: pcf85063: Fix reading alarm iommu/amd: Fix pci device refcount leak in ppr_notifier() iommu/fsl_pamu: Fix resource leak in fsl_pamu_probe() macintosh: fix possible memory leak in macio_add_one_device() macintosh/macio-adb: check the return value of ioremap() powerpc/52xx: Fix a resource leak in an error handling path cxl: Fix refcount leak in cxl_calc_capp_routing powerpc/xive: add missing iounmap() in error path in xive_spapr_populate_irq_data() powerpc/perf: callchain validate kernel stack pointer bounds powerpc/83xx/mpc832x_rdb: call platform_device_put() in error case in of_fsl_spi_probe() powerpc/hv-gpci: Fix hv_gpci event list selftests/powerpc: Fix resource leaks pwm: sifive: Call pwm_sifive_update_clock() while mutex is held remoteproc: sysmon: fix memory leak in qcom_add_sysmon_subdev() remoteproc: qcom_q6v5_pas: Fix missing of_node_put() in adsp_alloc_memory_region() rtc: st-lpc: Add missing clk_disable_unprepare in st_rtc_probe() rtc: pic32: Move devm_rtc_allocate_device earlier in pic32_rtc_probe() nfsd: Define the file access mode enum for tracing NFSD: Add tracepoints to NFSD's duplicate reply cache nfsd: under NFSv4.1, fix double svc_xprt_put on rpc_create failure mISDN: hfcsusb: don't call dev_kfree_skb/kfree_skb() under spin_lock_irqsave() mISDN: hfcpci: don't call dev_kfree_skb/kfree_skb() under spin_lock_irqsave() mISDN: hfcmulti: don't call dev_kfree_skb/kfree_skb() under spin_lock_irqsave() nfc: pn533: Clear nfc_target before being used r6040: Fix kmemleak in probe and remove rtc: mxc_v2: Add missing clk_disable_unprepare() openvswitch: Fix flow lookup to use unmasked key skbuff: Account for tail adjustment during pull operations mailbox: zynq-ipi: fix error handling while device_register() fails net_sched: reject TCF_EM_SIMPLE case for complex ematch module rxrpc: Fix missing unlock in rxrpc_do_sendmsg() myri10ge: Fix an error handling path in myri10ge_probe() net: stream: purge sk_error_queue in sk_stream_kill_queues() rcu: Fix __this_cpu_read() lockdep warning in rcu_force_quiescent_state() binfmt_misc: fix shift-out-of-bounds in check_special_flags fs: jfs: fix shift-out-of-bounds in dbAllocAG udf: Avoid double brelse() in udf_rename() fs: jfs: fix shift-out-of-bounds in dbDiscardAG ACPICA: Fix error code path in acpi_ds_call_control_method() nilfs2: fix shift-out-of-bounds/overflow in nilfs_sb2_bad_offset() acct: fix potential integer overflow in encode_comp_t() hfs: fix OOB Read in __hfs_brec_find drm/etnaviv: add missing quirks for GC300 brcmfmac: return error when getting invalid max_flowrings from dongle wifi: ath9k: verify the expected usb_endpoints are present wifi: ar5523: Fix use-after-free on ar5523_cmd() timed out ASoC: codecs: rt298: Add quirk for KBL-R RVP platform ipmi: fix memleak when unload ipmi driver bpf: make sure skb->len != 0 when redirecting to a tunneling device net: ethernet: ti: Fix return type of netcp_ndo_start_xmit() hamradio: baycom_epp: Fix return type of baycom_send_packet() wifi: brcmfmac: Fix potential shift-out-of-bounds in brcmf_fw_alloc_request() igb: Do not free q_vector unless new one was allocated s390/ctcm: Fix return type of ctc{mp,}m_tx() s390/netiucv: Fix return type of netiucv_tx() s390/lcs: Fix return type of lcs_start_xmit() drm/rockchip: Use drm_mode_copy() drm/sti: Use drm_mode_copy() drivers/md/md-bitmap: check the return value of md_bitmap_get_counter() md/raid1: stop mdx_raid1 thread when raid1 array run failed net: add atomic_long_t to net_device_stats fields mrp: introduce active flags to prevent UAF when applicant uninit ppp: associate skb with a device at tx bpf: Prevent decl_tag from being referenced in func_proto arg media: dvb-frontends: fix leak of memory fw media: dvbdev: adopts refcnt to avoid UAF media: dvb-usb: fix memory leak in dvb_usb_adapter_init() blk-mq: fix possible memleak when register 'hctx' failed regulator: core: fix use_count leakage when handling boot-on mmc: f-sdh30: Add quirks for broken timeout clock capability media: si470x: Fix use-after-free in si470x_int_in_callback() clk: st: Fix memory leak in st_of_quadfs_setup() hugetlbfs: fix null-ptr-deref in hugetlbfs_parse_param() drm/fsl-dcu: Fix return type of fsl_dcu_drm_connector_mode_valid() drm/sti: Fix return type of sti_{dvo,hda,hdmi}_connector_mode_valid() orangefs: Fix kmemleak in orangefs_prepare_debugfs_help_string() orangefs: Fix kmemleak in orangefs_{kernel,client}_debug_init() ALSA/ASoC: hda: move/rename snd_hdac_ext_stop_streams to hdac_stream.c ALSA: hda: add snd_hdac_stop_streams() helper ASoC: Intel: Skylake: Fix driver hang during shutdown ASoC: mediatek: mt8173-rt5650-rt5514: fix refcount leak in mt8173_rt5650_rt5514_dev_probe() ASoC: audio-graph-card: fix refcount leak of cpu_ep in __graph_for_each_link() ASoC: rockchip: pdm: Add missing clk_disable_unprepare() in rockchip_pdm_runtime_resume() ASoC: wm8994: Fix potential deadlock ASoC: rockchip: spdif: Add missing clk_disable_unprepare() in rk_spdif_runtime_resume() ASoC: rt5670: Remove unbalanced pm_runtime_put() pstore: Switch pmsg_lock to an rt_mutex to avoid priority inversion pstore: Make sure CONFIG_PSTORE_PMSG selects CONFIG_RT_MUTEXES ALSA: hda/realtek: Add quirk for Lenovo TianYi510Pro-14IOB ALSA: hda/hdmi: Add HP Device 0x8711 to force connect list usb: dwc3: core: defer probe on ulpi_read_id timeout HID: wacom: Ensure bootloader PID is usable in hidraw mode reiserfs: Add missing calls to reiserfs_security_free() iio: adc: ad_sigma_delta: do not use internal iio_dev lock iio: adc128s052: add proper .data members in adc128_of_match table regulator: core: fix deadlock on regulator enable gcov: add support for checksum field media: dvbdev: fix build warning due to comments media: dvbdev: fix refcnt bug cifs: fix oops during encryption nvme-pci: fix doorbell buffer value endianness ata: ahci: Fix PCS quirk application for suspend nvme: resync include/linux/nvme.h with nvmecli nvme: fix the NVME_CMD_EFFECTS_CSE_MASK definition objtool: Fix SEGFAULT powerpc/rtas: avoid device tree lookups in rtas_os_term() powerpc/rtas: avoid scheduling in rtas_os_term() HID: multitouch: fix Asus ExpertBook P2 P2451FA trackpoint HID: plantronics: Additional PIDs for double volume key presses quirk hfsplus: fix bug causing custom uid and gid being unable to be assigned with mount ovl: Use ovl mounter's fsuid and fsgid in ovl_link() ALSA: line6: correct midi status byte when receiving data from podxt ALSA: line6: fix stack overflow in line6_midi_transmit pnode: terminate at peers of source md: fix a crash in mempool_free mm, compaction: fix fast_isolate_around() to stay within boundaries f2fs: should put a page when checking the summary info mmc: vub300: fix warning - do not call blocking ops when !TASK_RUNNING tpm: tpm_crb: Add the missed acpi_put_table() to fix memory leak tpm: tpm_tis: Add the missed acpi_put_table() to fix memory leak SUNRPC: Don't leak netobj memory when gss_read_proxy_verf() fails net/af_packet: add VLAN support for AF_PACKET SOCK_RAW GSO net/af_packet: make sure to pull mac header media: stv0288: use explicitly signed char soc: qcom: Select REMAP_MMIO for LLCC driver kest.pl: Fix grub2 menu handling for rebooting ktest.pl minconfig: Unset configs instead of just removing them mmc: sdhci-sprd: Disable CLK_AUTO when the clock is less than 400K btrfs: fix resolving backrefs for inline extent followed by prealloc ARM: ux500: do not directly dereference __iomem arm64: dts: qcom: sdm850-lenovo-yoga-c630: correct I2C12 pins drive strength selftests: Use optional USERCFLAGS and USERLDFLAGS cpufreq: Init completion before kobject_init_and_add() binfmt: Move install_exec_creds after setup_new_exec to match binfmt_elf binfmt: Fix error return code in load_elf_fdpic_binary() dm cache: Fix ABBA deadlock between shrink_slab and dm_cache_metadata_abort dm thin: Fix ABBA deadlock between shrink_slab and dm_pool_abort_metadata dm thin: Use last transaction's pmd->root when commit failed dm thin: Fix UAF in run_timer_softirq() dm integrity: Fix UAF in dm_integrity_dtr() dm clone: Fix UAF in clone_dtr() dm cache: Fix UAF in destroy() dm cache: set needs_check flag after aborting metadata tracing/hist: Fix out-of-bound write on 'action_data.var_ref_idx' x86/microcode/intel: Do not retry microcode reloading on the APs tracing/hist: Fix wrong return value in parse_action_params() tracing: Fix infinite loop in tracing_read_pipe on overflowed print_trace_line ARM: 9256/1: NWFPE: avoid compiler-generated __aeabi_uldivmod media: dvb-core: Fix double free in dvb_register_device() media: dvb-core: Fix UAF due to refcount races at releasing cifs: fix confusing debug message cifs: fix missing display of three mount options md/bitmap: Fix bitmap chunk size overflow issues efi: Add iMac Pro 2017 to uefi skip cert quirk ipmi: fix long wait in unload when IPMI disconnect mtd: spi-nor: Check for zero erase size in spi_nor_find_best_erase_type() ima: Fix a potential NULL pointer access in ima_restore_measurement_list ipmi: fix use after free in _ipmi_destroy_user() PCI: Fix pci_device_is_present() for VFs by checking PF PCI/sysfs: Fix double free in error path crypto: n2 - add missing hash statesize iommu/amd: Fix ivrs_acpihid cmdline parsing code parisc: led: Fix potential null-ptr-deref in start_task() device_cgroup: Roll back to original exceptions after copy failure drm/connector: send hotplug uevent on connector cleanup drm/vmwgfx: Validate the box size for the snooped cursor ext4: add inode table check in __ext4_get_inode_loc to aovid possible infinite loop ext4: fix undefined behavior in bit shift for ext4_check_flag_values ext4: add EXT4_IGET_BAD flag to prevent unexpected bad inode ext4: add helper to check quota inums ext4: fix reserved cluster accounting in __es_remove_extent() ext4: fix bug_on in __es_tree_search caused by bad boot loader inode ext4: init quota for 'old.inode' in 'ext4_rename' ext4: fix delayed allocation bug in ext4_clu_mapped for bigalloc + inline ext4: fix corruption when online resizing a 1K bigalloc fs ext4: fix error code return to user-space in ext4_get_branch() ext4: avoid BUG_ON when creating xattrs ext4: fix inode leak in ext4_xattr_inode_create() on an error path ext4: initialize quota before expanding inode in setproject ioctl ext4: avoid unaccounted block allocation when expanding inode ext4: allocate extended attribute value in vmalloc area btrfs: replace strncpy() with strscpy() PM/devfreq: governor: Add a private governor_data for governor media: s5p-mfc: Fix to handle reference queue during finishing media: s5p-mfc: Clear workbit to handle error condition media: s5p-mfc: Fix in register read and write for H264 dm thin: resume even if in FAIL mode perf probe: Use dwarf_attr_integrate as generic DWARF attr accessor perf probe: Fix to get the DW_AT_decl_file and DW_AT_call_file as unsinged data KVM: x86: optimize more exit handlers in vmx.c KVM: retpolines: x86: eliminate retpoline from vmx.c exit handlers KVM: VMX: Rename INTERRUPT_PENDING to INTERRUPT_WINDOW KVM: VMX: Rename NMI_PENDING to NMI_WINDOW KVM: VMX: Fix the spelling of CPU_BASED_USE_TSC_OFFSETTING KVM: nVMX: Properly expose ENABLE_USR_WAIT_PAUSE control to L1 ravb: Fix "failed to switch device to config mode" message during unbind ext4: goto right label 'failed_mount3a' ext4: correct inconsistent error msg in nojournal mode mm/highmem: Lift memcpy_[to|from]_page to core ext4: use memcpy_to_page() in pagecache_write() fs: ext4: initialize fsdata in pagecache_write() ext4: use kmemdup() to replace kmalloc + memcpy mbcache: don't reclaim used entries mbcache: add functions to delete entry if unused ext4: remove EA inode entry from mbcache on inode eviction ext4: unindent codeblock in ext4_xattr_block_set() ext4: fix race when reusing xattr blocks mbcache: automatically delete entries from cache on freeing ext4: fix deadlock due to mbcache entry corruption SUNRPC: ensure the matching upcall is in-flight upon downcall bpf: pull before calling skb_postpull_rcsum() nfsd: shut down the NFSv4 state objects before the filecache net: hns3: add interrupts re-initialization while doing VF FLR net: sched: fix memory leak in tcindex_set_parms qlcnic: prevent ->dcb use-after-free on qlcnic_dcb_enable() failure nfc: Fix potential resource leaks vhost: fix range used in translate_desc() net: amd-xgbe: add missed tasklet_kill net: phy: xgmiitorgmii: Fix refcount leak in xgmiitorgmii_probe RDMA/uverbs: Silence shiftTooManyBitsSigned warning RDMA/mlx5: Fix validation of max_rd_atomic caps for DC net: sched: atm: dont intepret cls results when asked to drop net: sched: cbq: dont intepret cls results when asked to drop perf tools: Fix resources leak in perf_data__open_dir() drivers/net/bonding/bond_3ad: return when there's no aggregator usb: rndis_host: Secure rndis_query check against int overflow drm/i915: unpin on error in intel_vgpu_shadow_mm_pin() caif: fix memory leak in cfctrl_linkup_request() udf: Fix extension of the last extent in the file ASoC: Intel: bytcr_rt5640: Add quirk for the Advantech MICA-071 tablet x86/bugs: Flush IBP in ib_prctl_set() nfsd: fix handling of readdir in v4root vs. mount upcall timeout riscv: uaccess: fix type of 0 variable on error in get_user() ext4: don't allow journal inode to have encrypt flag hfs/hfsplus: use WARN_ON for sanity check hfs/hfsplus: avoid WARN_ON() for sanity check, use proper error handling mbcache: Avoid nesting of cache->c_list_lock under bit locks parisc: Align parisc MADV_XXX constants with all other architectures selftests: Fix kselftest O=objdir build from cluttering top level objdir selftests: set the BUILD variable to absolute path driver core: Fix bus_type.match() error handling in __driver_attach() net: sched: disallow noqueue for qdisc classes KVM: arm64: Fix S1PTW handling on RO memslots efi: tpm: Avoid READ_ONCE() for accessing the event log docs: Fix the docs build with Sphinx 6.0 perf auxtrace: Fix address filter duplicate symbol selection s390/kexec: fix ipl report address for kdump s390/percpu: add READ_ONCE() to arch_this_cpu_to_op_simple() net/ulp: prevent ULP without clone op from entering the LISTEN status ALSA: pcm: Move rwsem lock inside snd_ctl_elem_read to prevent UAF ALSA: hda/hdmi: Add a HP device 0x8715 to force connect list cifs: Fix uninitialized memory read for smb311 posix symlink create drm/msm/adreno: Make adreno quirks not overwrite each other platform/x86: sony-laptop: Don't turn off 0x153 keyboard backlight during probe ixgbe: fix pci device refcount leak ipv6: raw: Deduct extension header length in rawv6_push_pending_frames wifi: wilc1000: sdio: fix module autoloading usb: ulpi: defer ulpi_register on ulpi_read_id timeout jbd2: use the correct print format quota: Factor out setup of quota inode ext4: fix bug_on in __es_tree_search caused by bad quota inode ext4: lost matching-pair of trace in ext4_truncate ext4: fix use-after-free in ext4_orphan_cleanup ext4: fix uninititialized value in 'ext4_evict_inode' netfilter: ipset: Fix overflow before widen in the bitmap_ip_create() function. powerpc/imc-pmu: Fix use of mutex in IRQs disabled section x86/boot: Avoid using Intel mnemonics in AT&T syntax asm EDAC/device: Fix period calculation in edac_device_reset_delay_period() regulator: da9211: Use irq handler when ready tipc: improve throughput between nodes in netns tipc: eliminate checking netns if node established tipc: fix unexpected link reset due to discovery messages hvc/xen: lock console list traversal nfc: pn533: Wait for out_urb's completion in pn533_usb_send_frame() net/sched: act_mpls: Fix warning during failed attribute validation net/mlx5: Rename ptp clock info net/mlx5: Fix ptp max frequency adjustment range iommu/mediatek-v1: Add error handle for mtk_iommu_probe iommu/mediatek-v1: Fix an error handling path in mtk_iommu_v1_probe() x86/resctrl: Use task_curr() instead of task_struct->on_cpu to prevent unnecessary IPI x86/resctrl: Fix task CLOSID/RMID update race drm/virtio: Fix GEM handle creation UAF arm64: atomics: format whitespace consistently arm64: atomics: remove LL/SC trampolines arm64: cmpxchg_double*: hazard against entire exchange variable efi: fix NULL-deref in init error path mm: Always release pages to the buddy allocator in memblock_free_late(). Revert "usb: ulpi: defer ulpi_register on ulpi_read_id timeout" tipc: fix use-after-free in tipc_disc_rcv() tty: serial: tegra: Handle RX transfer in PIO mode if DMA wasn't started tipc: Add a missing case of TIPC_DIRECT_MSG type ocfs2: fix freeing uninitialized resource on ocfs2_dlm_shutdown tipc: call tipc_lxc_xmit without holding node_read_lock Linux 5.4.229 Change-Id: If8e35d5d3e707352766ae3e4b665fd2369d9382b Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
Christian Brauner
|
2dae4211b5 |
pnode: terminate at peers of source
commit 11933cf1d91d57da9e5c53822a540bbdc2656c16 upstream. The propagate_mnt() function handles mount propagation when creating mounts and propagates the source mount tree @source_mnt to all applicable nodes of the destination propagation mount tree headed by @dest_mnt. Unfortunately it contains a bug where it fails to terminate at peers of @source_mnt when looking up copies of the source mount that become masters for copies of the source mount tree mounted on top of slaves in the destination propagation tree causing a NULL dereference. Once the mechanics of the bug are understood it's easy to trigger. Because of unprivileged user namespaces it is available to unprivileged users. While fixing this bug we've gotten confused multiple times due to unclear terminology or missing concepts. So let's start this with some clarifications: * The terms "master" or "peer" denote a shared mount. A shared mount belongs to a peer group. * A peer group is a set of shared mounts that propagate to each other. They are identified by a peer group id. The peer group id is available in @shared_mnt->mnt_group_id. Shared mounts within the same peer group have the same peer group id. The peers in a peer group can be reached via @shared_mnt->mnt_share. * The terms "slave mount" or "dependent mount" denote a mount that receives propagation from a peer in a peer group. IOW, shared mounts may have slave mounts and slave mounts have shared mounts as their master. Slave mounts of a given peer in a peer group are listed on that peers slave list available at @shared_mnt->mnt_slave_list. * The term "master mount" denotes a mount in a peer group. IOW, it denotes a shared mount or a peer mount in a peer group. The term "master mount" - or "master" for short - is mostly used when talking in the context of slave mounts that receive propagation from a master mount. A master mount of a slave identifies the closest peer group a slave mount receives propagation from. The master mount of a slave can be identified via @slave_mount->mnt_master. Different slaves may point to different masters in the same peer group. * Multiple peers in a peer group can have non-empty ->mnt_slave_lists. Non-empty ->mnt_slave_lists of peers don't intersect. Consequently, to ensure all slave mounts of a peer group are visited the ->mnt_slave_lists of all peers in a peer group have to be walked. * Slave mounts point to a peer in the closest peer group they receive propagation from via @slave_mnt->mnt_master (see above). Together with these peers they form a propagation group (see below). The closest peer group can thus be identified through the peer group id @slave_mnt->mnt_master->mnt_group_id of the peer/master that a slave mount receives propagation from. * A shared-slave mount is a slave mount to a peer group pg1 while also a peer in another peer group pg2. IOW, a peer group may receive propagation from another peer group. If a peer group pg1 is a slave to another peer group pg2 then all peers in peer group pg1 point to the same peer in peer group pg2 via ->mnt_master. IOW, all peers in peer group pg1 appear on the same ->mnt_slave_list. IOW, they cannot be slaves to different peer groups. * A pure slave mount is a slave mount that is a slave to a peer group but is not a peer in another peer group. * A propagation group denotes the set of mounts consisting of a single peer group pg1 and all slave mounts and shared-slave mounts that point to a peer in that peer group via ->mnt_master. IOW, all slave mounts such that @slave_mnt->mnt_master->mnt_group_id is equal to @shared_mnt->mnt_group_id. The concept of a propagation group makes it easier to talk about a single propagation level in a propagation tree. For example, in propagate_mnt() the immediate peers of @dest_mnt and all slaves of @dest_mnt's peer group form a propagation group propg1. So a shared-slave mount that is a slave in propg1 and that is a peer in another peer group pg2 forms another propagation group propg2 together with all slaves that point to that shared-slave mount in their ->mnt_master. * A propagation tree refers to all mounts that receive propagation starting from a specific shared mount. For example, for propagate_mnt() @dest_mnt is the start of a propagation tree. The propagation tree ecompasses all mounts that receive propagation from @dest_mnt's peer group down to the leafs. With that out of the way let's get to the actual algorithm. We know that @dest_mnt is guaranteed to be a pure shared mount or a shared-slave mount. This is guaranteed by a check in attach_recursive_mnt(). So propagate_mnt() will first propagate the source mount tree to all peers in @dest_mnt's peer group: for (n = next_peer(dest_mnt); n != dest_mnt; n = next_peer(n)) { ret = propagate_one(n); if (ret) goto out; } Notice, that the peer propagation loop of propagate_mnt() doesn't propagate @dest_mnt itself. @dest_mnt is mounted directly in attach_recursive_mnt() after we propagated to the destination propagation tree. The mount that will be mounted on top of @dest_mnt is @source_mnt. This copy was created earlier even before we entered attach_recursive_mnt() and doesn't concern us a lot here. It's just important to notice that when propagate_mnt() is called @source_mnt will not yet have been mounted on top of @dest_mnt. Thus, @source_mnt->mnt_parent will either still point to @source_mnt or - in the case @source_mnt is moved and thus already attached - still to its former parent. For each peer @m in @dest_mnt's peer group propagate_one() will create a new copy of the source mount tree and mount that copy @child on @m such that @child->mnt_parent points to @m after propagate_one() returns. propagate_one() will stash the last destination propagation node @m in @last_dest and the last copy it created for the source mount tree in @last_source. Hence, if we call into propagate_one() again for the next destination propagation node @m, @last_dest will point to the previous destination propagation node and @last_source will point to the previous copy of the source mount tree and mounted on @last_dest. Each new copy of the source mount tree is created from the previous copy of the source mount tree. This will become important later. The peer loop in propagate_mnt() is straightforward. We iterate through the peers copying and updating @last_source and @last_dest as we go through them and mount each copy of the source mount tree @child on a peer @m in @dest_mnt's peer group. After propagate_mnt() handled the peers in @dest_mnt's peer group propagate_mnt() will propagate the source mount tree down the propagation tree that @dest_mnt's peer group propagates to: for (m = next_group(dest_mnt, dest_mnt); m; m = next_group(m, dest_mnt)) { /* everything in that slave group */ n = m; do { ret = propagate_one(n); if (ret) goto out; n = next_peer(n); } while (n != m); } The next_group() helper will recursively walk the destination propagation tree, descending into each propagation group of the propagation tree. The important part is that it takes care to propagate the source mount tree to all peers in the peer group of a propagation group before it propagates to the slaves to those peers in the propagation group. IOW, it creates and mounts copies of the source mount tree that become masters before it creates and mounts copies of the source mount tree that become slaves to these masters. It is important to remember that propagating the source mount tree to each mount @m in the destination propagation tree simply means that we create and mount new copies @child of the source mount tree on @m such that @child->mnt_parent points to @m. Since we know that each node @m in the destination propagation tree headed by @dest_mnt's peer group will be overmounted with a copy of the source mount tree and since we know that the propagation properties of each copy of the source mount tree we create and mount at @m will mostly mirror the propagation properties of @m. We can use that information to create and mount the copies of the source mount tree that become masters before their slaves. The easy case is always when @m and @last_dest are peers in a peer group of a given propagation group. In that case we know that we can simply copy @last_source without having to figure out what the master for the new copy @child of the source mount tree needs to be as we've done that in a previous call to propagate_one(). The hard case is when we're dealing with a slave mount or a shared-slave mount @m in a destination propagation group that we need to create and mount a copy of the source mount tree on. For each propagation group in the destination propagation tree we propagate the source mount tree to we want to make sure that the copies @child of the source mount tree we create and mount on slaves @m pick an ealier copy of the source mount tree that we mounted on a master @m of the destination propagation group as their master. This is a mouthful but as far as we can tell that's the core of it all. But, if we keep track of the masters in the destination propagation tree @m we can use the information to find the correct master for each copy of the source mount tree we create and mount at the slaves in the destination propagation tree @m. Let's walk through the base case as that's still fairly easy to grasp. If we're dealing with the first slave in the propagation group that @dest_mnt is in then we don't yet have marked any masters in the destination propagation tree. We know the master for the first slave to @dest_mnt's peer group is simple @dest_mnt. So we expect this algorithm to yield a copy of the source mount tree that was mounted on a peer in @dest_mnt's peer group as the master for the copy of the source mount tree we want to mount at the first slave @m: for (n = m; ; n = p) { p = n->mnt_master; if (p == dest_master || IS_MNT_MARKED(p)) break; } For the first slave we walk the destination propagation tree all the way up to a peer in @dest_mnt's peer group. IOW, the propagation hierarchy can be walked by walking up the @mnt->mnt_master hierarchy of the destination propagation tree @m. We will ultimately find a peer in @dest_mnt's peer group and thus ultimately @dest_mnt->mnt_master. Btw, here the assumption we listed at the beginning becomes important. Namely, that peers in a peer group pg1 that are slaves in another peer group pg2 appear on the same ->mnt_slave_list. IOW, all slaves who are peers in peer group pg1 point to the same peer in peer group pg2 via their ->mnt_master. Otherwise the termination condition in the code above would be wrong and next_group() would be broken too. So the first iteration sets: n = m; p = n->mnt_master; such that @p now points to a peer or @dest_mnt itself. We walk up one more level since we don't have any marked mounts. So we end up with: n = dest_mnt; p = dest_mnt->mnt_master; If @dest_mnt's peer group is not slave to another peer group then @p is now NULL. If @dest_mnt's peer group is a slave to another peer group then @p now points to @dest_mnt->mnt_master points which is a master outside the propagation tree we're dealing with. Now we need to figure out the master for the copy of the source mount tree we're about to create and mount on the first slave of @dest_mnt's peer group: do { struct mount *parent = last_source->mnt_parent; if (last_source == first_source) break; done = parent->mnt_master == p; if (done && peers(n, parent)) break; last_source = last_source->mnt_master; } while (!done); We know that @last_source->mnt_parent points to @last_dest and @last_dest is the last peer in @dest_mnt's peer group we propagated to in the peer loop in propagate_mnt(). Consequently, @last_source is the last copy we created and mount on that last peer in @dest_mnt's peer group. So @last_source is the master we want to pick. We know that @last_source->mnt_parent->mnt_master points to @last_dest->mnt_master. We also know that @last_dest->mnt_master is either NULL or points to a master outside of the destination propagation tree and so does @p. Hence: done = parent->mnt_master == p; is trivially true in the base condition. We also know that for the first slave mount of @dest_mnt's peer group that @last_dest either points @dest_mnt itself because it was initialized to: last_dest = dest_mnt; at the beginning of propagate_mnt() or it will point to a peer of @dest_mnt in its peer group. In both cases it is guaranteed that on the first iteration @n and @parent are peers (Please note the check for peers here as that's important.): if (done && peers(n, parent)) break; So, as we expected, we select @last_source, which referes to the last copy of the source mount tree we mounted on the last peer in @dest_mnt's peer group, as the master of the first slave in @dest_mnt's peer group. The rest is taken care of by clone_mnt(last_source, ...). We'll skip over that part otherwise this becomes a blogpost. At the end of propagate_mnt() we now mark @m->mnt_master as the first master in the destination propagation tree that is distinct from @dest_mnt->mnt_master. IOW, we mark @dest_mnt itself as a master. By marking @dest_mnt or one of it's peers we are able to easily find it again when we later lookup masters for other copies of the source mount tree we mount copies of the source mount tree on slaves @m to @dest_mnt's peer group. This, in turn allows us to find the master we selected for the copies of the source mount tree we mounted on master in the destination propagation tree again. The important part is to realize that the code makes use of the fact that the last copy of the source mount tree stashed in @last_source was mounted on top of the previous destination propagation node @last_dest. What this means is that @last_source allows us to walk the destination propagation hierarchy the same way each destination propagation node @m does. If we take @last_source, which is the copy of @source_mnt we have mounted on @last_dest in the previous iteration of propagate_one(), then we know @last_source->mnt_parent points to @last_dest but we also know that as we walk through the destination propagation tree that @last_source->mnt_master will point to an earlier copy of the source mount tree we mounted one an earlier destination propagation node @m. IOW, @last_source->mnt_parent will be our hook into the destination propagation tree and each consecutive @last_source->mnt_master will lead us to an earlier propagation node @m via @last_source->mnt_master->mnt_parent. Hence, by walking up @last_source->mnt_master, each of which is mounted on a node that is a master @m in the destination propagation tree we can also walk up the destination propagation hierarchy. So, for each new destination propagation node @m we use the previous copy of @last_source and the fact it's mounted on the previous propagation node @last_dest via @last_source->mnt_master->mnt_parent to determine what the master of the new copy of @last_source needs to be. The goal is to find the _closest_ master that the new copy of the source mount tree we are about to create and mount on a slave @m in the destination propagation tree needs to pick. IOW, we want to find a suitable master in the propagation group. As the propagation structure of the source mount propagation tree we create mirrors the propagation structure of the destination propagation tree we can find @m's closest master - i.e., a marked master - which is a peer in the closest peer group that @m receives propagation from. We store that closest master of @m in @p as before and record the slave to that master in @n We then search for this master @p via @last_source by walking up the master hierarchy starting from the last copy of the source mount tree stored in @last_source that we created and mounted on the previous destination propagation node @m. We will try to find the master by walking @last_source->mnt_master and by comparing @last_source->mnt_master->mnt_parent->mnt_master to @p. If we find @p then we can figure out what earlier copy of the source mount tree needs to be the master for the new copy of the source mount tree we're about to create and mount at the current destination propagation node @m. If @last_source->mnt_master->mnt_parent and @n are peers then we know that the closest master they receive propagation from is @last_source->mnt_master->mnt_parent->mnt_master. If not then the closest immediate peer group that they receive propagation from must be one level higher up. This builds on the earlier clarification at the beginning that all peers in a peer group which are slaves of other peer groups all point to the same ->mnt_master, i.e., appear on the same ->mnt_slave_list, of the closest peer group that they receive propagation from. However, terminating the walk has corner cases. If the closest marked master for a given destination node @m cannot be found by walking up the master hierarchy via @last_source->mnt_master then we need to terminate the walk when we encounter @source_mnt again. This isn't an arbitrary termination. It simply means that the new copy of the source mount tree we're about to create has a copy of the source mount tree we created and mounted on a peer in @dest_mnt's peer group as its master. IOW, @source_mnt is the peer in the closest peer group that the new copy of the source mount tree receives propagation from. We absolutely have to stop @source_mnt because @last_source->mnt_master either points outside the propagation hierarchy we're dealing with or it is NULL because @source_mnt isn't a shared-slave. So continuing the walk past @source_mnt would cause a NULL dereference via @last_source->mnt_master->mnt_parent. And so we have to stop the walk when we encounter @source_mnt again. One scenario where this can happen is when we first handled a series of slaves of @dest_mnt's peer group and then encounter peers in a new peer group that is a slave to @dest_mnt's peer group. We handle them and then we encounter another slave mount to @dest_mnt that is a pure slave to @dest_mnt's peer group. That pure slave will have a peer in @dest_mnt's peer group as its master. Consequently, the new copy of the source mount tree will need to have @source_mnt as it's master. So we walk the propagation hierarchy all the way up to @source_mnt based on @last_source->mnt_master. So terminate on @source_mnt, easy peasy. Except, that the check misses something that the rest of the algorithm already handles. If @dest_mnt has peers in it's peer group the peer loop in propagate_mnt(): for (n = next_peer(dest_mnt); n != dest_mnt; n = next_peer(n)) { ret = propagate_one(n); if (ret) goto out; } will consecutively update @last_source with each previous copy of the source mount tree we created and mounted at the previous peer in @dest_mnt's peer group. So after that loop terminates @last_source will point to whatever copy of the source mount tree was created and mounted on the last peer in @dest_mnt's peer group. Furthermore, if there is even a single additional peer in @dest_mnt's peer group then @last_source will __not__ point to @source_mnt anymore. Because, as we mentioned above, @dest_mnt isn't even handled in this loop but directly in attach_recursive_mnt(). So it can't even accidently come last in that peer loop. So the first time we handle a slave mount @m of @dest_mnt's peer group the copy of the source mount tree we create will make the __last copy of the source mount tree we created and mounted on the last peer in @dest_mnt's peer group the master of the new copy of the source mount tree we create and mount on the first slave of @dest_mnt's peer group__. But this means that the termination condition that checks for @source_mnt is wrong. The @source_mnt cannot be found anymore by propagate_one(). Instead it will find the last copy of the source mount tree we created and mounted for the last peer of @dest_mnt's peer group again. And that is a peer of @source_mnt not @source_mnt itself. IOW, we fail to terminate the loop correctly and ultimately dereference @last_source->mnt_master->mnt_parent. When @source_mnt's peer group isn't slave to another peer group then @last_source->mnt_master is NULL causing the splat below. For example, assume @dest_mnt is a pure shared mount and has three peers in its peer group: =================================================================================== mount-id mount-parent-id peer-group-id =================================================================================== (@dest_mnt) mnt_master[216] 309 297 shared:216 \ (@source_mnt) mnt_master[218]: 609 609 shared:218 (1) mnt_master[216]: 607 605 shared:216 \ (P1) mnt_master[218]: 624 607 shared:218 (2) mnt_master[216]: 576 574 shared:216 \ (P2) mnt_master[218]: 625 576 shared:218 (3) mnt_master[216]: 545 543 shared:216 \ (P3) mnt_master[218]: 626 545 shared:218 After this sequence has been processed @last_source will point to (P3), the copy generated for the third peer in @dest_mnt's peer group we handled. So the copy of the source mount tree (P4) we create and mount on the first slave of @dest_mnt's peer group: =================================================================================== mount-id mount-parent-id peer-group-id =================================================================================== mnt_master[216] 309 297 shared:216 / / (S0) mnt_slave 483 481 master:216 \ \ (P3) mnt_master[218] 626 545 shared:218 \ / \/ (P4) mnt_slave 627 483 master:218 will pick the last copy of the source mount tree (P3) as master, not (S0). When walking the propagation hierarchy via @last_source's master hierarchy we encounter (P3) but not (S0), i.e., @source_mnt. We can fix this in multiple ways: (1) By setting @last_source to @source_mnt after we processed the peers in @dest_mnt's peer group right after the peer loop in propagate_mnt(). (2) By changing the termination condition that relies on finding exactly @source_mnt to finding a peer of @source_mnt. (3) By only moving @last_source when we actually venture into a new peer group or some clever variant thereof. The first two options are minimally invasive and what we want as a fix. The third option is more intrusive but something we'd like to explore in the near future. This passes all LTP tests and specifically the mount propagation testsuite part of it. It also holds up against all known reproducers of this issues. Final words. First, this is a clever but __worringly__ underdocumented algorithm. There isn't a single detailed comment to be found in next_group(), propagate_one() or anywhere else in that file for that matter. This has been a giant pain to understand and work through and a bug like this is insanely difficult to fix without a detailed understanding of what's happening. Let's not talk about the amount of time that was sunk into fixing this. Second, all the cool kids with access to unshare --mount --user --map-root --propagation=unchanged are going to have a lot of fun. IOW, triggerable by unprivileged users while namespace_lock() lock is held. [ 115.848393] BUG: kernel NULL pointer dereference, address: 0000000000000010 [ 115.848967] #PF: supervisor read access in kernel mode [ 115.849386] #PF: error_code(0x0000) - not-present page [ 115.849803] PGD 0 P4D 0 [ 115.850012] Oops: 0000 [#1] PREEMPT SMP PTI [ 115.850354] CPU: 0 PID: 15591 Comm: mount Not tainted 6.1.0-rc7 #3 [ 115.850851] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 115.851510] RIP: 0010:propagate_one.part.0+0x7f/0x1a0 [ 115.851924] Code: 75 eb 4c 8b 05 c2 25 37 02 4c 89 ca 48 8b 4a 10 49 39 d0 74 1e 48 3b 81 e0 00 00 00 74 26 48 8b 92 e0 00 00 00 be 01 00 00 00 <48> 8b 4a 10 49 39 d0 75 e2 40 84 f6 74 38 4c 89 05 84 25 37 02 4d [ 115.853441] RSP: 0018:ffffb8d5443d7d50 EFLAGS: 00010282 [ 115.853865] RAX: ffff8e4d87c41c80 RBX: ffff8e4d88ded780 RCX: ffff8e4da4333a00 [ 115.854458] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8e4d88ded780 [ 115.855044] RBP: ffff8e4d88ded780 R08: ffff8e4da4338000 R09: ffff8e4da43388c0 [ 115.855693] R10: 0000000000000002 R11: ffffb8d540158000 R12: ffffb8d5443d7da8 [ 115.856304] R13: ffff8e4d88ded780 R14: 0000000000000000 R15: 0000000000000000 [ 115.856859] FS: 00007f92c90c9800(0000) GS:ffff8e4dfdc00000(0000) knlGS:0000000000000000 [ 115.857531] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 115.858006] CR2: 0000000000000010 CR3: 0000000022f4c002 CR4: 00000000000706f0 [ 115.858598] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 115.859393] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 115.860099] Call Trace: [ 115.860358] <TASK> [ 115.860535] propagate_mnt+0x14d/0x190 [ 115.860848] attach_recursive_mnt+0x274/0x3e0 [ 115.861212] path_mount+0x8c8/0xa60 [ 115.861503] __x64_sys_mount+0xf6/0x140 [ 115.861819] do_syscall_64+0x5b/0x80 [ 115.862117] ? do_faccessat+0x123/0x250 [ 115.862435] ? syscall_exit_to_user_mode+0x17/0x40 [ 115.862826] ? do_syscall_64+0x67/0x80 [ 115.863133] ? syscall_exit_to_user_mode+0x17/0x40 [ 115.863527] ? do_syscall_64+0x67/0x80 [ 115.863835] ? do_syscall_64+0x67/0x80 [ 115.864144] ? do_syscall_64+0x67/0x80 [ 115.864452] ? exc_page_fault+0x70/0x170 [ 115.864775] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 115.865187] RIP: 0033:0x7f92c92b0ebe [ 115.865480] Code: 48 8b 0d 75 4f 0c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 49 89 ca b8 a5 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 42 4f 0c 00 f7 d8 64 89 01 48 [ 115.866984] RSP: 002b:00007fff000aa728 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5 [ 115.867607] RAX: ffffffffffffffda RBX: 000055a77888d6b0 RCX: 00007f92c92b0ebe [ 115.868240] RDX: 000055a77888d8e0 RSI: 000055a77888e6e0 RDI: 000055a77888e620 [ 115.868823] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000001 [ 115.869403] R10: 0000000000001000 R11: 0000000000000246 R12: 000055a77888e620 [ 115.869994] R13: 000055a77888d8e0 R14: 00000000ffffffff R15: 00007f92c93e4076 [ 115.870581] </TASK> [ 115.870763] Modules linked in: nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set rfkill nf_tables nfnetlink qrtr snd_intel8x0 sunrpc snd_ac97_codec ac97_bus snd_pcm snd_timer intel_rapl_msr intel_rapl_common snd vboxguest intel_powerclamp video rapl joydev soundcore i2c_piix4 wmi fuse zram xfs vmwgfx crct10dif_pclmul crc32_pclmul crc32c_intel polyval_clmulni polyval_generic drm_ttm_helper ttm e1000 ghash_clmulni_intel serio_raw ata_generic pata_acpi scsi_dh_rdac scsi_dh_emc scsi_dh_alua dm_multipath [ 115.875288] CR2: 0000000000000010 [ 115.875641] ---[ end trace 0000000000000000 ]--- [ 115.876135] RIP: 0010:propagate_one.part.0+0x7f/0x1a0 [ 115.876551] Code: 75 eb 4c 8b 05 c2 25 37 02 4c 89 ca 48 8b 4a 10 49 39 d0 74 1e 48 3b 81 e0 00 00 00 74 26 48 8b 92 e0 00 00 00 be 01 00 00 00 <48> 8b 4a 10 49 39 d0 75 e2 40 84 f6 74 38 4c 89 05 84 25 37 02 4d [ 115.878086] RSP: 0018:ffffb8d5443d7d50 EFLAGS: 00010282 [ 115.878511] RAX: ffff8e4d87c41c80 RBX: ffff8e4d88ded780 RCX: ffff8e4da4333a00 [ 115.879128] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff8e4d88ded780 [ 115.879715] RBP: ffff8e4d88ded780 R08: ffff8e4da4338000 R09: ffff8e4da43388c0 [ 115.880359] R10: 0000000000000002 R11: ffffb8d540158000 R12: ffffb8d5443d7da8 [ 115.880962] R13: ffff8e4d88ded780 R14: 0000000000000000 R15: 0000000000000000 [ 115.881548] FS: 00007f92c90c9800(0000) GS:ffff8e4dfdc00000(0000) knlGS:0000000000000000 [ 115.882234] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 115.882713] CR2: 0000000000000010 CR3: 0000000022f4c002 CR4: 00000000000706f0 [ 115.883314] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 115.883966] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Fixes: |
||
Greg Kroah-Hartman
|
ae0dae9ffc |
This is the 5.4.37 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAl6tF/4ACgkQONu9yGCS aT7GdA//U9Nzp0upthsH5IMqIOwaJQBEwXF83fTResLKPSNjq6wAYO6kQwdTBMZ1 PUo/ZEmOnDigdHM3PCGw+Z779UCb9/2laH+KPnPTnst9LcM0sLJMsgoCIuqsyl8J mPDLCbx4f7/ffkw/cSb+JrqCn/2mFib3uCwktTSqxVWm9S7EcE3CRxSTEE1XP/z6 FzDPCjeNijNa3U96NnHFcKXEo/vcaEKHIB9bgdR7kUuRKGBhXSjv7LWUV/940F2w eyGgW5A+o94dsCORx2aOgBwOoujAto/DxDihv4jm/S5HTg68hqWQxqWerlsy0PFP k7j854aaHamIJjt5SE2MTm9YxnvWh4rpbXjuYDLYLM1jLaACZ+5mIj+w18yrpmOs 7vjlHBBBTt4xNbODML4KLrj+fCdXk4uEBy7sWi/qYPUmrV3CLK1DqcqRQ9toS+yh o22JwyVYuD2os0YMYikqSVRlCe4UwJcW0ZZfOFg2cpB9anG7i+DrzW9Lc6CuPsHo ZC9rdVNEHLh9Ti9zcXrs8AFjxoIbP/m0n+ZH7bQPo1/rWE4+fzj14wtKslGtkT0B 00/Vo9mtmmBC0MVBignbWsq5aE3bFLWTOveJppjgAVXYJ7mQhtnvw4eFSJahtBa0 s+SB9M6kGNvWpL11cokqIaVfklDjaMo0Jeakd78KdobeNOgBvug= =TNyS -----END PGP SIGNATURE----- Merge 5.4.37 into android-5.4-stable Changes in 5.4.37 remoteproc: Fix wrong rvring index computation ubifs: Fix ubifs_tnc_lookup() usage in do_kill_orphans() printk: queue wake_up_klogd irq_work only if per-CPU areas are ready ASoC: stm32: sai: fix sai probe usb: dwc3: gadget: Do link recovery for SS and SSP kbuild: fix DT binding schema rule again to avoid needless rebuilds usb: gadget: udc: bdc: Remove unnecessary NULL checks in bdc_req_complete usb: gadget: udc: atmel: Fix vbus disconnect handling afs: Make record checking use TASK_UNINTERRUPTIBLE when appropriate afs: Fix to actually set AFS_SERVER_FL_HAVE_EPOCH iio:ad7797: Use correct attribute_group propagate_one(): mnt_set_mountpoint() needs mount_lock counter: 104-quad-8: Add lock guards - generic interface s390/ftrace: fix potential crashes when switching tracers ASoC: q6dsp6: q6afe-dai: add missing channels to MI2S DAIs ASoC: tas571x: disable regulators on failed probe ASoC: meson: axg-card: fix codec-to-codec link setup ASoC: wm8960: Fix wrong clock after suspend & resume drivers: soc: xilinx: fix firmware driver Kconfig dependency nfsd: memory corruption in nfsd4_lock() bpf: Forbid XADD on spilled pointers for unprivileged users i2c: altera: use proper variable to hold errno rxrpc: Fix DATA Tx to disable nofrag for UDP on AF_INET6 socket net/cxgb4: Check the return from t4_query_params properly xfs: acquire superblock freeze protection on eofblocks scans svcrdma: Fix trace point use-after-free race svcrdma: Fix leak of svc_rdma_recv_ctxt objects net/mlx5e: Don't trigger IRQ multiple times on XSK wakeup to avoid WQ overruns net/mlx5e: Get the latest values from counters in switchdev mode PCI: Avoid ASMedia XHCI USB PME# from D0 defect PCI: Add ACS quirk for Zhaoxin multi-function devices PCI: Make ACS quirk implementations more uniform PCI: Unify ACS quirk desired vs provided checking PCI: Add Zhaoxin Vendor ID PCI: Add ACS quirk for Zhaoxin Root/Downstream Ports PCI: Move Apex Edge TPU class quirk to fix BAR assignment ARM: dts: bcm283x: Disable dsi0 node cpumap: Avoid warning when CONFIG_DEBUG_PER_CPU_MAPS is enabled s390/pci: do not set affinity for floating irqs net/mlx5: Fix failing fw tracer allocation on s390 sched/core: Fix reset-on-fork from RT with uclamp perf/core: fix parent pid/tid in task exit events netfilter: nat: fix error handling upon registering inet hook PM: sleep: core: Switch back to async_schedule_dev() blk-iocost: Fix error on iocost_ioc_vrate_adj um: ensure `make ARCH=um mrproper` removes arch/$(SUBARCH)/include/generated/ bpf, x86_32: Fix incorrect encoding in BPF_LDX zero-extension bpf, x86_32: Fix clobbering of dst for BPF_JSET bpf, x86_32: Fix logic error in BPF_LDX zero-extension mm: shmem: disable interrupt when acquiring info->lock in userfaultfd_copy path xfs: clear PF_MEMALLOC before exiting xfsaild thread bpf, x86: Fix encoding for lower 8-bit registers in BPF_STX BPF_B libbpf: Initialize *nl_pid so gcc 10 is happy net: fec: set GPR bit on suspend by DT configuration. x86: hyperv: report value of misc_features signal: check sig before setting info in kill_pid_usb_asyncio afs: Fix length of dump of bad YFSFetchStatus record xfs: fix partially uninitialized structure in xfs_reflink_remap_extent ALSA: hda: Release resources at error in delayed probe ALSA: hda: Keep the controller initialization even if no codecs found ALSA: hda: Explicitly permit using autosuspend if runtime PM is supported scsi: target: fix PR IN / READ FULL STATUS for FC scsi: target: tcmu: reset_ring should reset TCMU_DEV_BIT_BROKEN objtool: Fix CONFIG_UBSAN_TRAP unreachable warnings objtool: Support Clang non-section symbols in ORC dump xen/xenbus: ensure xenbus_map_ring_valloc() returns proper grant status ALSA: hda: call runtime_allow() for all hda controllers net: stmmac: socfpga: Allow all RGMII modes mac80211: fix channel switch trigger from unknown mesh peer arm64: Delete the space separator in __emit_inst ext4: use matching invalidatepage in ext4_writepage ext4: increase wait time needed before reuse of deleted inode numbers ext4: convert BUG_ON's to WARN_ON's in mballoc.c blk-mq: Put driver tag in blk_mq_dispatch_rq_list() when no budget hwmon: (jc42) Fix name to have no illegal characters taprio: do not use BIT() in TCA_TAPRIO_ATTR_FLAG_* definitions qed: Fix race condition between scheduling and destroying the slowpath workqueue Crypto: chelsio - Fixes a hang issue during driver registration net: use indirect call wrappers for skb_copy_datagram_iter() qed: Fix use after free in qed_chain_free ext4: check for non-zero journal inum in ext4_calculate_overhead ASoC: soc-core: disable route checks for legacy devices ASoC: stm32: spdifrx: fix regmap status check Linux 5.4.37 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: Ice2ab2e77117b798ed22e9442f72a44f39be28dc |
||
Al Viro
|
db66fd5fef |
propagate_one(): mnt_set_mountpoint() needs mount_lock
commit b0d3869ce9eeacbb1bbd541909beeef4126426d5 upstream. ... to protect the modification of mp->m_count done by it. Most of the places that modify that thing also have namespace_lock held, but not all of them can do so, so we really need mount_lock here. Kudos to Piotr Krysiuk <piotras@gmail.com>, who'd spotted a related bug in pivot_root(2) (fixed unnoticed in 5.3); search for other similar turds has caught out this one. Cc: stable@kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Daniel Rosenberg
|
8e31748d2c |
ANDROID: mnt: Add filesystem private data to mount points
This starts to add private data associated directly to mount points. The intent is to give filesystems a sense of where they have come from, as a means of letting a filesystem take different actions based on this information. Bug: 62094374 Bug: 120446149 Bug: 122428178 Change-Id: Ie769d7b3bb2f5972afe05c1bf16cf88c91647ab2 Signed-off-by: Daniel Rosenberg <drosen@google.com> [astrachan: Folded 89a54ed3bf68 ("ANDROID: mnt: Fix next_descendent") into this patch] [drosen: Folded 138993ea820 ("Android: mnt: Propagate remount correctly") into this patch, integrated fs_context things Now has update_mnt_data instead of needing remount2 Signed-off-by: Alistair Strachan <astrachan@google.com> |
||
Christian Brauner
|
d728cf7916 |
fs/namespace: fix unprivileged mount propagation
When propagating mounts across mount namespaces owned by different user
namespaces it is not possible anymore to move or umount the mount in the
less privileged mount namespace.
Here is a reproducer:
sudo mount -t tmpfs tmpfs /mnt
sudo --make-rshared /mnt
# create unprivileged user + mount namespace and preserve propagation
unshare -U -m --map-root --propagation=unchanged
# now change back to the original mount namespace in another terminal:
sudo mkdir /mnt/aaa
sudo mount -t tmpfs tmpfs /mnt/aaa
# now in the unprivileged user + mount namespace
mount --move /mnt/aaa /opt
Unfortunately, this is a pretty big deal for userspace since this is
e.g. used to inject mounts into running unprivileged containers.
So this regression really needs to go away rather quickly.
The problem is that a recent change falsely locked the root of the newly
added mounts by setting MNT_LOCKED. Fix this by only locking the mounts
on copy_mnt_ns() and not when adding a new mount.
Fixes:
|
||
Thomas Gleixner
|
59bd9ded4d |
treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 209
Based on 1 normalized pattern(s): released under gpl v2 extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 15 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steve Winslow <swinslow@gmail.com> Reviewed-by: Allison Randal <allison@lohutok.net> Reviewed-by: Alexios Zavras <alexios.zavras@intel.com> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190528171438.895196075@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Al Viro
|
3bd045cc9c |
separate copying and locking mount tree on cross-userns copies
Rather than having propagate_mnt() check doing unprivileged copies, lock them before commit_tree(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
David Howells
|
e262e32d6b |
vfs: Suppress MS_* flag defs within the kernel unless explicitly enabled
Only the mount namespace code that implements mount(2) should be using the MS_* flags. Suppress them inside the kernel unless uapi/linux/mount.h is included. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Reviewed-by: David Howells <dhowells@redhat.com> |
||
Eric W. Biederman
|
296990deb3 |
mnt: Make propagate_umount less slow for overlapping mount propagation trees
Andrei Vagin pointed out that time to executue propagate_umount can go
non-linear (and take a ludicrious amount of time) when the mount
propogation trees of the mounts to be unmunted by a lazy unmount
overlap.
Make the walk of the mount propagation trees nearly linear by
remembering which mounts have already been visited, allowing
subsequent walks to detect when walking a mount propgation tree or a
subtree of a mount propgation tree would be duplicate work and to skip
them entirely.
Walk the list of mounts whose propgatation trees need to be traversed
from the mount highest in the mount tree to mounts lower in the mount
tree so that odds are higher that the code will walk the largest trees
first, allowing later tree walks to be skipped entirely.
Add cleanup_umount_visitation to remover the code's memory of which
mounts have been visited.
Add the functions last_slave and skip_propagation_subtree to allow
skipping appropriate parts of the mount propagation tree without
needing to change the logic of the rest of the code.
A script to generate overlapping mount propagation trees:
$ cat runs.h
set -e
mount -t tmpfs zdtm /mnt
mkdir -p /mnt/1 /mnt/2
mount -t tmpfs zdtm /mnt/1
mount --make-shared /mnt/1
mkdir /mnt/1/1
iteration=10
if [ -n "$1" ] ; then
iteration=$1
fi
for i in $(seq $iteration); do
mount --bind /mnt/1/1 /mnt/1/1
done
mount --rbind /mnt/1 /mnt/2
TIMEFORMAT='%Rs'
nr=$(( ( 2 ** ( $iteration + 1 ) ) + 1 ))
echo -n "umount -l /mnt/1 -> $nr "
time umount -l /mnt/1
nr=$(cat /proc/self/mountinfo | grep zdtm | wc -l )
time umount -l /mnt/2
$ for i in $(seq 9 19); do echo $i; unshare -Urm bash ./run.sh $i; done
Here are the performance numbers with and without the patch:
mhash | 8192 | 8192 | 1048576 | 1048576
mounts | before | after | before | after
------------------------------------------------
1025 | 0.040s | 0.016s | 0.038s | 0.019s
2049 | 0.094s | 0.017s | 0.080s | 0.018s
4097 | 0.243s | 0.019s | 0.206s | 0.023s
8193 | 1.202s | 0.028s | 1.562s | 0.032s
16385 | 9.635s | 0.036s | 9.952s | 0.041s
32769 | 60.928s | 0.063s | 44.321s | 0.064s
65537 | | 0.097s | | 0.097s
131073 | | 0.233s | | 0.176s
262145 | | 0.653s | | 0.344s
524289 | | 2.305s | | 0.735s
1048577 | | 7.107s | | 2.603s
Andrei Vagin reports fixing the performance problem is part of the
work to fix CVE-2016-6213.
Cc: stable@vger.kernel.org
Fixes:
|
||
Eric W. Biederman
|
99b19d1647 |
mnt: In propgate_umount handle visiting mounts in any order
While investigating some poor umount performance I realized that in
the case of overlapping mount trees where some of the mounts are locked
the code has been failing to unmount all of the mounts it should
have been unmounting.
This failure to unmount all of the necessary
mounts can be reproduced with:
$ cat locked_mounts_test.sh
mount -t tmpfs test-base /mnt
mount --make-shared /mnt
mkdir -p /mnt/b
mount -t tmpfs test1 /mnt/b
mount --make-shared /mnt/b
mkdir -p /mnt/b/10
mount -t tmpfs test2 /mnt/b/10
mount --make-shared /mnt/b/10
mkdir -p /mnt/b/10/20
mount --rbind /mnt/b /mnt/b/10/20
unshare -Urm --propagation unchaged /bin/sh -c 'sleep 5; if [ $(grep test /proc/self/mountinfo | wc -l) -eq 1 ] ; then echo SUCCESS ; else echo FAILURE ; fi'
sleep 1
umount -l /mnt/b
wait %%
$ unshare -Urm ./locked_mounts_test.sh
This failure is corrected by removing the prepass that marks mounts
that may be umounted.
A first pass is added that umounts mounts if possible and if not sets
mount mark if they could be unmounted if they weren't locked and adds
them to a list to umount possibilities. This first pass reconsiders
the mounts parent if it is on the list of umount possibilities, ensuring
that information of umoutability will pass from child to mount parent.
A second pass then walks through all mounts that are umounted and processes
their children unmounting them or marking them for reparenting.
A last pass cleans up the state on the mounts that could not be umounted
and if applicable reparents them to their first parent that remained
mounted.
While a bit longer than the old code this code is much more robust
as it allows information to flow up from the leaves and down
from the trunk making the order in which mounts are encountered
in the umount propgation tree irrelevant.
Cc: stable@vger.kernel.org
Fixes:
|
||
Eric W. Biederman
|
570487d3fa |
mnt: In umount propagation reparent in a separate pass
It was observed that in some pathlogical cases that the current code
does not unmount everything it should. After investigation it
was determined that the issue is that mnt_change_mntpoint can
can change which mounts are available to be unmounted during mount
propagation which is wrong.
The trivial reproducer is:
$ cat ./pathological.sh
mount -t tmpfs test-base /mnt
cd /mnt
mkdir 1 2 1/1
mount --bind 1 1
mount --make-shared 1
mount --bind 1 2
mount --bind 1/1 1/1
mount --bind 1/1 1/1
echo
grep test-base /proc/self/mountinfo
umount 1/1
echo
grep test-base /proc/self/mountinfo
$ unshare -Urm ./pathological.sh
The expected output looks like:
46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000
47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
49 54 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
50 53 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
51 49 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
54 47 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
53 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
52 50 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000
47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
The output without the fix looks like:
46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000
47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
49 54 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
50 53 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
51 49 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
54 47 0:25 /1/1 /mnt/1/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
53 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
52 50 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
46 31 0:25 / /mnt rw,relatime - tmpfs test-base rw,uid=1000,gid=1000
47 46 0:25 /1 /mnt/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
48 46 0:25 /1 /mnt/2 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
52 48 0:25 /1/1 /mnt/2/1 rw,relatime shared:1 - tmpfs test-base rw,uid=1000,gid=1000
That last mount in the output was in the propgation tree to be unmounted but
was missed because the mnt_change_mountpoint changed it's parent before the walk
through the mount propagation tree observed it.
Cc: stable@vger.kernel.org
Fixes:
|
||
Eric W. Biederman
|
1064f874ab |
mnt: Tuck mounts under others instead of creating shadow/side mounts.
Ever since mount propagation was introduced in cases where a mount in
propagated to parent mount mountpoint pair that is already in use the
code has placed the new mount behind the old mount in the mount hash
table.
This implementation detail is problematic as it allows creating
arbitrary length mount hash chains.
Furthermore it invalidates the constraint maintained elsewhere in the
mount code that a parent mount and a mountpoint pair will have exactly
one mount upon them. Making it hard to deal with and to talk about
this special case in the mount code.
Modify mount propagation to notice when there is already a mount at
the parent mount and mountpoint where a new mount is propagating to
and place that preexisting mount on top of the new mount.
Modify unmount propagation to notice when a mount that is being
unmounted has another mount on top of it (and no other children), and
to replace the unmounted mount with the mount on top of it.
Move the MNT_UMUONT test from __lookup_mnt_last into
__propagate_umount as that is the only call of __lookup_mnt_last where
MNT_UMOUNT may be set on any mount visible in the mount hash table.
These modifications allow:
- __lookup_mnt_last to be removed.
- attach_shadows to be renamed __attach_mnt and its shadow
handling to be removed.
- commit_tree to be simplified
- copy_tree to be simplified
The result is an easier to understand tree of mounts that does not
allow creation of arbitrary length hash chains in the mount hash table.
The result is also a very slight userspace visible difference in semantics.
The following two cases now behave identically, where before order
mattered:
case 1: (explicit user action)
B is a slave of A
mount something on A/a , it will propagate to B/a
and than mount something on B/a
case 2: (tucked mount)
B is a slave of A
mount something on B/a
and than mount something on A/a
Histroically umount A/a would fail in case 1 and succeed in case 2.
Now umount A/a succeeds in both configurations.
This very small change in semantics appears if anything to be a bug
fix to me and my survey of userspace leads me to believe that no programs
will notice or care of this subtle semantic change.
v2: Updated to mnt_change_mountpoint to not call dput or mntput
and instead to decrement the counts directly. It is guaranteed
that there will be other references when mnt_change_mountpoint is
called so this is safe.
v3: Moved put_mountpoint under mount_lock in attach_recursive_mnt
As the locking in fs/namespace.c changed between v2 and v3.
v4: Reworked the logic in propagate_mount_busy and __propagate_umount
that detects when a mount completely covers another mount.
v5: Removed unnecessary tests whose result is alwasy true in
find_topper and attach_recursive_mnt.
v6: Document the user space visible semantic difference.
Cc: stable@vger.kernel.org
Fixes:
|
||
Al Viro
|
5235d448c4 |
reorganize do_make_slave()
Make sure that clone_mnt() never returns a mount with MNT_SHARED in flags, but without a valid ->mnt_group_id. That allows to demystify do_make_slave() quite a bit, among other things. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Eric W. Biederman
|
d29216842a |
mnt: Add a per mount namespace limit on the number of mounts
CAI Qian <caiqian@redhat.com> pointed out that the semantics of shared subtrees make it possible to create an exponentially increasing number of mounts in a mount namespace. mkdir /tmp/1 /tmp/2 mount --make-rshared / for i in $(seq 1 20) ; do mount --bind /tmp/1 /tmp/2 ; done Will create create 2^20 or 1048576 mounts, which is a practical problem as some people have managed to hit this by accident. As such CVE-2016-6213 was assigned. Ian Kent <raven@themaw.net> described the situation for autofs users as follows: > The number of mounts for direct mount maps is usually not very large because of > the way they are implemented, large direct mount maps can have performance > problems. There can be anywhere from a few (likely case a few hundred) to less > than 10000, plus mounts that have been triggered and not yet expired. > > Indirect mounts have one autofs mount at the root plus the number of mounts that > have been triggered and not yet expired. > > The number of autofs indirect map entries can range from a few to the common > case of several thousand and in rare cases up to between 30000 and 50000. I've > not heard of people with maps larger than 50000 entries. > > The larger the number of map entries the greater the possibility for a large > number of active mounts so it's not hard to expect cases of a 1000 or somewhat > more active mounts. So I am setting the default number of mounts allowed per mount namespace at 100,000. This is more than enough for any use case I know of, but small enough to quickly stop an exponential increase in mounts. Which should be perfect to catch misconfigurations and malfunctioning programs. For anyone who needs a higher limit this can be changed by writing to the new /proc/sys/fs/mount-max sysctl. Tested-by: CAI Qian <caiqian@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> |
||
Eric W. Biederman
|
5ec0811d30 |
propogate_mnt: Handle the first propogated copy being a slave
When the first propgated copy was a slave the following oops would result:
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
> IP: [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0
> PGD bacd4067 PUD bac66067 PMD 0
> Oops: 0000 [#1] SMP
> Modules linked in:
> CPU: 1 PID: 824 Comm: mount Not tainted 4.6.0-rc5userns+ #1523
> Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
> task: ffff8800bb0a8000 ti: ffff8800bac3c000 task.ti: ffff8800bac3c000
> RIP: 0010:[<ffffffff811fba4e>] [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0
> RSP: 0018:ffff8800bac3fd38 EFLAGS: 00010283
> RAX: 0000000000000000 RBX: ffff8800bb77ec00 RCX: 0000000000000010
> RDX: 0000000000000000 RSI: ffff8800bb58c000 RDI: ffff8800bb58c480
> RBP: ffff8800bac3fd48 R08: 0000000000000001 R09: 0000000000000000
> R10: 0000000000001ca1 R11: 0000000000001c9d R12: 0000000000000000
> R13: ffff8800ba713800 R14: ffff8800bac3fda0 R15: ffff8800bb77ec00
> FS: 00007f3c0cd9b7e0(0000) GS:ffff8800bfb00000(0000) knlGS:0000000000000000
> CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000010 CR3: 00000000bb79d000 CR4: 00000000000006e0
> Stack:
> ffff8800bb77ec00 0000000000000000 ffff8800bac3fd88 ffffffff811fbf85
> ffff8800bac3fd98 ffff8800bb77f080 ffff8800ba713800 ffff8800bb262b40
> 0000000000000000 0000000000000000 ffff8800bac3fdd8 ffffffff811f1da0
> Call Trace:
> [<ffffffff811fbf85>] propagate_mnt+0x105/0x140
> [<ffffffff811f1da0>] attach_recursive_mnt+0x120/0x1e0
> [<ffffffff811f1ec3>] graft_tree+0x63/0x70
> [<ffffffff811f1f6b>] do_add_mount+0x9b/0x100
> [<ffffffff811f2c1a>] do_mount+0x2aa/0xdf0
> [<ffffffff8117efbe>] ? strndup_user+0x4e/0x70
> [<ffffffff811f3a45>] SyS_mount+0x75/0xc0
> [<ffffffff8100242b>] do_syscall_64+0x4b/0xa0
> [<ffffffff81988f3c>] entry_SYSCALL64_slow_path+0x25/0x25
> Code: 00 00 75 ec 48 89 0d 02 22 22 01 8b 89 10 01 00 00 48 89 05 fd 21 22 01 39 8e 10 01 00 00 0f 84 e0 00 00 00 48 8b 80 d8 00 00 00 <48> 8b 50 10 48 89 05 df 21 22 01 48 89 15 d0 21 22 01 8b 53 30
> RIP [<ffffffff811fba4e>] propagate_one+0xbe/0x1c0
> RSP <ffff8800bac3fd38>
> CR2: 0000000000000010
> ---[ end trace 2725ecd95164f217 ]---
This oops happens with the namespace_sem held and can be triggered by
non-root users. An all around not pleasant experience.
To avoid this scenario when finding the appropriate source mount to
copy stop the walk up the mnt_master chain when the first source mount
is encountered.
Further rewrite the walk up the last_source mnt_master chain so that
it is clear what is going on.
The reason why the first source mount is special is that it it's
mnt_parent is not a mount in the dest_mnt propagation tree, and as
such termination conditions based up on the dest_mnt mount propgation
tree do not make sense.
To avoid other kinds of confusion last_dest is not changed when
computing last_source. last_dest is only used once in propagate_one
and that is above the point of the code being modified, so changing
the global variable is meaningless and confusing.
Cc: stable@vger.kernel.org
fixes:
|
||
Maxim Patlasov
|
7ae8fd0351 |
fs/pnode.c: treat zero mnt_group_id-s as unequal
propagate_one(m) calculates "type" argument for copy_tree() like this: > if (m->mnt_group_id == last_dest->mnt_group_id) { > type = CL_MAKE_SHARED; > } else { > type = CL_SLAVE; > if (IS_MNT_SHARED(m)) > type |= CL_MAKE_SHARED; > } The "type" argument then governs clone_mnt() behavior with respect to flags and mnt_master of new mount. When we iterate through a slave group, it is possible that both current "m" and "last_dest" are not shared (although, both are slaves, i.e. have non-NULL mnt_master-s). Then the comparison above erroneously makes new mount shared and sets its mnt_master to last_source->mnt_master. The patch fixes the problem by handling zero mnt_group_id-s as though they are unequal. The similar problem exists in the implementation of "else" clause above when we have to ascend upward in the master/slave tree by calling: > last_source = last_source->mnt_master; > last_dest = last_source->mnt_parent; proper number of times. The last step is governed by "n->mnt_group_id != last_dest->mnt_group_id" condition that may lie if both are zero. The patch fixes this case in the same way as the former one. [AV: don't open-code an obvious helper...] Signed-off-by: Maxim Patlasov <mpatlasov@virtuozzo.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Eric W. Biederman
|
0c56fe3142 |
mnt: Don't propagate unmounts to locked mounts
If the first mount in shared subtree is locked don't unmount the shared subtree. This is ensured by walking through the mounts parents before children and marking a mount as unmountable if it is not locked or it is locked but it's parent is marked. This allows recursive mount detach to propagate through a set of mounts when unmounting them would not reveal what is under any locked mount. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> |
||
Eric W. Biederman
|
5d88457eb5 |
mnt: On an unmount propagate clearing of MNT_LOCKED
A prerequisite of calling umount_tree is that the point where the tree is mounted at is valid to unmount. If we are propagating the effect of the unmount clear MNT_LOCKED in every instance where the same filesystem is mounted on the same mountpoint in the mount tree, as we know (by virtue of the fact that umount_tree was called) that it is safe to reveal what is at that mountpoint. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> |
||
Eric W. Biederman
|
411a938b5a |
mnt: Delay removal from the mount hash.
- Modify __lookup_mnt_hash_last to ignore mounts that have MNT_UMOUNTED set. - Don't remove mounts from the mount hash table in propogate_umount - Don't remove mounts from the mount hash table in umount_tree before the entire list of mounts to be umounted is selected. - Remove mounts from the mount hash table as the last thing that happens in the case where a mount has a parent in umount_tree. Mounts without parents are not hashed (by definition). This paves the way for delaying removal from the mount hash table even farther and fixing the MNT_LOCKED vs MNT_DETACH issue. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> |
||
Eric W. Biederman
|
590ce4bcbf |
mnt: Add MNT_UMOUNT flag
In some instances it is necessary to know if the the unmounting process has begun on a mount. Add MNT_UMOUNT to make that reliably testable. This fix gets used in fixing locked mounts in MNT_DETACH Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> |
||
Eric W. Biederman
|
c003b26ff9 |
mnt: In umount_tree reuse mnt_list instead of mnt_hash
umount_tree builds a list of mounts that need to be unmounted. Utilize mnt_list for this purpose instead of mnt_hash. This begins to allow keeping a mount on the mnt_hash after it is unmounted, which is necessary for a properly functioning MNT_LOCKED implementation. The fact that mnt_list is an ordinary list makding available list_move is nice bonus. Cc: stable@vger.kernel.org Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> |
||
Eric W. Biederman
|
8486a7882b |
mnt: Move the clear of MNT_LOCKED from copy_tree to it's callers.
Clear MNT_LOCKED in the callers of copy_tree except copy_mnt_ns, and collect_mounts. In copy_mnt_ns it is necessary to create an exact copy of a mount tree, so not clearing MNT_LOCKED is important. Similarly collect_mounts is used to take a snapshot of the mount tree for audit logging purposes and auditing using a faithful copy of the tree is important. This becomes particularly significant when we start setting MNT_LOCKED on rootfs to prevent it from being unmounted. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> |
||
Al Viro
|
88b368f27a |
get rid of propagate_umount() mistakenly treating slaves as busy.
The check in __propagate_umount() ("has somebody explicitly mounted something on that slave?") is done *before* taking the already doomed victims out of the child lists. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Al Viro
|
f2ebb3a921 |
smarter propagate_mnt()
The current mainline has copies propagated to *all* nodes, then tears down the copies we made for nodes that do not contain counterparts of the desired mountpoint. That sets the right propagation graph for the copies (at teardown time we move the slaves of removed node to a surviving peer or directly to master), but we end up paying a fairly steep price in useless allocations. It's fairly easy to create a situation where N calls of mount(2) create exactly N bindings, with O(N^2) vfsmounts allocated and freed in process. Fortunately, it is possible to avoid those allocations/freeings. The trick is to create copies in the right order and find which one would've eventually become a master with the current algorithm. It turns out to be possible in O(nodes getting propagation) time and with no extra allocations at all. One part is that we need to make sure that eventual master will be created before its slaves, so we need to walk the propagation tree in a different order - by peer groups. And iterate through the peers before dealing with the next group. Another thing is finding the (earlier) copy that will be a master of one we are about to create; to do that we are (temporary) marking the masters of mountpoints we are attaching the copies to. Either we are in a peer of the last mountpoint we'd dealt with, or we have the following situation: we are attaching to mountpoint M, the last copy S_0 had been attached to M_0 and there are sequences S_0...S_n, M_0...M_n such that S_{i+1} is a master of S_{i}, S_{i} mounted on M{i} and we need to create a slave of the first S_{k} such that M is getting propagation from M_{k}. It means that the master of M_{k} will be among the sequence of masters of M. On the other hand, the nearest marked node in that sequence will either be the master of M_{k} or the master of M_{k-1} (the latter - in the case if M_{k-1} is a slave of something M gets propagation from, but in a wrong peer group). So we go through the sequence of masters of M until we find a marked one (P). Let N be the one before it. Then we go through the sequence of masters of S_0 until we find one (say, S) mounted on a node D that has P as master and check if D is a peer of N. If it is, S will be the master of new copy, if not - the master of S will be. That's it for the hard part; the rest is fairly simple. Iterator is in next_group(), handling of one prospective mountpoint is propagate_one(). It seems to survive all tests and gives a noticably better performance than the current mainline for setups that are seriously using shared subtrees. Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Al Viro
|
38129a13e6 |
switch mnt_hash to hlist
fixes RCU bug - walking through hlist is safe in face of element moves, since it's self-terminating. Cyclic lists are not - if we end up jumping to another hash chain, we'll loop infinitely without ever hitting the original list head. [fix for dumb braino folded] Spotted by: Max Kellermann <mk@cm4all.com> Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Al Viro
|
474279dc0f |
split __lookup_mnt() in two functions
Instead of passing the direction as argument (and checking it on every step through the hash chain), just have separate __lookup_mnt() and __lookup_mnt_last(). And use the standard iterators... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Al Viro
|
719ea2fbb5 |
new helpers: lock_mount_hash/unlock_mount_hash
aka br_write_{lock,unlock} of vfsmount_lock. Inlines in fs/mount.h, vfsmount_lock extern moved over there as well. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Al Viro
|
aba809cf09 |
namespace.c: get rid of mnt_ghosts
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Takashi Iwai
|
5d477b6079 |
vfs: Fix invalid ida_remove() call
When the group id of a shared mount is not allocated, the umount still tries to call mnt_release_group_id(), which eventually hits a kernel warning at ida_remove() spewing a message like: ida_remove called for id=0 which is not allocated. This patch fixes the bug simply checking the group id in the caller. Reported-by: Cristian Rodríguez <crrodriguez@opensuse.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Linus Torvalds
|
20b4fb4852 |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull VFS updates from Al Viro, Misc cleanups all over the place, mainly wrt /proc interfaces (switch create_proc_entry to proc_create(), get rid of the deprecated create_proc_read_entry() in favor of using proc_create_data() and seq_file etc). 7kloc removed. * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (204 commits) don't bother with deferred freeing of fdtables proc: Move non-public stuff from linux/proc_fs.h to fs/proc/internal.h proc: Make the PROC_I() and PDE() macros internal to procfs proc: Supply a function to remove a proc entry by PDE take cgroup_open() and cpuset_open() to fs/proc/base.c ppc: Clean up scanlog ppc: Clean up rtas_flash driver somewhat hostap: proc: Use remove_proc_subtree() drm: proc: Use remove_proc_subtree() drm: proc: Use minor->index to label things, not PDE->name drm: Constify drm_proc_list[] zoran: Don't print proc_dir_entry data in debug reiserfs: Don't access the proc_dir_entry in r_open(), r_start() r_show() proc: Supply an accessor for getting the data from a PDE's parent airo: Use remove_proc_subtree() rtl8192u: Don't need to save device proc dir PDE rtl8187se: Use a dir under /proc/net/r8180/ proc: Add proc_mkdir_data() proc: Move some bits from linux/proc_fs.h to linux/{of.h,signal.h,tty.h} proc: Move PDE_NET() to fs/proc/proc_net.c ... |
||
Al Viro
|
328e6d9014 |
switch unlock_mount() to namespace_unlock(), convert all umount_tree() callers
which allows to kill the last argument of umount_tree() and make release_mounts() static. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Al Viro
|
84d17192d2 |
get rid of full-hash scan on detaching vfsmounts
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Eric W. Biederman
|
132c94e31b |
vfs: Carefully propogate mounts across user namespaces
As a matter of policy MNT_READONLY should not be changable if the original mounter had more privileges than creator of the mount namespace. Add the flag CL_UNPRIVILEGED to note when we are copying a mount from a mount namespace that requires more privileges to a mount namespace that requires fewer privileges. When the CL_UNPRIVILEGED flag is set cause clone_mnt to set MNT_NO_REMOUNT if any of the mnt flags that should never be changed are set. This protects both mount propagation and the initial creation of a less privileged mount namespace. Cc: stable@vger.kernel.org Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Reported-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> |
||
David Howells
|
be34d1a3bc |
VFS: Make clone_mnt()/copy_tree()/collect_mounts() return errors
copy_tree() can theoretically fail in a case other than ENOMEM, but always returns NULL which is interpreted by callers as -ENOMEM. Change it to return an explicit error. Also change clone_mnt() for consistency and because union mounts will add new error cases. Thanks to Andreas Gruenbacher <agruen@suse.de> for a bug fix. [AV: folded braino fix by Dan Carpenter] Original-author: Valerie Aurora <vaurora@redhat.com> Signed-off-by: David Howells <dhowells@redhat.com> Cc: Valerie Aurora <valerie.aurora@gmail.com> Cc: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Andi Kleen
|
962830df36 |
brlocks/lglocks: API cleanups
lglocks and brlocks are currently generated with some complicated macros in lglock.h. But there's no reason to not just use common utility functions and put all the data into a common data structure. In preparation, this patch changes the API to look more like normal function calls with pointers, not magic macros. The patch is rather large because I move over all users in one go to keep it bisectable. This impacts the VFS somewhat in terms of lines changed. But no actual behaviour change. [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Al Viro
|
fc7be130c7 |
vfs: switch pnode.h macros to struct mount *
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Al Viro
|
863d684f94 |
vfs: move the rest of int fields to struct mount
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Al Viro
|
15169fe784 |
vfs: mnt_id/mnt_group_id moved
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Al Viro
|
143c8c91ce |
vfs: mnt_ns moved to struct mount
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Al Viro
|
6776db3d32 |
vfs: take mnt_share/mnt_slave/mnt_slave_list and mnt_expire to struct mount
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Al Viro
|
32301920f4 |
vfs: and now we can make ->mnt_master point to struct mount
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Al Viro
|
d10e8def07 |
vfs: take mnt_master to struct mount
make IS_MNT_SLAVE take struct mount * at the same time Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Al Viro
|
14cf1fa8f5 |
vfs: spread struct mount - remaining argument of mnt_set_mountpoint()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Al Viro
|
a8d56d8e4f |
vfs: spread struct mount - propagate_mnt()
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Al Viro
|
c937135d98 |
vfs: spread struct mount - shared subtree iterators
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Al Viro
|
6fc7871fed |
vfs: spread struct mount - get_dominating_id / do_make_slave
next pile of horrors, similar to mnt_parent one; this time it's mnt_master. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Al Viro
|
6b41d536f7 |
vfs: take mnt_child/mnt_mounts to struct mount
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Al Viro
|
83adc75322 |
vfs: spread struct mount - work with counters
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Al Viro
|
a73324da7a |
vfs: move mnt_mountpoint to struct mount
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |