lineage-22.0
653 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
Greg Kroah-Hartman
|
875057880e |
This is the 5.10.222 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmaY9zYACgkQONu9yGCS aT6v5g//WMifSZz85CUFaqgs65rwVfhTMpYtUeL5LiDuy+SMou6ViV3A93FpTkmj FJBvrr2y0bn8Y5Dp/fwYj10XUz+THZte/yEVnPh/NkV107FZD3fKa6GTnJY7H/XY 4SoOGfPB4yfx+MpN6ZpLsu4cAt6FW8P+QfKOxBEboGkJSGpjEbGYFMtyZAMjknia QE8cKQ3LnMrQzHIizil5dZVlYaiMgJtlKTtUeVI1ixmaGDb3rCsnCVvMRvZnW95V aSgyJNrNix7a5tRgYwZHZp4t3p9iT2lyIFM3/y7TKcglVCMPw4nbsDdLNNq11qrk RdTdScR+9eKyJsEGVYOhXZFUFzOgHW22xyx0CCZmDMeu08WPNl4vhGewnndQy3yd 6jdTRYDrU6SQNQ0AjRZXcdmfopIQxetHE7ZEKvbgBW6+u9oySYU8phPCNkma2JWr O2eY5AOF8zgPAdAzvF9Bt/qTlwLNjP0zczoIRX7HSvV03Nh9cQvgzKdSCfuPDU4a FX7mlokgweYa7WoWGPkzOlgMaJZksqstDnhbuwONoMPrNFTUjgm429K87iPdwzqC Yv4uDrpFXgkhfD4Aoks4wDpE2LgBKWz5Wnpo+WW4fjcrXtcIV2tTD9FkMjBv3ECv A8TTWsXxQtm3V54R4h7fAXg9KnZBuIYYDnB2u1317ZdaDkZRuPQ= =X2/A -----END PGP SIGNATURE----- Merge 5.10.222 into android12-5.10-lts Changes in 5.10.222 Compiler Attributes: Add __uninitialized macro drm/lima: fix shared irq handling on driver remove media: dvb: as102-fe: Fix as10x_register_addr packing media: dvb-usb: dib0700_devices: Add missing release_firmware() IB/core: Implement a limit on UMAD receive List scsi: qedf: Make qedf_execute_tmf() non-preemptible crypto: aead,cipher - zeroize key buffer after use drm/amdgpu: Initialize timestamp for some legacy SOCs drm/amd/display: Check index msg_id before read or write drm/amd/display: Check pipe offset before setting vblank drm/amd/display: Skip finding free audio for unknown engine_id media: dw2102: Don't translate i2c read into write sctp: prefer struct_size over open coded arithmetic firmware: dmi: Stop decoding on broken entry Input: ff-core - prefer struct_size over open coded arithmetic net: dsa: mv88e6xxx: Correct check for empty list media: dvb-frontends: tda18271c2dd: Remove casting during div media: s2255: Use refcount_t instead of atomic_t for num_channels media: dvb-frontends: tda10048: Fix integer overflow i2c: i801: Annotate apanel_addr as __ro_after_init powerpc/64: Set _IO_BASE to POISON_POINTER_DELTA not 0 for CONFIG_PCI=n orangefs: fix out-of-bounds fsid access kunit: Fix timeout message powerpc/xmon: Check cpu id in commands "c#", "dp#" and "dx#" bpf: Avoid uninitialized value in BPF_CORE_READ_BITFIELD jffs2: Fix potential illegal address access in jffs2_free_inode s390/pkey: Wipe sensitive data on failure UPSTREAM: tcp: fix DSACK undo in fast recovery to call tcp_try_to_open() tcp_metrics: validate source addr length wifi: wilc1000: fix ies_len type in connect path bonding: Fix out-of-bounds read in bond_option_arp_ip_targets_set() selftests: fix OOM in msg_zerocopy selftest selftests: make order checking verbose in msg_zerocopy selftest inet_diag: Initialize pad field in struct inet_diag_req_v2 nilfs2: fix inode number range checks nilfs2: add missing check for inode numbers on directory entries mm: optimize the redundant loop of mm_update_owner_next() mm: avoid overflows in dirty throttling logic Bluetooth: qca: Fix BT enable failure again for QCA6390 after warm reboot can: kvaser_usb: Explicitly initialize family in leafimx driver_info struct fsnotify: Do not generate events for O_PATH file descriptors Revert "mm/writeback: fix possible divide-by-zero in wb_dirty_limits(), again" drm/nouveau: fix null pointer dereference in nouveau_connector_get_modes drm/amdgpu/atomfirmware: silence UBSAN warning mtd: rawnand: Bypass a couple of sanity checks during NAND identification bnx2x: Fix multiple UBSAN array-index-out-of-bounds bpf, sockmap: Fix sk->sk_forward_alloc warn_on in sk_stream_kill_queues ima: Avoid blocking in RCU read-side critical section media: dw2102: fix a potential buffer overflow i2c: pnx: Fix potential deadlock warning from del_timer_sync() call in isr ALSA: hda/realtek: Enable headset mic of JP-IK LEAP W502 with ALC897 nvme-multipath: find NUMA path only for online numa-node nvme: adjust multiples of NVME_CTRL_PAGE_SIZE in offset platform/x86: touchscreen_dmi: Add info for GlobalSpace SolT IVW 11.6" tablet platform/x86: touchscreen_dmi: Add info for the EZpad 6s Pro nvmet: fix a possible leak when destroy a ctrl during qp establishment kbuild: fix short log for AS in link-vmlinux.sh nilfs2: fix incorrect inode allocation from reserved inodes mm: prevent derefencing NULL ptr in pfn_section_valid() filelock: fix potential use-after-free in posix_lock_inode fs/dcache: Re-use value stored to dentry->d_flags instead of re-reading vfs: don't mod negative dentry count when on shrinker list tcp: fix incorrect undo caused by DSACK of TLP retransmit octeontx2-af: Fix incorrect value output on error path in rvu_check_rsrc_availability() net: lantiq_etop: add blank line after declaration net: ethernet: lantiq_etop: fix double free in detach ppp: reject claimed-as-LCP but actually malformed packets ethtool: netlink: do not return SQI value if link is down udp: Set SOCK_RCU_FREE earlier in udp_lib_get_port(). net/sched: Fix UAF when resolving a clash s390: Mark psw in __load_psw_mask() as __unitialized ARM: davinci: Convert comma to semicolon octeontx2-af: fix detection of IP layer tcp: use signed arithmetic in tcp_rtx_probe0_timed_out() tcp: avoid too many retransmit packets net: ks8851: Fix potential TX stall after interface reopen USB: serial: option: add Telit generic core-dump composition USB: serial: option: add Telit FN912 rmnet compositions USB: serial: option: add Fibocom FM350-GL USB: serial: option: add support for Foxconn T99W651 USB: serial: option: add Netprisma LCUK54 series modules USB: serial: option: add Rolling RW350-GL variants USB: serial: mos7840: fix crash on resume USB: Add USB_QUIRK_NO_SET_INTF quirk for START BP-850k usb: gadget: configfs: Prevent OOB read/write in usb_string_copy() USB: core: Fix duplicate endpoint bug by clearing reserved bits in the descriptor hpet: Support 32-bit userspace nvmem: meson-efuse: Fix return value of nvmem callbacks ALSA: hda/realtek: Enable Mute LED on HP 250 G7 ALSA: hda/realtek: Limit mic boost on VAIO PRO PX libceph: fix race between delayed_work() and ceph_monc_stop() wireguard: allowedips: avoid unaligned 64-bit memory accesses wireguard: queueing: annotate intentional data race in cpu round robin wireguard: send: annotate intentional data race in checking empty queue x86/retpoline: Move a NOENDBR annotation to the SRSO dummy return thunk efi: ia64: move IA64-only declarations to new asm/efi.h header ipv6: annotate data-races around cnf.disable_ipv6 ipv6: prevent NULL dereference in ip6_output() bpf: Allow reads from uninit stack nilfs2: fix kernel bug on rename operation of broken directory i2c: rcar: bring hardware to known state when probing i2c: mark HostNotify target address as used i2c: rcar: Add R-Car Gen4 support i2c: rcar: reset controller is mandatory for Gen3+ i2c: rcar: introduce Gen4 devices i2c: rcar: ensure Gen3+ reset does not disturb local targets i2c: rcar: clear NO_RXDMA flag after resetting i2c: rcar: fix error code in probe() Linux 5.10.222 Change-Id: I39dedaef039a49c1b8b53dd83b83d481593ffb95 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
Jinliang Zheng
|
f033241a7c |
mm: optimize the redundant loop of mm_update_owner_next()
commit cf3f9a593dab87a032d2b6a6fb205e7f3de4f0a1 upstream. When mm_update_owner_next() is racing with swapoff (try_to_unuse()) or /proc or ptrace or page migration (get_task_mm()), it is impossible to find an appropriate task_struct in the loop whose mm_struct is the same as the target mm_struct. If the above race condition is combined with the stress-ng-zombie and stress-ng-dup tests, such a long loop can easily cause a Hard Lockup in write_lock_irq() for tasklist_lock. Recognize this situation in advance and exit early. Link: https://lkml.kernel.org/r/20240620122123.3877432-1-alexjlzheng@tencent.com Signed-off-by: Jinliang Zheng <alexjlzheng@tencent.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tycho Andersen <tandersen@netflix.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
qiwu.chen
|
ca7dabaf67 |
ANDROID: Add vendor hook for task exiting routine
Currently, there are various global init exit issues encountered on Andriod/linux system. It's hard to debug these issues on product environment without a usable coredump. For example, it's hard to get the root cause why global init task exited from the below kmsg: [ 4.696032][T400001] e2fsck: /dev/block/by-name/metadata: clean, 35/8192 files, 2083/8192 blocks [ 4.696783][T500326] [bq27z561] fg_debug_dump_regs: slave_dump:Reg[0x0073] = 0x05C5 [ 4.700583][T400001] EXT4-fs (sdc17): mounted filesystem with ordered data mode. Opts: discard [ 4.706445][T400001] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00007f0000 [ 4.706459][T400001] CPU: 4 PID: 1 Comm: init Tainted: G S W 5.10.136-android12-9-00005-gf9a66cbe7091-ab9177899 #1 [ 4.706464][T400001] Hardware name: MT6983Z/TCZA (DT) [ 4.706469][T400001] Call trace: [ 4.706482][T400001] dump_backtrace.cfi_jt+0x0/0x8 [ 4.706493][T400001] dump_stack_lvl+0xc4/0x140 [ 4.706504][T400001] panic+0x178/0x464 [ 4.706511][T400001] do_exit+0xb30/0xf9c [ 4.706517][T400001] do_group_exit+0x130/0x1c8 [ 4.706523][T400001] do_group_exit+0x0/0x1c8 [ 4.706529][T400001] __do_sys_exit_group+0x0/0x18 [ 4.706535][T400001] __se_sys_exit_group+0x0/0x14 [ 4.706543][T400001] el0_svc_common+0xd4/0x270 [ 4.706551][T400001] el0_svc+0x28/0x88 [ 4.706559][T400001] el0_sync_handler+0x8c/0xf0 [ 4.706567][T400001] el0_sync+0x1b4/0x1c0 Add hook for task exiting routine, while will be helpful for OEMs to get the reason of any exiting task to be noticed such as dump last exit thread executable sections and registers info. Bug: 324013972 Link: https://lore.kernel.org/lkml/20231110032043.34516-1-qiwu.chen@transsion.com/T/ Change-Id: Ibb7c9012af18b99a1bb458d236f166e6450241c3 Signed-off-by: qiwu.chen <qiwu.chen@transsion.com> |
||
Greg Kroah-Hartman
|
0ddb73d446 |
Merge 5.10.166 into android12-5.10-lts
Changes in 5.10.166 clk: generalize devm_clk_get() a bit clk: Provide new devm_clk helpers for prepared and enabled clocks memory: atmel-sdramc: Fix missing clk_disable_unprepare in atmel_ramc_probe() memory: mvebu-devbus: Fix missing clk_disable_unprepare in mvebu_devbus_probe() ARM: dts: imx6ul-pico-dwarf: Use 'clock-frequency' ARM: dts: imx7d-pico: Use 'clock-frequency' ARM: dts: imx6qdl-gw560x: Remove incorrect 'uart-has-rtscts' arm64: dts: imx8mm-beacon: Fix ecspi2 pinmux ARM: imx: add missing of_node_put() HID: intel_ish-hid: Add check for ishtp_dma_tx_map EDAC/highbank: Fix memory leak in highbank_mc_probe() firmware: arm_scmi: Harden shared memory access in fetch_response firmware: arm_scmi: Harden shared memory access in fetch_notification tomoyo: fix broken dependency on *.conf.default RDMA/core: Fix ib block iterator counter overflow IB/hfi1: Reject a zero-length user expected buffer IB/hfi1: Reserve user expected TIDs IB/hfi1: Fix expected receive setup error exit issues IB/hfi1: Immediately remove invalid memory from hardware IB/hfi1: Remove user expected buffer invalidate race affs: initialize fsdata in affs_truncate() PM: AVS: qcom-cpr: Fix an error handling path in cpr_probe() phy: ti: fix Kconfig warning and operator precedence ARM: dts: at91: sam9x60: fix the ddr clock for sam9x60 amd-xgbe: TX Flow Ctrl Registers are h/w ver dependent amd-xgbe: Delay AN timeout during KR training bpf: Fix pointer-leak due to insufficient speculative store bypass mitigation phy: rockchip-inno-usb2: Fix missing clk_disable_unprepare() in rockchip_usb2phy_power_on() net: nfc: Fix use-after-free in local_cleanup() net: wan: Add checks for NULL for utdm in undo_uhdlc_init and unmap_si_regs gpio: mxc: Always set GPIOs used as interrupt source to INPUT mode wifi: rndis_wlan: Prevent buffer overflow in rndis_query_oid net/sched: sch_taprio: fix possible use-after-free l2tp: Serialize access to sk_user_data with sk_callback_lock l2tp: Don't sleep and disable BH under writer-side sk_callback_lock l2tp: convert l2tp_tunnel_list to idr l2tp: close all race conditions in l2tp_tunnel_register() net: usb: sr9700: Handle negative len net: mdio: validate parameter addr in mdiobus_get_phy() HID: check empty report_list in hid_validate_values() HID: check empty report_list in bigben_probe() net: stmmac: fix invalid call to mdiobus_get_phy() HID: revert CHERRY_MOUSE_000C quirk usb: gadget: f_fs: Prevent race during ffs_ep0_queue_wait usb: gadget: f_fs: Ensure ep0req is dequeued before free_request net: mlx5: eliminate anonymous module_init & module_exit drm/panfrost: fix GENERIC_ATOMIC64 dependency dmaengine: Fix double increment of client_count in dma_chan_get() net: macb: fix PTP TX timestamp failure due to packet padding l2tp: prevent lockdep issue in l2tp_tunnel_register() HID: betop: check shape of output reports dmaengine: xilinx_dma: call of_node_put() when breaking out of for_each_child_of_node() nvme-pci: fix timeout request state check tcp: avoid the lookup process failing to get sk in ehash table w1: fix deadloop in __w1_remove_master_device() w1: fix WARNING after calling w1_process() driver core: Fix test_async_probe_init saves device in wrong array net: dsa: microchip: ksz9477: port map correction in ALU table entry register tcp: fix rate_app_limited to default to 1 scsi: iscsi: Fix multiple iSCSI session unbind events sent to userspace cpufreq: Add Tegra234 to cpufreq-dt-platdev blocklist kcsan: test: don't put the expect array on the stack ASoC: fsl_micfil: Correct the number of steps on SX controls drm: Add orientation quirk for Lenovo ideapad D330-10IGL s390/debug: add _ASM_S390_ prefix to header guard cpufreq: armada-37xx: stop using 0 as NULL pointer ASoC: fsl_ssi: Rename AC'97 streams to avoid collisions with AC'97 CODEC ASoC: fsl-asoc-card: Fix naming of AC'97 CODEC widgets spi: spidev: remove debug messages that access spidev->spi without locking KVM: s390: interrupt: use READ_ONCE() before cmpxchg() scsi: hisi_sas: Set a port invalid only if there are no devices attached when refreshing port id platform/x86: touchscreen_dmi: Add info for the CSL Panther Tab HD platform/x86: asus-nb-wmi: Add alternate mapping for KEY_SCREENLOCK lockref: stop doing cpu_relax in the cmpxchg loop Revert "selftests/bpf: check null propagation only neither reg is PTR_TO_BTF_ID" netfilter: conntrack: do not renew entry stuck in tcp SYN_SENT state x86: ACPI: cstate: Optimize C3 entry on AMD CPUs fs: reiserfs: remove useless new_opts in reiserfs_remount sysctl: add a new register_sysctl_init() interface kernel/panic: move panic sysctls to its own file panic: unset panic_on_warn inside panic() ubsan: no need to unset panic_on_warn in ubsan_epilogue() kasan: no need to unset panic_on_warn in end_report() exit: Add and use make_task_dead. objtool: Add a missing comma to avoid string concatenation hexagon: Fix function name in die() h8300: Fix build errors from do_exit() to make_task_dead() transition csky: Fix function name in csky_alignment() and die() ia64: make IA64_MCA_RECOVERY bool instead of tristate panic: Separate sysctl logic from CONFIG_SMP exit: Put an upper limit on how often we can oops exit: Expose "oops_count" to sysfs exit: Allow oops_limit to be disabled panic: Consolidate open-coded panic_on_warn checks panic: Introduce warn_limit panic: Expose "warn_count" to sysfs docs: Fix path paste-o for /sys/kernel/warn_count exit: Use READ_ONCE() for all oops/warn limit reads Bluetooth: hci_sync: cancel cmd_timer if hci_open failed xhci: Set HCD flag to defer primary roothub registration scsi: hpsa: Fix allocation size for scsi_host_alloc() module: Don't wait for GOING modules tracing: Make sure trace_printk() can output as soon as it can be used trace_events_hist: add check for return value of 'create_hist_field' ftrace/scripts: Update the instructions for ftrace-bisect.sh cifs: Fix oops due to uncleared server->smbd_conn in reconnect KVM: x86/vmx: Do not skip segment attributes if unusable bit is set thermal: intel: int340x: Protect trip temperature from concurrent updates ARM: 9280/1: mm: fix warning on phys_addr_t to void pointer assignment EDAC/device: Respect any driver-supplied workqueue polling value EDAC/qcom: Do not pass llcc_driv_data as edac_device_ctl_info's pvt_info units: Add Watt units units: Add SI metric prefix definitions i2c: designware: Use DIV_ROUND_CLOSEST() macro i2c: designware: use casting of u64 in clock multiplication to avoid overflow netlink: prevent potential spectre v1 gadgets net: fix UaF in netns ops registration error path netfilter: nft_set_rbtree: Switch to node list walk for overlap detection netfilter: nft_set_rbtree: skip elements in transaction from garbage collection netlink: annotate data races around nlk->portid netlink: annotate data races around dst_portid and dst_group netlink: annotate data races around sk_state ipv4: prevent potential spectre v1 gadget in ip_metrics_convert() ipv4: prevent potential spectre v1 gadget in fib_metrics_match() netfilter: conntrack: fix vtag checks for ABORT/SHUTDOWN_COMPLETE netrom: Fix use-after-free of a listening socket. net/sched: sch_taprio: do not schedule in taprio_reset() sctp: fail if no bound addresses can be used for a given scope net: ravb: Fix possible hang if RIS2_QFF1 happen thermal: intel: int340x: Add locking to int340x_thermal_get_trip_type() net/tg3: resolve deadlock in tg3_reset_task() during EEH net: mdio-mux-meson-g12a: force internal PHY off on mux switch tools: gpio: fix -c option of gpio-event-mon Revert "Input: synaptics - switch touchpad on HP Laptop 15-da3001TU to RMI mode" nouveau: explicitly wait on the fence in nouveau_bo_move_m2mf nfsd: Ensure knfsd shuts down when the "nfsd" pseudofs is unmounted Revert "selftests/ftrace: Update synthetic event syntax errors" block: fix and cleanup bio_check_ro x86/i8259: Mark legacy PIC interrupts with IRQ_LEVEL netfilter: conntrack: unify established states for SCTP paths perf/x86/amd: fix potential integer overflow on shift of a int clk: Fix pointer casting to prevent oops in devm_clk_release() Linux 5.10.166 Change-Id: Ibf582f7504221c6ee1648da95c49b45e3678708c Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
Greg Kroah-Hartman
|
8596b99884 |
Merge 5.10.162 into android12-5.10-lts
Changes in 5.10.162 kernel: provide create_io_thread() helper iov_iter: add helper to save iov_iter state saner calling conventions for unlazy_child() fs: add support for LOOKUP_CACHED fix handling of nd->depth on LOOKUP_CACHED failures in try_to_unlazy* Make sure nd->path.mnt and nd->path.dentry are always valid pointers fs: expose LOOKUP_CACHED through openat2() RESOLVE_CACHED tools headers UAPI: Sync openat2.h with the kernel sources net: provide __sys_shutdown_sock() that takes a socket net: add accept helper not installing fd signal: Add task_sigpending() helper fs: make do_renameat2() take struct filename file: Rename __close_fd_get_file close_fd_get_file fs: provide locked helper variant of close_fd_get_file() entry: Add support for TIF_NOTIFY_SIGNAL task_work: Use TIF_NOTIFY_SIGNAL if available x86: Wire up TIF_NOTIFY_SIGNAL arc: add support for TIF_NOTIFY_SIGNAL arm64: add support for TIF_NOTIFY_SIGNAL m68k: add support for TIF_NOTIFY_SIGNAL nios32: add support for TIF_NOTIFY_SIGNAL parisc: add support for TIF_NOTIFY_SIGNAL powerpc: add support for TIF_NOTIFY_SIGNAL mips: add support for TIF_NOTIFY_SIGNAL s390: add support for TIF_NOTIFY_SIGNAL um: add support for TIF_NOTIFY_SIGNAL sh: add support for TIF_NOTIFY_SIGNAL openrisc: add support for TIF_NOTIFY_SIGNAL csky: add support for TIF_NOTIFY_SIGNAL hexagon: add support for TIF_NOTIFY_SIGNAL microblaze: add support for TIF_NOTIFY_SIGNAL arm: add support for TIF_NOTIFY_SIGNAL xtensa: add support for TIF_NOTIFY_SIGNAL alpha: add support for TIF_NOTIFY_SIGNAL c6x: add support for TIF_NOTIFY_SIGNAL h8300: add support for TIF_NOTIFY_SIGNAL ia64: add support for TIF_NOTIFY_SIGNAL nds32: add support for TIF_NOTIFY_SIGNAL riscv: add support for TIF_NOTIFY_SIGNAL sparc: add support for TIF_NOTIFY_SIGNAL ia64: don't call handle_signal() unless there's actually a signal queued ARC: unbork 5.11 bootup: fix snafu in _TIF_NOTIFY_SIGNAL handling alpha: fix TIF_NOTIFY_SIGNAL handling task_work: remove legacy TWA_SIGNAL path kernel: remove checking for TIF_NOTIFY_SIGNAL coredump: Limit what can interrupt coredumps kernel: allow fork with TIF_NOTIFY_SIGNAL pending entry/kvm: Exit to user mode when TIF_NOTIFY_SIGNAL is set arch: setup PF_IO_WORKER threads like PF_KTHREAD arch: ensure parisc/powerpc handle PF_IO_WORKER in copy_thread() x86/process: setup io_threads more like normal user space threads kernel: stop masking signals in create_io_thread() kernel: don't call do_exit() for PF_IO_WORKER threads task_work: add helper for more targeted task_work canceling io_uring: import 5.15-stable io_uring signal: kill JOBCTL_TASK_WORK task_work: unconditionally run task_work from get_signal() net: remove cmsg restriction from io_uring based send/recvmsg calls Revert "proc: don't allow async path resolution of /proc/thread-self components" Revert "proc: don't allow async path resolution of /proc/self components" eventpoll: add EPOLL_URING_WAKE poll wakeup flag eventfd: provide a eventfd_signal_mask() helper io_uring: pass in EPOLL_URING_WAKE for eventfd signaling and wakeups Linux 5.10.162 Change-Id: I50a7b8bc8d38fac612113281b218cf5323b0af5e Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
Kees Cook
|
b98a8b731b |
exit: Use READ_ONCE() for all oops/warn limit reads
commit 7535b832c6399b5ebfc5b53af5c51dd915ee2538 upstream. Use a temporary variable to take full advantage of READ_ONCE() behavior. Without this, the report (and even the test) might be out of sync with the initial test. Reported-by: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/lkml/Y5x7GXeluFmZ8E0E@hirez.programming.kicks-ass.net Fixes: 9fc9e278a5c0 ("panic: Introduce warn_limit") Fixes: d4ccd54d28d3 ("exit: Put an upper limit on how often we can oops") Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Jann Horn <jannh@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Petr Mladek <pmladek@suse.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Marco Elver <elver@google.com> Cc: tangmeng <tangmeng@uniontech.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Kees Cook
|
530cdae5c2 |
exit: Allow oops_limit to be disabled
commit de92f65719cd672f4b48397540b9f9eff67eca40 upstream. In preparation for keeping oops_limit logic in sync with warn_limit, have oops_limit == 0 disable checking the Oops counter. Cc: Jann Horn <jannh@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: "Jason A. Donenfeld" <Jason@zx2c4.com> Cc: Eric Biggers <ebiggers@google.com> Cc: Huang Ying <ying.huang@intel.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: linux-doc@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Kees Cook
|
7cffbcd68f |
exit: Expose "oops_count" to sysfs
commit 9db89b41117024f80b38b15954017fb293133364 upstream. Since Oops count is now tracked and is a fairly interesting signal, add the entry /sys/kernel/oops_count to expose it to userspace. Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Jann Horn <jannh@google.com> Cc: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20221117234328.594699-3-keescook@chromium.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Jann Horn
|
de586785b9 |
exit: Put an upper limit on how often we can oops
commit d4ccd54d28d3c8598e2354acc13e28c060961dbb upstream. Many Linux systems are configured to not panic on oops; but allowing an attacker to oops the system **really** often can make even bugs that look completely unexploitable exploitable (like NULL dereferences and such) if each crash elevates a refcount by one or a lock is taken in read mode, and this causes a counter to eventually overflow. The most interesting counters for this are 32 bits wide (like open-coded refcounts that don't use refcount_t). (The ldsem reader count on 32-bit platforms is just 16 bits, but probably nobody cares about 32-bit platforms that much nowadays.) So let's panic the system if the kernel is constantly oopsing. The speed of oopsing 2^32 times probably depends on several factors, like how long the stack trace is and which unwinder you're using; an empirically important one is whether your console is showing a graphical environment or a text console that oopses will be printed to. In a quick single-threaded benchmark, it looks like oopsing in a vfork() child with a very short stack trace only takes ~510 microseconds per run when a graphical console is active; but switching to a text console that oopses are printed to slows it down around 87x, to ~45 milliseconds per run. (Adding more threads makes this faster, but the actual oops printing happens under &die_lock on x86, so you can maybe speed this up by a factor of around 2 and then any further improvement gets eaten up by lock contention.) It looks like it would take around 8-12 days to overflow a 32-bit counter with repeated oopsing on a multi-core X86 system running a graphical environment; both me (in an X86 VM) and Seth (with a distro kernel on normal hardware in a standard configuration) got numbers in that ballpark. 12 days aren't *that* short on a desktop system, and you'd likely need much longer on a typical server system (assuming that people don't run graphical desktop environments on their servers), and this is a *very* noisy and violent approach to exploiting the kernel; and it also seems to take orders of magnitude longer on some machines, probably because stuff like EFI pstore will slow it down a ton if that's active. Signed-off-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/r/20221107201317.324457-1-jannh@google.com Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20221117234328.594699-2-keescook@chromium.org Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Eric W. Biederman
|
d9c740c765 |
exit: Add and use make_task_dead.
commit 0e25498f8cd43c1b5aa327f373dd094e9a006da7 upstream. There are two big uses of do_exit. The first is it's design use to be the guts of the exit(2) system call. The second use is to terminate a task after something catastrophic has happened like a NULL pointer in kernel code. Add a function make_task_dead that is initialy exactly the same as do_exit to cover the cases where do_exit is called to handle catastrophic failure. In time this can probably be reduced to just a light wrapper around do_task_dead. For now keep it exactly the same so that there will be no behavioral differences introducing this new concept. Replace all of the uses of do_exit that use it for catastraphic task cleanup with make_task_dead to make it clear what the code is doing. As part of this rename rewind_stack_do_exit rewind_stack_and_make_dead. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
Jens Axboe
|
788d082426 |
io_uring: import 5.15-stable io_uring
No upstream commit exists. This imports the io_uring codebase from 5.15.85, wholesale. Changes from that code base: - Drop IOCB_ALLOC_CACHE, we don't have that in 5.10. - Drop MKDIRAT/SYMLINKAT/LINKAT. Would require further VFS backports, and we don't support these in 5.10 to begin with. - sock_from_file() old style calling convention. - Use compat_get_bitmap() only for CONFIG_COMPAT=y Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Greg Kroah-Hartman
|
0c724b692d |
This is the 5.10.132 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmLZpvwACgkQONu9yGCS aT43nBAAhxJzkIcRI/641//eBLQrmbeNsS4TerYlpPIJAXwfXlF6KX6Ixl0rYcp/ GUid3QlXyDG4TTUB519M1FpaknDGq5vUCzNik82AogzMFLf/KWP6urx4FSeZCt1D xAdYQHHWKFiyNUlqjT22dPM3/QR1D0BtUKE6QLUdWWhyc1W+gvYx1m10GG6O1z55 eljZScRYvaacvVZ4LiN0ClU9J0n16SqfTg8/jEASr+3yqe4ZKdzFdngGlJrWUCZa SrR5ijscqoIQ5yTSA5DUZ/N4aAeTgSSXcMfXeZh1CoD4Ak87e2kwBHZAUWQWJrEe 0nfILwU0okZmEOKtwCtYz0iwfFEfB/wKwrZjJ0jV03dL3Ncm7ddj2bQDk0+fLDYZ AEjflhLZfusQEprM+jr0Qx9UlJo1TA4KssRn1A+cfocKvhfTrVneWO5LcR1Jf6Gq 9z7lgh8iRs4ncEfqh2cCRcSpIJLlPOmACmtA4eD2tk7heGRhfBpL9Hv2KBCHss5o iMaqRsvVXFZn2KCxZFOR4l0cQvkKkxHWBjiVxrYTV5SrELJ4d2DBc7r93a5vM7W/ tKKGi0IG+0V7fgHvKrRDVZnYWV05NEbit0xd0lgY5YZsOuJIVy024YayuWDFxT5S xulwcoSzAQiWnhqtrsD9eqWotA1E8i9wCuWHEAPMRmnzTBdENTA= =cxaP -----END PGP SIGNATURE----- Merge 5.10.132 into android12-5.10-lts Changes in 5.10.132 ALSA: hda - Add fixup for Dell Latitidue E5430 ALSA: hda/conexant: Apply quirk for another HP ProDesk 600 G3 model ALSA: hda/realtek: Fix headset mic for Acer SF313-51 ALSA: hda/realtek - Fix headset mic problem for a HP machine with alc671 ALSA: hda/realtek - Fix headset mic problem for a HP machine with alc221 ALSA: hda/realtek - Enable the headset-mic on a Xiaomi's laptop xen/netback: avoid entering xenvif_rx_next_skb() with an empty rx queue fix race between exit_itimers() and /proc/pid/timers mm: split huge PUD on wp_huge_pud fallback tracing/histograms: Fix memory leak problem net: sock: tracing: Fix sock_exceed_buf_limit not to dereference stale pointer ip: fix dflt addr selection for connected nexthop ARM: 9213/1: Print message about disabled Spectre workarounds only once ARM: 9214/1: alignment: advance IT state after emulating Thumb instruction wifi: mac80211: fix queue selection for mesh/OCB interfaces cgroup: Use separate src/dst nodes when preloading css_sets for migration btrfs: return -EAGAIN for NOWAIT dio reads/writes on compressed and inline extents drm/panfrost: Put mapping instead of shmem obj on panfrost_mmu_map_fault_addr() error drm/panfrost: Fix shrinker list corruption by madvise IOCTL fs/remap: constrain dedupe of EOF blocks nilfs2: fix incorrect masking of permission flags for symlinks sh: convert nommu io{re,un}map() to static inline functions Revert "evm: Fix memleak in init_desc" ext4: fix race condition between ext4_write and ext4_convert_inline_data ARM: dts: imx6qdl-ts7970: Fix ngpio typo and count spi: amd: Limit max transfer and message size ARM: 9209/1: Spectre-BHB: avoid pr_info() every time a CPU comes out of idle ARM: 9210/1: Mark the FDT_FIXED sections as shareable net/mlx5e: kTLS, Fix build time constant test in TX net/mlx5e: kTLS, Fix build time constant test in RX net/mlx5e: Fix capability check for updating vnic env counters drm/i915: fix a possible refcount leak in intel_dp_add_mst_connector() ima: Fix a potential integer overflow in ima_appraise_measurement ASoC: sgtl5000: Fix noise on shutdown/remove ASoC: tas2764: Add post reset delays ASoC: tas2764: Fix and extend FSYNC polarity handling ASoC: tas2764: Correct playback volume range ASoC: tas2764: Fix amp gain register offset & default ASoC: Intel: Skylake: Correct the ssp rate discovery in skl_get_ssp_clks() ASoC: Intel: Skylake: Correct the handling of fmt_config flexible array net: stmmac: dwc-qos: Disable split header for Tegra194 sysctl: Fix data races in proc_dointvec(). sysctl: Fix data races in proc_douintvec(). sysctl: Fix data races in proc_dointvec_minmax(). sysctl: Fix data races in proc_douintvec_minmax(). sysctl: Fix data races in proc_doulongvec_minmax(). sysctl: Fix data races in proc_dointvec_jiffies(). tcp: Fix a data-race around sysctl_tcp_max_orphans. inetpeer: Fix data-races around sysctl. net: Fix data-races around sysctl_mem. cipso: Fix data-races around sysctl. icmp: Fix data-races around sysctl. ipv4: Fix a data-race around sysctl_fib_sync_mem. ARM: dts: at91: sama5d2: Fix typo in i2s1 node ARM: dts: sunxi: Fix SPI NOR campatible on Orange Pi Zero drm/i915/selftests: fix a couple IS_ERR() vs NULL tests drm/i915/gt: Serialize TLB invalidates with GT resets sysctl: Fix data-races in proc_dointvec_ms_jiffies(). icmp: Fix a data-race around sysctl_icmp_ratelimit. icmp: Fix a data-race around sysctl_icmp_ratemask. raw: Fix a data-race around sysctl_raw_l3mdev_accept. ipv4: Fix data-races around sysctl_ip_dynaddr. nexthop: Fix data-races around nexthop_compat_mode. net: ftgmac100: Hold reference returned by of_get_child_by_name() ima: force signature verification when CONFIG_KEXEC_SIG is configured ima: Fix potential memory leak in ima_init_crypto() sfc: fix use after free when disabling sriov seg6: fix skb checksum evaluation in SRH encapsulation/insertion seg6: fix skb checksum in SRv6 End.B6 and End.B6.Encaps behaviors seg6: bpf: fix skb checksum in bpf_push_seg6_encap() sfc: fix kernel panic when creating VF net: atlantic: remove deep parameter on suspend/resume functions net: atlantic: remove aq_nic_deinit() when resume KVM: x86: Fully initialize 'struct kvm_lapic_irq' in kvm_pv_kick_cpu_op() net/tls: Check for errors in tls_device_init mm: sysctl: fix missing numa_stat when !CONFIG_HUGETLB_PAGE virtio_mmio: Add missing PM calls to freeze/restore virtio_mmio: Restore guest page size on resume netfilter: br_netfilter: do not skip all hooks with 0 priority scsi: hisi_sas: Limit max hw sectors for v3 HW cpufreq: pmac32-cpufreq: Fix refcount leak bug platform/x86: hp-wmi: Ignore Sanitization Mode event net: tipc: fix possible refcount leak in tipc_sk_create() NFC: nxp-nci: don't print header length mismatch on i2c error nvme-tcp: always fail a request when sending it failed nvme: fix regression when disconnect a recovering ctrl net: sfp: fix memory leak in sfp_probe() ASoC: ops: Fix off by one in range control validation pinctrl: aspeed: Fix potential NULL dereference in aspeed_pinmux_set_mux() ASoC: SOF: Intel: hda-loader: Clarify the cl_dsp_init() flow ASoC: wm5110: Fix DRE control ASoC: dapm: Initialise kcontrol data for mux/demux controls ASoC: cs47l15: Fix event generation for low power mux control ASoC: madera: Fix event generation for OUT1 demux ASoC: madera: Fix event generation for rate controls irqchip: or1k-pic: Undefine mask_ack for level triggered hardware x86: Clear .brk area at early boot soc: ixp4xx/npe: Fix unused match warning ARM: dts: stm32: use the correct clock source for CEC on stm32mp151 Revert "can: xilinx_can: Limit CANFD brp to 2" nvme-pci: phison e16 has bogus namespace ids signal handling: don't use BUG_ON() for debugging USB: serial: ftdi_sio: add Belimo device ids usb: typec: add missing uevent when partner support PD usb: dwc3: gadget: Fix event pending check tty: serial: samsung_tty: set dma burst_size to 1 vt: fix memory overlapping when deleting chars in the buffer serial: 8250: fix return error code in serial8250_request_std_resource() serial: stm32: Clear prev values before setting RTS delays serial: pl011: UPSTAT_AUTORTS requires .throttle/unthrottle serial: 8250: Fix PM usage_count for console handover x86/pat: Fix x86_has_pat_wp() Linux 5.10.132 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> Change-Id: I450f357105f90b1b9549dea5de62dc9a160d4ba9 |
||
Oleg Nesterov
|
91530f675e |
fix race between exit_itimers() and /proc/pid/timers
commit d5b36a4dbd06c5e8e36ca8ccc552f679069e2946 upstream. As Chris explains, the comment above exit_itimers() is not correct, we can race with proc_timers_seq_ops. Change exit_itimers() to clear signal->posix_timers with ->siglock held. Cc: <stable@vger.kernel.org> Reported-by: chris@accessvector.net Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Liujie Xie
|
24149445ad |
ANDROID: vendor_hooks: Add hooks for memory when debug
Add vendors hooks for recording memory used Bug: 182443489 Signed-off-by: Liujie Xie <xieliujie@oppo.com> Change-Id: I62d8bb2b6650d8b187b433f97eb833ef0b784df1 |
||
Pavel Begunkov
|
7bf3fb6243 |
kernel/io_uring: cancel io_uring before task works
[ Upstream commit b1b6b5a30dce872f500dc43f067cba8e7f86fc7d ] For cancelling io_uring requests it needs either to be able to run currently enqueued task_works or having it shut down by that moment. Otherwise io_uring_cancel_files() may be waiting for requests that won't ever complete. Go with the first way and do cancellations before setting PF_EXITING and so before putting the task_work infrastructure into a transition state where task_work_run() would better not be called. Cc: stable@vger.kernel.org # 5.5+ Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
Al Viro
|
77f6ab8b77 |
don't dump the threads that had been already exiting when zapped.
Coredump logics needs to report not only the registers of the dumping thread, but (since 2.5.43) those of other threads getting killed. Doing that might require extra state saved on the stack in asm glue at kernel entry; signal delivery logics does that (we need to be able to save sigcontext there, at the very least) and so does seccomp. That covers all callers of do_coredump(). Secondary threads get hit with SIGKILL and caught as soon as they reach exit_mm(), which normally happens in signal delivery, so those are also fine most of the time. Unfortunately, it is possible to end up with secondary zapped when it has already entered exit(2) (or, worse yet, is oopsing). In those cases we reach exit_mm() when mm->core_state is already set, but the stack contents is not what we would have in signal delivery. At least on two architectures (alpha and m68k) it leads to infoleaks - we end up with a chunk of kernel stack written into coredump, with the contents consisting of normal C stack frames of the call chain leading to exit_mm() instead of the expected copy of userland registers. In case of alpha we leak 312 bytes of stack. Other architectures (including the regset-using ones) might have similar problems - the normal user of regsets is ptrace and the state of tracee at the time of such calls is special in the same way signal delivery is. Note that had the zapper gotten to the exiting thread slightly later, it wouldn't have been included into coredump anyway - we skip the threads that have already cleared their ->mm. So let's pretend that zapper always loses the race. IOW, have exit_mm() only insert into the dumper list if we'd gotten there from handling a fatal signal[*] As the result, the callers of do_exit() that have *not* gone through get_signal() are not seen by coredump logics as secondary threads. Which excludes voluntary exit()/oopsen/traps/etc. The dumper thread itself is unaffected by that, so seccomp is fine. [*] originally I intended to add a new flag in tsk->flags, but ebiederman pointed out that PF_SIGNALED is already doing just what we need. Cc: stable@vger.kernel.org Fixes: d89f3847def4 ("[PATCH] thread-aware coredumps, 2.5.43-C3") History-tree: https://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
||
Minchan Kim
|
1aa92cd31c |
pid: move pidfd_get_pid() to pid.c
process_madvise syscall needs pidfd_get_pid function to translate pidfd to pid so this patch move the function to kernel/pid.c. Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Jann Horn <jannh@google.com> Cc: Brian Geffon <bgeffon@google.com> Cc: Daniel Colascione <dancol@google.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: John Dias <joaodias@google.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oleksandr Natalenko <oleksandr@redhat.com> Cc: Sandeep Patil <sspatil@google.com> Cc: SeongJae Park <sj38.park@gmail.com> Cc: SeongJae Park <sjpark@amazon.de> Cc: Shakeel Butt <shakeelb@google.com> Cc: Sonny Rao <sonnyrao@google.com> Cc: Tim Murray <timmurray@google.com> Cc: Christian Brauner <christian.brauner@ubuntu.com> Cc: Florian Weimer <fw@deneb.enyo.de> Cc: <linux-man@vger.kernel.org> Link: http://lkml.kernel.org/r/20200302193630.68771-5-minchan@kernel.org Link: http://lkml.kernel.org/r/20200622192900.22757-3-minchan@kernel.org Link: https://lkml.kernel.org/r/20200901000633.1920247-3-minchan@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Christian Brauner
|
ba7d25f3df
|
exit: support non-blocking pidfds
Passing a non-blocking pidfd to waitid() currently has no effect, i.e. is not supported. There are users which would like to use waitid() on pidfds that are O_NONBLOCK and mix it with pidfds that are blocking and both pass them to waitid(). The expected behavior is to have waitid() return -EAGAIN for non-blocking pidfds and to block for blocking pidfds without needing to perform any additional checks for flags set on the pidfd before passing it to waitid(). Non-blocking pidfds will return EAGAIN from waitid() when no child process is ready yet. Returning -EAGAIN for non-blocking pidfds makes it easier for event loops that handle EAGAIN specially. It also makes the API more consistent and uniform. In essence, waitid() is treated like a read on a non-blocking pidfd or a recvmsg() on a non-blocking socket. With the addition of support for non-blocking pidfds we support the same functionality that sockets do. For sockets() recvmsg() supports MSG_DONTWAIT for pidfds waitid() supports WNOHANG. Both flags are per-call options. In contrast non-blocking pidfds and non-blocking sockets are a setting on an open file description affecting all threads in the calling process as well as other processes that hold file descriptors referring to the same open file description. Both behaviors, per call and per open file description, have genuine use-cases. The implementation should be straightforward: - If a non-blocking pidfd is passed and WNOHANG is not raised we simply raise the WNOHANG flag internally. When do_wait() returns indicating that there are eligible child processes but none have exited yet we set EAGAIN. If no child process exists we continue returning ECHILD. - If a non-blocking pidfd is passed and WNOHANG is raised waitid() will continue returning 0, i.e. it will not set EAGAIN. This ensure backwards compatibility with applications passing WNOHANG explicitly with pidfds. A concrete use-case that was brought on-list was Josh's async pidfd library. Ever since the introduction of pidfds and more advanced async io various programming languages such as Rust have grown support for async event libraries. These libraries are created to help build epoll-based event loops around file descriptors. A common pattern is to automatically make all file descriptors they manage to O_NONBLOCK. For such libraries the EAGAIN error code is treated specially. When a function is called that returns EAGAIN the function isn't called again until the event loop indicates the the file descriptor is ready. Supporting EAGAIN when waiting on pidfds makes such libraries just work with little effort. Suggested-by: Josh Triplett <josh@joshtriplett.org> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Reviewed-by: Josh Triplett <josh@joshtriplett.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Kees Cook <keescook@chromium.org> Cc: Sargun Dhillon <sargun@sargun.me> Cc: Jann Horn <jannh@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org> Link: https://lore.kernel.org/lkml/20200811181236.GA18763@localhost/ Link: https://github.com/joshtriplett/async-pidfd Link: https://lore.kernel.org/r/20200902102130.147672-3-christian.brauner@ubuntu.com |
||
Christoph Hellwig
|
8043fc147a |
kernel: add a kernel_wait helper
Add a helper that waits for a pid and stores the status in the passed in kernel pointer. Use it to fix the usage of kernel_wait4 in call_usermodehelper_exec_sync that only happens to work due to the implicit set_fs(KERNEL_DS) for kernel threads. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Link: http://lkml.kernel.org/r/20200721130449.5008-1-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Christoph Hellwig
|
fe81417596 |
exec: use force_uaccess_begin during exec and exit
Both exec and exit want to ensure that the uaccess routines actually do access user pointers. Use the newly added force_uaccess_begin helper instead of an open coded set_fs for that to prepare for kernel builds where set_fs() does not exist. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Nick Hu <nickhu@andestech.com> Cc: Greentime Hu <green.hu@gmail.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Link: http://lkml.kernel.org/r/20200710135706.537715-7-hch@lst.de Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Linus Torvalds
|
3950e97543 |
Merge branch 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull execve updates from Eric Biederman: "During the development of v5.7 I ran into bugs and quality of implementation issues related to exec that could not be easily fixed because of the way exec is implemented. So I have been diggin into exec and cleaning up what I can. This cycle I have been looking at different ideas and different implementations to see what is possible to improve exec, and cleaning the way exec interfaces with in kernel users. Only cleaning up the interfaces of exec with rest of the kernel has managed to stabalize and make it through review in time for v5.9-rc1 resulting in 2 sets of changes this cycle. - Implement kernel_execve - Make the user mode driver code a better citizen With kernel_execve the code size got a little larger as the copying of parameters from userspace and copying of parameters from userspace is now separate. The good news is kernel threads no longer need to play games with set_fs to use exec. Which when combined with the rest of Christophs set_fs changes should security bugs with set_fs much more difficult" * 'exec-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (23 commits) exec: Implement kernel_execve exec: Factor bprm_stack_limits out of prepare_arg_pages exec: Factor bprm_execve out of do_execve_common exec: Move bprm_mm_init into alloc_bprm exec: Move initialization of bprm->filename into alloc_bprm exec: Factor out alloc_bprm exec: Remove unnecessary spaces from binfmts.h umd: Stop using split_argv umd: Remove exit_umh bpfilter: Take advantage of the facilities of struct pid exit: Factor thread_group_exited out of pidfd_poll umd: Track user space drivers with struct pid bpfilter: Move bpfilter_umh back into init data exec: Remove do_execve_file umh: Stop calling do_execve_file umd: Transform fork_usermode_blob into fork_usermode_driver umd: Rename umd_info.cmdline umd_info.driver_name umd: For clarity rename umh_info umd_info umh: Separate the user mode driver and the user mode helper support umh: Remove call_usermodehelper_setup_file. ... |
||
Linus Torvalds
|
9ecc6ea491 |
seccomp updates for v5.9-rc1
- Improved selftest coverage, timeouts, and reporting - Add EPOLLHUP support for SECCOMP_RET_USER_NOTIF (Christian Brauner) - Refactor __scm_install_fd() into __receive_fd() and fix buggy callers - Introduce "addfd" command for SECCOMP_RET_USER_NOTIF (Sargun Dhillon) -----BEGIN PGP SIGNATURE----- iQJKBAABCgA0FiEEpcP2jyKd1g9yPm4TiXL039xtwCYFAl8oZcQWHGtlZXNjb29r QGNocm9taXVtLm9yZwAKCRCJcvTf3G3AJomDD/4x3j7eXREcXDsHOmlgEaHWGx4l JldHFQhV5GjmD7gOkPcoZSG7NfG7F6VpwAJg7ZoR3qUkem7K8DFucxqgo1RldCot nigleeLX6JeMS0Z+iwjAVZd+5t4xG4J/7GGDHIIMiG5qvwJ0Yf64o1bkjaB2Q/Bv tluBg0WF32kFMG/ZwyY/V2QDbbue97CFPflybOh1o2nWbVzmUlFEEum3UUvZsxc8 smMsattJyuAV7kcEKzKrs8b010NdFZqwdbub5Np9W3XEXGBYMdIPoNsOQGmB9wby j2ui0lzboXRG997jM7TCd1l/XZAv8aAwvPplw3FJRybzkOGs9NDyLMoz87yJpR1T xp511vnMyMbyKIGdungkt7cIyzaictHwaYzznsmuNdCPEjTaIQJr1ctsa4GEgtqf pnkktZ9YbMCcHU0CtZ8GlOVqA9wE+FUm0/u0zgikzJQsB+HcNItiARTTTHRyco7p VJCqK8o4Zx4ELV7QNkSH4nhFkVgRopvrvBiPAGro/qwGOofBg8W8wM8O1+V/MDmp zSU22v4SncT1Xb7dtmdJqDEeHfDikhaCAb4Je2hsGQWzbdAqwHGlpa7vpk9x3Q5r L+XyP+Z+rPHlXYyypJwUvvOQhXOmP0zYxcEHxByqIBfXiwy+3dN4tDDfatWbccwl uTlTDM8kmQn6QzSztA== =yb55 -----END PGP SIGNATURE----- Merge tag 'seccomp-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp updates from Kees Cook: "There are a bunch of clean ups and selftest improvements along with two major updates to the SECCOMP_RET_USER_NOTIF filter return: EPOLLHUP support to more easily detect the death of a monitored process, and being able to inject fds when intercepting syscalls that expect an fd-opening side-effect (needed by both container folks and Chrome). The latter continued the refactoring of __scm_install_fd() started by Christoph, and in the process found and fixed a handful of bugs in various callers. - Improved selftest coverage, timeouts, and reporting - Add EPOLLHUP support for SECCOMP_RET_USER_NOTIF (Christian Brauner) - Refactor __scm_install_fd() into __receive_fd() and fix buggy callers - Introduce 'addfd' command for SECCOMP_RET_USER_NOTIF (Sargun Dhillon)" * tag 'seccomp-v5.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (30 commits) selftests/seccomp: Test SECCOMP_IOCTL_NOTIF_ADDFD seccomp: Introduce addfd ioctl to seccomp user notifier fs: Expand __receive_fd() to accept existing fd pidfd: Replace open-coded receive_fd() fs: Add receive_fd() wrapper for __receive_fd() fs: Move __scm_install_fd() to __receive_fd() net/scm: Regularize compat handling of scm_detach_fds() pidfd: Add missing sock updates for pidfd_getfd() net/compat: Add missing sock updates for SCM_RIGHTS selftests/seccomp: Check ENOSYS under tracing selftests/seccomp: Refactor to use fixture variants selftests/harness: Clean up kern-doc for fixtures seccomp: Use -1 marker for end of mode 1 syscall list seccomp: Fix ioctl number for SECCOMP_IOCTL_NOTIF_ID_VALID selftests/seccomp: Rename user_trap_syscall() to user_notif_syscall() selftests/seccomp: Make kcmp() less required seccomp: Use pr_fmt selftests/seccomp: Improve calibration loop selftests/seccomp: use 90s as timeout selftests/seccomp: Expand benchmark to per-filter measurements ... |
||
Kees Cook
|
3f649ab728 |
treewide: Remove uninitialized_var() usage
Using uninitialized_var() is dangerous as it papers over real bugs[1] (or can in the future), and suppresses unrelated compiler warnings (e.g. "unused variable"). If the compiler thinks it is uninitialized, either simply initialize the variable or make compiler changes. In preparation for removing[2] the[3] macro[4], remove all remaining needless uses with the following script: git grep '\buninitialized_var\b' | cut -d: -f1 | sort -u | \ xargs perl -pi -e \ 's/\buninitialized_var\(([^\)]+)\)/\1/g; s:\s*/\* (GCC be quiet|to make compiler happy) \*/$::g;' drivers/video/fbdev/riva/riva_hw.c was manually tweaked to avoid pathological white-space. No outstanding warnings were found building allmodconfig with GCC 9.3.0 for x86_64, i386, arm64, arm, powerpc, powerpc64le, s390x, mips, sparc64, alpha, and m68k. [1] https://lore.kernel.org/lkml/20200603174714.192027-1-glider@google.com/ [2] https://lore.kernel.org/lkml/CA+55aFw+Vbj0i=1TGqCR5vQkCzWJ0QxK6CernOU6eedsudAixw@mail.gmail.com/ [3] https://lore.kernel.org/lkml/CA+55aFwgbgqhbp1fkxvRKEpzyR5J8n1vKT1VZdz9knmPuXhOeg@mail.gmail.com/ [4] https://lore.kernel.org/lkml/CA+55aFz2500WfbKXAx8s67wrm9=yVJu65TpLgN_ybYNv0VEOKA@mail.gmail.com/ Reviewed-by: Leon Romanovsky <leonro@mellanox.com> # drivers/infiniband and mlx4/mlx5 Acked-by: Jason Gunthorpe <jgg@mellanox.com> # IB Acked-by: Kalle Valo <kvalo@codeaurora.org> # wireless drivers Reviewed-by: Chao Yu <yuchao0@huawei.com> # erofs Signed-off-by: Kees Cook <keescook@chromium.org> |
||
Christian Brauner
|
3a15fb6ed9 |
seccomp: release filter after task is fully dead
The seccomp filter used to be released in free_task() which is called asynchronously via call_rcu() and assorted mechanisms. Since we need to inform tasks waiting on the seccomp notifier when a filter goes empty we will notify them as soon as a task has been marked fully dead in release_task(). To not split seccomp cleanup into two parts, move filter release out of free_task() and into release_task() after we've unhashed struct task from struct pid, exited signals, and unlinked it from the threadgroups' thread list. We'll put the empty filter notification infrastructure into it in a follow up patch. This also renames put_seccomp_filter() to seccomp_filter_release() which is a more descriptive name of what we're doing here especially once we've added the empty filter notification mechanism in there. We're also NULL-ing the task's filter tree entrypoint which seems cleaner than leaving a dangling pointer in there. Note that this shouldn't need any memory barriers since we're calling this when the task is in release_task() which means it's EXIT_DEAD. So it can't modify its seccomp filters anymore. You can also see this from the point where we're calling seccomp_filter_release(). It's after __exit_signal() and at this point, tsk->sighand will already have been NULLed which is required for thread-sync and filter installation alike. Cc: Tycho Andersen <tycho@tycho.ws> Cc: Kees Cook <keescook@chromium.org> Cc: Matt Denton <mpdenton@google.com> Cc: Sargun Dhillon <sargun@sargun.me> Cc: Jann Horn <jannh@google.com> Cc: Chris Palmer <palmer@google.com> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Robert Sesek <rsesek@google.com> Cc: Jeffrey Vander Stoep <jeffv@google.com> Cc: Linux Containers <containers@lists.linux-foundation.org> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Link: https://lore.kernel.org/r/20200531115031.391515-2-christian.brauner@ubuntu.com Signed-off-by: Kees Cook <keescook@chromium.org> |
||
Eric W. Biederman
|
8c2f526639 |
umd: Remove exit_umh
The bpfilter code no longer uses the umd_info.cleanup callback. This callback is what exit_umh exists to call. So remove exit_umh and all of it's associated booking. v1: https://lkml.kernel.org/r/87bll6dlte.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/87y2o53abg.fsf_-_@x220.int.ebiederm.org Link: https://lkml.kernel.org/r/20200702164140.4468-15-ebiederm@xmission.com Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> |
||
Eric W. Biederman
|
38fd525a4c |
exit: Factor thread_group_exited out of pidfd_poll
Create an independent helper thread_group_exited which returns true when all threads have passed exit_notify in do_exit. AKA all of the threads are at least zombies and might be dead or completely gone. Create this helper by taking the logic out of pidfd_poll where it is already tested, and adding a READ_ONCE on the read of task->exit_state. I will be changing the user mode driver code to use this same logic to know when a user mode driver needs to be restarted. Place the new helper thread_group_exited in kernel/exit.c and EXPORT it so it can be used by modules. Link: https://lkml.kernel.org/r/20200702164140.4468-13-ebiederm@xmission.com Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> |
||
Eric W. Biederman
|
1c340ead18 |
umd: Track user space drivers with struct pid
Use struct pid instead of user space pid values that are prone to wrap araound. In addition track the entire thread group instead of just the first thread that is started by exec. There are no multi-threaded user mode drivers today but there is nothing preclucing user drivers from being multi-threaded, so it is just a good idea to track the entire process. Take a reference count on the tgid's in question to make it possible to remove exit_umh in a future change. As a struct pid is available directly use kill_pid_info. The prior process signalling code was iffy in using a userspace pid known to be in the initial pid namespace and then looking up it's task in whatever the current pid namespace is. It worked only because kernel threads always run in the initial pid namespace. As the tgid is now refcounted verify the tgid is NULL at the start of fork_usermode_driver to avoid the possibility of silent pid leaks. v1: https://lkml.kernel.org/r/87mu4qdlv2.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/a70l4oy8.fsf_-_@x220.int.ebiederm.org Link: https://lkml.kernel.org/r/20200702164140.4468-12-ebiederm@xmission.com Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> |
||
Eric W. Biederman
|
884c5e683b |
umh: Separate the user mode driver and the user mode helper support
This makes it clear which code is part of the core user mode helper support and which code is needed to implement user mode drivers. This makes the kernel smaller for everyone who does not use a usermode driver. v1: https://lkml.kernel.org/r/87tuyyf0ln.fsf_-_@x220.int.ebiederm.org v2: https://lkml.kernel.org/r/87imf963s6.fsf_-_@x220.int.ebiederm.org Link: https://lkml.kernel.org/r/20200702164140.4468-5-ebiederm@xmission.com Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> |
||
Michel Lespinasse
|
c1e8d7c6a7 |
mmap locking API: convert mmap_sem comments
Convert comments that reference mmap_sem to reference mmap_lock instead. [akpm@linux-foundation.org: fix up linux-next leftovers] [akpm@linux-foundation.org: s/lockaphore/lock/, per Vlastimil] [akpm@linux-foundation.org: more linux-next fixups, per Michel] Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-13-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Michel Lespinasse
|
d8ed45c5dc |
mmap locking API: use coccinelle to convert mmap_sem rwsem call sites
This change converts the existing mmap_sem rwsem calls to use the new mmap locking API instead. The change is generated using coccinelle with the following rule: // spatch --sp-file mmap_lock_api.cocci --in-place --include-headers --dir . @@ expression mm; @@ ( -init_rwsem +mmap_init_lock | -down_write +mmap_write_lock | -down_write_killable +mmap_write_lock_killable | -down_write_trylock +mmap_write_trylock | -up_write +mmap_write_unlock | -downgrade_write +mmap_write_downgrade | -down_read +mmap_read_lock | -down_read_killable +mmap_read_lock_killable | -down_read_trylock +mmap_read_trylock | -up_read +mmap_read_unlock ) -(&mm->mmap_sem) +(mm) Signed-off-by: Michel Lespinasse <walken@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Vlastimil Babka <vbabka@suse.cz> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: David Rientjes <rientjes@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Liam Howlett <Liam.Howlett@oracle.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ying Han <yinghan@google.com> Link: http://lkml.kernel.org/r/20200520052908.204642-5-walken@google.com Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Mike Rapoport
|
e31cf2f4ca |
mm: don't include asm/pgtable.h if linux/mm.h is already included
Patch series "mm: consolidate definitions of page table accessors", v2. The low level page table accessors (pXY_index(), pXY_offset()) are duplicated across all architectures and sometimes more than once. For instance, we have 31 definition of pgd_offset() for 25 supported architectures. Most of these definitions are actually identical and typically it boils down to, e.g. static inline unsigned long pmd_index(unsigned long address) { return (address >> PMD_SHIFT) & (PTRS_PER_PMD - 1); } static inline pmd_t *pmd_offset(pud_t *pud, unsigned long address) { return (pmd_t *)pud_page_vaddr(*pud) + pmd_index(address); } These definitions can be shared among 90% of the arches provided XYZ_SHIFT, PTRS_PER_XYZ and xyz_page_vaddr() are defined. For architectures that really need a custom version there is always possibility to override the generic version with the usual ifdefs magic. These patches introduce include/linux/pgtable.h that replaces include/asm-generic/pgtable.h and add the definitions of the page table accessors to the new header. This patch (of 12): The linux/mm.h header includes <asm/pgtable.h> to allow inlining of the functions involving page table manipulations, e.g. pte_alloc() and pmd_alloc(). So, there is no point to explicitly include <asm/pgtable.h> in the files that include <linux/mm.h>. The include statements in such cases are remove with a simple loop: for f in $(git grep -l "include <linux/mm.h>") ; do sed -i -e '/include <asm\/pgtable.h>/ d' $f done Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Brian Cain <bcain@codeaurora.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Greentime Hu <green.hu@gmail.com> Cc: Greg Ungerer <gerg@linux-m68k.org> Cc: Guan Xuetao <gxt@pku.edu.cn> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Helge Deller <deller@gmx.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Ley Foon Tan <ley.foon.tan@intel.com> Cc: Mark Salter <msalter@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Matt Turner <mattst88@gmail.com> Cc: Max Filippov <jcmvbkbc@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nick Hu <nickhu@andestech.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Weinberger <richard@nod.at> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: Vincent Chen <deanbo422@gmail.com> Cc: Vineet Gupta <vgupta@synopsys.com> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Link: http://lkml.kernel.org/r/20200514170327.31389-1-rppt@kernel.org Link: http://lkml.kernel.org/r/20200514170327.31389-2-rppt@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> |
||
Linus Torvalds
|
039aeb9deb |
ARM:
- Move the arch-specific code into arch/arm64/kvm - Start the post-32bit cleanup - Cherry-pick a few non-invasive pre-NV patches x86: - Rework of TLB flushing - Rework of event injection, especially with respect to nested virtualization - Nested AMD event injection facelift, building on the rework of generic code and fixing a lot of corner cases - Nested AMD live migration support - Optimization for TSC deadline MSR writes and IPIs - Various cleanups - Asynchronous page fault cleanups (from tglx, common topic branch with tip tree) - Interrupt-based delivery of asynchronous "page ready" events (host side) - Hyper-V MSRs and hypercalls for guest debugging - VMX preemption timer fixes s390: - Cleanups Generic: - switch vCPU thread wakeup from swait to rcuwait The other architectures, and the guest side of the asynchronous page fault work, will come next week. -----BEGIN PGP SIGNATURE----- iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAl7VJcYUHHBib256aW5p QHJlZGhhdC5jb20ACgkQv/vSX3jHroPf6QgAq4wU5wdd1lTGz/i3DIhNVJNJgJlp ozLzRdMaJbdbn5RpAK6PEBd9+pt3+UlojpFB3gpJh2Nazv2OzV4yLQgXXXyyMEx1 5Hg7b4UCJYDrbkCiegNRv7f/4FWDkQ9dx++RZITIbxeskBBCEI+I7GnmZhGWzuC4 7kj4ytuKAySF2OEJu0VQF6u0CvrNYfYbQIRKBXjtOwuRK4Q6L63FGMJpYo159MBQ asg3B1jB5TcuGZ9zrjL5LkuzaP4qZZHIRs+4kZsH9I6MODHGUxKonrkablfKxyKy CFK+iaHCuEXXty5K0VmWM3nrTfvpEjVjbMc7e1QGBQ5oXsDM0pqn84syRg== =v7Wn -----END PGP SIGNATURE----- Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm Pull kvm updates from Paolo Bonzini: "ARM: - Move the arch-specific code into arch/arm64/kvm - Start the post-32bit cleanup - Cherry-pick a few non-invasive pre-NV patches x86: - Rework of TLB flushing - Rework of event injection, especially with respect to nested virtualization - Nested AMD event injection facelift, building on the rework of generic code and fixing a lot of corner cases - Nested AMD live migration support - Optimization for TSC deadline MSR writes and IPIs - Various cleanups - Asynchronous page fault cleanups (from tglx, common topic branch with tip tree) - Interrupt-based delivery of asynchronous "page ready" events (host side) - Hyper-V MSRs and hypercalls for guest debugging - VMX preemption timer fixes s390: - Cleanups Generic: - switch vCPU thread wakeup from swait to rcuwait The other architectures, and the guest side of the asynchronous page fault work, will come next week" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (256 commits) KVM: selftests: fix rdtsc() for vmx_tsc_adjust_test KVM: check userspace_addr for all memslots KVM: selftests: update hyperv_cpuid with SynDBG tests x86/kvm/hyper-v: Add support for synthetic debugger via hypercalls x86/kvm/hyper-v: enable hypercalls regardless of hypercall page x86/kvm/hyper-v: Add support for synthetic debugger interface x86/hyper-v: Add synthetic debugger definitions KVM: selftests: VMX preemption timer migration test KVM: nVMX: Fix VMX preemption timer migration x86/kvm/hyper-v: Explicitly align hcall param for kvm_hyperv_exit KVM: x86/pmu: Support full width counting KVM: x86/pmu: Tweak kvm_pmu_get_msr to pass 'struct msr_data' in KVM: x86: announce KVM_FEATURE_ASYNC_PF_INT KVM: x86: acknowledgment mechanism for async pf page ready notifications KVM: x86: interrupt based APF 'page ready' event delivery KVM: introduce kvm_read_guest_offset_cached() KVM: rename kvm_arch_can_inject_async_page_present() to kvm_arch_can_dequeue_async_page_present() KVM: x86: extend struct kvm_vcpu_pv_apf_data with token info Revert "KVM: async_pf: Fix #DF due to inject "Page not Present" and "Page Ready" exceptions simultaneously" KVM: VMX: Replace zero-length array with flexible-array ... |
||
Linus Torvalds
|
d479c5a191 |
The changes in this cycle are:
- Optimize the task wakeup CPU selection logic, to improve scalability and reduce wakeup latency spikes - PELT enhancements - CFS bandwidth handling fixes - Optimize the wakeup path by remove rq->wake_list and replacing it with ->ttwu_pending - Optimize IPI cross-calls by making flush_smp_call_function_queue() process sync callbacks first. - Misc fixes and enhancements. Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7WPL0RHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1i0ThAAs0fbvMzNJ5SWFdwOQ4KZIlA+Im4dEBMK sx/XAZqa/hGxvkm1jS0RDVQl1V1JdOlru5UF4C42ctnAFGtBBHDriO5rn9oCpkSw DAoLc4eZqzldIXN6sDZ0xMtC14Eu15UAP40OyM4qxBc4GqGlOnnale6Vhn+n+pLQ jAuZlMJIkmmzeA6cuvtultevrVh+QUqJ/5oNUANlTER4OM48umjr5rNTOb8cIW53 9K3vbS3nmqSvJuIyqfRFoMy5GFM6+Jj2+nYuq8aTuYLEtF4qqWzttS3wBzC9699g XYRKILkCK8ZP4RB5Ps/DIKj6maZGZoICBxTJEkIgXujJlxlKKTD3mddk+0LBXChW Ijznanxn67akoAFpqi/Dnkhieg7cUrE9v1OPRS2J0xy550synSPFcSgOK3viizga iqbjptY4scUWkCwHQNjABerxc7MWzrwbIrRt+uNvCaqJLweUh0GnEcV5va8R+4I8 K20XwOdrzuPLo5KdDWA/BKOEv49guHZDvoykzlwMlR3gFfwHS/UsjzmSQIWK3gZG 9OMn8ibO2f1OzhRcEpDLFzp7IIj6NJmPFVSW+7xHyL9/vTveUx3ZXPLteb2qxJVP BYPsduVx8YeGRBlLya0PJriB23ajQr0lnHWo15g0uR9o/0Ds1ephcymiF3QJmCaA To3CyIuQN8M= =C2OP -----END PGP SIGNATURE----- Merge tag 'sched-core-2020-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler updates from Ingo Molnar: "The changes in this cycle are: - Optimize the task wakeup CPU selection logic, to improve scalability and reduce wakeup latency spikes - PELT enhancements - CFS bandwidth handling fixes - Optimize the wakeup path by remove rq->wake_list and replacing it with ->ttwu_pending - Optimize IPI cross-calls by making flush_smp_call_function_queue() process sync callbacks first. - Misc fixes and enhancements" * tag 'sched-core-2020-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits) irq_work: Define irq_work_single() on !CONFIG_IRQ_WORK too sched/headers: Split out open-coded prototypes into kernel/sched/smp.h sched: Replace rq::wake_list sched: Add rq::ttwu_pending irq_work, smp: Allow irq_work on call_single_queue smp: Optimize send_call_function_single_ipi() smp: Move irq_work_run() out of flush_smp_call_function_queue() smp: Optimize flush_smp_call_function_queue() sched: Fix smp_call_function_single_async() usage for ILB sched/core: Offload wakee task activation if it the wakee is descheduling sched/core: Optimize ttwu() spinning on p->on_cpu sched: Defend cfs and rt bandwidth quota against overflow sched/cpuacct: Fix charge cpuacct.usage_sys sched/fair: Replace zero-length array with flexible-array sched/pelt: Sync util/runnable_sum with PELT window when propagating sched/cpuacct: Use __this_cpu_add() instead of this_cpu_ptr() sched/fair: Optimize enqueue_task_fair() sched: Make scheduler_ipi inline sched: Clean up scheduler_ipi() sched/core: Simplify sched_init() ... |
||
Linus Torvalds
|
e148a8f948 |
Merge branch 'uaccess.readdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull uaccess/readdir updates from Al Viro: "Finishing the conversion of readdir.c to unsafe_... API. This includes the uaccess_{read,write}_begin series by Christophe Leroy" * 'uaccess.readdir' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: readdir.c: get rid of the last __put_user(), drop now-useless access_ok() readdir.c: get compat_filldir() more or less in sync with filldir() switch readdir(2) to unsafe_copy_dirent_name() drm/i915/gem: Replace user_access_begin by user_write_access_begin uaccess: Selectively open read or write user access uaccess: Add user_read_access_begin/end and user_write_access_begin/end |
||
Paolo Bonzini
|
9d5272f5e3 | Merge tag 'noinstr-x86-kvm-2020-05-16' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into HEAD | ||
Davidlohr Bueso
|
9d9a6ebfea |
rcuwait: Let rcuwait_wake_up() return whether or not a task was awoken
Propagating the return value of wake_up_process() back to the caller can come in handy for future users, such as for statistics or accounting purposes. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Message-Id: <20200424054837.5138-3-dave@stgolabs.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
||
Davidlohr Bueso
|
c9d64a1b2d |
rcuwait: Fix stale wake call name in comment
The 'trywake' name was renamed to simply 'wake', update the comment. Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Message-Id: <20200424054837.5138-2-dave@stgolabs.net> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |
||
Christophe Leroy
|
41cd780524 |
uaccess: Selectively open read or write user access
When opening user access to only perform reads, only open read access. When opening user access to only perform writes, only open write access. Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/2e73bc57125c2c6ab12a587586a4eed3a47105fc.1585898438.git.christophe.leroy@c-s.fr |
||
Jann Horn
|
586b58cac8 |
exit: Move preemption fixup up, move blocking operations down
With CONFIG_DEBUG_ATOMIC_SLEEP=y and CONFIG_CGROUPS=y, kernel oopses in
non-preemptible context look untidy; after the main oops, the kernel prints
a "sleeping function called from invalid context" report because
exit_signals() -> cgroup_threadgroup_change_begin() -> percpu_down_read()
can sleep, and that happens before the preempt_count_set(PREEMPT_ENABLED)
fixup.
It looks like the same thing applies to profile_task_exit() and
kcov_task_exit().
Fix it by moving the preemption fixup up and the calls to
profile_task_exit() and kcov_task_exit() down.
Fixes:
|
||
Eric W. Biederman
|
6ade99ec61 |
proc: Put thread_pid in release_task not proc_flush_pid
Oleg pointed out that in the unlikely event the kernel is compiled
with CONFIG_PROC_FS unset that release_task will now leak the pid.
Move the put_pid out of proc_flush_pid into release_task to fix this
and to guarantee I don't make that mistake again.
When possible it makes sense to keep get and put in the same function
so it can easily been seen how they pair up.
Fixes:
|
||
Linus Torvalds
|
d987ca1c6b |
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull exec/proc updates from Eric Biederman: "This contains two significant pieces of work: the work to sort out proc_flush_task, and the work to solve a deadlock between strace and exec. Fixing proc_flush_task so that it no longer requires a persistent mount makes improvements to proc possible. The removal of the persistent mount solves an old regression that that caused the hidepid mount option to only work on remount not on mount. The regression was found and reported by the Android folks. This further allows Alexey Gladkov's work making proc mount options specific to an individual mount of proc to move forward. The work on exec starts solving a long standing issue with exec that it takes mutexes of blocking userspace applications, which makes exec extremely deadlock prone. For the moment this adds a second mutex with a narrower scope that handles all of the easy cases. Which makes the tricky cases easy to spot. With a little luck the code to solve those deadlocks will be ready by next merge window" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (25 commits) signal: Extend exec_id to 64bits pidfd: Use new infrastructure to fix deadlocks in execve perf: Use new infrastructure to fix deadlocks in execve proc: io_accounting: Use new infrastructure to fix deadlocks in execve proc: Use new infrastructure to fix deadlocks in execve kernel/kcmp.c: Use new infrastructure to fix deadlocks in execve kernel: doc: remove outdated comment cred.c mm: docs: Fix a comment in process_vm_rw_core selftests/ptrace: add test cases for dead-locks exec: Fix a deadlock in strace exec: Add exec_update_mutex to replace cred_guard_mutex exec: Move exec_mmap right after de_thread in flush_old_exec exec: Move cleanup of posix timers on exec out of de_thread exec: Factor unshare_sighand out of de_thread and call it separately exec: Only compute current once in flush_old_exec pid: Improve the comment about waiting in zap_pid_ns_processes proc: Remove the now unnecessary internal mount of proc uml: Create a private mount of proc for mconsole uml: Don't consult current to find the proc_mnt in mconsole_proc proc: Use a list of inodes to flush from proc ... |
||
Linus Torvalds
|
dbb381b619 |
timekeeping and timer updates:
Core: - Consolidation of the vDSO build infrastructure to address the difficulties of cross-builds for ARM64 compat vDSO libraries by restricting the exposure of header content to the vDSO build. This is achieved by splitting out header content into separate headers. which contain only the minimaly required information which is necessary to build the vDSO. These new headers are included from the kernel headers and the vDSO specific files. - Enhancements to the generic vDSO library allowing more fine grained control over the compiled in code, further reducing architecture specific storage and preparing for adopting the generic library by PPC. - Cleanup and consolidation of the exit related code in posix CPU timers. - Small cleanups and enhancements here and there Drivers: - The obligatory new drivers: Ingenic JZ47xx and X1000 TCU support - Correct the clock rate of PIT64b global clock - setup_irq() cleanup - Preparation for PWM and suspend support for the TI DM timer - Expand the fttmr010 driver to support ast2600 systems - The usual small fixes, enhancements and cleanups all over the place -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6B+QETHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYofJ5D/94s5fpaqiuNcaAsLq2D3DRIrTnqxx7 yEeAOPcbYV1bM1SgY/M83L5yGc2S8ny787e26abwRTCZhZV3eAmRTphIFFIZR0Xk xS+i67odscbdJTRtztKj3uQ9rFxefszRuphyaa89pwSY9nnyMWLcahGSQOGs0LJK hvmgwPjyM1drNfPxgPiaFg7vDr2XxNATpQr/FBt+BhelvVan8TlAfrkcNPiLr++Y Axz925FP7jMaRRbZ1acji34gLiIAZk0jLCUdbix7YkPrqDB4GfO+v8Vez+fGClbJ uDOYeR4r1+Be/BtSJtJ2tHqtsKCcAL6agtaE2+epZq5HbzaZFRvBFaxgFNF8WVcn 3FFibdEMdsRNfZTUVp5wwgOLN0UIqE/7LifE12oLEL2oFB5H2PiNEUw3E02XHO11 rL3zgHhB6Ke1sXKPCjSGdmIQLbxZmV5kOlQFy7XuSeo5fmRapVzKNffnKcftIliF 1HNtZbgdA+3tdxMFCqoo1QX+kotl9kgpslmdZ0qHAbaRb3xqLoSskbqEjFRMuSCC 8bjJrwboD9T5GPfwodSCgqs/58CaSDuqPFbIjCay+p90Fcg6wWAkZtyG04ZLdPRc GgNNdN4gjTD9bnrRi8cH47z1g8OO4vt4K4SEbmjo8IlDW+9jYMxuwgR88CMeDXd7 hu7aKsr2I2q/WQ== =5o9G -----END PGP SIGNATURE----- Merge tag 'timers-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timekeeping and timer updates from Thomas Gleixner: "Core: - Consolidation of the vDSO build infrastructure to address the difficulties of cross-builds for ARM64 compat vDSO libraries by restricting the exposure of header content to the vDSO build. This is achieved by splitting out header content into separate headers. which contain only the minimaly required information which is necessary to build the vDSO. These new headers are included from the kernel headers and the vDSO specific files. - Enhancements to the generic vDSO library allowing more fine grained control over the compiled in code, further reducing architecture specific storage and preparing for adopting the generic library by PPC. - Cleanup and consolidation of the exit related code in posix CPU timers. - Small cleanups and enhancements here and there Drivers: - The obligatory new drivers: Ingenic JZ47xx and X1000 TCU support - Correct the clock rate of PIT64b global clock - setup_irq() cleanup - Preparation for PWM and suspend support for the TI DM timer - Expand the fttmr010 driver to support ast2600 systems - The usual small fixes, enhancements and cleanups all over the place" * tag 'timers-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (80 commits) Revert "clocksource/drivers/timer-probe: Avoid creating dead devices" vdso: Fix clocksource.h macro detection um: Fix header inclusion arm64: vdso32: Enable Clang Compilation lib/vdso: Enable common headers arm: vdso: Enable arm to use common headers x86/vdso: Enable x86 to use common headers mips: vdso: Enable mips to use common headers arm64: vdso32: Include common headers in the vdso library arm64: vdso: Include common headers in the vdso library arm64: Introduce asm/vdso/processor.h arm64: vdso32: Code clean up linux/elfnote.h: Replace elf.h with UAPI equivalent scripts: Fix the inclusion order in modpost common: Introduce processor.h linux/ktime.h: Extract common header for vDSO linux/jiffies.h: Extract common header for vDSO linux/time64.h: Extract common header for vDSO linux/time32.h: Extract common header for vDSO linux/time.h: Extract common header for vDSO ... |
||
Linus Torvalds
|
4b9fd8a829 |
Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar: "The main changes in this cycle were: - Continued user-access cleanups in the futex code. - percpu-rwsem rewrite that uses its own waitqueue and atomic_t instead of an embedded rwsem. This addresses a couple of weaknesses, but the primary motivation was complications on the -rt kernel. - Introduce raw lock nesting detection on lockdep (CONFIG_PROVE_RAW_LOCK_NESTING=y), document the raw_lock vs. normal lock differences. This too originates from -rt. - Reuse lockdep zapped chain_hlocks entries, to conserve RAM footprint on distro-ish kernels running into the "BUG: MAX_LOCKDEP_CHAIN_HLOCKS too low!" depletion of the lockdep chain-entries pool. - Misc cleanups, smaller fixes and enhancements - see the changelog for details" * 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (55 commits) fs/buffer: Make BH_Uptodate_Lock bit_spin_lock a regular spinlock_t thermal/x86_pkg_temp: Make pkg_temp_lock a raw_spinlock_t Documentation/locking/locktypes: Minor copy editor fixes Documentation/locking/locktypes: Further clarifications and wordsmithing m68knommu: Remove mm.h include from uaccess_no.h x86: get rid of user_atomic_cmpxchg_inatomic() generic arch_futex_atomic_op_inuser() doesn't need access_ok() x86: don't reload after cmpxchg in unsafe_atomic_op2() loop x86: convert arch_futex_atomic_op_inuser() to user_access_begin/user_access_end() objtool: whitelist __sanitizer_cov_trace_switch() [parisc, s390, sparc64] no need for access_ok() in futex handling sh: no need of access_ok() in arch_futex_atomic_op_inuser() futex: arch_futex_atomic_op_inuser() calling conventions change completion: Use lockdep_assert_RT_in_threaded_ctx() in complete_all() lockdep: Add posixtimer context tracing bits lockdep: Annotate irq_work lockdep: Add hrtimer context tracing bits lockdep: Introduce wait-type checks completion: Use simple wait queues sched/swait: Prepare usage in completions ... |
||
Eric W. Biederman
|
b95e31c07c |
posix-cpu-timers: Stop disabling timers on mt-exec
The reasons why the extra posix_cpu_timers_exit_group() invocation has been
added are not entirely clear from the commit message. Today all that
posix_cpu_timers_exit_group() does is stop timers that are tracking the
task from firing. Every other operation on those timers is still allowed.
The practical implication of this is posix_cpu_timer_del() which could
not get the siglock after the thread group leader has exited (because
sighand == NULL) would be able to run successfully because the timer
was already dequeued.
With that locking issue fixed there is no point in disabling all of the
timers. So remove this ``tempoary'' hack.
Fixes:
|
||
Madhuparna Bhowmik
|
22a34c6fe0
|
exit: Fix Sparse errors and warnings
This patch fixes the following sparse error: kernel/exit.c:627:25: error: incompatible types in comparison expression And the following warning: kernel/exit.c:626:40: warning: incorrect type in assignment Signed-off-by: Madhuparna Bhowmik <madhuparnabhowmik10@gmail.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> [christian.brauner@ubuntu.com: edit commit message] Link: https://lore.kernel.org/r/20200130062028.4870-1-madhuparnabhowmik10@gmail.com Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> |
||
Eric W. Biederman
|
7bc3e6e55a |
proc: Use a list of inodes to flush from proc
Rework the flushing of proc to use a list of directory inodes that need to be flushed. The list is kept on struct pid not on struct task_struct, as there is a fixed connection between proc inodes and pids but at least for the case of de_thread the pid of a task_struct changes. This removes the dependency on proc_mnt which allows for different mounts of proc having different mount options even in the same pid namespace and this allows for the removal of proc_mnt which will trivially the first mount of proc to honor it's mount options. This flushing remains an optimization. The functions pid_delete_dentry and pid_revalidate ensure that ordinary dcache management will not attempt to use dentries past the point their respective task has died. When unused the shrinker will eventually be able to remove these dentries. There is a case in de_thread where proc_flush_pid can be called early for a given pid. Which winds up being safe (if suboptimal) as this is just an optiimization. Only pid directories are put on the list as the other per pid files are children of those directories and d_invalidate on the directory will get them as well. So that the pid can be used during flushing it's reference count is taken in release_task and dropped in proc_flush_pid. Further the call of proc_flush_pid is moved after the tasklist_lock is released in release_task so that it is certain that the pid has already been unhashed when flushing it taking place. This removes a small race where a dentry could recreated. As struct pid is supposed to be small and I need a per pid lock I reuse the only lock that currently exists in struct pid the the wait_pidfd.lock. The net result is that this adds all of this functionality with just a little extra list management overhead and a single extra pointer in struct pid. v2: Initialize pid->inodes. I somehow failed to get that initialization into the initial version of the patch. A boot failure was reported by "kernel test robot <lkp@intel.com>", and failure to initialize that pid->inodes matches all of the reported symptoms. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> |
||
Davidlohr Bueso
|
ac8dec4209 |
locking/percpu-rwsem: Fold __percpu_up_read()
Now that __percpu_up_read() is only ever used from percpu_up_read() merge them, it's a small function. Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Link: https://lkml.kernel.org/r/20200131151540.212415454@infradead.org |
||
Linus Torvalds
|
d9c82fd8c8 |
for-linus-2020-01-03
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCXg9C5wAKCRCRxhvAZXjc oiZXAPsGFXyDCWlKnShBpKufdFh6XugADlyZK0Si2ISWQoJJsgD/Ri1g3zg6V7YC HBG0sz8+vSk/Ys55yDQz+K1d1MTkdQ4= =8uQe -----END PGP SIGNATURE----- Merge tag 'for-linus-2020-01-03' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull thread fixes from Christian Brauner: "Here are two fixes: - Panic earlier when global init exits to generate useable coredumps. Currently, when global init and all threads in its thread-group have exited we panic via: do_exit() -> exit_notify() -> forget_original_parent() -> find_child_reaper() This makes it hard to extract a useable coredump for global init from a kernel crashdump because by the time we panic exit_mm() will have already released global init's mm. We now panic slightly earlier. This has been a problem in certain environments such as Android. - Fix a race in assigning and reading taskstats for thread-groups with more than one thread. This patch has been waiting for quite a while since people disagreed on what the correct fix was at first" * tag 'for-linus-2020-01-03' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: exit: panic before exit_mm() on global init exit taskstats: fix data-race |
||
chenqiwu
|
43cf75d964
|
exit: panic before exit_mm() on global init exit
Currently, when global init and all threads in its thread-group have exited we panic via: do_exit() -> exit_notify() -> forget_original_parent() -> find_child_reaper() This makes it hard to extract a useable coredump for global init from a kernel crashdump because by the time we panic exit_mm() will have already released global init's mm. This patch moves the panic futher up before exit_mm() is called. As was the case previously, we only panic when global init and all its threads in the thread-group have exited. Signed-off-by: chenqiwu <chenqiwu@xiaomi.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Oleg Nesterov <oleg@redhat.com> [christian.brauner@ubuntu.com: fix typo, rewrite commit message] Link: https://lore.kernel.org/r/1576736993-10121-1-git-send-email-qiwuchen55@gmail.com Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> |
||
Linus Torvalds
|
6a965666b7 |
Pipework for general notification queue
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEqG5UsNXhtOCrfGQP+7dXa6fLC2sFAl3O0OoACgkQ+7dXa6fL C2tAwA//VH9Y81azemXFdflDF90sSH3TCASlKHVYHbBNAkH/QP5F00G4BEM4nNqH F3x7qcU9vzfGdumF1pc90Yt6XSYlsQEGF+xMyMw/VS2wKs40yv+b/doVbzOWbN9C NfrklgHeuuBk+JzU2llDisVqKRTLt4SmDpYu1ZdcchUQFZCCl3BpgdSEC+xXrHay +KlRPVNMSd2kXMCDuSWrr71lVNdCTdf3nNC5p1i780+VrgpIBIG/jmiNdCcd7PLH 1aesPlr8UZY3+bmRtqe587fVRAhT2qA2xibKtyf9R0hrDtUKR4NSnpPmaeIjb26e LhVntcChhYxQqzy/T4ScTDNVjpSlwi6QMo5DwAwzNGf2nf/v5/CZ+vGYDVdXRFHj tgH1+8eDpHsi7jJp6E4cmZjiolsUx/ePDDTrQ4qbdDMO7fmIV6YQKFAMTLJepLBY qnJVqoBq3qn40zv6tVZmKgWiXQ65jEkBItZhEUmcQRBiSbBDPweIdEzx/mwzkX7U 1gShGdut6YP4GX7BnOhkiQmzucS85mgkUfG43+mBfYXb+4zNTEjhhkqhEduz2SQP xnjHxEM+MTGCj3PozIpJxNKzMTEceYY7cAUdNEMDQcHog7OCnIdGBIc7BPnsN8yA CPzntwP4mmLfK3weq3PIGC6d9xfc9PpmiR9docxQOvE6sk2Ifeo= =FKC7 -----END PGP SIGNATURE----- Merge tag 'notifications-pipe-prep-20191115' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs Pull pipe rework from David Howells: "This is my set of preparatory patches for building a general notification queue on top of pipes. It makes a number of significant changes: - It removes the nr_exclusive argument from __wake_up_sync_key() as this is always 1. This prepares for the next step: - Adds wake_up_interruptible_sync_poll_locked() so that poll can be woken up from a function that's holding the poll waitqueue spinlock. - Change the pipe buffer ring to be managed in terms of unbounded head and tail indices rather than bounded index and length. This means that reading the pipe only needs to modify one index, not two. - A selection of helper functions are provided to query the state of the pipe buffer, plus a couple to apply updates to the pipe indices. - The pipe ring is allowed to have kernel-reserved slots. This allows many notification messages to be spliced in by the kernel without allowing userspace to pin too many pages if it writes to the same pipe. - Advance the head and tail indices inside the pipe waitqueue lock and use wake_up_interruptible_sync_poll_locked() to poke poll without having to take the lock twice. - Rearrange pipe_write() to preallocate the buffer it is going to write into and then drop the spinlock. This allows kernel notifications to then be added the ring whilst it is filling the buffer it allocated. The read side is stalled because the pipe mutex is still held. - Don't wake up readers on a pipe if there was already data in it when we added more. - Don't wake up writers on a pipe if the ring wasn't full before we removed a buffer" * tag 'notifications-pipe-prep-20191115' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs: pipe: Remove sync on wake_ups pipe: Increase the writer-wakeup threshold to reduce context-switch count pipe: Check for ring full inside of the spinlock in pipe_write() pipe: Remove redundant wakeup from pipe_write() pipe: Rearrange sequence in pipe_write() to preallocate slot pipe: Conditionalise wakeup in pipe_read() pipe: Advance tail pointer inside of wait spinlock in pipe_read() pipe: Allow pipes to have kernel-reserved slots pipe: Use head and tail pointers for the ring, not cursor and length Add wake_up_interruptible_sync_poll_locked() Remove the nr_exclusive argument from __wake_up_sync_key() pipe: Reduce #inclusion of pipe_fs_i.h |