This reverts commit 6bad1052c2, it is the
LTS merge that had to previously get reverted due to being merged too
early.
Cc: Todd Kjos <tkjos@google.com>
Change-Id: I31b7d660bd833cf022ac4870f6d01e723fda5182
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
This reverts commit 1dbafe61e3.
Reason for revert: Too early. Needs to wait until 2024-03-27
Change-Id: I769b944bd089aa2278659ec87f7ba4ac4e74ee4a
Signed-off-by: Todd Kjos <tkjos@google.com>
Changes in 6.1.72
keys, dns: Fix missing size check of V1 server-list header
block: Don't invalidate pagecache for invalid falloc modes
ALSA: hda/realtek: enable SND_PCI_QUIRK for hp pavilion 14-ec1xxx series
ALSA: hda/realtek: fix mute/micmute LEDs for a HP ZBook
ALSA: hda/realtek: Fix mute and mic-mute LEDs for HP ProBook 440 G6
mptcp: prevent tcp diag from closing listener subflows
Revert "PCI/ASPM: Remove pcie_aspm_pm_state_change()"
drm/mgag200: Fix gamma lut not initialized for G200ER, G200EV, G200SE
cifs: cifs_chan_is_iface_active should be called with chan_lock held
cifs: do not depend on release_iface for maintaining iface_list
KVM: x86/pmu: fix masking logic for MSR_CORE_PERF_GLOBAL_CTRL
wifi: iwlwifi: pcie: don't synchronize IRQs from IRQ
drm/bridge: ti-sn65dsi86: Never store more than msg->size bytes in AUX xfer
netfilter: use skb_ip_totlen and iph_totlen
netfilter: nf_tables: set transport offset from mac header for netdev/egress
nfc: llcp_core: Hold a ref to llcp_local->dev when holding a ref to llcp_local
octeontx2-af: Fix marking couple of structure as __packed
drm/i915/dp: Fix passing the correct DPCD_REV for drm_dp_set_phy_test_pattern
ice: Fix link_down_on_close message
ice: Shut down VSI with "link-down-on-close" enabled
i40e: Fix filter input checks to prevent config with invalid values
igc: Report VLAN EtherType matching back to user
igc: Check VLAN TCI mask
igc: Check VLAN EtherType mask
ASoC: fsl_rpmsg: Fix error handler with pm_runtime_enable
ASoC: mediatek: mt8186: fix AUD_PAD_TOP register and offset
mlxbf_gige: fix receive packet race condition
net: sched: em_text: fix possible memory leak in em_text_destroy()
r8169: Fix PCI error on system resume
can: raw: add support for SO_MARK
net-timestamp: extend SOF_TIMESTAMPING_OPT_ID to HW timestamps
net: annotate data-races around sk->sk_tsflags
net: annotate data-races around sk->sk_bind_phc
net: Implement missing getsockopt(SO_TIMESTAMPING_NEW)
selftests: bonding: do not set port down when adding to bond
ARM: sun9i: smp: Fix array-index-out-of-bounds read in sunxi_mc_smp_init
sfc: fix a double-free bug in efx_probe_filters
net: bcmgenet: Fix FCS generation for fragmented skbuffs
netfilter: nft_immediate: drop chain reference counter on error
net: Save and restore msg_namelen in sock_sendmsg
i40e: fix use-after-free in i40e_aqc_add_filters()
ASoC: meson: g12a-toacodec: Validate written enum values
ASoC: meson: g12a-tohdmitx: Validate written enum values
ASoC: meson: g12a-toacodec: Fix event generation
ASoC: meson: g12a-tohdmitx: Fix event generation for S/PDIF mux
i40e: Restore VF MSI-X state during PCI reset
igc: Fix hicredit calculation
net/qla3xxx: fix potential memleak in ql_alloc_buffer_queues
net/smc: fix invalid link access in dumping SMC-R connections
octeontx2-af: Always configure NIX TX link credits based on max frame size
octeontx2-af: Re-enable MAC TX in otx2_stop processing
asix: Add check for usbnet_get_endpoints
net: ravb: Wait for operating mode to be applied
bnxt_en: Remove mis-applied code from bnxt_cfg_ntp_filters()
net: Implement missing SO_TIMESTAMPING_NEW cmsg support
selftests: secretmem: floor the memory size to the multiple of page_size
cpu/SMT: Create topology_smt_thread_allowed()
cpu/SMT: Make SMT control more robust against enumeration failures
srcu: Fix callbacks acceleration mishandling
bpf, x64: Fix tailcall infinite loop
bpf, x86: Simplify the parsing logic of structure parameters
bpf, x86: save/restore regs with BPF_DW size
net: Declare MSG_SPLICE_PAGES internal sendmsg() flag
udp: Convert udp_sendpage() to use MSG_SPLICE_PAGES
splice, net: Add a splice_eof op to file-ops and socket-ops
ipv4, ipv6: Use splice_eof() to flush
udp: introduce udp->udp_flags
udp: move udp->no_check6_tx to udp->udp_flags
udp: move udp->no_check6_rx to udp->udp_flags
udp: move udp->gro_enabled to udp->udp_flags
udp: move udp->accept_udp_{l4|fraglist} to udp->udp_flags
udp: lockless UDP_ENCAP_L2TPINUDP / UDP_GRO
udp: annotate data-races around udp->encap_type
wifi: iwlwifi: yoyo: swap cdb and jacket bits values
arm64: dts: qcom: sdm845: align RPMh regulator nodes with bindings
arm64: dts: qcom: sdm845: Fix PSCI power domain names
fbdev: imsttfb: Release framebuffer and dealloc cmap on error path
fbdev: imsttfb: fix double free in probe()
bpf: decouple prune and jump points
bpf: remove unnecessary prune and jump points
bpf: Remove unused insn_cnt argument from visit_[func_call_]insn()
bpf: clean up visit_insn()'s instruction processing
bpf: Support new 32bit offset jmp instruction
bpf: handle ldimm64 properly in check_cfg()
bpf: fix precision backtracking instruction iteration
blk-mq: make sure active queue usage is held for bio_integrity_prep()
net/mlx5: Increase size of irq name buffer
s390/mm: add missing arch_set_page_dat() call to vmem_crst_alloc()
s390/cpumf: support user space events for counting
f2fs: clean up i_compress_flag and i_compress_level usage
f2fs: convert to use bitmap API
f2fs: assign default compression level
f2fs: set the default compress_level on ioctl
selftests: mptcp: fix fastclose with csum failure
selftests: mptcp: set FAILING_LINKS in run_tests
media: camss: sm8250: Virtual channels for CSID
media: qcom: camss: Fix set CSI2_RX_CFG1_VC_MODE when VC is greater than 3
ext4: convert move_extent_per_page() to use folios
khugepage: replace try_to_release_page() with filemap_release_folio()
memory-failure: convert truncate_error_page() to use folio
mm: merge folio_has_private()/filemap_release_folio() call pairs
mm, netfs, fscache: stop read optimisation when folio removed from pagecache
filemap: add a per-mapping stable writes flag
block: update the stable_writes flag in bdev_add
smb: client: fix missing mode bits for SMB symlinks
net: dpaa2-eth: rearrange variable in dpaa2_eth_get_ethtool_stats
dpaa2-eth: recycle the RX buffer only after all processing done
ethtool: don't propagate EOPNOTSUPP from dumps
bpf, sockmap: af_unix stream sockets need to hold ref for pair sock
firmware: arm_scmi: Fix frequency truncation by promoting multiplier type
ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7
genirq/affinity: Remove the 'firstvec' parameter from irq_build_affinity_masks
genirq/affinity: Pass affinity managed mask array to irq_build_affinity_masks
genirq/affinity: Don't pass irq_affinity_desc array to irq_build_affinity_masks
genirq/affinity: Rename irq_build_affinity_masks as group_cpus_evenly
genirq/affinity: Move group_cpus_evenly() into lib/
lib/group_cpus.c: avoid acquiring cpu hotplug lock in group_cpus_evenly
mm/memory_hotplug: add missing mem_hotplug_lock
mm/memory_hotplug: fix error handling in add_memory_resource()
net: sched: call tcf_ct_params_free to free params in tcf_ct_init
netfilter: flowtable: allow unidirectional rules
netfilter: flowtable: cache info of last offload
net/sched: act_ct: offload UDP NEW connections
net/sched: act_ct: Fix promotion of offloaded unreplied tuple
netfilter: flowtable: GC pushes back packets to classic path
net/sched: act_ct: Take per-cb reference to tcf_ct_flow_table
octeontx2-af: Fix pause frame configuration
octeontx2-af: Support variable number of lmacs
btrfs: fix qgroup_free_reserved_data int overflow
btrfs: mark the len field in struct btrfs_ordered_sum as unsigned
ring-buffer: Fix 32-bit rb_time_read() race with rb_time_cmpxchg()
firewire: ohci: suppress unexpected system reboot in AMD Ryzen machines and ASM108x/VT630x PCIe cards
x86/kprobes: fix incorrect return address calculation in kprobe_emulate_call_indirect
i2c: core: Fix atomic xfer check for non-preempt config
mm: fix unmap_mapping_range high bits shift bug
drm/amdgpu: skip gpu_info fw loading on navi12
drm/amd/display: add nv12 bounding box
mmc: meson-mx-sdhc: Fix initialization frozen issue
mmc: rpmb: fixes pause retune on all RPMB partitions.
mmc: core: Cancel delayed work before releasing host
mmc: sdhci-sprd: Fix eMMC init failure after hw reset
genirq/affinity: Only build SMP-only helper functions on SMP kernels
f2fs: compress: fix to assign compress_level for lz4 correctly
net/sched: act_ct: additional checks for outdated flows
net/sched: act_ct: Always fill offloading tuple iifidx
bpf: Fix a verifier bug due to incorrect branch offset comparison with cpu=v4
bpf: syzkaller found null ptr deref in unix_bpf proto add
media: qcom: camss: Comment CSID dt_id field
smb3: Replace smb2pdu 1-element arrays with flex-arrays
Revert "interconnect: qcom: sm8250: Enable sync_state"
Linux 6.1.72
Change-Id: Id00eb2ae1159d4d5fa0ef914e672c5669cbf5b0a
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit d91bdd96b55cc3ce98d883a60f133713821b80a6 ]
The SMT control mechanism got added as speculation attack vector
mitigation. The implemented logic relies on the primary thread mask to
be set up properly.
This turns out to be an issue with XEN/PV guests because their CPU hotplug
mechanics do not enumerate APICs and therefore the mask is never correctly
populated.
This went unnoticed so far because by chance XEN/PV ends up with
smp_num_siblings == 2. So smt_hotplug_control stays at its default value
CPU_SMT_ENABLED and the primary thread mask is never evaluated in the
context of CPU hotplug.
This stopped "working" with the upcoming overhaul of the topology
evaluation which legitimately provides a fake topology for XEN/PV. That
sets smp_num_siblings to 1, which causes the core CPU hot-plug core to
refuse to bring up the APs.
This happens because smt_hotplug_control is set to CPU_SMT_NOT_SUPPORTED
which causes cpu_smt_allowed() to evaluate the unpopulated primary thread
mask with the conclusion that all non-boot CPUs are not valid to be
plugged.
Make cpu_smt_allowed() more robust and take CPU_SMT_NOT_SUPPORTED and
CPU_SMT_NOT_IMPLEMENTED into account. Rename it to cpu_bootable() while at
it as that makes it more clear what the function is about.
The primary mask issue on x86 XEN/PV needs to be addressed separately as
there are users outside of the CPU hotplug code too.
Fixes: 05736e4ac1 ("cpu/hotplug: Provide knobs to control SMT")
Reported-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085112.149440843@linutronix.de
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 38253464bc821d6de6bba81bb1412ebb36f6cbd1 ]
Some architectures allows partial SMT states, i.e. when not all SMT threads
are brought online.
To support that, add an architecture helper which checks whether a given
CPU is allowed to be brought online depending on how many SMT threads are
currently enabled. Since this is only applicable to architecture supporting
partial SMT, only these architectures should select the new configuration
variable CONFIG_SMT_NUM_THREADS_DYNAMIC. For the other architectures, not
supporting the partial SMT states, there is no need to define
topology_cpu_smt_allowed(), the generic code assumed that all the threads
are allowed or only the primary ones.
Call the helper from cpu_smt_enable(), and cpu_smt_allowed() when SMT is
enabled, to check if the particular thread should be onlined. Notably,
also call it from cpu_smt_disable() if CPU_SMT_ENABLED, to allow
offlining some threads to move from a higher to lower number of threads
online.
[ ldufour: Slightly reword the commit's description ]
[ ldufour: Introduce CONFIG_SMT_NUM_THREADS_DYNAMIC ]
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20230705145143.40545-7-ldufour@linux.ibm.com
Stable-dep-of: d91bdd96b55c ("cpu/SMT: Make SMT control more robust against enumeration failures")
Signed-off-by: Sasha Levin <sashal@kernel.org>
This reverts commit 75b5016ce3 which is
commit 5c0930ccaad5a74d74e8b18b648c5eb21ed2fe94 upstream.
It breaks the Android kernel abi and can be brought back in the future
in an abi-safe way if it is really needed.
Bug: 161946584
Change-Id: Ifdc5474de6e7488eb24959df1697b61b2e3e7880
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
[ Upstream commit 5c0930ccaad5a74d74e8b18b648c5eb21ed2fe94 ]
2b8272ff4a70 ("cpu/hotplug: Prevent self deadlock on CPU hot-unplug")
solved the straight forward CPU hotplug deadlock vs. the scheduler
bandwidth timer. Yu discovered a more involved variant where a task which
has a bandwidth timer started on the outgoing CPU holds a lock and then
gets throttled. If the lock required by one of the CPU hotplug callbacks
the hotplug operation deadlocks because the unthrottling timer event is not
handled on the dying CPU and can only be recovered once the control CPU
reaches the hotplug state which pulls the pending hrtimers from the dead
CPU.
Solve this by pushing the hrtimers away from the dying CPU in the dying
callbacks. Nothing can queue a hrtimer on the dying CPU at that point because
all other CPUs spin in stop_machine() with interrupts disabled and once the
operation is finished the CPU is marked offline.
Reported-by: Yu Liao <liaoyu15@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Liu Tie <liutie4@huawei.com>
Link: https://lore.kernel.org/r/87a5rphara.ffs@tglx
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 38685e2a0476127db766f81b1c06019ddc4c9ffa ]
If a system has isolated CPUs via the "isolcpus=" command line parameter,
then an attempt to offline the last housekeeping CPU will result in a
WARN_ON() when rebuilding the scheduler domains and a subsequent panic due
to and unhandled empty CPU mas in partition_sched_domains_locked().
cpuset_hotplug_workfn()
rebuild_sched_domains_locked()
ndoms = generate_sched_domains(&doms, &attr);
cpumask_and(doms[0], top_cpuset.effective_cpus, housekeeping_cpumask(HK_FLAG_DOMAIN));
Thus results in an empty CPU mask which triggers the warning and then the
subsequent crash:
WARNING: CPU: 4 PID: 80 at kernel/sched/topology.c:2366 build_sched_domains+0x120c/0x1408
Call trace:
build_sched_domains+0x120c/0x1408
partition_sched_domains_locked+0x234/0x880
rebuild_sched_domains_locked+0x37c/0x798
rebuild_sched_domains+0x30/0x58
cpuset_hotplug_workfn+0x2a8/0x930
Unable to handle kernel paging request at virtual address fffe80027ab37080
partition_sched_domains_locked+0x318/0x880
rebuild_sched_domains_locked+0x37c/0x798
Aside of the resulting crash, it does not make any sense to offline the last
last housekeeping CPU.
Prevent this by masking out the non-housekeeping CPUs when selecting a
target CPU for initiating the CPU unplug operation via the work queue.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/202310171709530660462@zte.com.cn
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 2b8272ff4a70b866106ae13c36be7ecbef5d5da2 upstream.
Xiongfeng reported and debugged a self deadlock of the task which initiates
and controls a CPU hot-unplug operation vs. the CFS bandwidth timer.
CPU1 CPU2
T1 sets cfs_quota
starts hrtimer cfs_bandwidth 'period_timer'
T1 is migrated to CPU2
T1 initiates offlining of CPU1
Hotplug operation starts
...
'period_timer' expires and is re-enqueued on CPU1
...
take_cpu_down()
CPU1 shuts down and does not handle timers
anymore. They have to be migrated in the
post dead hotplug steps by the control task.
T1 runs the post dead offline operation
T1 is scheduled out
T1 waits for 'period_timer' to expire
T1 waits there forever if it is scheduled out before it can execute the hrtimer
offline callback hrtimers_dead_cpu().
Cure this by delegating the hotplug control operation to a worker thread on
an online CPU. This takes the initiating user space task, which might be
affected by the bandwidth timer, completely out of the picture.
Reported-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Yu Liao <liaoyu15@huawei.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/lkml/8e785777-03aa-99e1-d20e-e956f5685be6@huawei.com
Link: https://lore.kernel.org/r/87h6oqdq0i.ffs@tglx
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 6f855b39e4602b6b42a8e5cbcfefb8a1b8b5f0be ]
The DYING/STARTING callbacks are not expected to fail. However, as reported
by Derek, buggy drivers such as tboot are still free to return errors
within those sections, which halts the hot(un)plug and leaves the CPU in an
unrecoverable state.
As there is no rollback possible, only log the failures and proceed with
the following steps.
This restores the hotplug behaviour prior to commit 453e410851
("cpu/hotplug: Add cpuhp_invoke_callback_range()")
Fixes: 453e410851 ("cpu/hotplug: Add cpuhp_invoke_callback_range()")
Reported-by: Derek Dolney <z23@posteo.net>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Derek Dolney <z23@posteo.net>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215867
Link: https://lore.kernel.org/r/20220927101259.1149636-1-vdonnefort@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 64ea6e44f85b9b75925ebe1ba0e6e8430cc4e06f ]
Writing the current state back in hotplug/target calls cpu_down()
which will set cpu dying even when it isn't and then nothing will
ever clear it. A stress test that reads values and writes them back
for all cpu device files in sysfs will trigger the BUG() in
select_fallback_rq once all cpus are marked as dying.
kernel/cpu.c::target_store()
...
if (st->state < target)
ret = cpu_up(dev->id, target);
else
ret = cpu_down(dev->id, target);
cpu_down() -> cpu_set_state()
bool bringup = st->state < target;
...
if (cpu_dying(cpu) != !bringup)
set_cpu_dying(cpu, !bringup);
Fix this by letting state==target fall through in the target_store()
conditional. Also make sure st->target == target in that case.
Fixes: 757c989b99 ("cpu/hotplug: Make target state writeable")
Signed-off-by: Phil Auld <pauld@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Link: https://lore.kernel.org/r/20221117162329.3164999-2-pauld@redhat.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
This is the Intel version of a confidential computing solution called
Trust Domain Extensions (TDX). This series adds support to run the
kernel as part of a TDX guest. It provides similar guest protections to
AMD's SEV-SNP like guest memory and register state encryption, memory
integrity protection and a lot more.
Design-wise, it differs from AMD's solution considerably: it uses
a software module which runs in a special CPU mode called (Secure
Arbitration Mode) SEAM. As the name suggests, this module serves as sort
of an arbiter which the confidential guest calls for services it needs
during its lifetime.
Just like AMD's SNP set, this series reworks and streamlines certain
parts of x86 arch code so that this feature can be properly accomodated.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmKLbisACgkQEsHwGGHe
VUqZLg/7B55iygCwzz0W/KLcXL2cISatUpzGbFs1XTbE9DMz06BPkOsEjF2k8ckv
kfZjgqhSx3GvUI80gK0Tn2M2DfIj3nKuNSXd1pfextP7AxEf68FFJsQz1Ju7bHpT
pZaG+g8IK4+mnEHEKTCO9ANg/Zw8yqJLdtsCaCNE9SUGUfQ6m/ujTEfsambXDHNm
khyCAgpIGSOt51/4apoR9ebyrNCaeVbDawpIPjTy+iyFRc/WyaLFV9CQ8klw4gbw
r/90x2JYxvAf0/z/ifT9Wa+TnYiQ0d4VjFbfr0iJ4GcPn5L3EIoIKPE8vPGMpoSX
fLSzoNmAOT3ja57ytUUQ3o0edoRUIPEdixOebf9qWvE/aj7W37YRzrlJ8Ej/x9Jy
HcI4WZF6Dr1bh6FnI/xX2eVZRzLOL4j9gNyPCwIbvgr1NjDqQnxU7nhxVMmQhJrs
IdiEcP5WYerLKfka/uF//QfWUg5mDBgFa1/3xK57Z3j0iKWmgjaPpR0SWlOKjj8G
tr0gGN9ejikZTqXKGsHn8fv/R3bjXvbVD8z0IEcx+MIrRmZPnX2QBlg7UA1AXV5n
HoVwPFdH1QAtjZq1MRcL4hTOjz3FkS68rg7ZH0f2GWJAzWmEGytBIhECRnN/PFFq
VwRB4dCCt0bzqRxkiH5lzdgR+xqRe61juQQsMzg+Flv/trpXDqM=
=ac9K
-----END PGP SIGNATURE-----
Merge tag 'x86_tdx_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull Intel TDX support from Borislav Petkov:
"Intel Trust Domain Extensions (TDX) support.
This is the Intel version of a confidential computing solution called
Trust Domain Extensions (TDX). This series adds support to run the
kernel as part of a TDX guest. It provides similar guest protections
to AMD's SEV-SNP like guest memory and register state encryption,
memory integrity protection and a lot more.
Design-wise, it differs from AMD's solution considerably: it uses a
software module which runs in a special CPU mode called (Secure
Arbitration Mode) SEAM. As the name suggests, this module serves as
sort of an arbiter which the confidential guest calls for services it
needs during its lifetime.
Just like AMD's SNP set, this series reworks and streamlines certain
parts of x86 arch code so that this feature can be properly
accomodated"
* tag 'x86_tdx_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
x86/tdx: Fix RETs in TDX asm
x86/tdx: Annotate a noreturn function
x86/mm: Fix spacing within memory encryption features message
x86/kaslr: Fix build warning in KASLR code in boot stub
Documentation/x86: Document TDX kernel architecture
ACPICA: Avoid cache flush inside virtual machines
x86/tdx/ioapic: Add shared bit for IOAPIC base address
x86/mm: Make DMA memory shared for TD guest
x86/mm/cpa: Add support for TDX shared memory
x86/tdx: Make pages shared in ioremap()
x86/topology: Disable CPU online/offline control for TDX guests
x86/boot: Avoid #VE during boot for TDX platforms
x86/boot: Set CR0.NE early and keep it set during the boot
x86/acpi/x86/boot: Add multiprocessor wake-up support
x86/boot: Add a trampoline for booting APs via firmware handoff
x86/tdx: Wire up KVM hypercalls
x86/tdx: Port I/O: Add early boot support
x86/tdx: Port I/O: Add runtime hypercalls
x86/boot: Port I/O: Add decompression-time support for TDX
x86/boot: Port I/O: Allow to hook up alternative helpers
...
Rather than waiting until a CPU is first brought online, do the
initialisation of the cpuhp_cpu_state structure for each CPU during the
__init phase. This saves a (small) amount of non-__init memory and
avoids potential confusion about when the cpuhp_cpu_state struct is
valid.
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220411152233.474129-3-steven.price@arm.com
Currently the setting of the 'cpu' member of struct cpuhp_cpu_state in
cpuhp_create() is too late as it is used earlier in _cpu_up().
If kzalloc_node() in __smpboot_create_thread() fails then the rollback will
be done with st->cpu==0 causing CPU0 to be erroneously set to be dying,
causing the scheduler to get mightily confused and throw its toys out of
the pram.
However the cpu number is actually available directly, so simply remove
the 'cpu' member and avoid the problem in the first place.
Fixes: 2ea46c6fc9 ("cpumask/hotplug: Fix cpu_dying() state tracking")
Signed-off-by: Steven Price <steven.price@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20220411152233.474129-2-steven.price@arm.com
Unlike regular VMs, TDX guests use the firmware hand-off wakeup method
to wake up the APs during the boot process. This wakeup model uses a
mailbox to communicate with firmware to bring up the APs. As per the
design, this mailbox can only be used once for the given AP, which means
after the APs are booted, the same mailbox cannot be used to
offline/online the given AP. More details about this requirement can be
found in Intel TDX Virtual Firmware Design Guide, sec titled "AP
initialization in OS" and in sec titled "Hotplug Device".
Since the architecture does not support any method of offlining the
CPUs, disable CPU hotplug support in the kernel.
Since this hotplug disable feature can be re-used by other VM guests,
add a new CC attribute CC_ATTR_HOTPLUG_DISABLED and use it to disable
the hotplug support.
Attempt to offline CPU will fail with -EOPNOTSUPP.
Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20220405232939.73860-25-kirill.shutemov@linux.intel.com
- Cleanups for SCHED_DEADLINE
- Tracing updates/fixes
- CPU Accounting fixes
- First wave of changes to optimize the overhead of the scheduler build,
from the fast-headers tree - including placeholder *_api.h headers for
later header split-ups.
- Preempt-dynamic using static_branch() for ARM64
- Isolation housekeeping mask rework; preperatory for further changes
- NUMA-balancing: deal with CPU-less nodes
- NUMA-balancing: tune systems that have multiple LLC cache domains per node (eg. AMD)
- Updates to RSEQ UAPI in preparation for glibc usage
- Lots of RSEQ/selftests, for same
- Add Suren as PSI co-maintainer
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmI5rg8RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hGrw/+M3QOk6fH7G48wjlNnBvcOife6ls+Ni4k
ixOAcF4JKoixO8HieU5vv0A7yf/83tAa6fpeXeMf1hkCGc0NSlmLtuIux+WOmoAL
LzCyDEYfiP8KnVh0A1Tui/lK0+AkGo21O6ADhQE2gh8o2LpslOHQMzvtyekSzeeb
mVxMYQN+QH0m518xdO2D8IQv9ctOYK0eGjmkqdNfntOlytypPZHeNel/tCzwklP/
dElJUjNiSKDlUgTBPtL3DfpoLOI/0mHF2p6NEXvNyULxSOqJTu8pv9Z2ADb2kKo1
0D56iXBDngMi9MHIJLgvzsA8gKzHLFSuPbpODDqkTZCa28vaMB9NYGhJ643NtEie
IXTJEvF1rmNkcLcZlZxo0yjL0fjvPkczjw4Vj27gbrUQeEBfb4mfuI4BRmij63Ep
qEkgQTJhduCqqrQP1rVyhwWZRk1JNcVug+F6N42qWW3fg1xhj0YSrLai2c9nPez6
3Zt98H8YGS1Z/JQomSw48iGXVqfTp/ETI7uU7jqHK8QcjzQ4lFK5H4GZpwuqGBZi
NJJ1l97XMEas+rPHiwMEN7Z1DVhzJLCp8omEj12QU+tGLofxxwAuuOVat3CQWLRk
f80Oya3TLEgd22hGIKDRmHa22vdWnNQyS0S15wJotawBzQf+n3auS9Q3/rh979+t
ES/qvlGxTIs=
=Z8uT
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- Cleanups for SCHED_DEADLINE
- Tracing updates/fixes
- CPU Accounting fixes
- First wave of changes to optimize the overhead of the scheduler
build, from the fast-headers tree - including placeholder *_api.h
headers for later header split-ups.
- Preempt-dynamic using static_branch() for ARM64
- Isolation housekeeping mask rework; preperatory for further changes
- NUMA-balancing: deal with CPU-less nodes
- NUMA-balancing: tune systems that have multiple LLC cache domains per
node (eg. AMD)
- Updates to RSEQ UAPI in preparation for glibc usage
- Lots of RSEQ/selftests, for same
- Add Suren as PSI co-maintainer
* tag 'sched-core-2022-03-22' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (81 commits)
sched/headers: ARM needs asm/paravirt_api_clock.h too
sched/numa: Fix boot crash on arm64 systems
headers/prep: Fix header to build standalone: <linux/psi.h>
sched/headers: Only include <linux/entry-common.h> when CONFIG_GENERIC_ENTRY=y
cgroup: Fix suspicious rcu_dereference_check() usage warning
sched/preempt: Tell about PREEMPT_DYNAMIC on kernel headers
sched/topology: Remove redundant variable and fix incorrect type in build_sched_domains
sched/deadline,rt: Remove unused parameter from pick_next_[rt|dl]_entity()
sched/deadline,rt: Remove unused functions for !CONFIG_SMP
sched/deadline: Use __node_2_[pdl|dle]() and rb_first_cached() consistently
sched/deadline: Merge dl_task_can_attach() and dl_cpu_busy()
sched/deadline: Move bandwidth mgmt and reclaim functions into sched class source file
sched/deadline: Remove unused def_dl_bandwidth
sched/tracing: Report TASK_RTLOCK_WAIT tasks as TASK_UNINTERRUPTIBLE
sched/tracing: Don't re-read p->state when emitting sched_switch event
sched/rt: Plug rt_mutex_setprio() vs push_rt_task() race
sched/cpuacct: Remove redundant RCU read lock
sched/cpuacct: Optimize away RCU read lock
sched/cpuacct: Fix charge percpu cpuusage
sched/headers: Reorganize, clean up and optimize kernel/sched/sched.h dependencies
...
For the irq randomness fast pool, rather than having to use expensive
atomics, which were visibly the most expensive thing in the entire irq
handler, simply take care of the extreme edge case of resetting count to
zero in the cpuhp online handler, just after workqueues have been
reenabled. This simplifies the code a bit and lets us use vanilla
variables rather than atomics, and performance should be improved.
As well, very early on when the CPU comes up, while interrupts are still
disabled, we clear out the per-cpu crng and its batches, so that it
always starts with fresh randomness.
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Sultan Alsawaf <sultan@kerneltoast.com>
Cc: Dominik Brodowski <linux@dominikbrodowski.net>
Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Refer to housekeeping APIs using single feature types instead of flags.
This prevents from passing multiple isolation features at once to
housekeeping interfaces, which soon won't be possible anymore as each
isolation features will have their own cpumask.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Phil Auld <pauld@redhat.com>
Link: https://lore.kernel.org/r/20220207155910.527133-5-frederic@kernel.org
To hot unplug a CPU, the idle task on that CPU calls a few layers of C
code before finally leaving the kernel. When KASAN is in use, poisoned
shadow is left around for each of the active stack frames, and when
shadow call stacks are in use. When shadow call stacks (SCS) are in use
the task's saved SCS SP is left pointing at an arbitrary point within
the task's shadow call stack.
When a CPU is offlined than onlined back into the kernel, this stale
state can adversely affect execution. Stale KASAN shadow can alias new
stackframes and result in bogus KASAN warnings. A stale SCS SP is
effectively a memory leak, and prevents a portion of the shadow call
stack being used. Across a number of hotplug cycles the idle task's
entire shadow call stack can become unusable.
We previously fixed the KASAN issue in commit:
e1b77c9298 ("sched/kasan: remove stale KASAN poison after hotplug")
... by removing any stale KASAN stack poison immediately prior to
onlining a CPU.
Subsequently in commit:
f1a0a376ca ("sched/core: Initialize the idle task with preemption disabled")
... the refactoring left the KASAN and SCS cleanup in one-time idle
thread initialization code rather than something invoked prior to each
CPU being onlined, breaking both as above.
We fixed SCS (but not KASAN) in commit:
63acd42c0d ("sched/scs: Reset the shadow stack when idle_task_exit")
... but as this runs in the context of the idle task being offlined it's
potentially fragile.
To fix these consistently and more robustly, reset the SCS SP and KASAN
shadow of a CPU's idle task immediately before we online that CPU in
bringup_cpu(). This ensures the idle task always has a consistent state
when it is running, and removes the need to so so when exiting an idle
task.
Whenever any thread is created, dup_task_struct() will give the task a
stack which is free of KASAN shadow, and initialize the task's SCS SP,
so there's no need to specially initialize either for idle thread within
init_idle(), as this was only necessary to handle hotplug cycles.
I've tested this on arm64 with:
* gcc 11.1.0, defconfig +KASAN_INLINE, KASAN_STACK
* clang 12.0.0, defconfig +KASAN_INLINE, KASAN_STACK, SHADOW_CALL_STACK
... offlining and onlining CPUS with:
| while true; do
| for C in /sys/devices/system/cpu/cpu*/online; do
| echo 0 > $C;
| echo 1 > $C;
| done
| done
Fixes: f1a0a376ca ("sched/core: Initialize the idle task with preemption disabled")
Reported-by: Qian Cai <quic_qiancai@quicinc.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Qian Cai <quic_qiancai@quicinc.com>
Link: https://lore.kernel.org/lkml/20211115113310.35693-1-mark.rutland@arm.com/
CPU hotplug callbacks can fail and cause a rollback to the previous
state. These failures are silent and therefore hard to debug.
Add pr_debug() to the up and down paths which provide information about the
error code, the CPU and the failed state. The debug printks can be enabled
via kernel command line or sysfs.
[ tglx: Adopt to current mainline, massage printk and changelog ]
Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Qais Yousef <qais.yousef@arm.com>
Link: https://lore.kernel.org/r/20210409055316.1709-1-dongli.zhang@oracle.com
Use DEVICE_ATTR_*() helper instead of plain DEVICE_ATTR,
which makes the code a bit shorter and easier to read.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210527141105.2312-1-yuehaibing@huawei.com
kernel/cpu.c:57: warning: cannot understand function prototype: 'struct cpuhp_cpu_state '
kernel/cpu.c:115: warning: cannot understand function prototype: 'struct cpuhp_step '
kernel/cpu.c:146: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
* cpuhp_invoke_callback _ Invoke the callbacks for a given state
kernel/cpu.c:75: warning: Function parameter or member 'fail' not described in 'cpuhp_cpu_state'
kernel/cpu.c:75: warning: Function parameter or member 'cpu' not described in 'cpuhp_cpu_state'
kernel/cpu.c:75: warning: Function parameter or member 'node' not described in 'cpuhp_cpu_state'
kernel/cpu.c:75: warning: Function parameter or member 'last' not described in 'cpuhp_cpu_state'
kernel/cpu.c:130: warning: Function parameter or member 'list' not described in 'cpuhp_step'
kernel/cpu.c:130: warning: Function parameter or member 'multi_instance' not described in 'cpuhp_step'
kernel/cpu.c:158: warning: No description found for return value of 'cpuhp_invoke_callback'
kernel/cpu.c:1188: warning: No description found for return value of 'cpu_device_down'
kernel/cpu.c:1400: warning: No description found for return value of 'cpu_device_up'
kernel/cpu.c:1425: warning: No description found for return value of 'bringup_hibernate_cpu'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210809223825.24512-1-rdunlap@infradead.org
Fixes the following W=1 kernel build warning(s):
kernel/cpu.c:1949: warning: Function parameter or member
'name' not described in '__cpuhp_setup_state_cpuslocked'
Signed-off-by: Baokun Li <libaokun1@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20210605063003.681049-1-libaokun1@huawei.com
cpusets delegate the hotplug work to a workqueue to prevent a lock order
inversion vs. the CPU hotplug lock. The work is not flushed before the
hotplug operation returns which creates user visible inconsistent state.
Prevent this by flushing the work after dropping CPU hotplug lock and
before releasing the outer mutex which serializes the CPU hotplug related
sysfs interface operations.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmDbJLATHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoaHxD/9yQI2tNgGcjdQBtg8z9Vvs/j7kG5A0
l8R9zAWSqocXcWMsV8nqjiFcNatHIfuR5Q6vyysDftH2mqKRr5MHa3/m+Ow/VPGu
nqBDAla9M9wQIp8f2yjHHCBB5TjUDUD9htRkhgriCRr31pvhZHTAQKcOS+atLfxG
4FB0/n2hfcJJZtgWWlie4JAWBo8xQn5yOicH/OBs2bEnJSGoaSJrKmCiRCeBUjAr
fYUCMYKxqWfwwJzWQPefG/eVnSaE6fSHAgRFvFFdOOvaSwyFIly9LpP8H5MPOE/p
0rsMQAOwnr9dA+amPp3RNS3ZEMvE/FHamHVcjXaXa76HiunCw+SH866O9z+Ta3gm
P/SvqxusJAVhvyKQk6FsC/hYv9QX1xPsNVQRZ7oZb4ftBj/n964TO6kiCEzuSX0c
MJBeAfDbz20hDQol+NO8/JeKJXU2utLXQ+oyjzIkHtCka9PFt0fGlgfUzh3VtbOC
h/vlhtrr2DsWy9NbalkzjeyaIoetbQPsANs0iGBBFZ3hNyNmYlxB5ujTcUP+//vm
NEOBMmYOsxe3FnWPKnRnJmEEMJn23JuxoswZhHenSHvOHBs8eJkEhte0EK5lE1Fo
/JalFzKBb1WQqKkfDrde3WkevOHrkvBgs84PJDFJcl7uZzsLYCLhPLZC7JNNnrO0
IrWrpFOX9qj1rg==
=Lzoh
-----END PGP SIGNATURE-----
Merge tag 'smp-urgent-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull CPU hotplug fix from Thomas Gleixner:
"A fix for the CPU hotplug and cpusets interaction:
cpusets delegate the hotplug work to a workqueue to prevent a lock
order inversion vs. the CPU hotplug lock. The work is not flushed
before the hotplug operation returns which creates user visible
inconsistent state. Prevent this by flushing the work after dropping
CPU hotplug lock and before releasing the outer mutex which serializes
the CPU hotplug related sysfs interface operations"
* tag 'smp-urgent-2021-06-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu/hotplug: Cure the cpusets trainwreck
Alexey and Joshua tried to solve a cpusets related hotplug problem which is
user space visible and results in unexpected behaviour for some time after
a CPU has been plugged in and the corresponding uevent was delivered.
cpusets delegate the hotplug work (rebuilding cpumasks etc.) to a
workqueue. This is done because the cpusets code has already a lock
nesting of cgroups_mutex -> cpu_hotplug_lock. A synchronous callback or
waiting for the work to finish with cpu_hotplug_lock held can and will
deadlock because that results in the reverse lock order.
As a consequence the uevent can be delivered before cpusets have consistent
state which means that a user space invocation of sched_setaffinity() to
move a task to the plugged CPU fails up to the point where the scheduled
work has been processed.
The same is true for CPU unplug, but that does not create user observable
failure (yet).
It's still inconsistent to claim that an operation is finished before it
actually is and that's the real issue at hand. uevents just make it
reliably observable.
Obviously the problem should be fixed in cpusets/cgroups, but untangling
that is pretty much impossible because according to the changelog of the
commit which introduced this 8 years ago:
3a5a6d0c2b03("cpuset: don't nest cgroup_mutex inside get_online_cpus()")
the lock order cgroups_mutex -> cpu_hotplug_lock is a design decision and
the whole code is built around that.
So bite the bullet and invoke the relevant cpuset function, which waits for
the work to finish, in _cpu_up/down() after dropping cpu_hotplug_lock and
only when tasks are not frozen by suspend/hibernate because that would
obviously wait forever.
Waiting there with cpu_add_remove_lock, which is protecting the present
and possible CPU maps, held is not a problem at all because neither work
queues nor cpusets/cgroups have any lockchains related to that lock.
Waiting in the hotplug machinery is not problematic either because there
are already state callbacks which wait for hardware queues to drain. It
makes the operations slightly slower, but hotplug is slow anyway.
This ensures that state is consistent before returning from a hotplug
up/down operation. It's still inconsistent during the operation, but that's
a different story.
Add a large comment which explains why this is done and why this is not a
dump ground for the hack of the day to work around half thought out locking
schemes. Document also the implications vs. hotplug operations and
serialization or the lack of it.
Thanks to Alexy and Joshua for analyzing why this temporary
sched_setaffinity() failure happened.
Fixes: 3a5a6d0c2b03("cpuset: don't nest cgroup_mutex inside get_online_cpus()")
Reported-by: Alexey Klimov <aklimov@redhat.com>
Reported-by: Joshua Baker <jobaker@redhat.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Alexey Klimov <aklimov@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/87tuowcnv3.ffs@nanos.tec.linutronix.de
It is unnecessary to invoke per_cpu_ptr() everytime to access cpuhp_state.
Use the available pointer instead.
Signed-off-by: Yuan ZhaoXiong <yuanzhaoxiong@baidu.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lore.kernel.org/r/1621776690-13264-1-git-send-email-yuanzhaoxiong@baidu.com
Vincent reported that for states with a NULL startup/teardown function
we do not call cpuhp_invoke_callback() (because there is none) and as
such we'll not update the cpu_dying() state.
The stale cpu_dying() can eventually lead to triggering BUG().
Rectify this by updating cpu_dying() in the exact same places the
hotplug machinery tracks its directional state, namely
cpuhp_set_state() and cpuhp_reset_state().
Reported-by: Vincent Donnefort <vincent.donnefort@arm.com>
Suggested-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Donnefort <vincent.donnefort@arm.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/YH7r+AoQEReSvxBI@hirez.programming.kicks-ass.net
Introduce a cpumask that indicates (for each CPU) what direction the
CPU hotplug is currently going. Notably, it tracks rollbacks. Eg. when
an up fails and we do a roll-back down, it will accurately reflect the
direction.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20210310150109.151441252@infradead.org
Factorizing and unifying cpuhp callback range invocations, especially for
the hotunplug path, where two different ways of decrementing were used. The
first one, decrements before the callback is called:
cpuhp_thread_fun()
state = st->state;
st->state--;
cpuhp_invoke_callback(state);
The second one, after:
take_down_cpu()|cpuhp_down_callbacks()
cpuhp_invoke_callback(st->state);
st->state--;
This is problematic for rolling back the steps in case of error, as
depending on the decrement, the rollback will start from N or N-1. It also
makes tracing inconsistent, between steps run in the cpuhp thread and
the others.
Additionally, avoid useless cpuhp_thread_fun() loops by skipping empty
steps.
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210216103506.416286-4-vincent.donnefort@arm.com
The atomic states (between CPUHP_AP_IDLE_DEAD and CPUHP_AP_ONLINE) are
triggered by the CPUHP_BRINGUP_CPU step. If the latter fails, no atomic
state can be rolled back.
DEAD callbacks too can't fail and disallow recovery. As a consequence,
during hotunplug, the fail injection interface should prohibit all states
from CPUHP_BRINGUP_CPU to CPUHP_ONLINE.
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210216103506.416286-3-vincent.donnefort@arm.com
Currently, the only way of resetting the fail injection is to trigger a
hotplug, hotunplug or both. This is rather annoying for testing
and, as the default value for this file is -1, it seems pretty natural to
let a user write it.
Signed-off-by: Vincent Donnefort <vincent.donnefort@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210216103506.416286-2-vincent.donnefort@arm.com
This commit adds a lockdep_is_cpus_held() function to verify that the
proper locks are held and that various operations are running in the
correct context.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Paul E. McKenney <paulmck@kernel.org>
Cc: Josh Triplett <josh@joshtriplett.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
- migrate_disable/enable() support which originates from the RT tree and
is now a prerequisite for the new preemptible kmap_local() API which aims
to replace kmap_atomic().
- A fair amount of topology and NUMA related improvements
- Improvements for the frequency invariant calculations
- Enhanced robustness for the global CPU priority tracking and decision
making
- The usual small fixes and enhancements all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/XwK4THHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoX28D/9cVrvziSQGfBfuQWnUiw8iOIq1QBa2
Me+Tvenhfrlt7xU6rbP9ciFu7eTN+fS06m5uQPGI+t22WuJmHzbmw1bJVXfkvYfI
/QoU+Hg7DkDAn1p7ZKXh0dRkV0nI9ixxSHl0E+Zf1ATBxCUMV2SO85flg6z/4qJq
3VWUye0dmR7/bhtkIjv5rwce9v2JB2g1AbgYXYTW9lHVoUdGoMSdiZAF4tGyHLnx
sJ6DMqQ+k+dmPyYO0z5MTzjW/fXit4n9w2e3z9TvRH/uBu58WSW1RBmQYX6aHBAg
dhT9F4lvTs6lJY23x5RSFWDOv6xAvKF5a0xfb8UZcyH5EoLYrPRvm42a0BbjdeRa
u0z7LbwIlKA+RFdZzFZWz8UvvO0ljyMjmiuqZnZ5dY9Cd80LSBuxrWeQYG0qg6lR
Y2povhhCepEG+q8AXIe2YjHKWKKC1s/l/VY3CNnCzcd21JPQjQ4Z5eWGmHif5IED
CntaeFFhZadR3w02tkX35zFmY3w4soKKrbI4EKWrQwd+cIEQlOSY7dEPI/b5BbYj
MWAb3P4EG9N77AWTNmbhK4nN0brEYb+rBbCA+5dtNBVhHTxAC7OTWElJOC2O66FI
e06dREjvwYtOkRUkUguWwErbIai2gJ2MH0VILV3hHoh64oRk7jjM8PZYnjQkdptQ
Gsq0rJW5iiu/OQ==
=Oz1V
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Thomas Gleixner:
- migrate_disable/enable() support which originates from the RT tree
and is now a prerequisite for the new preemptible kmap_local() API
which aims to replace kmap_atomic().
- A fair amount of topology and NUMA related improvements
- Improvements for the frequency invariant calculations
- Enhanced robustness for the global CPU priority tracking and decision
making
- The usual small fixes and enhancements all over the place
* tag 'sched-core-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (61 commits)
sched/fair: Trivial correction of the newidle_balance() comment
sched/fair: Clear SMT siblings after determining the core is not idle
sched: Fix kernel-doc markup
x86: Print ratio freq_max/freq_base used in frequency invariance calculations
x86, sched: Use midpoint of max_boost and max_P for frequency invariance on AMD EPYC
x86, sched: Calculate frequency invariance for AMD systems
irq_work: Optimize irq_work_single()
smp: Cleanup smp_call_function*()
irq_work: Cleanup
sched: Limit the amount of NUMA imbalance that can exist at fork time
sched/numa: Allow a floating imbalance between NUMA nodes
sched: Avoid unnecessary calculation of load imbalance at clone time
sched/numa: Rename nr_running and break out the magic number
sched: Make migrate_disable/enable() independent of RT
sched/topology: Condition EAS enablement on FIE support
arm64: Rebuild sched domains on invariance status changes
sched/topology,schedutil: Wrap sched domains rebuild
sched/uclamp: Allow to reset a task uclamp constraint value
sched/core: Fix typos in comments
Documentation: scheduler: fix information on arch SD flags, sched_domain and sched_debug
...
powerpc/64s keeps a counter in the mm which counts bits set in
mm_cpumask as well as other things. This means it can't use generic code
to clear bits out of the mask and doesn't adjust the arch specific
counter.
Add an arch override that allows powerpc/64s to use
clear_tasks_mm_cpumask().
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20201126102530.691335-4-npiggin@gmail.com
With the new mechanism which kicks tasks off the outgoing CPU at the end of
schedule() the situation on an outgoing CPU right before the stopper thread
brings it down completely is:
- All user tasks and all unbound kernel threads have either been migrated
away or are not running and the next wakeup will move them to a online CPU.
- All per CPU kernel threads, except cpu hotplug thread and the stopper
thread have either been unbound or parked by the responsible CPU hotplug
callback.
That means that at the last step before the stopper thread is invoked the
cpu hotplug thread is the last legitimate running task on the outgoing
CPU.
Add a final wait step right before the stopper thread is kicked which
ensures that any still running tasks on the way to park or on the way to
kick themself of the CPU are either sleeping or gone.
This allows to remove the migrate_tasks() crutch in sched_cpu_dying(). If
sched_cpu_dying() detects that there is still another running task aside of
the stopper thread then it will explode with the appropriate fireworks.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Link: https://lkml.kernel.org/r/20201023102346.547163969@infradead.org
- Optimize the task wakeup CPU selection logic, to improve scalability and
reduce wakeup latency spikes
- PELT enhancements
- CFS bandwidth handling fixes
- Optimize the wakeup path by remove rq->wake_list and replacing it with ->ttwu_pending
- Optimize IPI cross-calls by making flush_smp_call_function_queue()
process sync callbacks first.
- Misc fixes and enhancements.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAl7WPL0RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1i0ThAAs0fbvMzNJ5SWFdwOQ4KZIlA+Im4dEBMK
sx/XAZqa/hGxvkm1jS0RDVQl1V1JdOlru5UF4C42ctnAFGtBBHDriO5rn9oCpkSw
DAoLc4eZqzldIXN6sDZ0xMtC14Eu15UAP40OyM4qxBc4GqGlOnnale6Vhn+n+pLQ
jAuZlMJIkmmzeA6cuvtultevrVh+QUqJ/5oNUANlTER4OM48umjr5rNTOb8cIW53
9K3vbS3nmqSvJuIyqfRFoMy5GFM6+Jj2+nYuq8aTuYLEtF4qqWzttS3wBzC9699g
XYRKILkCK8ZP4RB5Ps/DIKj6maZGZoICBxTJEkIgXujJlxlKKTD3mddk+0LBXChW
Ijznanxn67akoAFpqi/Dnkhieg7cUrE9v1OPRS2J0xy550synSPFcSgOK3viizga
iqbjptY4scUWkCwHQNjABerxc7MWzrwbIrRt+uNvCaqJLweUh0GnEcV5va8R+4I8
K20XwOdrzuPLo5KdDWA/BKOEv49guHZDvoykzlwMlR3gFfwHS/UsjzmSQIWK3gZG
9OMn8ibO2f1OzhRcEpDLFzp7IIj6NJmPFVSW+7xHyL9/vTveUx3ZXPLteb2qxJVP
BYPsduVx8YeGRBlLya0PJriB23ajQr0lnHWo15g0uR9o/0Ds1ephcymiF3QJmCaA
To3CyIuQN8M=
=C2OP
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2020-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"The changes in this cycle are:
- Optimize the task wakeup CPU selection logic, to improve
scalability and reduce wakeup latency spikes
- PELT enhancements
- CFS bandwidth handling fixes
- Optimize the wakeup path by remove rq->wake_list and replacing it
with ->ttwu_pending
- Optimize IPI cross-calls by making flush_smp_call_function_queue()
process sync callbacks first.
- Misc fixes and enhancements"
* tag 'sched-core-2020-06-02' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
irq_work: Define irq_work_single() on !CONFIG_IRQ_WORK too
sched/headers: Split out open-coded prototypes into kernel/sched/smp.h
sched: Replace rq::wake_list
sched: Add rq::ttwu_pending
irq_work, smp: Allow irq_work on call_single_queue
smp: Optimize send_call_function_single_ipi()
smp: Move irq_work_run() out of flush_smp_call_function_queue()
smp: Optimize flush_smp_call_function_queue()
sched: Fix smp_call_function_single_async() usage for ILB
sched/core: Offload wakee task activation if it the wakee is descheduling
sched/core: Optimize ttwu() spinning on p->on_cpu
sched: Defend cfs and rt bandwidth quota against overflow
sched/cpuacct: Fix charge cpuacct.usage_sys
sched/fair: Replace zero-length array with flexible-array
sched/pelt: Sync util/runnable_sum with PELT window when propagating
sched/cpuacct: Use __this_cpu_add() instead of this_cpu_ptr()
sched/fair: Optimize enqueue_task_fair()
sched: Make scheduler_ipi inline
sched: Clean up scheduler_ipi()
sched/core: Simplify sched_init()
...
The refactored function is no longer required as the codepaths that call
freeze_secondary_cpus() are all suspend/resume related now.
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Link: https://lkml.kernel.org/r/20200430114004.17477-2-qais.yousef@arm.com
The single user could have called freeze_secondary_cpus() directly.
Since this function was a source of confusion, remove it as it's
just a pointless wrapper.
While at it, rename enable_nonboot_cpus() to thaw_secondary_cpus() to
preserve the naming symmetry.
Done automatically via:
git grep -l enable_nonboot_cpus | xargs sed -i 's/enable_nonboot_cpus/thaw_secondary_cpus/g'
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Link: https://lkml.kernel.org/r/20200430114004.17477-1-qais.yousef@arm.com
In the CPU-offline process, it calls mmdrop() after idle entry and the
subsequent call to cpuhp_report_idle_dead(). Once execution passes the
call to rcu_report_dead(), RCU is ignoring the CPU, which results in
lockdep complaining when mmdrop() uses RCU from either memcg or
debugobjects below.
Fix it by cleaning up the active_mm state from BP instead. Every arch
which has CONFIG_HOTPLUG_CPU should have already called idle_task_exit()
from AP. The only exception is parisc because it switches them to
&init_mm unconditionally (see smp_boot_one_cpu() and smp_cpu_init()),
but the patch will still work there because it calls mmgrab(&init_mm) in
smp_cpu_init() and then should call mmdrop(&init_mm) in finish_cpu().
WARNING: suspicious RCU usage
-----------------------------
kernel/workqueue.c:710 RCU or wq_pool_mutex should be held!
other info that might help us debug this:
RCU used illegally from offline CPU!
Call Trace:
dump_stack+0xf4/0x164 (unreliable)
lockdep_rcu_suspicious+0x140/0x164
get_work_pool+0x110/0x150
__queue_work+0x1bc/0xca0
queue_work_on+0x114/0x120
css_release+0x9c/0xc0
percpu_ref_put_many+0x204/0x230
free_pcp_prepare+0x264/0x570
free_unref_page+0x38/0xf0
__mmdrop+0x21c/0x2c0
idle_task_exit+0x170/0x1b0
pnv_smp_cpu_kill_self+0x38/0x2e0
cpu_die+0x48/0x64
arch_cpu_idle_dead+0x30/0x50
do_idle+0x2f4/0x470
cpu_startup_entry+0x38/0x40
start_secondary+0x7a8/0xa80
start_secondary_resume+0x10/0x14
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Link: https://lkml.kernel.org/r/20200401214033.8448-1-cai@lca.pw
- Support for locked CSD objects in smp_call_function_single_async()
which allows to simplify callsites in the scheduler core and MIPS
- Treewide consolidation of CPU hotplug functions which ensures the
consistency between the sysfs interface and kernel state. The low level
functions cpu_up/down() are now confined to the core code and not
longer accessible from random code.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl6B9VQTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYodCyD/0WFYAe7LkOfNjkbLa0IeuyLjF9rnCi
ilcSXMLpaVwwoQvm7MopwkXUDdmEIyeJ0B641j3mC3AKCRap4+O36H2IEg2byrj7
twOvQNCfxpVVmCCD11FTH9aQa74LEB6AikTgjevhrRWj6eHsal7c2Ak26AzCgrt+
0eEkOAOWJbLAlbIiPdHlCZ3TMldcs3gg+lRSYd5QCGQVkZFnwpXzyOvpyJEUGGbb
R/JuvwJoLhRMiYAJDILoQQQg/J07ODuivse/R8PWaH2djkn+2NyRGrD794PhyyOg
QoTU0ZrYD3Z48ACXv+N3jLM7wXMcFzjYtr1vW1E3O/YGA7GVIC6XHGbMQ7tEihY0
ajtwq8DcnpKtuouviYnf7NuKgqdmJXkaZjz3Gms6n8nLXqqSVwuQELWV2CXkxNe6
9kgnnKK+xXMOGI4TUhN8bejvkXqRCmKMeQJcWyf+7RA9UIhAJw5o7WGo8gXfQWUx
tazCqDy/inYjqGxckW615fhi2zHfemlYTbSzIGOuMB1TEPKFcrgYAii/VMsYHQVZ
5amkYUXGQ5brlCOzOn38lzp5OkALBnFzD7xgvOcQgWT3ynVpdqADfBytXiEEHh4J
KSkSgSSRcS58397nIxnDcJgJouHLvAWYyPZ4UC6mfynuQIic31qMHGVqwdbEKMY3
4M5dGgqIfOBgYw==
=jwCg
-----END PGP SIGNATURE-----
Merge tag 'smp-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core SMP updates from Thomas Gleixner:
"CPU (hotplug) updates:
- Support for locked CSD objects in smp_call_function_single_async()
which allows to simplify callsites in the scheduler core and MIPS
- Treewide consolidation of CPU hotplug functions which ensures the
consistency between the sysfs interface and kernel state. The low
level functions cpu_up/down() are now confined to the core code and
not longer accessible from random code"
* tag 'smp-core-2020-03-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (22 commits)
cpu/hotplug: Ignore pm_wakeup_pending() for disable_nonboot_cpus()
cpu/hotplug: Hide cpu_up/down()
cpu/hotplug: Move bringup of secondary CPUs out of smp_init()
torture: Replace cpu_up/down() with add/remove_cpu()
firmware: psci: Replace cpu_up/down() with add/remove_cpu()
xen/cpuhotplug: Replace cpu_up/down() with device_online/offline()
parisc: Replace cpu_up/down() with add/remove_cpu()
sparc: Replace cpu_up/down() with add/remove_cpu()
powerpc: Replace cpu_up/down() with add/remove_cpu()
x86/smp: Replace cpu_up/down() with add/remove_cpu()
arm64: hibernate: Use bringup_hibernate_cpu()
cpu/hotplug: Provide bringup_hibernate_cpu()
arm64: Use reboot_cpu instead of hardconding it to 0
arm64: Don't use disable_nonboot_cpus()
ARM: Use reboot_cpu instead of hardcoding it to 0
ARM: Don't use disable_nonboot_cpus()
ia64: Replace cpu_down() with smp_shutdown_nonboot_cpus()
cpu/hotplug: Create a new function to shutdown nonboot cpus
cpu/hotplug: Add new {add,remove}_cpu() functions
sched/core: Remove rq.hrtick_csd_pending
...
A recent change to freeze_secondary_cpus() which added an early abort if a
wakeup is pending missed the fact that the function is also invoked for
shutdown, reboot and kexec via disable_nonboot_cpus().
In case of disable_nonboot_cpus() the wakeup event needs to be ignored as
the purpose is to terminate the currently running kernel.
Add a 'suspend' argument which is only set when the freeze is in context of
a suspend operation. If not set then an eventually pending wakeup event is
ignored.
Fixes: a66d955e91 ("cpu/hotplug: Abort disabling secondary CPUs if wakeup is pending")
Reported-by: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Pavankumar Kondeti <pkondeti@codeaurora.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/874kuaxdiz.fsf@nanos.tec.linutronix.de
Use separate functions for the device core to bring a CPU up and down.
Users outside the device core must use add/remove_cpu() which will take
care of extra housekeeping work like keeping sysfs in sync.
Make cpu_up/down() static and replace the extra layer of indirection.
[ tglx: Removed the extra wrapper functions and adjusted function names ]
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200323135110.30522-18-qais.yousef@arm.com
This is the last direct user of cpu_up() before it can become an internal
implementation detail of the cpu subsystem.
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200323135110.30522-17-qais.yousef@arm.com
arm64 uses cpu_up() in the resume from hibernation code to ensure that the
CPU on which the system hibernated is online. Provide a core function for
this.
[ tglx: Split out from the combo arm64 patch ]
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lkml.kernel.org/r/20200323135110.30522-9-qais.yousef@arm.com
This function will be used later in machine_shutdown() for some
architectures.
disable_nonboot_cpus() is not safe to use when doing machine_down(),
because it relies on freeze_secondary_cpus() which in turn is a
suspend/resume related freeze and could abort if the logic detects any
pending activities that can prevent finishing the offlining process.
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20200323135110.30522-3-qais.yousef@arm.com
The new functions use device_{online,offline}() which are userspace safe.
This is in preparation to move cpu_{up, down} kernel users to use a safer
interface that is not racy with userspace.
Suggested-by: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lkml.kernel.org/r/20200323135110.30522-2-qais.yousef@arm.com
As preparation for replacing the embedded rwsem, give percpu-rwsem its
own lockdep_map.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Waiman Long <longman@redhat.com>
Tested-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20200131151539.927625541@infradead.org
Pull scheduler updates from Ingo Molnar:
"These were the main changes in this cycle:
- More -rt motivated separation of CONFIG_PREEMPT and
CONFIG_PREEMPTION.
- Add more low level scheduling topology sanity checks and warnings
to filter out nonsensical topologies that break scheduling.
- Extend uclamp constraints to influence wakeup CPU placement
- Make the RT scheduler more aware of asymmetric topologies and CPU
capacities, via uclamp metrics, if CONFIG_UCLAMP_TASK=y
- Make idle CPU selection more consistent
- Various fixes, smaller cleanups, updates and enhancements - please
see the git log for details"
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (58 commits)
sched/fair: Define sched_idle_cpu() only for SMP configurations
sched/topology: Assert non-NUMA topology masks don't (partially) overlap
idle: fix spelling mistake "iterrupts" -> "interrupts"
sched/fair: Remove redundant call to cpufreq_update_util()
sched/psi: create /proc/pressure and /proc/pressure/{io|memory|cpu} only when psi enabled
sched/fair: Fix sgc->{min,max}_capacity calculation for SD_OVERLAP
sched/fair: calculate delta runnable load only when it's needed
sched/cputime: move rq parameter in irqtime_account_process_tick
stop_machine: Make stop_cpus() static
sched/debug: Reset watchdog on all CPUs while processing sysrq-t
sched/core: Fix size of rq::uclamp initialization
sched/uclamp: Fix a bug in propagating uclamp value in new cgroups
sched/fair: Load balance aggressively for SCHED_IDLE CPUs
sched/fair : Improve update_sd_pick_busiest for spare capacity case
watchdog: Remove soft_lockup_hrtimer_cnt and related code
sched/rt: Make RT capacity-aware
sched/fair: Make EAS wakeup placement consider uclamp restrictions
sched/fair: Make task_fits_capacity() consider uclamp restrictions
sched/uclamp: Rename uclamp_util_with() into uclamp_rq_util_with()
sched/uclamp: Make uclamp util helpers use and return UL values
...
When CONFIG_SYSFS is disabled, but CONFIG_HOTPLUG_SMT is enabled,
the kernel fails to link:
arch/x86/power/cpu.o: In function `hibernate_resume_nonboot_cpu_disable':
(.text+0x38d): undefined reference to `cpuhp_smt_enable'
arch/x86/power/hibernate.o: In function `arch_resume_nosmt':
hibernate.c:(.text+0x291): undefined reference to `cpuhp_smt_enable'
hibernate.c:(.text+0x29c): undefined reference to `cpuhp_smt_disable'
Move the exported functions out of the #ifdef section into its
own with the correct conditions.
The patch that caused this is marked for stable backports, so
this one may need to be backported as well.
Fixes: ec527c3180 ("x86/power: Fix 'nosmt' vs hibernation triple fault during resume")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Jiri Kosina <jkosina@suse.cz>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20191210195614.786555-1-arnd@arndb.de