78dfbf2f8f
42714 Commits
1dd387668d
tracing: Zero the pipe cpumask on alloc to avoid spurious -EBUSY
commit 3d07fa1dd19035eb0b13ae6697efd5caa9033e74 upstream.
The pipe cpumask used to serialize opens between the main and percpu
trace pipes is not zeroed or initialized. This can result in spurious
-EBUSY returns if underlying memory is not fully zeroed. This has been
observed by immediate failure to read the main trace_pipe file on an
otherwise newly booted and idle system:
  # cat /sys/kernel/debug/tracing/trace_pipe
  cat: /sys/kernel/debug/tracing/trace_pipe: Device or resource busy
Zero the allocation of pipe_cpumask to avoid the problem.
Link: https://lore.kernel.org/linux-trace-kernel/20230831125500.986862-1-bfoster@redhat.com
Cc: stable@vger.kernel.org
Fixes: c2489bb7e6be ("tracing: Introduce pipe_cpumask to avoid race on trace_pipes")
Reviewed-by: Zheng Yejian <zhengyejian1@huawei.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
c96c67991a
bpf: Fix issue in verifying allow_ptr_leaks
commit d75e30dddf73449bc2d10bb8e2f1a2c446bc67a2 upstream.
After we converted the capabilities of our networking-bpf program from
cap_sys_admin to cap_net_admin+cap_bpf, our networking-bpf program
failed to start: it failed the bpf verifier with the error log
"R3 pointer comparison prohibited".
A simple reproducer is as follows:
  #include <linux/bpf.h>
  #include <linux/if_ether.h>
  #include <linux/ip.h>
  #include <linux/pkt_cls.h>
  #include <bpf/bpf_helpers.h>
  SEC("cls-ingress")
  int ingress(struct __sk_buff *skb)
  {
          struct iphdr *iph = (void *)(long)skb->data + sizeof(struct ethhdr);
          /* Packet bounds check: compares two packet pointers. */
          if ((long)(iph + 1) > (long)skb->data_end)
                  return TC_ACT_STOLEN;
          return TC_ACT_OK;
  }
Per discussion with Yonghong and Alexei [1], comparison of two packet
pointers is not a pointer leak. This patch fixes it.
Our local kernel is 6.1.y and we expect this fix to be backported to
6.1.y, so stable is CCed.
[1]. https://lore.kernel.org/bpf/CAADnVQ+Nmspr7Si+pxWn8zkE7hX-7s93ugwC+94aXSy4uQ9vBg@mail.gmail.com/
Suggested-by: Yonghong Song <yonghong.song@linux.dev>
Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230823020703.3790-2-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
fea9dd8653
cpu/hotplug: Prevent self deadlock on CPU hot-unplug
commit 2b8272ff4a70b866106ae13c36be7ecbef5d5da2 upstream.
Xiongfeng reported and debugged a self deadlock of the task which initiates
and controls a CPU hot-unplug operation vs. the CFS bandwidth timer.
  CPU1                                      CPU2
  T1 sets cfs_quota
     starts hrtimer cfs_bandwidth 'period_timer'
  T1 is migrated to CPU2
                                            T1 initiates offlining of CPU1
  Hotplug operation starts
    ...
  'period_timer' expires and is
  re-enqueued on CPU1
    ...
  take_cpu_down()
    CPU1 shuts down and does not handle
    timers anymore. They have to be
    migrated in the post dead hotplug
    steps by the control task.
                                            T1 runs the post dead offline operation
                                            T1 is scheduled out
                                            T1 waits for 'period_timer' to expire
T1 waits there forever if it is scheduled out before it can execute the
hrtimer offline callback hrtimers_dead_cpu().
Cure this by delegating the hotplug control operation to a worker thread on
an online CPU. This takes the initiating user space task, which might be
affected by the bandwidth timer, completely out of the picture.
Reported-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Yu Liao <liaoyu15@huawei.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/lkml/8e785777-03aa-99e1-d20e-e956f5685be6@huawei.com
Link: https://lore.kernel.org/r/87h6oqdq0i.ffs@tglx
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2344b13976
printk: ringbuffer: Fix truncating buffer size min_t cast
commit 53e9e33ede37a247d926db5e4a9e56b55204e66c upstream.
If an output buffer size exceeded U16_MAX, the min_t(u16, ...) cast in
copy_data() was causing writes to truncate. This manifested as output
bytes being skipped, seen as %NUL bytes in pstore dumps when the available
record size was larger than 65536. Fix the cast to no longer truncate
the calculation.
Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Reported-by: Vijay Balakrishna <vijayb@linux.microsoft.com>
Link: https://lore.kernel.org/lkml/d8bb1ec7-a4c5-43a2-9de0-9643a70b899f@linux.microsoft.com/
Fixes:
74c85396bd
tracing: Fix race issue between cpu buffer write and swap
[ Upstream commit 3163f635b20e9e1fb4659e74f47918c9dddfe64e ]
A warning was triggered in rb_end_commit() at this line:
if (RB_WARN_ON(cpu_buffer, !local_read(&cpu_buffer->committing)))
WARNING: CPU: 0 PID: 139 at kernel/trace/ring_buffer.c:3142
rb_commit+0x402/0x4a0
Call Trace:
ring_buffer_unlock_commit+0x42/0x250
trace_buffer_unlock_commit_regs+0x3b/0x250
trace_event_buffer_commit+0xe5/0x440
trace_event_buffer_reserve+0x11c/0x150
trace_event_raw_event_sched_switch+0x23c/0x2c0
__traceiter_sched_switch+0x59/0x80
__schedule+0x72b/0x1580
schedule+0x92/0x120
worker_thread+0xa0/0x6f0
This is caused by a race between writing an event into a cpu buffer and
swapping the cpu buffer through the per_cpu/cpu0/snapshot file:
  Write on CPU 0              Swap buffer by per_cpu/cpu0/snapshot on CPU 1
  --------                    --------
                              tracing_snapshot_write()
                                [...]
  ring_buffer_lock_reserve()
    cpu_buffer = buffer->buffers[cpu]; // 1. Suppose it finds 'cpu_buffer_a';
    [...]
    rb_reserve_next_event()
      [...]
                              ring_buffer_swap_cpu()
                                if (local_read(&cpu_buffer_a->committing))
                                    goto out_dec;
                                if (local_read(&cpu_buffer_b->committing))
                                    goto out_dec;
                                buffer_a->buffers[cpu] = cpu_buffer_b;
                                buffer_b->buffers[cpu] = cpu_buffer_a;
                                // 2. cpu_buffer has been swapped here.
      rb_start_commit(cpu_buffer);
      if (unlikely(READ_ONCE(cpu_buffer->buffer)
          != buffer)) { // 3. This check passed because 'cpu_buffer->buffer'
        [...]           //    has not changed yet.
        return NULL;
      }
                                cpu_buffer_b->buffer = buffer_a;
                                cpu_buffer_a->buffer = buffer_b;
                                [...]
      // 4. Reserve event from 'cpu_buffer_a'.
  ring_buffer_unlock_commit()
    [...]
    cpu_buffer = buffer->buffers[cpu]; // 5. Now it finds 'cpu_buffer_b' !!!
    rb_commit(cpu_buffer)
      rb_end_commit()  // 6. WARN for the wrong 'committing' state !!!
Based on the above analysis, this can easily be reproduced with the following test case:
```bash
#!/bin/bash
dmesg -n 7
# Turn the WARN into a panic so the race is impossible to miss.
sysctl -w kernel.panic_on_warn=1
TR=/sys/kernel/tracing
echo 7 > ${TR}/buffer_size_kb
echo "sched:sched_switch" > ${TR}/set_event
# Three background loops, each repeatedly swapping the cpu0 snapshot buffer.
while true; do
    echo 1 > ${TR}/per_cpu/cpu0/snapshot
done &
while true; do
    echo 1 > ${TR}/per_cpu/cpu0/snapshot
done &
while true; do
    echo 1 > ${TR}/per_cpu/cpu0/snapshot
done &
```
To fix it, IIUC, we can use smp_call_function_single() to do the swap on
the target cpu where the buffer is located, so that the above race is
avoided.
Link: https://lore.kernel.org/linux-trace-kernel/20230831132739.4070878-1-zhengyejian1@huawei.com
Cc: <mhiramat@kernel.org>
Fixes:
fb34716c9e
tracing: Remove extra space at the end of hwlat_detector/mode
[ Upstream commit 2cf0dee989a8b2501929eaab29473b6b1fa11057 ]
A space is printed after each mode value, including the last one:
$ echo \"$(sudo cat /sys/kernel/tracing/hwlat_detector/mode)\"
"none [round-robin] per-cpu "
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Link: https://lore.kernel.org/linux-trace-kernel/20230825103432.7750-1-m.kobuk@ispras.ru
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes:
55a448e8d8
tick/rcu: Fix false positive "softirq work is pending" messages
[ Upstream commit 96c1fa04f089a7e977a44e4e8fdc92e81be20bef ]
In commit
848cd6f24a
cgroup:namespace: Remove unused cgroup_namespaces_init()
[ Upstream commit 82b90b6c5b38e457c7081d50dff11ecbafc1e61a ]
cgroup_namespaces_init() just returns 0. Therefore, there is no need to
call it during start_kernel. Just remove it.
Fixes:
8199a46af2
cgroup/cpuset: Inherit parent's load balance state in v2
[ Upstream commit c8c926200c55454101f072a4b16c9ff5b8c9e56f ]
Since commit
3108f7c788
PCI: Allow drivers to request exclusive config regions
[ Upstream commit 278294798ac9118412c9624a801d3f20f2279363 ]
PCI config space access from user space has traditionally been
unrestricted, with writes being an understood risk for device operation.
Unfortunately, device breakage or odd behavior from config writes lacks
indicators that can leave driver writers confused when evaluating
failures. This is especially true with the new PCIe Data Object Exchange
(DOE) mailbox protocol, where backdoor shenanigans from user space through
things such as vendor-defined protocols may affect device operation
without complete breakage.
A prior proposal restricted reads and writes completely.[1] Greg and Bjorn
pointed out that this proposal is flawed for a couple of reasons. First,
lspci should always be allowed and should not interfere with any device
operation. Second, setpci is a valuable tool that is sometimes necessary,
and it should not be completely restricted.[2] Finally, methods exist for
fully locking device access if required.
Even though access should not be restricted, it would be nice for driver
writers to be able to flag critical parts of the config space such that
interference from user space can be detected.
Introduce pci_request_config_region_exclusive() to mark exclusive config
regions. Such regions trigger a warning and kernel taint if accessed via
user space. Create pci_warn_once() to restrict the user from spamming the
log.
[1] https://lore.kernel.org/all/161663543465.1867664.5674061943008380442.stgit@dwillia2-desk3.amr.corp.intel.com/
[2] https://lore.kernel.org/all/YF8NGeGv9vYcMfTV@kroah.com/
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20220926215711.2893286-2-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Stable-dep-of: 5e70d0acf082 ("PCI: Add locking to RMW PCI Express Capability Register accessors")
Signed-off-by: Sasha Levin <sashal@kernel.org>
9ca08adb75
audit: fix possible soft lockup in __audit_inode_child()
[ Upstream commit b59bc6e37237e37eadf50cd5de369e913f524463 ]
Tracefs or debugfs may generate hundreds to thousands of PATH records,
and that many PATH records may cause a soft lockup.
For example:
1. CONFIG_KASAN=y && CONFIG_PREEMPTION=n
2. auditctl -a exit,always -S open -k key
3. sysctl -w kernel.watchdog_thresh=5
4. mkdir /sys/kernel/debug/tracing/instances/test
There may be a soft lockup as follows:
watchdog: BUG: soft lockup - CPU#45 stuck for 7s! [mkdir:15498]
Kernel panic - not syncing: softlockup: hung tasks
Call trace:
dump_backtrace+0x0/0x30c
show_stack+0x20/0x30
dump_stack+0x11c/0x174
panic+0x27c/0x494
watchdog_timer_fn+0x2bc/0x390
__run_hrtimer+0x148/0x4fc
__hrtimer_run_queues+0x154/0x210
hrtimer_interrupt+0x2c4/0x760
arch_timer_handler_phys+0x48/0x60
handle_percpu_devid_irq+0xe0/0x340
__handle_domain_irq+0xbc/0x130
gic_handle_irq+0x78/0x460
el1_irq+0xb8/0x140
__audit_inode_child+0x240/0x7bc
tracefs_create_file+0x1b8/0x2a0
trace_create_file+0x18/0x50
event_create_dir+0x204/0x30c
__trace_add_new_event+0xac/0x100
event_trace_add_tracer+0xa0/0x130
trace_array_create_dir+0x60/0x140
trace_array_create+0x1e0/0x370
instance_mkdir+0x90/0xd0
tracefs_syscall_mkdir+0x68/0xa0
vfs_mkdir+0x21c/0x34c
do_mkdirat+0x1b4/0x1d4
__arm64_sys_mkdirat+0x4c/0x60
el0_svc_common.constprop.0+0xa8/0x240
do_el0_svc+0x8c/0xc0
el0_svc+0x20/0x30
el0_sync_handler+0xb0/0xb4
el0_sync+0x160/0x180
Therefore, we add cond_resched() to __audit_inode_child() to fix it.
Fixes:
912310dd84
bpf: Fix an error in verifying a field in a union
[ Upstream commit 33937607efa050d9e237e0c4ac4ada02d961c466 ]
We are utilizing BPF LSM to monitor BPF operations within our container
environment. When we added support for raw_tracepoint, it hit the error
below.
; (const void *)attr->raw_tracepoint.name);
27: (79) r3 = *(u64 *)(r2 +0)
access beyond the end of member map_type (mend:4) in struct (anon) with off 0 size 8
It can be reproduced with the BPF program below.
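#include "vmlinux.h"   /* BTF-generated kernel types, e.g. via bpftool */
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
char LICENSE[] SEC("license") = "GPL";
SEC("lsm/bpf")
int BPF_PROG(bpf_audit, int cmd, union bpf_attr *attr, unsigned int size)
{
        switch (cmd) {
        case BPF_RAW_TRACEPOINT_OPEN:
                bpf_printk("raw_tracepoint is %s", attr->raw_tracepoint.name);
                break;
        default:
                break;
        }
        return 0;
}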
The reason is that when accessing a field in a union, such as bpf_attr,
if the field is located within a nested struct that is not the first
member of the union, it can result in incorrect field verification.
union bpf_attr {
        struct {
                __u32   map_type;       <<<< Actually it will find that field.
                __u32   key_size;
                __u32   value_size;
                ...
        };
        ...
        struct {
                __u64 name;             <<<< We want to verify this field.
                __u32 prog_fd;
        } raw_tracepoint;
};
Considering the potential deep nesting levels, finding a perfect
solution to address this issue has proven challenging. Therefore, I
propose a solution where we simply skip the verification process if the
field in question is located within a union.
Fixes:
780f072f4f
bpf: Clear the probe_addr for uprobe
[ Upstream commit 5125e757e62f6c1d5478db4c2b61a744060ddf3f ]
To avoid returning uninitialized or random values when querying the file
descriptor (fd) and accessing probe_addr, it is necessary to clear the
variable prior to its use.
Fixes:
5fce29ab20
sched/rt: Fix sysctl_sched_rr_timeslice intial value
[ Upstream commit c7fcb99877f9f542c918509b2801065adcaf46fa ]
There is a 10% rounding error in the initial value of the
sysctl_sched_rr_timeslice with CONFIG_HZ_300=y.
This was found with LTP test sched_rr_get_interval01:
sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed
sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns
sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90
sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed
sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns
sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90
This test compares the return value of sched_rr_get_interval() with the
sched_rr_timeslice_ms sysctl file and fails if they do not match.
The problem it found is the initial sysctl file value, which was computed as:
static int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE;
which works fine as long as MSEC_PER_SEC is a multiple of HZ; however, it
introduces a 10% rounding error for CONFIG_HZ_300:
(MSEC_PER_SEC / HZ) * (100 * HZ / 1000)
(1000 / 300) * (100 * 300 / 1000)
3 * 30 = 90
This can be easily fixed by reversing the order of the multiplication
and division. After this fix we get:
(MSEC_PER_SEC * (100 * HZ / 1000)) / HZ
(1000 * (100 * 300 / 1000)) / 300
(1000 * 30) / 300 = 100
Fixes:
e0322a255a
refscale: Fix uninitalized use of wait_queue_head_t
[ Upstream commit f5063e8948dad7f31adb007284a5d5038ae31bb8 ]
Running the refscale test occasionally crashes the kernel with the
following error:
[ 8569.952896] BUG: unable to handle page fault for address: ffffffffffffffe8
[ 8569.952900] #PF: supervisor read access in kernel mode
[ 8569.952902] #PF: error_code(0x0000) - not-present page
[ 8569.952904] PGD c4b048067 P4D c4b049067 PUD c4b04b067 PMD 0
[ 8569.952910] Oops: 0000 [#1] PREEMPT_RT SMP NOPTI
[ 8569.952916] Hardware name: Dell Inc. PowerEdge R750/0WMWCR, BIOS 1.2.4 05/28/2021
[ 8569.952917] RIP: 0010:prepare_to_wait_event+0x101/0x190
:
[ 8569.952940] Call Trace:
[ 8569.952941] <TASK>
[ 8569.952944] ref_scale_reader+0x380/0x4a0 [refscale]
[ 8569.952959] kthread+0x10e/0x130
[ 8569.952966] ret_from_fork+0x1f/0x30
[ 8569.952973] </TASK>
The likely cause is that init_waitqueue_head() is called after the call to
the torture_create_kthread() function that creates the ref_scale_reader
kthread. Although this init_waitqueue_head() call will very likely
complete before this kthread is created and starts running, it is
possible that the calling kthread will be delayed between the calls to
torture_create_kthread() and init_waitqueue_head(). In this case, the
new kthread will use the waitqueue head before it is properly initialized,
which is not good for the kernel's health and well-being.
The above crash happened here:
static inline void __add_wait_queue(...)
{
:
if (!(wq->flags & WQ_FLAG_PRIORITY)) <=== Crash here
The offset of flags from list_head entry in wait_queue_entry is
-0x18. If reader_tasks[i].wq.head.next is NULL because the allocated reader_task
structure is zero-initialized, the instruction will try to access address
0xffffffffffffffe8, which is exactly the fault address listed above.
This commit therefore invokes init_waitqueue_head() before creating
the kthread.
Fixes:
10f358cd4b
tracing: Introduce pipe_cpumask to avoid race on trace_pipes
[ Upstream commit c2489bb7e6be2e8cdced12c16c42fa128403ac03 ]
There is a race when the main trace_pipe and the per_cpu trace_pipes are
read concurrently via splice_read, which results in the data read out
differing from what was actually written.
As suggested by Steven:
> I believe we should add a ref count to trace_pipe and the per_cpu
> trace_pipes, where if they are opened, nothing else can read it.
>
> Opening trace_pipe locks all per_cpu ref counts, if any of them are
> open, then the trace_pipe open will fail (and releases any ref counts
> it had taken).
>
> Opening a per_cpu trace_pipe will up the ref count for just that
> CPU buffer. This will allow multiple tasks to read different per_cpu
> trace_pipe files, but will prevent the main trace_pipe file from
> being opened.
But because we only need to know whether a per_cpu trace_pipe is open or
not, using a cpumask instead of a ref count may be easier.
After this patch, users will find that:
 - The main trace_pipe can be opened by only one user, and while it is
   open, none of the per_cpu trace_pipes can be opened;
 - Per_cpu trace_pipes can be opened by multiple users, but each per_cpu
   trace_pipe can only be opened by one user. While any of them is open,
   the main trace_pipe cannot be opened.
Link: https://lore.kernel.org/linux-trace-kernel/20230818022645.1948314-1-zhengyejian1@huawei.com
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
473a55cfc1
kprobes: Prohibit probing on CFI preamble symbol
[ Upstream commit de02f2ac5d8cfb311f44f2bf144cc20002f1fbbd ]
Do not allow probing on symbols starting with "__cfi_" or "__pfx_",
because those are used for CFI and are not executed. Probing them would
break CFI.
Link: https://lore.kernel.org/all/168904024679.116016.18089228029322008512.stgit@devnote2/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
6e5f182128
ANDROID: signal: Add vendor hook for memory reap
Add a vendor hook to determine whether the memory of a process that
received SIGKILL can be reaped.
Partial cherry-pick of aosp/1724512 & aosp/2093626.
Bug: 232062955
Change-Id: I75072bd264df33caff67d083821ee6f33ca83af9
Signed-off-by: Tangquan Zheng <zhengtangquan@oppo.com>
f474c6446c
Merge 6.1.44 into android14-6.1-lts
Changes in 6.1.44 init: Provide arch_cpu_finalize_init() x86/cpu: Switch to arch_cpu_finalize_init() ARM: cpu: Switch to arch_cpu_finalize_init() ia64/cpu: Switch to arch_cpu_finalize_init() loongarch/cpu: Switch to arch_cpu_finalize_init() m68k/cpu: Switch to arch_cpu_finalize_init() mips/cpu: Switch to arch_cpu_finalize_init() sh/cpu: Switch to arch_cpu_finalize_init() sparc/cpu: Switch to arch_cpu_finalize_init() um/cpu: Switch to arch_cpu_finalize_init() init: Remove check_bugs() leftovers init: Invoke arch_cpu_finalize_init() earlier init, x86: Move mem_encrypt_init() into arch_cpu_finalize_init() x86/init: Initialize signal frame size late x86/fpu: Remove cpuinfo argument from init functions x86/fpu: Mark init functions __init x86/fpu: Move FPU initialization into arch_cpu_finalize_init() x86/speculation: Add Gather Data Sampling mitigation x86/speculation: Add force option to GDS mitigation x86/speculation: Add Kconfig option for GDS KVM: Add GDS_NO support to KVM x86/mem_encrypt: Unbreak the AMD_MEM_ENCRYPT=n build x86/xen: Fix secondary processors' FPU initialization x86/mm: fix poking_init() for Xen PV guests x86/mm: Use mm_alloc() in poking_init() mm: Move mm_cachep initialization to mm_init() x86/mm: Initialize text poking earlier Documentation/x86: Fix backwards on/off logic about YMM support x86/bugs: Increase the x86 bugs vector size to two u32s x86/cpu, kvm: Add support for CPUID_80000021_EAX x86/srso: Add a Speculative RAS Overflow mitigation x86/srso: Add IBPB_BRTYPE support x86/srso: Add SRSO_NO support x86/srso: Add IBPB x86/srso: Add IBPB on VMEXIT x86/srso: Fix return thunks in generated code x86/srso: Add a forgotten NOENDBR annotation x86/srso: Tie SBPB bit setting to microcode patch detection xen/netback: Fix buffer overrun triggered by unusual packet x86: fix backwards merge of GDS/SRSO bit Linux 6.1.44 Change-Id: Ia40e37c806ae2a2daf2127415aa28d0151660667 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
cf0f262265
Revert "locking/rtmutex: Fix task->pi_waiters integrity"
This reverts commit
38b64945f1
Revert "ring-buffer: Fix wrong stat of cpu_buffer->read"
This reverts commit
7f81705800
Merge 6.1.43 into android14-6.1-lts
Changes in 6.1.43 netfilter: nf_tables: fix underflow in object reference counter netfilter: nf_tables: fix underflow in chain reference counter platform/x86/amd/pmf: Notify OS power slider update platform/x86/amd/pmf: reduce verbosity of apmf_get_system_params drm/amd/display: Keep PHY active for dp config ovl: fix null pointer dereference in ovl_permission() drm/amd: Move helper for dynamic speed switch check out of smu13 drm/amd: Align SMU11 SMU_MSG_OverridePcieParameters implementation with SMU13 jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint blk-mq: Fix stall due to recursive flush plug powerpc/pseries/vas: Hold mmap_mutex after mmap lock during window close KVM: s390: pv: fix index value of replaced ASCE io_uring: don't audit the capability check in io_uring_create() gpio: tps68470: Make tps68470_gpio_output() always set the initial value pwm: Add a stub for devm_pwmchip_add() gpio: mvebu: Make use of devm_pwmchip_add gpio: mvebu: fix irq domain leak btrfs: fix race between quota disable and relocation i2c: Delete error messages for failed memory allocations i2c: Improve size determinations i2c: nomadik: Remove unnecessary goto label i2c: nomadik: Use devm_clk_get_enabled() i2c: nomadik: Remove a useless call in the remove function MIPS: Loongson: Move arch cflags to MIPS top level Makefile MIPS: Loongson: Fix build error when make modules_install PCI/ASPM: Return 0 or -ETIMEDOUT from pcie_retrain_link() PCI/ASPM: Factor out pcie_wait_for_retrain() PCI/ASPM: Avoid link retraining race PCI: rockchip: Remove writes to unused registers PCI: rockchip: Fix window mapping and address translation for endpoint PCI: rockchip: Don't advertise MSI-X in PCIe capabilities drm/amd/display: add FB_DAMAGE_CLIPS support drm/amd/display: Check if link state is valid drm/amd/display: Rework context change check drm/amd/display: Enable new commit sequence only for DCN32x drm/amd/display: Copy DC context in the commit streams drm/amd/display: Include 
surface of unaffected streams drm/amd/display: Use min transition for all SubVP plane add/remove drm/amd/display: add ODM case when looking for first split pipe drm/amd/display: use low clocks for no plane configs drm/amd/display: fix unbounded requesting for high pixel rate modes on dcn315 drm/amd/display: add pixel rate based CRB allocation support drm/amd/display: fix dcn315 single stream crb allocation drm/amd/display: Update correct DCN314 register header drm/amd/display: Set minimum requirement for using PSR-SU on Rembrandt drm/amd/display: Set minimum requirement for using PSR-SU on Phoenix drm/ttm: Don't print error message if eviction was interrupted drm/ttm: Don't leak a resource on eviction error n_tty: Rename tail to old_tail in n_tty_read() tty: fix hang on tty device with no_room set drm/ttm: never consider pinned BOs for eviction&swap KVM: arm64: Condition HW AF updates on config option arm64: errata: Mitigate Ampere1 erratum AC03_CPU_38 at stage-2 mptcp: introduce 'sk' to replace 'sock->sk' in mptcp_listen() mptcp: do not rely on implicit state check in mptcp_listen() tracing/probes: Add symstr type for dynamic events tracing/probes: Fix to avoid double count of the string length on the array tracing: Allow synthetic events to pass around stacktraces Revert "tracing: Add "(fault)" name injection to kernel probes" tracing/probes: Fix to record 0-length data_loc in fetch_store_string*() if fails test_maple_tree: test modifications while iterating maple_tree: add __init and __exit to test module maple_tree: fix 32 bit mas_next testing drm/amd/display: Rework comments on dc file drm/amd/display: fix dc/core/dc.c kernel-doc drm/amd/display: Add FAMS validation before trying to use it drm/amd/display: update extended blank for dcn314 onwards drm/amd/display: Fix possible underflow for displays with large vblank drm/amd/display: Prevent vtotal from being set to 0 phy: phy-mtk-dp: Fix an error code in probe() phy: qcom-snps: correct struct qcom_snps_hsphy 
kerneldoc phy: qcom-snps-femto-v2: keep cfg_ahb_clk enabled during runtime suspend phy: qcom-snps-femto-v2: properly enable ref clock soundwire: qcom: update status correctly with mask media: staging: atomisp: select V4L2_FWNODE media: amphion: Fix firmware path to match linux-firmware i40e: Fix an NULL vs IS_ERR() bug for debugfs_create_dir() iavf: fix potential deadlock on allocation failure iavf: check for removal state before IAVF_FLAG_PF_COMMS_FAILED net: phy: marvell10g: fix 88x3310 power up net: hns3: fix the imp capability bit cannot exceed 32 bits issue net: hns3: fix wrong tc bandwidth weight data issue net: hns3: fix wrong bw weight of disabled tc issue vxlan: calculate correct header length for GPE vxlan: generalize vxlan_parse_gpe_hdr and remove unused args vxlan: fix GRO with VXLAN-GPE phy: hisilicon: Fix an out of bounds check in hisi_inno_phy_probe() atheros: fix return value check in atl1_tso() ethernet: atheros: fix return value check in atl1e_tso_csum() ipv6 addrconf: fix bug where deleting a mngtmpaddr can create a new temporary address tcp: Reduce chance of collisions in inet6_hashfn(). ice: Fix memory management in ice_ethtool_fdir.c bonding: reset bond's flags when down link is P2P device team: reset team's flags when down link is P2P device octeontx2-af: Removed unnecessary debug messages. 
octeontx2-af: Fix hash extraction enable configuration net: stmmac: Apply redundant write work around on 4.xx too platform/x86: msi-laptop: Fix rfkill out-of-sync on MSI Wind U100 x86/traps: Fix load_unaligned_zeropad() handling for shared TDX memory igc: Fix Kernel Panic during ndo_tx_timeout callback netfilter: nft_set_rbtree: fix overlap expiration walk netfilter: nf_tables: skip immediate deactivate in _PREPARE_ERROR netfilter: nf_tables: disallow rule addition to bound chain via NFTA_RULE_CHAIN_ID mm: suppress mm fault logging if fatal signal already pending net/sched: mqprio: refactor nlattr parsing to a separate function net/sched: mqprio: add extack to mqprio_parse_nlattr() net/sched: mqprio: Add length check for TCA_MQPRIO_{MAX/MIN}_RATE64 benet: fix return value check in be_lancer_xmit_workarounds() tipc: check return value of pskb_trim() tipc: stop tipc crypto on failure in tipc_node_create RDMA/mlx4: Make check for invalid flags stricter drm/msm/dpu: drop enum dpu_core_perf_data_bus_id drm/msm/adreno: Fix snapshot BINDLESS_DATA size RDMA/irdma: Add missing read barriers RDMA/irdma: Fix data race on CQP completion stats RDMA/irdma: Fix data race on CQP request done RDMA/mthca: Fix crash when polling CQ for shared QPs RDMA/bnxt_re: Prevent handling any completions after qp destroy drm/msm: Fix IS_ERR_OR_NULL() vs NULL check in a5xx_submit_in_rb() cxl/acpi: Fix a use-after-free in cxl_parse_cfmws() cxl/acpi: Return 'rc' instead of '0' in cxl_parse_cfmws() ASoC: fsl_spdif: Silence output on stop block: Fix a source code comment in include/uapi/linux/blkzoned.h smb3: do not set NTLMSSP_VERSION flag for negotiate not auth request drm/i915: Fix an error handling path in igt_write_huge() xenbus: check xen_domain in xenbus_probe_initcall dm raid: fix missing reconfig_mutex unlock in raid_ctr() error paths dm raid: clean up four equivalent goto tags in raid_ctr() dm raid: protect md_stop() with 'reconfig_mutex' drm/amd: Fix an error handling mistake in 
psp_sw_init() drm/amd/display: Unlock on error path in dm_handle_mst_sideband_msg_ready_event() RDMA/irdma: Fix op_type reporting in CQEs RDMA/irdma: Report correct WC error drm/msm: Switch idr_lock to spinlock drm/msm: Disallow submit with fence id 0 ublk_drv: move ublk_get_device_from_id into ublk_ctrl_uring_cmd ublk: fail to start device if queue setup is interrupted ublk: fail to recover device if queue setup is interrupted ata: pata_ns87415: mark ns87560_tf_read static ring-buffer: Fix wrong stat of cpu_buffer->read tracing: Fix warning in trace_buffered_event_disable() Revert "usb: gadget: tegra-xudc: Fix error check in tegra_xudc_powerdomain_init()" usb: gadget: call usb_gadget_check_config() to verify UDC capability USB: gadget: Fix the memory leak in raw_gadget driver usb: gadget: core: remove unbalanced mutex_unlock in usb_gadget_activate KVM: Grab a reference to KVM for VM and vCPU stats file descriptors KVM: VMX: Don't fudge CR0 and CR4 for restricted L2 guest KVM: x86: Disallow KVM_SET_SREGS{2} if incoming CR0 is invalid serial: qcom-geni: drop bogus runtime pm state update serial: 8250_dw: Preserve original value of DLF register serial: sifive: Fix sifive_serial_console_setup() section USB: serial: option: support Quectel EM060K_128 USB: serial: option: add Quectel EC200A module support USB: serial: simple: add Kaufmann RKS+CAN VCP USB: serial: simple: sort driver entries can: gs_usb: gs_can_close(): add missing set of CAN state to CAN_STATE_STOPPED usb: typec: Set port->pd before adding device for typec_port usb: typec: Iterate pds array when showing the pd list usb: typec: Use sysfs_emit_at when concatenating the string Revert "usb: dwc3: core: Enable AutoRetry feature in the controller" usb: dwc3: pci: skip BYT GPIO lookup table for hardwired phy usb: dwc3: don't reset device side if dwc3 was configured as host-only usb: misc: ehset: fix wrong if condition usb: ohci-at91: Fix the unhandle interrupt when resume USB: quirks: add quirk for Focusrite 
Scarlett usb: cdns3: fix incorrect calculation of ep_buf_size when more than one config usb: xhci-mtk: set the dma max_seg_size Revert "usb: xhci: tegra: Fix error check" Documentation: security-bugs.rst: update preferences when dealing with the linux-distros group Documentation: security-bugs.rst: clarify CVE handling staging: r8712: Fix memory leak in _r8712_init_xmit_priv() staging: ks7010: potential buffer overflow in ks_wlan_set_encode_ext() tty: n_gsm: fix UAF in gsm_cleanup_mux Revert "xhci: add quirk for host controllers that don't update endpoint DCS" ALSA: hda/realtek: Support ASUS G713PV laptop ALSA: hda/relatek: Enable Mute LED on HP 250 G8 hwmon: (k10temp) Enable AMD3255 Proc to show negative temperature hwmon: (nct7802) Fix for temp6 (PECI1) processed even if PECI1 disabled btrfs: account block group tree when calculating global reserve size btrfs: check if the transaction was aborted at btrfs_wait_for_commit() btrfs: check for commit error at btrfs_attach_transaction_barrier() x86/MCE/AMD: Decrement threshold_bank refcount when removing threshold blocks file: always lock position for FMODE_ATOMIC_POS nfsd: Remove incorrect check in nfsd4_validate_stateid ACPI/IORT: Remove erroneous id_count check in iort_node_get_rmr_info() tpm_tis: Explicitly check for error code irq-bcm6345-l1: Do not assume a fixed block to cpu mapping irqchip/gic-v4.1: Properly lock VPEs when doing a directLPI invalidation locking/rtmutex: Fix task->pi_waiters integrity proc/vmcore: fix signedness bug in read_from_oldmem() xen: speed up grant-table reclaim virtio-net: fix race between set queues and probe net: dsa: qca8k: fix search_and_insert wrong handling of new rule net: dsa: qca8k: fix broken search_and_del net: dsa: qca8k: fix mdb add/del case with 0 VID selftests: mptcp: join: only check for ip6tables if needed soundwire: fix enumeration completion Revert "um: Use swap() to make code cleaner" LoongArch: BPF: Fix check condition to call lu32id in move_imm() LoongArch: BPF: 
Enable bpf_probe_read{, str}() on LoongArch s390/dasd: fix hanging device after quiesce/resume s390/dasd: print copy pair message only for the correct error ASoC: wm8904: Fill the cache for WM8904_ADC_TEST_0 register arm64/sme: Set new vector length before reallocating PM: sleep: wakeirq: fix wake irq arming ceph: never send metrics if disable_send_metrics is set drm/i915/dpt: Use shmem for dpt objects dm cache policy smq: ensure IO doesn't prevent cleaner policy progress rbd: make get_lock_owner_info() return a single locker or NULL rbd: harden get_lock_owner_info() a bit rbd: retrieve and check lock owner twice before blocklisting drm/amd/display: set per pipe dppclk to 0 when dpp is off tracing: Fix trace_event_raw_event_synth() if else statement drm/amd/display: perform a bounds check before filling dirty rectangles drm/amd/display: Write to correct dirty_rect ACPI: processor: perflib: Use the "no limit" frequency QoS ACPI: processor: perflib: Avoid updating frequency QoS unnecessarily cpufreq: intel_pstate: Drop ACPI _PSS states table patching mptcp: ensure subflow is unhashed before cleaning the backlog selftests: mptcp: sockopt: use 'iptables-legacy' if available test_firmware: return ENOMEM instead of ENOSPC on failed memory allocation dma-buf: keep the signaling time of merged fences v3 dma-buf: fix an error pointer vs NULL bug Linux 6.1.43 Change-Id: Id1d61f2351c51edad33ab654f1f3d911b9a75830 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
e401e169ba |
Merge remote-tracking branch into HEAD
* keystone/mirror-android14-6.1-2023-08: (162 commits) ANDROID: uid_sys_stats: Use llist for deferred work UPSTREAM: usb: typec: ucsi: Fix command cancellation ANDROID: GKI: update symbol list file for xiaomi UPSTREAM: erofs: avoid infinite loop in z_erofs_do_read_page() when reading beyond EOF UPSTREAM: erofs: avoid useless loops in z_erofs_pcluster_readmore() when reading beyond EOF UPSTREAM: erofs: Fix detection of atomic context UPSTREAM: erofs: fix compact 4B support for 16k block size UPSTREAM: erofs: kill hooked chains to avoid loops on deduplicated compressed images UPSTREAM: erofs: fix potential overflow calculating xattr_isize UPSTREAM: erofs: stop parsing non-compact HEAD index if clusterofs is invalid UPSTREAM: erofs: initialize packed inode after root inode is assigned ANDROID: GKI: Update ABI for zsmalloc fixes BACKPORT: zsmalloc: fix races between modifications of fullness and isolated UPSTREAM: zsmalloc: consolidate zs_pool's migrate_lock and size_class's locks BACKPORT: FROMGIT: mm: handle faults that merely update the accessed bit under the VMA lock FROMLIST: mm: Allow fault_dirty_shared_page() to be called under the VMA lock FROMGIT: mm: handle swap and NUMA PTE faults under the VMA lock FROMGIT: mm: run the fault-around code under the VMA lock FROMGIT: mm: move FAULT_FLAG_VMA_LOCK check down from do_fault() FROMGIT: mm: move FAULT_FLAG_VMA_LOCK check down in handle_pte_fault() ... Change-Id: Ic33be5a9dae71958c187029751cb599a83110ab9 |
2490ab50e7 |
ANDROID: sched: Add vendor hook for rt util update
Vendors may need to track RT util. Bug: 201261299 Signed-off-by: Rick Yiu <rickyiu@google.com> Change-Id: I2f4e5142c6bc8574ee3558042e1fb0dae13b702d |
6d97f75abc |
ANDROID: sched: Add vendor hook for util-update related functions
Vendors may need to implement their own util tracking. Bug: 297343949 Signed-off-by: Rick Yiu <rickyiu@google.com> Change-Id: I973902e6ff82a85ecd029ac5a78692d629df1ebe |
e08c5de06e |
ANDROID: sched: Add vendor hooks for override sugov behavior
Upstream moved sugov to the DEADLINE class, which has a higher priority than RT, so it can potentially block many RT use cases in Android. Also, iowait currently doesn't distinguish background from foreground tasks, and we have seen cases where the device ran at a high frequency unnecessarily while running some background I/O. Bug: 297343949 Signed-off-by: Wei Wang <wvw@google.com> Change-Id: I21e9bfe9ef75a4178279574389e417c3f38e65ac |
5762974151 |
ANDROID: Add new hook to enable overriding uclamp_validate()
We want to add more special values, specifically for uclamp_max, so that it can be set automatically to the most efficient value based on the core it's running on. Bug: 297343949 Signed-off-by: Qais Yousef <qyousef@google.com> Change-Id: I57343c4544f6cac621c855cbb94de0b8d80c51fa |
b57e3c1d99 |
ANDROID: sched/uclamp: Don't enable uclamp_is_used static key by in-kernel requests
We now have in-kernel users of uclamp to implement inheritance. The static_branch_enable() path unconditionally takes cpus_read_lock(), which might_sleep(). The binder path that implements inheritance runs in in_atomic() context, which leads to a splat like this one:
[ 147.529960] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:56
[ 147.530196] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2586, name: RenderThread
[ 147.530410] INFO: lockdep is turned off.
[ 147.530518] Preemption disabled at:
[ 147.530521] [<ffffffc008ca2cec>] binder_proc_transaction+0x78/0x41c
[ 147.530793] CPU: 8 PID: 2586 Comm: RenderThread Tainted: G S W O 5.15.76-android14-5-00086-gc01afe5d262f #1
[ 147.531214] Call trace:
[ 147.531288] dump_backtrace+0xe8/0x134
[ 147.531444] show_stack+0x1c/0x4c
[ 147.531598] dump_stack_lvl+0x74/0x94
[ 147.531766] dump_stack+0x14/0x3c
[ 147.531920] ___might_sleep+0x210/0x230
[ 147.532094] __might_sleep+0x54/0x84
[ 147.532259] cpus_read_lock+0x2c/0x160
[ 147.532429] static_key_enable+0x1c/0x34
[ 147.532608] __sched_setscheduler+0x2a8/0x99c
[ 147.532802] sched_setattr_nocheck+0x1c/0x24
[ 147.532994] binder_do_set_priority+0x31c/0x4a4
[ 147.533195] binder_transaction_priority+0x200/0x3f4
[ 147.533413] binder_proc_transaction+0x220/0x41c
[ 147.533618] binder_transaction+0x1df0/0x234c
[ 147.533812] binder_thread_write+0xd84/0x2398
[ 147.534007] binder_ioctl_write_read+0x19c/0xb28
[ 147.534212] binder_ioctl+0x344/0x1a3c
[ 147.534382] __arm64_sys_ioctl+0x94/0xc8
[ 147.534561] invoke_syscall+0x44/0xf8
[ 147.534729] el0_svc_common+0xc8/0x10c
[ 147.534900] do_el0_svc+0x20/0x28
[ 147.535053] el0_svc+0x58/0xe0
[ 147.535198] el0t_64_sync_handler+0x7c/0xe4
[ 147.535386] el0t_64_sync+0x188/0x18c
Prevent enabling the static key for !user-initiated sched_setattr() operations. Generally, we don't expect in-kernel uclamp users. Bug: 259145692 Signed-off-by: Qais Yousef <qyousef@google.com> Change-Id: Iac5be139b5ffd39f5e1c0431ce253133d81b98cf |
eb9686932b |
ANDROID: sched: Export symbols needed for vendor hooks
Bug: 297343949 Change-Id: I0cb65e85b36687bfaae6a185ca373d7fb8de0a77 Signed-off-by: Rick Yiu <rickyiu@google.com> |
d05fcdbc3e |
sched/walt: Remove references to EXPORT_SYMBOL
In accordance with guidelines, remove any references to EXPORT_SYMBOL and use EXPORT_SYMBOL_GPL instead. Change-Id: Ie37d5d5da24e0a5daf0569054622399723fe98f8 Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com> |
469cf75fcc |
Revert "sched/psi: Fix avgs_work re-arm in psi_avgs_work()"
This reverts commit
d18fe3efda |
Revert "sched/psi: Rearrange polling code in preparation"
This reverts commit
5b039dbb91 |
Revert "sched/psi: Rename existing poll members in preparation"
This reverts commit
ed063a7e76 |
Revert "sched/psi: Extract update_triggers side effect"
This reverts commit
2c1e89916b |
Revert "sched/psi: Allow unprivileged polling of N*2s period"
This reverts commit
ffed79e366 |
Revert "sched/psi: use kernfs polling functions for PSI trigger polling"
This reverts commit
5d0fe30be4 |
modules: only allow symbol_get of EXPORT_SYMBOL_GPL modules
commit 9011e49d54dcc7653ebb8a1e05b5badb5ecfa9f9 upstream. It has recently come to my attention that nvidia is circumventing the protection added in |
8976ff249f |
Merge 6.1.42 into android14-6.1-lts
Changes in 6.1.42 io_uring: treat -EAGAIN for REQ_F_NOWAIT as final for io-wq ALSA: hda/realtek - remove 3k pull low procedure ALSA: hda/realtek: Add quirk for Clevo NS70AU ALSA: hda/realtek: Enable Mute LED on HP Laptop 15s-eq2xxx maple_tree: set the node limit when creating a new root node maple_tree: fix node allocation testing on 32 bit keys: Fix linking a duplicate key to a keyring's assoc_array perf probe: Add test for regression introduced by switch to die_get_decl_file() btrfs: fix warning when putting transaction with qgroups enabled after abort fuse: revalidate: don't invalidate if interrupted fuse: Apply flags2 only when userspace set the FUSE_INIT_EXT btrfs: set_page_extent_mapped after read_folio in btrfs_cont_expand btrfs: zoned: fix memory leak after finding block group with super blocks fuse: ioctl: translate ENOSYS in outarg btrfs: fix race between balance and cancel/pause selftests: tc: set timeout to 15 minutes selftests: tc: add 'ct' action kconfig dep regmap: Drop initial version of maximum transfer length fixes of: Preserve "of-display" device name for compatibility regmap: Account for register length in SMBus I/O limits arm64/fpsimd: Ensure SME storage is allocated after SVE VL changes can: mcp251xfd: __mcp251xfd_chip_set_mode(): increase poll timeout can: bcm: Fix UAF in bcm_proc_show() can: gs_usb: gs_can_open(): improve error handling selftests: tc: add ConnTrack procfs kconfig dma-buf/dma-resv: Stop leaking on krealloc() failure drm/amdgpu/vkms: relax timer deactivation by hrtimer_try_to_cancel drm/amdgpu/pm: make gfxclock consistent for sienna cichlid drm/amdgpu/pm: make mclk consistent for smu 13.0.7 drm/client: Fix memory leak in drm_client_target_cloned drm/client: Fix memory leak in drm_client_modeset_probe drm/amd/display: only accept async flips for fast updates drm/amd/display: Disable MPC split by default on special asic drm/amd/display: check TG is non-null before checking if enabled drm/amd/display: Keep PHY active for DP 
displays on DCN31 ASoC: fsl_sai: Disable bit clock with transmitter ASoC: fsl_sai: Revert "ASoC: fsl_sai: Enable MCTL_MCLK_EN bit for master mode" ASoC: tegra: Fix ADX byte map ASoC: rt5640: Fix sleep in atomic context ASoC: cs42l51: fix driver to properly autoload with automatic module loading ASoC: codecs: wcd938x: fix missing clsh ctrl error handling ASoC: codecs: wcd-mbhc-v2: fix resource leaks on component remove ASoC: qdsp6: audioreach: fix topology probe deferral ASoC: tegra: Fix AMX byte map ASoC: codecs: wcd938x: fix resource leaks on component remove ASoC: codecs: wcd938x: fix missing mbhc init error handling ASoC: codecs: wcd934x: fix resource leaks on component remove ASoC: codecs: wcd938x: fix codec initialisation race ASoC: codecs: wcd938x: fix soundwire initialisation race ext4: correct inline offset when handling xattrs in inode body drm/radeon: Fix integer overflow in radeon_cs_parser_init ALSA: emu10k1: roll up loops in DSP setup code for Audigy quota: Properly disable quotas when add_dquot_ref() fails quota: fix warning in dqgrab() HID: add quirk for 03f0:464a HP Elite Presenter Mouse ovl: check type and offset of struct vfsmount in ovl_entry udf: Fix uninitialized array access for some pathnames fs: jfs: Fix UBSAN: array-index-out-of-bounds in dbAllocDmapLev MIPS: dec: prom: Address -Warray-bounds warning FS: JFS: Fix null-ptr-deref Read in txBegin FS: JFS: Check for read-only mounted filesystem in txBegin ACPI: video: Add backlight=native DMI quirk for Dell Studio 1569 rcu-tasks: Avoid pr_info() with spin lock in cblist_init_generic() rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp sched/fair: Don't balance task to its current running CPU wifi: ath11k: fix registration of 6Ghz-only phy without the full channel range bpf: Print a warning only if writing to unprivileged_bpf_disabled. 
bpf: Address KCSAN report on bpf_lru_list bpf: tcp: Avoid taking fast sock lock in iterator wifi: ath11k: add support default regdb while searching board-2.bin for WCN6855 wifi: mac80211_hwsim: Fix possible NULL dereference spi: dw: Add compatible for Intel Mount Evans SoC wifi: ath11k: fix memory leak in WMI firmware stats net: ethernet: litex: add support for 64 bit stats devlink: report devlink_port_type_warn source device wifi: wext-core: Fix -Wstringop-overflow warning in ioctl_standard_iw_point() wifi: iwlwifi: Add support for new PCI Id wifi: iwlwifi: mvm: avoid baid size integer overflow wifi: iwlwifi: pcie: add device id 51F1 for killer 1675 igb: Fix igb_down hung on surprise removal net: hns3: fix strncpy() not using dest-buf length as length issue ASoC: amd: acp: fix for invalid dai id handling in acp_get_byte_count() ASoC: codecs: wcd938x: fix mbhc impedance loglevel ASoC: codecs: wcd938x: fix dB range for HPHL and HPHR ASoC: qcom: q6apm: do not close GPR port before closing graph sched/fair: Use recent_used_cpu to test p->cpus_ptr sched/psi: Fix avgs_work re-arm in psi_avgs_work() sched/psi: Rearrange polling code in preparation sched/psi: Rename existing poll members in preparation sched/psi: Extract update_triggers side effect sched/psi: Allow unprivileged polling of N*2s period sched/psi: use kernfs polling functions for PSI trigger polling pinctrl: renesas: rzv2m: Handle non-unique subnode names pinctrl: renesas: rzg2l: Handle non-unique subnode names spi: bcm63xx: fix max prepend length fbdev: imxfb: warn about invalid left/right margin fbdev: imxfb: Removed unneeded release_mem_region perf build: Fix library not found error when using CSLIBS btrfs: be a bit more careful when setting mirror_num_ret in btrfs_map_block spi: s3c64xx: clear loopback bit after loopback test kallsyms: Improve the performance of kallsyms_lookup_name() kallsyms: Correctly sequence symbols when CONFIG_LTO_CLANG=y kallsyms: strip LTO-only suffixes from promoted global 
functions dsa: mv88e6xxx: Do a final check before timing out net: ethernet: ti: cpsw_ale: Fix cpsw_ale_get_field()/cpsw_ale_set_field() bridge: Add extack warning when enabling STP in netns. net: ethernet: mtk_eth_soc: handle probe deferral cifs: fix mid leak during reconnection after timeout threshold ASoC: SOF: ipc3-dtrace: uninitialized data in dfsentry_trace_filter_write() net: sched: cls_matchall: Undo tcf_bind_filter in case of failure after mall_set_parms net: sched: cls_u32: Undo tcf_bind_filter if u32_replace_hw_knode net: sched: cls_u32: Undo refcount decrement in case update failed net: sched: cls_bpf: Undo tcf_bind_filter in case of an error net: dsa: microchip: ksz8: Separate static MAC table operations for code reuse net: dsa: microchip: ksz8: Make ksz8_r_sta_mac_table() static net: dsa: microchip: ksz8_r_sta_mac_table(): Avoid using error code for empty entries net: dsa: microchip: correct KSZ8795 static MAC table access iavf: Fix use-after-free in free_netdev iavf: Fix out-of-bounds when setting channels on remove iavf: use internal state to free traffic IRQs iavf: Move netdev_update_features() into watchdog task iavf: send VLAN offloading caps once after VFR iavf: make functions static where possible iavf: Wait for reset in callbacks which trigger it iavf: fix a deadlock caused by rtnl and driver's lock circular dependencies iavf: fix reset task race with iavf_remove() security: keys: Modify mismatched function name octeontx2-pf: Dont allocate BPIDs for LBK interfaces bpf: Fix subprog idx logic in check_max_stack_depth bpf: Repeat check_max_stack_depth for async callbacks bpf, arm64: Fix BTI type used for freplace attached functions igc: Avoid transmit queue timeout for XDP igc: Prevent garbled TX queue with XDP ZEROCOPY net: ipv4: use consistent txhash in TIME_WAIT and SYN_RECV tcp: annotate data-races around tcp_rsk(req)->txhash tcp: annotate data-races around tcp_rsk(req)->ts_recent net: ipv4: Use kfree_sensitive instead of kfree net:ipv6: check 
return value of pskb_trim() Revert "tcp: avoid the lookup process failing to get sk in ehash table" fbdev: au1200fb: Fix missing IRQ check in au1200fb_drv_probe llc: Don't drop packet from non-root netns. ALSA: hda/realtek: Fix generic fixup definition for cs35l41 amp netfilter: nf_tables: fix spurious set element insertion failure netfilter: nf_tables: can't schedule in nft_chain_validate netfilter: nft_set_pipapo: fix improper element removal netfilter: nf_tables: skip bound chain in netns release path netfilter: nf_tables: skip bound chain on rule flush Bluetooth: use RCU for hci_conn_params and iterate safely in hci_sync Bluetooth: hci_event: call disconnect callback before deleting conn Bluetooth: ISO: fix iso_conn related locking and validity issues Bluetooth: hci_sync: Avoid use-after-free in dbg for hci_remove_adv_monitor() tcp: annotate data-races around tp->tcp_tx_delay tcp: annotate data-races around tp->tsoffset tcp: annotate data-races around tp->keepalive_time tcp: annotate data-races around tp->keepalive_intvl tcp: annotate data-races around tp->keepalive_probes tcp: annotate data-races around icsk->icsk_syn_retries tcp: annotate data-races around tp->linger2 tcp: annotate data-races around rskq_defer_accept tcp: annotate data-races around tp->notsent_lowat tcp: annotate data-races around icsk->icsk_user_timeout tcp: annotate data-races around fastopenq.max_qlen net: phy: prevent stale pointer dereference in phy_init() jbd2: recheck chechpointing non-dirty buffer tracing/histograms: Return an error if we fail to add histogram to hist_vars list drm/ttm: fix bulk_move corruption when adding a entry spi: dw: Remove misleading comment for Mount Evans SoC kallsyms: add kallsyms_seqs_of_names to list of special symbols scripts/kallsyms.c Make the comment up-to-date with current implementation scripts/kallsyms: update the usage in the comment block bpf: allow precision tracking for programs with subprogs bpf: stop setting precise in current state bpf: 
aggressively forget precise markings during state checkpointing selftests/bpf: make test_align selftest more robust selftests/bpf: Workaround verification failure for fexit_bpf2bpf/func_replace_return_code selftests/bpf: Fix sk_assign on s390x drm/amd/display: use max_dsc_bpp in amdgpu_dm drm/amd/display: fix some coding style issues drm/dp_mst: Clear MSG_RDY flag before sending new message drm/amd/display: force connector state when bpc changes during compliance drm/amd/display: Clean up errors & warnings in amdgpu_dm.c drm/amd/display: fix linux dp link lost handled only one time drm/amd/display: Add polling method to handle MST reply packet Revert "drm/amd/display: edp do not add non-edid timings" Linux 6.1.42 Change-Id: I6b7257a16f9a025d0c23dfd3eb43317c1c164a93 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
176d72d941 |
ANDROID: vendor_hooks: export cgroup_threadgroup_rwsem
When a task wakes up from percpu_rwsem_wait, it can stay in a long runnable state, which causes frame drops when an application starts. To solve this problem, we need to let the process enter the "vip" queue when it is woken up, so we set a flag on the process to indicate that it is about to hold the lock. Most of these long runnable states occur on cgroup_threadgroup_rwsem, so we only care about cgroup_threadgroup_rwsem, which therefore needs to be exported. Finally, if the semaphore is cgroup_threadgroup_rwsem and the flag is set, the task joins the "vip" queue. Bug: 297785167 Signed-off-by: liuxudong <liuxudong5@xiaomi.com> Change-Id: I2297dfbc2f2681581241f85a3b4fd59415ea67db |
f1311733c2 |
Merge 6.1.40 into android14-6.1-lts
Changes in 6.1.40 HID: amd_sfh: Rename the float32 variable HID: amd_sfh: Fix for shift-out-of-bounds net: lan743x: Don't sleep in atomic context workqueue: clean up WORK_* constant types, clarify masking ksmbd: add missing compound request handing in some commands ksmbd: fix out of bounds read in smb2_sess_setup drm/panel: simple: Add connector_type for innolux_at043tn24 drm/bridge: ti-sn65dsi86: Fix auxiliary bus lifetime swiotlb: always set the number of areas before allocating the pool swiotlb: reduce the swiotlb buffer size on allocation failure swiotlb: reduce the number of areas to match actual memory pool size drm/panel: simple: Add Powertip PH800480T013 drm_display_mode flags ice: Fix max_rate check while configuring TX rate limits igc: Remove delay during TX ring configuration net/mlx5e: fix double free in mlx5e_destroy_flow_table net/mlx5e: fix memory leak in mlx5e_fs_tt_redirect_any_create net/mlx5e: fix memory leak in mlx5e_ptp_open net/mlx5e: Check for NOT_READY flag state after locking igc: set TP bit in 'supported' and 'advertising' fields of ethtool_link_ksettings igc: Handle PPS start time programming for past time values blk-crypto: use dynamic lock class for blk_crypto_profile::lock scsi: qla2xxx: Fix error code in qla2x00_start_sp() scsi: ufs: ufs-mediatek: Add dependency for RESET_CONTROLLER bpf: Fix max stack depth check for async callbacks net: mvneta: fix txq_map in case of txq_number==1 net/sched: cls_fw: Fix improper refcount update leads to use-after-free gve: Set default duplex configuration to full octeontx2-af: Promisc enable/disable through mbox octeontx2-af: Move validation of ptp pointer before its usage ionic: remove WARN_ON to prevent panic_on_warn net: bgmac: postpone turning IRQs off to avoid SoC hangs net: prevent skb corruption on frag list segmentation icmp6: Fix null-ptr-deref of ip6_null_entry->rt6i_idev in icmp6_dev(). 
udp6: fix udp6_ehashfn() typo ntb: idt: Fix error handling in idt_pci_driver_init() NTB: amd: Fix error handling in amd_ntb_pci_driver_init() ntb: intel: Fix error handling in intel_ntb_pci_driver_init() NTB: ntb_transport: fix possible memory leak while device_register() fails NTB: ntb_tool: Add check for devm_kcalloc ipv6/addrconf: fix a potential refcount underflow for idev net: dsa: qca8k: Add check for skb_copy platform/x86: wmi: Break possible infinite loop when parsing GUID kernel/trace: Fix cleanup logic of enable_trace_eprobe igc: Fix launchtime before start of cycle igc: Fix inserting of empty frame for launchtime nvme: fix the NVME_ID_NS_NVM_STS_MASK definition riscv, bpf: Fix inconsistent JIT image generation drm/i915: Don't preserve dpll_hw_state for slave crtc in Bigjoiner drm/i915: Fix one wrong caching mode enum usage octeontx2-pf: Add additional check for MCAM rules erofs: avoid useless loops in z_erofs_pcluster_readmore() when reading beyond EOF erofs: avoid infinite loop in z_erofs_do_read_page() when reading beyond EOF erofs: fix fsdax unavailability for chunk-based regular files wifi: airo: avoid uninitialized warning in airo_get_rate() bpf: cpumap: Fix memory leak in cpu_map_update_elem net/sched: flower: Ensure both minimum and maximum ports are specified riscv: mm: fix truncation warning on RV32 netdevsim: fix uninitialized data in nsim_dev_trap_fa_cookie_write() net/sched: make psched_mtu() RTNL-less safe wifi: rtw89: debug: fix error code in rtw89_debug_priv_send_h2c_set() net/sched: sch_qfq: refactor parsing of netlink parameters net/sched: sch_qfq: account for stab overhead in qfq_enqueue nvme-pci: fix DMA direction of unmapping integrity data fs/ntfs3: Check fields while reading ovl: let helper ovl_i_path_real() return the realinode ovl: fix null pointer dereference in ovl_get_acl_rcu() cifs: fix session state check in smb2_find_smb_ses drm/client: Send hotplug event after registering a client drm/amdgpu/sdma4: set align mask to 255 
drm/amd/pm: revise the ASPM settings for thunderbolt attached scenario drm/amdgpu: add the fan abnormal detection feature drm/amdgpu: Fix minmax warning drm/amd/pm: add abnormal fan detection for smu 13.0.0 f2fs: fix the wrong condition to determine atomic context f2fs: fix deadlock in i_xattr_sem and inode page lock pinctrl: amd: Add Z-state wake control bits pinctrl: amd: Adjust debugfs output pinctrl: amd: Add fields for interrupt status and wake status pinctrl: amd: Detect internal GPIO0 debounce handling pinctrl: amd: Fix mistake in handling clearing pins at startup pinctrl: amd: Detect and mask spurious interrupts pinctrl: amd: Revert "pinctrl: amd: disable and mask interrupts on probe" pinctrl: amd: Only use special debounce behavior for GPIO 0 pinctrl: amd: Use amd_pinconf_set() for all config options pinctrl: amd: Drop pull up select configuration pinctrl: amd: Unify debounce handling into amd_pinconf_set() tpm: Do not remap from ACPI resources again for Pluton TPM tpm: tpm_vtpm_proxy: fix a race condition in /dev/vtpmx creation tpm: tis_i2c: Limit read bursts to I2C_SMBUS_BLOCK_MAX (32) bytes tpm: tis_i2c: Limit write bursts to I2C_SMBUS_BLOCK_MAX (32) bytes tpm: return false from tpm_amd_is_rng_defective on non-x86 platforms mtd: rawnand: meson: fix unaligned DMA buffers handling net: bcmgenet: Ensure MDIO unregistration has clocks enabled net: phy: dp83td510: fix kernel stall during netboot in DP83TD510E PHY driver kasan: add kasan_tag_mismatch prototype tracing/user_events: Fix incorrect return value for writing operation when events are disabled powerpc: Fail build if using recordmcount with binutils v2.37 misc: fastrpc: Create fastrpc scalar with correct buffer count powerpc/security: Fix Speculation_Store_Bypass reporting on Power10 powerpc/64s: Fix native_hpte_remove() to be irq-safe MIPS: Loongson: Fix cpu_probe_loongson() again MIPS: KVM: Fix NULL pointer dereference ext4: Fix reusing stale buffer heads from last failed mounting ext4: fix wrong 
unit use in ext4_mb_clear_bb ext4: get block from bh in ext4_free_blocks for fast commit replay ext4: fix wrong unit use in ext4_mb_new_blocks ext4: fix to check return value of freeze_bdev() in ext4_shutdown() ext4: turn quotas off if mount failed after enabling quotas ext4: only update i_reserved_data_blocks on successful block allocation fs: dlm: revert check required context while close soc: qcom: mdt_loader: Fix unconditional call to scm_pas_mem_setup ext2/dax: Fix ext2_setsize when len is page aligned jfs: jfs_dmap: Validate db_l2nbperpage while mounting hwrng: imx-rngc - fix the timeout for init and self check dm integrity: reduce vmalloc space footprint on 32-bit architectures scsi: mpi3mr: Propagate sense data for admin queue SCSI I/O s390/zcrypt: do not retry administrative requests PCI/PM: Avoid putting EloPOS E2/S2/H2 PCIe Ports in D3cold PCI: Release resource invalidated by coalescing PCI: Add function 1 DMA alias quirk for Marvell 88SE9235 PCI: qcom: Disable write access to read only registers for IP v2.3.3 PCI: epf-test: Fix DMA transfer completion initialization PCI: epf-test: Fix DMA transfer completion detection PCI: rockchip: Assert PCI Configuration Enable bit after probe PCI: rockchip: Write PCI Device ID to correct register PCI: rockchip: Add poll and timeout to wait for PHY PLLs to be locked PCI: rockchip: Fix legacy IRQ generation for RK3399 PCIe endpoint core PCI: rockchip: Use u32 variable to access 32-bit registers PCI: rockchip: Set address alignment for endpoint mode misc: pci_endpoint_test: Free IRQs before removing the device misc: pci_endpoint_test: Re-init completion for every test mfd: pm8008: Fix module autoloading md/raid0: add discard support for the 'original' layout dm init: add dm-mod.waitfor to wait for asynchronously probed block devices fs: dlm: return positive pid value for F_GETLK fs: dlm: fix cleanup pending ops when interrupted fs: dlm: interrupt posix locks only when process is killed fs: dlm: make F_SETLK use 
unkillable wait_event fs: dlm: fix mismatch of plock results from userspace scsi: lpfc: Fix double free in lpfc_cmpl_els_logo_acc() caused by lpfc_nlp_not_used() drm/atomic: Allow vblank-enabled + self-refresh "disable" drm/rockchip: vop: Leave vblank enabled in self-refresh drm/amd/display: fix seamless odm transitions drm/amd/display: edp do not add non-edid timings drm/amd/display: Remove Phantom Pipe Check When Calculating K1 and K2 drm/amd/display: disable seamless boot if force_odm_combine is enabled drm/amdgpu: fix clearing mappings for BOs that are always valid in VM drm/amd: Disable PSR-SU on Parade 0803 TCON drm/amd/display: add a NULL pointer check drm/amd/display: Correct `DMUB_FW_VERSION` macro drm/amd/display: Add monitor specific edid quirk drm/amdgpu: avoid restore process run into dead loop. drm/ttm: Don't leak a resource on swapout move error serial: atmel: don't enable IRQs prematurely tty: serial: samsung_tty: Fix a memory leak in s3c24xx_serial_getclk() in case of error tty: serial: samsung_tty: Fix a memory leak in s3c24xx_serial_getclk() when iterating clk tty: serial: imx: fix rs485 rx after tx firmware: stratix10-svc: Fix a potential resource leak in svc_create_memory_pool() libceph: harden msgr2.1 frame segment length checks ceph: add a dedicated private data for netfs rreq ceph: fix blindly expanding the readahead windows ceph: don't let check_caps skip sending responses for revoke msgs xhci: Fix resume issue of some ZHAOXIN hosts xhci: Fix TRB prefetch issue of ZHAOXIN hosts xhci: Show ZHAOXIN xHCI root hub speed correctly meson saradc: fix clock divider mask length opp: Fix use-after-free in lazy_opp_tables after probe deferral soundwire: qcom: fix storing port config out-of-bounds Revert "8250: add support for ASIX devices with a FIFO bug" bus: ixp4xx: fix IXP4XX_EXP_T1_MASK s390/decompressor: fix misaligned symbol build error dm: verity-loadpin: Add NULL pointer check for 'bdev' parameter tracing/histograms: Add histograms to 
hist_vars if they have referenced variables tracing: Fix memory leak of iter->temp when reading trace_pipe nvme: don't reject probe due to duplicate IDs for single-ported PCIe devices samples: ftrace: Save required argument registers in sample trampolines perf: RISC-V: Remove PERF_HES_STOPPED flag checking in riscv_pmu_start() regmap-irq: Fix out-of-bounds access when allocating config buffers net: ena: fix shift-out-of-bounds in exponential backoff ring-buffer: Fix deadloop issue on reading trace_pipe ftrace: Fix possible warning on checking all pages used in ftrace_process_locs() drm/amd/pm: share the code around SMU13 pcie parameters update drm/amd/pm: conditionally disable pcie lane/speed switching for SMU13 cifs: if deferred close is disabled then close files immediately xtensa: ISS: fix call to split_if_spec perf/x86: Fix lockdep warning in for_each_sibling_event() on SPR PM: QoS: Restore support for default value on frequency QoS pwm: meson: modify and simplify calculation in meson_pwm_get_state pwm: meson: fix handling of period/duty if greater than UINT_MAX fprobe: Release rethook after the ftrace_ops is unregistered fprobe: Ensure running fprobe_exit_handler() finished before calling rethook_free() tracing: Fix null pointer dereference in tracing_err_log_open() selftests: mptcp: connect: fail if nft supposed to work selftests: mptcp: sockopt: return error if wrong mark selftests: mptcp: userspace_pm: use correct server port selftests: mptcp: userspace_pm: report errors with 'remove' tests selftests: mptcp: depend on SYN_COOKIES selftests: mptcp: pm_nl_ctl: fix 32-bit support tracing/probes: Fix not to count error code to total length tracing/probes: Fix to update dynamic data counter if fetcharg uses it tracing/user_events: Fix struct arg size match check scsi: qla2xxx: Multi-que support for TMF scsi: qla2xxx: Fix task management cmd failure scsi: qla2xxx: Fix task management cmd fail due to unavailable resource scsi: qla2xxx: Fix hang in task management 
scsi: qla2xxx: Wait for io return on terminate rport scsi: qla2xxx: Fix mem access after free scsi: qla2xxx: Array index may go out of bound scsi: qla2xxx: Avoid fcport pointer dereference scsi: qla2xxx: Fix buffer overrun scsi: qla2xxx: Fix potential NULL pointer dereference scsi: qla2xxx: Check valid rport returned by fc_bsg_to_rport() scsi: qla2xxx: Correct the index of array scsi: qla2xxx: Pointer may be dereferenced scsi: qla2xxx: Remove unused nvme_ls_waitq wait queue scsi: qla2xxx: Fix end of loop test MIPS: kvm: Fix build error with KVM_MIPS_DEBUG_COP0_COUNTERS enabled Revert "drm/amd: Disable PSR-SU on Parade 0803 TCON" swiotlb: mark swiotlb_memblock_alloc() as __init net/sched: sch_qfq: reintroduce lmax bound check for MTU drm/atomic: Fix potential use-after-free in nonblocking commits net/ncsi: make one oem_gma function for all mfr id net/ncsi: change from ndo_set_mac_address to dev_set_mac_address Linux 6.1.40 Change-Id: I5cc6aab178c66d2a23fe2a8d21e71cc4a8b15acf Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
c057db2f88 |
Revert "bpf: Remove bpf trampoline selector"
This reverts commit
|
||
|
b435525822 |
This is the 6.1.39 stable release
-----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmS38qMACgkQONu9yGCS aT56yQ//ZuDuw8Ev3HISVgZhE9FpuXC1RSYXiMCAvwA9rH3KnJ4wKVPEhEWLy9P4 jdJaatSLbLOvA7ME7JnwZxz2qahjBxo1tpx6u2S3zrzz4UlAPNLwCxTxxp4X07VI 3fBNvsmucqFSayCrA8t9xgkaJizuCvHZm7eSoyVIigPwbB5igc2b+bNSRcx1Zo+j SHl4Y4nGK8a47XU9RSlDLVKow0/6rrQLHQ9DLpxACArRHw3h451vD0DMcgOuU/Uv 6qq9u3COcdVw3oc5VENu9XklPmvQkxo3RaCUHyRadVstuc0H/BBUDvEhPn5PcVOV EdBWlTjmhsQo0aUziK4kotLNeX1VRgKa+rrIUBJn68OHv1SRRPZU/eJ8hkL81dCi FDPzXDOszixO7pPv1jj7O9kNcwKPuiHPmdaNPCY6jviOHhZnAEub44DpQamxWvU/ kb5MZRRY72wt9iWeI3kscCCSbf6eyjlmDMoYIeLuYn10n7gIDU80eUOBl9bqEsz/ X+OUxaY+XuKbCoucpNmSHHLmynJ5D0CXhl/5qnlgMoSo4UJ5BUIMj2e3ZqsKLfrR e/09MCRX79y9J+TxUunnQZfq5vBlH1tRsvUyhIfYfW4AaC9BrkOL2XZviQldKY6x FUmsxh62O3iGRtLOWDKQA5MwoJuD54qVcHr1iidWkO2G8T3ctCc= =kyUh -----END PGP SIGNATURE----- Merge 6.1.39 into android14-6.1-lts Changes in 6.1.39 drm: use mgr->dev in drm_dbg_kms in drm_dp_add_payload_part2 fs: pipe: reveal missing function protoypes block: Fix the type of the second bdev_op_is_zoned_write() argument erofs: clean up cached I/O strategies erofs: avoid tagged pointers to mark sync decompression erofs: remove tagged pointer helpers erofs: move zdata.h into zdata.c erofs: kill hooked chains to avoid loops on deduplicated compressed images x86/resctrl: Only show tasks' pid in current pid namespace blk-iocost: use spin_lock_irqsave in adjust_inuse_and_calc_cost x86/sev: Fix calculation of end address based on number of pages virt: sevguest: Add CONFIG_CRYPTO dependency blk-mq: fix potential io hang by wrong 'wake_batch' lockd: drop inappropriate svc_get() from locked_get() nvme-auth: rename __nvme_auth_[reset|free] to nvme_auth[reset|free]_dhchap nvme-auth: rename authentication work elements nvme-auth: remove symbol export from nvme_auth_reset nvme-auth: no need to reset chap contexts on re-authentication nvme-core: fix memory leak in dhchap_secret_store nvme-core: fix memory leak in dhchap_ctrl_secret nvme-auth: don't ignore 
key generation failures when initializing ctrl keys nvme-core: add missing fault-injection cleanup nvme-core: fix dev_pm_qos memleak md/raid10: check slab-out-of-bounds in md_bitmap_get_counter md/raid10: fix overflow of md/safe_mode_delay md/raid10: fix wrong setting of max_corr_read_errors md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request md/raid10: fix io loss while replacement replace rdev md/raid1-10: factor out a helper to add bio to plug md/raid1-10: factor out a helper to submit normal write md/raid1-10: submit write io directly if bitmap is not enabled block: fix blktrace debugfs entries leakage irqchip/stm32-exti: Fix warning on initialized field overwritten irqchip/jcore-aic: Fix missing allocation of IRQ descriptors svcrdma: Prevent page release when nothing was received erofs: simplify iloc() erofs: fix compact 4B support for 16k block size posix-timers: Prevent RT livelock in itimer_delete() tick/rcu: Fix bogus ratelimit condition tracing/timer: Add missing hrtimer modes to decode_hrtimer_mode(). 
clocksource/drivers/cadence-ttc: Fix memory leak in ttc_timer_probe PM: domains: fix integer overflow issues in genpd_parse_state() perf/arm-cmn: Fix DTC reset x86/mm: Allow guest.enc_status_change_prepare() to fail x86/tdx: Fix race between set_memory_encrypted() and load_unaligned_zeropad() drivers/perf: hisi: Don't migrate perf to the CPU going to teardown powercap: RAPL: Fix CONFIG_IOSF_MBI dependency PM: domains: Move the verification of in-params from genpd_add_device() ARM: 9303/1: kprobes: avoid missing-declaration warnings cpufreq: intel_pstate: Fix energy_performance_preference for passive thermal/drivers/sun8i: Fix some error handling paths in sun8i_ths_probe() rcu: Make rcu_cpu_starting() rely on interrupts being disabled rcu-tasks: Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs rcutorture: Correct name of use_softirq module parameter rcuscale: Move shutdown from wait_event() to wait_event_idle() rcu/rcuscale: Move rcu_scale_*() after kfree_scale_cleanup() rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale kselftest: vDSO: Fix accumulation of uninitialized ret when CLOCK_REALTIME is undefined perf/ibs: Fix interface via core pmu events x86/mm: Fix __swp_entry_to_pte() for Xen PV guests locking/atomic: arm: fix sync ops evm: Complete description of evm_inode_setattr() evm: Fix build warnings ima: Fix build warnings pstore/ram: Add check for kstrdup igc: Enable and fix RX hash usage by netstack wifi: ath9k: fix AR9003 mac hardware hang check register offset calculation wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx libbpf: btf_dump_type_data_check_overflow needs to consider BTF_MEMBER_BITFIELD_SIZE samples/bpf: Fix buffer overflow in tcp_basertt spi: spi-geni-qcom: Correct CS_TOGGLE bit in SPI_TRANS_CFG wifi: wilc1000: fix for absent RSN capabilities WFA testcase wifi: mwifiex: Fix the size of a memory allocation in mwifiex_ret_802_11_scan() sctp: add bpf_bypass_getsockopt proto callback libbpf: fix 
offsetof() and container_of() to work with CO-RE bpf: Don't EFAULT for {g,s}setsockopt with wrong optlen spi: dw: Round of n_bytes to power of 2 nfc: llcp: fix possible use of uninitialized variable in nfc_llcp_send_connect() bpftool: JIT limited misreported as negative value on aarch64 bpf: Remove bpf trampoline selector bpf: Fix memleak due to fentry attach failure selftests/bpf: Do not use sign-file as testcase regulator: core: Fix more error checking for debugfs_create_dir() regulator: core: Streamline debugfs operations wifi: orinoco: Fix an error handling path in spectrum_cs_probe() wifi: orinoco: Fix an error handling path in orinoco_cs_probe() wifi: atmel: Fix an error handling path in atmel_probe() wifi: wl3501_cs: Fix an error handling path in wl3501_probe() wifi: ray_cs: Fix an error handling path in ray_probe() wifi: ath9k: don't allow to overwrite ENDPOINT0 attributes samples/bpf: xdp1 and xdp2 reduce XDPBUFSIZE to 60 wifi: ath10k: Trigger STA disconnect after reconfig complete on hardware restart wifi: mac80211: recalc min chandef for new STA links selftests/bpf: Fix check_mtu using wrong variable type wifi: rsi: Do not configure WoWlan in shutdown hook if not enabled wifi: rsi: Do not set MMC_PM_KEEP_POWER in shutdown ice: handle extts in the miscellaneous interrupt thread selftests: cgroup: fix unexpected failure on test_memcg_low watchdog/perf: define dummy watchdog_update_hrtimer_threshold() on correct config watchdog/perf: more properly prevent false positives with turbo modes kexec: fix a memory leak in crash_shrink_memory() mmc: mediatek: Avoid ugly error message when SDIO wakeup IRQ isn't used memstick r592: make memstick_debug_get_tpc_name() static wifi: ath9k: Fix possible stall on ath9k_txq_list_has_key() wifi: mac80211: Fix permissions for valid_links debugfs entry rtnetlink: extend RTEXT_FILTER_SKIP_STATS to IFLA_VF_INFO wifi: ath11k: Add missing check for ioremap wifi: iwlwifi: pull from TXQs with softirqs disabled wifi: iwlwifi: pcie: 
fix NULL pointer dereference in iwl_pcie_irq_rx_msix_handler() wifi: mac80211: Remove "Missing iftype sband data/EHT cap" spam wifi: cfg80211: rewrite merging of inherited elements wifi: cfg80211: drop incorrect nontransmitted BSS update code wifi: cfg80211: fix regulatory disconnect with OCB/NAN wifi: cfg80211/mac80211: Fix ML element common size calculation wifi: ieee80211: Fix the common size calculation for reconfiguration ML mmc: Add MMC_QUIRK_BROKEN_SD_CACHE for Kingston Canvas Go Plus from 11/2019 wifi: iwlwifi: mvm: indicate HW decrypt for beacon protection wifi: ath9k: convert msecs to jiffies where needed bpf: Factor out socket lookup functions for the TC hookpoint. bpf: Call __bpf_sk_lookup()/__bpf_skc_lookup() directly via TC hookpoint bpf: Fix bpf socket lookup from tc/xdp to respect socket VRF bindings can: length: fix bitstuffing count can: kvaser_pciefd: Add function to set skb hwtstamps can: kvaser_pciefd: Set hardware timestamp on transmitted packets net: stmmac: fix double serdes powerdown netlink: fix potential deadlock in netlink_set_err() netlink: do not hard code device address lenth in fdb dumps bonding: do not assume skb mac_header is set selftests: rtnetlink: remove netdevsim device after ipsec offload test gtp: Fix use-after-free in __gtp_encap_destroy(). net: axienet: Move reset before 64-bit DMA detection ocfs2: Fix use of slab data with sendpage sfc: fix crash when reading stats while NIC is resetting net: nfc: Fix use-after-free caused by nfc_llcp_find_local lib/ts_bm: reset initial match offset for every block of text netfilter: conntrack: dccp: copy entire header to stack buffer, not just basic one netfilter: nf_conntrack_sip: fix the ct_sip_parse_numerical_param() return value. ipvlan: Fix return value of ipvlan_queue_xmit() netlink: Add __sock_i_ino() for __netlink_diag_dump(). 
drm/amd/display: Add logging for display MALL refresh setting radeon: avoid double free in ci_dpm_init() drm/amd/display: Explicitly specify update type per plane info change drm/bridge: it6505: Move a variable assignment behind a null pointer check in receive_timing_debugfs_show() Input: drv260x - sleep between polling GO bit drm/bridge: ti-sn65dsi83: Fix enable error path drm/bridge: tc358768: always enable HS video mode drm/bridge: tc358768: fix PLL parameters computation drm/bridge: tc358768: fix PLL target frequency drm/bridge: tc358768: fix TCLK_ZEROCNT computation drm/bridge: tc358768: Add atomic_get_input_bus_fmts() implementation drm/bridge: tc358768: fix TCLK_TRAILCNT computation drm/bridge: tc358768: fix THS_ZEROCNT computation drm/bridge: tc358768: fix TXTAGOCNT computation drm/bridge: tc358768: fix THS_TRAILCNT computation drm/vram-helper: fix function names in vram helper doc ARM: dts: BCM5301X: Drop "clock-names" from the SPI node ARM: dts: meson8b: correct uart_B and uart_C clock references mm: call arch_swap_restore() from do_swap_page() clk: vc5: Use `clamp()` to restrict PLL range bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page clk: vc5: Fix .driver_data content in i2c_device_id clk: vc7: Fix .driver_data content in i2c_device_id clk: rs9: Fix .driver_data content in i2c_device_id Input: adxl34x - do not hardcode interrupt trigger type drm: sun4i_tcon: use devm_clk_get_enabled in `sun4i_tcon_init_clocks` drm/panel: sharp-ls043t1le01: adjust mode settings driver: soc: xilinx: use _safe loop iterator to avoid a use after free ASoC: Intel: sof_sdw: remove SOF_SDW_TGL_HDMI for MeteorLake devices drm/vkms: isolate pixel conversion functionality drm: Add fixed-point helper to get rounded integer values drm/vkms: Fix RGB565 pixel conversion ARM: dts: stm32: Move ethernet MAC EEPROM from SoM to carrier boards bus: ti-sysc: Fix dispc quirk masking bool variables arm64: dts: microchip: sparx5: do not use PSCI on reference boards 
drm/bridge: tc358767: Switch to devm MIPI-DSI helpers clk: imx: scu: use _safe list iterator to avoid a use after free hwmon: (f71882fg) prevent possible division by zero RDMA/bnxt_re: Disable/kill tasklet only if it is enabled RDMA/bnxt_re: Fix to remove unnecessary return labels RDMA/bnxt_re: Use unique names while registering interrupts RDMA/bnxt_re: Remove a redundant check inside bnxt_re_update_gid RDMA/bnxt_re: Fix to remove an unnecessary log drm/msm/dsi: don't allow enabling 14nm VCO with unprogrammed rate drm/msm/disp/dpu: get timing engine status from intf status register drm/msm/dpu: Set DPU_DATA_HCTL_EN for in INTF_SC7180_MASK iommu/virtio: Detach domain on endpoint release iommu/virtio: Return size mapped for a detached domain clk: renesas: rzg2l: Fix CPG_SIPLL5_CLK1 register write ARM: dts: gta04: Move model property out of pinctrl node drm/bridge: anx7625: Convert to i2c's .probe_new() drm/bridge: anx7625: Prevent endless probe loop ARM: dts: qcom: msm8974: do not use underscore in node name (again) arm64: dts: qcom: msm8916: correct camss unit address arm64: dts: qcom: msm8916: correct MMC unit address arm64: dts: qcom: msm8994: correct SPMI unit address arm64: dts: qcom: msm8996: correct camss unit address arm64: dts: qcom: sdm630: correct camss unit address arm64: dts: qcom: sdm845: correct camss unit address arm64: dts: qcom: sm8350: Add GPI DMA compatible fallback arm64: dts: qcom: sm8350: correct DMA controller unit address arm64: dts: qcom: sdm845-polaris: add missing touchscreen child node reg arm64: dts: qcom: apq8016-sbc: Fix regulator constraints arm64: dts: qcom: apq8016-sbc: Fix 1.8V power rail on LS expansion drm/bridge: Introduce pre_enable_prev_first to alter bridge init order drm/bridge: ti-sn65dsi83: Fix enable/disable flow to meet spec drm/panel: simple: fix active size for Ampire AM-480272H3TMQW-T01H ARM: ep93xx: fix missing-prototype warnings ARM: omap2: fix missing tick_broadcast() prototype arm64: dts: qcom: pm7250b: add 
missing spmi-vadc include arm64: dts: qcom: apq8096: fix fixed regulator name property arm64: dts: mediatek: mt8183: Add mediatek,broken-save-restore-fw to kukui ARM: dts: stm32: Shorten the AV96 HDMI sound card name memory: brcmstb_dpfe: fix testing array offset after use ARM: dts: qcom: apq8074-dragonboard: Set DMA as remotely controlled ASoC: es8316: Increment max value for ALC Capture Target Volume control ASoC: es8316: Do not set rate constraints for unsupported MCLKs ARM: dts: meson8: correct uart_B and uart_C clock references soc/fsl/qe: fix usb.c build errors RDMA/irdma: avoid fortify-string warning in irdma_clr_wqes IB/hfi1: Fix wrong mmu_node used for user SDMA packet after invalidate RDMA/hns: Fix hns_roce_table_get return value ARM: dts: iwg20d-q7-common: Fix backlight pwm specifier arm64: dts: renesas: ulcb-kf: Remove flow control for SCIF1 drm/msm/dpu: set DSC flush bit correctly at MDP CTL flush register fbdev: omapfb: lcd_mipid: Fix an error handling path in mipid_spi_probe() arm64: dts: ti: k3-j7200: Fix physical address of pin Input: pm8941-powerkey - fix debounce on gen2+ PMICs ARM: dts: stm32: Fix audio routing on STM32MP15xx DHCOM PDK2 ARM: dts: stm32: fix i2s endpoint format property for stm32mp15xx-dkx hwmon: (gsc-hwmon) fix fan pwm temperature scaling hwmon: (pmbus/adm1275) Fix problems with temperature monitoring on ADM1272 ARM: dts: BCM5301X: fix duplex-full => full-duplex clk: Export clk_hw_forward_rate_request() drm/amd/display: Fix a test CalculatePrefetchSchedule() drm/amd/display: Fix a test dml32_rq_dlg_get_rq_reg() drm/amdkfd: Fix potential deallocation of previously deallocated memory. 
soc: mediatek: SVS: Fix MT8192 GPU node name drm/amd/display: Fix artifacting on eDP panels when engaging freesync video mode drm/radeon: fix possible division-by-zero errors HID: uclogic: Modular KUnit tests should not depend on KUNIT=y RDMA/rxe: Add ibdev_dbg macros for rxe RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_mw.c RDMA/rxe: Fix access checks in rxe_check_bind_mw amdgpu: validate offset_in_bo of drm_amdgpu_gem_va drm/msm/a5xx: really check for A510 in a5xx_gpu_init RDMA/bnxt_re: wraparound mbox producer index RDMA/bnxt_re: Avoid calling wake_up threads from spin_lock context clk: imx: clk-imxrt1050: fix memory leak in imxrt1050_clocks_probe clk: imx: clk-imx8mn: fix memory leak in imx8mn_clocks_probe clk: imx93: fix memory leak and missing unwind goto in imx93_clocks_probe clk: imx: clk-imx8mp: improve error handling in imx8mp_clocks_probe() arm64: dts: qcom: sdm845: Flush RSC sleep & wake votes arm64: dts: qcom: sm8250-edo: Panel framebuffer is 2.5k instead of 4k clk: bcm: rpi: Fix off by one in raspberrypi_discover_clocks() clk: clocking-wizard: Fix Oops in clk_wzrd_register_divider() clk: tegra: tegra124-emc: Fix potential memory leak ALSA: ac97: Fix possible NULL dereference in snd_ac97_mixer drm/msm/dpu: do not enable color-management if DSPPs are not available drm/msm/dpu: Fix slice_last_group_size calculation drm/msm/dsi: Use DSC slice(s) packet size to compute word count drm/msm/dsi: Flip greater-than check for slice_count and slice_per_intf drm/msm/dsi: Remove incorrect references to slice_count drm/msm/dp: Free resources after unregistering them arm64: dts: mediatek: Add cpufreq nodes for MT8192 arm64: dts: mediatek: mt8192: Fix CPUs capacity-dmips-mhz drm/amdgpu: Fix memcpy() in sienna_cichlid_append_powerplay_table function. 
drm/amdgpu: Fix usage of UMC fill record in RAS drm/msm/dpu: correct MERGE_3D length clk: vc5: check memory returned by kasprintf() clk: cdce925: check return value of kasprintf() clk: si5341: return error if one synth clock registration fails clk: si5341: check return value of {devm_}kasprintf() clk: si5341: free unused memory on probe failure clk: keystone: sci-clk: check return value of kasprintf() clk: ti: clkctrl: check return value of kasprintf() drivers: meson: secure-pwrc: always enable DMA domain ovl: update of dentry revalidate flags after copy up ASoC: imx-audmix: check return value of devm_kasprintf() clk: Fix memory leak in devm_clk_notifier_register() ARM: dts: lan966x: kontron-d10: fix board reset ARM: dts: lan966x: kontron-d10: fix SPI CS ASoC: amd: acp: clear pdm dma interrupt mask PCI: cadence: Fix Gen2 Link Retraining process PCI: vmd: Reset VMD config register between soft reboots scsi: qedf: Fix NULL dereference in error handling pinctrl: bcm2835: Handle gpiochip_add_pin_range() errors platform/x86: lenovo-yogabook: Fix work race on remove() platform/x86: lenovo-yogabook: Reprobe devices on remove() platform/x86: lenovo-yogabook: Set default keyboard backligh brightness on probe() PCI/ASPM: Disable ASPM on MFD function removal to avoid use-after-free scsi: 3w-xxxx: Add error handling for initialization failure in tw_probe() PCI: pciehp: Cancel bringup sequence if card is not present PCI: ftpci100: Release the clock resources pinctrl: sunplus: Add check for kmalloc PCI: Add pci_clear_master() stub for non-CONFIG_PCI scsi: lpfc: Revise NPIV ELS unsol rcv cmpl logic to drop ndlp based on nlp_state perf bench: Add missing setlocale() call to allow usage of %'d style formatting pinctrl: cherryview: Return correct value if pin in push-pull mode platform/x86: think-lmi: mutex protection around multiple WMI calls platform/x86: think-lmi: Correct System password interface platform/x86: think-lmi: Correct NVME password handling pinctrl:sunplus: Add check 
for kmalloc pinctrl: npcm7xx: Add missing check for ioremap kcsan: Don't expect 64 bits atomic builtins from 32 bits architectures powerpc/interrupt: Don't read MSR from interrupt_exit_kernel_prepare() powerpc/signal32: Force inlining of __unsafe_save_user_regs() and save_tm_user_regs_unsafe() perf script: Fix allocation of evsel->priv related to per-event dump files platform/x86: thinkpad_acpi: Fix lkp-tests warnings for platform profiles perf dwarf-aux: Fix off-by-one in die_get_varname() platform/x86/dell/dell-rbtn: Fix resources leaking on error path perf tool x86: Consolidate is_amd check into single function perf tool x86: Fix perf_env memory leak powerpc/64s: Fix VAS mm use after free pinctrl: microchip-sgpio: check return value of devm_kasprintf() pinctrl: at91-pio4: check return value of devm_kasprintf() powerpc/powernv/sriov: perform null check on iov before dereferencing iov powerpc: simplify ppc_save_regs powerpc: update ppc_save_regs to save current r1 in pt_regs PCI: qcom: Remove PCIE20_ prefix from register definitions PCI: qcom: Sort and group registers and bitfield definitions PCI: qcom: Use lower case for hex PCI: qcom: Use DWC helpers for modifying the read-only DBI registers PCI: qcom: Disable write access to read only registers for IP v2.9.0 riscv: uprobes: Restore thread.bad_cause powerpc/book3s64/mm: Fix DirectMap stats in /proc/meminfo powerpc/mm/dax: Fix the condition when checking if altmap vmemap can cross-boundary PCI: endpoint: Fix Kconfig indent style PCI: endpoint: Fix a Kconfig prompt of vNTB driver PCI: endpoint: functions/pci-epf-test: Fix dma_chan direction PCI: vmd: Fix uninitialized variable usage in vmd_enable_domain() vfio/mdev: Move the compat_class initialization to module init hwrng: virtio - Fix race on data_avail and actual data modpost: remove broken calculation of exception_table_entry size crypto: nx - fix build warnings when DEBUG_FS is not enabled modpost: fix section mismatch message for R_ARM_ABS32 modpost: fix 
section mismatch message for R_ARM_{PC24,CALL,JUMP24} crypto: marvell/cesa - Fix type mismatch warning crypto: jitter - correct health test during initialization modpost: fix off by one in is_executable_section() ARC: define ASM_NL and __ALIGN(_STR) outside #ifdef __ASSEMBLY__ guard crypto: kpp - Add helper to set reqsize crypto: qat - Use helper to set reqsize crypto: qat - unmap buffer before free for DH crypto: qat - unmap buffers before free for RSA NFSv4.2: fix wrong shrinker_id NFSv4.1: freeze the session table upon receiving NFS4ERR_BADSESSION SMB3: Do not send lease break acknowledgment if all file handles have been closed dax: Fix dax_mapping_release() use after free dax: Introduce alloc_dev_dax_id() dax/kmem: Pass valid argument to memory_group_register_static hwrng: st - keep clock enabled while hwrng is registered kbuild: Disable GCOV for *.mod.o efi/libstub: Disable PCI DMA before grabbing the EFI memory map cifs: prevent use-after-free by freeing the cfile later cifs: do all necessary checks for credits within or before locking smb: client: fix broken file attrs with nodfs mounts ksmbd: avoid field overflow warning arm64: sme: Use STR P to clear FFR context field in streaming SVE mode x86/efi: Make efi_set_virtual_address_map IBT safe md/raid1-10: fix casting from randomized structure in raid1_submit_write() USB: serial: option: add LARA-R6 01B PIDs usb: dwc3: gadget: Propagate core init errors to UDC during pullup phy: tegra: xusb: Clear the driver reference in usb-phy dev iio: adc: ad7192: Fix null ad7192_state pointer access iio: adc: ad7192: Fix internal/external clock selection iio: accel: fxls8962af: errata bug only applicable for FXLS8962AF iio: accel: fxls8962af: fixup buffer scan element type Revert "drm/amd/display: edp do not add non-edid timings" mm/mmap: Fix VM_LOCKED check in do_vmi_align_munmap() ALSA: hda/realtek: Enable mute/micmute LEDs and limit mic boost on EliteBook ALSA: hda/realtek: Add quirk for Clevo NPx0SNx ALSA: jack: Fix 
mutex call in snd_jack_report() ALSA: pcm: Fix potential data race at PCM memory allocation helpers block: fix signed int overflow in Amiga partition support block: add overflow checks for Amiga partition support block: change all __u32 annotations to __be32 in affs_hardblocks.h block: increment diskseq on all media change events btrfs: fix race when deleting free space root from the dirty cow roots list SUNRPC: Fix UAF in svc_tcp_listen_data_ready() w1: w1_therm: fix locking behavior in convert_t w1: fix loop in w1_fini() dt-bindings: power: reset: qcom-pon: Only allow reboot-mode pre-pmk8350 f2fs: do not allow to defragment files have FI_COMPRESS_RELEASED sh: j2: Use ioremap() to translate device tree address into kernel memory usb: dwc2: platform: Improve error reporting for problems during .remove() usb: dwc2: Fix some error handling paths serial: 8250: omap: Fix freeing of resources on failed register clk: qcom: mmcc-msm8974: remove oxili_ocmemgx_clk clk: qcom: camcc-sc7180: Add parent dependency to all camera GDSCs clk: qcom: gcc-ipq6018: Use floor ops for sdcc clocks clk: qcom: gcc-qcm2290: Mark RCGs shared where applicable media: usb: Check az6007_read() return value media: amphion: drop repeated codec data for vc1l format media: amphion: drop repeated codec data for vc1g format media: amphion: initiate a drain of the capture queue in dynamic resolution change media: videodev2.h: Fix struct v4l2_input tuner index comment media: usb: siano: Fix warning due to null work_func_t function pointer media: i2c: Correct format propagation for st-mipid02 media: hi846: fix usage of pm_runtime_get_if_in_use() media: mediatek: vcodec: using decoder status instead of core work count clk: qcom: reset: support resetting multiple bits clk: qcom: ipq6018: fix networking resets clk: qcom: dispcc-qcm2290: Fix BI_TCXO_AO handling clk: qcom: dispcc-qcm2290: Fix GPLL0_OUT_DIV handling clk: qcom: mmcc-msm8974: use clk_rcg2_shared_ops for mdp_clk_src clock staging: vchiq_arm: mark 
vchiq_platform_init() static usb: dwc3: qcom: Fix potential memory leak usb: gadget: u_serial: Add null pointer check in gserial_suspend extcon: Fix kernel doc of property fields to avoid warnings extcon: Fix kernel doc of property capability fields to avoid warnings usb: phy: phy-tahvo: fix memory leak in tahvo_usb_probe() usb: hide unused usbfs_notify_suspend/resume functions usb: misc: eud: Fix eud sysfs path (use 'qcom_eud') serial: core: lock port for stop_rx() in uart_suspend_port() serial: 8250: lock port for stop_rx() in omap8250_irq() serial: core: lock port for start_rx() in uart_resume_port() serial: 8250: lock port for UART_IER access in omap8250_irq() kernfs: fix missing kernfs_idr_lock to remove an ID from the IDR lkdtm: replace ll_rw_block with submit_bh i3c: master: svc: fix cpu schedule in spin lock coresight: Fix loss of connection info when a module is unloaded mfd: rt5033: Drop rt5033-battery sub-device media: venus: helpers: Fix ALIGN() of non power of two media: atomisp: gmin_platform: fix out_len in gmin_get_config_dsm_var() sh: Avoid using IRQ0 on SH3 and SH4 gfs2: Fix duplicate should_fault_in_pages() call f2fs: fix potential deadlock due to unpaired node_write lock use f2fs: fix to avoid NULL pointer dereference f2fs_write_end_io() KVM: s390: fix KVM_S390_GET_CMMA_BITS for GFNs in memslot holes usb: dwc3: qcom: Release the correct resources in dwc3_qcom_remove() usb: dwc3: qcom: Fix an error handling path in dwc3_qcom_probe() usb: common: usb-conn-gpio: Set last role to unknown before initial detection usb: dwc3-meson-g12a: Fix an error handling path in dwc3_meson_g12a_probe() mfd: wcd934x: Fix an error handling path in wcd934x_slim_probe() mfd: intel-lpss: Add missing check for platform_get_resource Revert "usb: common: usb-conn-gpio: Set last role to unknown before initial detection" serial: 8250_omap: Use force_suspend and resume for system suspend device property: Fix documentation for fwnode_get_next_parent() device property: Clarify 
description of returned value in some functions drivers: fwnode: fix fwnode_irq_get[_byname]() nvmem: sunplus-ocotp: release otp->clk before return nvmem: rmem: Use NVMEM_DEVID_AUTO bus: fsl-mc: don't assume child devices are all fsl-mc devices mfd: stmfx: Fix error path in stmfx_chip_init mfd: stmfx: Nullify stmfx->vdd in case of error KVM: s390: vsie: fix the length of APCB bitmap KVM: s390/diag: fix racy access of physical cpu number in diag 9c handler cpufreq: mediatek: correct voltages for MT7622 and MT7623 misc: fastrpc: check return value of devm_kasprintf() clk: qcom: mmcc-msm8974: fix MDSS_GDSC power flags hwtracing: hisi_ptt: Fix potential sleep in atomic context mfd: stmpe: Only disable the regulators if they are enabled phy: tegra: xusb: check return value of devm_kzalloc() lib/bitmap: drop optimization of bitmap_{from,to}_arr64 pwm: imx-tpm: force 'real_period' to be zero in suspend pwm: sysfs: Do not apply state to already disabled PWMs pwm: ab8500: Fix error code in probe() pwm: mtk_disp: Fix the disable flow of disp_pwm md/raid10: fix the condition to call bio_end_io_acct() rtc: st-lpc: Release some resources in st_rtc_probe() in case of error drm/i915/psr: Use hw.adjusted mode when calculating io/fast wake times drm/i915/guc/slpc: Apply min softlimit correctly f2fs: check return value of freeze_super() media: cec: i2c: ch7322: also select REGMAP sctp: fix potential deadlock on &net->sctp.addr_wq_lock net/sched: act_ipt: add sanity checks on table name and hook locations net: add a couple of helpers for iph tot_len net/sched: act_ipt: add sanity checks on skb before calling target spi: spi-geni-qcom: enable SPI_CONTROLLER_MUST_TX for GPI DMA mode net: mscc: ocelot: don't report that RX timestamping is enabled by default net: mscc: ocelot: don't keep PTP configuration of all ports in single structure net: dsa: felix: don't drop PTP frames with tag_8021q when RX timestamping is disabled net: dsa: sja1105: always enable the INCL_SRCPT option net: dsa: 
tag_sja1105: always prefer source port information from INCL_SRCPT Add MODULE_FIRMWARE() for FIRMWARE_TG357766. Bluetooth: fix invalid-bdaddr quirk for non-persistent setup Bluetooth: ISO: use hci_sync for setting CIG parameters Bluetooth: MGMT: add CIS feature bits to controller information Bluetooth: MGMT: Use BIT macro when defining bitfields Bluetooth: MGMT: Fix marking SCAN_RSP as not connectable ibmvnic: Do not reset dql stats on NON_FATAL err net: dsa: vsc73xx: fix MTU configuration mlxsw: minimal: fix potential memory leak in mlxsw_m_linecards_init spi: bcm-qspi: return error if neither hif_mspi nor mspi is available drm/amdgpu: fix number of fence calculations drm/amd: Don't try to enable secure display TA multiple times mailbox: ti-msgmgr: Fill non-message tx data fields with 0x0 f2fs: fix error path handling in truncate_dnode() octeontx2-af: Fix mapping for NIX block from CGX connection octeontx2-af: Add validation before accessing cgx and lmac ntfs: Fix panic about slab-out-of-bounds caused by ntfs_listxattr() powerpc: allow PPC_EARLY_DEBUG_CPM only when SERIAL_CPM=y powerpc: dts: turris1x.dts: Fix PCIe MEM size for pci2 node net: bridge: keep ports without IFF_UNICAST_FLT in BR_PROMISC mode net: dsa: tag_sja1105: fix source port decoding in vlan_filtering=0 bridge mode net: fix net_dev_start_xmit trace event vs skb_transport_offset() tcp: annotate data races in __tcp_oow_rate_limited() bpf, btf: Warn but return no error for NULL btf from __register_btf_kfunc_id_set() xsk: Honor SO_BINDTODEVICE on bind net/sched: act_pedit: Add size check for TCA_PEDIT_PARMS_EX fanotify: disallow mount/sb marks on kernel internal pseudo fs riscv: move memblock_allow_resize() after linear mapping is ready pptp: Fix fib lookup calls. 
net: dsa: tag_sja1105: fix MAC DA patching from meta frames net: dsa: sja1105: always enable the send_meta options octeontx-af: fix hardware timestamp configuration afs: Fix accidental truncation when storing data s390/qeth: Fix vipa deletion sh: dma: Fix DMA channel offset calculation apparmor: fix missing error check for rhashtable_insert_fast i2c: xiic: Don't try to handle more interrupt events after error dm: fix undue/missing spaces dm: avoid split of quoted strings where possible dm ioctl: have constant on the right side of the test dm ioctl: Avoid double-fetch of version extcon: usbc-tusb320: Convert to i2c's .probe_new() extcon: usbc-tusb320: Unregister typec port on driver removal btrfs: do not BUG_ON() on tree mod log failure at balance_level() i2c: qup: Add missing unwind goto in qup_i2c_probe() irqchip/loongson-pch-pic: Fix potential incorrect hwirq assignment NFSD: add encoding of op_recall flag for write delegation irqchip/loongson-pch-pic: Fix initialization of HT vector register io_uring: wait interruptibly for request completions on exit mmc: core: disable TRIM on Kingston EMMC04G-M627 mmc: core: disable TRIM on Micron MTFC4GACAJCN-1M mmc: mmci: Set PROBE_PREFER_ASYNCHRONOUS mmc: sdhci: fix DMA configure compatibility issue when 64bit DMA mode is used. 
wifi: cfg80211: fix regulatory disconnect for non-MLO wifi: ath10k: Serialize wake_tx_queue ops wifi: mt76: mt7921e: fix init command fail with enabled device bcache: fixup btree_cache_wait list damage bcache: Remove unnecessary NULL point check in node allocations bcache: Fix __bch_btree_node_alloc to make the failure behavior consistent watch_queue: prevent dangling pipe pointer um: Use HOST_DIR for mrproper integrity: Fix possible multiple allocation in integrity_inode_get() autofs: use flexible array in ioctl structure mm/damon/ops-common: atomically test and clear young on ptes and pmds shmem: use ramfs_kill_sb() for kill_sb method of ramfs-based tmpfs jffs2: reduce stack usage in jffs2_build_xattr_subsystem() fs: avoid empty option when generating legacy mount string ext4: Remove ext4 locking of moved directory Revert "f2fs: fix potential corruption when moving a directory" fs: Establish locking order for unrelated directories fs: Lock moved directories i2c: nvidia-gpu: Add ACPI property to align with device-tree i2c: nvidia-gpu: Remove ccgx,firmware-build property usb: typec: ucsi: Mark dGPUs as DEVICE scope ipvs: increase ip_vs_conn_tab_bits range for 64BIT btrfs: add handling for RAID1C23/DUP to btrfs_reduce_alloc_profile btrfs: delete unused BGs while reclaiming BGs btrfs: bail out reclaim process if filesystem is read-only btrfs: add block-group tree to lockdep classes btrfs: reinsert BGs failed to reclaim btrfs: fix race when deleting quota root from the dirty cow roots list btrfs: fix extent buffer leak after tree mod log failure at split_node() btrfs: do not BUG_ON() on tree mod log failure at __btrfs_cow_block() ASoC: mediatek: mt8173: Fix irq error path ASoC: mediatek: mt8173: Fix snd_soc_component_initialize error path regulator: tps65219: Fix matching interrupts for their regulators ARM: dts: qcom: ipq4019: fix broken NAND controller properties override ARM: orion5x: fix d2net gpio initialization leds: trigger: netdev: Recheck 
NETDEV_LED_MODE_LINKUP on dev rename blktrace: use inline function for blk_trace_remove() while blktrace is disabled fs: no need to check source xfs: explicitly specify cpu when forcing inodegc delayed work to run immediately xfs: check that per-cpu inodegc workers actually run on that cpu xfs: disable reaping in fscounters scrub xfs: fix xfs_inodegc_stop racing with mod_delayed_work mm/mmap: Fix extra maple tree write drm/i915: Fix TypeC mode initialization during system resume drm/i915/tc: Fix TC port link ref init for DP MST during HW readout drm/i915/tc: Fix system resume MST mode restore for DP-alt sinks mtd: parsers: refer to ARCH_BCMBCA instead of ARCH_BCM4908 netfilter: nf_tables: unbind non-anonymous set if rule construction fails netfilter: conntrack: Avoid nf_ct_helper_hash uses after free netfilter: nf_tables: do not ignore genmask when looking up chain by id netfilter: nf_tables: prevent OOB access in nft_byteorder_eval wireguard: queueing: use saner cpu selection wrapping wireguard: netlink: send staged packets when setting initial private key tty: serial: fsl_lpuart: add earlycon for imx8ulp platform block/partition: fix signedness issue for Amiga partitions sh: mach-r2d: Handle virq offset in cascaded IRL demux sh: mach-highlander: Handle virq offset in cascaded IRL demux sh: mach-dreamcast: Handle virq offset in cascaded IRQ demux sh: hd64461: Handle virq offset for offchip IRQ base and HD64461 IRQ io_uring: Use io_schedule* in cqring wait Linux 6.1.39 Change-Id: I5867c943c99c157fa599ecd08da961c632e58302 Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
583a8426ab |
kallsyms: Fix kallsyms_selftest failure
commit 33f0467fe06934d5e4ea6e24ce2b9c65ce618e26 upstream. Kernel test robot reported a kallsyms_test failure when clang lto is enabled (thin or full) and CONFIG_KALLSYMS_SELFTEST is also enabled. I can reproduce in my local environment with the following error message with thin lto: [ 1.877897] kallsyms_selftest: Test for 1750th symbol failed: (tsc_cs_mark_unstable) addr=ffffffff81038090 [ 1.877901] kallsyms_selftest: abort It appears that commit 8cc32a9bbf29 ("kallsyms: strip LTO-only suffixes from promoted global functions") caused the failure. Commit 8cc32a9bbf29 changed cleanup_symbol_name() to strip from ".llvm." instead of from '.', where ".llvm." is appended to the name of a symbol that was local before LTO optimization. We need to propagate such knowledge to kallsyms_selftest.c as well. Furthermore, compare_symbol_name() in kallsyms.c needs to change as well. In scripts/kallsyms.c, kallsyms_names and kallsyms_seqs_of_names are used to record the symbol names themselves and the indices to symbol names, respectively. For example: kallsyms_names: ... __amd_smn_rw._entry <== seq 1000 __amd_smn_rw._entry.5 <== seq 1001 __amd_smn_rw.llvm.<hash> <== seq 1002 ... kallsyms_seqs_of_names are sorted based on cleanup_symbol_name() though, so the order in kallsyms_seqs_of_names actually has index 1000: seq 1002 <== __amd_smn_rw.llvm.<hash> (actual symbol comparison using '__amd_smn_rw') index 1001: seq 1000 <== __amd_smn_rw._entry index 1002: seq 1001 <== __amd_smn_rw._entry.5 Let us say that at a particular point, at index 1000, symbol '__amd_smn_rw.llvm.<hash>' is being compared to '__amd_smn_rw._entry', where '__amd_smn_rw._entry' is the one being searched for, e.g., with function kallsyms_on_each_match_symbol(). The current implementation will find that '__amd_smn_rw._entry' is less than '__amd_smn_rw.llvm.<hash>', then continue searching lower indices (e.g., index 999) and never find a match, even though index 1001 is a match. To fix this issue, let us do cleanup_symbol_name() first and then do the comparison.
In the above case, '__amd_smn_rw._entry' is compared against '__amd_smn_rw' and is greater, so the next comparison is at an index > 1000; eventually index 1001 is hit and a match is found. For any symbols not having a '.llvm.' substring, there is no functional change in compare_symbol_name(). Fixes: 8cc32a9bbf29 ("kallsyms: strip LTO-only suffixes from promoted global functions") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202308232200.1c932a90-oliver.sang@intel.com Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Reviewed-by: Song Liu <song@kernel.org> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Link: https://lore.kernel.org/r/20230825034659.1037627-1-yonghong.song@linux.dev Cc: stable@vger.kernel.org Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
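A minimal user-space sketch of the ordering problem described above, assuming that cleanup_symbol_name() simply truncates a name at its first ".llvm." (the only rule the message states). The helpers mirror the names in the message but are simplified stand-ins, not the kernel's implementations, and "__a" and ".llvm.123" are made-up symbol names. With cleanup enabled, the binary search finds '__amd_smn_rw._entry'; comparing the raw stored names walks the wrong way and misses it.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Simplified stand-in: drop everything from the first ".llvm." onward. */
static void cleanup_symbol_name(char *s)
{
	char *res = strstr(s, ".llvm.");

	if (res)
		*res = '\0';
}

/* Compare the searched name with a stored name, optionally cleaning the stored one first. */
static int compare_symbol_name(const char *name, const char *stored, bool cleanup)
{
	char buf[128];
	size_t len = strlen(stored);

	assert(len < sizeof(buf));
	memcpy(buf, stored, len + 1);
	if (cleanup)
		cleanup_symbol_name(buf);
	return strcmp(name, buf);
}

/* Binary search over names sorted by their cleaned-up form; returns index or -1. */
static int find_symbol(const char *const *sorted, int n, const char *name, bool cleanup)
{
	int low = 0, high = n - 1;

	while (low <= high) {
		int mid = low + (high - low) / 2;
		int ret = compare_symbol_name(name, sorted[mid], cleanup);

		if (ret > 0)
			low = mid + 1;
		else if (ret < 0)
			high = mid - 1;
		else
			return mid;
	}
	return -1;
}
```

For the array {"__a", "__amd_smn_rw.llvm.123", "__amd_smn_rw._entry"} (sorted by cleaned names), searching for "__amd_smn_rw._entry" returns 2 with cleanup and -1 without it, because '_' sorts before 'l' in the raw comparison.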
||
|
b3d099df68 |
lockdep: fix static memory detection even more
commit 0a6b58c5cd0dfd7961e725212f0fc8dfc5d96195 upstream. On the parisc architecture, lockdep reports this warning for all static objects in the __initdata section (e.g. "setup_done" in devtmpfs, "kthreadd_done" in init/main.c): INFO: trying to register non-static key. The warning itself is wrong, because those objects are in the __initdata section, but on parisc that section lies outside the range from _stext to _end, which is why the static_obj() function returns a wrong answer. While fixing this issue, I noticed that the whole existing check can be simplified a lot. Instead of checking against the _stext and _end symbols (which include code areas too), just check for the .data and .bss segments (since we check a data object). This can be done with the existing is_kernel_core_data() macro. In addition, objects in the __initdata section can be checked with init_section_contains(), and is_kernel_rodata() allows keys to be in the _ro_after_init section. This partly reverts and simplifies commit |
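To illustrate why the old range check misfires, here is a toy model with made-up addresses, assuming an architecture where init data sits outside [_stext, _end). The struct and helpers are hypothetical; the kernel's static_obj() uses is_kernel_core_data(), is_kernel_rodata() and init_section_contains() and also handles other cases (such as per-CPU and module data) that this sketch omits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Half-open address range [start, end). */
struct range {
	uintptr_t start, end;
};

/* Hypothetical kernel image layout; all values are made up for illustration. */
struct layout {
	struct range stext_to_end;	/* old check: _stext .. _end (code included) */
	struct range core_data;		/* .data + .bss */
	struct range rodata;		/* read-only data, incl. ro_after_init */
	struct range init;		/* init sections, incl. __initdata */
};

static bool in_range(const struct range *r, uintptr_t addr)
{
	return addr >= r->start && addr < r->end;
}

/* Old approach: anything between _stext and _end counts as static. */
static bool static_obj_old(const struct layout *l, uintptr_t addr)
{
	return in_range(&l->stext_to_end, addr);
}

/* New approach: only data-like sections count, wherever they are placed. */
static bool static_obj_new(const struct layout *l, uintptr_t addr)
{
	return in_range(&l->core_data, addr) ||
	       in_range(&l->rodata, addr) ||
	       in_range(&l->init, addr);
}
```

With init placed above _end, an __initdata address fails the old check but passes the new one, while a code address no longer counts as a static data object.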
||
|
207e228bf1 |
module: Expose module_init_layout_section()
commit 2abcc4b5a64a65a2d2287ba0be5c2871c1552416 upstream.
module_init_layout_section() chooses whether the core module loader
considers a section as init or not. This affects the placement of the
exit section when module unloading is disabled. This code will never run,
so it can be free()d once the module has been initialised.
arm and arm64 need to count the number of PLTs they need before applying
relocations based on the section name. The init PLTs are stored separately
so they can be free()d. arm and arm64 both use within_module_init() to
decide which list of PLTs to use when applying the relocation.
Because within_module_init()'s behaviour changes when module unloading
is disabled, both architectures would need to take this into account when
counting the PLTs.
Today neither architecture does this, meaning when module unloading is
disabled there are insufficient PLTs in the init section to load some
modules, resulting in warnings:
| WARNING: CPU: 2 PID: 51 at arch/arm64/kernel/module-plts.c:99 module_emit_plt_entry+0x184/0x1cc
| Modules linked in: crct10dif_common
| CPU: 2 PID: 51 Comm: modprobe Not tainted 6.5.0-rc4-yocto-standard-dirty #15208
| Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
| pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : module_emit_plt_entry+0x184/0x1cc
| lr : module_emit_plt_entry+0x94/0x1cc
| sp : ffffffc0803bba60
[...]
| Call trace:
| module_emit_plt_entry+0x184/0x1cc
| apply_relocate_add+0x2bc/0x8e4
| load_module+0xe34/0x1bd4
| init_module_from_file+0x84/0xc0
| __arm64_sys_finit_module+0x1b8/0x27c
| invoke_syscall.constprop.0+0x5c/0x104
| do_el0_svc+0x58/0x160
| el0_svc+0x38/0x110
| el0t_64_sync_handler+0xc0/0xc4
| el0t_64_sync+0x190/0x194
Instead of duplicating module_init_layout_section()'s logic, expose it.
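The rule described above can be sketched as follows. This assumes, as a simplification, that a section is "init" when its name starts with ".init", and that ".exit" sections are treated as init only when module unloading is disabled; the runtime flag stands in for the CONFIG_MODULE_UNLOAD build option, and count_init_plts() is a hypothetical helper showing why PLT counting and relocation must share one predicate.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

static bool starts_with(const char *str, const char *prefix)
{
	return strncmp(str, prefix, strlen(prefix)) == 0;
}

/* Simplified predicate: module_unload models CONFIG_MODULE_UNLOAD. */
static bool module_init_layout_section(const char *sname, bool module_unload)
{
	/* Without unloading, exit code never runs, so it can be freed with init. */
	if (!module_unload && starts_with(sname, ".exit"))
		return true;
	return starts_with(sname, ".init");
}

/* Hypothetical: one PLT per relocated section; count those placed in init. */
static int count_init_plts(const char *const *sections, int n, bool module_unload)
{
	int init_plts = 0;

	for (int i = 0; i < n; i++)
		if (module_init_layout_section(sections[i], module_unload))
			init_plts++;
	return init_plts;
}
```

If the counting pass used a different predicate than the relocation pass, the two would disagree about ".exit" sections exactly when unloading is disabled, which is the mismatch behind the warning above.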
Reported-by: Adam Johnston <adam.johnston@arm.com>
Fixes:
|
||
|
d3ff67076b |
cgroup/cpuset: Free DL BW in case can_attach() fails
commit 2ef269ef1ac006acf974793d975539244d77b28f upstream. cpuset_can_attach() can fail. Postpone DL BW allocation until all tasks have been checked. DL BW is not allocated per-task but as a sum over all DL tasks migrating. If multiple controllers are attached to the cgroup next to the cpuset controller, a non-cpuset can_attach() can fail. In this case, free DL BW in cpuset_cancel_attach(). Finally, update the cpuset DL task count (nr_deadline_tasks) only in cpuset_attach(). Suggested-by: Waiman Long <longman@redhat.com> Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
f0135131bb |
sched/deadline: Create DL BW alloc, free & check overflow interface
commit 85989106feb734437e2d598b639991b9185a43a6 upstream. While moving a set of tasks between exclusive cpusets, cpuset_can_attach() -> task_can_attach() calls dl_cpu_busy(..., p) for DL BW overflow checking and per-task DL BW allocation on the destination root_domain for the DL tasks in this set. This approach has the issue of not freeing already allocated DL BW in the following error cases: (1) The set of tasks includes multiple DL tasks and DL BW overflow checking fails for one of the subsequent DL tasks. (2) Another controller next to the cpuset controller which is attached to the same cgroup fails in its can_attach(). To address this problem, rework dl_cpu_busy(): (1) Split it into dl_bw_check_overflow() & dl_bw_alloc() and add a dedicated dl_bw_free(). (2) dl_bw_alloc() & dl_bw_free() take a `u64 dl_bw` parameter instead of the `struct task_struct *p` used in dl_cpu_busy(). This allows DL BW to be allocated for a set of tasks rather than only for a single task. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
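A toy model of the reworked interface and the all-or-nothing flow the cpuset patches build on: check for overflow, allocate the summed bandwidth of all migrating DL tasks once, and free it again if another controller's can_attach() fails afterwards. The toy_* names, the single-budget root domain and the plain integer bandwidth units are assumptions for illustration; the kernel's dl_bw_check_overflow(), dl_bw_alloc() and dl_bw_free() have different signatures and per-CPU/root-domain plumbing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;

/* Toy root domain: one bandwidth budget and the amount already handed out. */
struct toy_rd {
	u64 cap;
	u64 total_bw;
};

static bool toy_dl_bw_check_overflow(const struct toy_rd *rd, u64 dl_bw)
{
	return rd->total_bw + dl_bw > rd->cap;
}

static int toy_dl_bw_alloc(struct toy_rd *rd, u64 dl_bw)
{
	if (toy_dl_bw_check_overflow(rd, dl_bw))
		return -EBUSY;
	rd->total_bw += dl_bw;
	return 0;
}

static void toy_dl_bw_free(struct toy_rd *rd, u64 dl_bw)
{
	assert(rd->total_bw >= dl_bw);
	rd->total_bw -= dl_bw;
}

/* can_attach: sum over all migrating DL tasks, then allocate once (all or nothing). */
static int toy_can_attach(struct toy_rd *rd, const u64 *task_bw, int n, u64 *sum_bw)
{
	*sum_bw = 0;
	for (int i = 0; i < n; i++)
		*sum_bw += task_bw[i];
	return toy_dl_bw_alloc(rd, *sum_bw);
}

/* cancel_attach: only called after toy_can_attach() succeeded; gives the bandwidth back. */
static void toy_cancel_attach(struct toy_rd *rd, u64 sum_bw)
{
	toy_dl_bw_free(rd, sum_bw);
}
```

Because nothing is allocated per task, a failure part-way through the set leaves no partial allocation behind, and a later cancel only has to return one summed value.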
||
|
064b960dbe |
cgroup/cpuset: Iterate only if DEADLINE tasks are present
commit c0f78fd5edcf29b2822ac165f9248a6c165e8554 upstream. update_tasks_root_domain currently iterates over all tasks even if no DEADLINE task is present on the cpuset/root domain for which bandwidth accounting is being rebuilt. This has been reported to introduce 10+ ms delays on suspend-resume operations. Skip the costly iteration for cpusets that don't contain DEADLINE tasks. Reported-by: Qais Yousef (Google) <qyousef@layalina.io> Link: https://lore.kernel.org/lkml/20230206221428.2125324-1-qyousef@layalina.io/ Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
d1b4262b78 |
sched/cpuset: Keep track of SCHED_DEADLINE task in cpusets
commit 6c24849f5515e4966d94fa5279bdff4acf2e9489 upstream. Qais reported that iterating over all tasks when rebuilding root domains for finding out which ones are DEADLINE and need their bandwidth correctly restored on such root domains can be a costly operation (10+ ms delays on suspend-resume). To fix the problem keep track of the number of DEADLINE tasks belonging to each cpuset and then use this information (followup patch) to only perform the above iteration if DEADLINE tasks are actually present in the cpuset for which a corresponding root domain is being rebuilt. Reported-by: Qais Yousef (Google) <qyousef@layalina.io> Link: https://lore.kernel.org/lkml/20230206221428.2125324-1-qyousef@layalina.io/ Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
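A minimal sketch of the bookkeeping idea, assuming a per-cpuset counter that is updated whenever a task enters or leaves the cpuset; the rebuild step then skips the task walk entirely when the counter is zero. The toy_* names are hypothetical, and the "cost" returned is just the number of tasks that would be visited.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_cpuset {
	int nr_tasks;
	int nr_deadline_tasks;
};

static void toy_task_enter(struct toy_cpuset *cs, bool is_deadline)
{
	cs->nr_tasks++;
	if (is_deadline)
		cs->nr_deadline_tasks++;
}

static void toy_task_leave(struct toy_cpuset *cs, bool is_deadline)
{
	assert(cs->nr_tasks > 0);
	cs->nr_tasks--;
	if (is_deadline) {
		assert(cs->nr_deadline_tasks > 0);
		cs->nr_deadline_tasks--;
	}
}

/* Number of tasks the root-domain rebuild would walk for this cpuset. */
static int toy_rebuild_walk_cost(const struct toy_cpuset *cs)
{
	if (cs->nr_deadline_tasks == 0)
		return 0;		/* no DL bandwidth to restore: skip the walk */
	return cs->nr_tasks;		/* otherwise every task is visited */
}
```

The counter turns the common no-DEADLINE case into a constant-time check, which is the saving the suspend-resume report was after.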
||
|
9bcfe15278 |
sched/cpuset: Bring back cpuset_mutex
commit 111cd11bbc54850f24191c52ff217da88a5e639b upstream.
Turns out percpu_cpuset_rwsem - commit
|
||
|
7030fbf75f |
cgroup/cpuset: Rename functions dealing with DEADLINE accounting
commit ad3a557daf6915296a43ef97a3e9c48e076c9dd8 upstream. rebuild_root_domains() and update_tasks_root_domain() have neutral names, but actually deal with DEADLINE bandwidth accounting. Rename them to use the 'dl_' prefix so that the intent is clearer. No functional change. Suggested-by: Qais Yousef (Google) <qyousef@layalina.io> Signed-off-by: Juri Lelli <juri.lelli@redhat.com> Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
2cb0c037c9 |
tracing: Fix memleak due to race between current_tracer and trace
[ Upstream commit eecb91b9f98d6427d4af5fdb8f108f52572a39e7 ]
Kmemleak reports a leak in graph_trace_open():
unreferenced object 0xffff0040b95f4a00 (size 128):
comm "cat", pid 204981, jiffies 4301155872 (age 99771.964s)
hex dump (first 32 bytes):
e0 05 e7 b4 ab 7d 00 00 0b 00 01 00 00 00 00 00 .....}..........
f4 00 01 10 00 a0 ff ff 00 00 00 00 65 00 10 00 ............e...
backtrace:
[<000000005db27c8b>] kmem_cache_alloc_trace+0x348/0x5f0
[<000000007df90faa>] graph_trace_open+0xb0/0x344
[<00000000737524cd>] __tracing_open+0x450/0xb10
[<0000000098043327>] tracing_open+0x1a0/0x2a0
[<00000000291c3876>] do_dentry_open+0x3c0/0xdc0
[<000000004015bcd6>] vfs_open+0x98/0xd0
[<000000002b5f60c9>] do_open+0x520/0x8d0
[<00000000376c7820>] path_openat+0x1c0/0x3e0
[<00000000336a54b5>] do_filp_open+0x14c/0x324
[<000000002802df13>] do_sys_openat2+0x2c4/0x530
[<0000000094eea458>] __arm64_sys_openat+0x130/0x1c4
[<00000000a71d7881>] el0_svc_common.constprop.0+0xfc/0x394
[<00000000313647bf>] do_el0_svc+0xac/0xec
[<000000002ef1c651>] el0_svc+0x20/0x30
[<000000002fd4692a>] el0_sync_handler+0xb0/0xb4
[<000000000c309c35>] el0_sync+0x160/0x180
The root cause is described as follows:
__tracing_open() { // 1. File 'trace' is being opened;
...
*iter->trace = *tr->current_trace; // 2. Tracer 'function_graph' is
// currently set;
...
iter->trace->open(iter); // 3. Call graph_trace_open() here,
// and memory is allocated in it;
...
}
s_start() { // 4. The opened file is being read;
...
*iter->trace = *tr->current_trace; // 5. If tracer is switched to
// 'nop' or others, then memory
// in step 3 is leaked!!!
...
}
To fix it, in s_start(), close the old tracer before switching, then reopen
the new tracer after switching. Some tracers, like 'wakeup', may not update
'iter->private' in some cases when reopened, so it should be cleared to
avoid it being mistakenly closed again.
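The fix can be modelled in user space roughly as below. This is a sketch under assumptions: toy_iter, toy_tracer, graph_open()/graph_close() and the live_allocs counter are hypothetical stand-ins for the trace iterator and the function_graph callbacks, and tracer identity is compared by name. Closing the old tracer before copying in the new one is what keeps the allocation from step 3 from leaking.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct toy_iter;

/* Toy tracer: optional open/close callbacks that own iter->private. */
struct toy_tracer {
	const char *name;
	void (*open)(struct toy_iter *iter);
	void (*close)(struct toy_iter *iter);
};

struct toy_iter {
	struct toy_tracer trace;	/* private copy, like *iter->trace */
	void *private;			/* per-open state */
};

static int live_allocs;	/* outstanding allocations, to make leaks visible */

static void graph_open(struct toy_iter *iter)
{
	iter->private = malloc(128);
	assert(iter->private);
	live_allocs++;
}

static void graph_close(struct toy_iter *iter)
{
	free(iter->private);
	live_allocs--;
}

/* s_start()-style switch: close the old tracer, clear private, copy, reopen. */
static void toy_switch_tracer(struct toy_iter *iter, const struct toy_tracer *cur)
{
	if (strcmp(iter->trace.name, cur->name) == 0)
		return;
	if (iter->trace.close)
		iter->trace.close(iter);
	/* The next tracer's open() may not set private; don't leave a stale pointer. */
	iter->private = NULL;
	iter->trace = *cur;
	if (iter->trace.open)
		iter->trace.open(iter);
}
```

Switching from function_graph to nop now drops the outstanding allocation count to zero, whereas a plain struct copy (the old behaviour) would leave it at one.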
Link: https://lore.kernel.org/linux-trace-kernel/20230817125539.1646321-1-zhengyejian1@huawei.com
Fixes:
|
||
|
7d0c2b0de2 |
tracing: Fix cpu buffers unavailable due to 'record_disabled' missed
[ Upstream commit b71645d6af10196c46cbe3732de2ea7d36b3ff6d ]
The trace ring buffer can no longer record anything after executing the
following commands at the shell prompt:
# cd /sys/kernel/tracing
# cat tracing_cpumask
fff
# echo 0 > tracing_cpumask
# echo 1 > snapshot
# echo fff > tracing_cpumask
# echo 1 > tracing_on
# echo "hello world" > trace_marker
-bash: echo: write error: Bad file descriptor
The root cause is that:
1. After `echo 0 > tracing_cpumask`, 'record_disabled' of cpu buffers
in 'tr->array_buffer.buffer' became 1 (see tracing_set_cpumask());
2. After `echo 1 > snapshot`, 'tr->array_buffer.buffer' is swapped
with 'tr->max_buffer.buffer', then the 'record_disabled' became 0
(see update_max_tr());
3. After `echo fff > tracing_cpumask`, the 'record_disabled' becomes -1;
Then array_buffer and max_buffer are both unavailable because their
'record_disabled' values are not 0.
To fix it, enable or disable both array_buffer and max_buffer at the same
time in tracing_set_cpumask().
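The counter arithmetic in the three steps above can be replayed with a toy model: one CPU, one record_disabled counter per buffer, a snapshot that swaps the two buffer pointers, and a cpumask update that adjusts either only the live buffer (old behaviour) or both buffers (the fix). All toy_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_buf {
	int record_disabled;	/* recording works only while this is 0 */
};

struct toy_tr {
	struct toy_buf *array_buffer;
	struct toy_buf *max_buffer;
};

/* Like a snapshot: the live buffer and the max buffer trade places. */
static void toy_snapshot(struct toy_tr *tr)
{
	struct toy_buf *tmp = tr->array_buffer;

	tr->array_buffer = tr->max_buffer;
	tr->max_buffer = tmp;
}

/* Clearing the CPU from the mask disables recording (+1); setting it re-enables (-1). */
static void toy_set_cpumask(struct toy_tr *tr, bool cpu_enabled, bool update_both)
{
	int delta = cpu_enabled ? -1 : 1;

	tr->array_buffer->record_disabled += delta;
	if (update_both)
		tr->max_buffer->record_disabled += delta;
}
```

Replaying the commands with update_both false leaves the live buffer at -1 and the max buffer at 1, matching step 3; with update_both true both counters return to 0.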
Link: https://lkml.kernel.org/r/20230805033816.3284594-2-zhengyejian1@huawei.com
Cc: <mhiramat@kernel.org>
Cc: <vnagarnaik@google.com>
Cc: <shuah@kernel.org>
Fixes:
|
||
|
e1fcc974b1 |
Merge keystone/android14-6.1-keystone-qcom-release.6.1.25 (af4467f) into
qcom-6.1 * refs/heads/tmp-af4467f: ANDROID: ABI: Update STG ABI to format version 2 ANDROID: GKI: Update pixel symbol list for thermal ANDROID: thermal: Add vendor thermal genl check ANDROID: ABI: Update symbol for Exynos SoC ANDROID: GKI: Update mtk ABI symbol list ANDROID: ABI: Update symbol list for imx FROMGIT: Multi-gen LRU: Fix per-zone reclaim ANDROID: GKI: Update abi_gki_aarch64_qcom ANDROID: ABI: Update STG ABI to format version 2 BACKPORT: FROMGIT: irqchip/gic-v3: Workaround for GIC-700 erratum 2941627 ANDROID: ABI: update symbol list for Xclipse GPU ANDROID: drm/ttm: export ttm_tt_unpopulate() ANDROID: fuse-bpf: Add partial flock support ANDROID: Incremental fs: Allocate data buffer based on input request size UPSTREAM: gfs2: Don't deref jdesc in evict ANDROID: KVM: arm64: Fix MMU context save/restore over TLB invalidation ANDROID: Update symbol list for VIVO ANDROID: add initial symbol list file for ExynosAuto SoCs ANDROID: sched: Export sched_domains_mutex for lockdep ANDROID: Update symbol for Exynos SoC ANDROID: ABI: Update symbol for Exynos SoC ANDROID: Update symbol list for mtk UPSTREAM: dma-remap: use kvmalloc_array/kvfree for larger dma memory remap ANDROID: vendor_hooks: Supplement the missing hook call point. 
ANDROID: GKI: Add WWAN as GKI protected module ANDROID: GKI: regmap: Add regmap vendor hook for of_syscon_register UPSTREAM: kasan: suppress recursive reports for HW_TAGS UPSTREAM: kasan, arm64: add arch_suppress_tag_checks_start/stop UPSTREAM: arm64: mte: rename TCO routines BACKPORT: kasan, arm64: rename tagging-related routines UPSTREAM: kasan: drop empty tagging-related defines ANDROID: usb: xhci-plat: Fix double-free in xhci_plat_remove ANDROID: ABI: update symbol list for galaxy ANDROID: GKI: update the ABI symbol list ANDROID: ABI: Update symbol for Exynos SoC ANDROID: GKI: ABI: update whitelist for the kmsg_dump and native_hang symbols used by unisoc for kernel6.1 ANDROID: ABI: Update symbols to unisoc whitelist for ims_bridge module ANDROID: abi_gki_aarch64_qcom: Add drm_plane_from_index and drm_gem_prime_export ANDROID: abi_gki_aarch64_qcom: Update symbol list UPSTREAM: fsverity: reject FS_IOC_ENABLE_VERITY on mode 3 fds UPSTREAM: fsverity: explicitly check for buffer overflow in build_merkle_tree() ANDROID: update unisoc symbol list ANDROID: update symbol for unisoc whitelist UPSTREAM: f2fs: fix deadlock in i_xattr_sem and inode page lock ANDROID: GKI: update xiaomi symbol list Revert "FROMLIST: f2fs: remove i_xattr_sem to avoid deadlock and fix the original issue" ANDROID: ABI: Update pixel symbol list ANDROID: Set arch attribute for allmodconfig builds UPSTREAM: usb: gadget: udc: renesas_usb3: Fix use after free bug in renesas_usb3_remove due to race condition ANDROID: ABI: Add to QCOM symbols list UPSTREAM: arm64: mm: pass original fault address to handle_mm_fault() in PER_VMA_LOCK block UPSTREAM: media: rkvdec: fix use after free bug in rkvdec_remove ANDROID: GKI: Update symbol list for MediatTek UPSTREAM: scsi: ufs: core: Remove dedicated hwq for dev command BACKPORT: scsi: ufs: mcq: Fix the incorrect OCS value for the device command FROMLIST: scsi: ufs: ufs-mediatek: Add MCQ support for MTK platform FROMLIST: scsi: ufs: core: Export symbols for MTK 
driver module UPSTREAM: blk-mq: check on cpu id when there is only one ctx mapping UPSTREAM: relayfs: fix out-of-bounds access in relay_file_read UPSTREAM: net/sched: flower: fix possible OOB write in fl_set_geneve_opt() UPSTREAM: x86/mm: Avoid using set_pgd() outside of real PGD pages UPSTREAM: iommu/amd: Add missing domain type checks UPSTREAM: tty: serial: qcom_geni: avoid duplicate struct member init UPSTREAM: scsi: ufs: core: bsg: Fix cast to restricted __be16 warning UPSTREAM: netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE ANDROID: fix build error when use cpu_cgroup_online vh ANDROID: ABI: add android_debug_symbol to whitelist ANDROID: defconfig: Enable debug_symbol driver ANDROID: android: Create debug_symbols driver ANDROID: ABI: update symbol list for exynos ANDROID: KVM: arm64: Remove 'struct kvm_vcpu' from the KMI UPSTREAM: KVM: arm64: Restore GICv2-on-GICv3 functionality UPSTREAM: KVM: arm64: vgic: Wrap vgic_its_create() with config_lock UPSTREAM: KVM: arm64: vgic: Fix a circular locking issue UPSTREAM: KVM: arm64: vgic: Don't acquire its_lock before config_lock BACKPORT: KVM: arm64: Avoid lock inversion when setting the VM register width UPSTREAM: KVM: arm64: Avoid vcpu->mutex v. 
kvm->lock inversion in CPU_ON BACKPORT: KVM: arm64: Use config_lock to protect data ordered against KVM_RUN UPSTREAM: KVM: arm64: Use config_lock to protect vgic state BACKPORT: KVM: arm64: Add helper vgic_write_guest_lock() ANDROID: sound: usb: Fix wrong behavior of vendor hooking ANDROID: GKI: USB: XHCI: add Android ABI padding to struct xhci_vendor_ops Revert "ANDROID: android: Create debug_symbols driver" ANDROID: android: Create debug_symbols driver UPSTREAM: ipvlan:Fix out-of-bounds caused by unclear skb->cb ANDROID: update symbol list for unisoc vendor hook ANDROID: thermal: Add hook to enable/disable thermal power throttle ANDROID: ABI: Update symbol for Exynos SoC BACKPORT: FROMGIT: usb: gadget: udc: Handle gadget_connect failure during bind operation FROMGIT: usb: dwc3: gadget: Bail out in pullup if soft reset timeout happens ANDROID: GKI: Update symbol list for xiaomi ANDROID: vendor_hooks: vendor hook for MM ANDROID: add a symbol to unisoc symbol list ANDROID: GKI: update symbol list file for xiaomi UPSTREAM: net/sched: cls_u32: Fix reference counter leak leading to overflow ANDROID: db845c: Fix build when using --kgdb FROMGIT: usb: host: xhci-plat: Set XHCI_STATE_REMOVING before resuming XHCI HC FROMGIT: usb: host: xhci: Do not re-initialize the XHCI HC if being removed FROMLIST: kheaders: dereferences the source tree FROMLIST: f2fs: remove i_xattr_sem to avoid deadlock and fix the original issue ANDROID: db845c: Local define for db845c targets ANDROID: GKI: Update symbols to symbol list ANDROID: Export memcg functions to allow module to add new files ANDROID: rockpi4: Fix build when using --kgdb ANDROID: GKI: update symbol list file for xiaomi ANDROID: kleaf: android/gki_system_dlkm_modules is generated. ANDROID: ABI: Update pixel symbol list ANDROID: fuse-bpf: Move FUSE_RELEASE to correct place ANDROID: fuse-bpf: Ensure bpf field can never be nulled ANDROID: GKI: Increase CMA areas to 32 ANDROID: Delete MODULES_LIST from build configs. 
ANDROID: ABI: Update symbols to unisoc whitelist ANDROID: HID: Only utilise UHID provided exports if UHID is enabled Conflicts: BUILD.bazel Change-Id: Ibeee32bbc28dd5ad943cfb512ae73094cce2027c Upstream-Build: ks_qcom-android14-6.1-keystone-qcom-release@10659679 UKQ2.230815.001 Signed-off-by: jianzhou <quic_jianzhou@quicinc.com> |
||
|
ad947e0d93 | Merge "sched/walt: Fix cluster index swap" | ||
|
7551a1a2a1 |
ANDROID: cgroup: Add android_rvh_cgroup_force_kthread_migration
In Android GKI, CONFIG_FAIR_GROUP_SCHED is enabled [1] to help
prioritize important work. Given that the CPU shares of the root cgroup
can't be changed, leaving tasks inside the root cgroup gives them a
higher share than tasks inside important cgroups. This is mitigated by
moving all tasks inside the root cgroup to a different cgroup after
Android has booted. However, many kernel tasks remain stuck in the root
cgroup after boot.
It is possible to relax the restrictions on migrating kernel threads and
kworkers in certain scenarios. However, the patch [2] posted upstream was
not accepted. Hence, add a restricted vendor hook to notify modules when a
kernel thread is requested for cgroup migration. The modules can relax
the restrictions enforced by the kernel and allow the cgroup migration.
[1]
|
||
|
7afa84fbb9 |
ANDROID: vendor_hooks: Add hooks for waking up and exiting control
Add hooks in the process wake-up and exit routines so that OEMs can control these procedures. One possible benefit is that peaks in system load can be shaved and load kept smoother when a large number of threads are killed at once, since a sudden peak in system load can lead to user-visible jank. Bug: 296493318 Change-Id: Ide5f9e63a4f50d6a9e3ffbc9516de9ce48ededef Signed-off-by: xieliujie <xieliujie@oppo.com> |
||
|
e42c084b1f | Merge "sched/walt: Ensure always in state2 if no partial halted CPUs" | ||
|
8ac97c5dab |
sched/walt: Fix cluster index swap
On a 4-cluster system where the gold- cluster has a lower max capacity than the gold cluster, the positions of the sched_clusters are swapped to ensure the preset order. However, the cluster IDs themselves were set incorrectly. Fix this by ensuring the cluster IDs are swapped accordingly. Change-Id: I43447a5c1cf89bdbbc7f3ed4eff1a970ada1e3a7 Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com> |
||
|
2d7f87b0ff |
ANDROID: vendor_hooks:vendor hook for percpu-rwsem
We need a new vendor hook for two reasons: 1. The position of the previous vendor hook is inappropriate: when the task wakes up from percpu_rwsem_wait, it enters a long runnable state, which causes frame loss when the application starts. To solve this problem, we need to let the process enter the "vip" queue when it is woken up, so we need to set a flag on the process to indicate that it is about to hold the lock. The flag should be set at the beginning of percpu_down_read/percpu_down_write rather than at the end. 2. Most of this long runnable state occurs in the cgroup_threadgroup_rwsem, so we only care about cgroup_threadgroup_rwsem, and cgroup_threadgroup_rwsem should be exported. At the same time, one more parameter, "struct percpu_rw_semaphore *sem", is needed for this vendor hook. Bug: 294496814 Change-Id: I5f014cfb68a60c29bbfd21452336e381e31e81b1 Signed-off-by: liuxudong5 <liuxudong5@xiaomi.com> |
||
|
7acd4b7dba |
sched/walt: Ensure always in state2 if no partial halted CPUs
is_state1 is supposed to return true if and only if all CPUs that are capable of being partially halted are either partially halted or fully halted. In the event that a system has no CPUs capable of being partially halted, meaning min_partial_cpus is defined as 0, the expectation is for is_state1 to return false. However, cpumask_subset() returns true if the first source CPU mask is empty. This leads to unexpected behavior: when min_partial_cpus is set to 0, the system is always in state1, with the side effect that frequencies are never synced. Fix this by ensuring that if the number of CPUs capable of being partially halted is 0, is_state1 returns false, so that in such a system state2, and therefore frequency sync, is always the norm. Change-Id: I2fb7cf27659d42fe713bdacf08db9b7c88c06800 Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com> |
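A minimal sketch of the corrected predicate, using 64-bit masks in place of struct cpumask. The function signature and mask layout are assumptions for illustration; the point is that the subset test alone is vacuously true for an empty mask, so the empty case must be rejected first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bitmask stand-in for cpumask_subset(): every bit of a is also set in b. */
static bool mask_subset(uint64_t a, uint64_t b)
{
	return (a & ~b) == 0;
}

/*
 * State1 iff there is at least one partial-halt-capable CPU and every such
 * CPU is either partially or fully halted.
 */
static bool is_state1(uint64_t partial_capable, uint64_t partial_halted,
		      uint64_t full_halted)
{
	if (partial_capable == 0)
		return false;	/* empty mask would otherwise be a trivial subset */
	return mask_subset(partial_capable, partial_halted | full_halted);
}
```

With no capable CPUs the check now yields false, so such a system stays in state2 and keeps syncing frequencies.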
||
|
97f56eed2b | Merge "sched/walt: Introduce bug_on lockdep failures" | ||
|
fc59f95c92 | Merge "sched/walt: Fix WALT_BUG crash observed" | ||
|
8b02e8901d |
Merge branch 'android14-6.1' into 'android14-6.1-lts'
Catches the android14-6.1-lts branch up with the android14-6.1 branch which has had a lot of changes that are needed here to resolve future LTS merges and to ensure that the ABI is kept stable. It contains the following commits: * |
||
|
8517d73992 |
sched/fair: Remove capacity inversion detection
commit a2e90611b9f425adbbfcdaa5b5e49958ddf6f61b upstream. Remove the capacity inversion detection which is now handled by util_fits_cpu() returning -1 when we need to continue to look for a potential CPU with better performance. This ends up almost reverting patches below except for some comments: commit da07d2f9c153 ("sched/fair: Fixes for capacity inversion detection") commit aa69c36f31aa ("sched/fair: Consider capacity inversion in util_fits_cpu()") commit 44c7b80bffc3 ("sched/fair: Detect capacity inversion") Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20230201143628.270912-3-vincent.guittot@linaro.org Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
e8acf9971f |
sched/fair: unlink misfit task from cpu overutilized
commit e5ed0550c04c5469ecdc1634d8aa18c8609590f0 upstream. By taking uclamp_min into account, the 1:1 relation between task misfit and cpu overutilized no longer holds, as a task with a small util_avg may not fit a high capacity cpu because of the uclamp_min constraint. Add a new state in util_fits_cpu() to reflect the case where a task would fit a CPU except for the uclamp_min hint, which is a performance requirement. Use -1 to reflect that a CPU doesn't fit only because of uclamp_min, so we can use this new value to take additional action to select the best CPU that doesn't match the uclamp_min hint. When util_fits_cpu() returns -1, we will continue to look for a possible CPU with better performance, which replaces Capacity Inversion detection with capacity_orig_of() - thermal_load_avg to detect a capacity inversion. Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Reviewed-and-tested-by: Qais Yousef <qyousef@layalina.io> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com> Link: https://lore.kernel.org/r/20230201143628.270912-2-vincent.guittot@linaro.org Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
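A heavily simplified sketch of the tri-state idea, assuming only util_avg, uclamp_min and a CPU's capacity matter (the real util_fits_cpu() also handles uclamp_max and thermal pressure). The 1280/1024 margin matches the kernel's fits_capacity() macro; toy_util_fits_cpu() and toy_select_cpu() are hypothetical, and the fallback here takes the first CPU that returned -1 rather than the best-performing one.

```c
#include <assert.h>

/* Same ~80% margin as the kernel's fits_capacity() macro. */
static int fits_capacity(unsigned long util, unsigned long capacity)
{
	return util * 1280 < capacity * 1024;
}

/*
 * 1: fits; 0: does not fit by utilization;
 * -1: fits by utilization, but capacity is below the uclamp_min hint.
 */
static int toy_util_fits_cpu(unsigned long util, unsigned long uclamp_min,
			     unsigned long capacity)
{
	int fits = fits_capacity(util, capacity);

	if (fits && util < uclamp_min && uclamp_min > capacity)
		return -1;
	return fits;
}

/* Prefer a CPU that fully fits; otherwise fall back to one that failed only on uclamp_min. */
static int toy_select_cpu(unsigned long util, unsigned long uclamp_min,
			  const unsigned long *capacity, int nr_cpus)
{
	int fallback = -1;

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		int fits = toy_util_fits_cpu(util, uclamp_min, capacity[cpu]);

		if (fits > 0)
			return cpu;
		if (fits < 0 && fallback < 0)
			fallback = cpu;
	}
	return fallback;
}
```

A small task boosted by uclamp_min=800 gets -1 on a 512-capacity CPU, so the search keeps going and picks the 1024-capacity CPU when one exists, instead of treating the small CPU as a hard misfit.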
||
|
128c06a34c |
ring-buffer: Do not swap cpu_buffer during resize process
[ Upstream commit 8a96c0288d0737ad77882024974c075345c72011 ]

When ring_buffer_swap_cpu() was called during the resize process, the cpu buffer was swapped in the middle, resulting in an incorrect state. Continuing to run in the wrong state results in an oops.

This issue can be easily reproduced using the following two scripts:

/tmp # cat test1.sh
//#! /bin/sh
for i in `seq 0 100000`
do
    echo 2000 > /sys/kernel/debug/tracing/buffer_size_kb
    sleep 0.5
    echo 5000 > /sys/kernel/debug/tracing/buffer_size_kb
    sleep 0.5
done
/tmp # cat test2.sh
//#! /bin/sh
for i in `seq 0 100000`
do
    echo irqsoff > /sys/kernel/debug/tracing/current_tracer
    sleep 1
    echo nop > /sys/kernel/debug/tracing/current_tracer
    sleep 1
done
/tmp # ./test1.sh &
/tmp # ./test2.sh &

A typical oops log is as follows (other, different oops logs also occur):

[ 231.711293] WARNING: CPU: 0 PID: 9 at kernel/trace/ring_buffer.c:2026 rb_update_pages+0x378/0x3f8
[ 231.713375] Modules linked in:
[ 231.714735] CPU: 0 PID: 9 Comm: kworker/0:1 Tainted: G W 6.5.0-rc1-00276-g20edcec23f92 #15
[ 231.716750] Hardware name: linux,dummy-virt (DT)
[ 231.718152] Workqueue: events update_pages_handler
[ 231.719714] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 231.721171] pc : rb_update_pages+0x378/0x3f8
[ 231.722212] lr : rb_update_pages+0x25c/0x3f8
[ 231.723248] sp : ffff800082b9bd50
[ 231.724169] x29: ffff800082b9bd50 x28: ffff8000825f7000 x27: 0000000000000000
[ 231.726102] x26: 0000000000000001 x25: fffffffffffff010 x24: 0000000000000ff0
[ 231.728122] x23: ffff0000c3a0b600 x22: ffff0000c3a0b5c0 x21: fffffffffffffe0a
[ 231.730203] x20: ffff0000c3a0b600 x19: ffff0000c0102400 x18: 0000000000000000
[ 231.732329] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffe7aa8510
[ 231.734212] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000002
[ 231.736291] x11: ffff8000826998a8 x10: ffff800082b9baf0 x9 : ffff800081137558
[ 231.738195] x8 : fffffc00030e82c8 x7 : 0000000000000000 x6 : 0000000000000001
[ 231.740192] x5 : ffff0000ffbafe00 x4 : 0000000000000000 x3 : 0000000000000000
[ 231.742118] x2 : 00000000000006aa x1 : 0000000000000001 x0 : ffff0000c0007208
[ 231.744196] Call trace:
[ 231.744892]  rb_update_pages+0x378/0x3f8
[ 231.745893]  update_pages_handler+0x1c/0x38
[ 231.746893]  process_one_work+0x1f0/0x468
[ 231.747852]  worker_thread+0x54/0x410
[ 231.748737]  kthread+0x124/0x138
[ 231.749549]  ret_from_fork+0x10/0x20
[ 231.750434] ---[ end trace 0000000000000000 ]---
[ 233.720486] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[ 233.721696] Mem abort info:
[ 233.721935]   ESR = 0x0000000096000004
[ 233.722283]   EC = 0x25: DABT (current EL), IL = 32 bits
[ 233.722596]   SET = 0, FnV = 0
[ 233.722805]   EA = 0, S1PTW = 0
[ 233.723026]   FSC = 0x04: level 0 translation fault
[ 233.723458] Data abort info:
[ 233.723734]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[ 233.724176]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[ 233.724589]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[ 233.725075] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000104943000
[ 233.725592] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[ 233.726231] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[ 233.726720] Modules linked in:
[ 233.727007] CPU: 0 PID: 9 Comm: kworker/0:1 Tainted: G W 6.5.0-rc1-00276-g20edcec23f92 #15
[ 233.727777] Hardware name: linux,dummy-virt (DT)
[ 233.728225] Workqueue: events update_pages_handler
[ 233.728655] pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 233.729054] pc : rb_update_pages+0x1a8/0x3f8
[ 233.729334] lr : rb_update_pages+0x154/0x3f8
[ 233.729592] sp : ffff800082b9bd50
[ 233.729792] x29: ffff800082b9bd50 x28: ffff8000825f7000 x27: 0000000000000000
[ 233.730220] x26: 0000000000000000 x25: ffff800082a8b840 x24: ffff0000c0102418
[ 233.730653] x23: 0000000000000000 x22: fffffc000304c880 x21: 0000000000000003
[ 233.731105] x20: 00000000000001f4 x19: ffff0000c0102400 x18: ffff800082fcbc58
[ 233.731727] x17: 0000000000000000 x16: 0000000000000001 x15: 0000000000000001
[ 233.732282] x14: ffff8000825fe0c8 x13: 0000000000000001 x12: 0000000000000000
[ 233.732709] x11: ffff8000826998a8 x10: 0000000000000ae0 x9 : ffff8000801b760c
[ 233.733148] x8 : fefefefefefefeff x7 : 0000000000000018 x6 : ffff0000c03298c0
[ 233.733553] x5 : 0000000000000002 x4 : 0000000000000000 x3 : 0000000000000000
[ 233.733972] x2 : ffff0000c3a0b600 x1 : 0000000000000000 x0 : 0000000000000000
[ 233.734418] Call trace:
[ 233.734593]  rb_update_pages+0x1a8/0x3f8
[ 233.734853]  update_pages_handler+0x1c/0x38
[ 233.735148]  process_one_work+0x1f0/0x468
[ 233.735525]  worker_thread+0x54/0x410
[ 233.735852]  kthread+0x124/0x138
[ 233.736064]  ret_from_fork+0x10/0x20
[ 233.736387] Code: 92400000 910006b5 aa000021 aa0303f7 (f9400060)
[ 233.736959] ---[ end trace 0000000000000000 ]---

After analysis, the sequence of events leading to the error is as follows [1-5]:

int ring_buffer_resize(struct trace_buffer *buffer, unsigned long size,
                       int cpu_id)
{
    for_each_buffer_cpu(buffer, cpu) {
        cpu_buffer = buffer->buffers[cpu];
        //1. get cpu_buffer, aka cpu_buffer(A)
        ...
        ...
        schedule_work_on(cpu,
                         &cpu_buffer->update_pages_work);
        //2. 'update_pages_work' is queued on 'cpu'; cpu_buffer(A) is passed to
        // update_pages_handler, which does the update, sets 'update_done' via
        // complete(&cpu_buffer->update_done) and wakes up the resize process.
    //---->
        //3. Just at this moment, ring_buffer_swap_cpu is triggered and
        //cpu_buffer(A) is swapped with cpu_buffer(B), the max_buffer.
        //ring_buffer_swap_cpu is called as in the 'Call trace' below.
        Call trace:
         dump_backtrace+0x0/0x2f8
         show_stack+0x18/0x28
         dump_stack+0x12c/0x188
         ring_buffer_swap_cpu+0x2f8/0x328
         update_max_tr_single+0x180/0x210
         check_critical_timing+0x2b4/0x2c8
         tracer_hardirqs_on+0x1c0/0x200
         trace_hardirqs_on+0xec/0x378
         el0_svc_common+0x64/0x260
         do_el0_svc+0x90/0xf8
         el0_svc+0x20/0x30
         el0_sync_handler+0xb0/0xb8
         el0_sync+0x180/0x1c0
    //<----
    }

    /* wait for all the updates to complete */
    for_each_buffer_cpu(buffer, cpu) {
        cpu_buffer = buffer->buffers[cpu];
        //4. get cpu_buffer; now cpu_buffer(B) is used in the following process,
        //so the state of cpu_buffer(A) and cpu_buffer(B) is totally wrong.
        //For example, cpu_buffer(A)->update_done will be left set to 1, and
        //the next resize round will not actually 'wait_for_completion'.
        if (!cpu_buffer->nr_pages_to_update)
            continue;

        if (cpu_online(cpu))
            wait_for_completion(&cpu_buffer->update_done);
        cpu_buffer->nr_pages_to_update = 0;
    }
    ...
}
//5. The state of cpu_buffer(A) and cpu_buffer(B) is now wrong, and
//continuing to run in that state leads to the oops.

Link: https://lore.kernel.org/linux-trace-kernel/202307191558478409990@zte.com.cn
Signed-off-by: Chen Lin <chen.lin5@zte.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
b7a34e30d4 |
dma-remap: use kvmalloc_array/kvfree for larger dma memory remap
[ Upstream commit 51ff97d54f02b4444dfc42e380ac4c058e12d5dd ]

If dma_direct_alloc() allocates 64MB of memory, the inner function dma_common_contiguous_remap() allocates 128KB of memory by calling kmalloc_array(), and kmalloc_array() can fail to allocate those 128KB:

Call trace:
[14977.928623] qcrosvm: page allocation failure: order:5, mode:0x40cc0
[14977.928638]  dump_backtrace.cfi_jt+0x0/0x8
[14977.928647]  dump_stack_lvl+0x80/0xb8
[14977.928652]  warn_alloc+0x164/0x200
[14977.928657]  __alloc_pages_slowpath+0x9f0/0xb4c
[14977.928660]  __alloc_pages+0x21c/0x39c
[14977.928662]  kmalloc_order+0x48/0x108
[14977.928666]  kmalloc_order_trace+0x34/0x154
[14977.928668]  __kmalloc+0x548/0x7e4
[14977.928673]  dma_direct_alloc+0x11c/0x4f8
[14977.928678]  dma_alloc_attrs+0xf4/0x138
[14977.928680]  gh_vm_ioctl_set_fw_name+0x3c4/0x610 [gunyah]
[14977.928698]  gh_vm_ioctl+0x90/0x14c [gunyah]
[14977.928705]  __arm64_sys_ioctl+0x184/0x210

Work around this by using kvmalloc_array() instead.

Signed-off-by: Gao Xu <gaoxu2@hihonor.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org> |
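A quick size check, as a sketch that assumes 4 KiB pages, 8-byte pointers (typical for arm64) and one struct page pointer per page in the remap array: a 64MB buffer then needs 16384 pointers, i.e. a 128KB array, which is an order-5 (32-page) physically contiguous request, consistent with the "order:5" failure in the log. page_array_bytes() and alloc_order() are hypothetical helpers, not kernel functions.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages, as on a typical arm64 configuration. */
#define SKETCH_PAGE_SIZE 4096UL

/* Bytes needed for an array of page pointers covering `size` bytes. */
static size_t page_array_bytes(size_t size)
{
    size_t nr_pages = (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;

    return nr_pages * sizeof(void *);
}

/* Smallest buddy order whose block (PAGE_SIZE << order) holds `bytes`. */
static unsigned int alloc_order(size_t bytes)
{
    unsigned int order = 0;

    while ((SKETCH_PAGE_SIZE << order) < bytes)
        order++;
    return order;
}
```

kvmalloc_array() avoids the problem because it can fall back to virtually contiguous memory when such a high-order contiguous allocation is not available.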
||
|
303abc30ea | Merge "sched/walt: revise incorrect passed parameter for pipeline_reset_boost()" | ||
|
c44359bbad | Merge "sched/walt: manual pipeline should allow cpu 0" | ||
|
09d0b48fe9 |
sched/walt: revise incorrect passed parameter for pipeline_reset_boost()
Revise incorrect passed parameter for pipeline_reset_boost(). Change-Id: I01b4f0f546fcebfea1b474f675ddfe05d5d50c84 Signed-off-by: Tengfei Fan <quic_tengfan@quicinc.com> |
||
|
bd7a3e0fba |
sched/walt: manual pipeline should allow cpu 0
If cpu 0 is to be allowed as a pipeline cpu, then the existing code is incorrect because it will by default skip cpu 0 for manual pipeline assignment. Correct this by feeding -1 into cpumask_any_and, in the event that it is time to roll-over and start re-selecting from the cpus mask again. Change-Id: Ia5e170e287db9ff7010555d51da3ea9789e2ec1e Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com> |
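A hypothetical user-space sketch of the roll-over logic, assuming a 64-CPU bitmask. The search returns the first CPU strictly after n, following the convention of the kernel's cpumask_next_and(), so restarting from -1 rather than 0 keeps CPU 0 eligible. next_cpu_and() and pick_pipeline_cpu() are illustrative names, not WALT functions.

```c
#include <stdint.h>

#define SKETCH_NR_CPUS 64

/* First cpu > n set in both masks, or SKETCH_NR_CPUS if there is none. */
static int next_cpu_and(int n, uint64_t a, uint64_t b)
{
    uint64_t both = a & b;

    for (int cpu = n + 1; cpu < SKETCH_NR_CPUS; cpu++)
        if (both & (UINT64_C(1) << cpu))
            return cpu;
    return SKETCH_NR_CPUS;
}

/* Round-robin pick after `prev`; on roll-over restart from -1 so cpu 0 counts. */
static int pick_pipeline_cpu(int prev, uint64_t allowed, uint64_t online)
{
    int cpu = next_cpu_and(prev, allowed, online);

    if (cpu >= SKETCH_NR_CPUS)
        cpu = next_cpu_and(-1, allowed, online);
    return cpu;
}
```

Restarting the search from 0 instead would return cpu 1 for an allowed mask of cpus 0-3, which is exactly the skipped-cpu-0 behavior described above.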
||
|
ff989fe194 |
sched/walt: Fix WALT_BUG crash observed
A WALT_BUG crash was intermittently observed, where a task's mark_start appeared to have moved ahead of the task's window_start by more than one window. To debug this further, a series of additional WALT_BUGs were introduced, which yielded a crash where mark_task_starting was called for a task whose mark_start had already been set. In wake_up_new_task, before trace_android_rvh_new_task_stats is invoked (which is where mark_start is supposed to be initialized), another call to UTRA can sneak in during the fork balancing step, at the end of which mark_start would already be initialized. Address this by reorganizing mark_task_starting to acknowledge that another UTRA may have initialized mark_start, and further add clarity by updating the CPU cycles through a call to UTRA rather than doing so directly. Change-Id: Iae4efad890417d3708048ae95eea0532776ad24a Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com> |
||
|
91de49da25 | Merge "sched/walt: Print colocation thresholds" | ||
|
976ef61b65 | Merge "sched/walt: Introduce WALT_BUG if mark_task_starting occurs twice" | ||
|
4f09aeb3c8 |
sched/walt: Introduce bug_on lockdep failures
In the event that lockdep_asserts fail, indicating that runqueue locks are not held in a path where they should be in WALT, crash instantly to prevent unpredictable behavior and gain insights to enable fixing the underlying issue. Change-Id: I940c958a77221cd1d8eb1976e0f69dfc3b33ce1e Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com> Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com> |
||
|
3cec0c1971 |
sched/walt: Print colocation thresholds
Add the colocation thresholds to the trace to help debug colocation status. Change-Id: I908cd868b0cc9b431a3ba49696d5a0328748bf1c Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com> |
||
|
12a7d8230a |
sched/walt: Add tracepoint to find early up/down margin changes
Introduce a tracepoint to print the early up/down migration margin values when they are changed by userspace. Change-Id: Ib642fc24318da654431e8b32fe3a089cee02d27b Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com> |
||
|
71b73ba11c |
sched/walt: Add tracepoint to find up/down margin changes
Introduce a tracepoint to print the up/down migration margin values when they are changed by userspace. Change-Id: I7bb39ac5ee26f1f62fc0ed10713b1e3f6edaaab3 Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com> |
||
|
72bc8c7d98 |
sched/walt: Introduce WALT_BUG if mark_task_starting occurs twice
mark_task_starting should be called on a newly woken up task. At this point, its mark_start should not be set. Therefore, if mark_start happens to already be set, do a WALT_BUG. Change-Id: I47105da7305df428ce31677442a13a7eb2a67b51 Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com> |
||
|
dd71e2906a |
sched/walt: Introduce mark start ts
Print out the timestamp at which the task's mark_start is first set to help debug issues. Change-Id: Id440e873a75901e8888a68fbe09ba6c51e3acfb1 Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com> |
||
|
42783f9b06 |
sched/walt: Introduce WALT_BUG if mark start progresses too far
A crash was observed where a task's mark start was more than one window boundary away from the task's window start. This is problematic as it indicates there was a UTRA call in which the runqueue's window start was updated but the task's window start wasn't. Introduce a bug_on to catch this issue earlier, and gain insights from the stack trace and event to better understand the background. Change-Id: Ia1546e537ea5722a6065581898e1516274ed53ea Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com> |
||
|
dc509aa20a |
Merge remote-tracking branch into HEAD
* keystone/mirror-android14-6.1-2023-07: (111 commits)
  ANDROID: ABI: Update STG ABI to format version 2
  ANDROID: GKI: Update pixel symbol list for thermal
  ANDROID: thermal: Add vendor thermal genl check
  ANDROID: ABI: Update symbol for Exynos SoC
  ANDROID: GKI: Update mtk ABI symbol list
  ANDROID: ABI: Update symbol list for imx
  ANDROID: GKI: Update abi_gki_aarch64_qcom
  BACKPORT: FROMGIT: irqchip/gic-v3: Workaround for GIC-700 erratum 2941627
  ANDROID: ABI: update symbol list for Xclipse GPU
  ANDROID: drm/ttm: export ttm_tt_unpopulate()
  ANDROID: fuse-bpf: Add partial flock support
  ANDROID: Incremental fs: Allocate data buffer based on input request size
  UPSTREAM: gfs2: Don't deref jdesc in evict
  ANDROID: KVM: arm64: Fix MMU context save/restore over TLB invalidation
  ANDROID: Update symbol list for VIVO
  ANDROID: add initial symbol list file for ExynosAuto SoCs
  ANDROID: sched: Export sched_domains_mutex for lockdep
  ANDROID: Update symbol for Exynos SoC
  ANDROID: ABI: Update symbol for Exynos SoC
  ANDROID: Update symbol list for mtk
  ...

Change-Id: I0186f02e9e3b07ea279334a06e33131b2a78c2f4 |
||
|
fadc35923d |
ANDROID: vendor_hook: fix the error record position of mutex
Make sure the vendor hook trace_android_vh_record_mutex_lock_starttime works in both the fastpath and the slowpath unlock.
Fixes:
|
||
|
7a1178a367 |
bpf, cpumap: Make sure kthread is running before map update returns
commit 640a604585aa30f93e39b17d4d6ba69fcb1e66c9 upstream.
The following warning was reported when running stress-mode enabled
xdp_redirect_cpu with some RT threads:
------------[ cut here ]------------
WARNING: CPU: 4 PID: 65 at kernel/bpf/cpumap.c:135
CPU: 4 PID: 65 Comm: kworker/4:1 Not tainted 6.5.0-rc2+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Workqueue: events cpu_map_kthread_stop
RIP: 0010:put_cpu_map_entry+0xda/0x220
......
Call Trace:
<TASK>
? show_regs+0x65/0x70
? __warn+0xa5/0x240
......
? put_cpu_map_entry+0xda/0x220
cpu_map_kthread_stop+0x41/0x60
process_one_work+0x6b0/0xb80
worker_thread+0x96/0x720
kthread+0x1a5/0x1f0
ret_from_fork+0x3a/0x70
ret_from_fork_asm+0x1b/0x30
</TASK>
The root cause is the same as commit 436901649731 ("bpf: cpumap: Fix memory
leak in cpu_map_update_elem"). The kthread is stopped prematurely by
kthread_stop() in cpu_map_kthread_stop(), and kthread() doesn't call
cpu_map_kthread_run() at all, but the XDP program has already queued some
frames or skbs into ptr_ring. So when __cpu_map_ring_cleanup() checks
the ptr_ring, it will find it was not emptied and report a warning.
An alternative fix is to use __cpu_map_ring_cleanup() to drop these
pending frames or skbs when kthread_stop() returns -EINTR, but it may
confuse the user, because these frames or skbs have been handled
correctly by XDP program. So instead of dropping these frames or skbs,
just make sure the per-cpu kthread is running before
__cpu_map_entry_alloc() returns.
After applying the fix, the error handling for kthread_stop() becomes
unnecessary because it will always return 0, so just remove it.
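The "do not return until the worker is running" idea can be sketched in user space with POSIX threads, where a condition variable plays the role of a kernel completion. This is an analogue under that assumption, not the cpumap code; struct entry, worker(), entry_start() and entry_stop() are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a map entry that owns a worker thread. */
struct entry {
    pthread_t thread;
    pthread_mutex_t lock;
    pthread_cond_t cond;
    bool running;   /* set by the worker once it has started */
    bool stop;      /* set by the owner to ask the worker to exit */
};

static void *worker(void *arg)
{
    struct entry *e = arg;

    pthread_mutex_lock(&e->lock);
    e->running = true;                  /* analogous to complete() */
    pthread_cond_broadcast(&e->cond);
    while (!e->stop)                    /* stand-in for the ring-draining loop */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
    return NULL;
}

/* Start the worker and return only once it is actually running. */
static int entry_start(struct entry *e)
{
    pthread_mutex_init(&e->lock, NULL);
    pthread_cond_init(&e->cond, NULL);
    e->running = false;
    e->stop = false;

    if (pthread_create(&e->thread, NULL, worker, e) != 0) {
        pthread_cond_destroy(&e->cond);
        pthread_mutex_destroy(&e->lock);
        return -1;
    }

    pthread_mutex_lock(&e->lock);
    while (!e->running)                 /* analogous to wait_for_completion() */
        pthread_cond_wait(&e->cond, &e->lock);
    pthread_mutex_unlock(&e->lock);
    return 0;
}

/* Ask the worker to exit, wait for it, and release resources. */
static void entry_stop(struct entry *e)
{
    pthread_mutex_lock(&e->lock);
    e->stop = true;
    pthread_cond_broadcast(&e->cond);
    pthread_mutex_unlock(&e->lock);
    pthread_join(e->thread, NULL);
    pthread_cond_destroy(&e->cond);
    pthread_mutex_destroy(&e->lock);
}
```

Because entry_start() only returns after the worker has set `running`, a later stop request can never overtake the worker's first run, which is the ordering property the commit relies on.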
Fixes:
|
||
|
36dd8ca330 |
bpf: Disable preemption in bpf_event_output
commit d62cc390c2e99ae267ffe4b8d7e2e08b6c758c32 upstream.
We received a report [1] of a kernel crash, which is caused by
using nesting protection without disabled preemption.
The bpf_event_output can be called by programs executed by
bpf_prog_run_array_cg function that disabled migration but
keeps preemption enabled.
This can cause a task to be preempted by another one inside the
nesting protection and lead eventually to two tasks using same
perf_sample_data buffer and cause crashes like:
BUG: kernel NULL pointer dereference, address: 0000000000000001
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
...
? perf_output_sample+0x12a/0x9a0
? finish_task_switch.isra.0+0x81/0x280
? perf_event_output+0x66/0xa0
? bpf_event_output+0x13a/0x190
? bpf_event_output_data+0x22/0x40
? bpf_prog_dfc84bbde731b257_cil_sock4_connect+0x40a/0xacb
? xa_load+0x87/0xe0
? __cgroup_bpf_run_filter_sock_addr+0xc1/0x1a0
? release_sock+0x3e/0x90
? sk_setsockopt+0x1a1/0x12f0
? udp_pre_connect+0x36/0x50
? inet_dgram_connect+0x93/0xa0
? __sys_connect+0xb4/0xe0
? udp_setsockopt+0x27/0x40
? __pfx_udp_push_pending_frames+0x10/0x10
? __sys_setsockopt+0xdf/0x1a0
? __x64_sys_connect+0xf/0x20
? do_syscall_64+0x3a/0x90
? entry_SYSCALL_64_after_hwframe+0x72/0xdc
Fixing this by disabling preemption in bpf_event_output.
[1] https://github.com/cilium/cilium/issues/26756
Cc: stable@vger.kernel.org
Reported-by: Oleg "livelace" Popov <o.popov@livelace.ru>
Closes: https://github.com/cilium/cilium/issues/26756
Fixes:
|
||
|
3654ed5daf |
bpf: Disable preemption in bpf_perf_event_output
commit f2c67a3e60d1071b65848efaa8c3b66c363dd025 upstream.
The nesting protection in bpf_perf_event_output relies on disabled
preemption, which is guaranteed for kprobes and tracepoints.
However bpf_perf_event_output can be also called from uprobes context
through bpf_prog_run_array_sleepable function which disables migration,
but keeps preemption enabled.
This can cause a task to be preempted by another one inside the nesting
protection and lead eventually to two tasks using same perf_sample_data
buffer and cause crashes like:
kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
BUG: unable to handle page fault for address: ffffffff82be3eea
...
Call Trace:
? __die+0x1f/0x70
? page_fault_oops+0x176/0x4d0
? exc_page_fault+0x132/0x230
? asm_exc_page_fault+0x22/0x30
? perf_output_sample+0x12b/0x910
? perf_event_output+0xd0/0x1d0
? bpf_perf_event_output+0x162/0x1d0
? bpf_prog_c6271286d9a4c938_krava1+0x76/0x87
? __uprobe_perf_func+0x12b/0x540
? uprobe_dispatcher+0x2c4/0x430
? uprobe_notify_resume+0x2da/0xce0
? atomic_notifier_call_chain+0x7b/0x110
? exit_to_user_mode_prepare+0x13e/0x290
? irqentry_exit_to_user_mode+0x5/0x30
? asm_exc_int3+0x35/0x40
Fixing this by disabling preemption in bpf_perf_event_output.
Cc: stable@vger.kernel.org
Fixes:
|
||
|
cbd0004518 |
bpf, cpumap: Handle skb as well when clean up ptr_ring
[ Upstream commit 7c62b75cd1a792e14b037fa4f61f9b18914e7de1 ]
The following warning was reported when running xdp_redirect_cpu with
both skb-mode and stress-mode enabled:
------------[ cut here ]------------
Incorrect XDP memory type (-2128176192) usage
WARNING: CPU: 7 PID: 1442 at net/core/xdp.c:405
Modules linked in:
CPU: 7 PID: 1442 Comm: kworker/7:0 Tainted: G 6.5.0-rc2+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
Workqueue: events __cpu_map_entry_free
RIP: 0010:__xdp_return+0x1e4/0x4a0
......
Call Trace:
<TASK>
? show_regs+0x65/0x70
? __warn+0xa5/0x240
? __xdp_return+0x1e4/0x4a0
......
xdp_return_frame+0x4d/0x150
__cpu_map_entry_free+0xf9/0x230
process_one_work+0x6b0/0xb80
worker_thread+0x96/0x720
kthread+0x1a5/0x1f0
ret_from_fork+0x3a/0x70
ret_from_fork_asm+0x1b/0x30
</TASK>
The reason for the warning is twofold. One is that the kthread
cpu_map_kthread_run() is stopped prematurely. The other is that
__cpu_map_ring_cleanup() doesn't handle skb mode and treats skbs in
ptr_ring as XDP frames.
The prematurely stopped kthread will be fixed by the preceding patch, and
ptr_ring will be empty when __cpu_map_ring_cleanup() is called. But, as
the comment in __cpu_map_ring_cleanup() says, also handle and free skbs
in ptr_ring to "catch any broken behaviour gracefully".
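A sketch of the pointer-tagging idea that lets a cleanup routine tell the two kinds of ring entries apart. It assumes, as cpumap is understood to do, that skbs are queued with bit 0 of the pointer set, which is free because the objects are at least 2-byte aligned. The types and helpers are hypothetical stand-ins, and the actual frees are replaced by counters.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the two kinds of queued objects. */
struct frame { int id; };
struct skb   { int id; };

/* Objects are at least 2-byte aligned, so bit 0 can carry the type. */
static void *tag_skb(struct skb *skb)
{
    return (void *)((uintptr_t)skb | 1);
}

static int is_skb(const void *item)
{
    return ((uintptr_t)item & 1) != 0;
}

static struct skb *untag_skb(void *item)
{
    return (struct skb *)((uintptr_t)item & ~(uintptr_t)1);
}

/* Free each queued item with the routine that matches its type. */
static void ring_cleanup(void **ring, int n, int *frames_freed, int *skbs_freed)
{
    *frames_freed = 0;
    *skbs_freed = 0;
    for (int i = 0; i < n; i++) {
        if (is_skb(ring[i])) {
            struct skb *skb = untag_skb(ring[i]);

            (void)skb;          /* the kernel would free the skb here */
            (*skbs_freed)++;
        } else {
            (*frames_freed)++;  /* the kernel would return the XDP frame here */
        }
    }
}
```

Treating a tagged skb pointer as an XDP frame would read unrelated memory as frame metadata, which matches the bogus "Incorrect XDP memory type" value in the warning.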
Fixes:
|
||
|
15c22cd1de |
perf: Fix function pointer case
commit 1af6239d1d3e61d33fd2f0ba53d3d1a67cc50574 upstream.
With the advent of CFI it is no longer acceptable to cast function
pointers.
The robot complains thusly:
kernel/events/core.c: warning: cast from 'int (*)(struct perf_cpu_pmu_context *)' to 'remote_function_f' (aka 'int (*)(void *)') converts to incompatible function type
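The usual CFI-safe pattern is a small adapter whose signature matches the callback type exactly, so the conversion happens on the data pointer rather than on the function pointer (calling a function through an incompatible function pointer type is undefined behavior in C). A minimal sketch, with hypothetical types standing in for perf_cpu_pmu_context and remote_function_f:

```c
/* Hypothetical stand-ins for the types named in the warning. */
struct pmu_ctx { int active; };

typedef int (*remote_fn_t)(void *info);   /* like remote_function_f */

/* Typed function whose signature does not match remote_fn_t. */
static int ctx_restart(struct pmu_ctx *ctx)
{
    ctx->active = 1;
    return 0;
}

/*
 * Adapter with exactly the remote_fn_t signature. The void * to
 * struct pmu_ctx * conversion is on data, which C allows, so no
 * function pointer cast is needed and CFI type checks pass.
 */
static int ctx_restart_remote(void *info)
{
    return ctx_restart(info);
}

/* Caller that only knows the generic callback type. */
static int run_remote(remote_fn_t fn, void *info)
{
    return fn(info);
}
```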
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Cixi Geng <cixi.geng1@unisoc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
||
|
572508aff3 |
Merge keystone/android14-6.1-keystone-qcom-release.6.1.25 (b9d4167 ) into qcom-6.1
* refs/heads/tmp-b9d4167:
  ANDROID: Snap to android14-6.1-2023-06
  ANDROID: fuse-bpf: Move FUSE_RELEASE to correct place
  BACKPORT: FROMLIST: ovl: get_acl: Fix null pointer dereference at realinode in rcu-walk mode
  BACKPORT: FROMLIST: ovl: ovl_permission: Fix null pointer dereference at realinode in rcu-walk mode
  BACKPORT: FROMLIST: ovl: Let helper ovl_i_path_real() return the realinode

Conflicts:
  android/abi_gki_aarch64.stg

Change-Id: I1c41d9c5d104ea48b379f9d3e0637447637607ff
Upstream-Build: ks_qcom-android14-6.1-keystone-qcom-release@10638318 UKQ2.230809.001
Signed-off-by: jianzhou <quic_jianzhou@quicinc.com> |
||
|
8472a839c4 | Merge "msm-sysstats: protect task->files in get_task_unreclaimable_info()" | ||
|
b7e8439a23 |
ANDROID: Snap to android14-6.1-2023-06
Snap tree to commit
|
||
|
e0fd83a193 |
mm: Move mm_cachep initialization to mm_init()
commit af80602799681c78f14fbe20b6185a56020dedee upstream. In order to allow using mm_alloc() much earlier, move initializing mm_cachep into mm_init(). Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221025201057.751153381@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
9ae15aaff3 |
x86/mm: Use mm_alloc() in poking_init()
commit 3f4c8211d982099be693be9aa7d6fc4607dff290 upstream. Instead of duplicating init_mm, allocate a fresh mm. The advantage is that mm_alloc() has much simpler dependencies. Additionally it makes more conceptual sense, init_mm has no (and must not have) user state to duplicate. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20221025201057.816175235@infradead.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
7cf19bf953 | Merge "sched/walt: Fix issue caused by incorrect ws setting" | ||
|
745444b04c | Merge "sched/walt: Improve the scheduler" | ||
|
8578f05924 | Merge "sched/walt: Switch to using new api for state1/2 determination" | ||
|
93655eb27d | Merge "sched/walt: Use softaffinity for RT tasks" | ||
|
843dc11a1b | Merge "sched/walt: single-big-thread code passes global variable to halt" | ||
|
5366a12273 | Merge "sched/walt: Force pipeline promotion to prime if prime worthy" | ||
|
0441c44154 |
tracing: Fix trace_event_raw_event_synth() if else statement
commit 9971c3f944489ff7aacb9d25e0cde841a5f6018a upstream.

The test to check if the field is a stack is to be done only if it is not a string. But the code had:

    }
    if (event->fields[i]->is_stack) {

and not

    } else if (event->fields[i]->is_stack) {

which would cause it to always be tested. Worse yet, this also included an "else" statement that was only to be called if the field was neither a string nor a stack, but this code allows it to be called if it was a string (and not a stack).

Also fixed some whitespace issues.

Link: https://lore.kernel.org/all/202301302110.mEtNwkBD-lkp@intel.com/
Link: https://lore.kernel.org/linux-trace-kernel/20230131095237.63e3ca8d@gandalf.local.home
Cc: Tom Zanussi <zanussi@kernel.org>
Fixes: 00cf3d672a9d ("tracing: Allow synthetic events to pass around stacktraces")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
7f1715d827 |
locking/rtmutex: Fix task->pi_waiters integrity
[ Upstream commit f7853c34241807bb97673a5e97719123be39a09e ]
Henry reported that rt_mutex_adjust_prio_check() has an ordering
problem and puts the lie to the comment in [7]. Sharing the sort key
between lock->waiters and owner->pi_waiters *does* create problems,
since unlike what the comment claims, holding [L] is insufficient.
Notably, consider:
A
/ \
M1 M2
| |
B C
That is, task A owns both M1 and M2, B and C block on them. In this
case a concurrent chain walk (B & C) will modify their resp. sort keys
in [7] while holding M1->wait_lock and M2->wait_lock. So holding [L]
is meaningless, they're different Ls.
This then gives rise to a race condition between [7] and [11], where
the requeue of pi_waiters will observe an inconsistent tree order.
B                                C
(holds M1->wait_lock,            (holds M2->wait_lock,
 holds B->pi_lock)                holds A->pi_lock)

[7]
waiter_update_prio();
  ...
[8]
raw_spin_unlock(B->pi_lock);
  ...
[10]
raw_spin_lock(A->pi_lock);

                                 [11]
                                 rt_mutex_enqueue_pi();
                                 // observes inconsistent A->pi_waiters
                                 // tree order
Fixing this means either extending the range of the owner lock from
[10-13] to [6-13], with the immediate problem that this means [6-8]
hold both blocked and owner locks, or duplicating the sort key.
Since the locking in chain walk is horrible enough without having to
consider pi_lock nesting rules, duplicate the sort key instead.
By giving each tree their own sort key, the above race becomes
harmless, if C sees B at the old location, then B will correct things
(if they need correcting) when it walks up the chain and reaches A.
Fixes:
|
||
|
a3a3c7bdda |
tracing: Fix warning in trace_buffered_event_disable()
[ Upstream commit dea499781a1150d285c62b26659f62fb00824fce ]
Warning happened in trace_buffered_event_disable() at
WARN_ON_ONCE(!trace_buffered_event_ref)
Call Trace:
? __warn+0xa5/0x1b0
? trace_buffered_event_disable+0x189/0x1b0
__ftrace_event_enable_disable+0x19e/0x3e0
free_probe_data+0x3b/0xa0
unregister_ftrace_function_probe_func+0x6b8/0x800
event_enable_func+0x2f0/0x3d0
ftrace_process_regex.isra.0+0x12d/0x1b0
ftrace_filter_write+0xe6/0x140
vfs_write+0x1c9/0x6f0
[...]
The cause of the warning is that, in __ftrace_event_enable_disable(),
trace_buffered_event_enable() was called once while
trace_buffered_event_disable() was called twice.
The reproduction script is shown below; see the comments for the analysis:
```
#!/bin/bash
cd /sys/kernel/tracing/
# 1. Register a 'disable_event' command, then:
# 1) SOFT_DISABLED_BIT was set;
# 2) trace_buffered_event_enable() was called first time;
echo 'cmdline_proc_show:disable_event:initcall:initcall_finish' > \
set_ftrace_filter
# 2. Enable the event registered, then:
# 1) SOFT_DISABLED_BIT was cleared;
# 2) trace_buffered_event_disable() was called first time;
echo 1 > events/initcall/initcall_finish/enable
# 3. Try to call into cmdline_proc_show(), then SOFT_DISABLED_BIT was
# set again!!!
cat /proc/cmdline
# 4. Unregister the 'disable_event' command, then:
# 1) SOFT_DISABLED_BIT was cleared again;
# 2) trace_buffered_event_disable() was called second time!!!
echo '!cmdline_proc_show:disable_event:initcall:initcall_finish' > \
set_ftrace_filter
```
To fix it, IIUC, we can call trace_buffered_event_enable() the first
time soft mode is enabled, and call trace_buffered_event_disable() the
last time soft mode is disabled.
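The proposed pairing can be sketched as a usage count whose 0 -> 1 and 1 -> 0 transitions drive the enable/disable calls, so the two stay balanced no matter how often other paths flip the soft-disabled flag. struct soft_state and its helpers are hypothetical.

```c
/* Hypothetical counters standing in for the tracing state. */
struct soft_state {
    int soft_mode_users;   /* active soft-mode registrations */
    int buffered_ref;      /* like trace_buffered_event_ref */
};

static void buffered_enable(struct soft_state *s)  { s->buffered_ref++; }
static void buffered_disable(struct soft_state *s) { s->buffered_ref--; }

/* Enable only on the first soft-mode user. */
static void soft_enable(struct soft_state *s)
{
    if (s->soft_mode_users++ == 0)
        buffered_enable(s);
}

/* Disable only when the last soft-mode user goes away. */
static void soft_disable(struct soft_state *s)
{
    if (--s->soft_mode_users == 0)
        buffered_disable(s);
}
```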
Link: https://lore.kernel.org/linux-trace-kernel/20230726095804.920457-1-zhengyejian1@huawei.com
Cc: <mhiramat@kernel.org>
Fixes:
|
||
|
77996fa5c6 |
ring-buffer: Fix wrong stat of cpu_buffer->read
[ Upstream commit 2d093282b0d4357373497f65db6a05eb0c28b7c8 ]
When pages are removed in rb_remove_pages(), 'cpu_buffer->read' is set
to 0 in order to make sure any read iterators reset themselves. However,
this messes up the 'entries' statistic, as the following steps show:
# cd /sys/kernel/tracing/
# 1. Enlarge the ring buffer in preparation for reducing it later:
# echo 20 > per_cpu/cpu0/buffer_size_kb
# 2. Write a log into ring buffer of cpu0:
# taskset -c 0 echo "hello1" > trace_marker
# 3. Read the log:
# cat per_cpu/cpu0/trace_pipe
<...>-332 [000] ..... 62.406844: tracing_mark_write: hello1
# 4. Stop reading and check the stats: now 0 entries and 1 event read:
# cat per_cpu/cpu0/stats
entries: 0
[...]
read events: 1
# 5. Reduce the ring buffer
# echo 7 > per_cpu/cpu0/buffer_size_kb
# 6. Now 'entries' unexpectedly shows 1, although there are actually no entries!!!
# cat per_cpu/cpu0/stats
entries: 1
[...]
read events: 0
To fix it, introduce 'page_removed' field to count total removed pages
since last reset, then use it to let read iterators reset themselves
instead of changing the 'read' pointer.
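A simplified sketch of the page_removed approach, with hypothetical structs: removing pages bumps a counter instead of zeroing 'read', and each reader compares its saved snapshot of that counter with the current value to decide whether to reset.

```c
/* Hypothetical, simplified per-CPU buffer and reader iterator. */
struct cpu_buf {
    unsigned long read;          /* events consumed so far (used by stats) */
    unsigned long page_removed;  /* pages removed since the last reset */
};

struct read_iter {
    unsigned long page_removed;  /* snapshot taken when the iterator synced */
    int pos;                     /* reader position */
};

/* Page removal no longer touches 'read', so the stats stay correct. */
static void buf_remove_pages(struct cpu_buf *b, unsigned long nr)
{
    b->page_removed += nr;
}

/* Reset the iterator if pages were removed since it last synced. */
static int iter_check_reset(struct read_iter *it, const struct cpu_buf *b)
{
    if (it->page_removed != b->page_removed) {
        it->page_removed = b->page_removed;
        it->pos = 0;
        return 1;
    }
    return 0;
}
```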
Link: https://lore.kernel.org/linux-trace-kernel/20230724054040.3489499-1-zhengyejian1@huawei.com
Cc: <mhiramat@kernel.org>
Cc: <vnagarnaik@google.com>
Fixes:
|
||
|
7218974aba |
mm: suppress mm fault logging if fatal signal already pending
[ Upstream commit 5f0bc0b042fc77ff70e14c790abdec960cde4ec1 ]

Commit eda0047296a1 ("mm: make the page fault mmap locking killable") intentionally made it much easier to trigger the "page fault fails because a fatal signal is pending" situation, by having the mmap locking fail early in that case.

We have long aborted page faults in other fatal cases when the actual IO for a page is interrupted by SIGKILL - which is particularly useful for the traditional case of NFS hanging due to network issues, but local filesystems could cause it too if you happened to get the SIGKILL while waiting for a page to be faulted in (e.g. lock_folio_maybe_drop_mmap()).

So aborting the page fault wasn't a new condition - but it now triggers earlier, before we even get to 'handle_mm_fault()'. And as a result the error doesn't go through our 'fault_signal_pending()' logic, and doesn't get filtered away there.

Normally you'd never even notice, because if a fatal signal is pending, the new SIGSEGV we send ends up being ignored anyway.

But it turns out that there is one very noticeable exception: if you enable 'show_unhandled_signals', the aborted page fault will be logged in the kernel messages, and you'll get a scary line looking something like this in your logs:

  pverados[2183248]: segfault at 55e5a00f9ae0 ip 000055e5a00f9ae0 sp 00007ffc0720bea8 error 14 in perl[55e5a00d4000+195000] likely on CPU 10 (core 4, socket 0)

which is rather misleading. It's not really a segfault at all, it's just "the thread was killed before the page fault completed, so we aborted the page fault".

Fix this by just making it clear that a pending fatal signal means that any new signal coming in after that is implicitly handled. This will avoid the misleading logging, since now the signal isn't 'unhandled' any more.
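A minimal sketch of the decision, assuming a simplified task state (the struct and function are hypothetical, not the kernel's): a signal with no user handler normally counts as unhandled and may be logged, but a pending fatal signal short-circuits that check.

```c
#include <stdbool.h>

/* Hypothetical, simplified view of a task's signal state. */
struct task_sig {
    bool fatal_pending;   /* e.g. SIGKILL already queued */
    bool has_handler;     /* a user handler is installed for the new signal */
};

/*
 * Whether a newly raised signal should count as "unhandled" (and thus
 * be logged). A task with a fatal signal pending is already going away,
 * so anything raised after that is treated as implicitly handled.
 */
static bool signal_is_unhandled(const struct task_sig *t)
{
    if (t->fatal_pending)
        return false;
    return !t->has_handler;
}
```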
Reported-and-tested-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Link: https://lore.kernel.org/lkml/8d063a26-43f5-0bb7-3203-c6a04dc159f8@proxmox.com/
Acked-by: Oleg Nesterov <oleg@redhat.com>
Fixes: eda0047296a1 ("mm: make the page fault mmap locking killable")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
a6e2a0e414 |
tracing/probes: Fix to record 0-length data_loc in fetch_store_string*() if fails
[ Upstream commit 797311bce5c2ac90b8d65e357603cfd410d36ebb ]
Fix to record 0-length data to data_loc in fetch_store_string*() if it fails
to get the string data.
Currently those expect that the data_loc is updated by store_trace_args() if
it returns the error code. However, that does not work correctly if the
argument is an array of strings. In that case, store_trace_args() only clears
the first entry of the array (which may have no error) and leaves other
entries. So it should be cleared by fetch_store_string*() itself.
Also, 'dyndata' and 'maxlen' in store_trace_args() should be updated
only if they are used (ret > 0 and the argument is dynamic data).
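A sketch of the "always write the data_loc" rule. It assumes the __data_loc layout used by tracing (length in the upper 16 bits, offset in the lower 16 bits); store_string() is a hypothetical stand-in for fetch_store_string*(), with a NULL source modelling a failed fetch. Because every call writes its own slot, a failing element of a string array can no longer leave a stale data_loc behind.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_EFAULT 14   /* Linux value of EFAULT */

/* __data_loc layout: length in the high 16 bits, offset in the low 16. */
static uint32_t make_loc(uint32_t len, uint32_t offs)
{
    return (len << 16) | (offs & 0xffff);
}

static uint32_t loc_len(uint32_t dl)  { return dl >> 16; }
static uint32_t loc_offs(uint32_t dl) { return dl & 0xffff; }

/* Copy src into dst and record its data_loc; on failure record length 0. */
static int store_string(uint32_t *dl, char *dst, uint32_t offs,
                        size_t maxlen, const char *src)
{
    size_t len = 0;

    if (!src || maxlen == 0) {
        *dl = make_loc(0, offs);   /* never leave the old value in place */
        return -SKETCH_EFAULT;
    }
    while (len + 1 < maxlen && src[len] != '\0') {
        dst[len] = src[len];
        len++;
    }
    dst[len] = '\0';
    len++;                         /* count the terminating NUL */
    *dl = make_loc((uint32_t)len, offs);
    return (int)len;
}
```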
Link: https://lore.kernel.org/all/168908496683.123124.4761206188794205601.stgit@devnote2/
Fixes:
|
||
|
bee9946688 |
Revert "tracing: Add "(fault)" name injection to kernel probes"
[ Upstream commit 4ed8f337dee32df71435689c19d22e4ee846e15a ] This reverts commit |
||
|
f3baa42afe |
tracing: Allow synthetic events to pass around stacktraces
[ Upstream commit 00cf3d672a9dd409418647e9f98784c339c3ff63 ]

Allow a stacktrace from one event to be displayed by the end event of a synthetic event. This is very useful when looking for the longest latency of a sleep or something blocked on I/O.

 # cd /sys/kernel/tracing/
 # echo 's:block_lat pid_t pid; u64 delta; unsigned long[] stack;' > dynamic_events
 # echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace if prev_state == 1||prev_state == 2' > events/sched/sched_switch/trigger
 # echo 'hist:keys=prev_pid:delta=common_timestamp.usecs-$ts,s=$st:onmax($delta).trace(block_lat,prev_pid,$delta,$s)' >> events/sched/sched_switch/trigger

The above creates a "block_lat" synthetic event that takes the stacktrace of when a task schedules out in either the interruptible or uninterruptible state, and on a new per-process max $delta (the time it was scheduled out), will print the process id and the stacktrace.

 # echo 1 > events/synthetic/block_lat/enable
 # cat trace
 #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
 #              | |         |   |||||     |         |
    kworker/u16:0-767     [006] d..4.   560.645045: block_lat: pid=767 delta=66 stack=STACK:
 => __schedule
 => schedule
 => pipe_read
 => vfs_read
 => ksys_read
 => do_syscall_64
 => 0x966000aa
           <idle>-0       [003] d..4.   561.132117: block_lat: pid=0 delta=413787 stack=STACK:
 => __schedule
 => schedule
 => schedule_hrtimeout_range_clock
 => do_sys_poll
 => __x64_sys_poll
 => do_syscall_64
 => 0x966000aa
            <...>-153     [006] d..4.   562.068407: block_lat: pid=153 delta=54 stack=STACK:
 => __schedule
 => schedule
 => io_schedule
 => rq_qos_wait
 => wbt_wait
 => __rq_qos_throttle
 => blk_mq_submit_bio
 => submit_bio_noacct_nocheck
 => ext4_bio_write_page
 => mpage_submit_page
 => mpage_process_page_bufs
 => mpage_prepare_extent_to_map
 => ext4_do_writepages
 => ext4_writepages
 => do_writepages
 => __writeback_single_inode

Link: https://lkml.kernel.org/r/20230117152236.010941267@goodmis.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Ching-lin Yu <chinglinyu@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Stable-dep-of: 797311bce5c2 ("tracing/probes: Fix to record 0-length data_loc in fetch_store_string*() if fails")
Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
d92ee6bce1 |
tracing/probes: Fix to avoid double count of the string length on the array
[ Upstream commit 66bcf65d6cf0ca6540e2341e88ee7ef02dbdda08 ]
If an array is specified with ustring or symstr, the lengths of the
strings are accumulated in both 'ret' and 'total', which means the
length is double counted.
Just set the length to the 'ret' value to avoid double counting.
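The double count is easiest to see side by side. In this hypothetical sketch, each element's stored length is either added to ret (buggy) or assigned to ret (fixed) before ret is added to total; the function names are illustrative, not the tracing code's.

```c
#include <string.h>

/* Buggy shape: ret keeps growing, so earlier lengths are re-added to total. */
static int total_len_buggy(const char *const *strs, int n)
{
    int ret = 0, total = 0;

    for (int i = 0; i < n; i++) {
        ret += (int)strlen(strs[i]) + 1;   /* bytes stored, incl. NUL */
        total += ret;
    }
    return total;
}

/* Fixed shape: ret holds only this element's length. */
static int total_len_fixed(const char *const *strs, int n)
{
    int ret, total = 0;

    for (int i = 0; i < n; i++) {
        ret = (int)strlen(strs[i]) + 1;
        total += ret;
    }
    return total;
}
```

For the strings "ab" and "cde" (3 and 4 bytes including the NUL), the buggy loop reports 10 bytes instead of 7.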
Link: https://lore.kernel.org/all/168908492917.123124.15076463491122036025.stgit@devnote2/
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/8819b154-2ba1-43c3-98a2-cbde20892023@moroto.mountain/
Fixes:
|
||
|
16cc222026 |
tracing/probes: Add symstr type for dynamic events
[ Upstream commit b26a124cbfa80f42bfc4e63e1d5643ca98159d66 ]

Add a 'symstr' type for storing the kernel symbol as string data instead of the symbol address. This allows us to filter the events by wildcard symbol name.

e.g.
 # echo 'e:wqfunc workqueue.workqueue_execute_start symname=$function:symstr' >> dynamic_events
 # cat events/eprobes/wqfunc/format
 name: wqfunc
 ID: 2110
 format:
     field:unsigned short common_type;  offset:0;  size:2;  signed:0;
     field:unsigned char common_flags;  offset:2;  size:1;  signed:0;
     field:unsigned char common_preempt_count;  offset:3;  size:1;  signed:0;
     field:int common_pid;  offset:4;  size:4;  signed:1;

     field:__data_loc char[] symname;  offset:8;  size:4;  signed:1;

 print fmt: " symname=\"%s\"", __get_str(symname)

Note that there is already a 'symbol' type, which just changes the print format (so it still stores the symbol address in the tracing ring buffer). On the other hand, the 'symstr' type stores the actual "symbol+offset/size" data as a string.

Link: https://lore.kernel.org/all/166679930847.1528100.4124308529180235965.stgit@devnote3/
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Stable-dep-of: 66bcf65d6cf0 ("tracing/probes: Fix to avoid double count of the string length on the array")
Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
6bb50d6b0d |
sched/walt: single-big-thread code passes global variable to halt
There is a single global variable used for a mask of which cpus to manipulate for single-big-thread (sbt) purposes. This mask is passed into the halt and resume apis. This is incorrect, because the halt/resume apis can manipulate the passed mask, indicating which cpus were actually halted or unhalted in a given operation. Since the mask gets changed, the code forgets which cpus to use for sbt. Update the main sbt check routine such that a local copy of the cpus to be used for sbt is made, and that local copy passed to the halt/resume routines. Change-Id: I9280f1bf600565dc63f0a9d9f84536d50e31fbd6 Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com> |
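The bug pattern can be reproduced outside the scheduler. In this sketch, `halt_cpus()` is a hypothetical stand-in for the halt API. As the commit describes, it rewrites the mask it is given to report which CPUs it actually halted. CPU masks are modeled as plain 64-bit bitmasks rather than `struct cpumask`.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t cpumask_bits;          /* hypothetical: one bit per CPU */

static cpumask_bits halted;             /* CPUs currently halted */
static cpumask_bits sbt_cpus = 0xF0;    /* global: CPUs 4-7 used for sbt */

/* Hypothetical halt API: halts the requested CPUs and overwrites *mask
 * with only the CPUs whose state actually changed. */
static void halt_cpus(cpumask_bits *mask)
{
	cpumask_bits newly = *mask & ~halted;

	halted |= newly;
	*mask = newly;
}

/* Buggy: hands over the global, which halt_cpus() may shrink. */
static void sbt_halt_buggy(void)
{
	halt_cpus(&sbt_cpus);
}

/* Fixed: work on a local copy so the sbt configuration survives. */
static void sbt_halt_fixed(void)
{
	cpumask_bits local = sbt_cpus;

	halt_cpus(&local);
}
```

If CPU 4 is already halted, the buggy path permanently drops CPU 4 from `sbt_cpus`. The fixed path leaves it intact for the next sbt decision.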
||
|
6896fa5076 |
sched/walt: core control sbt trace
Add tracepoint to see current state of sbt, as well as critical decision making criteria. Change-Id: Id24a14714ec9e28469f78be48f12b71725cc8bba Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com> |
||
|
0b93b90ef7 |
msm-sysstats: protect task->files in get_task_unreclaimable_info()
A race is observed between a task exiting and get_task_unreclaimable_info() accessing the task's files; fix it by accessing task->files under a lock. Change-Id: Ie43426696aa09222fdc4bbb2533c8a8fd18a0f7c Signed-off-by: Srinivasarao Pathipati <quic_c_spathi@quicinc.com> |
||
|
7beed73af0 |
ANDROID: GKI: Create symbol files in include/config
Create the input symbol files used to generate the GKI modules headers under include/config. By placing the files in this generated directory, the default filters that ignore certain files will work without any special handling, and the files will also be available for inspection after the build for debugging purposes.
abi_gki_protected_exports: Input for gki_module_protected_exports.h
  From :- ${objtree}/abi_gki_protected_exports
  To   :- include/config/abi_gki_protected_exports
all_kmi_symbols: Input for gki_module_unprotected.h
  - Rename to abi_gki_kmi_symbols
  From :- all_kmi_symbols
  To   :- include/config/abi_gki_kmi_symbols
Bug: 286529877
Test: TH
Test: Manual verification of the generated files
Change-Id: Iafa10631e7712a8e1e87a2f56cfd614de6b1053a
Signed-off-by: Ramji Jiyani <ramjiyani@google.com> |
||
|
06dde6c45f |
sched/walt: Force pipeline promotion to prime if prime worthy
For automatic (heavy) pipeline searching, the prime cpu will not get used when there are only a few pipeline entries. This leaves the prime cpu unoccupied even when one of the pipeline tasks may be prime worthy. To handle this, detect whether any heavy tasks are already prime worthy, and promote those tasks to prime. Since pipeline cpus are reassigned at every window rollover, and automatic detection assigns cpus from lowest to highest, it is unnecessary to demote a task from prime, as this will happen regardless. The same issue exists for pipeline (manual) searching and must also be handled. For the manual case, when the number of pipeline tasks is small but a prime_wts has been found, determine whether the task on prime is prime worthy. If it isn't, it must be demoted to non-prime. Any remaining prime worthy task must then be found. Change-Id: I15c9417a14c5860bf48edc1c3443fdc0b1255f42 Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com> |
||
|
f56bc00a2c |
sched/walt: move code to find prime and other wts to common
In preparation for finding and promoting (and demoting) prime worthy (and unworthy) tasks, create an API that can be reused to find the pipeline task currently assigned to prime and the pipeline task with the maximum demand. Change-Id: I3b2334482b74c62598a2449e8938c920cfda85b2 Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com> |
||
|
3ee1a9ef9b |
sched/walt: add window based hysteresis
Ensure that pipeline tasks which have a demand greater than another pipeline task running on prime only swap with prime after 4 windows. Change-Id: I28b3e46f476f4f09682ae2caffc2cae04d76fae5 Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com> |
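A minimal sketch of window-based hysteresis. The commit does not say what happens if the candidate's demand dips in between. This sketch assumes the count restarts whenever a window passes without the candidate out-demanding the prime task, and that it restarts after each swap. `struct swap_hyst` and `window_rollover()` are hypothetical names; only the 4-window threshold comes from the commit.

```c
#include <assert.h>
#include <stdbool.h>

#define SWAP_HYST_WINDOWS 4  /* threshold from the commit message */

struct swap_hyst {
	unsigned int streak;  /* consecutive windows with candidate > prime */
};

/* Call once per window rollover with the candidate's and the current
 * prime task's demand; returns true when the swap may happen. */
static bool window_rollover(struct swap_hyst *h, unsigned long cand_demand,
			    unsigned long prime_demand)
{
	if (cand_demand <= prime_demand) {
		h->streak = 0;  /* assumption: any lapse restarts the count */
		return false;
	}
	if (++h->streak < SWAP_HYST_WINDOWS)
		return false;
	h->streak = 0;          /* assumption: start over after a swap */
	return true;
}
```

Requiring several consecutive windows trades a few windows of swap latency for less bouncing between prime and the other CPUs.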
||
|
d0e8a6729a |
sched/walt: Generalize pipeline types enum
Clean up the enum of pipeline types, in an effort to simplify usage in followup changes. Change-Id: I451f04fc0b0f4d5f4cf98f071961e1ed451e94cc Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com> |
||
|
71c0eb4846 |
sched/walt: Use coloc demand for pipeline task swaps
Since coloc demand represents averaged demand across history of past few windows, the growth of pipeline task demands will be steadier, theoretically resulting in less bouncing between prime and other CPUs. Change-Id: I9a5c92fbc1c26591889e51be1e65273ebe2d27b4 Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com> |
||
|
395e904279 |
sched/walt: pipeline_set_boost must handle auto and manual pipeline
pipeline_set_boost does not properly handle the case where manual and auto pipeline hinting are enabled or disabled independently. For example, consider this series of events:
pipeline_set_boost(true, MANUAL_PIPELINE);
  - auto pipeline enabled
pipeline_set_boost(true, AUTO_PIPELINE);
  - auto pipeline disabled
pipeline_set_boost(false, MANUAL_PIPELINE);
With the above, pipeline boost will be enabled, even though manual pipeline is no longer requesting it and auto pipeline is disabled. Correct the code to track the state of the AUTO and MANUAL pipelines independently, and use that information to decide whether boost is currently requested. Change-Id: Ia31eb8cd45b417f55f1e6827953073f708820ce6 Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com> |
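One way to track the two requesters independently is a bitmask with one bit per pipeline type. Boost is then considered requested while any bit is set. This is a sketch under that assumption: `pipeline_boost_req` and `pipeline_set_boost_sketch()` are hypothetical, the enum only reuses the `MANUAL_PIPELINE`/`AUTO_PIPELINE` names from the message, and the real function's signature may differ.

```c
#include <assert.h>
#include <stdbool.h>

enum pipeline_type { MANUAL_PIPELINE, AUTO_PIPELINE };

static unsigned int pipeline_boost_req;  /* one bit per requester */

/* Record this requester's vote; return whether boost should be on. */
static bool pipeline_set_boost_sketch(bool enable, enum pipeline_type type)
{
	unsigned int bit = 1u << type;

	if (enable)
		pipeline_boost_req |= bit;
	else
		pipeline_boost_req &= ~bit;

	return pipeline_boost_req != 0;
}
```

Consider a short replay of enable and disable calls. With per-requester bits, boost stays on only while at least one requester still has it enabled. A disable from one requester can no longer cancel, or be masked by, the other requester's state.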
||
|
0b472b73c3 |
sched/walt: Use softaffinity for RT tasks
Ensure that RT task placements respect the userspace-defined soft affinity, but keep using the broader affinity in dire straits. Change-Id: I01d004c3660ebab8aa3451468fa31b7c13d64e64 Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com> |
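The placement policy can be sketched as "try the soft-affinity subset first, then fall back to the full affinity mask." The commit does not define "dire straits". This sketch assumes it means that no CPU in the soft-affinity subset is idle. `pick_rt_cpu()`, `first_cpu()` and the bitmask representation are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Return the lowest-numbered CPU set in mask, or -1 if none. */
static int first_cpu(uint64_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return -1;
}

/* Prefer idle CPUs inside the soft affinity; widen to the full affinity
 * only when that yields nothing (assumed meaning of "dire straits"). */
static int pick_rt_cpu(uint64_t affinity, uint64_t soft, uint64_t idle)
{
	int cpu = first_cpu(affinity & soft & idle);

	if (cpu < 0)
		cpu = first_cpu(affinity & idle);
	return cpu;
}
```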
||
|
7ba3485efe |
sched/walt: Add per task reduce affinity feature
Create the skeleton for a per task tunable, to be used to indicate when certain CPUs of a task should be ignored even if a task is affined to them. Change-Id: I8c0f1f01bbc416b5f5569dd4de1fc18971b8a5d4 Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com> |
||
|
3971aa7788 |
sched/walt: Fix issue caused by incorrect ws setting
WALT cpufreq gov notes freq transitions and window rollovers so that it can generate an accurate average freq the cluster was subject to. For this accounting, it needs an incrementing view of window start timestamps as freq updates come in. Specifically, it needs the current window_start, walt_load->ws, to be greater than the last recorded window start, wg_policy->last_ws. A crash was observed with walt_load->ws set to zero while wg_policy->last_ws was valid. This is likely because of an issue in the 'enable_shared_rail_boost' feature, where the current code fails to update walt_load->ws and simply runs off whatever was set during an earlier update, coupled with a reset of walt_load->ws when a cpu comes online in the gov->start callback. Fix this issue by ensuring walt_load->ws is set regardless of the sync features, by simply calling __cpu_util_freq_walt() before adjusting the loads based on siblings. Change-Id: I57a598ec64cd0d1ae8c1d6d88e3232191839ded7 Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com> |
||
|
e453c52b94 |
sched/walt: Adjust cpu clusters for mid cap CPUs
In the case of a 4-cluster system where the gold- cluster has a lower max capacity than the gold cluster, ensure that the cluster ordering is maintained in the preset order. Change-Id: Iff6cfb9ad93917c8b5ee94986165fabdf92b7c57 Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com> |
||
|
f4bd28d5e8 |
sched/walt: Switch to using new api for state1/2 determination
If a cluster containing CPUs that are capable of being partially halted is in a state where all CPUs in that cluster are either partially halted or fully halted, consider the system to be in state1. Change-Id: I470690bb74b5617d028e15a211bc5f877c17b8a3 Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com> |
||
|
0d9960403c |
UPSTREAM: fork: lock VMAs of the parent process when forking
When forking a child process, the parent write-protects anonymous pages and COW-shares them with the child being forked using copy_present_pte(). We must not take any concurrent page faults on the source vma's as they are being processed, as we expect both the vma and the pte's behind it to be stable. For example, the anon_vma_fork() expects the parent's vma->anon_vma to not change during the vma copy. A concurrent page fault on a page newly marked read-only by the page copy might trigger wp_page_copy() and an anon_vma_prepare(vma) on the source vma, defeating the anon_vma_clone() that wasn't done because the parent vma originally didn't have an anon_vma, but we now might end up copying a pte entry for a page that has one. Before the per-vma lock based changes, the mmap_lock guaranteed exclusion with concurrent page faults. But now we need to do a vma_start_write() to make sure no concurrent faults happen on this vma while it is being processed. This fix can potentially regress some fork-heavy workloads. Kernel build time did not show a noticeable regression on a 56-core machine, while a stress test mapping 10000 VMAs and forking 5000 times in a tight loop shows a ~5% regression. If such a fork time regression is unacceptable, disabling CONFIG_PER_VMA_LOCK should restore its performance. Further optimizations are possible if this regression proves to be problematic.
Suggested-by: David Hildenbrand <david@redhat.com> Reported-by: Jiri Slaby <jirislaby@kernel.org> Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/ Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com> Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/ Reported-by: Jacob Young <jacobly.alt@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624 Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first") Cc: stable@vger.kernel.org Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit fb49c455323ff8319a123dd312be9082c49a23a5) Change-Id: Ic5aa9dc51a888b5b0319ec4ec6d2941424573ca0 Signed-off-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@google.com> |
||
|
4c8f30a2ad |
bpf: aggressively forget precise markings during state checkpointing
[ Upstream commit 7a830b53c17bbadcf99f778f28aaaa4e6c41df5f ]
Exploit the property of about-to-be-checkpointed state to be able to forget all precise markings up to that point even more aggressively. We now clear all potentially inherited precise markings right before checkpointing and branching off into a child state. If any of the child states require precise knowledge of any SCALAR register, those will be propagated backwards later on before this state is finalized, preserving correctness.
There is a single selftests BPF program change, but a tremendous one: a 25x reduction in the number of verified instructions and states in trace_virtqueue_add_sgs. Cilium results are more modest, but happen across a wider range of programs.
SELFTESTS RESULTS
=================
$ ./veristat -C -e file,prog,insns,states ~/imprecise-early-results.csv ~/imprecise-aggressive-results.csv | grep -v '+0'
File                Program                 Total insns (A) Total insns (B) Total insns (DIFF) Total states (A) Total states (B) Total states (DIFF)
------------------- ----------------------- --------------- --------------- ------------------ ---------------- ---------------- -------------------
loop6.bpf.linked1.o trace_virtqueue_add_sgs 398057          15114           -382943 (-96.20%)  8717             336              -8381 (-96.15%)
------------------- ----------------------- --------------- --------------- ------------------ ---------------- ---------------- -------------------
CILIUM RESULTS
==============
$ ./veristat -C -e file,prog,insns,states ~/imprecise-early-results-cilium.csv ~/imprecise-aggressive-results-cilium.csv | grep -v '+0'
File          Program                          Total insns (A) Total insns (B) Total insns (DIFF) Total states (A) Total states (B) Total states (DIFF)
------------- -------------------------------- --------------- --------------- ------------------ ---------------- ---------------- -------------------
bpf_host.o    tail_handle_nat_fwd_ipv4         23426           23221           -205 (-0.88%)      1537             1515             -22 (-1.43%)
bpf_host.o    tail_handle_nat_fwd_ipv6         13009           12904           -105 (-0.81%)      719              708              -11 (-1.53%)
bpf_host.o    tail_nodeport_nat_ingress_ipv6   5261            5196            -65 (-1.24%)       247              243              -4 (-1.62%)
bpf_host.o    tail_nodeport_nat_ipv6_egress    3446            3406            -40 (-1.16%)       203              198              -5 (-2.46%)
bpf_lxc.o     tail_handle_nat_fwd_ipv4         23426           23221           -205 (-0.88%)      1537             1515             -22 (-1.43%)
bpf_lxc.o     tail_handle_nat_fwd_ipv6         13009           12904           -105 (-0.81%)      719              708              -11 (-1.53%)
bpf_lxc.o     tail_ipv4_ct_egress              5074            4897            -177 (-3.49%)      255              248              -7 (-2.75%)
bpf_lxc.o     tail_ipv4_ct_ingress             5100            4923            -177 (-3.47%)      255              248              -7 (-2.75%)
bpf_lxc.o     tail_ipv4_ct_ingress_policy_only 5100            4923            -177 (-3.47%)      255              248              -7 (-2.75%)
bpf_lxc.o     tail_ipv6_ct_egress              4558            4536            -22 (-0.48%)       188              187              -1 (-0.53%)
bpf_lxc.o     tail_ipv6_ct_ingress             4578            4556            -22 (-0.48%)       188              187              -1 (-0.53%)
bpf_lxc.o     tail_ipv6_ct_ingress_policy_only 4578            4556            -22 (-0.48%)       188              187              -1 (-0.53%)
bpf_lxc.o     tail_nodeport_nat_ingress_ipv6   5261            5196            -65 (-1.24%)       247              243              -4 (-1.62%)
bpf_overlay.o tail_nodeport_nat_ingress_ipv6   5261            5196            -65 (-1.24%)       247              243              -4 (-1.62%)
bpf_overlay.o tail_nodeport_nat_ipv6_egress    3482            3442            -40 (-1.15%)       204              201              -3 (-1.47%)
bpf_xdp.o     tail_nodeport_nat_egress_ipv4    17200           15619           -1581 (-9.19%)     1111             1010             -101 (-9.09%)
------------- -------------------------------- --------------- --------------- ------------------ ---------------- ---------------- -------------------
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221104163649.121784-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
8b57a37d0e |
bpf: stop setting precise in current state
[ Upstream commit f63181b6ae79fd3b034cde641db774268c2c3acf ]
Setting reg->precise to true in the current state is not necessary from a correctness standpoint, but it does pessimise the whole precision (or rather "imprecision", because that's what we want to keep as much as possible) tracking. Why this is so is somewhat subtle, and my best attempt to explain it is recorded in an extensive comment for the __mark_chain_precision() function. Some more careful thinking and code reading is probably still required to grok this completely, unfortunately. Whiteboarding and a bunch of extra handwaving in person would be even more helpful, but is deemed impractical in a Git commit.
The next patch pushes this imprecision property even further, building on top of the insights described in this patch. The end results are pretty nice: we get a reduction in the total number of instructions and states verified due to better state reuse, as some of the states are now more generic and permissive due to fewer unnecessary precise=true requirements.
SELFTESTS RESULTS
=================
$ ./veristat -C -e file,prog,insns,states ~/subprog-precise-results.csv ~/imprecise-early-results.csv | grep -v '+0'
File                                    Program                Total insns (A) Total insns (B) Total insns (DIFF) Total states (A) Total states (B) Total states (DIFF)
--------------------------------------- ---------------------- --------------- --------------- ------------------ ---------------- ---------------- -------------------
bpf_iter_ksym.bpf.linked1.o             dump_ksym              347             285             -62 (-17.87%)      20               19               -1 (-5.00%)
pyperf600_bpf_loop.bpf.linked1.o        on_event               3678            3736            +58 (+1.58%)       276              285              +9 (+3.26%)
setget_sockopt.bpf.linked1.o            skops_sockopt          4038            3947            -91 (-2.25%)       347              343              -4 (-1.15%)
test_l4lb.bpf.linked1.o                 balancer_ingress       4559            2611            -1948 (-42.73%)    118              105              -13 (-11.02%)
test_l4lb_noinline.bpf.linked1.o        balancer_ingress       6279            6268            -11 (-0.18%)       237              236              -1 (-0.42%)
test_misc_tcp_hdr_options.bpf.linked1.o misc_estab             1307            1303            -4 (-0.31%)        100              99               -1 (-1.00%)
test_sk_lookup.bpf.linked1.o            ctx_narrow_access      456             447             -9 (-1.97%)        39               38               -1 (-2.56%)
test_sysctl_loop1.bpf.linked1.o         sysctl_tcp_mem         1389            1384            -5 (-0.36%)        26               25               -1 (-3.85%)
test_tc_dtime.bpf.linked1.o             egress_fwdns_prio101   518             485             -33 (-6.37%)       51               46               -5 (-9.80%)
test_tc_dtime.bpf.linked1.o             egress_host            519             468             -51 (-9.83%)       50               44               -6 (-12.00%)
test_tc_dtime.bpf.linked1.o             ingress_fwdns_prio101  842             1000            +158 (+18.76%)     73               88               +15 (+20.55%)
xdp_synproxy_kern.bpf.linked1.o         syncookie_tc           405757          373173          -32584 (-8.03%)    25735            22882            -2853 (-11.09%)
xdp_synproxy_kern.bpf.linked1.o         syncookie_xdp          479055          371590          -107465 (-22.43%)  29145            22207            -6938 (-23.81%)
--------------------------------------- ---------------------- --------------- --------------- ------------------ ---------------- ---------------- -------------------
The slight regression in test_tc_dtime.bpf.linked1.o/ingress_fwdns_prio101 is left for a follow-up; there might be some more precision-related bugs in the existing BPF verifier logic.
CILIUM RESULTS
==============
$ ./veristat -C -e file,prog,insns,states ~/subprog-precise-results-cilium.csv ~/imprecise-early-results-cilium.csv | grep -v '+0'
File          Program                        Total insns (A) Total insns (B) Total insns (DIFF) Total states (A) Total states (B) Total states (DIFF)
------------- ------------------------------ --------------- --------------- ------------------ ---------------- ---------------- -------------------
bpf_host.o    cil_from_host                  762             556             -206 (-27.03%)     43               37               -6 (-13.95%)
bpf_host.o    tail_handle_nat_fwd_ipv4       23541           23426           -115 (-0.49%)      1538             1537             -1 (-0.07%)
bpf_host.o    tail_nodeport_nat_egress_ipv4  33592           33566           -26 (-0.08%)       2163             2161             -2 (-0.09%)
bpf_lxc.o     tail_handle_nat_fwd_ipv4       23541           23426           -115 (-0.49%)      1538             1537             -1 (-0.07%)
bpf_overlay.o tail_nodeport_nat_egress_ipv4  33581           33543           -38 (-0.11%)       2160             2157             -3 (-0.14%)
bpf_xdp.o     tail_handle_nat_fwd_ipv4       21659           20920           -739 (-3.41%)      1440             1376             -64 (-4.44%)
bpf_xdp.o     tail_handle_nat_fwd_ipv6       17084           17039           -45 (-0.26%)       907              905              -2 (-0.22%)
bpf_xdp.o     tail_lb_ipv4                   73442           73430           -12 (-0.02%)       4370             4369             -1 (-0.02%)
bpf_xdp.o     tail_lb_ipv6                   152114          151895          -219 (-0.14%)      6493             6479             -14 (-0.22%)
bpf_xdp.o     tail_nodeport_nat_egress_ipv4  17377           17200           -177 (-1.02%)      1125             1111             -14 (-1.24%)
bpf_xdp.o     tail_nodeport_nat_ingress_ipv6 6405            6397            -8 (-0.12%)        309              308              -1 (-0.32%)
bpf_xdp.o     tail_rev_nodeport_lb4          7126            6934            -192 (-2.69%)      414              402              -12 (-2.90%)
bpf_xdp.o     tail_rev_nodeport_lb6          18059           17905           -154 (-0.85%)      1105             1096             -9 (-0.81%)
------------- ------------------------------ --------------- --------------- ------------------ ---------------- ---------------- -------------------
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221104163649.121784-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
56675ddcb0 |
bpf: allow precision tracking for programs with subprogs
[ Upstream commit be2ef8161572ec1973124ebc50f56dafc2925e07 ]
Stop forcing precise=true for SCALAR registers when a BPF program has any subprograms. The current restriction means that any BPF program, as soon as it uses subprograms, will not get any of the precision tracking benefits in reducing the number of verified states.
This patch keeps the fallback mark_all_scalars_precise() behavior if precise marking has to cross function frames. E.g., if a subprogram requires R1 (first input arg) to be marked precise, ideally we'd need to backtrack to the parent function and keep marking R1 and its dependencies as precise. But right now we give up and force all the SCALARs in the current and parent states to precise=true. We can lift that restriction in the future.
But this patch fixes two issues identified when trying to enable precision tracking for subprogs.
First, prevent "escaping" from the top-most state in a global subprog. With an entry-level BPF program we never end up requesting precision for R1-R5 registers, because R2-R5 are not initialized (and so not readable in a correct BPF program), and R1 is PTR_TO_CTX, not SCALAR, and so is implicitly precise. With global subprogs, though, it's different, as a global subprog a) can have up to 5 SCALAR input arguments, which might get marked as precise=true, and b) is validated in isolation from its main entry BPF program. b) means that we can end up exhausting the parent state chain and still not mark all registers in reg_mask as precise, which would lead to a verifier bug warning.
To handle that, we need to consider two cases. First, if the very first state is not immediately "checkpointed" (i.e., stored in the state lookup hashtable), it will get the correct first_insn_idx and last_insn_idx instruction set during state checkpointing. As such, this case is already handled: __mark_chain_precision() just does nothing when we reach the very first parent state. st->parent will be NULL and we'll just stop. Perhaps some extra check for reg_mask and stack_mask is due here, but this patch doesn't address that issue.
The more problematic second case is when a global function's initial state is immediately checkpointed before we manage to process the very first instruction. This happens because when there is a call to a global subprog from the main program, the subprog's very first instruction is marked as a pruning point, so before we manage to process the first instruction we have to check and checkpoint the state. This patch adds special handling for such an "empty" state, which is identified by having st->last_insn_idx set to -1. In such a case, we check that we are indeed validating a global subprog, and with some sanity checking we mark input args as precise if requested.
Note that we also initialize state->first_insn_idx with the correct start insn_idx offset. For the main program zero is the correct value, but for any subprog it's quite confusing to not have first_insn_idx set. This doesn't have any functional impact, but helps with debugging and state printing. We also explicitly initialize state->last_insn_idx instead of relying on is_state_visited() to do this with env->prev_insn_idx, which will be -1 on the very first instruction. This concludes the changes necessary to handle global subprogs' precision tracking specifically.
The second identified problem was missed handling of BPF helper functions that call into subprogs (e.g., bpf_loop and a few others). From the standpoint of the precision tracking and backtracking logic, those are effectively calls into subprogs and should be treated as BPF_PSEUDO_CALL calls.
This patch takes the least intrusive way and just checks against a short list of current BPF helpers that do call subprogs, encapsulated in the is_callback_calling_function() function. But to prevent accidentally forgetting to add new BPF helpers to this "list", we also do a sanity check in __check_func_call, which has to be called for each such special BPF helper, to validate that the BPF helper is indeed recognized as a callback-calling one. This should catch any missed checks in the future. Adding special flags to function proto definitions seemed like overkill in this case.
With the above changes, it's possible to remove the forceful setting of reg->precise to true in __mark_reg_unknown, which turns on precision tracking both inside subprogs and in entry progs that have subprogs. No warnings or errors were detected across all the selftests, nor when validating with veristat against internal Meta BPF objects and Cilium objects. Further, some BPF programs show a noticeable reduction in the number of states and instructions validated due to more effective precision tracking, especially benefiting the syncookie test.
$ ./veristat -C -e file,prog,insns,states ~/baseline-results.csv ~/subprog-precise-results.csv | grep -v '+0'
File                                     Program                    Total insns (A) Total insns (B) Total insns (DIFF) Total states (A) Total states (B) Total states (DIFF)
---------------------------------------- -------------------------- --------------- --------------- ------------------ ---------------- ---------------- -------------------
pyperf600_bpf_loop.bpf.linked1.o         on_event                   3966            3678            -288 (-7.26%)      306              276              -30 (-9.80%)
pyperf_global.bpf.linked1.o              on_event                   7563            7530            -33 (-0.44%)       520              517              -3 (-0.58%)
pyperf_subprogs.bpf.linked1.o            on_event                   36358           36934           +576 (+1.58%)      2499             2531             +32 (+1.28%)
setget_sockopt.bpf.linked1.o             skops_sockopt              3965            4038            +73 (+1.84%)       343              347              +4 (+1.17%)
test_cls_redirect_subprogs.bpf.linked1.o cls_redirect               64965           64901           -64 (-0.10%)       4619             4612             -7 (-0.15%)
test_misc_tcp_hdr_options.bpf.linked1.o  misc_estab                 1491            1307            -184 (-12.34%)     110              100              -10 (-9.09%)
test_pkt_access.bpf.linked1.o            test_pkt_access            354             349             -5 (-1.41%)        25               24               -1 (-4.00%)
test_sock_fields.bpf.linked1.o           egress_read_sock_fields    435             375             -60 (-13.79%)      22               20               -2 (-9.09%)
test_sysctl_loop2.bpf.linked1.o          sysctl_tcp_mem             1508            1501            -7 (-0.46%)        29               28               -1 (-3.45%)
test_tc_dtime.bpf.linked1.o              egress_fwdns_prio100       468             435             -33 (-7.05%)       45               41               -4 (-8.89%)
test_tc_dtime.bpf.linked1.o              ingress_fwdns_prio100      398             408             +10 (+2.51%)       42               39               -3 (-7.14%)
test_tc_dtime.bpf.linked1.o              ingress_fwdns_prio101      1096            842             -254 (-23.18%)     97               73               -24 (-24.74%)
test_tcp_hdr_options.bpf.linked1.o       estab                      2758            2408            -350 (-12.69%)     208              181              -27 (-12.98%)
test_urandom_usdt.bpf.linked1.o          urand_read_with_sema       466             448             -18 (-3.86%)       31               28               -3 (-9.68%)
test_urandom_usdt.bpf.linked1.o          urand_read_without_sema    466             448             -18 (-3.86%)       31               28               -3 (-9.68%)
test_urandom_usdt.bpf.linked1.o          urandlib_read_with_sema    466             448             -18 (-3.86%)       31               28               -3 (-9.68%)
test_urandom_usdt.bpf.linked1.o          urandlib_read_without_sema 466             448             -18 (-3.86%)       31               28               -3 (-9.68%)
test_xdp_noinline.bpf.linked1.o          balancer_ingress_v6        4302            4294            -8 (-0.19%)        257              256              -1 (-0.39%)
xdp_synproxy_kern.bpf.linked1.o          syncookie_tc               583722          405757          -177965 (-30.49%)  35846            25735            -10111 (-28.21%)
xdp_synproxy_kern.bpf.linked1.o          syncookie_xdp              609123          479055          -130068 (-21.35%)  35452            29145            -6307 (-17.79%)
---------------------------------------- -------------------------- --------------- --------------- ------------------ ---------------- ---------------- -------------------
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221104163649.121784-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
61622fa379 |
tracing/histograms: Return an error if we fail to add histogram to hist_vars list
commit 4b8b3905165ef98386a3c06f196c85d21292d029 upstream. Commit 6018b585e8c6 ("tracing/histograms: Add histograms to hist_vars if they have referenced variables") added a check to fail histogram creation if save_hist_vars() failed to add histogram to hist_vars list. But the commit failed to set ret to failed return code before jumping to unregister histogram, fix it. Link: https://lore.kernel.org/linux-trace-kernel/20230714203341.51396-1-mkhalfella@purestorage.com Cc: stable@vger.kernel.org Fixes: 6018b585e8c6 ("tracing/histograms: Add histograms to hist_vars if they have referenced variables") Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
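This is the common "jump to the error path without setting the error code" bug. The self-contained sketch below uses hypothetical helpers, not the tracing code itself. `save_vars()` stands in for save_hist_vars(), and `-ENOMEM` is an arbitrary failure code chosen for illustration. A `fixed` flag switches between the original and corrected error handling.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static bool registered;

/* Hypothetical step that can fail after registration succeeded. */
static int save_vars(bool fail)
{
	return fail ? -ENOMEM : 0;
}

static int create_hist_sketch(bool fail_save, bool fixed)
{
	int ret = 0;            /* earlier steps succeeded, so ret is 0 */

	registered = true;      /* stands in for registering the trigger */

	if (fixed) {
		ret = save_vars(fail_save);
		if (ret)
			goto out_unreg;  /* ret carries the failure code */
	} else {
		if (save_vars(fail_save))
			goto out_unreg;  /* bug: ret is still 0 here */
	}
	return 0;

out_unreg:
	registered = false;     /* undo the registration */
	return ret;
}
```

In the buggy branch, the caller sees 0 (success) even though the histogram was unregistered behind its back.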
||
|
8620c53ced |
bpf: Repeat check_max_stack_depth for async callbacks
[ Upstream commit b5e9ad522c4ccd32d322877515cff8d47ed731b9 ]
While the check_max_stack_depth function explores call chains emanating
from the main prog, which is typically enough to cover all possible call
chains, it doesn't explore those rooted at async callbacks unless the
async callback is directly called: unlike non-async callbacks, it skips
exploring their instructions, as they don't contribute to the stack
depth.
It could be the case that the async callback leads to a callchain which
exceeds the stack depth, but this is never reachable while only
exploring the entry point from main subprog. Hence, repeat the check for
the main subprog *and* all async callbacks marked by the symbolic
execution pass of the verifier, as execution of the program may begin at
any of them.
Consider functions with the following stack depths:
main: 256
async: 256
foo: 256
main:
    rX = async
    bpf_timer_set_callback(...)
async:
    foo()
Here, async is not descended as it does not contribute to stack depth of
main (since it is referenced using bpf_pseudo_func and not
bpf_pseudo_call). However, when async is invoked asynchronously, it will
end up breaching the MAX_BPF_STACK limit by calling foo.
Hence, in addition to main, we also need to explore call chains
beginning at all async callback subprogs in a program.
Fixes:
|
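A user-space model of the fix computes the deepest call chain from every possible entry point (main plus each async callback), not from main alone. The subprog stack sizes are hypothetical. `foo` is given 288 bytes so that the async chain clearly exceeds the limit. The real check_max_stack_depth() also rounds frames up to 32 bytes, handles tail calls, and walks instructions iteratively; all of that is omitted here. Recursion in the sketch is safe because the verifier rejects recursive BPF calls, so the call graph is acyclic.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_STACK 512  /* same value as MAX_BPF_STACK */

enum { MAIN, ASYNC, FOO, NSUB };

/* Hypothetical per-subprog stack usage, in bytes. */
static const int stack_depth[NSUB] = { 256, 256, 288 };
/* Direct calls (bpf_pseudo_call edges) only. main merely references
 * ASYNC via bpf_pseudo_func, so there is no MAIN -> ASYNC edge. */
static const bool calls[NSUB][NSUB] = {
	[ASYNC] = { [FOO] = true },
};
static const bool is_async_cb[NSUB] = { [ASYNC] = true };

/* Deepest stack usage of any call chain starting at subprog idx. */
static int max_depth_from(int idx)
{
	int deepest = 0;

	for (int callee = 0; callee < NSUB; callee++) {
		if (calls[idx][callee]) {
			int d = max_depth_from(callee);

			if (d > deepest)
				deepest = d;
		}
	}
	return stack_depth[idx] + deepest;
}

/* The fix: check main *and* every async callback as a root. */
static bool check_all_roots(void)
{
	for (int i = 0; i < NSUB; i++) {
		if (i != MAIN && !is_async_cb[i])
			continue;
		if (max_depth_from(i) > MAX_STACK)
			return false;
	}
	return true;
}
```

Checking only `MAIN` would pass (256 bytes). Adding the async root exposes the 544-byte ASYNC -> FOO chain.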
||
|
d55ff358b0 |
bpf: Fix subprog idx logic in check_max_stack_depth
[ Upstream commit ba7b3e7d5f9014be65879ede8fd599cb222901c9 ]
The assignment to idx in check_max_stack_depth happens once we see a
bpf_pseudo_call or bpf_pseudo_func. This is not an issue as the rest of
the code performs a few checks and then pushes the frame to the frame
stack, except the case of async callbacks. If the async callback case
causes the loop iteration to be skipped, the idx assignment will be
incorrect on the next iteration of the loop. The value stored in the
frame stack (as the subprogno of the current subprog) will be incorrect.
This leads to incorrect checks and incorrect tail_call_reachable
marking. Save the target subprog in a new variable and only assign to
idx once we are done with the is_async_cb check which may skip pushing
of frame to the frame stack and subsequent stack depth checks and tail
call markings.
Fixes:
|
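The ordering bug can be modeled in a few lines. In this hypothetical, simplified walk, main meets two call sites. The first refers to an async callback, which is skipped. The second is a normal call that pushes a frame. `ret_prog` records which subprog to return to. This is not the verifier's actual loop, only the order of the `idx` assignment relative to the async check.

```c
#include <assert.h>
#include <stdbool.h>

enum { MAIN, ASYNC_CB, FOO, NSUB };

static const bool is_async_cb[NSUB] = { [ASYNC_CB] = true };

/* Call targets met, in order, while walking MAIN's instructions:
 * first a reference to an async callback, then a normal call. */
static const int call_targets[] = { ASYNC_CB, FOO };

/* Return the "subprog to return to" recorded when the first real
 * frame is pushed; 'fixed' selects where idx gets updated. */
static int recorded_caller(bool fixed)
{
	int idx = MAIN;  /* subprog whose instructions we are walking */

	for (int k = 0; k < 2; k++) {
		int ret_prog = idx;           /* remember the caller */
		int target = call_targets[k];

		if (!fixed)
			idx = target;         /* bug: updated before the async check */
		if (is_async_cb[target])
			continue;             /* async callbacks are not descended */
		if (fixed)
			idx = target;         /* fix: update only when descending */
		(void)idx;                    /* the real walk would continue in idx */
		return ret_prog;
	}
	return -1;
}
```

With the bug, the frame for `FOO` records `ASYNC_CB` as its caller. With the fix, it correctly records `MAIN`.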
||
|
f4c0a6b8ce |
kallsyms: strip LTO-only suffixes from promoted global functions
[ Upstream commit 8cc32a9bbf2934d90762d9de0187adcb5ad46a11 ] Commit |
||
|
28fdfda791 |
kallsyms: Improve the performance of kallsyms_lookup_name()
[ Upstream commit 60443c88f3a89fd303a9e8c0e84895910675c316 ]
Currently, to search for a symbol, we need to expand the symbols in 'kallsyms_names' one by one, and then use the expanded string for comparison. This is O(n). If we sort names in ascending order, as is done for addresses, we can use binary search instead, which is O(log(n)). In order not to change the implementation of "/proc/kallsyms", the table kallsyms_names[] is still stored in a one-to-one correspondence with the addresses in ascending order. Add an array kallsyms_seqs_of_names[], indexed by the sequence number of the sorted names, whose content is the sequence number of the sorted addresses. For example, if the index of NameX in kallsyms_seqs_of_names[] is 'i' and the content of kallsyms_seqs_of_names[i] is 'k', then the address of NameX is kallsyms_addresses[k], and its offset in kallsyms_names[] is get_symbol_offset(k). Note that the memory usage will increase by (4 * kallsyms_num_syms) bytes; the next two patches will reduce that by (1 * kallsyms_num_syms) bytes and properly handle the CONFIG_LTO_CLANG=y case.
Performance test results (x86):
Before:
  min=234, max=10364402, avg=5206926
  min=267, max=11168517, avg=5207587
After:
  min=1016, max=90894, avg=7272
  min=1014, max=93470, avg=7293
The average lookup performance of kallsyms_lookup_name() improved 715x.
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Stable-dep-of: 8cc32a9bbf29 ("kallsyms: strip LTO-only suffixes from promoted global functions")
Signed-off-by: Sasha Levin <sashal@kernel.org> |
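The lookup scheme can be sketched in user space as follows. `names[]` stays in address order, as /proc/kallsyms requires. `seqs_of_names[]` lists the address-order indices sorted by name, so a binary search over it finds a name in O(log n). The array names mirror kallsyms_names[] and kallsyms_seqs_of_names[], but several things are illustrative only: the data, the plain-string storage (the kernel's name table is compressed), and `lookup_name()`.

```c
#include <assert.h>
#include <string.h>

/* Symbols in ascending address order, as in kallsyms_addresses[]. */
static const unsigned long addresses[] = { 0x1000, 0x1040, 0x1100, 0x1200 };
static const char *const names[] = { "start", "foo", "bar", "alpha" };

/* Address-order indices, sorted by the name each one refers to:
 * alpha(3) < bar(2) < foo(1) < start(0). */
static const int seqs_of_names[] = { 3, 2, 1, 0 };

#define NSYMS (sizeof(names) / sizeof(names[0]))

/* Binary search by name; returns the address, or 0 if not found. */
static unsigned long lookup_name(const char *name)
{
	size_t lo = 0, hi = NSYMS;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;
		int k = seqs_of_names[mid];     /* address-order index */
		int cmp = strcmp(name, names[k]);

		if (cmp == 0)
			return addresses[k];
		if (cmp < 0)
			hi = mid;
		else
			lo = mid + 1;
	}
	return 0;
}
```

The extra indirection costs one int per symbol. That matches the (4 * kallsyms_num_syms) bytes the message mentions, and it keeps the address-ordered table untouched.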
||
|
92cc015332 |
sched/psi: use kernfs polling functions for PSI trigger polling
[ Upstream commit aff037078ecaecf34a7c2afab1341815f90fba5e ]
Destroying psi trigger in cgroup_file_release causes UAF issues when
a cgroup is removed from under a polling process. This is happening
because cgroup removal causes a call to cgroup_file_release while the
actual file is still alive. Destroying the trigger at this point would
also destroy its waitqueue head and if there is still a polling process
on that file accessing the waitqueue, it will step on the freed pointer:
do_select
vfs_poll
do_rmdir
cgroup_rmdir
kernfs_drain_open_files
cgroup_file_release
cgroup_pressure_release
psi_trigger_destroy
wake_up_pollfree(&t->event_wait)
// vfs_poll is unblocked
synchronize_rcu
kfree(t)
poll_freewait -> UAF access to the trigger's waitqueue head
Patch [1] fixed this issue for the epoll() case using wake_up_pollfree();
however, the same issue exists for the synchronous poll() case.
The root cause of this issue is that the lifecycles of the psi trigger's
waitqueue and of the file associated with the trigger are different. Fix
this by using kernfs_generic_poll function when polling on cgroup-specific
psi triggers. It internally uses kernfs_open_node->poll waitqueue head
with its lifecycle tied to the file's lifecycle. This also renders the
fix in [1] obsolete, so revert it.
[1] commit c2dbe32d5db5 ("sched/psi: Fix use-after-free in ep_remove_wait_queue()")
Fixes:
|
||
|
d5dca19776 |
sched/psi: Allow unprivileged polling of N*2s period
[ Upstream commit d82caa273565b45fcf103148950549af76c314b0 ] PSI offers 2 mechanisms to get information about a specific resource pressure. One is reading from /proc/pressure/<resource>, which gives average pressures aggregated every 2s. The other is creating a pollable fd for a specific resource and cgroup. Trigger creation requires CAP_SYS_RESOURCE and allows picking a specific time window and threshold, spawning an RT thread to aggregate the data. Systemd would like to give containers the option to monitor pressure on their own cgroup and sub-cgroups. For example, if systemd launches a container that itself then launches services, the container should have the ability to poll() for pressure in individual services. But neither the container nor the services are privileged. This patch implements a mechanism to allow unprivileged users to create pressure triggers. The difference from privileged trigger creation is that unprivileged triggers must have a time window that is a multiple of 2s. This avoids unrestricted spawning of RT threads and instead reuses the aggregation mechanism that computes the averages, which runs independently of any triggers. Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lore.kernel.org/r/20230330105418.77061-5-cerasuolodomenico@gmail.com Stable-dep-of: aff037078eca ("sched/psi: use kernfs polling functions for PSI trigger polling") Signed-off-by: Sasha Levin <sashal@kernel.org> |
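The privileged/unprivileged distinction can be sketched as a validation check on the requested window. This is an illustrative sketch, not the kernel's code: the function name `check_trigger()` is hypothetical, and the 500 ms to 10 s window bounds and the "threshold must fit in the window" rule are assumptions included only to make the example complete. The part taken from the text above is that an unprivileged window must be a whole multiple of the 2s averaging period.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bounds for illustration, in microseconds. */
#define WINDOW_MIN_US   500000ULL   /* 500 ms */
#define WINDOW_MAX_US 10000000ULL   /* 10 s */
#define AVG_PERIOD_US  2000000ULL   /* 2 s period of the averaging work */

/* Hypothetical validator: 0 if the trigger is acceptable, -EINVAL if not. */
static int check_trigger(uint64_t threshold_us, uint64_t window_us, bool privileged)
{
    if (window_us < WINDOW_MIN_US || window_us > WINDOW_MAX_US)
        return -EINVAL;
    if (threshold_us == 0 || threshold_us > window_us)
        return -EINVAL;
    /* Unprivileged triggers piggyback on the 2s averaging work instead of
     * an RT polling thread, so the window must be whole 2s periods. */
    if (!privileged && window_us % AVG_PERIOD_US != 0)
        return -EINVAL;
    return 0;
}
```

A 1s window is accepted for a privileged caller but rejected for an unprivileged one, while 2s or 4s windows are accepted for both.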
||
|
fb4bc32fc1 |
sched/psi: Extract update_triggers side effect
[ Upstream commit 4468fcae49f08e88fbbffe05b29496192df89991 ] This change moves the update_total flag out of the update_triggers function, which is currently called only in psi_poll_work. In the next patch, update_triggers will also be called in psi_avgs_work, but the total update information is specific to psi_poll_work. Returning the update_total value to the caller lets us avoid differentiating the implementation of update_triggers for different aggregators. Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lore.kernel.org/r/20230330105418.77061-4-cerasuolodomenico@gmail.com Stable-dep-of: aff037078eca ("sched/psi: use kernfs polling functions for PSI trigger polling") Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
c1623d4d0b |
sched/psi: Rename existing poll members in preparation
[ Upstream commit 65457b74aa9437418e552e8d52d7112d4f9901a6 ] Rename existing members in the PSI implementation to make a clear distinction between the privileged and unprivileged trigger code to be implemented in the next patch. Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lore.kernel.org/r/20230330105418.77061-3-cerasuolodomenico@gmail.com Stable-dep-of: aff037078eca ("sched/psi: use kernfs polling functions for PSI trigger polling") Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
c176dda0a6 |
sched/psi: Rearrange polling code in preparation
[ Upstream commit 7fab21fa0d000a0ea32d73ce8eec68557c6c268b ] Move a few functions up in the file to avoid the forward declarations that would otherwise be needed in the patch implementing unprivileged PSI triggers. Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Link: https://lore.kernel.org/r/20230330105418.77061-2-cerasuolodomenico@gmail.com Stable-dep-of: aff037078eca ("sched/psi: use kernfs polling functions for PSI trigger polling") Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
7d8bba4da1 |
sched/psi: Fix avgs_work re-arm in psi_avgs_work()
[ Upstream commit 2fcd7bbae90a6d844da8660a9d27079281dfbba2 ]
Pavan reported a problem that PSI avgs_work idle shutoff is not
working at all, because the PSI_NONIDLE condition is observed in
psi_avgs_work()->collect_percpu_times()->get_recent_times() even if
the only task running on the CPU is the kworker running avgs_work.
Although commit
|
||
|
45f739e8fb |
sched/fair: Use recent_used_cpu to test p->cpus_ptr
[ Upstream commit ae2ad293d6be143ad223f5f947cca07bcbe42595 ]
When checking whether a recently used CPU can be a potential idle
candidate, recent_used_cpu should be used to test p->cpus_ptr, because
p->recent_used_cpu is not equal to recent_used_cpu and the candidate
decision here is made based on recent_used_cpu.
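The point of the fix can be shown with a toy model. This sketch assumes a simplified task that uses a 64-bit affinity mask in place of p->cpus_ptr, and it omits the cache-sharing and idleness checks the real candidate test also performs. Because p->recent_used_cpu has already been overwritten with `prev` when the test runs, testing it would check the wrong CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy task: allowed CPUs as a bitmask (stand-in for p->cpus_ptr). */
struct task {
    uint64_t cpus_allowed;
    int recent_used_cpu;
};

/* Assumes 0 <= cpu < 64. */
static bool cpu_allowed(const struct task *p, int cpu)
{
    return (p->cpus_allowed >> cpu) & 1u;
}

/* The candidate is the local snapshot 'recent_used_cpu', so that is the
 * CPU whose affinity must be tested. p->recent_used_cpu now holds 'prev',
 * so testing it would check a different CPU. */
static bool recent_cpu_is_candidate(struct task *p, int prev, int target)
{
    int recent_used_cpu = p->recent_used_cpu;

    p->recent_used_cpu = prev;
    return recent_used_cpu != prev &&
           recent_used_cpu != target &&
           cpu_allowed(p, recent_used_cpu);
}
```

In the first test case below, the task's old recent CPU (5) is outside its mask, so it must be rejected even though the freshly stored value (1) is allowed.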
Fixes:
|
||
|
c006fe361c |
bpf: Address KCSAN report on bpf_lru_list
[ Upstream commit ee9fd0ac3017c4313be91a220a9ac4c99dde7ad4 ] KCSAN reported a data-race when accessing node->ref. Although node->ref does not have to be accurate, take this chance to use the more common READ_ONCE() and WRITE_ONCE() pattern instead of data_race(). There are existing bpf_lru_node_is_ref() and bpf_lru_node_set_ref() helpers. This patch also adds bpf_lru_node_clear_ref() to do the WRITE_ONCE(node->ref, 0). ================================================================== BUG: KCSAN: data-race in __bpf_lru_list_rotate / __htab_lru_percpu_map_update_elem write to 0xffff888137038deb of 1 bytes by task 11240 on cpu 1: __bpf_lru_node_move kernel/bpf/bpf_lru_list.c:113 [inline] __bpf_lru_list_rotate_active kernel/bpf/bpf_lru_list.c:149 [inline] __bpf_lru_list_rotate+0x1bf/0x750 kernel/bpf/bpf_lru_list.c:240 bpf_lru_list_pop_free_to_local kernel/bpf/bpf_lru_list.c:329 [inline] bpf_common_lru_pop_free kernel/bpf/bpf_lru_list.c:447 [inline] bpf_lru_pop_free+0x638/0xe20 kernel/bpf/bpf_lru_list.c:499 prealloc_lru_pop kernel/bpf/hashtab.c:290 [inline] __htab_lru_percpu_map_update_elem+0xe7/0x820 kernel/bpf/hashtab.c:1316 bpf_percpu_hash_update+0x5e/0x90 kernel/bpf/hashtab.c:2313 bpf_map_update_value+0x2a9/0x370 kernel/bpf/syscall.c:200 generic_map_update_batch+0x3ae/0x4f0 kernel/bpf/syscall.c:1687 bpf_map_do_batch+0x2d9/0x3d0 kernel/bpf/syscall.c:4534 __sys_bpf+0x338/0x810 __do_sys_bpf kernel/bpf/syscall.c:5096 [inline] __se_sys_bpf kernel/bpf/syscall.c:5094 [inline] __x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5094 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd read to 0xffff888137038deb of 1 bytes by task 11241 on cpu 0: bpf_lru_node_set_ref kernel/bpf/bpf_lru_list.h:70 [inline] __htab_lru_percpu_map_update_elem+0x2f1/0x820 kernel/bpf/hashtab.c:1332 bpf_percpu_hash_update+0x5e/0x90 kernel/bpf/hashtab.c:2313 bpf_map_update_value+0x2a9/0x370 kernel/bpf/syscall.c:200 
generic_map_update_batch+0x3ae/0x4f0 kernel/bpf/syscall.c:1687 bpf_map_do_batch+0x2d9/0x3d0 kernel/bpf/syscall.c:4534 __sys_bpf+0x338/0x810 __do_sys_bpf kernel/bpf/syscall.c:5096 [inline] __se_sys_bpf kernel/bpf/syscall.c:5094 [inline] __x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5094 do_syscall_x64 arch/x86/entry/common.c:50 [inline] do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80 entry_SYSCALL_64_after_hwframe+0x63/0xcd value changed: 0x01 -> 0x00 Reported by Kernel Concurrency Sanitizer on: CPU: 0 PID: 11241 Comm: syz-executor.3 Not tainted 6.3.0-rc7-syzkaller-00136-g6a66fdd29ea1 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/30/2023 ================================================================== Reported-by: syzbot+ebe648a84e8784763f82@syzkaller.appspotmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/20230511043748.1384166-1-martin.lau@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
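The marked-access pattern described above can be approximated in userspace with C11 relaxed atomics, which, like READ_ONCE()/WRITE_ONCE(), prevent torn or compiler-merged accesses without imposing any ordering. This is an analogy, not the kernel code: `struct lru_node` and the helper names are hypothetical counterparts of bpf_lru_node_is_ref(), bpf_lru_node_set_ref() and bpf_lru_node_clear_ref().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical node with an approximate "recently referenced" flag. */
struct lru_node {
    atomic_uchar ref;
};

/* Relaxed load: a marked, non-torn read with no ordering guarantees,
 * which is enough for a flag that only needs to be approximate. */
static bool node_is_ref(struct lru_node *node)
{
    return atomic_load_explicit(&node->ref, memory_order_relaxed);
}

/* Only store when the flag is clear, so repeated hits do not keep
 * writing to the shared location. */
static void node_set_ref(struct lru_node *node)
{
    if (!atomic_load_explicit(&node->ref, memory_order_relaxed))
        atomic_store_explicit(&node->ref, 1, memory_order_relaxed);
}

/* Marked store that clears the flag, the analogue of
 * WRITE_ONCE(node->ref, 0). */
static void node_clear_ref(struct lru_node *node)
{
    atomic_store_explicit(&node->ref, 0, memory_order_relaxed);
}
```

The race KCSAN flagged is still present by design; marking the accesses documents that it is intentional and rules out torn or fused accesses.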
||
|
10fa03a9c1 |
bpf: Print a warning only if writing to unprivileged_bpf_disabled.
[ Upstream commit fedf99200ab086c42a572fca1d7266b06cdc3e3f ] Only print the warning message if you are writing to "/proc/sys/kernel/unprivileged_bpf_disabled". The kernel may print an annoying warning when you read "/proc/sys/kernel/unprivileged_bpf_disabled", saying WARNING: Unprivileged eBPF is enabled with eIBRS on, data leaks possible via Spectre v2 BHB attacks! However, this message is only meaningful when the feature is being disabled or enabled. Signed-off-by: Kui-Feng Lee <kuifeng@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20230502181418.308479-1-kuifeng@meta.com Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
78a5f711ef |
sched/fair: Don't balance task to its current running CPU
[ Upstream commit 0dd37d6dd33a9c23351e6115ae8cdac7863bc7de ] We've run into the case that the balancer tries to balance a migration disabled task and triggers the warning in set_task_cpu() like below: ------------[ cut here ]------------ WARNING: CPU: 7 PID: 0 at kernel/sched/core.c:3115 set_task_cpu+0x188/0x240 Modules linked in: hclgevf xt_CHECKSUM ipt_REJECT nf_reject_ipv4 <...snip> CPU: 7 PID: 0 Comm: swapper/7 Kdump: loaded Tainted: G O 6.1.0-rc4+ #1 Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V5.B221.01 12/09/2021 pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : set_task_cpu+0x188/0x240 lr : load_balance+0x5d0/0xc60 sp : ffff80000803bc70 x29: ffff80000803bc70 x28: ffff004089e190e8 x27: ffff004089e19040 x26: ffff007effcabc38 x25: 0000000000000000 x24: 0000000000000001 x23: ffff80000803be84 x22: 000000000000000c x21: ffffb093e79e2a78 x20: 000000000000000c x19: ffff004089e19040 x18: 0000000000000000 x17: 0000000000001fad x16: 0000000000000030 x15: 0000000000000000 x14: 0000000000000003 x13: 0000000000000000 x12: 0000000000000000 x11: 0000000000000001 x10: 0000000000000400 x9 : ffffb093e4cee530 x8 : 00000000fffffffe x7 : 0000000000ce168a x6 : 000000000000013e x5 : 00000000ffffffe1 x4 : 0000000000000001 x3 : 0000000000000b2a x2 : 0000000000000b2a x1 : ffffb093e6d6c510 x0 : 0000000000000001 Call trace: set_task_cpu+0x188/0x240 load_balance+0x5d0/0xc60 rebalance_domains+0x26c/0x380 _nohz_idle_balance.isra.0+0x1e0/0x370 run_rebalance_domains+0x6c/0x80 __do_softirq+0x128/0x3d8 ____do_softirq+0x18/0x24 call_on_irq_stack+0x2c/0x38 do_softirq_own_stack+0x24/0x3c __irq_exit_rcu+0xcc/0xf4 irq_exit_rcu+0x18/0x24 el1_interrupt+0x4c/0xe4 el1h_64_irq_handler+0x18/0x2c el1h_64_irq+0x74/0x78 arch_cpu_idle+0x18/0x4c default_idle_call+0x58/0x194 do_idle+0x244/0x2b0 cpu_startup_entry+0x30/0x3c secondary_start_kernel+0x14c/0x190 __secondary_switched+0xb0/0xb4 ---[ end trace 0000000000000000 ]--- Further investigation shows that the warning is
superfluous: the migration disabled task is just going to be migrated to its current running CPU. This is because, during load balancing, if the dst_cpu is not allowed by the task, we re-select a new_dst_cpu as a candidate. If no task can be balanced to dst_cpu, we try to balance the task to the new_dst_cpu instead. In this case, when the migration disabled task is not running, it is only allowed to run on its current CPU, so load balancing selects its current CPU as new_dst_cpu and later triggers the warning above. The new_dst_cpu is chosen from env->dst_grpmask. Currently it contains the CPUs in sched_group_span(), and if we have overlapping groups it's possible to run into this case. This patch sets env->dst_grpmask to group_balance_mask(), which excludes any CPUs from the busiest group, and solves the issue. For balancing in a domain with no overlapping groups, the behaviour remains the same as before. Suggested-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lore.kernel.org/r/20230530082507.10444-1-yangyicong@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
896f4d6046 |
rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp
[ Upstream commit 9146eb25495ea8bfb5010192e61e3ed5805ce9ef ] The per-CPU rcu_data structure's ->cpu_no_qs.b.exp field is updated only on the instance corresponding to the current CPU, but can be read more widely. Unmarked accesses are OK from the corresponding CPU, but only if interrupts are disabled, given that interrupt handlers can and do modify this field. Unfortunately, although the load from rcu_preempt_deferred_qs() is always carried out from the corresponding CPU, interrupts are not necessarily disabled. This commit therefore upgrades this load to READ_ONCE. Similarly, the diagnostic access from synchronize_rcu_expedited_wait() might run with interrupts disabled and from some other CPU. This commit therefore marks this load with data_race(). Finally, the C-language access in rcu_preempt_ctxt_queue() is OK as is because interrupts are disabled and this load is always from the corresponding CPU. This commit adds a comment giving the rationale for this access being safe. This data race was reported by KCSAN. Not appropriate for backporting due to failure being unlikely. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
9027d69221 |
rcu-tasks: Avoid pr_info() with spin lock in cblist_init_generic()
[ Upstream commit 5fc8cbe4cf0fd34ded8045c385790c3bf04f6785 ] pr_info() is called with the rtp->cbs_gbl_lock spin lock held. Because pr_info() calls printk(), which might sleep, this results in a BUG like the one below: [ 0.206455] cblist_init_generic: Setting adjustable number of callback queues. [ 0.206463] [ 0.206464] ============================= [ 0.206464] [ BUG: Invalid wait context ] [ 0.206465] 5.19.0-00428-g9de1f9c8ca51 #5 Not tainted [ 0.206466] ----------------------------- [ 0.206466] swapper/0/1 is trying to lock: [ 0.206467] ffffffffa0167a58 (&port_lock_key){....}-{3:3}, at: serial8250_console_write+0x327/0x4a0 [ 0.206473] other info that might help us debug this: [ 0.206473] context-{5:5} [ 0.206474] 3 locks held by swapper/0/1: [ 0.206474] #0: ffffffff9eb597e0 (rcu_tasks.cbs_gbl_lock){....}-{2:2}, at: cblist_init_generic.constprop.0+0x14/0x1f0 [ 0.206478] #1: ffffffff9eb579c0 (console_lock){+.+.}-{0:0}, at: _printk+0x63/0x7e [ 0.206482] #2: ffffffff9ea77780 (console_owner){....}-{0:0}, at: console_emit_next_record.constprop.0+0x111/0x330 [ 0.206485] stack backtrace: [ 0.206486] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.19.0-00428-g9de1f9c8ca51 #5 [ 0.206488] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014 [ 0.206489] Call Trace: [ 0.206490] <TASK> [ 0.206491] dump_stack_lvl+0x6a/0x9f [ 0.206493] __lock_acquire.cold+0x2d7/0x2fe [ 0.206496] ? stack_trace_save+0x46/0x70 [ 0.206497] lock_acquire+0xd1/0x2f0 [ 0.206499] ? serial8250_console_write+0x327/0x4a0 [ 0.206500] ? __lock_acquire+0x5c7/0x2720 [ 0.206502] _raw_spin_lock_irqsave+0x3d/0x90 [ 0.206504] ? 
serial8250_console_write+0x327/0x4a0 [ 0.206506] serial8250_console_write+0x327/0x4a0 [ 0.206508] console_emit_next_record.constprop.0+0x180/0x330 [ 0.206511] console_unlock+0xf7/0x1f0 [ 0.206512] vprintk_emit+0xf7/0x330 [ 0.206514] _printk+0x63/0x7e [ 0.206516] cblist_init_generic.constprop.0.cold+0x24/0x32 [ 0.206518] rcu_init_tasks_generic+0x5/0xd9 [ 0.206522] kernel_init_freeable+0x15b/0x2a2 [ 0.206523] ? rest_init+0x160/0x160 [ 0.206526] kernel_init+0x11/0x120 [ 0.206527] ret_from_fork+0x1f/0x30 [ 0.206530] </TASK> [ 0.207018] cblist_init_generic: Setting shift to 1 and lim to 1. This patch moves pr_info() so that it is called without rtp->cbs_gbl_lock locked. Signed-off-by: Shigeru Yoshida <syoshida@redhat.com> Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> |
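The general shape of the fix, which moves logging out of the critical section, can be sketched in userspace. This is a generic pattern under stated assumptions, not the kernel patch: a C11 `atomic_flag` spinlock stands in for rtp->cbs_gbl_lock, and `set_limits()` is a hypothetical function that updates shared state under the lock, snapshots the values the message needs, releases the lock, and only then formats and prints.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

/* Minimal test-and-set spinlock standing in for rtp->cbs_gbl_lock. */
static atomic_flag cbs_lock = ATOMIC_FLAG_INIT;
static int shift, lim; /* state protected by cbs_lock */

static void cbs_lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&cbs_lock, memory_order_acquire))
        ; /* spin */
}

static void cbs_lock_release(void)
{
    atomic_flag_clear_explicit(&cbs_lock, memory_order_release);
}

/* Hypothetical update: publish under the lock, log after dropping it,
 * so the (possibly blocking) console path never runs inside it. */
static void set_limits(int new_shift, int new_lim, char *msg, size_t len)
{
    int shift_snap, lim_snap;

    cbs_lock_acquire();
    shift = new_shift;
    lim = new_lim;
    shift_snap = shift; /* snapshot while still holding the lock */
    lim_snap = lim;
    cbs_lock_release();

    snprintf(msg, len, "Setting shift to %d and lim to %d.", shift_snap, lim_snap);
    puts(msg); /* printing happens with no lock held */
}
```

Snapshotting keeps the message consistent with what was published, even if another thread changes the state between the unlock and the print.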
||
|
544ae28cf6 |
ANDROID: Inherit "user-aware property" across rtmutex.
Since upstream commit
|
||
|
92e7a47840 |
sched/walt: Improve the scheduler
This change is for general scheduler improvement. Change-Id: Ib435f0ef2992a6eb526c47639e3a2b9edaf1bd2c Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com> |
||
|
1ef7816a50 |
Merge branch 'android14-6.1' into 'android14-6.1-lts'
Catches the android14-6.1-lts branch up with the android14-6.1 branch which has had a lot of changes that are needed here to resolve future LTS merges and to ensure that the ABI is kept stable. It contains the following commits: * |
||
|
bf01a58857 |
Merge "Merge keystone/android14-6.1-keystone-qcom-release.6.1.25 (8823053 ) into qcom-6.1"
|
||
|
0abc74db1a |
ANDROID: GKI: Move GKI module headers to generated includes
Change the location of the build-time generated GKI module headers from kernel/module/gki_module_*.h to include/generated/gki_module_*.h. This prevents the kernel source tree from being contaminated. By placing the header files in a generated directory, the default filters that ignore certain files will work without any special handling. Bug: 286529877 Test: Manual verification & TH Change-Id: Ie247d1c132ddae54906de2e2850e95d7ae9edd50 Signed-off-by: Ramji Jiyani <ramjiyani@google.com> (cherry picked from commit e9cba885543fc50a5b59ff7234d02b74a380573c) |
||
|
ff06cd411a |
swiotlb: mark swiotlb_memblock_alloc() as __init
commit 9b07d27d0fbb7f7441aa986859a0f53ec93a0335 upstream. swiotlb_memblock_alloc() calls memblock_alloc(), which calls the __init function memblock_alloc_try_nid(). However, swiotlb_memblock_alloc() itself is not marked __init. It can be, since it is only called by swiotlb_init_remap(), which is already marked as __init. Marking it __init prevents a modpost build warning/error: WARNING: modpost: vmlinux.o: section mismatch in reference: swiotlb_memblock_alloc (section: .text) -> memblock_alloc_try_nid (section: .init.text) WARNING: modpost: vmlinux.o: section mismatch in reference: swiotlb_memblock_alloc (section: .text) -> memblock_alloc_try_nid (section: .init.text) This fixes the build warning/error seen on ARM64, PPC64, S390, i386, and x86_64. Fixes: 8d58aa484920 ("swiotlb: reduce the swiotlb buffer size on allocation failure") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Alexey Kardashevskiy <aik@amd.com> Cc: Christoph Hellwig <hch@lst.de> Cc: iommu@lists.linux.dev Cc: Mike Rapoport <rppt@kernel.org> Cc: linux-mm@kvack.org Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
95e34129f3 |
tracing/user_events: Fix struct arg size match check
commit d0a3022f30629a208e5944022caeca3568add9e7 upstream.
When users register an event, the name of the event and its arguments are
checked to ensure they match if the event already exists. Normally all
arguments are in the form of "type name", except when the type
starts with "struct ". In those cases, the size of the struct is passed
in addition to the name, e.g. "struct my_struct a 20" for an argument
that is of type "struct my_struct" with a field name of "a" and has a
size of 20 bytes.
The current code does not handle this case properly when checking for
a match. This causes event registration to fail even when the same
string was used for events that contain a struct argument.
The example above, "struct my_struct a 20", generates a match string of
"struct my_struct a", omitting the size field.
Add the struct size of the existing field when generating a comparison
string for a struct field to ensure proper match checking.
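A minimal sketch of the comparison-string rule described above. The `struct field` layout and `field_match_string()` are hypothetical illustrations, not the user_events implementation. The idea is that for `struct ` types the size is appended, so the generated string round-trips to exactly what the user registered.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parsed field: "type name" or "struct type name size". */
struct field {
    const char *type; /* e.g. "u32" or "struct my_struct" */
    const char *name;
    int size;
};

/* Build the string compared against a new registration. Struct fields
 * carry their size, so it must be part of the match string; otherwise
 * "struct my_struct a 20" can never equal the generated string. */
static int field_match_string(const struct field *f, char *buf, size_t len)
{
    if (strncmp(f->type, "struct ", strlen("struct ")) == 0)
        return snprintf(buf, len, "%s %s %d", f->type, f->name, f->size);
    return snprintf(buf, len, "%s %s", f->type, f->name);
}
```

Non-struct fields keep the plain "type name" form, so their matching behaviour is unchanged.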
Link: https://lkml.kernel.org/r/20230629235049.581-2-beaub@linux.microsoft.com
Cc: stable@vger.kernel.org
Fixes:
|
||
|
a95c1fede2 |
tracing/probes: Fix to update dynamic data counter if fetcharg uses it
commit e38e2c6a9efc435f9de344b7c91f7697e01b47d5 upstream.
Fix to update the dynamic data counter ('dyndata') and max length ('maxlen')
only if the fetcharg uses dynamic data. Also move arg->dynamic out
of unlikely(). Updating them unconditionally makes the dynamic data address
wrong if process_fetch_insn() returns an error in the !arg->dynamic case.
Link: https://lore.kernel.org/all/168908494781.123124.8160245359962103684.stgit@devnote2/
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Link: https://lore.kernel.org/all/20230710233400.5aaf024e@gandalf.local.home/
Fixes:
|
||
|
837f92d27f |
tracing/probes: Fix not to count error code to total length
commit b41326b5e0f82e93592c4366359917b5d67b529f upstream.
Do not add the error code (which is a negative value) to the total
used length of the array, because it can mess up the return code of
process_fetch_insn_bottom(). Also clear the 'ret' value, because it
will be used for calculating the next data_loc entry.
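A simplified sketch of the accounting rule above, assuming a hypothetical per-element fetch callback that returns either a byte count or a negative errno. `total_len()` and `sample_fetch()` are illustrative only and do not reflect the structure of process_fetch_insn_bottom().

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical fetch: bytes stored for element idx, or -errno. */
typedef int (*fetch_fn)(int idx);

/* Example fetcher: element 1 faults, the others store 5 bytes. */
static int sample_fetch(int idx)
{
    return idx == 1 ? -EFAULT : 5;
}

/* Sum the used length of an array. A failed element contributes zero:
 * adding its negative error code would shrink or corrupt the total. */
static int total_len(fetch_fn fetch, int n)
{
    int total = 0;

    for (int i = 0; i < n; i++) {
        int ret = fetch(i);

        if (ret < 0)
            ret = 0; /* clear the error before it is accumulated */
        total += ret;
    }
    return total;
}
```

With the error code counted, the three-element example would total 5 - 14 + 5 = -4 (on Linux, where EFAULT is 14) instead of 10.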
Link: https://lore.kernel.org/all/168908493827.123124.2175257289106364229.stgit@devnote2/
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/8819b154-2ba1-43c3-98a2-cbde20892023@moroto.mountain/
Fixes:
|
||
|
938d5b7a75 |
tracing: Fix null pointer dereference in tracing_err_log_open()
commit 02b0095e2fbbc060560c1065f86a211d91e27b26 upstream.
Fix an issue in function 'tracing_err_log_open'.
The function doesn't call 'seq_open' if the file is opened only with
write permissions, which results in 'file->private_data' being left as null.
If we then use 'lseek' on that opened file, 'seq_lseek' dereferences
'file->private_data' in 'mutex_lock(&m->lock)', resulting in a kernel panic.
Writing to this node requires root privileges; therefore, this bug
has very little security impact.
Tracefs node: /sys/kernel/tracing/error_log
Example Kernel panic:
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000038
Call trace:
mutex_lock+0x30/0x110
seq_lseek+0x34/0xb8
__arm64_sys_lseek+0x6c/0xb8
invoke_syscall+0x58/0x13c
el0_svc_common+0xc4/0x10c
do_el0_svc+0x24/0x98
el0_svc+0x24/0x88
el0t_64_sync_handler+0x84/0xe4
el0t_64_sync+0x1b4/0x1b8
Code: d503201f aa0803e0 aa1f03e1 aa0103e9 (c8e97d02)
---[ end trace 561d1b49c12cf8a5 ]---
Kernel panic - not syncing: Oops: Fatal exception
Link: https://lore.kernel.org/linux-trace-kernel/20230703155237eucms1p4dfb6a19caa14c79eb6c823d127b39024@eucms1p4
Link: https://lore.kernel.org/linux-trace-kernel/20230704102706eucms1p30d7ecdcc287f46ad67679fc8491b2e0f@eucms1p3
Cc: stable@vger.kernel.org
Fixes:
|
||
|
fbcd0c2b56 |
fprobe: Ensure running fprobe_exit_handler() finished before calling rethook_free()
commit 195b9cb5b288fec1c871ef89f78cc9a7461aad3a upstream.
Ensure that a running fprobe_exit_handler() has finished before calling
rethook_free() in unregister_fprobe(), so that the caller can free the
fprobe right after unregister_fprobe() returns.
unregister_fprobe() ensured that all running fprobe_entry/exit_handler()
calls had finished by calling unregister_ftrace_function(), which
synchronizes RCU. But commit 5f81018753df ("fprobe: Release rethook after
the ftrace_ops is unregistered") changed it to call rethook_free() after
unregister_ftrace_function(). So call rethook_stop() to disable the rethook
before unregister_ftrace_function(), restoring that guarantee.
Here is a possible code flow that can call the exit handler after
unregister_fprobe():
------
 CPU1                              CPU2
 call unregister_fprobe(fp)
 ...
                                   __fprobe_handler()
                                   rethook_hook() on probed function
 unregister_ftrace_function()
                                   return from probed function
                                   rethook hooks
                                   find rh->handler == fprobe_exit_handler
                                   call fprobe_exit_handler()
 rethook_free():
   set rh->handler = NULL;
 return from unregister_fprobe;
                                   call fp->exit_handler() <- (*)
------
(*) At this point, the exit handler is called after returning from
unregister_fprobe().
This patch fixes it as follows:
------
 CPU1                              CPU2
 call unregister_fprobe()
 ...
 rethook_stop():
   set rh->handler = NULL;
                                   __fprobe_handler()
                                   rethook_hook() on probed function
 unregister_ftrace_function()
                                   return from probed function
                                   rethook hooks
                                   find rh->handler == NULL
                                   return from rethook
 rethook_free()
 return from unregister_fprobe;
------
Link: https://lore.kernel.org/all/168873859949.156157.13039240432299335849.stgit@devnote2/
Fixes: 5f81018753df ("fprobe: Release rethook after the ftrace_ops is unregistered")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
ce3ec57faf |
fprobe: Release rethook after the ftrace_ops is unregistered
commit 5f81018753dfd4989e33ece1f0cb6b8aae498b82 upstream.
While running bpf selftests it's possible to get following fault:
general protection fault, probably for non-canonical address \
0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI
...
Call Trace:
<TASK>
fprobe_handler+0xc1/0x270
? __pfx_bpf_testmod_init+0x10/0x10
? __pfx_bpf_testmod_init+0x10/0x10
? bpf_fentry_test1+0x5/0x10
? bpf_fentry_test1+0x5/0x10
? bpf_testmod_init+0x22/0x80
? do_one_initcall+0x63/0x2e0
? rcu_is_watching+0xd/0x40
? kmalloc_trace+0xaf/0xc0
? do_init_module+0x60/0x250
? __do_sys_finit_module+0xac/0x120
? do_syscall_64+0x37/0x90
? entry_SYSCALL_64_after_hwframe+0x72/0xdc
</TASK>
In unregister_fprobe(), we can't release fp->rethook while some of its
users may still be running on another CPU.
Move the rethook_free() call after fp->ops is unregistered with the
unregister_ftrace_function() call.
Link: https://lore.kernel.org/all/20230615115236.3476617-1-jolsa@kernel.org/
Fixes:
|
||
|
9a2c57fd32 |
PM: QoS: Restore support for default value on frequency QoS
commit 3a8395b565b5b4f019b3dc182be4c4541eb35ac8 upstream. Commit |
||
|
99fe81d219 |
ftrace: Fix possible warning on checking all pages used in ftrace_process_locs()
commit 26efd79c4624294e553aeaa3439c646729bad084 upstream. As comments in ftrace_process_locs(), there may be NULL pointers in mcount_loc section: > Some architecture linkers will pad between > the different mcount_loc sections of different > object files to satisfy alignments. > Skip any NULL pointers. After commit |
||
|
8b0b63fdac |
ring-buffer: Fix deadloop issue on reading trace_pipe
commit 7e42907f3a7b4ce3a2d1757f6d78336984daf8f5 upstream.
Soft lockup occurs when reading file 'trace_pipe':
watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [cat:4488]
[...]
RIP: 0010:ring_buffer_empty_cpu+0xed/0x170
RSP: 0018:ffff88810dd6fc48 EFLAGS: 00000246
RAX: 0000000000000000 RBX: 0000000000000246 RCX: ffffffff93d1aaeb
RDX: ffff88810a280040 RSI: 0000000000000008 RDI: ffff88811164b218
RBP: ffff88811164b218 R08: 0000000000000000 R09: ffff88815156600f
R10: ffffed102a2acc01 R11: 0000000000000001 R12: 0000000051651901
R13: 0000000000000000 R14: ffff888115e49500 R15: 0000000000000000
[...]
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f8d853c2000 CR3: 000000010dcd8000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
__find_next_entry+0x1a8/0x4b0
? peek_next_entry+0x250/0x250
? down_write+0xa5/0x120
? down_write_killable+0x130/0x130
trace_find_next_entry_inc+0x3b/0x1d0
tracing_read_pipe+0x423/0xae0
? tracing_splice_read_pipe+0xcb0/0xcb0
vfs_read+0x16b/0x490
ksys_read+0x105/0x210
? __ia32_sys_pwrite64+0x200/0x200
? switch_fpu_return+0x108/0x220
do_syscall_64+0x33/0x40
entry_SYSCALL_64_after_hwframe+0x61/0xc6
Through the vmcore, I found it's because in tracing_read_pipe(),
ring_buffer_empty_cpu() finds that some buffer is not empty, but then it
cannot read anything because "rb_num_of_entries() == 0" is always true.
It then loops over the procedure infinitely because the user buffer is
never filled; see the following code path:
tracing_read_pipe() {
... ...
waitagain:
tracing_wait_pipe() // 1. find non-empty buffer here
trace_find_next_entry_inc() // 2. loop here try to find an entry
__find_next_entry()
ring_buffer_empty_cpu(); // 3. find non-empty buffer
peek_next_entry() // 4. but peek always return NULL
ring_buffer_peek()
rb_buffer_peek()
rb_get_reader_page()
// 5. because rb_num_of_entries() == 0 always true here
// then return NULL
// 6. user buffer not filled, so goto 'waitagain'
//    and eventually leads to a dead loop in the kernel!!!
}
After some analysis, I found that when resetting the ring buffer, the 'entries'
of its pages are not all cleared (see rb_reset_cpu()). Then, when reducing
the ring buffer, if some of the removed pages still hold stale 'entries' data,
that data is added to 'cpu_buffer->overrun' (see rb_remove_pages()), which
causes a wrong 'overrun' count and eventually the dead loop.
To fix it, we need to clear every page in rb_reset_cpu().
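The essence of the fix, resetting every page rather than only some of them, can be sketched with a toy circular page list. This model is an assumption for illustration: `struct buffer_page` here holds only a plain `int entries` and a `next` pointer, and `reset_all_pages()`/`sum_entries()` are hypothetical helpers. `sum_entries()` shows what page removal would otherwise fold into the overrun count.

```c
#include <assert.h>
#include <stddef.h>

/* Toy ring-buffer page: only the counter relevant to this bug. */
struct buffer_page {
    int entries;
    struct buffer_page *next; /* circular list */
};

/* Clear 'entries' on every page of the ring, so no page keeps a stale
 * count that could later be accounted as overrun. */
static void reset_all_pages(struct buffer_page *head)
{
    struct buffer_page *p = head;

    if (p == NULL)
        return;
    do {
        p->entries = 0;
        p = p->next;
    } while (p != head);
}

/* Total entries on the ring, i.e. what removing all pages would add
 * to the overrun count. */
static int sum_entries(const struct buffer_page *head)
{
    const struct buffer_page *p = head;
    int sum = 0;

    if (p == NULL)
        return 0;
    do {
        sum += p->entries;
        p = p->next;
    } while (p != head);
    return sum;
}
```

After a full reset, removing any subset of pages adds nothing to the overrun count, so the reader's view of the buffer stays consistent.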
Link: https://lore.kernel.org/linux-trace-kernel/20230708225144.3785600-1-zhengyejian1@huawei.com
Cc: stable@vger.kernel.org
Fixes:
|
||
|
be970e22c5 |
tracing: Fix memory leak of iter->temp when reading trace_pipe
commit d5a821896360cc8b93a15bd888fabc858c038dc0 upstream.
kmemleak reports:
unreferenced object 0xffff88814d14e200 (size 256):
comm "cat", pid 336, jiffies 4294871818 (age 779.490s)
hex dump (first 32 bytes):
04 00 01 03 00 00 00 00 08 00 00 00 00 00 00 00 ................
0c d8 c8 9b ff ff ff ff 04 5a ca 9b ff ff ff ff .........Z......
backtrace:
[<ffffffff9bdff18f>] __kmalloc+0x4f/0x140
[<ffffffff9bc9238b>] trace_find_next_entry+0xbb/0x1d0
[<ffffffff9bc9caef>] trace_print_lat_context+0xaf/0x4e0
[<ffffffff9bc94490>] print_trace_line+0x3e0/0x950
[<ffffffff9bc95499>] tracing_read_pipe+0x2d9/0x5a0
[<ffffffff9bf03a43>] vfs_read+0x143/0x520
[<ffffffff9bf04c2d>] ksys_read+0xbd/0x160
[<ffffffff9d0f0edf>] do_syscall_64+0x3f/0x90
[<ffffffff9d2000aa>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
When reading the file 'trace_pipe', 'iter->temp' is allocated or reallocated
in trace_find_next_entry() but not freed before 'trace_pipe' is closed.
To fix it, free 'iter->temp' in tracing_release_pipe().
Link: https://lore.kernel.org/linux-trace-kernel/20230713141435.1133021-1-zhengyejian1@huawei.com
Cc: stable@vger.kernel.org
Fixes:
|
||
|
5fd32eb6fa |
tracing/histograms: Add histograms to hist_vars if they have referenced variables
commit 6018b585e8c6fa7d85d4b38d9ce49a5b67be7078 upstream.
Hist triggers can have referenced variables without having direct
variable fields. This can be the case if referenced variables are added
for trigger actions. In this case the newly added references will not
have field variables. Not taking such referenced variables into
consideration can result in a bug where it is possible to remove a
hist trigger whose variables are still being referenced. This results
in a bug that is easily reproducible like so:
$ cd /sys/kernel/tracing
$ echo 'synthetic_sys_enter char[] comm; long id' >> synthetic_events
$ echo 'hist:keys=common_pid.execname,id.syscall:vals=hitcount:comm=common_pid.execname' >> events/raw_syscalls/sys_enter/trigger
$ echo 'hist:keys=common_pid.execname,id.syscall:onmatch(raw_syscalls.sys_enter).synthetic_sys_enter($comm, id)' >> events/raw_syscalls/sys_enter/trigger
$ echo '!hist:keys=common_pid.execname,id.syscall:vals=hitcount:comm=common_pid.execname' >> events/raw_syscalls/sys_enter/trigger
[ 100.263533] ==================================================================
[ 100.264634] BUG: KASAN: slab-use-after-free in resolve_var_refs+0xc7/0x180
[ 100.265520] Read of size 8 at addr ffff88810375d0f0 by task bash/439
[ 100.266320]
[ 100.266533] CPU: 2 PID: 439 Comm: bash Not tainted 6.5.0-rc1 #4
[ 100.267277] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
[ 100.268561] Call Trace:
[ 100.268902] <TASK>
[ 100.269189] dump_stack_lvl+0x4c/0x70
[ 100.269680] print_report+0xc5/0x600
[ 100.270165] ? resolve_var_refs+0xc7/0x180
[ 100.270697] ? kasan_complete_mode_report_info+0x80/0x1f0
[ 100.271389] ? resolve_var_refs+0xc7/0x180
[ 100.271913] kasan_report+0xbd/0x100
[ 100.272380] ? resolve_var_refs+0xc7/0x180
[ 100.272920] __asan_load8+0x71/0xa0
[ 100.273377] resolve_var_refs+0xc7/0x180
[ 100.273888] event_hist_trigger+0x749/0x860
[ 100.274505] ? kasan_save_stack+0x2a/0x50
[ 100.275024] ? kasan_set_track+0x29/0x40
[ 100.275536] ? __pfx_event_hist_trigger+0x10/0x10
[ 100.276138] ? ksys_write+0xd1/0x170
[ 100.276607] ? do_syscall_64+0x3c/0x90
[ 100.277099] ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 100.277771] ? destroy_hist_data+0x446/0x470
[ 100.278324] ? event_hist_trigger_parse+0xa6c/0x3860
[ 100.278962] ? __pfx_event_hist_trigger_parse+0x10/0x10
[ 100.279627] ? __kasan_check_write+0x18/0x20
[ 100.280177] ? mutex_unlock+0x85/0xd0
[ 100.280660] ? __pfx_mutex_unlock+0x10/0x10
[ 100.281200] ? kfree+0x7b/0x120
[ 100.281619] ? ____kasan_slab_free+0x15d/0x1d0
[ 100.282197] ? event_trigger_write+0xac/0x100
[ 100.282764] ? __kasan_slab_free+0x16/0x20
[ 100.283293] ? __kmem_cache_free+0x153/0x2f0
[ 100.283844] ? sched_mm_cid_remote_clear+0xb1/0x250
[ 100.284550] ? __pfx_sched_mm_cid_remote_clear+0x10/0x10
[ 100.285221] ? event_trigger_write+0xbc/0x100
[ 100.285781] ? __kasan_check_read+0x15/0x20
[ 100.286321] ? __bitmap_weight+0x66/0xa0
[ 100.286833] ? _find_next_bit+0x46/0xe0
[ 100.287334] ? task_mm_cid_work+0x37f/0x450
[ 100.287872] event_triggers_call+0x84/0x150
[ 100.288408] trace_event_buffer_commit+0x339/0x430
[ 100.289073] ? ring_buffer_event_data+0x3f/0x60
[ 100.292189] trace_event_raw_event_sys_enter+0x8b/0xe0
[ 100.295434] syscall_trace_enter.constprop.0+0x18f/0x1b0
[ 100.298653] syscall_enter_from_user_mode+0x32/0x40
[ 100.301808] do_syscall_64+0x1a/0x90
[ 100.304748] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[ 100.307775] RIP: 0033:0x7f686c75c1cb
[ 100.310617] Code: 73 01 c3 48 8b 0d 65 3c 10 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 21 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 35 3c 10 00 f7 d8 64 89 01 48
[ 100.317847] RSP: 002b:00007ffc60137a38 EFLAGS: 00000246 ORIG_RAX: 0000000000000021
[ 100.321200] RAX: ffffffffffffffda RBX: 000055f566469ea0 RCX: 00007f686c75c1cb
[ 100.324631] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 000000000000000a
[ 100.328104] RBP: 00007ffc60137ac0 R08: 00007f686c818460 R09: 000000000000000a
[ 100.331509] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000009
[ 100.334992] R13: 0000000000000007 R14: 000000000000000a R15: 0000000000000007
[ 100.338381] </TASK>
We hit the bug because, when the second hist trigger was created,
has_hist_vars() returned false since that trigger did not have
variables. As a result, save_hist_vars() was not called to add
the trigger to trace_array->hist_vars. Later on, when we attempted to
remove the first histogram, find_any_var_ref() failed to detect that it
was still in use, because it did not find the second trigger in the
hist_vars list.
With this change, we wait until trigger actions are created so that we
can take into account whether the hist trigger has variable references.
We also now check the return value of save_hist_vars() and fail trigger
creation if save_hist_vars() fails.
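A userspace model of the corrected bookkeeping may make the failure mode easier to follow. It is a sketch only: `struct hist_model`, `save_hist_vars_model()` and `var_ref_in_use()` are hypothetical stand-ins for the real hist trigger data, save_hist_vars() and find_any_var_ref(), and the fixed-size array replaces trace_array->hist_vars.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TRIGGERS 4

/* Hypothetical, stripped-down hist trigger. */
struct hist_model {
	int n_vars;                   /* variables this trigger defines */
	const struct hist_model *ref; /* trigger whose variable it references, or NULL */
};

/* Stand-in for the trace_array->hist_vars list. */
static const struct hist_model *hist_vars[MAX_TRIGGERS];
static size_t n_hist_vars;

/*
 * After the fix: track a trigger if it defines variables OR references
 * another trigger's variables, and report failure so the caller can
 * refuse to create the trigger.
 */
static int save_hist_vars_model(const struct hist_model *h)
{
	if (h->n_vars == 0 && h->ref == NULL)
		return 0;  /* nothing to track */
	if (n_hist_vars == MAX_TRIGGERS)
		return -1; /* caller must fail trigger creation */
	hist_vars[n_hist_vars++] = h;
	return 0;
}

/* Removal must be refused while any tracked trigger references 'h'. */
static bool var_ref_in_use(const struct hist_model *h)
{
	for (size_t i = 0; i < n_hist_vars; i++)
		if (hist_vars[i]->ref == h)
			return true;
	return false;
}
```

With the pre-fix rule (track only when `n_vars > 0`), the second trigger in the reproducer would never be recorded and `var_ref_in_use(&first)` would wrongly return false.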
Link: https://lore.kernel.org/linux-trace-kernel/20230712223021.636335-1-mkhalfella@purestorage.com
Cc: stable@vger.kernel.org
Fixes:
|
||
|
5aea2ac374 |
tracing/user_events: Fix incorrect return value for writing operation when events are disabled
commit f6d026eea390d59787a6cdc2ef5c983d02e029d0 upstream.
The write operation returns the number of bytes written regardless of whether
events are enabled or disabled. Switch it to return -EBADF to indicate that the
event is disabled.
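The new return convention can be modelled in a few lines. This is a hypothetical sketch (`struct user_event_model` and `user_event_write_model()` are invented for illustration), not the kernel's user_events write path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for a registered user event. */
struct user_event_model {
	bool enabled;        /* true while at least one consumer is attached */
	size_t bytes_logged; /* payload bytes accepted so far */
};

/*
 * Model of the write path after the fix: a disabled event no longer
 * pretends the write succeeded; it reports -EBADF instead.
 */
static ssize_t user_event_write_model(struct user_event_model *ev, size_t count)
{
	if (!ev->enabled)
		return -EBADF;
	ev->bytes_logged += count;
	return (ssize_t)count;
}
```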
Link: https://lkml.kernel.org/r/20230626111344.19136-2-sunliming@kylinos.cn
Cc: stable@vger.kernel.org
|
||
|
b11a9b4f28 |
bpf: cpumap: Fix memory leak in cpu_map_update_elem
[ Upstream commit 4369016497319a9635702da010d02af1ebb1849d ]
Syzkaller reported a memory leak as follows:
BUG: memory leak
unreferenced object 0xff110001198ef748 (size 192):
comm "syz-executor.3", pid 17672, jiffies 4298118891 (age 9.906s)
hex dump (first 32 bytes):
00 00 00 00 4a 19 00 00 80 ad e3 e4 fe ff c0 00 ....J...........
00 b2 d3 0c 01 00 11 ff 28 f5 8e 19 01 00 11 ff ........(.......
backtrace:
[<ffffffffadd28087>] __cpu_map_entry_alloc+0xf7/0xb00
[<ffffffffadd28d8e>] cpu_map_update_elem+0x2fe/0x3d0
[<ffffffffadc6d0fd>] bpf_map_update_value.isra.0+0x2bd/0x520
[<ffffffffadc7349b>] map_update_elem+0x4cb/0x720
[<ffffffffadc7d983>] __se_sys_bpf+0x8c3/0xb90
[<ffffffffb029cc80>] do_syscall_64+0x30/0x40
[<ffffffffb0400099>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
BUG: memory leak
unreferenced object 0xff110001198ef528 (size 192):
comm "syz-executor.3", pid 17672, jiffies 4298118891 (age 9.906s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffffadd281f0>] __cpu_map_entry_alloc+0x260/0xb00
[<ffffffffadd28d8e>] cpu_map_update_elem+0x2fe/0x3d0
[<ffffffffadc6d0fd>] bpf_map_update_value.isra.0+0x2bd/0x520
[<ffffffffadc7349b>] map_update_elem+0x4cb/0x720
[<ffffffffadc7d983>] __se_sys_bpf+0x8c3/0xb90
[<ffffffffb029cc80>] do_syscall_64+0x30/0x40
[<ffffffffb0400099>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
BUG: memory leak
unreferenced object 0xff1100010fd93d68 (size 8):
comm "syz-executor.3", pid 17672, jiffies 4298118891 (age 9.906s)
hex dump (first 8 bytes):
00 00 00 00 00 00 00 00 ........
backtrace:
[<ffffffffade5db3e>] kvmalloc_node+0x11e/0x170
[<ffffffffadd28280>] __cpu_map_entry_alloc+0x2f0/0xb00
[<ffffffffadd28d8e>] cpu_map_update_elem+0x2fe/0x3d0
[<ffffffffadc6d0fd>] bpf_map_update_value.isra.0+0x2bd/0x520
[<ffffffffadc7349b>] map_update_elem+0x4cb/0x720
[<ffffffffadc7d983>] __se_sys_bpf+0x8c3/0xb90
[<ffffffffb029cc80>] do_syscall_64+0x30/0x40
[<ffffffffb0400099>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
In the cpu_map_update_elem flow, when kthread_stop() is called before the
threadfn of rcpu->kthread has run, the KTHREAD_SHOULD_STOP bit of the
kthread has already been set by kthread_stop(). The threadfn of
rcpu->kthread will therefore never be executed and rcpu->refcnt will
never drop to 0, so the allocated rcpu, rcpu->queue and
rcpu->queue->queue can never be released.
Calling kthread_stop() before the kthread's threadfn has executed
returns -EINTR. In that state, we can complete the release of the memory
resources directly.
Fixes:
|
||
|
f92a82dc48 |
kernel/trace: Fix cleanup logic of enable_trace_eprobe
[ Upstream commit cf0a624dc706c306294c14e6b3e7694702f25191 ]
The enable_trace_eprobe() function enables all event probes attached
to the given trace probe. If an error occurs while enabling one of the
event probes, all the others should be rolled back. There is a bug in
that rollback logic: instead of all event probes, only the failed one is
disabled.
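The intended rollback pattern, as a minimal userspace sketch: `struct eprobe_model`, `enable_one()` and `enable_all()` are hypothetical names, and the error value -1 stands in for whatever the real enable call returns.

```c
#include <assert.h>
#include <stdbool.h>

#define NPROBES 4

/* Hypothetical event probe: only tracks whether it is enabled. */
struct eprobe_model {
	bool enabled;
	bool fail_on_enable; /* simulate an error from the enable call */
};

static int enable_one(struct eprobe_model *ep)
{
	if (ep->fail_on_enable)
		return -1;
	ep->enabled = true;
	return 0;
}

/*
 * Enable every probe; on failure, roll back *all* probes that were
 * already enabled, not just the one that failed.
 */
static int enable_all(struct eprobe_model *eps, int n)
{
	int i, ret = 0;

	for (i = 0; i < n; i++) {
		ret = enable_one(&eps[i]);
		if (ret)
			break;
	}
	if (ret) {
		/* Undo probes [0, i) that were successfully enabled. */
		while (--i >= 0)
			eps[i].enabled = false;
	}
	return ret;
}
```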
Link: https://lore.kernel.org/all/20230703042853.1427493-1-tz.stoyanov@gmail.com/
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Fixes:
|
||
|
b2e74dedb0 |
bpf: Fix max stack depth check for async callbacks
[ Upstream commit 5415ccd50a8620c8cbaa32d6f18c946c453566f5 ]
The check_max_stack_depth pass happens after the verifier's symbolic
execution and walks the call graph of the BPF program, ensuring that
the stack usage stays within bounds for all possible call chains. There
are two cases to consider: bpf_pseudo_func and bpf_pseudo_call. In the
former case, the callback pointer is loaded into a register and is
assumed to be passed later to some helper that calls it (although there
is no way to be sure); the check remains conservative and accounts for
the stack usage anyway. For this particular case, asynchronous callbacks
are skipped, as they execute asynchronously when their corresponding
event fires.
The case of bpf_pseudo_call is simpler: we know the call is definitely
made, so the stack depth of the subprog is accounted for.
However, the current check still skips an asynchronous callback even if
a bpf_pseudo_call was made for it. This is erroneous, as it misses the
stack usage of the asynchronous callback, which can then be used to
breach the maximum stack depth limit.
Fix this by only skipping asynchronous callbacks when the instruction is
not a pseudo call to the subprog.
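A heavily simplified model of the corrected skip rule, assuming a hypothetical two-frame call chain; the real check_max_stack_depth() walks the whole call graph and tracks per-frame depth, which this sketch does not attempt. MAX_BPF_STACK is the kernel's 512-byte limit.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BPF_STACK 512

/*
 * Skip a callee only if it is an async callback AND the instruction that
 * reaches it is not a direct bpf_pseudo_call.
 */
static bool skip_callee(bool callee_is_async_cb, bool insn_is_pseudo_call)
{
	return callee_is_async_cb && !insn_is_pseudo_call;
}

/* Stack depth of a caller plus one callee, honouring the skip rule. */
static int chain_depth(int caller, int callee, bool async_cb, bool pseudo_call)
{
	return skip_callee(async_cb, pseudo_call) ? caller : caller + callee;
}
```

Under the old rule, a direct pseudo call to an async callback was skipped like the first `chain_depth(256, 384, true, true)` case would have been, hiding a 640-byte chain that exceeds the limit.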
Fixes:
|
||
|
fd5b64c1cf |
swiotlb: reduce the number of areas to match actual memory pool size
[ Upstream commit 8ac04063354a01a484d2e55d20ed1958aa0d3392 ]
Although the desired size of the SWIOTLB memory pool is increased in
swiotlb_adjust_nareas() to match the number of areas, the actual allocation
may be smaller, which may require reducing the number of areas.
For example, Xen uses swiotlb_init_late(), which in turn uses the page
allocator. On x86, page size is 4 KiB and MAX_ORDER is 10 (1024 pages),
resulting in a maximum memory pool size of 4 MiB. This corresponds to 2048
slots of 2 KiB each. The minimum area size is 128 (IO_TLB_SEGSIZE),
allowing at most 2048 / 128 = 16 areas.
If num_possible_cpus() is greater than the maximum number of areas, areas
are smaller than IO_TLB_SEGSIZE and contiguous groups of free slots will
span multiple areas. When allocating and freeing slots, only one area will
be properly locked, causing race conditions on the unlocked slots and
ultimately data corruption, kernel hangs and crashes.
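The arithmetic from the Xen/x86 example, plus a hypothetical helper (called `limit_nareas()` here) that caps the area count so every area keeps at least IO_TLB_SEGSIZE slots. The constants are the values quoted in the text; the helper is an illustration of the idea, not necessarily the exact upstream code.

```c
#include <assert.h>

/* Values quoted in the text for x86 with the page allocator. */
#define PAGE_SIZE_BYTES   4096UL
#define MAX_ORDER_PAGES   1024UL /* 2^10 pages */
#define IO_TLB_SLOT_BYTES 2048UL /* 2 KiB per slot */
#define IO_TLB_SEGSIZE    128UL  /* minimum slots per area */

/* Largest pool the page allocator can hand out (4 MiB), in slots. */
static unsigned long max_pool_slots(void)
{
	return PAGE_SIZE_BYTES * MAX_ORDER_PAGES / IO_TLB_SLOT_BYTES;
}

/* Cap nareas so that each area still holds at least IO_TLB_SEGSIZE slots. */
static unsigned long limit_nareas(unsigned long nareas, unsigned long nslots)
{
	if (nslots < nareas * IO_TLB_SEGSIZE)
		return nslots / IO_TLB_SEGSIZE;
	return nareas;
}
```

With 64 possible CPUs and a 2048-slot pool, the area count drops to 16, matching the 2048 / 128 calculation.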
Fixes:
|
||
|
fc3db7fbdf |
swiotlb: reduce the swiotlb buffer size on allocation failure
[ Upstream commit 8d58aa484920c4f9be4834a7aeb446cdced21a37 ] At the moment, the AMD encrypted platform reserves 6% of RAM or 1GB, whichever is less, for SWIOTLB. However, low memory may not contain a block big enough for it, which makes the SWIOTLB allocation fail, and the kernel then continues without DMA. In that case, a VM hangs on DMA. This moves alloc+remap to a helper and calls it from a loop in which the size is halved on each iteration. This also updates default_nslabs on successful allocation; not doing so looks like an oversight, as it should have broken callers of swiotlb_size_or_default(). Signed-off-by: Alexey Kardashevskiy <aik@amd.com> Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Stable-dep-of: 8ac04063354a ("swiotlb: reduce the number of areas to match actual memory pool size") Signed-off-by: Sasha Levin <sashal@kernel.org> |
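A sketch of the halve-and-retry loop, assuming a simulated allocator that fails above a given size; `try_alloc()`, `alloc_pool_halving()` and `POOL_MIN_SLABS` are hypothetical names, and malloc() only stands in for the real memblock allocation plus remap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_MIN_SLABS 1024UL /* hypothetical lower bound for the sketch */

/* Simulated low-memory allocator: fails for requests above 'avail' slabs. */
static void *try_alloc(unsigned long nslabs, unsigned long avail)
{
	if (nslabs > avail)
		return NULL;
	return malloc(1); /* placeholder standing in for the pool */
}

/*
 * Halve the request until the allocation succeeds or drops below the
 * minimum. On success *nslabs holds the size actually obtained, so the
 * caller can record it (the text does this via default_nslabs).
 */
static void *alloc_pool_halving(unsigned long *nslabs, unsigned long avail)
{
	void *pool;

	while (*nslabs >= POOL_MIN_SLABS) {
		pool = try_alloc(*nslabs, avail);
		if (pool)
			return pool;
		*nslabs /= 2;
	}
	return NULL;
}
```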
||
|
24b24863a0 |
swiotlb: always set the number of areas before allocating the pool
[ Upstream commit aabd12609f91155f26584508b01f548215cc3c0c ]
The number of areas defaults to the number of possible CPUs. However, the
total number of slots may have to be increased after adjusting the number
of areas. Consequently, the number of areas must be determined before
allocating the memory pool. This is even explained with a comment in
swiotlb_init_remap(), but swiotlb_init_late() adjusts the number of areas
after slots are already allocated. The areas may end up being smaller than
IO_TLB_SEGSIZE, which breaks per-area locking.
While fixing swiotlb_init_late(), move all relevant comments before the
definition of swiotlb_adjust_nareas() and convert them to kernel-doc.
Fixes:
|
||
|
2d57a1590f |
workqueue: clean up WORK_* constant types, clarify masking
commit afa4bb778e48d79e4a642ed41e3b4e0de7489a6c upstream.
Dave Airlie reports that gcc-13.1.1 has started complaining about some of the workqueue code in 32-bit arm builds:
  kernel/workqueue.c: In function ‘get_work_pwq’:
  kernel/workqueue.c:713:24: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
    713 | return (void *)(data & WORK_STRUCT_WQ_DATA_MASK);
        | ^
  [ ... a couple of other cases ... ]
and while it's not immediately clear exactly why gcc started complaining about it now, I suspect some C23-induced enum type handling fixup in gcc-13 is the cause.
Whatever the reason for starting to complain, the code and data types are indeed disgusting enough that the complaint is warranted.
The wq code ends up creating various "helper constants" (like that WORK_STRUCT_WQ_DATA_MASK) using an enum type, which is all kinds of confused. The mask needs to be 'unsigned long', not some unspecified enum type.
To make matters worse, the actual "mask and cast to a pointer" is repeated a couple of times, and the cast isn't even always done to the right pointer but, as in the error case above, to a 'void *', with the compiler then finishing the job. That's not how we roll in the kernel.
So create the masks using the proper types rather than some ambiguous enumeration, and use a nice helper that actually does the type conversion in one well-defined place.
Incidentally, this magically makes clang generate better code. That, admittedly, is really just a sign of clang having been seriously confused before, and cleaning up the typing unconfuses the compiler too.
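A userspace sketch of the "typed mask plus one helper" idea. The 8-bit flag field and the helper name `work_struct_pwq()` are assumptions made for illustration; the real bit layout and helper in kernel/workqueue.c may differ.

```c
#include <assert.h>

/* Hypothetical layout: the low 8 bits of 'data' hold flags. */
#define WORK_STRUCT_FLAG_BITS    8
/* Masks are plain unsigned long constants, not enum members. */
#define WORK_STRUCT_FLAG_MASK    ((1UL << WORK_STRUCT_FLAG_BITS) - 1)
#define WORK_STRUCT_WQ_DATA_MASK (~WORK_STRUCT_FLAG_MASK)

/* Alignment guarantees the flag bits of a pointer are free for reuse. */
struct pool_workqueue {
	int id;
} __attribute__((aligned(1 << WORK_STRUCT_FLAG_BITS)));

/* The single place that masks off the flags and casts to the real type. */
static inline struct pool_workqueue *work_struct_pwq(unsigned long data)
{
	return (struct pool_workqueue *)(data & WORK_STRUCT_WQ_DATA_MASK);
}
```

Because the mask is `unsigned long` (pointer-sized on Linux targets) and the cast goes straight to `struct pool_workqueue *`, there is no int-to-pointer size mismatch for the compiler to flag.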
Reported-by: Dave Airlie <airlied@gmail.com> Link: https://lore.kernel.org/lkml/CAPM=9twNnV4zMCvrPkw3H-ajZOH-01JVh_kDrxdPYQErz8ZTdA@mail.gmail.com/ Cc: Arnd Bergmann <arnd@arndb.de> Cc: Tejun Heo <tj@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
7dd60ce804 |
ANDROID: vendor_hooks: add vendor hook to support SAGT
Add vendor hook of android_rvh_before_do_sched_yield Bug: 291726037 Change-Id: I1f2d65739a297812f279b83085e3680e40d4cb6e Signed-off-by: lijun14 <lijun14@xiaomi.corp-partner.google.com> |
||
|
c1e64563dc | Merge "sched/walt: Fix problem in do_freq_qos_request" | ||
|
ead247e5cc |
sched/walt: Combine common pipeline code for swapping with prime
In preparation for creating additional pipeline promotion capabilities, simplify the existing code by combining common portions between heavy and manual pipeline selection. Change-Id: I356d596398ad6cb695248820d6440676a3f19477 Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com> |
||
|
524f946fbc |
Merge branch 'android14-6.1' into 'android14-6.1-lts'
Catches the android14-6.1-lts branch up with the android14-6.1 branch which has had a lot of changes that are needed here to resolve future LTS merges and to ensure that the ABI is kept stable. It contains the following commits: abb897fe2f8e Merge branch 'android14-6.1' into 'android14-6.1-lts' |
||
|
219a9ec09d |
watch_queue: prevent dangling pipe pointer
commit 943211c87427f25bd22e0e63849fb486bb5f87fa upstream. NULL the dangling pipe reference while clearing watch_queue. If not done, a reference to a freed pipe remains in the watch_queue, as this function is called before freeing a pipe in free_pipe_info() (see line 834 of fs/pipe.c). The sole use of wqueue->defunct is for checking if the watch queue has been cleared, but wqueue->pipe is also NULLed while clearing. Thus, wqueue->defunct is superfluous, as wqueue->pipe can be checked for NULL. Hence, the former can be removed. Tested with keyutils testsuite. Cc: stable@vger.kernel.org # 6.1 Signed-off-by: Siddh Raman Pant <code@siddh.me> Acked-by: David Howells <dhowells@redhat.com> Message-Id: <20230605143616.640517-1-code@siddh.me> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
6baa6e4836 |
bpf, btf: Warn but return no error for NULL btf from __register_btf_kfunc_id_set()
[ Upstream commit 3de4d22cc9ac7c9f38e10edcf54f9a8891a9c2aa ]
__register_btf_kfunc_id_set() assumes .BTF to be part of the module's .ko
file if CONFIG_DEBUG_INFO_BTF is enabled. If that's not the case, the
function prints an error message and returns an error. As a result, such
modules cannot be loaded.
However, the section could be stripped out during a build process. It would
be better to let the modules load, because their basic functionality
works fine [0], even though the BTF functionality will not be supported.
Make the function lower the level of the message from error to warning and
return no error.
[0] https://lore.kernel.org/bpf/20220219082037.ow2kbq5brktf4f2u@apollo.legion
Fixes:
|
||
|
081f642b31 |
kcsan: Don't expect 64 bits atomic builtins from 32 bits architectures
[ Upstream commit 353e7300a1db928e427462f2745f9a2cd1625b3d ]
Activating KCSAN on a 32-bit architecture leads to the following
link-time failure:
LD .tmp_vmlinux.kallsyms1
powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_load':
kernel/kcsan/core.c:1273: undefined reference to `__atomic_load_8'
powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_store':
kernel/kcsan/core.c:1273: undefined reference to `__atomic_store_8'
powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_exchange':
kernel/kcsan/core.c:1273: undefined reference to `__atomic_exchange_8'
powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_fetch_add':
kernel/kcsan/core.c:1273: undefined reference to `__atomic_fetch_add_8'
powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_fetch_sub':
kernel/kcsan/core.c:1273: undefined reference to `__atomic_fetch_sub_8'
powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_fetch_and':
kernel/kcsan/core.c:1273: undefined reference to `__atomic_fetch_and_8'
powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_fetch_or':
kernel/kcsan/core.c:1273: undefined reference to `__atomic_fetch_or_8'
powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_fetch_xor':
kernel/kcsan/core.c:1273: undefined reference to `__atomic_fetch_xor_8'
powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_fetch_nand':
kernel/kcsan/core.c:1273: undefined reference to `__atomic_fetch_nand_8'
powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_compare_exchange_strong':
kernel/kcsan/core.c:1273: undefined reference to `__atomic_compare_exchange_8'
powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_compare_exchange_weak':
kernel/kcsan/core.c:1273: undefined reference to `__atomic_compare_exchange_8'
powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_compare_exchange_val':
kernel/kcsan/core.c:1273: undefined reference to `__atomic_compare_exchange_8'
32-bit architectures don't have 64-bit atomic builtins. Only
include DEFINE_TSAN_ATOMIC_OPS(64) on 64-bit architectures.
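A userspace analogue of the guard. `DEFINE_ATOMIC_LOAD()` is a hypothetical, much smaller cousin of DEFINE_TSAN_ATOMIC_OPS(); the kernel fix keys the 64-bit variant off a 64-bit configuration check, while this sketch uses the compiler's `__SIZEOF_LONG__` instead.

```c
#include <assert.h>
#include <stdint.h>

/* Emit an atomic load wrapper for one operand width via GCC/Clang builtins. */
#define DEFINE_ATOMIC_LOAD(bits)                                \
	static inline uint##bits##_t                            \
	atomic_load_u##bits(const uint##bits##_t *p)            \
	{                                                       \
		return __atomic_load_n(p, __ATOMIC_SEQ_CST);    \
	}

DEFINE_ATOMIC_LOAD(8)
DEFINE_ATOMIC_LOAD(16)
DEFINE_ATOMIC_LOAD(32)

/*
 * Only instantiate the 64-bit variant on 64-bit targets. In the kernel
 * there is no libatomic to supply out-of-line helpers such as
 * __atomic_load_8, so a 32-bit build would fail at link time.
 */
#if __SIZEOF_LONG__ == 8
DEFINE_ATOMIC_LOAD(64)
#define HAVE_ATOMIC_LOAD_U64 1
#else
#define HAVE_ATOMIC_LOAD_U64 0
#endif
```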
Fixes:
|
||
|
fd4f89302f |
kexec: fix a memory leak in crash_shrink_memory()
[ Upstream commit 1cba6c4309f03de570202c46f03df3f73a0d4c82 ]
Patch series "kexec: enable kexec_crash_size to support two crash kernel regions".
When crashkernel=X fails to reserve a region under 4G, it falls back to reserving a region above 4G, and a region of the default size is also reserved under 4G. Unfortunately, /sys/kernel/kexec_crash_size currently supports only one crash kernel region, so the user cannot see the reserved low memory by reading /sys/kernel/kexec_crash_size. Also, the low memory cannot be freed by writing to this file.
For example:
resource_size(crashk_res) = 512M
resource_size(crashk_low_res) = 256M
The result of 'cat /sys/kernel/kexec_crash_size' is 512M, but it should be 768M. When we execute 'echo 0 > /sys/kernel/kexec_crash_size', the size of crashk_res becomes 0 while resource_size(crashk_low_res) is still 256 MB, which is incorrect.
Since crashk_res manages the high-address memory and crashk_low_res manages the low-address memory, crashk_low_res is shrunk only after all of crashk_res has been shrunk. And because crashk_res is always used when there is only one crash kernel region, if all of crashk_res has been shrunk and crashk_low_res still exists, swap them.
This patch (of 6):
If the value of parameter 'new_size' is in the half-open interval (crashk_res.end - KEXEC_CRASH_MEM_ALIGN + 1, crashk_res.end], the calculation result of ram_res is:
ram_res->start = crashk_res.end + 1
ram_res->end = crashk_res.end
The insert_resource() operation fails, and ram_res is not added to iomem_resource. As a result, the memory of the control block ram_res is leaked.
In fact, on all architectures, the start address and size of crashk_res are already aligned to KEXEC_CRASH_MEM_ALIGN. Therefore, we do not need to round up crashk_res.start again. Instead, we should round up 'new_size' in advance.
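A worked model of the rounding bug, under stated assumptions: we read the interval as a bound on crashk_res.start + new_size, use a hypothetical 4 KiB CRASH_ALIGN in place of KEXEC_CRASH_MEM_ALIGN, and assume the shrink is skipped when the rounded size is not smaller than the current size. `old_tail()` and `fixed_tail()` are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>

#define CRASH_ALIGN 0x1000UL /* stand-in for KEXEC_CRASH_MEM_ALIGN */
#define ROUNDUP(x, a) ((((x) + (a) - 1) / (a)) * (a))

/* Inclusive range, like struct resource. */
struct range {
	unsigned long start, end;
};

static bool range_valid(struct range r)
{
	return r.start <= r.end;
}

/* Old logic: the tail handed back starts at roundup(start + new_size). */
static struct range old_tail(struct range crash, unsigned long new_size)
{
	unsigned long start = ROUNDUP(crash.start, CRASH_ALIGN);
	struct range r = { ROUNDUP(start + new_size, CRASH_ALIGN), crash.end };
	return r;
}

/*
 * Fixed logic: round new_size first (crash.start is already aligned) and
 * skip the shrink if nothing would be left to release.
 */
static bool fixed_tail(struct range crash, unsigned long new_size, struct range *tail)
{
	unsigned long old_size = crash.end - crash.start + 1;

	new_size = ROUNDUP(new_size, CRASH_ALIGN);
	if (new_size >= old_size)
		return false;
	tail->start = crash.start + new_size;
	tail->end = crash.end;
	return true;
}
```

For a region [0x10000000, 0x1FFFFFFF] and new_size 0x0FFFF800, the old code yields the empty range [0x20000000, 0x1FFFFFFF], which is what insert_resource() rejects.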
Link: https://lkml.kernel.org/r/20230527123439.772-1-thunder.leizhen@huawei.com Link: https://lkml.kernel.org/r/20230527123439.772-2-thunder.leizhen@huawei.com Fixes: |
||
|
6525435d14 |
watchdog/perf: more properly prevent false positives with turbo modes
[ Upstream commit 4379e59fe5665cfda737e45b8bf2f05321ef049c ]
Currently, in the watchdog_overflow_callback() we first check to see if
the watchdog had been touched and _then_ we handle the workaround for
turbo mode. This order should be reversed.
Specifically, "touching" the hardlockup detector's watchdog should avoid
lockups being detected for one period that should be roughly the same
regardless of whether we're running turbo or not. That means that we
should do the extra accounting for turbo _before_ we look at (and clear)
the global indicating that we've been touched.
NOTE: this fix is made based on code inspection. I am not aware of any
reports where the old code would have generated false positives. That
being said, this order seems more correct and also makes it easier down
the line to share code with the "buddy" hardlockup detector.
Link: https://lkml.kernel.org/r/20230519101840.v5.2.I843b0d1de3e096ba111a179f3adb16d576bef5c7@changeid
Fixes:
|
||
|
20109ddd5b |
bpf: Fix memleak due to fentry attach failure
[ Upstream commit 108598c39eefbedc9882273ac0df96127a629220 ]
If attaching fentry fails, the allocated bpf trampoline image is left in
the system. That can be verified by checking /proc/kallsyms.
This memory leak can be reproduced with a simple bpf program as follows:
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
SEC("fentry/trap_init")
int fentry_run(void)
{
	return 0;
}
char LICENSE[] SEC("license") = "GPL";
It will fail to attach trap_init because this function is freed after
kernel init, and then we can find the trampoline image is left in the
system by checking /proc/kallsyms.
$ tail /proc/kallsyms
ffffffffc0613000 t bpf_trampoline_6442453466_1 [bpf]
ffffffffc06c3000 t bpf_trampoline_6442453466_1 [bpf]
$ bpftool btf dump file /sys/kernel/btf/vmlinux | grep "FUNC 'trap_init'"
[2522] FUNC 'trap_init' type_id=119 linkage=static
$ echo $((6442453466 & 0x7fffffff))
2522
Note that two bpf trampoline images are left behind because libbpf
falls back to a raw tracepoint if -EINVAL is returned.
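The `$((6442453466 & 0x7fffffff))` step can be unpacked further. Assumption (not stated in the text): for a vmlinux target the trampoline key is `(btf_obj_id << 32) | 0x80000000 | btf_id`, so the low 31 bits give the BTF type id and the high 32 bits the BTF object id. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* BTF type id: low 31 bits of the key (the 0x80000000 bit is a flag). */
static uint32_t key_btf_id(uint64_t key)
{
	return (uint32_t)(key & 0x7fffffffULL);
}

/* BTF object id (vmlinux is 1 under the assumed layout): high 32 bits. */
static uint32_t key_obj_id(uint64_t key)
{
	return (uint32_t)(key >> 32);
}
```

Decoding `bpf_trampoline_6442453466_1` this way gives BTF id 2522, the `FUNC 'trap_init'` entry shown by bpftool.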
Fixes:
|
||
|
8ea165e1f8 |
bpf: Remove bpf trampoline selector
[ Upstream commit 47e79cbeea4b3891ad476047f4c68543eb51c8e0 ] After commit |
||
|
c6a9fc82fe |
bpf: Don't EFAULT for {g,s}setsockopt with wrong optlen
[ Upstream commit 29ebbba7d46136cba324264e513a1e964ca16c0a ]
With the way the hooks are implemented right now, we have a special
condition: an optval larger than PAGE_SIZE exposes only the first 4k to
BPF, and any modifications to the optval are ignored. If the BPF program
doesn't handle this condition by resetting optlen to 0,
userspace gets EFAULT.
The intention of the EFAULT was to make it apparent to
developers that the program is doing something wrong.
However, this can inadvertently affect production workloads
with BPF programs that are not careful enough (i.e., returning EFAULT
for perfectly valid setsockopt/getsockopt calls).
Let's try to minimize the chance of a BPF program breaking userspace
by ignoring the output of such BPF programs (instead of returning
EFAULT to userspace), and report those cases once to dmesg with
pr_info_once() to help figure out what's going wrong.
Fixes:
|
||
|
b8a6ba524d |
rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale
[ Upstream commit 23fc8df26dead16687ae6eb47b0561a4a832e2f6 ]
Running the 'kfree_rcu_test' test case [1] results in a splat [2].
The root cause is that the kfree_scale_thread thread(s) continue running
after the rcuscale module is unloaded. This commit fixes that issue by
invoking kfree_scale_cleanup() from rcu_scale_cleanup() when removing
the rcuscale module.
[1] modprobe rcuscale kfree_rcu_test=1
// After some time
rmmod rcuscale
rmmod torture
[2] BUG: unable to handle page fault for address: ffffffffc0601a87
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
PGD 11de4f067 P4D 11de4f067 PUD 11de51067 PMD 112f4d067 PTE 0
Oops: 0010 [#1] PREEMPT SMP NOPTI
CPU: 1 PID: 1798 Comm: kfree_scale_thr Not tainted 6.3.0-rc1-rcu+ #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
RIP: 0010:0xffffffffc0601a87
Code: Unable to access opcode bytes at 0xffffffffc0601a5d.
RSP: 0018:ffffb25bc2e57e18 EFLAGS: 00010297
RAX: 0000000000000000 RBX: ffffffffc061f0b6 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff962fd0de RDI: ffffffff962fd0de
RBP: ffffb25bc2e57ea8 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
R13: 0000000000000000 R14: 000000000000000a R15: 00000000001c1dbe
FS: 0000000000000000(0000) GS:ffff921fa2200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffc0601a5d CR3: 000000011de4c006 CR4: 0000000000370ee0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
? kvfree_call_rcu+0xf0/0x3a0
? kthread+0xf3/0x120
? kthread_complete_and_exit+0x20/0x20
? ret_from_fork+0x1f/0x30
</TASK>
Modules linked in: rfkill sunrpc ... [last unloaded: torture]
CR2: ffffffffc0601a87
---[ end trace 0000000000000000 ]---
Fixes:
|
||
|
3506e64ec1 |
rcu/rcuscale: Move rcu_scale_*() after kfree_scale_cleanup()
[ Upstream commit bf5ddd736509a7d9077c0b6793e6f0852214dbea ] This code-movement-only commit moves the rcu_scale_cleanup() and rcu_scale_shutdown() functions to follow kfree_scale_cleanup(). This code movement is in preparation for a bug-fix patch that invokes kfree_scale_cleanup() from rcu_scale_cleanup(). Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Stable-dep-of: 23fc8df26dea ("rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale") Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
7a34922194 |
rcuscale: Move shutdown from wait_event() to wait_event_idle()
[ Upstream commit ef1ef3d47677dc191b88650a9f7f91413452cc1b ] The rcu_scale_shutdown() and kfree_scale_shutdown() kthreads/functions use wait_event() to wait for the rcuscale test to complete. However, each updater thread in such a test waits for at least 100 grace periods. If each grace period takes more than 1.2 seconds, which is long, but not insanely so, this can trigger the hung-task timeout. This commit therefore replaces those wait_event() calls with calls to wait_event_idle(), which do not trigger the hung-task timeout. Reported-by: kernel test robot <yujie.liu@intel.com> Reported-by: Liam Howlett <liam.howlett@oracle.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Yujie Liu <yujie.liu@intel.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Stable-dep-of: 23fc8df26dea ("rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale") Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
b1cdc56bc1 |
rcu-tasks: Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs
[ Upstream commit 401b0de3ae4fa49d1014c8941e26d9a25f37e7cf ]
The rcu_tasks_invoke_cbs() function relies on queue_work_on() to silently
fall back to WORK_CPU_UNBOUND when the specified CPU is offline. However,
the queue_work_on() function's silent fallback mechanism relies on that
CPU having been online at some time in the past. When queue_work_on()
is passed a CPU that has never been online, workqueue lockups ensue,
which can be bad for your kernel's general health and well-being.
This commit therefore checks whether a given CPU has ever been online
and, if not, substitutes WORK_CPU_UNBOUND in the subsequent call to
queue_work_on(). Why not simply omit the queue_work_on() call entirely?
Because this function is flooding callback-invocation notifications
to all CPUs, and must deal with possibilities that include a sparse
cpu_possible_mask.
This commit also moves the setting of the rcu_data structure's
->beenonline field to rcu_cpu_starting(), which executes on the
incoming CPU before that CPU has ever enabled interrupts. This ensures
that the required workqueues are present. In addition, because the
incoming CPU has not yet enabled its interrupts, there cannot yet have
been any softirq handlers running on this CPU, which means that the
WARN_ON_ONCE(!rdp->beenonline) within the RCU_SOFTIRQ handler cannot
have triggered yet.
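The CPU-selection rule reduced to a standalone function. This is a sketch: `beenonline[]` models the rcu_data ->beenonline field, `pick_work_cpu()` is a hypothetical name, and WORK_CPU_UNBOUND is redefined here as a local sentinel rather than taken from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_MODEL    8
#define WORK_CPU_UNBOUND NR_CPUS_MODEL /* sentinel "any CPU" value for this sketch */

/* Hypothetical per-CPU flag, set once the CPU has ever come online. */
static bool beenonline[NR_CPUS_MODEL];

/*
 * Pick the CPU to pass to queue_work_on(): a CPU that has never been
 * online has no usable per-CPU workqueue, so fall back to unbound.
 */
static int pick_work_cpu(int cpu)
{
	return beenonline[cpu] ? cpu : WORK_CPU_UNBOUND;
}
```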
Fixes:
|
||
|
d58f0f0ce6 |
rcu: Make rcu_cpu_starting() rely on interrupts being disabled
[ Upstream commit 15d44dfa40305da1648de4bf001e91cc63148725 ] Currently, rcu_cpu_starting() is written so that it might be invoked with interrupts enabled. However, it is always called when interrupts are disabled, either by rcu_init(), notify_cpu_starting(), or from a call point prior to the call to notify_cpu_starting(). But why bother requiring that interrupts be disabled? The purpose is to allow the rcu_data structure's ->beenonline flag to be set after all early processing has completed for the incoming CPU, thus allowing this flag to be used to determine when workqueues have been set up for the incoming CPU, while still allowing this flag to be used as a diagnostic within rcu_core(). This commit therefore makes rcu_cpu_starting() rely on interrupts being disabled. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Stable-dep-of: 401b0de3ae4f ("rcu-tasks: Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs") Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
77cc52f1b8 |
tick/rcu: Fix bogus ratelimit condition
[ Upstream commit a7e282c77785c7eabf98836431b1f029481085ad ]
The ratelimit logic in report_idle_softirq() is broken because the
exit condition is always true:
static int ratelimit;
if (ratelimit < 10)
return false; ---> always returns here
ratelimit++; ---> no chance to run
Make it check for >= 10 instead.
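The corrected condition, reduced to a standalone function for illustration; `should_report()` is a hypothetical name, not the body of report_idle_softirq().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Fixed ratelimit: report at most 10 times, then stay quiet. With the
 * original "ratelimit < 10" test the function returned before ever
 * incrementing the counter, so it never reported anything.
 */
static bool should_report(void)
{
	static int ratelimit;

	if (ratelimit >= 10)
		return false;
	ratelimit++;
	return true;
}
```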
Fixes:
|
||
|
e7aff15ba2 |
posix-timers: Prevent RT livelock in itimer_delete()
[ Upstream commit 9d9e522010eb5685d8b53e8a24320653d9d4cbbf ]
itimer_delete() has a retry loop when the timer is concurrently expired. On
non-RT kernels this just spin-waits until the timer callback has completed,
except for posix CPU timers which have HAVE_POSIX_CPU_TIMERS_TASK_WORK
enabled.
In that case, and on RT kernels, the existing task could livelock when
it preempts the task that does the timer delivery.
Replace spin_unlock() with an invocation of timer_wait_running() to handle
it the same way as the other retry loops in the posix timer code.
Fixes:
|
||
|
826f36ec77 | Merge "sched/walt: Introduce shared rail sibling" | ||
|
dace646e99 |
sched/walt: Fix problem in do_freq_qos_request
Currently, QOS requests are not being serviced, because requests are only honored if a CPU is offline. Fix this by ensuring QOS requests are made on online CPUs. Change-Id: I454a5ddf50e373dc6d4fe5ac06674288aa529f1e Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com> |
||
|
302ef6d2df | Merge "sched/walt: create an api for cluster boost for pipeline" | ||
|
e766132452 | Merge "sched/walt: Optimize single candidate EAS" | ||
|
dddf5ac2dd |
sched/walt: Introduce shared rail sibling
Treat Gold and Prime CPUs as shared rail siblings and ensure frequency sync when pipeline feature is enabled. This also ensures sync of gold/prime under intercluster migrations involving either clusters. Change-Id: Iaaa8a7c0c23b71f3486539396f21c466a75a051c Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com> |
||
|
aa882d2168 |
sched/walt: update pipeline code to selectively boost clusters
Use the core_ctl_set_cluster_boost api to choose which clusters should be boosted. Specifically, the boost should be applied to only those clusters which have a cpu selected by the cpus_for_pipeline mask. Change-Id: Ibaf77e0896bb834c014e51e0e97ba8d04978f027 Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com> Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com> |
||
|
608b4af07d |
sched/walt: create an api for cluster boost for pipeline
Not all clusters should be unhalted when pipeline is enabled, but all clusters are being unhalted now (through core_ctl_sched_boost). To prepare for only unhalting the clusters that are part of the user selected pipeline cpus, create a per-cluster boost api which controls the boost characteristics of only the cluster idx specified. Change-Id: I92fdc13bcb6ee84782b5ea1b5dcb74caabee8ccf Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com> |
||
|
9902aeb581 |
sched/walt: specify cpus for pipeline threads
Clean up the assignment of pipeline cpus such that the value comes from a sysctl node, which specifies exactly which cpus are to be used for pipeline tasks (not including prime). Change-Id: Ib108e1128c0eae4f18049f2328045f8637a79efb Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com> Signed-off-by: Abhijeet Dharmapurikar <quic_adharmap@quicinc.com> |
||
|
0270812f0e | Merge "sched/walt: Fix potential sleep under atomic context" | ||
|
e406d916fc | Merge "sched: walt: add stalls to sched_switch_with_ctrs" |