Commit Graph

42714 Commits

Author SHA1 Message Date
Brian Foster
1dd387668d tracing: Zero the pipe cpumask on alloc to avoid spurious -EBUSY
commit 3d07fa1dd19035eb0b13ae6697efd5caa9033e74 upstream.

The pipe cpumask used to serialize opens between the main and percpu
trace pipes is not zeroed or initialized. This can result in
spurious -EBUSY returns if underlying memory is not fully zeroed.
This has been observed by immediate failure to read the main
trace_pipe file on an otherwise newly booted and idle system:

 # cat /sys/kernel/debug/tracing/trace_pipe
 cat: /sys/kernel/debug/tracing/trace_pipe: Device or resource busy

Zero the allocation of pipe_cpumask to avoid the problem.

Link: https://lore.kernel.org/linux-trace-kernel/20230831125500.986862-1-bfoster@redhat.com

Cc: stable@vger.kernel.org
Fixes: c2489bb7e6be ("tracing: Introduce pipe_cpumask to avoid race on trace_pipes")
Reviewed-by: Zheng Yejian <zhengyejian1@huawei.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Brian Foster <bfoster@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-13 09:43:05 +02:00
Yafang Shao
c96c67991a bpf: Fix issue in verifying allow_ptr_leaks
commit d75e30dddf73449bc2d10bb8e2f1a2c446bc67a2 upstream.

After we converted the capabilities of our networking-bpf program from
cap_sys_admin to cap_net_admin+cap_bpf, our networking-bpf program
failed to start. Because it failed the bpf verifier, and the error log
is "R3 pointer comparison prohibited".

A simple reproducer as follows,

SEC("cls-ingress")
int ingress(struct __sk_buff *skb)
{
	struct iphdr *iph = (void *)(long)skb->data + sizeof(struct ethhdr);

	if ((long)(iph + 1) > (long)skb->data_end)
		return TC_ACT_STOLEN;
	return TC_ACT_OK;
}

Per discussion with Yonghong and Alexei [1], comparison of two packet
pointers is not a pointer leak. This patch fixes it.

Our local kernel is 6.1.y and we expect this fix to be backported to
6.1.y, so stable is CCed.

[1]. https://lore.kernel.org/bpf/CAADnVQ+Nmspr7Si+pxWn8zkE7hX-7s93ugwC+94aXSy4uQ9vBg@mail.gmail.com/

Suggested-by: Yonghong Song <yonghong.song@linux.dev>
Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230823020703.3790-2-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-13 09:43:02 +02:00
Thomas Gleixner
fea9dd8653 cpu/hotplug: Prevent self deadlock on CPU hot-unplug
commit 2b8272ff4a70b866106ae13c36be7ecbef5d5da2 upstream.

Xiongfeng reported and debugged a self deadlock of the task which initiates
and controls a CPU hot-unplug operation vs. the CFS bandwidth timer.

    CPU1      			                 	 CPU2

T1 sets cfs_quota
   starts hrtimer cfs_bandwidth 'period_timer'
T1 is migrated to CPU2
						T1 initiates offlining of CPU1
Hotplug operation starts
  ...
'period_timer' expires and is re-enqueued on CPU1
  ...
take_cpu_down()
  CPU1 shuts down and does not handle timers
  anymore. They have to be migrated in the
  post dead hotplug steps by the control task.

						T1 runs the post dead offline operation
					      	T1 is scheduled out
						T1 waits for 'period_timer' to expire

T1 waits there forever if it is scheduled out before it can execute the hrtimer
offline callback hrtimers_dead_cpu().

Cure this by delegating the hotplug control operation to a worker thread on
an online CPU. This takes the initiating user space task, which might be
affected by the bandwidth timer, completely out of the picture.

Reported-by: Xiongfeng Wang <wangxiongfeng2@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Yu Liao <liaoyu15@huawei.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/lkml/8e785777-03aa-99e1-d20e-e956f5685be6@huawei.com
Link: https://lore.kernel.org/r/87h6oqdq0i.ffs@tglx
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-13 09:43:00 +02:00
Kees Cook
2344b13976 printk: ringbuffer: Fix truncating buffer size min_t cast
commit 53e9e33ede37a247d926db5e4a9e56b55204e66c upstream.

If an output buffer size exceeded U16_MAX, the min_t(u16, ...) cast in
copy_data() was causing writes to truncate. This manifested as output
bytes being skipped, seen as %NUL bytes in pstore dumps when the available
record size was larger than 65536. Fix the cast to no longer truncate
the calculation.

Cc: Petr Mladek <pmladek@suse.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: John Ogness <john.ogness@linutronix.de>
Reported-by: Vijay Balakrishna <vijayb@linux.microsoft.com>
Link: https://lore.kernel.org/lkml/d8bb1ec7-a4c5-43a2-9de0-9643a70b899f@linux.microsoft.com/
Fixes: b6cf8b3f33 ("printk: add lockless ringbuffer")
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Tested-by: Vijay Balakrishna <vijayb@linux.microsoft.com>
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Reviewed-by: Tyler Hicks (Microsoft) <code@tyhicks.com>
Tested-by: Tyler Hicks (Microsoft) <code@tyhicks.com>
Reviewed-by: John Ogness <john.ogness@linutronix.de>
Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
Link: https://lore.kernel.org/r/20230811054528.never.165-kees@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-13 09:43:00 +02:00
Zheng Yejian
74c85396bd tracing: Fix race issue between cpu buffer write and swap
[ Upstream commit 3163f635b20e9e1fb4659e74f47918c9dddfe64e ]

Warning happened in rb_end_commit() at code:
	if (RB_WARN_ON(cpu_buffer, !local_read(&cpu_buffer->committing)))

  WARNING: CPU: 0 PID: 139 at kernel/trace/ring_buffer.c:3142
	rb_commit+0x402/0x4a0
  Call Trace:
   ring_buffer_unlock_commit+0x42/0x250
   trace_buffer_unlock_commit_regs+0x3b/0x250
   trace_event_buffer_commit+0xe5/0x440
   trace_event_buffer_reserve+0x11c/0x150
   trace_event_raw_event_sched_switch+0x23c/0x2c0
   __traceiter_sched_switch+0x59/0x80
   __schedule+0x72b/0x1580
   schedule+0x92/0x120
   worker_thread+0xa0/0x6f0

It is because the race between writing event into cpu buffer and swapping
cpu buffer through file per_cpu/cpu0/snapshot:

  Write on CPU 0             Swap buffer by per_cpu/cpu0/snapshot on CPU 1
  --------                   --------
                             tracing_snapshot_write()
                               [...]

  ring_buffer_lock_reserve()
    cpu_buffer = buffer->buffers[cpu]; // 1. Suppose find 'cpu_buffer_a';
    [...]
    rb_reserve_next_event()
      [...]

                               ring_buffer_swap_cpu()
                                 if (local_read(&cpu_buffer_a->committing))
                                     goto out_dec;
                                 if (local_read(&cpu_buffer_b->committing))
                                     goto out_dec;
                                 buffer_a->buffers[cpu] = cpu_buffer_b;
                                 buffer_b->buffers[cpu] = cpu_buffer_a;
                                 // 2. cpu_buffer has swapped here.

      rb_start_commit(cpu_buffer);
      if (unlikely(READ_ONCE(cpu_buffer->buffer)
          != buffer)) { // 3. This check passed due to 'cpu_buffer->buffer'
        [...]           //    has not changed here.
        return NULL;
      }
                                 cpu_buffer_b->buffer = buffer_a;
                                 cpu_buffer_a->buffer = buffer_b;
                                 [...]

      // 4. Reserve event from 'cpu_buffer_a'.

  ring_buffer_unlock_commit()
    [...]
    cpu_buffer = buffer->buffers[cpu]; // 5. Now find 'cpu_buffer_b' !!!
    rb_commit(cpu_buffer)
      rb_end_commit()  // 6. WARN for the wrong 'committing' state !!!

Based on above analysis, we can easily reproduce by following testcase:
  ``` bash
  #!/bin/bash

  dmesg -n 7
  sysctl -w kernel.panic_on_warn=1
  TR=/sys/kernel/tracing
  echo 7 > ${TR}/buffer_size_kb
  echo "sched:sched_switch" > ${TR}/set_event
  while [ true ]; do
          echo 1 > ${TR}/per_cpu/cpu0/snapshot
  done &
  while [ true ]; do
          echo 1 > ${TR}/per_cpu/cpu0/snapshot
  done &
  while [ true ]; do
          echo 1 > ${TR}/per_cpu/cpu0/snapshot
  done &
  ```

To fix it, IIUC, we can use smp_call_function_single() to do the swap on
the target cpu where the buffer is located, so that above race would be
avoided.

Link: https://lore.kernel.org/linux-trace-kernel/20230831132739.4070878-1-zhengyejian1@huawei.com

Cc: <mhiramat@kernel.org>
Fixes: f1affcaaa8 ("tracing: Add snapshot in the per_cpu trace directories")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13 09:42:57 +02:00
Mikhail Kobuk
fb34716c9e tracing: Remove extra space at the end of hwlat_detector/mode
[ Upstream commit 2cf0dee989a8b2501929eaab29473b6b1fa11057 ]

Space is printed after each mode value including the last one:
$ echo \"$(sudo cat /sys/kernel/tracing/hwlat_detector/mode)\"
"none [round-robin] per-cpu "

Found by Linux Verification Center (linuxtesting.org) with SVACE.

Link: https://lore.kernel.org/linux-trace-kernel/20230825103432.7750-1-m.kobuk@ispras.ru

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 8fa826b734 ("trace/hwlat: Implement the mode config option")
Signed-off-by: Mikhail Kobuk <m.kobuk@ispras.ru>
Reviewed-by: Alexey Khoroshilov <khoroshilov@ispras.ru>
Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13 09:42:57 +02:00
Paul Gortmaker
55a448e8d8 tick/rcu: Fix false positive "softirq work is pending" messages
[ Upstream commit 96c1fa04f089a7e977a44e4e8fdc92e81be20bef ]

In commit 0345691b24 ("tick/rcu: Stop allowing RCU_SOFTIRQ in idle") the
new function report_idle_softirq() was created by breaking code out of the
existing can_stop_idle_tick() for kernels v5.18 and newer.

In doing so, the code essentially went from a one conditional:

	if (a && b && c)
		warn();

to a three conditional:

	if (!a)
		return;
	if (!b)
		return;
	if (!c)
		return;
	warn();

But that conversion got the condition for the RT specific
local_bh_blocked() wrong. The original condition was:

   	!local_bh_blocked()

but the conversion failed to negate it so it ended up as:

        if (!local_bh_blocked())
		return false;

This issue lay dormant until another fixup for the same commit was added
in commit a7e282c77785 ("tick/rcu: Fix bogus ratelimit condition").
This commit realized the ratelimit was essentially set to zero instead
of ten, and hence *no* softirq pending messages would ever be issued.

Once this commit was backported via linux-stable, both the v6.1 and v6.4
preempt-rt kernels started printing out 10 instances of this at boot:

  NOHZ tick-stop error: local softirq work is pending, handler #80!!!

Remove the negation and return when local_bh_blocked() evaluates to true to
bring the correct behaviour back.

Fixes: 0345691b24 ("tick/rcu: Stop allowing RCU_SOFTIRQ in idle")
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Ahmad Fatoum <a.fatoum@pengutronix.de>
Reviewed-by: Wen Yang <wenyang.linux@foxmail.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230818200757.1808398-1-paul.gortmaker@windriver.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13 09:42:57 +02:00
Lu Jialin
848cd6f24a cgroup:namespace: Remove unused cgroup_namespaces_init()
[ Upstream commit 82b90b6c5b38e457c7081d50dff11ecbafc1e61a ]

cgroup_namspace_init() just return 0. Therefore, there is no need to
call it during start_kernel. Just remove it.

Fixes: a79a908fd2 ("cgroup: introduce cgroup namespaces")
Signed-off-by: Lu Jialin <lujialin4@huawei.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13 09:42:56 +02:00
Waiman Long
8199a46af2 cgroup/cpuset: Inherit parent's load balance state in v2
[ Upstream commit c8c926200c55454101f072a4b16c9ff5b8c9e56f ]

Since commit f28e22441f ("cgroup/cpuset: Add a new isolated
cpus.partition type"), the CS_SCHED_LOAD_BALANCE bit of a v2 cpuset
can be on or off. The child cpusets of a partition root must have the
same setting as its parent or it may screw up the rebuilding of sched
domains. Fix this problem by making sure the a child v2 cpuset will
follows its parent cpuset load balance state unless the child cpuset
is a new partition root itself.

Fixes: f28e22441f ("cgroup/cpuset: Add a new isolated cpus.partition type")
Signed-off-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13 09:42:49 +02:00
Ira Weiny
3108f7c788 PCI: Allow drivers to request exclusive config regions
[ Upstream commit 278294798ac9118412c9624a801d3f20f2279363 ]

PCI config space access from user space has traditionally been
unrestricted with writes being an understood risk for device operation.

Unfortunately, device breakage or odd behavior from config writes lacks
indicators that can leave driver writers confused when evaluating
failures.  This is especially true with the new PCIe Data Object
Exchange (DOE) mailbox protocol where backdoor shenanigans from user
space through things such as vendor defined protocols may affect device
operation without complete breakage.

A prior proposal restricted read and writes completely.[1]  Greg and
Bjorn pointed out that proposal is flawed for a couple of reasons.
First, lspci should always be allowed and should not interfere with any
device operation.  Second, setpci is a valuable tool that is sometimes
necessary and it should not be completely restricted.[2]  Finally
methods exist for full lock of device access if required.

Even though access should not be restricted it would be nice for driver
writers to be able to flag critical parts of the config space such that
interference from user space can be detected.

Introduce pci_request_config_region_exclusive() to mark exclusive config
regions.  Such regions trigger a warning and kernel taint if accessed
via user space.

Create pci_warn_once() to restrict the user from spamming the log.

[1] https://lore.kernel.org/all/161663543465.1867664.5674061943008380442.stgit@dwillia2-desk3.amr.corp.intel.com/
[2] https://lore.kernel.org/all/YF8NGeGv9vYcMfTV@kroah.com/

Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lore.kernel.org/r/20220926215711.2893286-2-ira.weiny@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Stable-dep-of: 5e70d0acf082 ("PCI: Add locking to RMW PCI Express Capability Register accessors")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13 09:42:46 +02:00
Gaosheng Cui
9ca08adb75 audit: fix possible soft lockup in __audit_inode_child()
[ Upstream commit b59bc6e37237e37eadf50cd5de369e913f524463 ]

Tracefs or debugfs maybe cause hundreds to thousands of PATH records,
too many PATH records maybe cause soft lockup.

For example:
  1. CONFIG_KASAN=y && CONFIG_PREEMPTION=n
  2. auditctl -a exit,always -S open -k key
  3. sysctl -w kernel.watchdog_thresh=5
  4. mkdir /sys/kernel/debug/tracing/instances/test

There may be a soft lockup as follows:
  watchdog: BUG: soft lockup - CPU#45 stuck for 7s! [mkdir:15498]
  Kernel panic - not syncing: softlockup: hung tasks
  Call trace:
   dump_backtrace+0x0/0x30c
   show_stack+0x20/0x30
   dump_stack+0x11c/0x174
   panic+0x27c/0x494
   watchdog_timer_fn+0x2bc/0x390
   __run_hrtimer+0x148/0x4fc
   __hrtimer_run_queues+0x154/0x210
   hrtimer_interrupt+0x2c4/0x760
   arch_timer_handler_phys+0x48/0x60
   handle_percpu_devid_irq+0xe0/0x340
   __handle_domain_irq+0xbc/0x130
   gic_handle_irq+0x78/0x460
   el1_irq+0xb8/0x140
   __audit_inode_child+0x240/0x7bc
   tracefs_create_file+0x1b8/0x2a0
   trace_create_file+0x18/0x50
   event_create_dir+0x204/0x30c
   __trace_add_new_event+0xac/0x100
   event_trace_add_tracer+0xa0/0x130
   trace_array_create_dir+0x60/0x140
   trace_array_create+0x1e0/0x370
   instance_mkdir+0x90/0xd0
   tracefs_syscall_mkdir+0x68/0xa0
   vfs_mkdir+0x21c/0x34c
   do_mkdirat+0x1b4/0x1d4
   __arm64_sys_mkdirat+0x4c/0x60
   el0_svc_common.constprop.0+0xa8/0x240
   do_el0_svc+0x8c/0xc0
   el0_svc+0x20/0x30
   el0_sync_handler+0xb0/0xb4
   el0_sync+0x160/0x180

Therefore, we add cond_resched() to __audit_inode_child() to fix it.

Fixes: 5195d8e217 ("audit: dynamically allocate audit_names when not enough space is in the names array")
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13 09:42:42 +02:00
Yafang Shao
912310dd84 bpf: Fix an error in verifying a field in a union
[ Upstream commit 33937607efa050d9e237e0c4ac4ada02d961c466 ]

We are utilizing BPF LSM to monitor BPF operations within our container
environment. When we add support for raw_tracepoint, it hits below
error.

; (const void *)attr->raw_tracepoint.name);
27: (79) r3 = *(u64 *)(r2 +0)
access beyond the end of member map_type (mend:4) in struct (anon) with off 0 size 8

It can be reproduced with below BPF prog.

SEC("lsm/bpf")
int BPF_PROG(bpf_audit, int cmd, union bpf_attr *attr, unsigned int size)
{
	switch (cmd) {
	case BPF_RAW_TRACEPOINT_OPEN:
		bpf_printk("raw_tracepoint is %s", attr->raw_tracepoint.name);
		break;
	default:
		break;
	}
	return 0;
}

The reason is that when accessing a field in a union, such as bpf_attr,
if the field is located within a nested struct that is not the first
member of the union, it can result in incorrect field verification.

  union bpf_attr {
      struct {
          __u32 map_type; <<<< Actually it will find that field.
          __u32 key_size;
          __u32 value_size;
         ...
      };
      ...
      struct {
          __u64 name;    <<<< We want to verify this field.
          __u32 prog_fd;
      } raw_tracepoint;
  };

Considering the potential deep nesting levels, finding a perfect
solution to address this issue has proven challenging. Therefore, I
propose a solution where we simply skip the verification process if the
field in question is located within a union.

Fixes: 7e3617a72d ("bpf: Add array support to btf_struct_access")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Link: https://lore.kernel.org/r/20230713025642.27477-4-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13 09:42:30 +02:00
Yafang Shao
780f072f4f bpf: Clear the probe_addr for uprobe
[ Upstream commit 5125e757e62f6c1d5478db4c2b61a744060ddf3f ]

To avoid returning uninitialized or random values when querying the file
descriptor (fd) and accessing probe_addr, it is necessary to clear the
variable prior to its use.

Fixes: 41bdc4b40e ("bpf: introduce bpf subcommand BPF_TASK_FD_QUERY")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Yonghong Song <yhs@fb.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230709025630.3735-6-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13 09:42:30 +02:00
Cyril Hrubis
5fce29ab20 sched/rt: Fix sysctl_sched_rr_timeslice intial value
[ Upstream commit c7fcb99877f9f542c918509b2801065adcaf46fa ]

There is a 10% rounding error in the intial value of the
sysctl_sched_rr_timeslice with CONFIG_HZ_300=y.

This was found with LTP test sched_rr_get_interval01:

sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed
sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns
sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90
sched_rr_get_interval01.c:57: TPASS: sched_rr_get_interval() passed
sched_rr_get_interval01.c:64: TPASS: Time quantum 0s 99999990ns
sched_rr_get_interval01.c:72: TFAIL: /proc/sys/kernel/sched_rr_timeslice_ms != 100 got 90

What this test does is to compare the return value from the
sched_rr_get_interval() and the sched_rr_timeslice_ms sysctl file and
fails if they do not match.

The problem it found is the intial sysctl file value which was computed as:

static int sysctl_sched_rr_timeslice = (MSEC_PER_SEC / HZ) * RR_TIMESLICE;

which works fine as long as MSEC_PER_SEC is multiple of HZ, however it
introduces 10% rounding error for CONFIG_HZ_300:

(MSEC_PER_SEC / HZ) * (100 * HZ / 1000)

(1000 / 300) * (100 * 300 / 1000)

3 * 30 = 90

This can be easily fixed by reversing the order of the multiplication
and division. After this fix we get:

(MSEC_PER_SEC * (100 * HZ / 1000)) / HZ

(1000 * (100 * 300 / 1000)) / 300

(1000 * 30) / 300 = 100

Fixes: 975e155ed8 ("sched/rt: Show the 'sched_rr_timeslice' SCHED_RR timeslice tuning knob in milliseconds")
Signed-off-by: Cyril Hrubis <chrubis@suse.cz>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Petr Vorel <pvorel@suse.cz>
Acked-by: Mel Gorman <mgorman@suse.de>
Tested-by: Petr Vorel <pvorel@suse.cz>
Link: https://lore.kernel.org/r/20230802151906.25258-2-chrubis@suse.cz
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13 09:42:29 +02:00
Waiman Long
e0322a255a refscale: Fix uninitalized use of wait_queue_head_t
[ Upstream commit f5063e8948dad7f31adb007284a5d5038ae31bb8 ]

Running the refscale test occasionally crashes the kernel with the
following error:

[ 8569.952896] BUG: unable to handle page fault for address: ffffffffffffffe8
[ 8569.952900] #PF: supervisor read access in kernel mode
[ 8569.952902] #PF: error_code(0x0000) - not-present page
[ 8569.952904] PGD c4b048067 P4D c4b049067 PUD c4b04b067 PMD 0
[ 8569.952910] Oops: 0000 [#1] PREEMPT_RT SMP NOPTI
[ 8569.952916] Hardware name: Dell Inc. PowerEdge R750/0WMWCR, BIOS 1.2.4 05/28/2021
[ 8569.952917] RIP: 0010:prepare_to_wait_event+0x101/0x190
  :
[ 8569.952940] Call Trace:
[ 8569.952941]  <TASK>
[ 8569.952944]  ref_scale_reader+0x380/0x4a0 [refscale]
[ 8569.952959]  kthread+0x10e/0x130
[ 8569.952966]  ret_from_fork+0x1f/0x30
[ 8569.952973]  </TASK>

The likely cause is that init_waitqueue_head() is called after the call to
the torture_create_kthread() function that creates the ref_scale_reader
kthread.  Although this init_waitqueue_head() call will very likely
complete before this kthread is created and starts running, it is
possible that the calling kthread will be delayed between the calls to
torture_create_kthread() and init_waitqueue_head().  In this case, the
new kthread will use the waitqueue head before it is properly initialized,
which is not good for the kernel's health and well-being.

The above crash happened here:

	static inline void __add_wait_queue(...)
	{
		:
		if (!(wq->flags & WQ_FLAG_PRIORITY)) <=== Crash here

The offset of flags from list_head entry in wait_queue_entry is
-0x18. If reader_tasks[i].wq.head.next is NULL as allocated reader_task
structure is zero initialized, the instruction will try to access address
0xffffffffffffffe8, which is exactly the fault address listed above.

This commit therefore invokes init_waitqueue_head() before creating
the kthread.

Fixes: 653ed64b01 ("refperf: Add a test to measure performance of read-side synchronization")
Signed-off-by: Waiman Long <longman@redhat.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13 09:42:28 +02:00
Zheng Yejian
10f358cd4b tracing: Introduce pipe_cpumask to avoid race on trace_pipes
[ Upstream commit c2489bb7e6be2e8cdced12c16c42fa128403ac03 ]

There is race issue when concurrently splice_read main trace_pipe and
per_cpu trace_pipes which will result in data read out being different
from what actually writen.

As suggested by Steven:
  > I believe we should add a ref count to trace_pipe and the per_cpu
  > trace_pipes, where if they are opened, nothing else can read it.
  >
  > Opening trace_pipe locks all per_cpu ref counts, if any of them are
  > open, then the trace_pipe open will fail (and releases any ref counts
  > it had taken).
  >
  > Opening a per_cpu trace_pipe will up the ref count for just that
  > CPU buffer. This will allow multiple tasks to read different per_cpu
  > trace_pipe files, but will prevent the main trace_pipe file from
  > being opened.

But because we only need to know whether per_cpu trace_pipe is open or
not, using a cpumask instead of using ref count may be easier.

After this patch, users will find that:
 - Main trace_pipe can be opened by only one user, and if it is
   opened, all per_cpu trace_pipes cannot be opened;
 - Per_cpu trace_pipes can be opened by multiple users, but each per_cpu
   trace_pipe can only be opened by one user. And if one of them is
   opened, main trace_pipe cannot be opened.

Link: https://lore.kernel.org/linux-trace-kernel/20230818022645.1948314-1-zhengyejian1@huawei.com

Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13 09:42:25 +02:00
Masami Hiramatsu (Google)
473a55cfc1 kprobes: Prohibit probing on CFI preamble symbol
[ Upstream commit de02f2ac5d8cfb311f44f2bf144cc20002f1fbbd ]

Do not allow to probe on "__cfi_" or "__pfx_" started symbol, because those
are used for CFI and not executed. Probing it will break the CFI.

Link: https://lore.kernel.org/all/168904024679.116016.18089228029322008512.stgit@devnote2/

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-09-13 09:42:23 +02:00
zhengtangquan
6e5f182128 ANDROID: signal: Add vendor hook for memory reap
Add vendor hook to determine if the memory of a process
that received the SIGKILL can be reaped.
Partial cherry-pick of aosp/1724512 & aosp/2093626.

Bug: 232062955
Change-Id: I75072bd264df33caff67d083821ee6f33ca83af9
Signed-off-by: Tangquan Zheng <zhengtangquan@oppo.com>
2023-09-13 00:04:10 +00:00
Greg Kroah-Hartman
f474c6446c Merge 6.1.44 into android14-6.1-lts
Changes in 6.1.44
	init: Provide arch_cpu_finalize_init()
	x86/cpu: Switch to arch_cpu_finalize_init()
	ARM: cpu: Switch to arch_cpu_finalize_init()
	ia64/cpu: Switch to arch_cpu_finalize_init()
	loongarch/cpu: Switch to arch_cpu_finalize_init()
	m68k/cpu: Switch to arch_cpu_finalize_init()
	mips/cpu: Switch to arch_cpu_finalize_init()
	sh/cpu: Switch to arch_cpu_finalize_init()
	sparc/cpu: Switch to arch_cpu_finalize_init()
	um/cpu: Switch to arch_cpu_finalize_init()
	init: Remove check_bugs() leftovers
	init: Invoke arch_cpu_finalize_init() earlier
	init, x86: Move mem_encrypt_init() into arch_cpu_finalize_init()
	x86/init: Initialize signal frame size late
	x86/fpu: Remove cpuinfo argument from init functions
	x86/fpu: Mark init functions __init
	x86/fpu: Move FPU initialization into arch_cpu_finalize_init()
	x86/speculation: Add Gather Data Sampling mitigation
	x86/speculation: Add force option to GDS mitigation
	x86/speculation: Add Kconfig option for GDS
	KVM: Add GDS_NO support to KVM
	x86/mem_encrypt: Unbreak the AMD_MEM_ENCRYPT=n build
	x86/xen: Fix secondary processors' FPU initialization
	x86/mm: fix poking_init() for Xen PV guests
	x86/mm: Use mm_alloc() in poking_init()
	mm: Move mm_cachep initialization to mm_init()
	x86/mm: Initialize text poking earlier
	Documentation/x86: Fix backwards on/off logic about YMM support
	x86/bugs: Increase the x86 bugs vector size to two u32s
	x86/cpu, kvm: Add support for CPUID_80000021_EAX
	x86/srso: Add a Speculative RAS Overflow mitigation
	x86/srso: Add IBPB_BRTYPE support
	x86/srso: Add SRSO_NO support
	x86/srso: Add IBPB
	x86/srso: Add IBPB on VMEXIT
	x86/srso: Fix return thunks in generated code
	x86/srso: Add a forgotten NOENDBR annotation
	x86/srso: Tie SBPB bit setting to microcode patch detection
	xen/netback: Fix buffer overrun triggered by unusual packet
	x86: fix backwards merge of GDS/SRSO bit
	Linux 6.1.44

Change-Id: Ia40e37c806ae2a2daf2127415aa28d0151660667
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-09-12 15:09:42 +00:00
Greg Kroah-Hartman
cf0f262265 Revert "locking/rtmutex: Fix task->pi_waiters integrity"
This reverts commit 7f1715d827 which is
commit f7853c34241807bb97673a5e97719123be39a09e upstream.

It breaks the Android api due to a structure, struct rt_mutex_waiter,
escaping out of the core kernel code, into an Android hook, making it
something that can not be modified without potentially major problems.

If this needs to come back, it must do so in an ABI-safe way in the
future, for now, it can just be reverted.

Bug: 161946584
Change-Id: Idd251aa4e905cc284523b0b81b3e8993c6a58866
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-09-12 14:37:42 +00:00
Greg Kroah-Hartman
38b64945f1 Revert "ring-buffer: Fix wrong stat of cpu_buffer->read"
This reverts commit 77996fa5c6 which is
commit 2d093282b0d4357373497f65db6a05eb0c28b7c8 upstream.

It breaks the Android abi and isn't really needed for Android systems.
If it is needed in the future, it can come back in an ABI-safe way.

Bug: 161946584
Change-Id: I1def9966078008125f445941af21e518617a0011
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-09-12 14:37:42 +00:00
Greg Kroah-Hartman
7f81705800 Merge 6.1.43 into android14-6.1-lts
Changes in 6.1.43
	netfilter: nf_tables: fix underflow in object reference counter
	netfilter: nf_tables: fix underflow in chain reference counter
	platform/x86/amd/pmf: Notify OS power slider update
	platform/x86/amd/pmf: reduce verbosity of apmf_get_system_params
	drm/amd/display: Keep PHY active for dp config
	ovl: fix null pointer dereference in ovl_permission()
	drm/amd: Move helper for dynamic speed switch check out of smu13
	drm/amd: Align SMU11 SMU_MSG_OverridePcieParameters implementation with SMU13
	jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint
	blk-mq: Fix stall due to recursive flush plug
	powerpc/pseries/vas: Hold mmap_mutex after mmap lock during window close
	KVM: s390: pv: fix index value of replaced ASCE
	io_uring: don't audit the capability check in io_uring_create()
	gpio: tps68470: Make tps68470_gpio_output() always set the initial value
	pwm: Add a stub for devm_pwmchip_add()
	gpio: mvebu: Make use of devm_pwmchip_add
	gpio: mvebu: fix irq domain leak
	btrfs: fix race between quota disable and relocation
	i2c: Delete error messages for failed memory allocations
	i2c: Improve size determinations
	i2c: nomadik: Remove unnecessary goto label
	i2c: nomadik: Use devm_clk_get_enabled()
	i2c: nomadik: Remove a useless call in the remove function
	MIPS: Loongson: Move arch cflags to MIPS top level Makefile
	MIPS: Loongson: Fix build error when make modules_install
	PCI/ASPM: Return 0 or -ETIMEDOUT from pcie_retrain_link()
	PCI/ASPM: Factor out pcie_wait_for_retrain()
	PCI/ASPM: Avoid link retraining race
	PCI: rockchip: Remove writes to unused registers
	PCI: rockchip: Fix window mapping and address translation for endpoint
	PCI: rockchip: Don't advertise MSI-X in PCIe capabilities
	drm/amd/display: add FB_DAMAGE_CLIPS support
	drm/amd/display: Check if link state is valid
	drm/amd/display: Rework context change check
	drm/amd/display: Enable new commit sequence only for DCN32x
	drm/amd/display: Copy DC context in the commit streams
	drm/amd/display: Include surface of unaffected streams
	drm/amd/display: Use min transition for all SubVP plane add/remove
	drm/amd/display: add ODM case when looking for first split pipe
	drm/amd/display: use low clocks for no plane configs
	drm/amd/display: fix unbounded requesting for high pixel rate modes on dcn315
	drm/amd/display: add pixel rate based CRB allocation support
	drm/amd/display: fix dcn315 single stream crb allocation
	drm/amd/display: Update correct DCN314 register header
	drm/amd/display: Set minimum requirement for using PSR-SU on Rembrandt
	drm/amd/display: Set minimum requirement for using PSR-SU on Phoenix
	drm/ttm: Don't print error message if eviction was interrupted
	drm/ttm: Don't leak a resource on eviction error
	n_tty: Rename tail to old_tail in n_tty_read()
	tty: fix hang on tty device with no_room set
	drm/ttm: never consider pinned BOs for eviction&swap
	KVM: arm64: Condition HW AF updates on config option
	arm64: errata: Mitigate Ampere1 erratum AC03_CPU_38 at stage-2
	mptcp: introduce 'sk' to replace 'sock->sk' in mptcp_listen()
	mptcp: do not rely on implicit state check in mptcp_listen()
	tracing/probes: Add symstr type for dynamic events
	tracing/probes: Fix to avoid double count of the string length on the array
	tracing: Allow synthetic events to pass around stacktraces
	Revert "tracing: Add "(fault)" name injection to kernel probes"
	tracing/probes: Fix to record 0-length data_loc in fetch_store_string*() if fails
	test_maple_tree: test modifications while iterating
	maple_tree: add __init and __exit to test module
	maple_tree: fix 32 bit mas_next testing
	drm/amd/display: Rework comments on dc file
	drm/amd/display: fix dc/core/dc.c kernel-doc
	drm/amd/display: Add FAMS validation before trying to use it
	drm/amd/display: update extended blank for dcn314 onwards
	drm/amd/display: Fix possible underflow for displays with large vblank
	drm/amd/display: Prevent vtotal from being set to 0
	phy: phy-mtk-dp: Fix an error code in probe()
	phy: qcom-snps: correct struct qcom_snps_hsphy kerneldoc
	phy: qcom-snps-femto-v2: keep cfg_ahb_clk enabled during runtime suspend
	phy: qcom-snps-femto-v2: properly enable ref clock
	soundwire: qcom: update status correctly with mask
	media: staging: atomisp: select V4L2_FWNODE
	media: amphion: Fix firmware path to match linux-firmware
	i40e: Fix an NULL vs IS_ERR() bug for debugfs_create_dir()
	iavf: fix potential deadlock on allocation failure
	iavf: check for removal state before IAVF_FLAG_PF_COMMS_FAILED
	net: phy: marvell10g: fix 88x3310 power up
	net: hns3: fix the imp capability bit cannot exceed 32 bits issue
	net: hns3: fix wrong tc bandwidth weight data issue
	net: hns3: fix wrong bw weight of disabled tc issue
	vxlan: calculate correct header length for GPE
	vxlan: generalize vxlan_parse_gpe_hdr and remove unused args
	vxlan: fix GRO with VXLAN-GPE
	phy: hisilicon: Fix an out of bounds check in hisi_inno_phy_probe()
	atheros: fix return value check in atl1_tso()
	ethernet: atheros: fix return value check in atl1e_tso_csum()
	ipv6 addrconf: fix bug where deleting a mngtmpaddr can create a new temporary address
	tcp: Reduce chance of collisions in inet6_hashfn().
	ice: Fix memory management in ice_ethtool_fdir.c
	bonding: reset bond's flags when down link is P2P device
	team: reset team's flags when down link is P2P device
	octeontx2-af: Removed unnecessary debug messages.
	octeontx2-af: Fix hash extraction enable configuration
	net: stmmac: Apply redundant write work around on 4.xx too
	platform/x86: msi-laptop: Fix rfkill out-of-sync on MSI Wind U100
	x86/traps: Fix load_unaligned_zeropad() handling for shared TDX memory
	igc: Fix Kernel Panic during ndo_tx_timeout callback
	netfilter: nft_set_rbtree: fix overlap expiration walk
	netfilter: nf_tables: skip immediate deactivate in _PREPARE_ERROR
	netfilter: nf_tables: disallow rule addition to bound chain via NFTA_RULE_CHAIN_ID
	mm: suppress mm fault logging if fatal signal already pending
	net/sched: mqprio: refactor nlattr parsing to a separate function
	net/sched: mqprio: add extack to mqprio_parse_nlattr()
	net/sched: mqprio: Add length check for TCA_MQPRIO_{MAX/MIN}_RATE64
	benet: fix return value check in be_lancer_xmit_workarounds()
	tipc: check return value of pskb_trim()
	tipc: stop tipc crypto on failure in tipc_node_create
	RDMA/mlx4: Make check for invalid flags stricter
	drm/msm/dpu: drop enum dpu_core_perf_data_bus_id
	drm/msm/adreno: Fix snapshot BINDLESS_DATA size
	RDMA/irdma: Add missing read barriers
	RDMA/irdma: Fix data race on CQP completion stats
	RDMA/irdma: Fix data race on CQP request done
	RDMA/mthca: Fix crash when polling CQ for shared QPs
	RDMA/bnxt_re: Prevent handling any completions after qp destroy
	drm/msm: Fix IS_ERR_OR_NULL() vs NULL check in a5xx_submit_in_rb()
	cxl/acpi: Fix a use-after-free in cxl_parse_cfmws()
	cxl/acpi: Return 'rc' instead of '0' in cxl_parse_cfmws()
	ASoC: fsl_spdif: Silence output on stop
	block: Fix a source code comment in include/uapi/linux/blkzoned.h
	smb3: do not set NTLMSSP_VERSION flag for negotiate not auth request
	drm/i915: Fix an error handling path in igt_write_huge()
	xenbus: check xen_domain in xenbus_probe_initcall
	dm raid: fix missing reconfig_mutex unlock in raid_ctr() error paths
	dm raid: clean up four equivalent goto tags in raid_ctr()
	dm raid: protect md_stop() with 'reconfig_mutex'
	drm/amd: Fix an error handling mistake in psp_sw_init()
	drm/amd/display: Unlock on error path in dm_handle_mst_sideband_msg_ready_event()
	RDMA/irdma: Fix op_type reporting in CQEs
	RDMA/irdma: Report correct WC error
	drm/msm: Switch idr_lock to spinlock
	drm/msm: Disallow submit with fence id 0
	ublk_drv: move ublk_get_device_from_id into ublk_ctrl_uring_cmd
	ublk: fail to start device if queue setup is interrupted
	ublk: fail to recover device if queue setup is interrupted
	ata: pata_ns87415: mark ns87560_tf_read static
	ring-buffer: Fix wrong stat of cpu_buffer->read
	tracing: Fix warning in trace_buffered_event_disable()
	Revert "usb: gadget: tegra-xudc: Fix error check in tegra_xudc_powerdomain_init()"
	usb: gadget: call usb_gadget_check_config() to verify UDC capability
	USB: gadget: Fix the memory leak in raw_gadget driver
	usb: gadget: core: remove unbalanced mutex_unlock in usb_gadget_activate
	KVM: Grab a reference to KVM for VM and vCPU stats file descriptors
	KVM: VMX: Don't fudge CR0 and CR4 for restricted L2 guest
	KVM: x86: Disallow KVM_SET_SREGS{2} if incoming CR0 is invalid
	serial: qcom-geni: drop bogus runtime pm state update
	serial: 8250_dw: Preserve original value of DLF register
	serial: sifive: Fix sifive_serial_console_setup() section
	USB: serial: option: support Quectel EM060K_128
	USB: serial: option: add Quectel EC200A module support
	USB: serial: simple: add Kaufmann RKS+CAN VCP
	USB: serial: simple: sort driver entries
	can: gs_usb: gs_can_close(): add missing set of CAN state to CAN_STATE_STOPPED
	usb: typec: Set port->pd before adding device for typec_port
	usb: typec: Iterate pds array when showing the pd list
	usb: typec: Use sysfs_emit_at when concatenating the string
	Revert "usb: dwc3: core: Enable AutoRetry feature in the controller"
	usb: dwc3: pci: skip BYT GPIO lookup table for hardwired phy
	usb: dwc3: don't reset device side if dwc3 was configured as host-only
	usb: misc: ehset: fix wrong if condition
	usb: ohci-at91: Fix the unhandle interrupt when resume
	USB: quirks: add quirk for Focusrite Scarlett
	usb: cdns3: fix incorrect calculation of ep_buf_size when more than one config
	usb: xhci-mtk: set the dma max_seg_size
	Revert "usb: xhci: tegra: Fix error check"
	Documentation: security-bugs.rst: update preferences when dealing with the linux-distros group
	Documentation: security-bugs.rst: clarify CVE handling
	staging: r8712: Fix memory leak in _r8712_init_xmit_priv()
	staging: ks7010: potential buffer overflow in ks_wlan_set_encode_ext()
	tty: n_gsm: fix UAF in gsm_cleanup_mux
	Revert "xhci: add quirk for host controllers that don't update endpoint DCS"
	ALSA: hda/realtek: Support ASUS G713PV laptop
	ALSA: hda/relatek: Enable Mute LED on HP 250 G8
	hwmon: (k10temp) Enable AMD3255 Proc to show negative temperature
	hwmon: (nct7802) Fix for temp6 (PECI1) processed even if PECI1 disabled
	btrfs: account block group tree when calculating global reserve size
	btrfs: check if the transaction was aborted at btrfs_wait_for_commit()
	btrfs: check for commit error at btrfs_attach_transaction_barrier()
	x86/MCE/AMD: Decrement threshold_bank refcount when removing threshold blocks
	file: always lock position for FMODE_ATOMIC_POS
	nfsd: Remove incorrect check in nfsd4_validate_stateid
	ACPI/IORT: Remove erroneous id_count check in iort_node_get_rmr_info()
	tpm_tis: Explicitly check for error code
	irq-bcm6345-l1: Do not assume a fixed block to cpu mapping
	irqchip/gic-v4.1: Properly lock VPEs when doing a directLPI invalidation
	locking/rtmutex: Fix task->pi_waiters integrity
	proc/vmcore: fix signedness bug in read_from_oldmem()
	xen: speed up grant-table reclaim
	virtio-net: fix race between set queues and probe
	net: dsa: qca8k: fix search_and_insert wrong handling of new rule
	net: dsa: qca8k: fix broken search_and_del
	net: dsa: qca8k: fix mdb add/del case with 0 VID
	selftests: mptcp: join: only check for ip6tables if needed
	soundwire: fix enumeration completion
	Revert "um: Use swap() to make code cleaner"
	LoongArch: BPF: Fix check condition to call lu32id in move_imm()
	LoongArch: BPF: Enable bpf_probe_read{, str}() on LoongArch
	s390/dasd: fix hanging device after quiesce/resume
	s390/dasd: print copy pair message only for the correct error
	ASoC: wm8904: Fill the cache for WM8904_ADC_TEST_0 register
	arm64/sme: Set new vector length before reallocating
	PM: sleep: wakeirq: fix wake irq arming
	ceph: never send metrics if disable_send_metrics is set
	drm/i915/dpt: Use shmem for dpt objects
	dm cache policy smq: ensure IO doesn't prevent cleaner policy progress
	rbd: make get_lock_owner_info() return a single locker or NULL
	rbd: harden get_lock_owner_info() a bit
	rbd: retrieve and check lock owner twice before blocklisting
	drm/amd/display: set per pipe dppclk to 0 when dpp is off
	tracing: Fix trace_event_raw_event_synth() if else statement
	drm/amd/display: perform a bounds check before filling dirty rectangles
	drm/amd/display: Write to correct dirty_rect
	ACPI: processor: perflib: Use the "no limit" frequency QoS
	ACPI: processor: perflib: Avoid updating frequency QoS unnecessarily
	cpufreq: intel_pstate: Drop ACPI _PSS states table patching
	mptcp: ensure subflow is unhashed before cleaning the backlog
	selftests: mptcp: sockopt: use 'iptables-legacy' if available
	test_firmware: return ENOMEM instead of ENOSPC on failed memory allocation
	dma-buf: keep the signaling time of merged fences v3
	dma-buf: fix an error pointer vs NULL bug
	Linux 6.1.43

Change-Id: Id1d61f2351c51edad33ab654f1f3d911b9a75830
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-09-12 12:36:39 +00:00
Omkar Sai Sandeep Katadi
e401e169ba Merge remote-tracking branch into HEAD
* keystone/mirror-android14-6.1-2023-08: (162 commits)
  ANDROID: uid_sys_stats: Use llist for deferred work
  UPSTREAM: usb: typec: ucsi: Fix command cancellation
  ANDROID: GKI: update symbol list file for xiaomi
  UPSTREAM: erofs: avoid infinite loop in z_erofs_do_read_page() when reading beyond EOF
  UPSTREAM: erofs: avoid useless loops in z_erofs_pcluster_readmore() when reading beyond EOF
  UPSTREAM: erofs: Fix detection of atomic context
  UPSTREAM: erofs: fix compact 4B support for 16k block size
  UPSTREAM: erofs: kill hooked chains to avoid loops on deduplicated compressed images
  UPSTREAM: erofs: fix potential overflow calculating xattr_isize
  UPSTREAM: erofs: stop parsing non-compact HEAD index if clusterofs is invalid
  UPSTREAM: erofs: initialize packed inode after root inode is assigned
  ANDROID: GKI: Update ABI for zsmalloc fixes
  BACKPORT: zsmalloc: fix races between modifications of fullness and isolated
  UPSTREAM: zsmalloc: consolidate zs_pool's migrate_lock and size_class's locks
  BACKPORT: FROMGIT: mm: handle faults that merely update the accessed bit under the VMA lock
  FROMLIST: mm: Allow fault_dirty_shared_page() to be called under the VMA lock
  FROMGIT: mm: handle swap and NUMA PTE faults under the VMA lock
  FROMGIT: mm: run the fault-around code under the VMA lock
  FROMGIT: mm: move FAULT_FLAG_VMA_LOCK check down from do_fault()
  FROMGIT: mm: move FAULT_FLAG_VMA_LOCK check down in handle_pte_fault()
  ...

Change-Id: Ic33be5a9dae71958c187029751cb599a83110ab9
2023-09-11 22:08:38 +00:00
Rick Yiu
2490ab50e7 ANDROID: sched: Add vendor hook for rt util update
Vendor may have need to track rt util.

Bug: 201261299
Signed-off-by: Rick Yiu <rickyiu@google.com>
Change-Id: I2f4e5142c6bc8574ee3558042e1fb0dae13b702d
2023-09-08 03:14:00 +00:00
Rick Yiu
6d97f75abc ANDROID: sched: Add vendor hook for util-update related functions
Vendor may have the need to implement their own util tracking.

Bug: 297343949
Signed-off-by: Rick Yiu <rickyiu@google.com>
Change-Id: I973902e6ff82a85ecd029ac5a78692d629df1ebe
2023-09-08 03:14:00 +00:00
Wei Wang
e08c5de06e ANDROID: sched: Add vendor hooks for override sugov behavior
Upstream moved the sugov to DEADLINE class which has higher prio than RT
so it can potentially block many RT use case in Android.

Also currently iowait doesn't distinguish background/foreground tasks
and we have seen cases where device run to high frequency unnecessarily
when running some background I/O.

Bug: 297343949
Signed-off-by: Wei Wang <wvw@google.com>
Change-Id: I21e9bfe9ef75a4178279574389e417c3f38e65ac
2023-09-08 03:14:00 +00:00
Qais Yousef
5762974151 ANDROID: Add new hook to enable overriding uclamp_validate()
We want to add more special values, specifically for uclamp_max so that
it can be set automatically to the most efficient value based on the
core it's running on.

Bug: 297343949
Signed-off-by: Qais Yousef <qyousef@google.com>
Change-Id: I57343c4544f6cac621c855cbb94de0b8d80c51fa
2023-09-08 03:14:00 +00:00
Qais Yousef
b57e3c1d99 ANDROID: sched/uclamp: Don't enable uclamp_is_used static key by in-kernel requests
We do have now in-kernel users of uclamp to implement inheritance. The
static_branch_enable() path unconditionally holds the cpus_read_lock()
which might_sleep(). The path in binder that implements inheritance
happens from in_atomic() context which leads to a splat like this one:

	[  147.529960] BUG: sleeping function called from invalid context at include/linux/percpu-rwsem.h:56
	[  147.530196] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2586, name: RenderThread
	[  147.530410] INFO: lockdep is turned off.
	[  147.530518] Preemption disabled at:
	[  147.530521] [<ffffffc008ca2cec>] binder_proc_transaction+0x78/0x41c
	[  147.530793] CPU: 8 PID: 2586 Comm: RenderThread Tainted: G S      W  O      5.15.76-android14-5-00086-gc01afe5d262f #1
	[  147.531214] Call trace:
	[  147.531288] dump_backtrace+0xe8/0x134
	[  147.531444] show_stack+0x1c/0x4c
	[  147.531598] dump_stack_lvl+0x74/0x94
	[  147.531766] dump_stack+0x14/0x3c
	[  147.531920] ___might_sleep+0x210/0x230
	[  147.532094] __might_sleep+0x54/0x84
	[  147.532259] cpus_read_lock+0x2c/0x160
	[  147.532429] static_key_enable+0x1c/0x34
	[  147.532608] __sched_setscheduler+0x2a8/0x99c
	[  147.532802] sched_setattr_nocheck+0x1c/0x24
	[  147.532994] binder_do_set_priority+0x31c/0x4a4
	[  147.533195] binder_transaction_priority+0x200/0x3f4
	[  147.533413] binder_proc_transaction+0x220/0x41c
	[  147.533618] binder_transaction+0x1df0/0x234c
	[  147.533812] binder_thread_write+0xd84/0x2398
	[  147.534007] binder_ioctl_write_read+0x19c/0xb28
	[  147.534212] binder_ioctl+0x344/0x1a3c
	[  147.534382] __arm64_sys_ioctl+0x94/0xc8
	[  147.534561] invoke_syscall+0x44/0xf8
	[  147.534729] el0_svc_common+0xc8/0x10c
	[  147.534900] do_el0_svc+0x20/0x28
	[  147.535053] el0_svc+0x58/0xe0
	[  147.535198] el0t_64_sync_handler+0x7c/0xe4
	[  147.535386] el0t_64_sync+0x188/0x18c

Prevent enabling the lock for !user initiated sched_setattr()
operations. Generally we don't expect in-kernel uclamp users.

Bug: 259145692
Signed-off-by: Qais Yousef <qyousef@google.com>
Change-Id: Iac5be139b5ffd39f5e1c0431ce253133d81b98cf
2023-09-08 03:14:00 +00:00
Rick Yiu
eb9686932b ANDROID: sched: Export symbols needed for vendor hooks
Bug: 297343949
Change-Id: I0cb65e85b36687bfaae6a185ca373d7fb8de0a77
Signed-off-by: Rick Yiu <rickyiu@google.com>
2023-09-08 03:14:00 +00:00
Shaleen Agrawal
d05fcdbc3e sched/walt: Remove references to EXPORT_SYMBOL
In accordance with guidelines, remove any references to EXPORT_SYMBOL
and use EXPORT_SYMBOL_GPL instead.

Change-Id: Ie37d5d5da24e0a5daf0569054622399723fe98f8
Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com>
2023-09-07 12:01:00 -07:00
Greg Kroah-Hartman
469cf75fcc Revert "sched/psi: Fix avgs_work re-arm in psi_avgs_work()"
This reverts commit 7d8bba4da1 which is
commit 2fcd7bbae90a6d844da8660a9d27079281dfbba2 upstream.

It is part of a patch series that breaks the Android API.  If this
series is needed in Android devices in the future, it can come back in
an ABI-safe manner.

Bug: 161946584
Change-Id: I6737a0ee4a39082058be5a63caee994af70a262e
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-09-07 10:52:29 +00:00
Greg Kroah-Hartman
d18fe3efda Revert "sched/psi: Rearrange polling code in preparation"
This reverts commit c176dda0a6 which is
commit 7fab21fa0d000a0ea32d73ce8eec68557c6c268b upstream.

It is part of a patch series that breaks the Android API.  If this
series is needed in Android devices in the future, it can come back in
an ABI-safe manner.

Bug: 161946584
Change-Id: I086b52b9a332fac76c3b96c71d7bba9eeeabb9e8
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-09-07 10:52:11 +00:00
Greg Kroah-Hartman
5b039dbb91 Revert "sched/psi: Rename existing poll members in preparation"
This reverts commit c1623d4d0b which is
commit 65457b74aa9437418e552e8d52d7112d4f9901a6 upstream.

It is part of a patch series that breaks the Android API.  If this
series is needed in Android devices in the future, it can come back in
an ABI-safe manner.

Bug: 161946584
Change-Id: I0e49fbb45ba3881e2a6a7521cc1f44b859772ad5
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-09-07 10:51:53 +00:00
Greg Kroah-Hartman
ed063a7e76 Revert "sched/psi: Extract update_triggers side effect"
This reverts commit fb4bc32fc1 which is
commit 4468fcae49f08e88fbbffe05b29496192df89991 upstream.

It is part of a patch series that breaks the Android API.  If this
series is needed in Android devices in the future, it can come back in
an ABI-safe manner.

Bug: 161946584
Change-Id: I6869017fade6bcd138d5132b80da688012436887
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-09-07 10:51:34 +00:00
Greg Kroah-Hartman
2c1e89916b Revert "sched/psi: Allow unprivileged polling of N*2s period"
This reverts commit d5dca19776 which is
commit d82caa273565b45fcf103148950549af76c314b0 upstream.

It is part of a patch series that breaks the Android API.  If this
series is needed in Android devices in the future, it can come back in
an ABI-safe manner.

Bug: 161946584
Change-Id: Ibe2e8b99fff1782f9bfc2d633ac748741f119256
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-09-07 10:51:18 +00:00
Greg Kroah-Hartman
ffed79e366 Revert "sched/psi: use kernfs polling functions for PSI trigger polling"
This reverts commit 92cc015332 which is
commit aff037078ecaecf34a7c2afab1341815f90fba5e upstream.

It is part of a patch series that breaks the Android API.  If this
series is needed in Android devices in the future, it can come back in
an ABI-safe manner.

Bug: 161946584
Change-Id: I2c941ec44d5261a914d95591b7918e46f7db825a
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-09-07 10:50:58 +00:00
Christoph Hellwig
5d0fe30be4 modules: only allow symbol_get of EXPORT_SYMBOL_GPL modules
commit 9011e49d54dcc7653ebb8a1e05b5badb5ecfa9f9 upstream.

It has recently come to my attention that nvidia is circumventing the
protection added in 262e6ae708 ("modules: inherit
TAINT_PROPRIETARY_MODULE") by importing exports from their proprietary
modules into an allegedly GPL licensed module and then rexporting them.

Given that symbol_get was only ever intended for tightly cooperating
modules using very internal symbols it is logical to restrict it to
being used on EXPORT_SYMBOL_GPL and prevent nvidia from costly DMCA
Circumvention of Access Controls law suites.

All symbols except for four used through symbol_get were already exported
as EXPORT_SYMBOL_GPL, and the remaining four ones were switched over in
the preparation patches.

Fixes: 262e6ae708 ("modules: inherit TAINT_PROPRIETARY_MODULE")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-06 21:27:00 +01:00
Greg Kroah-Hartman
8976ff249f Merge 6.1.42 into android14-6.1-lts
Changes in 6.1.42
	io_uring: treat -EAGAIN for REQ_F_NOWAIT as final for io-wq
	ALSA: hda/realtek - remove 3k pull low procedure
	ALSA: hda/realtek: Add quirk for Clevo NS70AU
	ALSA: hda/realtek: Enable Mute LED on HP Laptop 15s-eq2xxx
	maple_tree: set the node limit when creating a new root node
	maple_tree: fix node allocation testing on 32 bit
	keys: Fix linking a duplicate key to a keyring's assoc_array
	perf probe: Add test for regression introduced by switch to die_get_decl_file()
	btrfs: fix warning when putting transaction with qgroups enabled after abort
	fuse: revalidate: don't invalidate if interrupted
	fuse: Apply flags2 only when userspace set the FUSE_INIT_EXT
	btrfs: set_page_extent_mapped after read_folio in btrfs_cont_expand
	btrfs: zoned: fix memory leak after finding block group with super blocks
	fuse: ioctl: translate ENOSYS in outarg
	btrfs: fix race between balance and cancel/pause
	selftests: tc: set timeout to 15 minutes
	selftests: tc: add 'ct' action kconfig dep
	regmap: Drop initial version of maximum transfer length fixes
	of: Preserve "of-display" device name for compatibility
	regmap: Account for register length in SMBus I/O limits
	arm64/fpsimd: Ensure SME storage is allocated after SVE VL changes
	can: mcp251xfd: __mcp251xfd_chip_set_mode(): increase poll timeout
	can: bcm: Fix UAF in bcm_proc_show()
	can: gs_usb: gs_can_open(): improve error handling
	selftests: tc: add ConnTrack procfs kconfig
	dma-buf/dma-resv: Stop leaking on krealloc() failure
	drm/amdgpu/vkms: relax timer deactivation by hrtimer_try_to_cancel
	drm/amdgpu/pm: make gfxclock consistent for sienna cichlid
	drm/amdgpu/pm: make mclk consistent for smu 13.0.7
	drm/client: Fix memory leak in drm_client_target_cloned
	drm/client: Fix memory leak in drm_client_modeset_probe
	drm/amd/display: only accept async flips for fast updates
	drm/amd/display: Disable MPC split by default on special asic
	drm/amd/display: check TG is non-null before checking if enabled
	drm/amd/display: Keep PHY active for DP displays on DCN31
	ASoC: fsl_sai: Disable bit clock with transmitter
	ASoC: fsl_sai: Revert "ASoC: fsl_sai: Enable MCTL_MCLK_EN bit for master mode"
	ASoC: tegra: Fix ADX byte map
	ASoC: rt5640: Fix sleep in atomic context
	ASoC: cs42l51: fix driver to properly autoload with automatic module loading
	ASoC: codecs: wcd938x: fix missing clsh ctrl error handling
	ASoC: codecs: wcd-mbhc-v2: fix resource leaks on component remove
	ASoC: qdsp6: audioreach: fix topology probe deferral
	ASoC: tegra: Fix AMX byte map
	ASoC: codecs: wcd938x: fix resource leaks on component remove
	ASoC: codecs: wcd938x: fix missing mbhc init error handling
	ASoC: codecs: wcd934x: fix resource leaks on component remove
	ASoC: codecs: wcd938x: fix codec initialisation race
	ASoC: codecs: wcd938x: fix soundwire initialisation race
	ext4: correct inline offset when handling xattrs in inode body
	drm/radeon: Fix integer overflow in radeon_cs_parser_init
	ALSA: emu10k1: roll up loops in DSP setup code for Audigy
	quota: Properly disable quotas when add_dquot_ref() fails
	quota: fix warning in dqgrab()
	HID: add quirk for 03f0:464a HP Elite Presenter Mouse
	ovl: check type and offset of struct vfsmount in ovl_entry
	udf: Fix uninitialized array access for some pathnames
	fs: jfs: Fix UBSAN: array-index-out-of-bounds in dbAllocDmapLev
	MIPS: dec: prom: Address -Warray-bounds warning
	FS: JFS: Fix null-ptr-deref Read in txBegin
	FS: JFS: Check for read-only mounted filesystem in txBegin
	ACPI: video: Add backlight=native DMI quirk for Dell Studio 1569
	rcu-tasks: Avoid pr_info() with spin lock in cblist_init_generic()
	rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp
	sched/fair: Don't balance task to its current running CPU
	wifi: ath11k: fix registration of 6Ghz-only phy without the full channel range
	bpf: Print a warning only if writing to unprivileged_bpf_disabled.
	bpf: Address KCSAN report on bpf_lru_list
	bpf: tcp: Avoid taking fast sock lock in iterator
	wifi: ath11k: add support default regdb while searching board-2.bin for WCN6855
	wifi: mac80211_hwsim: Fix possible NULL dereference
	spi: dw: Add compatible for Intel Mount Evans SoC
	wifi: ath11k: fix memory leak in WMI firmware stats
	net: ethernet: litex: add support for 64 bit stats
	devlink: report devlink_port_type_warn source device
	wifi: wext-core: Fix -Wstringop-overflow warning in ioctl_standard_iw_point()
	wifi: iwlwifi: Add support for new PCI Id
	wifi: iwlwifi: mvm: avoid baid size integer overflow
	wifi: iwlwifi: pcie: add device id 51F1 for killer 1675
	igb: Fix igb_down hung on surprise removal
	net: hns3: fix strncpy() not using dest-buf length as length issue
	ASoC: amd: acp: fix for invalid dai id handling in acp_get_byte_count()
	ASoC: codecs: wcd938x: fix mbhc impedance loglevel
	ASoC: codecs: wcd938x: fix dB range for HPHL and HPHR
	ASoC: qcom: q6apm: do not close GPR port before closing graph
	sched/fair: Use recent_used_cpu to test p->cpus_ptr
	sched/psi: Fix avgs_work re-arm in psi_avgs_work()
	sched/psi: Rearrange polling code in preparation
	sched/psi: Rename existing poll members in preparation
	sched/psi: Extract update_triggers side effect
	sched/psi: Allow unprivileged polling of N*2s period
	sched/psi: use kernfs polling functions for PSI trigger polling
	pinctrl: renesas: rzv2m: Handle non-unique subnode names
	pinctrl: renesas: rzg2l: Handle non-unique subnode names
	spi: bcm63xx: fix max prepend length
	fbdev: imxfb: warn about invalid left/right margin
	fbdev: imxfb: Removed unneeded release_mem_region
	perf build: Fix library not found error when using CSLIBS
	btrfs: be a bit more careful when setting mirror_num_ret in btrfs_map_block
	spi: s3c64xx: clear loopback bit after loopback test
	kallsyms: Improve the performance of kallsyms_lookup_name()
	kallsyms: Correctly sequence symbols when CONFIG_LTO_CLANG=y
	kallsyms: strip LTO-only suffixes from promoted global functions
	dsa: mv88e6xxx: Do a final check before timing out
	net: ethernet: ti: cpsw_ale: Fix cpsw_ale_get_field()/cpsw_ale_set_field()
	bridge: Add extack warning when enabling STP in netns.
	net: ethernet: mtk_eth_soc: handle probe deferral
	cifs: fix mid leak during reconnection after timeout threshold
	ASoC: SOF: ipc3-dtrace: uninitialized data in dfsentry_trace_filter_write()
	net: sched: cls_matchall: Undo tcf_bind_filter in case of failure after mall_set_parms
	net: sched: cls_u32: Undo tcf_bind_filter if u32_replace_hw_knode
	net: sched: cls_u32: Undo refcount decrement in case update failed
	net: sched: cls_bpf: Undo tcf_bind_filter in case of an error
	net: dsa: microchip: ksz8: Separate static MAC table operations for code reuse
	net: dsa: microchip: ksz8: Make ksz8_r_sta_mac_table() static
	net: dsa: microchip: ksz8_r_sta_mac_table(): Avoid using error code for empty entries
	net: dsa: microchip: correct KSZ8795 static MAC table access
	iavf: Fix use-after-free in free_netdev
	iavf: Fix out-of-bounds when setting channels on remove
	iavf: use internal state to free traffic IRQs
	iavf: Move netdev_update_features() into watchdog task
	iavf: send VLAN offloading caps once after VFR
	iavf: make functions static where possible
	iavf: Wait for reset in callbacks which trigger it
	iavf: fix a deadlock caused by rtnl and driver's lock circular dependencies
	iavf: fix reset task race with iavf_remove()
	security: keys: Modify mismatched function name
	octeontx2-pf: Dont allocate BPIDs for LBK interfaces
	bpf: Fix subprog idx logic in check_max_stack_depth
	bpf: Repeat check_max_stack_depth for async callbacks
	bpf, arm64: Fix BTI type used for freplace attached functions
	igc: Avoid transmit queue timeout for XDP
	igc: Prevent garbled TX queue with XDP ZEROCOPY
	net: ipv4: use consistent txhash in TIME_WAIT and SYN_RECV
	tcp: annotate data-races around tcp_rsk(req)->txhash
	tcp: annotate data-races around tcp_rsk(req)->ts_recent
	net: ipv4: Use kfree_sensitive instead of kfree
	net:ipv6: check return value of pskb_trim()
	Revert "tcp: avoid the lookup process failing to get sk in ehash table"
	fbdev: au1200fb: Fix missing IRQ check in au1200fb_drv_probe
	llc: Don't drop packet from non-root netns.
	ALSA: hda/realtek: Fix generic fixup definition for cs35l41 amp
	netfilter: nf_tables: fix spurious set element insertion failure
	netfilter: nf_tables: can't schedule in nft_chain_validate
	netfilter: nft_set_pipapo: fix improper element removal
	netfilter: nf_tables: skip bound chain in netns release path
	netfilter: nf_tables: skip bound chain on rule flush
	Bluetooth: use RCU for hci_conn_params and iterate safely in hci_sync
	Bluetooth: hci_event: call disconnect callback before deleting conn
	Bluetooth: ISO: fix iso_conn related locking and validity issues
	Bluetooth: hci_sync: Avoid use-after-free in dbg for hci_remove_adv_monitor()
	tcp: annotate data-races around tp->tcp_tx_delay
	tcp: annotate data-races around tp->tsoffset
	tcp: annotate data-races around tp->keepalive_time
	tcp: annotate data-races around tp->keepalive_intvl
	tcp: annotate data-races around tp->keepalive_probes
	tcp: annotate data-races around icsk->icsk_syn_retries
	tcp: annotate data-races around tp->linger2
	tcp: annotate data-races around rskq_defer_accept
	tcp: annotate data-races around tp->notsent_lowat
	tcp: annotate data-races around icsk->icsk_user_timeout
	tcp: annotate data-races around fastopenq.max_qlen
	net: phy: prevent stale pointer dereference in phy_init()
	jbd2: recheck chechpointing non-dirty buffer
	tracing/histograms: Return an error if we fail to add histogram to hist_vars list
	drm/ttm: fix bulk_move corruption when adding a entry
	spi: dw: Remove misleading comment for Mount Evans SoC
	kallsyms: add kallsyms_seqs_of_names to list of special symbols
	scripts/kallsyms.c Make the comment up-to-date with current implementation
	scripts/kallsyms: update the usage in the comment block
	bpf: allow precision tracking for programs with subprogs
	bpf: stop setting precise in current state
	bpf: aggressively forget precise markings during state checkpointing
	selftests/bpf: make test_align selftest more robust
	selftests/bpf: Workaround verification failure for fexit_bpf2bpf/func_replace_return_code
	selftests/bpf: Fix sk_assign on s390x
	drm/amd/display: use max_dsc_bpp in amdgpu_dm
	drm/amd/display: fix some coding style issues
	drm/dp_mst: Clear MSG_RDY flag before sending new message
	drm/amd/display: force connector state when bpc changes during compliance
	drm/amd/display: Clean up errors & warnings in amdgpu_dm.c
	drm/amd/display: fix linux dp link lost handled only one time
	drm/amd/display: Add polling method to handle MST reply packet
	Revert "drm/amd/display: edp do not add non-edid timings"
	Linux 6.1.42

Change-Id: I6b7257a16f9a025d0c23dfd3eb43317c1c164a93
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-09-06 10:23:12 +00:00
liuxudong5
176d72d941 ANDROID: vendor_hooks: export cgroup_threadgroup_rwsem
When the task wakes up from percpu_rwsem_wait, it will enter a long
runnable state, which will cause frame loss when the application
starts. In order to solve this problem, we need to let the process
enter the "vip" queue when it is woken up, so we need to set a flag
for the process holding the lock to prove that it is about to hold
the lock. Most of this long runnable state occurs in the
cgroup_threadgroup_rwsem, so we only care cgroup_threadgroup_rwsem,
and cgroup_threadgroup_rwsem should be exported. Finally, if the
semaphore is of cgroup_threadgroup_rwsem type and has a flag,
then let it join the "vip" queue.

Bug: 297785167
Signed-off-by: liuxudong <liuxudong5@xiaomi.com>
Change-Id: I2297dfbc2f2681581241f85a3b4fd59415ea67db
2023-09-05 21:16:05 +00:00
Greg Kroah-Hartman
f1311733c2 Merge 6.1.40 into android14-6.1-lts
Changes in 6.1.40
	HID: amd_sfh: Rename the float32 variable
	HID: amd_sfh: Fix for shift-out-of-bounds
	net: lan743x: Don't sleep in atomic context
	workqueue: clean up WORK_* constant types, clarify masking
	ksmbd: add missing compound request handing in some commands
	ksmbd: fix out of bounds read in smb2_sess_setup
	drm/panel: simple: Add connector_type for innolux_at043tn24
	drm/bridge: ti-sn65dsi86: Fix auxiliary bus lifetime
	swiotlb: always set the number of areas before allocating the pool
	swiotlb: reduce the swiotlb buffer size on allocation failure
	swiotlb: reduce the number of areas to match actual memory pool size
	drm/panel: simple: Add Powertip PH800480T013 drm_display_mode flags
	ice: Fix max_rate check while configuring TX rate limits
	igc: Remove delay during TX ring configuration
	net/mlx5e: fix double free in mlx5e_destroy_flow_table
	net/mlx5e: fix memory leak in mlx5e_fs_tt_redirect_any_create
	net/mlx5e: fix memory leak in mlx5e_ptp_open
	net/mlx5e: Check for NOT_READY flag state after locking
	igc: set TP bit in 'supported' and 'advertising' fields of ethtool_link_ksettings
	igc: Handle PPS start time programming for past time values
	blk-crypto: use dynamic lock class for blk_crypto_profile::lock
	scsi: qla2xxx: Fix error code in qla2x00_start_sp()
	scsi: ufs: ufs-mediatek: Add dependency for RESET_CONTROLLER
	bpf: Fix max stack depth check for async callbacks
	net: mvneta: fix txq_map in case of txq_number==1
	net/sched: cls_fw: Fix improper refcount update leads to use-after-free
	gve: Set default duplex configuration to full
	octeontx2-af: Promisc enable/disable through mbox
	octeontx2-af: Move validation of ptp pointer before its usage
	ionic: remove WARN_ON to prevent panic_on_warn
	net: bgmac: postpone turning IRQs off to avoid SoC hangs
	net: prevent skb corruption on frag list segmentation
	icmp6: Fix null-ptr-deref of ip6_null_entry->rt6i_idev in icmp6_dev().
	udp6: fix udp6_ehashfn() typo
	ntb: idt: Fix error handling in idt_pci_driver_init()
	NTB: amd: Fix error handling in amd_ntb_pci_driver_init()
	ntb: intel: Fix error handling in intel_ntb_pci_driver_init()
	NTB: ntb_transport: fix possible memory leak while device_register() fails
	NTB: ntb_tool: Add check for devm_kcalloc
	ipv6/addrconf: fix a potential refcount underflow for idev
	net: dsa: qca8k: Add check for skb_copy
	platform/x86: wmi: Break possible infinite loop when parsing GUID
	kernel/trace: Fix cleanup logic of enable_trace_eprobe
	igc: Fix launchtime before start of cycle
	igc: Fix inserting of empty frame for launchtime
	nvme: fix the NVME_ID_NS_NVM_STS_MASK definition
	riscv, bpf: Fix inconsistent JIT image generation
	drm/i915: Don't preserve dpll_hw_state for slave crtc in Bigjoiner
	drm/i915: Fix one wrong caching mode enum usage
	octeontx2-pf: Add additional check for MCAM rules
	erofs: avoid useless loops in z_erofs_pcluster_readmore() when reading beyond EOF
	erofs: avoid infinite loop in z_erofs_do_read_page() when reading beyond EOF
	erofs: fix fsdax unavailability for chunk-based regular files
	wifi: airo: avoid uninitialized warning in airo_get_rate()
	bpf: cpumap: Fix memory leak in cpu_map_update_elem
	net/sched: flower: Ensure both minimum and maximum ports are specified
	riscv: mm: fix truncation warning on RV32
	netdevsim: fix uninitialized data in nsim_dev_trap_fa_cookie_write()
	net/sched: make psched_mtu() RTNL-less safe
	wifi: rtw89: debug: fix error code in rtw89_debug_priv_send_h2c_set()
	net/sched: sch_qfq: refactor parsing of netlink parameters
	net/sched: sch_qfq: account for stab overhead in qfq_enqueue
	nvme-pci: fix DMA direction of unmapping integrity data
	fs/ntfs3: Check fields while reading
	ovl: let helper ovl_i_path_real() return the realinode
	ovl: fix null pointer dereference in ovl_get_acl_rcu()
	cifs: fix session state check in smb2_find_smb_ses
	drm/client: Send hotplug event after registering a client
	drm/amdgpu/sdma4: set align mask to 255
	drm/amd/pm: revise the ASPM settings for thunderbolt attached scenario
	drm/amdgpu: add the fan abnormal detection feature
	drm/amdgpu: Fix minmax warning
	drm/amd/pm: add abnormal fan detection for smu 13.0.0
	f2fs: fix the wrong condition to determine atomic context
	f2fs: fix deadlock in i_xattr_sem and inode page lock
	pinctrl: amd: Add Z-state wake control bits
	pinctrl: amd: Adjust debugfs output
	pinctrl: amd: Add fields for interrupt status and wake status
	pinctrl: amd: Detect internal GPIO0 debounce handling
	pinctrl: amd: Fix mistake in handling clearing pins at startup
	pinctrl: amd: Detect and mask spurious interrupts
	pinctrl: amd: Revert "pinctrl: amd: disable and mask interrupts on probe"
	pinctrl: amd: Only use special debounce behavior for GPIO 0
	pinctrl: amd: Use amd_pinconf_set() for all config options
	pinctrl: amd: Drop pull up select configuration
	pinctrl: amd: Unify debounce handling into amd_pinconf_set()
	tpm: Do not remap from ACPI resources again for Pluton TPM
	tpm: tpm_vtpm_proxy: fix a race condition in /dev/vtpmx creation
	tpm: tis_i2c: Limit read bursts to I2C_SMBUS_BLOCK_MAX (32) bytes
	tpm: tis_i2c: Limit write bursts to I2C_SMBUS_BLOCK_MAX (32) bytes
	tpm: return false from tpm_amd_is_rng_defective on non-x86 platforms
	mtd: rawnand: meson: fix unaligned DMA buffers handling
	net: bcmgenet: Ensure MDIO unregistration has clocks enabled
	net: phy: dp83td510: fix kernel stall during netboot in DP83TD510E PHY driver
	kasan: add kasan_tag_mismatch prototype
	tracing/user_events: Fix incorrect return value for writing operation when events are disabled
	powerpc: Fail build if using recordmcount with binutils v2.37
	misc: fastrpc: Create fastrpc scalar with correct buffer count
	powerpc/security: Fix Speculation_Store_Bypass reporting on Power10
	powerpc/64s: Fix native_hpte_remove() to be irq-safe
	MIPS: Loongson: Fix cpu_probe_loongson() again
	MIPS: KVM: Fix NULL pointer dereference
	ext4: Fix reusing stale buffer heads from last failed mounting
	ext4: fix wrong unit use in ext4_mb_clear_bb
	ext4: get block from bh in ext4_free_blocks for fast commit replay
	ext4: fix wrong unit use in ext4_mb_new_blocks
	ext4: fix to check return value of freeze_bdev() in ext4_shutdown()
	ext4: turn quotas off if mount failed after enabling quotas
	ext4: only update i_reserved_data_blocks on successful block allocation
	fs: dlm: revert check required context while close
	soc: qcom: mdt_loader: Fix unconditional call to scm_pas_mem_setup
	ext2/dax: Fix ext2_setsize when len is page aligned
	jfs: jfs_dmap: Validate db_l2nbperpage while mounting
	hwrng: imx-rngc - fix the timeout for init and self check
	dm integrity: reduce vmalloc space footprint on 32-bit architectures
	scsi: mpi3mr: Propagate sense data for admin queue SCSI I/O
	s390/zcrypt: do not retry administrative requests
	PCI/PM: Avoid putting EloPOS E2/S2/H2 PCIe Ports in D3cold
	PCI: Release resource invalidated by coalescing
	PCI: Add function 1 DMA alias quirk for Marvell 88SE9235
	PCI: qcom: Disable write access to read only registers for IP v2.3.3
	PCI: epf-test: Fix DMA transfer completion initialization
	PCI: epf-test: Fix DMA transfer completion detection
	PCI: rockchip: Assert PCI Configuration Enable bit after probe
	PCI: rockchip: Write PCI Device ID to correct register
	PCI: rockchip: Add poll and timeout to wait for PHY PLLs to be locked
	PCI: rockchip: Fix legacy IRQ generation for RK3399 PCIe endpoint core
	PCI: rockchip: Use u32 variable to access 32-bit registers
	PCI: rockchip: Set address alignment for endpoint mode
	misc: pci_endpoint_test: Free IRQs before removing the device
	misc: pci_endpoint_test: Re-init completion for every test
	mfd: pm8008: Fix module autoloading
	md/raid0: add discard support for the 'original' layout
	dm init: add dm-mod.waitfor to wait for asynchronously probed block devices
	fs: dlm: return positive pid value for F_GETLK
	fs: dlm: fix cleanup pending ops when interrupted
	fs: dlm: interrupt posix locks only when process is killed
	fs: dlm: make F_SETLK use unkillable wait_event
	fs: dlm: fix mismatch of plock results from userspace
	scsi: lpfc: Fix double free in lpfc_cmpl_els_logo_acc() caused by lpfc_nlp_not_used()
	drm/atomic: Allow vblank-enabled + self-refresh "disable"
	drm/rockchip: vop: Leave vblank enabled in self-refresh
	drm/amd/display: fix seamless odm transitions
	drm/amd/display: edp do not add non-edid timings
	drm/amd/display: Remove Phantom Pipe Check When Calculating K1 and K2
	drm/amd/display: disable seamless boot if force_odm_combine is enabled
	drm/amdgpu: fix clearing mappings for BOs that are always valid in VM
	drm/amd: Disable PSR-SU on Parade 0803 TCON
	drm/amd/display: add a NULL pointer check
	drm/amd/display: Correct `DMUB_FW_VERSION` macro
	drm/amd/display: Add monitor specific edid quirk
	drm/amdgpu: avoid restore process run into dead loop.
	drm/ttm: Don't leak a resource on swapout move error
	serial: atmel: don't enable IRQs prematurely
	tty: serial: samsung_tty: Fix a memory leak in s3c24xx_serial_getclk() in case of error
	tty: serial: samsung_tty: Fix a memory leak in s3c24xx_serial_getclk() when iterating clk
	tty: serial: imx: fix rs485 rx after tx
	firmware: stratix10-svc: Fix a potential resource leak in svc_create_memory_pool()
	libceph: harden msgr2.1 frame segment length checks
	ceph: add a dedicated private data for netfs rreq
	ceph: fix blindly expanding the readahead windows
	ceph: don't let check_caps skip sending responses for revoke msgs
	xhci: Fix resume issue of some ZHAOXIN hosts
	xhci: Fix TRB prefetch issue of ZHAOXIN hosts
	xhci: Show ZHAOXIN xHCI root hub speed correctly
	meson saradc: fix clock divider mask length
	opp: Fix use-after-free in lazy_opp_tables after probe deferral
	soundwire: qcom: fix storing port config out-of-bounds
	Revert "8250: add support for ASIX devices with a FIFO bug"
	bus: ixp4xx: fix IXP4XX_EXP_T1_MASK
	s390/decompressor: fix misaligned symbol build error
	dm: verity-loadpin: Add NULL pointer check for 'bdev' parameter
	tracing/histograms: Add histograms to hist_vars if they have referenced variables
	tracing: Fix memory leak of iter->temp when reading trace_pipe
	nvme: don't reject probe due to duplicate IDs for single-ported PCIe devices
	samples: ftrace: Save required argument registers in sample trampolines
	perf: RISC-V: Remove PERF_HES_STOPPED flag checking in riscv_pmu_start()
	regmap-irq: Fix out-of-bounds access when allocating config buffers
	net: ena: fix shift-out-of-bounds in exponential backoff
	ring-buffer: Fix deadloop issue on reading trace_pipe
	ftrace: Fix possible warning on checking all pages used in ftrace_process_locs()
	drm/amd/pm: share the code around SMU13 pcie parameters update
	drm/amd/pm: conditionally disable pcie lane/speed switching for SMU13
	cifs: if deferred close is disabled then close files immediately
	xtensa: ISS: fix call to split_if_spec
	perf/x86: Fix lockdep warning in for_each_sibling_event() on SPR
	PM: QoS: Restore support for default value on frequency QoS
	pwm: meson: modify and simplify calculation in meson_pwm_get_state
	pwm: meson: fix handling of period/duty if greater than UINT_MAX
	fprobe: Release rethook after the ftrace_ops is unregistered
	fprobe: Ensure running fprobe_exit_handler() finished before calling rethook_free()
	tracing: Fix null pointer dereference in tracing_err_log_open()
	selftests: mptcp: connect: fail if nft supposed to work
	selftests: mptcp: sockopt: return error if wrong mark
	selftests: mptcp: userspace_pm: use correct server port
	selftests: mptcp: userspace_pm: report errors with 'remove' tests
	selftests: mptcp: depend on SYN_COOKIES
	selftests: mptcp: pm_nl_ctl: fix 32-bit support
	tracing/probes: Fix not to count error code to total length
	tracing/probes: Fix to update dynamic data counter if fetcharg uses it
	tracing/user_events: Fix struct arg size match check
	scsi: qla2xxx: Multi-que support for TMF
	scsi: qla2xxx: Fix task management cmd failure
	scsi: qla2xxx: Fix task management cmd fail due to unavailable resource
	scsi: qla2xxx: Fix hang in task management
	scsi: qla2xxx: Wait for io return on terminate rport
	scsi: qla2xxx: Fix mem access after free
	scsi: qla2xxx: Array index may go out of bound
	scsi: qla2xxx: Avoid fcport pointer dereference
	scsi: qla2xxx: Fix buffer overrun
	scsi: qla2xxx: Fix potential NULL pointer dereference
	scsi: qla2xxx: Check valid rport returned by fc_bsg_to_rport()
	scsi: qla2xxx: Correct the index of array
	scsi: qla2xxx: Pointer may be dereferenced
	scsi: qla2xxx: Remove unused nvme_ls_waitq wait queue
	scsi: qla2xxx: Fix end of loop test
	MIPS: kvm: Fix build error with KVM_MIPS_DEBUG_COP0_COUNTERS enabled
	Revert "drm/amd: Disable PSR-SU on Parade 0803 TCON"
	swiotlb: mark swiotlb_memblock_alloc() as __init
	net/sched: sch_qfq: reintroduce lmax bound check for MTU
	drm/atomic: Fix potential use-after-free in nonblocking commits
	net/ncsi: make one oem_gma function for all mfr id
	net/ncsi: change from ndo_set_mac_address to dev_set_mac_address
	Linux 6.1.40

Change-Id: I5cc6aab178c66d2a23fe2a8d21e71cc4a8b15acf
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-09-05 16:35:01 +00:00
Greg Kroah-Hartman
c057db2f88 Revert "bpf: Remove bpf trampoline selector"
This reverts commit 8ea165e1f8 which is
commit 47e79cbeea4b3891ad476047f4c68543eb51c8e0 upstream.

It breaks the Android ABI and can be brought back in an abi-safe way in
the future if it is still needed.

Bug: 161946584
Change-Id: I6b0b4846d8ec26e44c6627b8b4cbb2c46cc01a13
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-09-05 10:47:44 +00:00
Greg Kroah-Hartman
b435525822 This is the 6.1.39 stable release
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEZH8oZUiU471FcZm+ONu9yGCSaT4FAmS38qMACgkQONu9yGCS
 aT56yQ//ZuDuw8Ev3HISVgZhE9FpuXC1RSYXiMCAvwA9rH3KnJ4wKVPEhEWLy9P4
 jdJaatSLbLOvA7ME7JnwZxz2qahjBxo1tpx6u2S3zrzz4UlAPNLwCxTxxp4X07VI
 3fBNvsmucqFSayCrA8t9xgkaJizuCvHZm7eSoyVIigPwbB5igc2b+bNSRcx1Zo+j
 SHl4Y4nGK8a47XU9RSlDLVKow0/6rrQLHQ9DLpxACArRHw3h451vD0DMcgOuU/Uv
 6qq9u3COcdVw3oc5VENu9XklPmvQkxo3RaCUHyRadVstuc0H/BBUDvEhPn5PcVOV
 EdBWlTjmhsQo0aUziK4kotLNeX1VRgKa+rrIUBJn68OHv1SRRPZU/eJ8hkL81dCi
 FDPzXDOszixO7pPv1jj7O9kNcwKPuiHPmdaNPCY6jviOHhZnAEub44DpQamxWvU/
 kb5MZRRY72wt9iWeI3kscCCSbf6eyjlmDMoYIeLuYn10n7gIDU80eUOBl9bqEsz/
 X+OUxaY+XuKbCoucpNmSHHLmynJ5D0CXhl/5qnlgMoSo4UJ5BUIMj2e3ZqsKLfrR
 e/09MCRX79y9J+TxUunnQZfq5vBlH1tRsvUyhIfYfW4AaC9BrkOL2XZviQldKY6x
 FUmsxh62O3iGRtLOWDKQA5MwoJuD54qVcHr1iidWkO2G8T3ctCc=
 =kyUh
 -----END PGP SIGNATURE-----

Merge 6.1.39 into android14-6.1-lts

Changes in 6.1.39
	drm: use mgr->dev in drm_dbg_kms in drm_dp_add_payload_part2
	fs: pipe: reveal missing function protoypes
	block: Fix the type of the second bdev_op_is_zoned_write() argument
	erofs: clean up cached I/O strategies
	erofs: avoid tagged pointers to mark sync decompression
	erofs: remove tagged pointer helpers
	erofs: move zdata.h into zdata.c
	erofs: kill hooked chains to avoid loops on deduplicated compressed images
	x86/resctrl: Only show tasks' pid in current pid namespace
	blk-iocost: use spin_lock_irqsave in adjust_inuse_and_calc_cost
	x86/sev: Fix calculation of end address based on number of pages
	virt: sevguest: Add CONFIG_CRYPTO dependency
	blk-mq: fix potential io hang by wrong 'wake_batch'
	lockd: drop inappropriate svc_get() from locked_get()
	nvme-auth: rename __nvme_auth_[reset|free] to nvme_auth[reset|free]_dhchap
	nvme-auth: rename authentication work elements
	nvme-auth: remove symbol export from nvme_auth_reset
	nvme-auth: no need to reset chap contexts on re-authentication
	nvme-core: fix memory leak in dhchap_secret_store
	nvme-core: fix memory leak in dhchap_ctrl_secret
	nvme-auth: don't ignore key generation failures when initializing ctrl keys
	nvme-core: add missing fault-injection cleanup
	nvme-core: fix dev_pm_qos memleak
	md/raid10: check slab-out-of-bounds in md_bitmap_get_counter
	md/raid10: fix overflow of md/safe_mode_delay
	md/raid10: fix wrong setting of max_corr_read_errors
	md/raid10: fix null-ptr-deref of mreplace in raid10_sync_request
	md/raid10: fix io loss while replacement replace rdev
	md/raid1-10: factor out a helper to add bio to plug
	md/raid1-10: factor out a helper to submit normal write
	md/raid1-10: submit write io directly if bitmap is not enabled
	block: fix blktrace debugfs entries leakage
	irqchip/stm32-exti: Fix warning on initialized field overwritten
	irqchip/jcore-aic: Fix missing allocation of IRQ descriptors
	svcrdma: Prevent page release when nothing was received
	erofs: simplify iloc()
	erofs: fix compact 4B support for 16k block size
	posix-timers: Prevent RT livelock in itimer_delete()
	tick/rcu: Fix bogus ratelimit condition
	tracing/timer: Add missing hrtimer modes to decode_hrtimer_mode().
	clocksource/drivers/cadence-ttc: Fix memory leak in ttc_timer_probe
	PM: domains: fix integer overflow issues in genpd_parse_state()
	perf/arm-cmn: Fix DTC reset
	x86/mm: Allow guest.enc_status_change_prepare() to fail
	x86/tdx: Fix race between set_memory_encrypted() and load_unaligned_zeropad()
	drivers/perf: hisi: Don't migrate perf to the CPU going to teardown
	powercap: RAPL: Fix CONFIG_IOSF_MBI dependency
	PM: domains: Move the verification of in-params from genpd_add_device()
	ARM: 9303/1: kprobes: avoid missing-declaration warnings
	cpufreq: intel_pstate: Fix energy_performance_preference for passive
	thermal/drivers/sun8i: Fix some error handling paths in sun8i_ths_probe()
	rcu: Make rcu_cpu_starting() rely on interrupts being disabled
	rcu-tasks: Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs
	rcutorture: Correct name of use_softirq module parameter
	rcuscale: Move shutdown from wait_event() to wait_event_idle()
	rcu/rcuscale: Move rcu_scale_*() after kfree_scale_cleanup()
	rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale
	kselftest: vDSO: Fix accumulation of uninitialized ret when CLOCK_REALTIME is undefined
	perf/ibs: Fix interface via core pmu events
	x86/mm: Fix __swp_entry_to_pte() for Xen PV guests
	locking/atomic: arm: fix sync ops
	evm: Complete description of evm_inode_setattr()
	evm: Fix build warnings
	ima: Fix build warnings
	pstore/ram: Add check for kstrdup
	igc: Enable and fix RX hash usage by netstack
	wifi: ath9k: fix AR9003 mac hardware hang check register offset calculation
	wifi: ath9k: avoid referencing uninit memory in ath9k_wmi_ctrl_rx
	libbpf: btf_dump_type_data_check_overflow needs to consider BTF_MEMBER_BITFIELD_SIZE
	samples/bpf: Fix buffer overflow in tcp_basertt
	spi: spi-geni-qcom: Correct CS_TOGGLE bit in SPI_TRANS_CFG
	wifi: wilc1000: fix for absent RSN capabilities WFA testcase
	wifi: mwifiex: Fix the size of a memory allocation in mwifiex_ret_802_11_scan()
	sctp: add bpf_bypass_getsockopt proto callback
	libbpf: fix offsetof() and container_of() to work with CO-RE
	bpf: Don't EFAULT for {g,s}setsockopt with wrong optlen
	spi: dw: Round of n_bytes to power of 2
	nfc: llcp: fix possible use of uninitialized variable in nfc_llcp_send_connect()
	bpftool: JIT limited misreported as negative value on aarch64
	bpf: Remove bpf trampoline selector
	bpf: Fix memleak due to fentry attach failure
	selftests/bpf: Do not use sign-file as testcase
	regulator: core: Fix more error checking for debugfs_create_dir()
	regulator: core: Streamline debugfs operations
	wifi: orinoco: Fix an error handling path in spectrum_cs_probe()
	wifi: orinoco: Fix an error handling path in orinoco_cs_probe()
	wifi: atmel: Fix an error handling path in atmel_probe()
	wifi: wl3501_cs: Fix an error handling path in wl3501_probe()
	wifi: ray_cs: Fix an error handling path in ray_probe()
	wifi: ath9k: don't allow to overwrite ENDPOINT0 attributes
	samples/bpf: xdp1 and xdp2 reduce XDPBUFSIZE to 60
	wifi: ath10k: Trigger STA disconnect after reconfig complete on hardware restart
	wifi: mac80211: recalc min chandef for new STA links
	selftests/bpf: Fix check_mtu using wrong variable type
	wifi: rsi: Do not configure WoWlan in shutdown hook if not enabled
	wifi: rsi: Do not set MMC_PM_KEEP_POWER in shutdown
	ice: handle extts in the miscellaneous interrupt thread
	selftests: cgroup: fix unexpected failure on test_memcg_low
	watchdog/perf: define dummy watchdog_update_hrtimer_threshold() on correct config
	watchdog/perf: more properly prevent false positives with turbo modes
	kexec: fix a memory leak in crash_shrink_memory()
	mmc: mediatek: Avoid ugly error message when SDIO wakeup IRQ isn't used
	memstick r592: make memstick_debug_get_tpc_name() static
	wifi: ath9k: Fix possible stall on ath9k_txq_list_has_key()
	wifi: mac80211: Fix permissions for valid_links debugfs entry
	rtnetlink: extend RTEXT_FILTER_SKIP_STATS to IFLA_VF_INFO
	wifi: ath11k: Add missing check for ioremap
	wifi: iwlwifi: pull from TXQs with softirqs disabled
	wifi: iwlwifi: pcie: fix NULL pointer dereference in iwl_pcie_irq_rx_msix_handler()
	wifi: mac80211: Remove "Missing iftype sband data/EHT cap" spam
	wifi: cfg80211: rewrite merging of inherited elements
	wifi: cfg80211: drop incorrect nontransmitted BSS update code
	wifi: cfg80211: fix regulatory disconnect with OCB/NAN
	wifi: cfg80211/mac80211: Fix ML element common size calculation
	wifi: ieee80211: Fix the common size calculation for reconfiguration ML
	mmc: Add MMC_QUIRK_BROKEN_SD_CACHE for Kingston Canvas Go Plus from 11/2019
	wifi: iwlwifi: mvm: indicate HW decrypt for beacon protection
	wifi: ath9k: convert msecs to jiffies where needed
	bpf: Factor out socket lookup functions for the TC hookpoint.
	bpf: Call __bpf_sk_lookup()/__bpf_skc_lookup() directly via TC hookpoint
	bpf: Fix bpf socket lookup from tc/xdp to respect socket VRF bindings
	can: length: fix bitstuffing count
	can: kvaser_pciefd: Add function to set skb hwtstamps
	can: kvaser_pciefd: Set hardware timestamp on transmitted packets
	net: stmmac: fix double serdes powerdown
	netlink: fix potential deadlock in netlink_set_err()
	netlink: do not hard code device address lenth in fdb dumps
	bonding: do not assume skb mac_header is set
	selftests: rtnetlink: remove netdevsim device after ipsec offload test
	gtp: Fix use-after-free in __gtp_encap_destroy().
	net: axienet: Move reset before 64-bit DMA detection
	ocfs2: Fix use of slab data with sendpage
	sfc: fix crash when reading stats while NIC is resetting
	net: nfc: Fix use-after-free caused by nfc_llcp_find_local
	lib/ts_bm: reset initial match offset for every block of text
	netfilter: conntrack: dccp: copy entire header to stack buffer, not just basic one
	netfilter: nf_conntrack_sip: fix the ct_sip_parse_numerical_param() return value.
	ipvlan: Fix return value of ipvlan_queue_xmit()
	netlink: Add __sock_i_ino() for __netlink_diag_dump().
	drm/amd/display: Add logging for display MALL refresh setting
	radeon: avoid double free in ci_dpm_init()
	drm/amd/display: Explicitly specify update type per plane info change
	drm/bridge: it6505: Move a variable assignment behind a null pointer check in receive_timing_debugfs_show()
	Input: drv260x - sleep between polling GO bit
	drm/bridge: ti-sn65dsi83: Fix enable error path
	drm/bridge: tc358768: always enable HS video mode
	drm/bridge: tc358768: fix PLL parameters computation
	drm/bridge: tc358768: fix PLL target frequency
	drm/bridge: tc358768: fix TCLK_ZEROCNT computation
	drm/bridge: tc358768: Add atomic_get_input_bus_fmts() implementation
	drm/bridge: tc358768: fix TCLK_TRAILCNT computation
	drm/bridge: tc358768: fix THS_ZEROCNT computation
	drm/bridge: tc358768: fix TXTAGOCNT computation
	drm/bridge: tc358768: fix THS_TRAILCNT computation
	drm/vram-helper: fix function names in vram helper doc
	ARM: dts: BCM5301X: Drop "clock-names" from the SPI node
	ARM: dts: meson8b: correct uart_B and uart_C clock references
	mm: call arch_swap_restore() from do_swap_page()
	clk: vc5: Use `clamp()` to restrict PLL range
	bootmem: remove the vmemmap pages from kmemleak in free_bootmem_page
	clk: vc5: Fix .driver_data content in i2c_device_id
	clk: vc7: Fix .driver_data content in i2c_device_id
	clk: rs9: Fix .driver_data content in i2c_device_id
	Input: adxl34x - do not hardcode interrupt trigger type
	drm: sun4i_tcon: use devm_clk_get_enabled in `sun4i_tcon_init_clocks`
	drm/panel: sharp-ls043t1le01: adjust mode settings
	driver: soc: xilinx: use _safe loop iterator to avoid a use after free
	ASoC: Intel: sof_sdw: remove SOF_SDW_TGL_HDMI for MeteorLake devices
	drm/vkms: isolate pixel conversion functionality
	drm: Add fixed-point helper to get rounded integer values
	drm/vkms: Fix RGB565 pixel conversion
	ARM: dts: stm32: Move ethernet MAC EEPROM from SoM to carrier boards
	bus: ti-sysc: Fix dispc quirk masking bool variables
	arm64: dts: microchip: sparx5: do not use PSCI on reference boards
	drm/bridge: tc358767: Switch to devm MIPI-DSI helpers
	clk: imx: scu: use _safe list iterator to avoid a use after free
	hwmon: (f71882fg) prevent possible division by zero
	RDMA/bnxt_re: Disable/kill tasklet only if it is enabled
	RDMA/bnxt_re: Fix to remove unnecessary return labels
	RDMA/bnxt_re: Use unique names while registering interrupts
	RDMA/bnxt_re: Remove a redundant check inside bnxt_re_update_gid
	RDMA/bnxt_re: Fix to remove an unnecessary log
	drm/msm/dsi: don't allow enabling 14nm VCO with unprogrammed rate
	drm/msm/disp/dpu: get timing engine status from intf status register
	drm/msm/dpu: Set DPU_DATA_HCTL_EN for in INTF_SC7180_MASK
	iommu/virtio: Detach domain on endpoint release
	iommu/virtio: Return size mapped for a detached domain
	clk: renesas: rzg2l: Fix CPG_SIPLL5_CLK1 register write
	ARM: dts: gta04: Move model property out of pinctrl node
	drm/bridge: anx7625: Convert to i2c's .probe_new()
	drm/bridge: anx7625: Prevent endless probe loop
	ARM: dts: qcom: msm8974: do not use underscore in node name (again)
	arm64: dts: qcom: msm8916: correct camss unit address
	arm64: dts: qcom: msm8916: correct MMC unit address
	arm64: dts: qcom: msm8994: correct SPMI unit address
	arm64: dts: qcom: msm8996: correct camss unit address
	arm64: dts: qcom: sdm630: correct camss unit address
	arm64: dts: qcom: sdm845: correct camss unit address
	arm64: dts: qcom: sm8350: Add GPI DMA compatible fallback
	arm64: dts: qcom: sm8350: correct DMA controller unit address
	arm64: dts: qcom: sdm845-polaris: add missing touchscreen child node reg
	arm64: dts: qcom: apq8016-sbc: Fix regulator constraints
	arm64: dts: qcom: apq8016-sbc: Fix 1.8V power rail on LS expansion
	drm/bridge: Introduce pre_enable_prev_first to alter bridge init order
	drm/bridge: ti-sn65dsi83: Fix enable/disable flow to meet spec
	drm/panel: simple: fix active size for Ampire AM-480272H3TMQW-T01H
	ARM: ep93xx: fix missing-prototype warnings
	ARM: omap2: fix missing tick_broadcast() prototype
	arm64: dts: qcom: pm7250b: add missing spmi-vadc include
	arm64: dts: qcom: apq8096: fix fixed regulator name property
	arm64: dts: mediatek: mt8183: Add mediatek,broken-save-restore-fw to kukui
	ARM: dts: stm32: Shorten the AV96 HDMI sound card name
	memory: brcmstb_dpfe: fix testing array offset after use
	ARM: dts: qcom: apq8074-dragonboard: Set DMA as remotely controlled
	ASoC: es8316: Increment max value for ALC Capture Target Volume control
	ASoC: es8316: Do not set rate constraints for unsupported MCLKs
	ARM: dts: meson8: correct uart_B and uart_C clock references
	soc/fsl/qe: fix usb.c build errors
	RDMA/irdma: avoid fortify-string warning in irdma_clr_wqes
	IB/hfi1: Fix wrong mmu_node used for user SDMA packet after invalidate
	RDMA/hns: Fix hns_roce_table_get return value
	ARM: dts: iwg20d-q7-common: Fix backlight pwm specifier
	arm64: dts: renesas: ulcb-kf: Remove flow control for SCIF1
	drm/msm/dpu: set DSC flush bit correctly at MDP CTL flush register
	fbdev: omapfb: lcd_mipid: Fix an error handling path in mipid_spi_probe()
	arm64: dts: ti: k3-j7200: Fix physical address of pin
	Input: pm8941-powerkey - fix debounce on gen2+ PMICs
	ARM: dts: stm32: Fix audio routing on STM32MP15xx DHCOM PDK2
	ARM: dts: stm32: fix i2s endpoint format property for stm32mp15xx-dkx
	hwmon: (gsc-hwmon) fix fan pwm temperature scaling
	hwmon: (pmbus/adm1275) Fix problems with temperature monitoring on ADM1272
	ARM: dts: BCM5301X: fix duplex-full => full-duplex
	clk: Export clk_hw_forward_rate_request()
	drm/amd/display: Fix a test CalculatePrefetchSchedule()
	drm/amd/display: Fix a test dml32_rq_dlg_get_rq_reg()
	drm/amdkfd: Fix potential deallocation of previously deallocated memory.
	soc: mediatek: SVS: Fix MT8192 GPU node name
	drm/amd/display: Fix artifacting on eDP panels when engaging freesync video mode
	drm/radeon: fix possible division-by-zero errors
	HID: uclogic: Modular KUnit tests should not depend on KUNIT=y
	RDMA/rxe: Add ibdev_dbg macros for rxe
	RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_mw.c
	RDMA/rxe: Fix access checks in rxe_check_bind_mw
	amdgpu: validate offset_in_bo of drm_amdgpu_gem_va
	drm/msm/a5xx: really check for A510 in a5xx_gpu_init
	RDMA/bnxt_re: wraparound mbox producer index
	RDMA/bnxt_re: Avoid calling wake_up threads from spin_lock context
	clk: imx: clk-imxrt1050: fix memory leak in imxrt1050_clocks_probe
	clk: imx: clk-imx8mn: fix memory leak in imx8mn_clocks_probe
	clk: imx93: fix memory leak and missing unwind goto in imx93_clocks_probe
	clk: imx: clk-imx8mp: improve error handling in imx8mp_clocks_probe()
	arm64: dts: qcom: sdm845: Flush RSC sleep & wake votes
	arm64: dts: qcom: sm8250-edo: Panel framebuffer is 2.5k instead of 4k
	clk: bcm: rpi: Fix off by one in raspberrypi_discover_clocks()
	clk: clocking-wizard: Fix Oops in clk_wzrd_register_divider()
	clk: tegra: tegra124-emc: Fix potential memory leak
	ALSA: ac97: Fix possible NULL dereference in snd_ac97_mixer
	drm/msm/dpu: do not enable color-management if DSPPs are not available
	drm/msm/dpu: Fix slice_last_group_size calculation
	drm/msm/dsi: Use DSC slice(s) packet size to compute word count
	drm/msm/dsi: Flip greater-than check for slice_count and slice_per_intf
	drm/msm/dsi: Remove incorrect references to slice_count
	drm/msm/dp: Free resources after unregistering them
	arm64: dts: mediatek: Add cpufreq nodes for MT8192
	arm64: dts: mediatek: mt8192: Fix CPUs capacity-dmips-mhz
	drm/amdgpu: Fix memcpy() in sienna_cichlid_append_powerplay_table function.
	drm/amdgpu: Fix usage of UMC fill record in RAS
	drm/msm/dpu: correct MERGE_3D length
	clk: vc5: check memory returned by kasprintf()
	clk: cdce925: check return value of kasprintf()
	clk: si5341: return error if one synth clock registration fails
	clk: si5341: check return value of {devm_}kasprintf()
	clk: si5341: free unused memory on probe failure
	clk: keystone: sci-clk: check return value of kasprintf()
	clk: ti: clkctrl: check return value of kasprintf()
	drivers: meson: secure-pwrc: always enable DMA domain
	ovl: update of dentry revalidate flags after copy up
	ASoC: imx-audmix: check return value of devm_kasprintf()
	clk: Fix memory leak in devm_clk_notifier_register()
	ARM: dts: lan966x: kontron-d10: fix board reset
	ARM: dts: lan966x: kontron-d10: fix SPI CS
	ASoC: amd: acp: clear pdm dma interrupt mask
	PCI: cadence: Fix Gen2 Link Retraining process
	PCI: vmd: Reset VMD config register between soft reboots
	scsi: qedf: Fix NULL dereference in error handling
	pinctrl: bcm2835: Handle gpiochip_add_pin_range() errors
	platform/x86: lenovo-yogabook: Fix work race on remove()
	platform/x86: lenovo-yogabook: Reprobe devices on remove()
	platform/x86: lenovo-yogabook: Set default keyboard backligh brightness on probe()
	PCI/ASPM: Disable ASPM on MFD function removal to avoid use-after-free
	scsi: 3w-xxxx: Add error handling for initialization failure in tw_probe()
	PCI: pciehp: Cancel bringup sequence if card is not present
	PCI: ftpci100: Release the clock resources
	pinctrl: sunplus: Add check for kmalloc
	PCI: Add pci_clear_master() stub for non-CONFIG_PCI
	scsi: lpfc: Revise NPIV ELS unsol rcv cmpl logic to drop ndlp based on nlp_state
	perf bench: Add missing setlocale() call to allow usage of %'d style formatting
	pinctrl: cherryview: Return correct value if pin in push-pull mode
	platform/x86: think-lmi: mutex protection around multiple WMI calls
	platform/x86: think-lmi: Correct System password interface
	platform/x86: think-lmi: Correct NVME password handling
	pinctrl:sunplus: Add check for kmalloc
	pinctrl: npcm7xx: Add missing check for ioremap
	kcsan: Don't expect 64 bits atomic builtins from 32 bits architectures
	powerpc/interrupt: Don't read MSR from interrupt_exit_kernel_prepare()
	powerpc/signal32: Force inlining of __unsafe_save_user_regs() and save_tm_user_regs_unsafe()
	perf script: Fix allocation of evsel->priv related to per-event dump files
	platform/x86: thinkpad_acpi: Fix lkp-tests warnings for platform profiles
	perf dwarf-aux: Fix off-by-one in die_get_varname()
	platform/x86/dell/dell-rbtn: Fix resources leaking on error path
	perf tool x86: Consolidate is_amd check into single function
	perf tool x86: Fix perf_env memory leak
	powerpc/64s: Fix VAS mm use after free
	pinctrl: microchip-sgpio: check return value of devm_kasprintf()
	pinctrl: at91-pio4: check return value of devm_kasprintf()
	powerpc/powernv/sriov: perform null check on iov before dereferencing iov
	powerpc: simplify ppc_save_regs
	powerpc: update ppc_save_regs to save current r1 in pt_regs
	PCI: qcom: Remove PCIE20_ prefix from register definitions
	PCI: qcom: Sort and group registers and bitfield definitions
	PCI: qcom: Use lower case for hex
	PCI: qcom: Use DWC helpers for modifying the read-only DBI registers
	PCI: qcom: Disable write access to read only registers for IP v2.9.0
	riscv: uprobes: Restore thread.bad_cause
	powerpc/book3s64/mm: Fix DirectMap stats in /proc/meminfo
	powerpc/mm/dax: Fix the condition when checking if altmap vmemap can cross-boundary
	PCI: endpoint: Fix Kconfig indent style
	PCI: endpoint: Fix a Kconfig prompt of vNTB driver
	PCI: endpoint: functions/pci-epf-test: Fix dma_chan direction
	PCI: vmd: Fix uninitialized variable usage in vmd_enable_domain()
	vfio/mdev: Move the compat_class initialization to module init
	hwrng: virtio - Fix race on data_avail and actual data
	modpost: remove broken calculation of exception_table_entry size
	crypto: nx - fix build warnings when DEBUG_FS is not enabled
	modpost: fix section mismatch message for R_ARM_ABS32
	modpost: fix section mismatch message for R_ARM_{PC24,CALL,JUMP24}
	crypto: marvell/cesa - Fix type mismatch warning
	crypto: jitter - correct health test during initialization
	modpost: fix off by one in is_executable_section()
	ARC: define ASM_NL and __ALIGN(_STR) outside #ifdef __ASSEMBLY__ guard
	crypto: kpp - Add helper to set reqsize
	crypto: qat - Use helper to set reqsize
	crypto: qat - unmap buffer before free for DH
	crypto: qat - unmap buffers before free for RSA
	NFSv4.2: fix wrong shrinker_id
	NFSv4.1: freeze the session table upon receiving NFS4ERR_BADSESSION
	SMB3: Do not send lease break acknowledgment if all file handles have been closed
	dax: Fix dax_mapping_release() use after free
	dax: Introduce alloc_dev_dax_id()
	dax/kmem: Pass valid argument to memory_group_register_static
	hwrng: st - keep clock enabled while hwrng is registered
	kbuild: Disable GCOV for *.mod.o
	efi/libstub: Disable PCI DMA before grabbing the EFI memory map
	cifs: prevent use-after-free by freeing the cfile later
	cifs: do all necessary checks for credits within or before locking
	smb: client: fix broken file attrs with nodfs mounts
	ksmbd: avoid field overflow warning
	arm64: sme: Use STR P to clear FFR context field in streaming SVE mode
	x86/efi: Make efi_set_virtual_address_map IBT safe
	md/raid1-10: fix casting from randomized structure in raid1_submit_write()
	USB: serial: option: add LARA-R6 01B PIDs
	usb: dwc3: gadget: Propagate core init errors to UDC during pullup
	phy: tegra: xusb: Clear the driver reference in usb-phy dev
	iio: adc: ad7192: Fix null ad7192_state pointer access
	iio: adc: ad7192: Fix internal/external clock selection
	iio: accel: fxls8962af: errata bug only applicable for FXLS8962AF
	iio: accel: fxls8962af: fixup buffer scan element type
	Revert "drm/amd/display: edp do not add non-edid timings"
	mm/mmap: Fix VM_LOCKED check in do_vmi_align_munmap()
	ALSA: hda/realtek: Enable mute/micmute LEDs and limit mic boost on EliteBook
	ALSA: hda/realtek: Add quirk for Clevo NPx0SNx
	ALSA: jack: Fix mutex call in snd_jack_report()
	ALSA: pcm: Fix potential data race at PCM memory allocation helpers
	block: fix signed int overflow in Amiga partition support
	block: add overflow checks for Amiga partition support
	block: change all __u32 annotations to __be32 in affs_hardblocks.h
	block: increment diskseq on all media change events
	btrfs: fix race when deleting free space root from the dirty cow roots list
	SUNRPC: Fix UAF in svc_tcp_listen_data_ready()
	w1: w1_therm: fix locking behavior in convert_t
	w1: fix loop in w1_fini()
	dt-bindings: power: reset: qcom-pon: Only allow reboot-mode pre-pmk8350
	f2fs: do not allow to defragment files have FI_COMPRESS_RELEASED
	sh: j2: Use ioremap() to translate device tree address into kernel memory
	usb: dwc2: platform: Improve error reporting for problems during .remove()
	usb: dwc2: Fix some error handling paths
	serial: 8250: omap: Fix freeing of resources on failed register
	clk: qcom: mmcc-msm8974: remove oxili_ocmemgx_clk
	clk: qcom: camcc-sc7180: Add parent dependency to all camera GDSCs
	clk: qcom: gcc-ipq6018: Use floor ops for sdcc clocks
	clk: qcom: gcc-qcm2290: Mark RCGs shared where applicable
	media: usb: Check az6007_read() return value
	media: amphion: drop repeated codec data for vc1l format
	media: amphion: drop repeated codec data for vc1g format
	media: amphion: initiate a drain of the capture queue in dynamic resolution change
	media: videodev2.h: Fix struct v4l2_input tuner index comment
	media: usb: siano: Fix warning due to null work_func_t function pointer
	media: i2c: Correct format propagation for st-mipid02
	media: hi846: fix usage of pm_runtime_get_if_in_use()
	media: mediatek: vcodec: using decoder status instead of core work count
	clk: qcom: reset: support resetting multiple bits
	clk: qcom: ipq6018: fix networking resets
	clk: qcom: dispcc-qcm2290: Fix BI_TCXO_AO handling
	clk: qcom: dispcc-qcm2290: Fix GPLL0_OUT_DIV handling
	clk: qcom: mmcc-msm8974: use clk_rcg2_shared_ops for mdp_clk_src clock
	staging: vchiq_arm: mark vchiq_platform_init() static
	usb: dwc3: qcom: Fix potential memory leak
	usb: gadget: u_serial: Add null pointer check in gserial_suspend
	extcon: Fix kernel doc of property fields to avoid warnings
	extcon: Fix kernel doc of property capability fields to avoid warnings
	usb: phy: phy-tahvo: fix memory leak in tahvo_usb_probe()
	usb: hide unused usbfs_notify_suspend/resume functions
	usb: misc: eud: Fix eud sysfs path (use 'qcom_eud')
	serial: core: lock port for stop_rx() in uart_suspend_port()
	serial: 8250: lock port for stop_rx() in omap8250_irq()
	serial: core: lock port for start_rx() in uart_resume_port()
	serial: 8250: lock port for UART_IER access in omap8250_irq()
	kernfs: fix missing kernfs_idr_lock to remove an ID from the IDR
	lkdtm: replace ll_rw_block with submit_bh
	i3c: master: svc: fix cpu schedule in spin lock
	coresight: Fix loss of connection info when a module is unloaded
	mfd: rt5033: Drop rt5033-battery sub-device
	media: venus: helpers: Fix ALIGN() of non power of two
	media: atomisp: gmin_platform: fix out_len in gmin_get_config_dsm_var()
	sh: Avoid using IRQ0 on SH3 and SH4
	gfs2: Fix duplicate should_fault_in_pages() call
	f2fs: fix potential deadlock due to unpaired node_write lock use
	f2fs: fix to avoid NULL pointer dereference f2fs_write_end_io()
	KVM: s390: fix KVM_S390_GET_CMMA_BITS for GFNs in memslot holes
	usb: dwc3: qcom: Release the correct resources in dwc3_qcom_remove()
	usb: dwc3: qcom: Fix an error handling path in dwc3_qcom_probe()
	usb: common: usb-conn-gpio: Set last role to unknown before initial detection
	usb: dwc3-meson-g12a: Fix an error handling path in dwc3_meson_g12a_probe()
	mfd: wcd934x: Fix an error handling path in wcd934x_slim_probe()
	mfd: intel-lpss: Add missing check for platform_get_resource
	Revert "usb: common: usb-conn-gpio: Set last role to unknown before initial detection"
	serial: 8250_omap: Use force_suspend and resume for system suspend
	device property: Fix documentation for fwnode_get_next_parent()
	device property: Clarify description of returned value in some functions
	drivers: fwnode: fix fwnode_irq_get[_byname]()
	nvmem: sunplus-ocotp: release otp->clk before return
	nvmem: rmem: Use NVMEM_DEVID_AUTO
	bus: fsl-mc: don't assume child devices are all fsl-mc devices
	mfd: stmfx: Fix error path in stmfx_chip_init
	mfd: stmfx: Nullify stmfx->vdd in case of error
	KVM: s390: vsie: fix the length of APCB bitmap
	KVM: s390/diag: fix racy access of physical cpu number in diag 9c handler
	cpufreq: mediatek: correct voltages for MT7622 and MT7623
	misc: fastrpc: check return value of devm_kasprintf()
	clk: qcom: mmcc-msm8974: fix MDSS_GDSC power flags
	hwtracing: hisi_ptt: Fix potential sleep in atomic context
	mfd: stmpe: Only disable the regulators if they are enabled
	phy: tegra: xusb: check return value of devm_kzalloc()
	lib/bitmap: drop optimization of bitmap_{from,to}_arr64
	pwm: imx-tpm: force 'real_period' to be zero in suspend
	pwm: sysfs: Do not apply state to already disabled PWMs
	pwm: ab8500: Fix error code in probe()
	pwm: mtk_disp: Fix the disable flow of disp_pwm
	md/raid10: fix the condition to call bio_end_io_acct()
	rtc: st-lpc: Release some resources in st_rtc_probe() in case of error
	drm/i915/psr: Use hw.adjusted mode when calculating io/fast wake times
	drm/i915/guc/slpc: Apply min softlimit correctly
	f2fs: check return value of freeze_super()
	media: cec: i2c: ch7322: also select REGMAP
	sctp: fix potential deadlock on &net->sctp.addr_wq_lock
	net/sched: act_ipt: add sanity checks on table name and hook locations
	net: add a couple of helpers for iph tot_len
	net/sched: act_ipt: add sanity checks on skb before calling target
	spi: spi-geni-qcom: enable SPI_CONTROLLER_MUST_TX for GPI DMA mode
	net: mscc: ocelot: don't report that RX timestamping is enabled by default
	net: mscc: ocelot: don't keep PTP configuration of all ports in single structure
	net: dsa: felix: don't drop PTP frames with tag_8021q when RX timestamping is disabled
	net: dsa: sja1105: always enable the INCL_SRCPT option
	net: dsa: tag_sja1105: always prefer source port information from INCL_SRCPT
	Add MODULE_FIRMWARE() for FIRMWARE_TG357766.
	Bluetooth: fix invalid-bdaddr quirk for non-persistent setup
	Bluetooth: ISO: use hci_sync for setting CIG parameters
	Bluetooth: MGMT: add CIS feature bits to controller information
	Bluetooth: MGMT: Use BIT macro when defining bitfields
	Bluetooth: MGMT: Fix marking SCAN_RSP as not connectable
	ibmvnic: Do not reset dql stats on NON_FATAL err
	net: dsa: vsc73xx: fix MTU configuration
	mlxsw: minimal: fix potential memory leak in mlxsw_m_linecards_init
	spi: bcm-qspi: return error if neither hif_mspi nor mspi is available
	drm/amdgpu: fix number of fence calculations
	drm/amd: Don't try to enable secure display TA multiple times
	mailbox: ti-msgmgr: Fill non-message tx data fields with 0x0
	f2fs: fix error path handling in truncate_dnode()
	octeontx2-af: Fix mapping for NIX block from CGX connection
	octeontx2-af: Add validation before accessing cgx and lmac
	ntfs: Fix panic about slab-out-of-bounds caused by ntfs_listxattr()
	powerpc: allow PPC_EARLY_DEBUG_CPM only when SERIAL_CPM=y
	powerpc: dts: turris1x.dts: Fix PCIe MEM size for pci2 node
	net: bridge: keep ports without IFF_UNICAST_FLT in BR_PROMISC mode
	net: dsa: tag_sja1105: fix source port decoding in vlan_filtering=0 bridge mode
	net: fix net_dev_start_xmit trace event vs skb_transport_offset()
	tcp: annotate data races in __tcp_oow_rate_limited()
	bpf, btf: Warn but return no error for NULL btf from __register_btf_kfunc_id_set()
	xsk: Honor SO_BINDTODEVICE on bind
	net/sched: act_pedit: Add size check for TCA_PEDIT_PARMS_EX
	fanotify: disallow mount/sb marks on kernel internal pseudo fs
	riscv: move memblock_allow_resize() after linear mapping is ready
	pptp: Fix fib lookup calls.
	net: dsa: tag_sja1105: fix MAC DA patching from meta frames
	net: dsa: sja1105: always enable the send_meta options
	octeontx-af: fix hardware timestamp configuration
	afs: Fix accidental truncation when storing data
	s390/qeth: Fix vipa deletion
	sh: dma: Fix DMA channel offset calculation
	apparmor: fix missing error check for rhashtable_insert_fast
	i2c: xiic: Don't try to handle more interrupt events after error
	dm: fix undue/missing spaces
	dm: avoid split of quoted strings where possible
	dm ioctl: have constant on the right side of the test
	dm ioctl: Avoid double-fetch of version
	extcon: usbc-tusb320: Convert to i2c's .probe_new()
	extcon: usbc-tusb320: Unregister typec port on driver removal
	btrfs: do not BUG_ON() on tree mod log failure at balance_level()
	i2c: qup: Add missing unwind goto in qup_i2c_probe()
	irqchip/loongson-pch-pic: Fix potential incorrect hwirq assignment
	NFSD: add encoding of op_recall flag for write delegation
	irqchip/loongson-pch-pic: Fix initialization of HT vector register
	io_uring: wait interruptibly for request completions on exit
	mmc: core: disable TRIM on Kingston EMMC04G-M627
	mmc: core: disable TRIM on Micron MTFC4GACAJCN-1M
	mmc: mmci: Set PROBE_PREFER_ASYNCHRONOUS
	mmc: sdhci: fix DMA configure compatibility issue when 64bit DMA mode is used.
	wifi: cfg80211: fix regulatory disconnect for non-MLO
	wifi: ath10k: Serialize wake_tx_queue ops
	wifi: mt76: mt7921e: fix init command fail with enabled device
	bcache: fixup btree_cache_wait list damage
	bcache: Remove unnecessary NULL point check in node allocations
	bcache: Fix __bch_btree_node_alloc to make the failure behavior consistent
	watch_queue: prevent dangling pipe pointer
	um: Use HOST_DIR for mrproper
	integrity: Fix possible multiple allocation in integrity_inode_get()
	autofs: use flexible array in ioctl structure
	mm/damon/ops-common: atomically test and clear young on ptes and pmds
	shmem: use ramfs_kill_sb() for kill_sb method of ramfs-based tmpfs
	jffs2: reduce stack usage in jffs2_build_xattr_subsystem()
	fs: avoid empty option when generating legacy mount string
	ext4: Remove ext4 locking of moved directory
	Revert "f2fs: fix potential corruption when moving a directory"
	fs: Establish locking order for unrelated directories
	fs: Lock moved directories
	i2c: nvidia-gpu: Add ACPI property to align with device-tree
	i2c: nvidia-gpu: Remove ccgx,firmware-build property
	usb: typec: ucsi: Mark dGPUs as DEVICE scope
	ipvs: increase ip_vs_conn_tab_bits range for 64BIT
	btrfs: add handling for RAID1C23/DUP to btrfs_reduce_alloc_profile
	btrfs: delete unused BGs while reclaiming BGs
	btrfs: bail out reclaim process if filesystem is read-only
	btrfs: add block-group tree to lockdep classes
	btrfs: reinsert BGs failed to reclaim
	btrfs: fix race when deleting quota root from the dirty cow roots list
	btrfs: fix extent buffer leak after tree mod log failure at split_node()
	btrfs: do not BUG_ON() on tree mod log failure at __btrfs_cow_block()
	ASoC: mediatek: mt8173: Fix irq error path
	ASoC: mediatek: mt8173: Fix snd_soc_component_initialize error path
	regulator: tps65219: Fix matching interrupts for their regulators
	ARM: dts: qcom: ipq4019: fix broken NAND controller properties override
	ARM: orion5x: fix d2net gpio initialization
	leds: trigger: netdev: Recheck NETDEV_LED_MODE_LINKUP on dev rename
	blktrace: use inline function for blk_trace_remove() while blktrace is disabled
	fs: no need to check source
	xfs: explicitly specify cpu when forcing inodegc delayed work to run immediately
	xfs: check that per-cpu inodegc workers actually run on that cpu
	xfs: disable reaping in fscounters scrub
	xfs: fix xfs_inodegc_stop racing with mod_delayed_work
	mm/mmap: Fix extra maple tree write
	drm/i915: Fix TypeC mode initialization during system resume
	drm/i915/tc: Fix TC port link ref init for DP MST during HW readout
	drm/i915/tc: Fix system resume MST mode restore for DP-alt sinks
	mtd: parsers: refer to ARCH_BCMBCA instead of ARCH_BCM4908
	netfilter: nf_tables: unbind non-anonymous set if rule construction fails
	netfilter: conntrack: Avoid nf_ct_helper_hash uses after free
	netfilter: nf_tables: do not ignore genmask when looking up chain by id
	netfilter: nf_tables: prevent OOB access in nft_byteorder_eval
	wireguard: queueing: use saner cpu selection wrapping
	wireguard: netlink: send staged packets when setting initial private key
	tty: serial: fsl_lpuart: add earlycon for imx8ulp platform
	block/partition: fix signedness issue for Amiga partitions
	sh: mach-r2d: Handle virq offset in cascaded IRL demux
	sh: mach-highlander: Handle virq offset in cascaded IRL demux
	sh: mach-dreamcast: Handle virq offset in cascaded IRQ demux
	sh: hd64461: Handle virq offset for offchip IRQ base and HD64461 IRQ
	io_uring: Use io_schedule* in cqring wait
	Linux 6.1.39

Change-Id: I5867c943c99c157fa599ecd08da961c632e58302
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-09-02 19:41:42 +00:00
Yonghong Song
583a8426ab kallsyms: Fix kallsyms_selftest failure
commit 33f0467fe06934d5e4ea6e24ce2b9c65ce618e26 upstream.

Kernel test robot reported a kallsyms_test failure when clang lto is
enabled (thin or full) and CONFIG_KALLSYMS_SELFTEST is also enabled.
I can reproduce in my local environment with the following error message
with thin lto:
  [    1.877897] kallsyms_selftest: Test for 1750th symbol failed: (tsc_cs_mark_unstable) addr=ffffffff81038090
  [    1.877901] kallsyms_selftest: abort

It appears that commit 8cc32a9bbf29 ("kallsyms: strip LTO-only suffixes
from promoted global functions") caused the failure. Commit 8cc32a9bbf29
changed cleanup_symbol_name() based on ".llvm." instead of '.' where
".llvm." is appended to a before-lto-optimization local symbol name.
We need to propagate such knowledge in kallsyms_selftest.c as well.

Further more, compare_symbol_name() in kallsyms.c needs change as well.
In scripts/kallsyms.c, kallsyms_names and kallsyms_seqs_of_names are used
to record symbol names themselves and index to symbol names respectively.
For example:
  kallsyms_names:
    ...
    __amd_smn_rw._entry       <== seq 1000
    __amd_smn_rw._entry.5     <== seq 1001
    __amd_smn_rw.llvm.<hash>  <== seq 1002
    ...

kallsyms_seqs_of_names are sorted based on cleanup_symbol_name() through, so
the order in kallsyms_seqs_of_names actually has

  index 1000:   seq 1002   <== __amd_smn_rw.llvm.<hash> (actual symbol comparison using '__amd_smn_rw')
  index 1001:   seq 1000   <== __amd_smn_rw._entry
  index 1002:   seq 1001   <== __amd_smn_rw._entry.5

Let us say at a particular point, at index 1000, symbol '__amd_smn_rw.llvm.<hash>'
is comparing to '__amd_smn_rw._entry' where '__amd_smn_rw._entry' is the one to
search e.g., with function kallsyms_on_each_match_symbol(). The current implementation
will find out '__amd_smn_rw._entry' is less than '__amd_smn_rw.llvm.<hash>' and
then continue to search e.g., index 999 and never found a match although the actual
index 1001 is a match.

To fix this issue, let us do cleanup_symbol_name() first and then do comparison.
In the above case, comparing '__amd_smn_rw' vs '__amd_smn_rw._entry' and
'__amd_smn_rw._entry' being greater than '__amd_smn_rw', the next comparison will
be > index 1000 and eventually index 1001 will be hit an a match is found.

For any symbols not having '.llvm.' substr, there is no functionality change
for compare_symbol_name().

Fixes: 8cc32a9bbf29 ("kallsyms: strip LTO-only suffixes from promoted global functions")
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202308232200.1c932a90-oliver.sang@intel.com
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Reviewed-by: Song Liu <song@kernel.org>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Link: https://lore.kernel.org/r/20230825034659.1037627-1-yonghong.song@linux.dev
Cc: stable@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-02 09:16:19 +02:00
Helge Deller
b3d099df68 lockdep: fix static memory detection even more
commit 0a6b58c5cd0dfd7961e725212f0fc8dfc5d96195 upstream.

On the parisc architecture, lockdep reports for all static objects which
are in the __initdata section (e.g. "setup_done" in devtmpfs,
"kthreadd_done" in init/main.c) this warning:

	INFO: trying to register non-static key.

The warning itself is wrong, because those objects are in the __initdata
section, but the section itself is on parisc outside of range from
_stext to _end, which is why the static_obj() functions returns a wrong
answer.

While fixing this issue, I noticed that the whole existing check can
be simplified a lot.
Instead of checking against the _stext and _end symbols (which include
code areas too) just check for the .data and .bss segments (since we check a
data object). This can be done with the existing is_kernel_core_data()
macro.

In addition objects in the __initdata section can be checked with
init_section_contains(), and is_kernel_rodata() allows keys to be in the
_ro_after_init section.

This partly reverts and simplifies commit bac59d18c7 ("x86/setup: Fix static
memory detection").

Link: https://lkml.kernel.org/r/ZNqrLRaOi/3wPAdp@p100
Fixes: bac59d18c7 ("x86/setup: Fix static memory detection")
Signed-off-by: Helge Deller <deller@gmx.de>
Cc: Borislav Petkov <bp@suse.de>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Guenter Roeck <linux@roeck-us.net>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-02 09:16:19 +02:00
James Morse
207e228bf1 module: Expose module_init_layout_section()
commit 2abcc4b5a64a65a2d2287ba0be5c2871c1552416 upstream.

module_init_layout_section() choses whether the core module loader
considers a section as init or not. This affects the placement of the
exit section when module unloading is disabled. This code will never run,
so it can be free()d once the module has been initialised.

arm and arm64 need to count the number of PLTs they need before applying
relocations based on the section name. The init PLTs are stored separately
so they can be free()d. arm and arm64 both use within_module_init() to
decide which list of PLTs to use when applying the relocation.

Because within_module_init()'s behaviour changes when module unloading
is disabled, both architecture would need to take this into account when
counting the PLTs.

Today neither architecture does this, meaning when module unloading is
disabled there are insufficient PLTs in the init section to load some
modules, resulting in warnings:
| WARNING: CPU: 2 PID: 51 at arch/arm64/kernel/module-plts.c:99 module_emit_plt_entry+0x184/0x1cc
| Modules linked in: crct10dif_common
| CPU: 2 PID: 51 Comm: modprobe Not tainted 6.5.0-rc4-yocto-standard-dirty #15208
| Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
| pstate: 20400005 (nzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
| pc : module_emit_plt_entry+0x184/0x1cc
| lr : module_emit_plt_entry+0x94/0x1cc
| sp : ffffffc0803bba60
[...]
| Call trace:
|  module_emit_plt_entry+0x184/0x1cc
|  apply_relocate_add+0x2bc/0x8e4
|  load_module+0xe34/0x1bd4
|  init_module_from_file+0x84/0xc0
|  __arm64_sys_finit_module+0x1b8/0x27c
|  invoke_syscall.constprop.0+0x5c/0x104
|  do_el0_svc+0x58/0x160
|  el0_svc+0x38/0x110
|  el0t_64_sync_handler+0xc0/0xc4
|  el0t_64_sync+0x190/0x194

Instead of duplicating module_init_layout_section()s logic, expose it.

Reported-by: Adam Johnston <adam.johnston@arm.com>
Fixes: 055f23b74b ("module: check for exit sections in layout_sections() instead of module_init_section()")
Cc: stable@vger.kernel.org
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-09-02 09:16:18 +02:00
Dietmar Eggemann
d3ff67076b cgroup/cpuset: Free DL BW in case can_attach() fails
commit 2ef269ef1ac006acf974793d975539244d77b28f upstream.

cpuset_can_attach() can fail. Postpone DL BW allocation until all tasks
have been checked. DL BW is not allocated per-task but as a sum over
all DL tasks migrating.

If multiple controllers are attached to the cgroup next to the cpuset
controller a non-cpuset can_attach() can fail. In this case free DL BW
in cpuset_cancel_attach().

Finally, update cpuset DL task count (nr_deadline_tasks) only in
cpuset_attach().

Suggested-by: Waiman Long <longman@redhat.com>
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-30 16:11:11 +02:00
Dietmar Eggemann
f0135131bb sched/deadline: Create DL BW alloc, free & check overflow interface
commit 85989106feb734437e2d598b639991b9185a43a6 upstream.

While moving a set of tasks between exclusive cpusets,
cpuset_can_attach() -> task_can_attach() calls dl_cpu_busy(..., p) for
DL BW overflow checking and per-task DL BW allocation on the destination
root_domain for the DL tasks in this set.

This approach has the issue of not freeing already allocated DL BW in
the following error cases:

(1) The set of tasks includes multiple DL tasks and DL BW overflow
    checking fails for one of the subsequent DL tasks.

(2) Another controller next to the cpuset controller which is attached
    to the same cgroup fails in its can_attach().

To address this problem rework dl_cpu_busy():

(1) Split it into dl_bw_check_overflow() & dl_bw_alloc() and add a
    dedicated dl_bw_free().

(2) dl_bw_alloc() & dl_bw_free() take a `u64 dl_bw` parameter instead of
    a `struct task_struct *p` used in dl_cpu_busy(). This allows to
    allocate DL BW for a set of tasks too rather than only for a single
    task.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-30 16:11:11 +02:00
Juri Lelli
064b960dbe cgroup/cpuset: Iterate only if DEADLINE tasks are present
commit c0f78fd5edcf29b2822ac165f9248a6c165e8554 upstream.

update_tasks_root_domain currently iterates over all tasks even if no
DEADLINE task is present on the cpuset/root domain for which bandwidth
accounting is being rebuilt. This has been reported to introduce 10+ ms
delays on suspend-resume operations.

Skip the costly iteration for cpusets that don't contain DEADLINE tasks.

Reported-by: Qais Yousef (Google) <qyousef@layalina.io>
Link: https://lore.kernel.org/lkml/20230206221428.2125324-1-qyousef@layalina.io/
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-30 16:11:11 +02:00
Juri Lelli
d1b4262b78 sched/cpuset: Keep track of SCHED_DEADLINE task in cpusets
commit 6c24849f5515e4966d94fa5279bdff4acf2e9489 upstream.

Qais reported that iterating over all tasks when rebuilding root domains
for finding out which ones are DEADLINE and need their bandwidth
correctly restored on such root domains can be a costly operation (10+
ms delays on suspend-resume).

To fix the problem keep track of the number of DEADLINE tasks belonging
to each cpuset and then use this information (followup patch) to only
perform the above iteration if DEADLINE tasks are actually present in
the cpuset for which a corresponding root domain is being rebuilt.

Reported-by: Qais Yousef (Google) <qyousef@layalina.io>
Link: https://lore.kernel.org/lkml/20230206221428.2125324-1-qyousef@layalina.io/
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-30 16:11:10 +02:00
Juri Lelli
9bcfe15278 sched/cpuset: Bring back cpuset_mutex
commit 111cd11bbc54850f24191c52ff217da88a5e639b upstream.

Turns out percpu_cpuset_rwsem - commit 1243dc518c ("cgroup/cpuset:
Convert cpuset_mutex to percpu_rwsem") - wasn't such a brilliant idea,
as it has been reported to cause slowdowns in workloads that need to
change cpuset configuration frequently and it is also not implementing
priority inheritance (which causes troubles with realtime workloads).

Convert percpu_cpuset_rwsem back to regular cpuset_mutex. Also grab it
only for SCHED_DEADLINE tasks (other policies don't care about stable
cpusets anyway).

Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
[ Conflict in kernel/cgroup/cpuset.c due to pulling new code/comments.
  Reject all new code. Remove BUG_ON() about rwsem that doesn't exist on
  mainline. ]
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-30 16:11:10 +02:00
Juri Lelli
7030fbf75f cgroup/cpuset: Rename functions dealing with DEADLINE accounting
commit ad3a557daf6915296a43ef97a3e9c48e076c9dd8 upstream.

rebuild_root_domains() and update_tasks_root_domain() have neutral
names, but actually deal with DEADLINE bandwidth accounting.

Rename them to use 'dl_' prefix so that intent is more clear.

No functional change.

Suggested-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Waiman Long <longman@redhat.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-30 16:11:10 +02:00
Zheng Yejian
2cb0c037c9 tracing: Fix memleak due to race between current_tracer and trace
[ Upstream commit eecb91b9f98d6427d4af5fdb8f108f52572a39e7 ]

Kmemleak report a leak in graph_trace_open():

  unreferenced object 0xffff0040b95f4a00 (size 128):
    comm "cat", pid 204981, jiffies 4301155872 (age 99771.964s)
    hex dump (first 32 bytes):
      e0 05 e7 b4 ab 7d 00 00 0b 00 01 00 00 00 00 00 .....}..........
      f4 00 01 10 00 a0 ff ff 00 00 00 00 65 00 10 00 ............e...
    backtrace:
      [<000000005db27c8b>] kmem_cache_alloc_trace+0x348/0x5f0
      [<000000007df90faa>] graph_trace_open+0xb0/0x344
      [<00000000737524cd>] __tracing_open+0x450/0xb10
      [<0000000098043327>] tracing_open+0x1a0/0x2a0
      [<00000000291c3876>] do_dentry_open+0x3c0/0xdc0
      [<000000004015bcd6>] vfs_open+0x98/0xd0
      [<000000002b5f60c9>] do_open+0x520/0x8d0
      [<00000000376c7820>] path_openat+0x1c0/0x3e0
      [<00000000336a54b5>] do_filp_open+0x14c/0x324
      [<000000002802df13>] do_sys_openat2+0x2c4/0x530
      [<0000000094eea458>] __arm64_sys_openat+0x130/0x1c4
      [<00000000a71d7881>] el0_svc_common.constprop.0+0xfc/0x394
      [<00000000313647bf>] do_el0_svc+0xac/0xec
      [<000000002ef1c651>] el0_svc+0x20/0x30
      [<000000002fd4692a>] el0_sync_handler+0xb0/0xb4
      [<000000000c309c35>] el0_sync+0x160/0x180

The root cause is descripted as follows:

  __tracing_open() {  // 1. File 'trace' is being opened;
    ...
    *iter->trace = *tr->current_trace;  // 2. Tracer 'function_graph' is
                                        //    currently set;
    ...
    iter->trace->open(iter);  // 3. Call graph_trace_open() here,
                              //    and memory are allocated in it;
    ...
  }

  s_start() {  // 4. The opened file is being read;
    ...
    *iter->trace = *tr->current_trace;  // 5. If tracer is switched to
                                        //    'nop' or others, then memory
                                        //    in step 3 are leaked!!!
    ...
  }

To fix it, in s_start(), close tracer before switching then reopen the
new tracer after switching. And some tracers like 'wakeup' may not update
'iter->private' in some cases when reopen, then it should be cleared
to avoid being mistakenly closed again.

Link: https://lore.kernel.org/linux-trace-kernel/20230817125539.1646321-1-zhengyejian1@huawei.com

Fixes: d7350c3f45 ("tracing/core: make the read callbacks reentrants")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30 16:11:00 +02:00
Zheng Yejian
7d0c2b0de2 tracing: Fix cpu buffers unavailable due to 'record_disabled' missed
[ Upstream commit b71645d6af10196c46cbe3732de2ea7d36b3ff6d ]

Trace ring buffer can no longer record anything after executing
following commands at the shell prompt:

  # cd /sys/kernel/tracing
  # cat tracing_cpumask
  fff
  # echo 0 > tracing_cpumask
  # echo 1 > snapshot
  # echo fff > tracing_cpumask
  # echo 1 > tracing_on
  # echo "hello world" > trace_marker
  -bash: echo: write error: Bad file descriptor

The root cause is that:
  1. After `echo 0 > tracing_cpumask`, 'record_disabled' of cpu buffers
     in 'tr->array_buffer.buffer' became 1 (see tracing_set_cpumask());
  2. After `echo 1 > snapshot`, 'tr->array_buffer.buffer' is swapped
     with 'tr->max_buffer.buffer', then the 'record_disabled' became 0
     (see update_max_tr());
  3. After `echo fff > tracing_cpumask`, the 'record_disabled' become -1;
Then array_buffer and max_buffer are both unavailable due to value of
'record_disabled' is not 0.

To fix it, enable or disable both array_buffer and max_buffer at the same
time in tracing_set_cpumask().

Link: https://lkml.kernel.org/r/20230805033816.3284594-2-zhengyejian1@huawei.com

Cc: <mhiramat@kernel.org>
Cc: <vnagarnaik@google.com>
Cc: <shuah@kernel.org>
Fixes: 71babb2705 ("tracing: change CPU ring buffer state from tracing_cpumask")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-30 16:11:00 +02:00
jianzhou
e1fcc974b1 Merge keystone/android14-6.1-keystone-qcom-release.6.1.25 (af4467f) into
qcom-6.1

* refs/heads/tmp-af4467f:
  ANDROID: ABI: Update STG ABI to format version 2
  ANDROID: GKI: Update pixel symbol list for thermal
  ANDROID: thermal: Add vendor thermal genl check
  ANDROID: ABI: Update symbol for Exynos SoC
  ANDROID: GKI: Update mtk ABI symbol list
  ANDROID: ABI: Update symbol list for imx
  FROMGIT: Multi-gen LRU: Fix per-zone reclaim
  ANDROID: GKI: Update abi_gki_aarch64_qcom
  ANDROID: ABI: Update STG ABI to format version 2
  BACKPORT: FROMGIT: irqchip/gic-v3: Workaround for GIC-700 erratum 2941627
  ANDROID: ABI: update symbol list for Xclipse GPU
  ANDROID: drm/ttm: export ttm_tt_unpopulate()
  ANDROID: fuse-bpf: Add partial flock support
  ANDROID: Incremental fs: Allocate data buffer based on input request size
  UPSTREAM: gfs2: Don't deref jdesc in evict
  ANDROID: KVM: arm64: Fix MMU context save/restore over TLB invalidation
  ANDROID: Update symbol list for VIVO
  ANDROID: add initial symbol list file for ExynosAuto SoCs
  ANDROID: sched: Export sched_domains_mutex for lockdep
  ANDROID: Update symbol for Exynos SoC
  ANDROID: ABI: Update symbol for Exynos SoC
  ANDROID: Update symbol list for mtk
  UPSTREAM: dma-remap: use kvmalloc_array/kvfree for larger dma memory remap
  ANDROID: vendor_hooks: Supplement the missing hook call point.
  ANDROID: GKI: Add WWAN as GKI protected module
  ANDROID: GKI: regmap: Add regmap vendor hook for of_syscon_register
  UPSTREAM: kasan: suppress recursive reports for HW_TAGS
  UPSTREAM: kasan, arm64: add arch_suppress_tag_checks_start/stop
  UPSTREAM: arm64: mte: rename TCO routines
  BACKPORT: kasan, arm64: rename tagging-related routines
  UPSTREAM: kasan: drop empty tagging-related defines
  ANDROID: usb: xhci-plat: Fix double-free in xhci_plat_remove
  ANDROID: ABI: update symbol list for galaxy
  ANDROID: GKI: update the ABI symbol list
  ANDROID: ABI: Update symbol for Exynos SoC
  ANDROID: GKI: ABI: update whitelist for the kmsg_dump and native_hang symbols used by unisoc for kernel6.1
  ANDROID: ABI: Update symbols to unisoc whitelist for ims_bridge module
  ANDROID: abi_gki_aarch64_qcom: Add drm_plane_from_index and drm_gem_prime_export
  ANDROID: abi_gki_aarch64_qcom: Update symbol list
  UPSTREAM: fsverity: reject FS_IOC_ENABLE_VERITY on mode 3 fds
  UPSTREAM: fsverity: explicitly check for buffer overflow in build_merkle_tree()
  ANDROID: update unisoc symbol list
  ANDROID: update symbol for unisoc whitelist
  UPSTREAM: f2fs: fix deadlock in i_xattr_sem and inode page lock
  ANDROID: GKI: update xiaomi symbol list
  Revert "FROMLIST: f2fs: remove i_xattr_sem to avoid deadlock and fix the original issue"
  ANDROID: ABI: Update pixel symbol list
  ANDROID: Set arch attribute for allmodconfig builds
  UPSTREAM: usb: gadget: udc: renesas_usb3: Fix use after free bug in renesas_usb3_remove due to race condition
  ANDROID: ABI: Add to QCOM symbols list
  UPSTREAM: arm64: mm: pass original fault address to handle_mm_fault() in PER_VMA_LOCK block
  UPSTREAM: media: rkvdec: fix use after free bug in rkvdec_remove
  ANDROID: GKI: Update symbol list for MediatTek
  UPSTREAM: scsi: ufs: core: Remove dedicated hwq for dev command
  BACKPORT: scsi: ufs: mcq: Fix the incorrect OCS value for the device command
  FROMLIST: scsi: ufs: ufs-mediatek: Add MCQ support for MTK platform
  FROMLIST: scsi: ufs: core: Export symbols for MTK driver module
  UPSTREAM: blk-mq: check on cpu id when there is only one ctx mapping
  UPSTREAM: relayfs: fix out-of-bounds access in relay_file_read
  UPSTREAM: net/sched: flower: fix possible OOB write in fl_set_geneve_opt()
  UPSTREAM: x86/mm: Avoid using set_pgd() outside of real PGD pages
  UPSTREAM: iommu/amd: Add missing domain type checks
  UPSTREAM: tty: serial: qcom_geni: avoid duplicate struct member init
  UPSTREAM: scsi: ufs: core: bsg: Fix cast to restricted __be16 warning
  UPSTREAM: netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE
  ANDROID: fix build error when use cpu_cgroup_online vh
  ANDROID: ABI: add android_debug_symbol to whitelist
  ANDROID: defconfig: Enable debug_symbol driver
  ANDROID: android: Create debug_symbols driver
  ANDROID: ABI: update symbol list for exynos
  ANDROID: KVM: arm64: Remove 'struct kvm_vcpu' from the KMI
  UPSTREAM: KVM: arm64: Restore GICv2-on-GICv3 functionality
  UPSTREAM: KVM: arm64: vgic: Wrap vgic_its_create() with config_lock
  UPSTREAM: KVM: arm64: vgic: Fix a circular locking issue
  UPSTREAM: KVM: arm64: vgic: Don't acquire its_lock before config_lock
  BACKPORT: KVM: arm64: Avoid lock inversion when setting the VM register width
  UPSTREAM: KVM: arm64: Avoid vcpu->mutex v. kvm->lock inversion in CPU_ON
  BACKPORT: KVM: arm64: Use config_lock to protect data ordered against KVM_RUN
  UPSTREAM: KVM: arm64: Use config_lock to protect vgic state
  BACKPORT: KVM: arm64: Add helper vgic_write_guest_lock()
  ANDROID: sound: usb: Fix wrong behavior of vendor hooking
  ANDROID: GKI: USB: XHCI: add Android ABI padding to struct xhci_vendor_ops
  Revert "ANDROID: android: Create debug_symbols driver"
  ANDROID: android: Create debug_symbols driver
  UPSTREAM: ipvlan:Fix out-of-bounds caused by unclear skb->cb
  ANDROID: update symbol list for unisoc vendor hook
  ANDROID: thermal: Add hook to enable/disable thermal power throttle
  ANDROID: ABI: Update symbol for Exynos SoC
  BACKPORT: FROMGIT: usb: gadget: udc: Handle gadget_connect failure during bind operation
  FROMGIT: usb: dwc3: gadget: Bail out in pullup if soft reset timeout happens
  ANDROID: GKI: Update symbol list for xiaomi
  ANDROID: vendor_hooks: vendor hook for MM
  ANDROID: add a symbol to unisoc symbol list
  ANDROID: GKI: update symbol list file for xiaomi
  UPSTREAM: net/sched: cls_u32: Fix reference counter leak leading to overflow
  ANDROID: db845c: Fix build when using --kgdb
  FROMGIT: usb: host: xhci-plat: Set XHCI_STATE_REMOVING before resuming XHCI HC
  FROMGIT: usb: host: xhci: Do not re-initialize the XHCI HC if being removed
  FROMLIST: kheaders: dereferences the source tree
  FROMLIST: f2fs: remove i_xattr_sem to avoid deadlock and fix the original issue
  ANDROID: db845c: Local define for db845c targets
  ANDROID: GKI: Update symbols to symbol list
  ANDROID: Export memcg functions to allow module to add new files
  ANDROID: rockpi4: Fix build when using --kgdb
  ANDROID: GKI: update symbol list file for xiaomi
  ANDROID: kleaf: android/gki_system_dlkm_modules is generated.
  ANDROID: ABI: Update pixel symbol list
  ANDROID: fuse-bpf: Move FUSE_RELEASE to correct place
  ANDROID: fuse-bpf: Ensure bpf field can never be nulled
  ANDROID: GKI: Increase CMA areas to 32
  ANDROID: Delete MODULES_LIST from build configs.
  ANDROID: ABI: Update symbols to unisoc whitelist
  ANDROID: HID: Only utilise UHID provided exports if UHID is enabled

 Conflicts:
	BUILD.bazel

Change-Id: Ibeee32bbc28dd5ad943cfb512ae73094cce2027c
Upstream-Build: ks_qcom-android14-6.1-keystone-qcom-release@10659679
UKQ2.230815.001
Signed-off-by: jianzhou <quic_jianzhou@quicinc.com>
2023-08-30 02:32:26 -07:00
qctecmdr
ad947e0d93 Merge "sched/walt: Fix cluster index swap" 2023-08-29 08:37:02 -07:00
Pavankumar Kondeti
7551a1a2a1 ANDROID: cgroup: Add android_rvh_cgroup_force_kthread_migration
In Android GKI, CONFIG_FAIR_GROUP_SCHED is enabled [1] to help
prioritize important work. Given that CPU shares of root cgroup
can't be changed, leaving the tasks inside root cgroup will give
them higher share compared to the other tasks inside important
cgroups. This is mitigated by moving all tasks inside root cgroup to
a different cgroup after Android is booted. However, there are many
kernel tasks stuck in the root cgroup after the boot.

It is possible to relax kernel threads and kworkers migrations under
certain scenarios. However the patch [2] posted at upstream is not
accepted. Hence add a restricted vendor hook to notify modules when a
kernel thread is requested for cgroup migration. The modules can relax
the restrictions forced by the kernel and allow the cgroup migration.

[1] f08f049de1
[2] https://lore.kernel.org/lkml/1617714261-18111-1-git-send-email-pkondeti@codeaurora.org

Bug: 184594949
Change-Id: I445a170ba797c8bece3b4b59b7a42cdd85438f1f
Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com>
[quic_dickey@quicinc.com: port to android-mainline kernel]
Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com>
2023-08-28 23:28:01 +00:00
xieliujie
7afa84fbb9 ANDROID: vendor_hooks: Add hooks for waking up and exiting control
Add hooks at process waking up and exiting routines so that oems
can control these procedures. One possible benifit is the peak of
system load can be shaved and load can be more smooth when a large
number of threads is killed once upon a time, while a sudden peak
of system load can probably lead to user junk issues.

Bug: 296493318
Change-Id: Ide5f9e63a4f50d6a9e3ffbc9516de9ce48ededef
Signed-off-by: xieliujie <xieliujie@oppo.com>
2023-08-25 18:32:06 +00:00
qctecmdr
e42c084b1f Merge "sched/walt: Ensure always in state2 if no partial halted CPUs" 2023-08-25 10:57:11 -07:00
Shaleen Agrawal
8ac97c5dab sched/walt: Fix cluster index swap
In the event of a 4 cluster system, where gold- cluster has a lower
max capacity than gold cluster, the positions of the sched_clusters are
swapped to ensure preset order. However, the cluster ids themselves were
improperly set.

Fix this by ensuring the cluster IDs are appropriately swapped.

Change-Id: I43447a5c1cf89bdbbc7f3ed4eff1a970ada1e3a7
Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com>
2023-08-25 10:47:44 -07:00
liuxudong5
2d7f87b0ff ANDROID: vendor_hooks:vendor hook for percpu-rwsem
We need a new vendor hook for two reasons:
1.The position of the previous vendor hook is inappropriate: when the task wakes up from percpu_rwsem_wait, it will enter a long runnable state, which will cause frame loss when the application starts. In order to solve this problem, we need to let the process enter the "vip" queue when it is woken up, so we need to set a flag for the process holding the lock to prove that it is about to hold the lock. The timing of setting the flag should be at the beginning of percpu_down_read/percpu_down_write rather than the end.
2.Most of this long runnable state occurs in the cgroup_threadgroup_rwsem, so we only care cgroup_threadgroup_rwsem, and cgroup_threadgroup_rwsem should be exported. At the same time, one more parameter "struct percpu_rw_semaphore *sem", is needed for this vendor hook.

Bug: 294496814
Change-Id: I5f014cfb68a60c29bbfd21452336e381e31e81b1
Signed-off-by: liuxudong5 <liuxudong5@xiaomi.com>
2023-08-25 17:34:59 +00:00
Shaleen Agrawal
7acd4b7dba sched/walt: Ensure always in state2 if no partial halted CPUs
is_state1 is supposed to return true, if and only if all CPUs that are
capable of being partially halted are either partially halted or fully
halted.

In the event that there are no partially halted CPUs in a system,
meaning min_partial_cpus is defined as 0, the expectation is for
is_state1 to return false.

However, cpumask_subset will return true if the first source CPU mask is
empty. This leads to unexpected behavior, as it would result in a case
where the system is always under state1 when min_partial_cpus is set to
0, resulting in a side effect where frequencies would never be synced.

Fix this by ensuring that if the number of CPUs that are capable of
being partially halted is 0, is_state1 returns false, thereby ensuring
that in such a system, state2, and therefore frequency sync, is always
the norm.

Change-Id: I2fb7cf27659d42fe713bdacf08db9b7c88c06800
Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com>
2023-08-24 10:38:18 -07:00
qctecmdr
97f56eed2b Merge "sched/walt: Introduce bug_on lockdep failures" 2023-08-24 01:55:46 -07:00
qctecmdr
fc59f95c92 Merge "sched/walt: Fix WALT_BUG crash observed" 2023-08-23 12:28:59 -07:00
Greg Kroah-Hartman
8b02e8901d Merge branch 'android14-6.1' into 'android14-6.1-lts'
Catches the android14-6.1-lts branch up with the android14-6.1 branch
which has had a lot of changes that are needed here to resolve future
LTS merges and to ensure that the ABI is kept stable.

It contains the following commits:

* 9fd41ac172 ANDROID: Delete build.config.gki.aarch64.16k.
* 073df44c36 FROMGIT: usb: typec: tcpm: Refactor the PPS APDO selection
* 078410e73f UPSTREAM: usb: typec: tcpm: Fix response to vsafe0V event
* 722f6cc09c ANDROID: Revert "ANDROID: allmodconfig: disable WERROR"
* c2611a04b9 ANDROID: GKI: update symbol list file for xiaomi
* 34fde9ec08 FROMGIT: usb: typec: tcpm: not sink vbus if operational current is 0mA
* 3ebafb7b46 BACKPORT: FROMGIT: mm: handle faults that merely update the accessed bit under the VMA lock
* 9e066d4b35 FROMLIST: mm: Allow fault_dirty_shared_page() to be called under the VMA lock
* 83ab986324 FROMGIT: mm: handle swap and NUMA PTE faults under the VMA lock
* ffcebdef16 FROMGIT: mm: run the fault-around code under the VMA lock
* 072c35fb69 FROMGIT: mm: move FAULT_FLAG_VMA_LOCK check down from do_fault()
* fa9a8adff0 FROMGIT: mm: move FAULT_FLAG_VMA_LOCK check down in handle_pte_fault()
* dd621869c1 BACKPORT: FROMGIT: mm: handle some PMD faults under the VMA lock
* 8594d6a30f BACKPORT: FROMGIT: mm: handle PUD faults under the VMA lock
* 66cbbe6b31 FROMGIT: mm: move FAULT_FLAG_VMA_LOCK check from handle_mm_fault()
* e26044769f BACKPORT: FROMGIT: mm: allow per-VMA locks on file-backed VMAs
* 4cb518a06f FROMGIT: mm: remove CONFIG_PER_VMA_LOCK ifdefs
* f4b32b7f15 FROMGIT: mm: fix a lockdep issue in vma_assert_write_locked
* 250f19771f FROMGIT: mm: handle userfaults under VMA lock
* e704d0e4f9 FROMGIT: mm: handle swap page faults under per-VMA lock
* f8a65b694b FROMGIT: mm: change folio_lock_or_retry to use vm_fault directly
* 693d905ec0 BACKPORT: FROMGIT: mm: drop per-VMA lock when returning VM_FAULT_RETRY or VM_FAULT_COMPLETED
* 939d4b1ccc BACKPORT: FROMGIT: mm: move vma locking out of vma_prepare and dup_anon_vma
* 0f0b09c02c BACKPORT: FROMGIT: mm: always lock new vma before inserting into vma tree
* a8a479ed96 FROMGIT: mm: lock vma explicitly before doing vm_flags_reset and vm_flags_reset_once
* ad18923856 FROMGIT: mm: replace mmap with vma write lock assertions when operating on a vma
* 5f0ca924aa FROMGIT: mm: for !CONFIG_PER_VMA_LOCK equate write lock assertion for vma and mmap
* abb0f2767e FROMGIT: mm: don't drop VMA locks in mm_drop_all_locks()
* 365af746f5 BACKPORT: riscv: mm: try VMA lock-based page fault handling first
* 3c187b4a12 BACKPORT: FROMGIT: mm: enable page walking API to lock vmas during the walk
* b6093c47fe BACKPORT: mm: lock VMA in dup_anon_vma() before setting ->anon_vma
* 0ee0062c94 UPSTREAM: mm: fix memory ordering for mm_lock_seq and vm_lock_seq
* 3378cbd264 FROMGIT: usb: host: ehci-sched: try to turn on io watchdog as long as periodic_count > 0
* 2d3351bd5e FROMGIT: BACKPORT: usb: ehci: add workaround for chipidea PORTSC.PEC bug
* 7fa8861130 UPSTREAM: tty: n_gsm: fix UAF in gsm_cleanup_mux
* 683966ac69 UPSTREAM: mm/mmap: Fix extra maple tree write
* f86c79eb86 FROMGIT: Multi-gen LRU: skip CMA pages when they are not eligible
* 7ae1e02abb UPSTREAM: mm: skip CMA pages when they are not available
* 7666325265 UPSTREAM: dma-buf: fix an error pointer vs NULL bug
* e61d76121f UPSTREAM: dma-buf: keep the signaling time of merged fences v3
* fda157ce15 UPSTREAM: netfilter: nf_tables: skip bound chain on rule flush
* 110a26edd1 UPSTREAM: net/sched: sch_qfq: account for stab overhead in qfq_enqueue
* 9db1437238 UPSTREAM: net/sched: sch_qfq: refactor parsing of netlink parameters
* 7688102949 UPSTREAM: netfilter: nft_set_pipapo: fix improper element removal
* 37f4509407 ANDROID: Add checkpatch target.
* d7dacaa439 UPSTREAM: USB: Gadget: core: Help prevent panic during UVC unconfigure
* 4dc009c3a8 ANDROID: GKI: Update symbols to symbol list
* fadc35923d ANDROID: vendor_hook: fix the error record position of mutex
* 3fc69d3f70 ANDROID: ABI: add allowed list for galaxy
* a5a662187f ANDROID: gfp: add __GFP_CMA in gfpflag_names
* b520b90913 ANDROID: ABI: Update to fix slab-out-of-bounds in xhci_vendor_get_ops
* c2cbb3cc24 ANDROID: usb: host: fix slab-out-of-bounds in xhci_vendor_get_ops
* 64787ee451 ANDROID: GKI: update pixel symbol list for xhci
* b0c06048a8 FROMGIT: fs: drop_caches: draining pages before dropping caches
* 2f76bb83b1 ANDROID: GKI: update symbol list file for xiaomi
* 8e86825eec ANDROID: uid_sys_stats: Use a single work for deferred updates
* 960d9828ee ANDROID: ABI: Update symbol for Exynos SoC
* 3926cc6ef8 ANDROID: GKI: Add symbols to symbol list for vivo
* dbb09068c1 ANDROID: vendor_hooks: Add tune scan type hook in get_scan_count()
* 5e1d25ac2a FROMGIT: BACKPORT: Multi-gen LRU: Fix can_swap in lru_gen_look_around()
* addf1a9a65 FROMGIT: Multi-gen LRU: Avoid race in inc_min_seq()
* a7adb98897 FROMGIT: Multi-gen LRU: Fix per-zone reclaim
* 03812b904e ANDROID: ABI: update symbol list for galaxy
* b283f9b41f ANDROID: oplus: Update the ABI xml and symbol list
* c3d26e2b5a ANDROID: vendor_hooks: Add hooks for lookaround
* 29e2f3e3d1 ANDROID: ABI: Update STG ABI to format version 2
* 3bd3d13701 ANDROID: ABI: Update symbol list for imx
* ad0b008167 FROMGIT: erofs: fix wrong primary bvec selection on deduplicated extents
* 126ef64cba UPSTREAM: media: Add ABGR64_12 video format
* 86e2e8fd05 BACKPORT: media: Add BGR48_12 video format
* 892293272c UPSTREAM: media: Add YUV48_12 video format
* b2cf7e4268 UPSTREAM: media: Add Y212 v4l2 format info
* 0f3f7a21af UPSTREAM: media: Add Y210, Y212 and Y216 formats
* ca7b45b128 UPSTREAM: media: Add Y012 video format
* 343b85ecad UPSTREAM: media: Add P012 and P012M video format
* 7beed73af0 ANDROID: GKI: Create symbol files in include/config
* 295e779e8f ANDROID: fuse-bpf: Use stored bpf for create_open
* 74d9daa59a ANDROID: fuse-bpf: Add bpf to negative fuse_dentry
* 6aef06abba ANDROID: fuse-bpf: Check inode not null
* 4bbda90bd8 ANDROID: fuse-bpf: Fix flock test compile error
* 84ac22a0d3 ANDROID: fuse-bpf: Add partial ioctl support
* e341d2312c ANDROID: ABI: Update oplus symbol list
* f5c707dc65 UPSTREAM: mm/mempolicy: Take VMA lock before replacing policy
* 890b1aabb1 BACKPORT: mm: lock_vma_under_rcu() must check vma->anon_vma under vma lock
* d3b37a712a BACKPORT: FROMGIT: irqchip/gic-v3: Workaround for GIC-700 erratum 2941627
* a89e2cbbc0 ANDROID: GKI: update xiaomi symbol list
* 371f8d901a UPSTREAM: mm: lock newly mapped VMA with corrected ordering
* 0d9960403c UPSTREAM: fork: lock VMAs of the parent process when forking
* e3601b25ae UPSTREAM: mm: lock newly mapped VMA which can be modified after it becomes visible
* 05f7c7fe72 UPSTREAM: mm: lock a vma before stack expansion
* c0ba567af1 ANDROID: GKI: bring back find_extend_vma()
* 188ce9572f BACKPORT: mm: always expand the stack with the mmap write lock held
* 74efdc0966 BACKPORT: execve: expand new process stack manually ahead of time
* c8ad906849 ANDROID: abi_gki_aarch64_qcom: ufshcd_mcq_poll_cqe_lock
* 1afccd4255 UPSTREAM: mm: make find_extend_vma() fail if write lock not held
* 4087cac574 UPSTREAM: powerpc/mm: convert coprocessor fault to lock_mm_and_find_vma()
* 6c33246824 UPSTREAM: mm/fault: convert remaining simple cases to lock_mm_and_find_vma()
* add0a1ea04 UPSTREAM: arm/mm: Convert to using lock_mm_and_find_vma()
* 9f136450af UPSTREAM: riscv/mm: Convert to using lock_mm_and_find_vma()
* 053053fc68 UPSTREAM: mips/mm: Convert to using lock_mm_and_find_vma()
* 9cdce804c0 UPSTREAM: powerpc/mm: Convert to using lock_mm_and_find_vma()
* 1016faf509 BACKPORT: arch/arm64/mm/fault: Fix undeclared variable error in do_page_fault()
* 89298b8b3c BACKPORT: arm64/mm: Convert to using lock_mm_and_find_vma()
* cf70cb4f1f UPSTREAM: mm: make the page fault mmap locking killable
* 544ae28cf6 ANDROID: Inherit "user-aware property" across rtmutex.
* 5e4a5dc820 BACKPORT: blk-crypto: use dynamic lock class for blk_crypto_profile::lock
* db2c29e53d ANDROID: ABI: update symbol list for Xclipse GPU
* 7edb035c79 ANDROID: drm/ttm: export ttm_tt_unpopulate()
* b61f298c0d ANDROID: GKI: Add ABI symbol list(devlink) for MTK
* ec419af28f ANDROID: devlink: Select CONFIG_NET_DEVLINK in Kconfig.gki
* 1e114e6efa ANDROID: KVM: arm64: Fix memory ordering for pKVM module callbacks
* 3803ae4a28 BACKPORT: mm: introduce new 'lock_mm_and_find_vma()' page fault helper
* 66b5ad3507 BACKPORT: maple_tree: fix potential out-of-bounds access in mas_wr_end_piv()
* 19dd4101e0 UPSTREAM: x86/smp: Cure kexec() vs. mwait_play_dead() breakage
* 26260c4bd1 UPSTREAM: x86/smp: Use dedicated cache-line for mwait_play_dead()
* d8cb0365cb UPSTREAM: x86/smp: Remove pointless wmb()s from native_stop_other_cpus()
* 6744547e95 UPSTREAM: x86/smp: Dont access non-existing CPUID leaf
* ba2ccba863 UPSTREAM: x86/smp: Make stop_other_cpus() more robust
* 5c9836e66d UPSTREAM: x86/microcode/AMD: Load late on both threads too
* 53048f151c BACKPORT: mm, hwpoison: when copy-on-write hits poison, take page offline
* a2dff37b0c UPSTREAM: mm, hwpoison: try to recover from copy-on write faults
* 466448f55f BACKPORT: mm/mmap: Fix error return in do_vmi_align_munmap()
* 41b30362e9 BACKPORT: mm/mmap: Fix error path in do_vmi_align_munmap()
* d45a054f9c UPSTREAM: HID: logitech-hidpp: add HIDPP_QUIRK_DELAYED_INIT for the T651.
* 0e477a82e6 UPSTREAM: HID: hidraw: fix data race on device refcount
* af2d741bf3 UPSTREAM: can: isotp: isotp_sendmsg(): fix return error fix on TX path
* 5887040491 UPSTREAM: fbdev: fix potential OOB read in fast_imageblit()
* 6c48edb9c9 ANDROID: GKI: add function symbols for unisoc
* 342aff08ae ANDROID: cgroup: Cleanup android_rvh_cgroup_force_kthread_migration
* fcdea346bb UPSTREAM: net/sched: cls_fw: Fix improper refcount update leads to use-after-free
* f091cc7434 UPSTREAM: netfilter: nf_tables: fix chain binding transaction logic
* 1bb5e7fb37 ANDROID: abi_gki_aarch64_qcom: update abi

Change-Id: I6f86301f218a60c00d03e09a4e3bfebe42bad0d5
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-08-23 18:31:43 +00:00
Vincent Guittot
8517d73992 sched/fair: Remove capacity inversion detection
commit a2e90611b9f425adbbfcdaa5b5e49958ddf6f61b upstream.

Remove the capacity inversion detection which is now handled by
util_fits_cpu() returning -1 when we need to continue to look for a
potential CPU with better performance.

This ends up almost reverting patches below except for some comments:
commit da07d2f9c153 ("sched/fair: Fixes for capacity inversion detection")
commit aa69c36f31aa ("sched/fair: Consider capacity inversion in util_fits_cpu()")
commit 44c7b80bffc3 ("sched/fair: Detect capacity inversion")

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230201143628.270912-3-vincent.guittot@linaro.org
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-23 17:52:41 +02:00
Vincent Guittot
e8acf9971f sched/fair: unlink misfit task from cpu overutilized
commit e5ed0550c04c5469ecdc1634d8aa18c8609590f0 upstream.

By taking into account uclamp_min, the 1:1 relation between task misfit
and cpu overutilized is no more true as a task with a small util_avg may
not fit a high capacity cpu because of uclamp_min constraint.

Add a new state in util_fits_cpu() to reflect the case that task would fit
a CPU except for the uclamp_min hint which is a performance requirement.

Use -1 to reflect that a CPU doesn't fit only because of uclamp_min so we
can use this new value to take additional action to select the best CPU
that doesn't match uclamp_min hint.

When util_fits_cpu() returns -1, we will continue to look for a possible
CPU with better performance, which replaces Capacity Inversion detection
with capacity_orig_of() - thermal_load_avg to detect a capacity inversion.

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Reviewed-and-tested-by: Qais Yousef <qyousef@layalina.io>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Tested-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
Link: https://lore.kernel.org/r/20230201143628.270912-2-vincent.guittot@linaro.org
Signed-off-by: Qais Yousef (Google) <qyousef@layalina.io>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-23 17:52:41 +02:00
Chen Lin
128c06a34c ring-buffer: Do not swap cpu_buffer during resize process
[ Upstream commit 8a96c0288d0737ad77882024974c075345c72011 ]

When ring_buffer_swap_cpu was called during resize process,
the cpu buffer was swapped in the middle, resulting in incorrect state.
Continuing to run in the wrong state will result in oops.

This issue can be easily reproduced using the following two scripts:
/tmp # cat test1.sh
//#! /bin/sh
for i in `seq 0 100000`
do
         echo 2000 > /sys/kernel/debug/tracing/buffer_size_kb
         sleep 0.5
         echo 5000 > /sys/kernel/debug/tracing/buffer_size_kb
         sleep 0.5
done
/tmp # cat test2.sh
//#! /bin/sh
for i in `seq 0 100000`
do
        echo irqsoff > /sys/kernel/debug/tracing/current_tracer
        sleep 1
        echo nop > /sys/kernel/debug/tracing/current_tracer
        sleep 1
done
/tmp # ./test1.sh &
/tmp # ./test2.sh &

A typical oops log is as follows, sometimes with other different oops logs.

[  231.711293] WARNING: CPU: 0 PID: 9 at kernel/trace/ring_buffer.c:2026 rb_update_pages+0x378/0x3f8
[  231.713375] Modules linked in:
[  231.714735] CPU: 0 PID: 9 Comm: kworker/0:1 Tainted: G        W          6.5.0-rc1-00276-g20edcec23f92 #15
[  231.716750] Hardware name: linux,dummy-virt (DT)
[  231.718152] Workqueue: events update_pages_handler
[  231.719714] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  231.721171] pc : rb_update_pages+0x378/0x3f8
[  231.722212] lr : rb_update_pages+0x25c/0x3f8
[  231.723248] sp : ffff800082b9bd50
[  231.724169] x29: ffff800082b9bd50 x28: ffff8000825f7000 x27: 0000000000000000
[  231.726102] x26: 0000000000000001 x25: fffffffffffff010 x24: 0000000000000ff0
[  231.728122] x23: ffff0000c3a0b600 x22: ffff0000c3a0b5c0 x21: fffffffffffffe0a
[  231.730203] x20: ffff0000c3a0b600 x19: ffff0000c0102400 x18: 0000000000000000
[  231.732329] x17: 0000000000000000 x16: 0000000000000000 x15: 0000ffffe7aa8510
[  231.734212] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000002
[  231.736291] x11: ffff8000826998a8 x10: ffff800082b9baf0 x9 : ffff800081137558
[  231.738195] x8 : fffffc00030e82c8 x7 : 0000000000000000 x6 : 0000000000000001
[  231.740192] x5 : ffff0000ffbafe00 x4 : 0000000000000000 x3 : 0000000000000000
[  231.742118] x2 : 00000000000006aa x1 : 0000000000000001 x0 : ffff0000c0007208
[  231.744196] Call trace:
[  231.744892]  rb_update_pages+0x378/0x3f8
[  231.745893]  update_pages_handler+0x1c/0x38
[  231.746893]  process_one_work+0x1f0/0x468
[  231.747852]  worker_thread+0x54/0x410
[  231.748737]  kthread+0x124/0x138
[  231.749549]  ret_from_fork+0x10/0x20
[  231.750434] ---[ end trace 0000000000000000 ]---
[  233.720486] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[  233.721696] Mem abort info:
[  233.721935]   ESR = 0x0000000096000004
[  233.722283]   EC = 0x25: DABT (current EL), IL = 32 bits
[  233.722596]   SET = 0, FnV = 0
[  233.722805]   EA = 0, S1PTW = 0
[  233.723026]   FSC = 0x04: level 0 translation fault
[  233.723458] Data abort info:
[  233.723734]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[  233.724176]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[  233.724589]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[  233.725075] user pgtable: 4k pages, 48-bit VAs, pgdp=0000000104943000
[  233.725592] [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
[  233.726231] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[  233.726720] Modules linked in:
[  233.727007] CPU: 0 PID: 9 Comm: kworker/0:1 Tainted: G        W          6.5.0-rc1-00276-g20edcec23f92 #15
[  233.727777] Hardware name: linux,dummy-virt (DT)
[  233.728225] Workqueue: events update_pages_handler
[  233.728655] pstate: 200000c5 (nzCv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[  233.729054] pc : rb_update_pages+0x1a8/0x3f8
[  233.729334] lr : rb_update_pages+0x154/0x3f8
[  233.729592] sp : ffff800082b9bd50
[  233.729792] x29: ffff800082b9bd50 x28: ffff8000825f7000 x27: 0000000000000000
[  233.730220] x26: 0000000000000000 x25: ffff800082a8b840 x24: ffff0000c0102418
[  233.730653] x23: 0000000000000000 x22: fffffc000304c880 x21: 0000000000000003
[  233.731105] x20: 00000000000001f4 x19: ffff0000c0102400 x18: ffff800082fcbc58
[  233.731727] x17: 0000000000000000 x16: 0000000000000001 x15: 0000000000000001
[  233.732282] x14: ffff8000825fe0c8 x13: 0000000000000001 x12: 0000000000000000
[  233.732709] x11: ffff8000826998a8 x10: 0000000000000ae0 x9 : ffff8000801b760c
[  233.733148] x8 : fefefefefefefeff x7 : 0000000000000018 x6 : ffff0000c03298c0
[  233.733553] x5 : 0000000000000002 x4 : 0000000000000000 x3 : 0000000000000000
[  233.733972] x2 : ffff0000c3a0b600 x1 : 0000000000000000 x0 : 0000000000000000
[  233.734418] Call trace:
[  233.734593]  rb_update_pages+0x1a8/0x3f8
[  233.734853]  update_pages_handler+0x1c/0x38
[  233.735148]  process_one_work+0x1f0/0x468
[  233.735525]  worker_thread+0x54/0x410
[  233.735852]  kthread+0x124/0x138
[  233.736064]  ret_from_fork+0x10/0x20
[  233.736387] Code: 92400000 910006b5 aa000021 aa0303f7 (f9400060)
[  233.736959] ---[ end trace 0000000000000000 ]---

After analysis, the seq of the error is as follows [1-5]:

int ring_buffer_resize(struct trace_buffer *buffer, unsigned long size,
			int cpu_id)
{
	for_each_buffer_cpu(buffer, cpu) {
		cpu_buffer = buffer->buffers[cpu];
		//1. get cpu_buffer, aka cpu_buffer(A)
		...
		...
		schedule_work_on(cpu,
		 &cpu_buffer->update_pages_work);
		//2. 'update_pages_work' is queue on 'cpu', cpu_buffer(A) is passed to
		// update_pages_handler, do the update process, set 'update_done' in
		// complete(&cpu_buffer->update_done) and to wakeup resize process.
	//---->
		//3. Just at this moment, ring_buffer_swap_cpu is triggered,
		//cpu_buffer(A) be swaped to cpu_buffer(B), the max_buffer.
		//ring_buffer_swap_cpu is called as the 'Call trace' below.

		Call trace:
		 dump_backtrace+0x0/0x2f8
		 show_stack+0x18/0x28
		 dump_stack+0x12c/0x188
		 ring_buffer_swap_cpu+0x2f8/0x328
		 update_max_tr_single+0x180/0x210
		 check_critical_timing+0x2b4/0x2c8
		 tracer_hardirqs_on+0x1c0/0x200
		 trace_hardirqs_on+0xec/0x378
		 el0_svc_common+0x64/0x260
		 do_el0_svc+0x90/0xf8
		 el0_svc+0x20/0x30
		 el0_sync_handler+0xb0/0xb8
		 el0_sync+0x180/0x1c0
	//<----

	/* wait for all the updates to complete */
	for_each_buffer_cpu(buffer, cpu) {
		cpu_buffer = buffer->buffers[cpu];
		//4. get cpu_buffer, cpu_buffer(B) is used in the following process,
		//the state of cpu_buffer(A) and cpu_buffer(B) is totally wrong.
		//for example, cpu_buffer(A)->update_done will leave be set 1, and will
		//not 'wait_for_completion' at the next resize round.
		  if (!cpu_buffer->nr_pages_to_update)
			continue;

		if (cpu_online(cpu))
			wait_for_completion(&cpu_buffer->update_done);
		cpu_buffer->nr_pages_to_update = 0;
	}
	...
}
	//5. the state of cpu_buffer(A) and cpu_buffer(B) is totally wrong,
	//Continuing to run in the wrong state, then oops occurs.

Link: https://lore.kernel.org/linux-trace-kernel/202307191558478409990@zte.com.cn

Signed-off-by: Chen Lin <chen.lin5@zte.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-23 17:52:27 +02:00
gaoxu
b7a34e30d4 dma-remap: use kvmalloc_array/kvfree for larger dma memory remap
[ Upstream commit 51ff97d54f02b4444dfc42e380ac4c058e12d5dd ]

If dma_direct_alloc() alloc memory in size of 64MB, the inner function
dma_common_contiguous_remap() will allocate 128KB memory by invoking
the function kmalloc_array(). and the kmalloc_array seems to fail to try to
allocate 128KB mem.

Call trace:
[14977.928623] qcrosvm: page allocation failure: order:5, mode:0x40cc0
[14977.928638] dump_backtrace.cfi_jt+0x0/0x8
[14977.928647] dump_stack_lvl+0x80/0xb8
[14977.928652] warn_alloc+0x164/0x200
[14977.928657] __alloc_pages_slowpath+0x9f0/0xb4c
[14977.928660] __alloc_pages+0x21c/0x39c
[14977.928662] kmalloc_order+0x48/0x108
[14977.928666] kmalloc_order_trace+0x34/0x154
[14977.928668] __kmalloc+0x548/0x7e4
[14977.928673] dma_direct_alloc+0x11c/0x4f8
[14977.928678] dma_alloc_attrs+0xf4/0x138
[14977.928680] gh_vm_ioctl_set_fw_name+0x3c4/0x610 [gunyah]
[14977.928698] gh_vm_ioctl+0x90/0x14c [gunyah]
[14977.928705] __arm64_sys_ioctl+0x184/0x210

work around by doing kvmalloc_array instead.

Signed-off-by: Gao Xu <gaoxu2@hihonor.com>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-23 17:52:21 +02:00
qctecmdr
303abc30ea Merge "sched/walt: revise incorrect passed parameter for pipeline_reset_boost()" 2023-08-21 12:21:45 -07:00
qctecmdr
c44359bbad Merge "sched/walt: manual pipeline should allow cpu 0" 2023-08-21 12:21:44 -07:00
Tengfei Fan
09d0b48fe9 sched/walt: revise incorrect passed parameter for pipeline_reset_boost()
Revise incorrect passed parameter for pipeline_reset_boost().

Change-Id: I01b4f0f546fcebfea1b474f675ddfe05d5d50c84
Signed-off-by: Tengfei Fan <quic_tengfan@quicinc.com>
2023-08-16 18:44:38 -07:00
Stephen Dickey
bd7a3e0fba sched/walt: manual pipeline should allow cpu 0
If cpu 0 is to be allowed as a pipeline cpu, then the existing
code is incorrect because it will by default skip cpu 0 for
manual pipeline assignment.

Correct this by feeding -1 into cpumask_any_and, in the event
that it is time to roll-over and start re-selecting from the
cpus mask again.

Change-Id: Ia5e170e287db9ff7010555d51da3ea9789e2ec1e
Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com>
2023-08-16 15:12:13 -07:00
Shaleen Agrawal
ff989fe194 sched/walt: Fix WALT_BUG crash observed
A WALT_BUG crash was intermittently observed, where a task's mark_start
would appear to have moved ahead of a task's window_start by more than
one window.

In order to debug this further, a series of additional WALT_BUGs were
introduce, which yielded in a crash where mark_task_starting was being
called by a task whose mark_start had already been set.

In wake_up_new_task, prior to trace_android_rvh_new_task_stats being
referenced (which is where mark_start is supposed to be initialized), it
is possible that another call to UTRA sneaks by during the fork
balancing step, at the end of which, mark_start would be initialized.

Address this by reorganizing mark_task_starting to acknowledge that
another UTRA may have initialized mark_start, and further add clarity by
updating the CPU cycles through a reference to UTRA rather than doing so
directly.

Change-Id: Iae4efad890417d3708048ae95eea0532776ad24a
Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com>
2023-08-16 10:58:37 -07:00
qctecmdr
91de49da25 Merge "sched/walt: Print colocation thresholds" 2023-08-16 10:56:10 -07:00
qctecmdr
976ef61b65 Merge "sched/walt: Introduce WALT_BUG if mark_task_starting occurs twice" 2023-08-15 13:31:20 -07:00
Stephen Dickey
4f09aeb3c8 sched/walt: Introduce bug_on lockdep failures
In the event that lockdep_asserts fail, indicating that runqueue locks
are not held in a path where they should be in WALT, crash instantly to
prevent unpredictable behavior and gain insights to enable fixing the
underlying issue.

Change-Id: I940c958a77221cd1d8eb1976e0f69dfc3b33ce1e
Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com>
Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com>
2023-08-15 11:04:45 -07:00
Shaleen Agrawal
3cec0c1971 sched/walt: Print colocation thresholds
Add colocation thresholds in the trace which helps debug information
regarding colocation status.

Change-Id: I908cd868b0cc9b431a3ba49696d5a0328748bf1c
Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com>
2023-08-14 16:16:47 -07:00
Shaleen Agrawal
12a7d8230a sched/walt: Add tracepoint to find early up/down margin changes
Introduce tracepoint to print the early up/down migration margin
values in the event they are changed by userspace.

Change-Id: Ib642fc24318da654431e8b32fe3a089cee02d27b
Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com>
2023-08-14 16:16:43 -07:00
Shaleen Agrawal
71b73ba11c sched/walt: Add tracepoint to find up/down margin changes
Introduce tracepoint to print the up/down migration margin values in the
event they are changed by userspace.

Change-Id: I7bb39ac5ee26f1f62fc0ed10713b1e3f6edaaab3
Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com>
2023-08-14 16:15:54 -07:00
Shaleen Agrawal
72bc8c7d98 sched/walt: Introduce WALT_BUG if mark_task_starting occurs twice
mark_task_starting should be called on a newly woken up task. At this
point, it's mark_start should not be set. Therefore, if mark_start
happens to already be set, do a WALT_BUG.

Change-Id: I47105da7305df428ce31677442a13a7eb2a67b51
Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com>
2023-08-14 09:48:43 -07:00
Shaleen Agrawal
dd71e2906a sched/walt: Introduce mark start ts
Print out the timestamp at which the task's mark_start is first set to
help debug issues.

Change-Id: Id440e873a75901e8888a68fbe09ba6c51e3acfb1
Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com>
2023-08-14 09:48:43 -07:00
Shaleen Agrawal
42783f9b06 sched/walt: Introduce WALT_BUG if mark start progresses too far
A crash was observed where a task's mark start was observed to be
greater than 1 window boundary away from the task's window start. This
is problematic as it indicates there was a UTRA call that took place,
where the runqueue's window start was updated, but the task's window
start wasn't.

Introduce bug_on to catch this issue earlier, and gain insights from the
stack trace and event to better understand the background.

Change-Id: Ia1546e537ea5722a6065581898e1516274ed53ea
Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com>
2023-08-14 09:48:35 -07:00
Deyao Ren
dc509aa20a Merge remote-tracking branch into HEAD
* keystone/mirror-android14-6.1-2023-07: (111 commits)
  ANDROID: ABI: Update STG ABI to format version 2
  ANDROID: GKI: Update pixel symbol list for thermal
  ANDROID: thermal: Add vendor thermal genl check
  ANDROID: ABI: Update symbol for Exynos SoC
  ANDROID: GKI: Update mtk ABI symbol list
  ANDROID: ABI: Update symbol list for imx
  ANDROID: GKI: Update abi_gki_aarch64_qcom
  BACKPORT: FROMGIT: irqchip/gic-v3: Workaround for GIC-700 erratum 2941627
  ANDROID: ABI: update symbol list for Xclipse GPU
  ANDROID: drm/ttm: export ttm_tt_unpopulate()
  ANDROID: fuse-bpf: Add partial flock support
  ANDROID: Incremental fs: Allocate data buffer based on input request size
  UPSTREAM: gfs2: Don't deref jdesc in evict
  ANDROID: KVM: arm64: Fix MMU context save/restore over TLB invalidation
  ANDROID: Update symbol list for VIVO
  ANDROID: add initial symbol list file for ExynosAuto SoCs
  ANDROID: sched: Export sched_domains_mutex for lockdep
  ANDROID: Update symbol for Exynos SoC
  ANDROID: ABI: Update symbol for Exynos SoC
  ANDROID: Update symbol list for mtk
  ...

Change-Id: I0186f02e9e3b07ea279334a06e33131b2a78c2f4
2023-08-12 00:55:24 +00:00
xieliujie
fadc35923d ANDROID: vendor_hook: fix the error record position of mutex
Make sure vendorhook trace_android_vh_record_mutex_lock_starttime woking both in fastpath unlock and slowpath unlock.

Fixes: 57750518de ("ANDROID: vendor_hook: Avoid clearing protect-flag before waking waiters")
Bug: 286024926
Change-Id: Ib91c1b88d27aaa4ef872d44102969ffc3c9adb58
Signed-off-by: xieliujie <xieliujie@oppo.com>
2023-08-11 16:57:28 +00:00
Hou Tao
7a1178a367 bpf, cpumap: Make sure kthread is running before map update returns
commit 640a604585aa30f93e39b17d4d6ba69fcb1e66c9 upstream.

The following warning was reported when running stress-mode enabled
xdp_redirect_cpu with some RT threads:

  ------------[ cut here ]------------
  WARNING: CPU: 4 PID: 65 at kernel/bpf/cpumap.c:135
  CPU: 4 PID: 65 Comm: kworker/4:1 Not tainted 6.5.0-rc2+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
  Workqueue: events cpu_map_kthread_stop
  RIP: 0010:put_cpu_map_entry+0xda/0x220
  ......
  Call Trace:
   <TASK>
   ? show_regs+0x65/0x70
   ? __warn+0xa5/0x240
   ......
   ? put_cpu_map_entry+0xda/0x220
   cpu_map_kthread_stop+0x41/0x60
   process_one_work+0x6b0/0xb80
   worker_thread+0x96/0x720
   kthread+0x1a5/0x1f0
   ret_from_fork+0x3a/0x70
   ret_from_fork_asm+0x1b/0x30
   </TASK>

The root cause is the same as commit 436901649731 ("bpf: cpumap: Fix memory
leak in cpu_map_update_elem"). The kthread is stopped prematurely by
kthread_stop() in cpu_map_kthread_stop(), and kthread() doesn't call
cpu_map_kthread_run() at all but XDP program has already queued some
frames or skbs into ptr_ring. So when __cpu_map_ring_cleanup() checks
the ptr_ring, it will find it was not emptied and report a warning.

An alternative fix is to use __cpu_map_ring_cleanup() to drop these
pending frames or skbs when kthread_stop() returns -EINTR, but it may
confuse the user, because these frames or skbs have been handled
correctly by XDP program. So instead of dropping these frames or skbs,
just make sure the per-cpu kthread is running before
__cpu_map_entry_alloc() returns.

After apply the fix, the error handle for kthread_stop() will be
unnecessary because it will always return 0, so just remove it.

Fixes: 6710e11269 ("bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Pu Lehui <pulehui@huawei.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230729095107.1722450-2-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-11 12:08:23 +02:00
Jiri Olsa
36dd8ca330 bpf: Disable preemption in bpf_event_output
commit d62cc390c2e99ae267ffe4b8d7e2e08b6c758c32 upstream.

We received report [1] of kernel crash, which is caused by
using nesting protection without disabled preemption.

The bpf_event_output can be called by programs executed by
bpf_prog_run_array_cg function that disabled migration but
keeps preemption enabled.

This can cause task to be preempted by another one inside the
nesting protection and lead eventually to two tasks using same
perf_sample_data buffer and cause crashes like:

  BUG: kernel NULL pointer dereference, address: 0000000000000001
  #PF: supervisor instruction fetch in kernel mode
  #PF: error_code(0x0010) - not-present page
  ...
  ? perf_output_sample+0x12a/0x9a0
  ? finish_task_switch.isra.0+0x81/0x280
  ? perf_event_output+0x66/0xa0
  ? bpf_event_output+0x13a/0x190
  ? bpf_event_output_data+0x22/0x40
  ? bpf_prog_dfc84bbde731b257_cil_sock4_connect+0x40a/0xacb
  ? xa_load+0x87/0xe0
  ? __cgroup_bpf_run_filter_sock_addr+0xc1/0x1a0
  ? release_sock+0x3e/0x90
  ? sk_setsockopt+0x1a1/0x12f0
  ? udp_pre_connect+0x36/0x50
  ? inet_dgram_connect+0x93/0xa0
  ? __sys_connect+0xb4/0xe0
  ? udp_setsockopt+0x27/0x40
  ? __pfx_udp_push_pending_frames+0x10/0x10
  ? __sys_setsockopt+0xdf/0x1a0
  ? __x64_sys_connect+0xf/0x20
  ? do_syscall_64+0x3a/0x90
  ? entry_SYSCALL_64_after_hwframe+0x72/0xdc

Fixing this by disabling preemption in bpf_event_output.

[1] https://github.com/cilium/cilium/issues/26756
Cc: stable@vger.kernel.org
Reported-by: Oleg "livelace" Popov <o.popov@livelace.ru>
Closes: https://github.com/cilium/cilium/issues/26756
Fixes: 2a916f2f54 ("bpf: Use migrate_disable/enable in array macros and cgroup/lirc code.")
Acked-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230725084206.580930-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-11 12:08:21 +02:00
Jiri Olsa
3654ed5daf bpf: Disable preemption in bpf_perf_event_output
commit f2c67a3e60d1071b65848efaa8c3b66c363dd025 upstream.

The nesting protection in bpf_perf_event_output relies on disabled
preemption, which is guaranteed for kprobes and tracepoints.

However bpf_perf_event_output can be also called from uprobes context
through bpf_prog_run_array_sleepable function which disables migration,
but keeps preemption enabled.

This can cause task to be preempted by another one inside the nesting
protection and lead eventually to two tasks using same perf_sample_data
buffer and cause crashes like:

  kernel tried to execute NX-protected page - exploit attempt? (uid: 0)
  BUG: unable to handle page fault for address: ffffffff82be3eea
  ...
  Call Trace:
   ? __die+0x1f/0x70
   ? page_fault_oops+0x176/0x4d0
   ? exc_page_fault+0x132/0x230
   ? asm_exc_page_fault+0x22/0x30
   ? perf_output_sample+0x12b/0x910
   ? perf_event_output+0xd0/0x1d0
   ? bpf_perf_event_output+0x162/0x1d0
   ? bpf_prog_c6271286d9a4c938_krava1+0x76/0x87
   ? __uprobe_perf_func+0x12b/0x540
   ? uprobe_dispatcher+0x2c4/0x430
   ? uprobe_notify_resume+0x2da/0xce0
   ? atomic_notifier_call_chain+0x7b/0x110
   ? exit_to_user_mode_prepare+0x13e/0x290
   ? irqentry_exit_to_user_mode+0x5/0x30
   ? asm_exc_int3+0x35/0x40

Fixing this by disabling preemption in bpf_perf_event_output.

Cc: stable@vger.kernel.org
Fixes: 8c7dcb84e3 ("bpf: implement sleepable uprobes by chaining gps")
Acked-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230725084206.580930-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-11 12:08:20 +02:00
Hou Tao
cbd0004518 bpf, cpumap: Handle skb as well when clean up ptr_ring
[ Upstream commit 7c62b75cd1a792e14b037fa4f61f9b18914e7de1 ]

The following warning was reported when running xdp_redirect_cpu with
both skb-mode and stress-mode enabled:

  ------------[ cut here ]------------
  Incorrect XDP memory type (-2128176192) usage
  WARNING: CPU: 7 PID: 1442 at net/core/xdp.c:405
  Modules linked in:
  CPU: 7 PID: 1442 Comm: kworker/7:0 Tainted: G  6.5.0-rc2+ #1
  Hardware name: QEMU Standard PC (i440FX + PIIX, 1996)
  Workqueue: events __cpu_map_entry_free
  RIP: 0010:__xdp_return+0x1e4/0x4a0
  ......
  Call Trace:
   <TASK>
   ? show_regs+0x65/0x70
   ? __warn+0xa5/0x240
   ? __xdp_return+0x1e4/0x4a0
   ......
   xdp_return_frame+0x4d/0x150
   __cpu_map_entry_free+0xf9/0x230
   process_one_work+0x6b0/0xb80
   worker_thread+0x96/0x720
   kthread+0x1a5/0x1f0
   ret_from_fork+0x3a/0x70
   ret_from_fork_asm+0x1b/0x30
   </TASK>

The reason for the warning is twofold. One is due to the kthread
cpu_map_kthread_run() is stopped prematurely. Another one is
__cpu_map_ring_cleanup() doesn't handle skb mode and treats skbs in
ptr_ring as XDP frames.

Prematurely-stopped kthread will be fixed by the preceding patch and
ptr_ring will be empty when __cpu_map_ring_cleanup() is called. But
as the comments in __cpu_map_ring_cleanup() said, handling and freeing
skbs in ptr_ring as well to "catch any broken behaviour gracefully".

Fixes: 11941f8a85 ("bpf: cpumap: Implement generic cpumap")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20230729095107.1722450-3-houtao@huaweicloud.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-11 12:08:15 +02:00
Peter Zijlstra
15c22cd1de perf: Fix function pointer case
commit 1af6239d1d3e61d33fd2f0ba53d3d1a67cc50574 upstream.

With the advent of CFI it is no longer acceptible to cast function
pointers.

The robot complains thusly:

  kernel-events-core.c⚠️cast-from-int-(-)(struct-perf_cpu_pmu_context-)-to-remote_function_f-(aka-int-(-)(void-)-)-converts-to-incompatible-function-type

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Cixi Geng <cixi.geng1@unisoc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-11 12:08:09 +02:00
jianzhou
572508aff3 Merge keystone/android14-6.1-keystone-qcom-release.6.1.25 (b9d4167) into qcom-6.1
* refs/heads/tmp-b9d4167:
  ANDROID: Snap to android14-6.1-2023-06
  ANDROID: fuse-bpf: Move FUSE_RELEASE to correct place
  BACKPORT: FROMLIST: ovl: get_acl: Fix null pointer dereference at realinode in rcu-walk mode
  BACKPORT: FROMLIST: ovl: ovl_permission: Fix null pointer dereference at realinode in rcu-walk mode
  BACKPORT: FROMLIST: ovl: Let helper ovl_i_path_real() return the realinode

 Conflicts:
	android/abi_gki_aarch64.stg

Change-Id: I1c41d9c5d104ea48b379f9d3e0637447637607ff
Upstream-Build: ks_qcom-android14-6.1-keystone-qcom-release@10638318 UKQ2.230809.001
Signed-off-by: jianzhou <quic_jianzhou@quicinc.com>
2023-08-10 01:28:37 -07:00
qctecmdr
8472a839c4 Merge "msm-sysstats: protect task->files in get_task_unreclaimable_info()" 2023-08-09 21:58:41 -07:00
Elliot Berman
b7e8439a23 ANDROID: Snap to android14-6.1-2023-06
Snap tree to commit 3bccd89f07 ("ANDROID: fuse-bpf: Move FUSE_RELEASE
to correct place") while preserving the known keystone divergences for
consolidated builds.

Change-Id: I30ae6c895f9b7c9767277a1e8c7d7e0e6237319a
Signed-off-by: Elliot Berman <quic_eberman@quicinc.com>
2023-08-08 17:02:27 -07:00
Peter Zijlstra
e0fd83a193 mm: Move mm_cachep initialization to mm_init()
commit af80602799681c78f14fbe20b6185a56020dedee upstream.

In order to allow using mm_alloc() much earlier, move initializing
mm_cachep into mm_init().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221025201057.751153381@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-08 20:03:49 +02:00
Peter Zijlstra
9ae15aaff3 x86/mm: Use mm_alloc() in poking_init()
commit 3f4c8211d982099be693be9aa7d6fc4607dff290 upstream.

Instead of duplicating init_mm, allocate a fresh mm. The advantage is
that mm_alloc() has much simpler dependencies. Additionally it makes
more conceptual sense, init_mm has no (and must not have) user state
to duplicate.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221025201057.816175235@infradead.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-08 20:03:49 +02:00
qctecmdr
7cf19bf953 Merge "sched/walt: Fix issue caused by incorrect ws setting" 2023-08-03 18:25:40 -07:00
qctecmdr
745444b04c Merge "sched/walt: Improve the scheduler" 2023-08-03 18:25:40 -07:00
qctecmdr
8578f05924 Merge "sched/walt: Switch to using new api for state1/2 determination" 2023-08-03 18:25:38 -07:00
qctecmdr
93655eb27d Merge "sched/walt: Use softaffinity for RT tasks" 2023-08-03 18:25:38 -07:00
qctecmdr
843dc11a1b Merge "sched/walt: single-big-thread code passes global variable to halt" 2023-08-03 18:25:37 -07:00
qctecmdr
5366a12273 Merge "sched/walt: Force pipeline promotion to prime if prime worthy" 2023-08-03 18:25:37 -07:00
Steven Rostedt (Google)
0441c44154 tracing: Fix trace_event_raw_event_synth() if else statement
commit 9971c3f944489ff7aacb9d25e0cde841a5f6018a upstream.

The test to check if the field is a stack is to be done if it is not a
string. But the code had:

    } if (event->fields[i]->is_stack) {

and not

   } else if (event->fields[i]->is_stack) {

which would cause it to always be tested. Worse yet, this also included an
"else" statement that was only to be called if the field was not a string
and a stack, but this code allows it to be called if it was a string (and
not a stack).

Also fixed some whitespace issues.

Link: https://lore.kernel.org/all/202301302110.mEtNwkBD-lkp@intel.com/
Link: https://lore.kernel.org/linux-trace-kernel/20230131095237.63e3ca8d@gandalf.local.home

Cc: Tom Zanussi <zanussi@kernel.org>
Fixes: 00cf3d672a9d ("tracing: Allow synthetic events to pass around stacktraces")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-08-03 10:24:17 +02:00
Peter Zijlstra
7f1715d827 locking/rtmutex: Fix task->pi_waiters integrity
[ Upstream commit f7853c34241807bb97673a5e97719123be39a09e ]

Henry reported that rt_mutex_adjust_prio_check() has an ordering
problem and puts the lie to the comment in [7]. Sharing the sort key
between lock->waiters and owner->pi_waiters *does* create problems,
since unlike what the comment claims, holding [L] is insufficient.

Notably, consider:

	A
      /   \
     M1   M2
     |     |
     B     C

That is, task A owns both M1 and M2, B and C block on them. In this
case a concurrent chain walk (B & C) will modify their resp. sort keys
in [7] while holding M1->wait_lock and M2->wait_lock. So holding [L]
is meaningless, they're different Ls.

This then gives rise to a race condition between [7] and [11], where
the requeue of pi_waiters will observe an inconsistent tree order.

	B				C

  (holds M1->wait_lock,		(holds M2->wait_lock,
   holds B->pi_lock)		 holds A->pi_lock)

  [7]
  waiter_update_prio();
  ...
  [8]
  raw_spin_unlock(B->pi_lock);
  ...
  [10]
  raw_spin_lock(A->pi_lock);

				[11]
				rt_mutex_enqueue_pi();
				// observes inconsistent A->pi_waiters
				// tree order

Fixing this means either extending the range of the owner lock from
[10-13] to [6-13], with the immediate problem that this means [6-8]
hold both blocked and owner locks, or duplicating the sort key.

Since the locking in chain walk is horrible enough without having to
consider pi_lock nesting rules, duplicate the sort key instead.

By giving each tree their own sort key, the above race becomes
harmless, if C sees B at the old location, then B will correct things
(if they need correcting) when it walks up the chain and reaches A.

Fixes: fb00aca474 ("rtmutex: Turn the plist into an rb-tree")
Reported-by: Henry Wu <triangletrap12@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Henry Wu <triangletrap12@gmail.com>
Link: https://lkml.kernel.org/r/20230707161052.GF2883469%40hirez.programming.kicks-ass.net
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-03 10:24:14 +02:00
Zheng Yejian
a3a3c7bdda tracing: Fix warning in trace_buffered_event_disable()
[ Upstream commit dea499781a1150d285c62b26659f62fb00824fce ]

Warning happened in trace_buffered_event_disable() at
  WARN_ON_ONCE(!trace_buffered_event_ref)

  Call Trace:
   ? __warn+0xa5/0x1b0
   ? trace_buffered_event_disable+0x189/0x1b0
   __ftrace_event_enable_disable+0x19e/0x3e0
   free_probe_data+0x3b/0xa0
   unregister_ftrace_function_probe_func+0x6b8/0x800
   event_enable_func+0x2f0/0x3d0
   ftrace_process_regex.isra.0+0x12d/0x1b0
   ftrace_filter_write+0xe6/0x140
   vfs_write+0x1c9/0x6f0
   [...]

The cause of the warning is in __ftrace_event_enable_disable(),
trace_buffered_event_enable() was called once while
trace_buffered_event_disable() was called twice.
Reproduction script show as below, for analysis, see the comments:
 ```
 #!/bin/bash

 cd /sys/kernel/tracing/

 # 1. Register a 'disable_event' command, then:
 #    1) SOFT_DISABLED_BIT was set;
 #    2) trace_buffered_event_enable() was called first time;
 echo 'cmdline_proc_show:disable_event:initcall:initcall_finish' > \
     set_ftrace_filter

 # 2. Enable the event registered, then:
 #    1) SOFT_DISABLED_BIT was cleared;
 #    2) trace_buffered_event_disable() was called first time;
 echo 1 > events/initcall/initcall_finish/enable

 # 3. Try to call into cmdline_proc_show(), then SOFT_DISABLED_BIT was
 #    set again!!!
 cat /proc/cmdline

 # 4. Unregister the 'disable_event' command, then:
 #    1) SOFT_DISABLED_BIT was cleared again;
 #    2) trace_buffered_event_disable() was called second time!!!
 echo '!cmdline_proc_show:disable_event:initcall:initcall_finish' > \
     set_ftrace_filter
 ```

To fix it, IIUC, we can change to call trace_buffered_event_enable() at
fist time soft-mode enabled, and call trace_buffered_event_disable() at
last time soft-mode disabled.

Link: https://lore.kernel.org/linux-trace-kernel/20230726095804.920457-1-zhengyejian1@huawei.com

Cc: <mhiramat@kernel.org>
Fixes: 0fc1b09ff1 ("tracing: Use temp buffer when filtering events")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-03 10:24:07 +02:00
Zheng Yejian
77996fa5c6 ring-buffer: Fix wrong stat of cpu_buffer->read
[ Upstream commit 2d093282b0d4357373497f65db6a05eb0c28b7c8 ]

When pages are removed in rb_remove_pages(), 'cpu_buffer->read' is set
to 0 in order to make sure any read iterators reset themselves. However,
this will mess 'entries' stating, see following steps:

  # cd /sys/kernel/tracing/
  # 1. Enlarge ring buffer prepare for later reducing:
  # echo 20 > per_cpu/cpu0/buffer_size_kb
  # 2. Write a log into ring buffer of cpu0:
  # taskset -c 0 echo "hello1" > trace_marker
  # 3. Read the log:
  # cat per_cpu/cpu0/trace_pipe
       <...>-332     [000] .....    62.406844: tracing_mark_write: hello1
  # 4. Stop reading and see the stats, now 0 entries, and 1 event readed:
  # cat per_cpu/cpu0/stats
   entries: 0
   [...]
   read events: 1
  # 5. Reduce the ring buffer
  # echo 7 > per_cpu/cpu0/buffer_size_kb
  # 6. Now entries became unexpected 1 because actually no entries!!!
  # cat per_cpu/cpu0/stats
   entries: 1
   [...]
   read events: 0

To fix it, introduce 'page_removed' field to count total removed pages
since last reset, then use it to let read iterators reset themselves
instead of changing the 'read' pointer.

Link: https://lore.kernel.org/linux-trace-kernel/20230724054040.3489499-1-zhengyejian1@huawei.com

Cc: <mhiramat@kernel.org>
Cc: <vnagarnaik@google.com>
Fixes: 83f40318da ("ring-buffer: Make removal of ring buffer pages atomic")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-03 10:24:07 +02:00
Linus Torvalds
7218974aba mm: suppress mm fault logging if fatal signal already pending
[ Upstream commit 5f0bc0b042fc77ff70e14c790abdec960cde4ec1 ]

Commit eda0047296a1 ("mm: make the page fault mmap locking killable")
intentionally made it much easier to trigger the "page fault fails
because a fatal signal is pending" situation, by having the mmap locking
fail early in that case.

We have long aborted page faults in other fatal cases when the actual IO
for a page is interrupted by SIGKILL - which is particularly useful for
the traditional case of NFS hanging due to network issues, but local
filesystems could cause it too if you happened to get the SIGKILL while
waiting for a page to be faulted in (eg lock_folio_maybe_drop_mmap()).

So aborting the page fault wasn't a new condition - but it now triggers
earlier, before we even get to 'handle_mm_fault()'.  And as a result the
error doesn't go through our 'fault_signal_pending()' logic, and doesn't
get filtered away there.

Normally you'd never even notice, because if a fatal signal is pending,
the new SIGSEGV we send ends up being ignored anyway.

But it turns out that there is one very noticeable exception: if you
enable 'show_unhandled_signals', the aborted page fault will be logged
in the kernel messages, and you'll get a scary line looking something
like this in your logs:

  pverados[2183248]: segfault at 55e5a00f9ae0 ip 000055e5a00f9ae0 sp 00007ffc0720bea8 error 14 in perl[55e5a00d4000+195000] likely on CPU 10 (core 4, socket 0)

which is rather misleading.  It's not really a segfault at all, it's
just "the thread was killed before the page fault completed, so we
aborted the page fault".

Fix this by just making it clear that a pending fatal signal means that
any new signal coming in after that is implicitly handled.  This will
avoid the misleading logging, since now the signal isn't 'unhandled' any
more.

Reported-and-tested-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Link: https://lore.kernel.org/lkml/8d063a26-43f5-0bb7-3203-c6a04dc159f8@proxmox.com/
Acked-by: Oleg Nesterov <oleg@redhat.com>
Fixes: eda0047296a1 ("mm: make the page fault mmap locking killable")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-03 10:24:01 +02:00
Masami Hiramatsu (Google)
a6e2a0e414 tracing/probes: Fix to record 0-length data_loc in fetch_store_string*() if fails
[ Upstream commit 797311bce5c2ac90b8d65e357603cfd410d36ebb ]

Fix to record 0-length data to data_loc in fetch_store_string*() if it fails
to get the string data.
Currently those expect that the data_loc is updated by store_trace_args() if
it returns the error code. However, that does not work correctly if the
argument is an array of strings. In that case, store_trace_args() only clears
the first entry of the array (which may have no error) and leaves other
entries. So it should be cleared by fetch_store_string*() itself.
Also, 'dyndata' and 'maxlen' in store_trace_args() should be updated
only if it is used (ret > 0 and argument is a dynamic data.)

Link: https://lore.kernel.org/all/168908496683.123124.4761206188794205601.stgit@devnote2/

Fixes: 40b53b7718 ("tracing: probeevent: Add array type support")
Cc: stable@vger.kernel.org
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-03 10:23:55 +02:00
Masami Hiramatsu (Google)
bee9946688 Revert "tracing: Add "(fault)" name injection to kernel probes"
[ Upstream commit 4ed8f337dee32df71435689c19d22e4ee846e15a ]

This reverts commit 2e9906f84f.

It was turned out that commit 2e9906f84f ("tracing: Add "(fault)"
name injection to kernel probes") did not work correctly and probe
events still show just '(fault)' (instead of '"(fault)"'). Also,
current '(fault)' is more explicit that it faulted.

This also moves FAULT_STRING macro to trace.h so that synthetic
event can keep using it, and uses it in trace_probe.c too.

Link: https://lore.kernel.org/all/168908495772.123124.1250788051922100079.stgit@devnote2/
Link: https://lore.kernel.org/all/20230706230642.3793a593@rorschach.local.home/

Cc: stable@vger.kernel.org
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Stable-dep-of: 797311bce5c2 ("tracing/probes: Fix to record 0-length data_loc in fetch_store_string*() if fails")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-03 10:23:55 +02:00
Steven Rostedt (Google)
f3baa42afe tracing: Allow synthetic events to pass around stacktraces
[ Upstream commit 00cf3d672a9dd409418647e9f98784c339c3ff63 ]

Allow a stacktrace from one event to be displayed by the end event of a
synthetic event. This is very useful when looking for the longest latency
of a sleep or something blocked on I/O.

 # cd /sys/kernel/tracing/
 # echo 's:block_lat pid_t pid; u64 delta; unsigned long[] stack;' > dynamic_events
 # echo 'hist:keys=next_pid:ts=common_timestamp.usecs,st=stacktrace  if prev_state == 1||prev_state == 2' > events/sched/sched_switch/trigger
 # echo 'hist:keys=prev_pid:delta=common_timestamp.usecs-$ts,s=$st:onmax($delta).trace(block_lat,prev_pid,$delta,$s)' >> events/sched/sched_switch/trigger

The above creates a "block_lat" synthetic event that take the stacktrace of
when a task schedules out in either the interruptible or uninterruptible
states, and on a new per process max $delta (the time it was scheduled
out), will print the process id and the stacktrace.

  # echo 1 > events/synthetic/block_lat/enable
  # cat trace
 #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
 #              | |         |   |||||     |         |
    kworker/u16:0-767     [006] d..4.   560.645045: block_lat: pid=767 delta=66 stack=STACK:
 => __schedule
 => schedule
 => pipe_read
 => vfs_read
 => ksys_read
 => do_syscall_64
 => 0x966000aa

           <idle>-0       [003] d..4.   561.132117: block_lat: pid=0 delta=413787 stack=STACK:
 => __schedule
 => schedule
 => schedule_hrtimeout_range_clock
 => do_sys_poll
 => __x64_sys_poll
 => do_syscall_64
 => 0x966000aa

            <...>-153     [006] d..4.   562.068407: block_lat: pid=153 delta=54 stack=STACK:
 => __schedule
 => schedule
 => io_schedule
 => rq_qos_wait
 => wbt_wait
 => __rq_qos_throttle
 => blk_mq_submit_bio
 => submit_bio_noacct_nocheck
 => ext4_bio_write_page
 => mpage_submit_page
 => mpage_process_page_bufs
 => mpage_prepare_extent_to_map
 => ext4_do_writepages
 => ext4_writepages
 => do_writepages
 => __writeback_single_inode

Link: https://lkml.kernel.org/r/20230117152236.010941267@goodmis.org

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Ross Zwisler <zwisler@google.com>
Cc: Ching-lin Yu <chinglinyu@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Stable-dep-of: 797311bce5c2 ("tracing/probes: Fix to record 0-length data_loc in fetch_store_string*() if fails")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-03 10:23:55 +02:00
Masami Hiramatsu (Google)
d92ee6bce1 tracing/probes: Fix to avoid double count of the string length on the array
[ Upstream commit 66bcf65d6cf0ca6540e2341e88ee7ef02dbdda08 ]

If an array is specified with the ustring or symstr, the length of the
strings are accumlated on both of 'ret' and 'total', which means the
length is double counted.
Just set the length to the 'ret' value for avoiding double counting.

Link: https://lore.kernel.org/all/168908492917.123124.15076463491122036025.stgit@devnote2/

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/8819b154-2ba1-43c3-98a2-cbde20892023@moroto.mountain/
Fixes: 88903c4643 ("tracing/probe: Add ustring type for user-space string")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-03 10:23:54 +02:00
Masami Hiramatsu (Google)
16cc222026 tracing/probes: Add symstr type for dynamic events
[ Upstream commit b26a124cbfa80f42bfc4e63e1d5643ca98159d66 ]

Add 'symstr' type for storing the kernel symbol as a string data
instead of the symbol address. This allows us to filter the
events by wildcard symbol name.

e.g.
  # echo 'e:wqfunc workqueue.workqueue_execute_start symname=$function:symstr' >> dynamic_events
  # cat events/eprobes/wqfunc/format
  name: wqfunc
  ID: 2110
  format:
  	field:unsigned short common_type;	offset:0;	size:2;	signed:0;
  	field:unsigned char common_flags;	offset:2;	size:1;	signed:0;
  	field:unsigned char common_preempt_count;	offset:3;	size:1;	signed:0;
  	field:int common_pid;	offset:4;	size:4;	signed:1;

  	field:__data_loc char[] symname;	offset:8;	size:4;	signed:1;

  print fmt: " symname=\"%s\"", __get_str(symname)

Note that there is already 'symbol' type which just change the
print format (so it still stores the symbol address in the tracing
ring buffer.) On the other hand, 'symstr' type stores the actual
"symbol+offset/size" data as a string.

Link: https://lore.kernel.org/all/166679930847.1528100.4124308529180235965.stgit@devnote3/

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Stable-dep-of: 66bcf65d6cf0 ("tracing/probes: Fix to avoid double count of the string length on the array")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-08-03 10:23:54 +02:00
Stephen Dickey
6bb50d6b0d sched/walt: single-big-thread code passes global variable to halt
There is a single global variable used for a mask of which cpus
to manipulate for single-big-thread (sbt) purposes. This mask is
passed into the halt and resume apis.

This is incorrect, because the halt/resume apis can manipulate the
passed mask, indicating which cpus were actually halted or unhalted
in a given operation. Since the mask gets changed, the code forgets
which cpus to use for sbt.

Update the main sbt check routine such that a local copy of the
cpus to be used for sbt is made, and that local copy passed to
the halt/resume routines.

Change-Id: I9280f1bf600565dc63f0a9d9f84536d50e31fbd6
Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com>
2023-08-01 16:06:46 -07:00
Stephen Dickey
6896fa5076 sched/walt: core control sbt trace
Add tracepoint to see current state of sbt, as well as critical
decision making criteria.

Change-Id: Id24a14714ec9e28469f78be48f12b71725cc8bba
Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com>
2023-08-01 16:06:38 -07:00
Srinivasarao Pathipati
0b93b90ef7 msm-sysstats: protect task->files in get_task_unreclaimable_info()
Race is observed between task exiting and accessing task's files
from get_task_unreclaimable_info(), fix it by locking it.

Change-Id: Ie43426696aa09222fdc4bbb2533c8a8fd18a0f7c
Signed-off-by: Srinivasarao Pathipati <quic_c_spathi@quicinc.com>
2023-08-01 15:01:04 -07:00
Ramji Jiyani
7beed73af0 ANDROID: GKI: Create symbol files in include/config
Create input symbol files to generate GKI modules header
under include/config. By placing files in this generated
directory, the default filters that ignore certain files
will work without any special handling required, and they
will also be available to inspect after the build to inspect
for the debugging purposes.

abi_gki_protected_exports: Input for gki_module_protected_exports.h
From :- ${objtree}/abi_gki_protected_exports
To :- include/config/abi_gki_protected_exports

all_kmi_symbols: Input for gki_module_unprotected.h
- Rename to abi_gki_kmi_symbols
From :- all_kmi_symbols
To :- include/config/abi_gki_kmi_symbols

Bug: 286529877
Test: TH
Test: Manual verification of the generated files
Change-Id: Iafa10631e7712a8e1e87a2f56cfd614de6b1053a
Signed-off-by: Ramji Jiyani <ramjiyani@google.com>
2023-08-01 21:21:29 +00:00
Stephen Dickey
06dde6c45f sched/walt: Force pipeline promotion to prime if prime worthy
For automatic (heavy) pipeline searching the prime cpu will
not get used when there are only a few pipeline entries. This
will leave the prime cpu unoccupied, when one of the pipeline
tasks may be prime worthy.

To handle this, detect if there are any heavy tasks that are
prime worthy already, and promote the prime worthy tasks to
prime. Since pipeline cpus are reassigned every window rollover,
and automatic detection assigns cpus from lowest to highest,
it is unnecessary to demote a task from prime as this will
regardless.

For pipeline (manual) searching the same issue exists and must
be handled.

For the manual case, when the number of pipeline tasks is few,
but a prime_wts has been found, determine if the task on prime
is prime worthy. If it isn't, it must be demoted to non-prime.
Any remaining prime worthy task must then be found.

Change-Id: I15c9417a14c5860bf48edc1c3443fdc0b1255f42
Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com>
2023-07-31 14:46:10 -07:00
Stephen Dickey
f56bc00a2c sched/walt: move code to find prime and other wts to common
In preparation for finding and promoting (and demoting) prime
worth (and unworthy) tasks, create an api that can be
reused to find the pipeline task currently assigned to prime
and the pipeline task that is max demand.

Change-Id: I3b2334482b74c62598a2449e8938c920cfda85b2
Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com>
2023-07-31 14:45:55 -07:00
Shaleen Agrawal
3ee1a9ef9b sched/walt: add window based hysteresis
Ensure that pipeline tasks which have a demand greater than another
pipeline task running on prime only swap with prime after 4 windows.

Change-Id: I28b3e46f476f4f09682ae2caffc2cae04d76fae5
Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com>
2023-07-31 14:45:04 -07:00
Shaleen Agrawal
d0e8a6729a sched/walt: Generalize pipeline types enum
Clean up the enum of pipeline types, in an effort to
simplify usage in followup changes.

Change-Id: I451f04fc0b0f4d5f4cf98f071961e1ed451e94cc
Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com>
2023-07-31 14:44:42 -07:00
Shaleen Agrawal
71c0eb4846 sched/walt: Use coloc demand for pipeline task swaps
Since coloc demand represents averaged demand across history of past few
windows, the growth of pipeline task demands will be steadier,
theoretically resulting in less bouncing between prime and other CPUs.

Change-Id: I9a5c92fbc1c26591889e51be1e65273ebe2d27b4
Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com>
2023-07-31 14:44:17 -07:00
Stephen Dickey
395e904279 sched/walt: pipeline_set_boost must handle auto and manual pipeline
pipeline_set_boost does not handle the case properly, where both
manual and auto pipeline hinting are enabled or disabled, independently.
For example, this series of events

 pipeline_set_boost(true, MANUAL_PIPELINE);
  - auto pipeline enabled
 pipeline_set_boost(true, AUTO_PIPELINE);
  - auto pipeline disabled\
 pipeline_set_boost(false, MANUAL_PIPELINE);

With the above, pipeline boost will be enabled, even though manual
pipeline is no longer requesting it be enabled, and auto pipeline
is disabled.

Correct the code to independently track state of the AUTO or MANUAL
pipeline, and using that information to decide if boost is currently
requested or not.

Change-Id: Ia31eb8cd45b417f55f1e6827953073f708820ce6
Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com>
2023-07-31 14:43:43 -07:00
Shaleen Agrawal
0b472b73c3 sched/walt: Use softaffinity for RT tasks
Ensure that RT task placements respect userspace defined soft-affinity,
but keep using broader affinity in dire straights.

Change-Id: I01d004c3660ebab8aa3451468fa31b7c13d64e64
Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com>
2023-07-28 17:16:21 -07:00
Shaleen Agrawal
7ba3485efe sched/walt: Add per task reduce affinity feature
Create the skeleton for a per task tunable, to be used to indicate when
certain CPUs of a task should be ignored even if a task is affined to
them.

Change-Id: I8c0f1f01bbc416b5f5569dd4de1fc18971b8a5d4
Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com>
2023-07-28 17:16:21 -07:00
Shaleen Agrawal
3971aa7788 sched/walt: Fix issue caused by incorrect ws setting
WALT cpufreq gov notes freq transitions and window rollovers so that it
can generate an accurate avg freq the cluster was subject to.

For this accounting purposes, it needs to have an incrementing view of
window start timestamps as freq updates come in. Specifically it needs
the current window_start, walt_load->ws, greater than the last recorded
window start, wg_policy->last_ws.

A crash was observed with walt_load->ws set to zero while
wg_policy->last_ws is valid.

This likely is because of an issue in 'enable_shared_rail_boost' feature
where the current code fails to update the walt_load->ws, and simply
runs off whatever was set during an earlier update - coupled with a
reset of walt_load->ws when a cpu comes online in the gov->start
callback.

Fix this issue by ensuring the walt_load->ws is set regardless of the
sync features by simply calling __cpu_util_freq_walt() prior to adjusting
the loads based on siblings.

Change-Id: I57a598ec64cd0d1ae8c1d6d88e3232191839ded7
Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com>
2023-07-28 10:02:56 -07:00
Shaleen Agrawal
e453c52b94 sched/walt: Adjust cpu clusters for mid cap CPUs
In the event of a 4 cluster system, where gold- cluster has a lower
max capacity than gold cluster, ensure the ordering of the 4 cluster
system is maintained in the preset order.

Change-Id: Iff6cfb9ad93917c8b5ee94986165fabdf92b7c57
Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com>
2023-07-28 10:01:50 -07:00
Shaleen Agrawal
f4bd28d5e8 sched/walt: Switch to using new api for state1/2 determination
If a cluster containing CPUs that are capable of being partially halted
is in a state where all CPUs in that cluster are either partially halted
or fully halted, consider the system to be in state1.

Change-Id: I470690bb74b5617d028e15a211bc5f877c17b8a3
Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com>
2023-07-27 16:00:14 -07:00
Suren Baghdasaryan
0d9960403c UPSTREAM: fork: lock VMAs of the parent process when forking
When forking a child process, the parent write-protects anonymous pages
and COW-shares them with the child being forked using copy_present_pte().

We must not take any concurrent page faults on the source vma's as they
are being processed, as we expect both the vma and the pte's behind it
to be stable.  For example, the anon_vma_fork() expects the parents
vma->anon_vma to not change during the vma copy.

A concurrent page fault on a page newly marked read-only by the page
copy might trigger wp_page_copy() and a anon_vma_prepare(vma) on the
source vma, defeating the anon_vma_clone() that wasn't done because the
parent vma originally didn't have an anon_vma, but we now might end up
copying a pte entry for a page that has one.

Before the per-vma lock based changes, the mmap_lock guaranteed
exclusion with concurrent page faults.  But now we need to do a
vma_start_write() to make sure no concurrent faults happen on this vma
while it is being processed.

This fix can potentially regress some fork-heavy workloads.  Kernel
build time did not show noticeable regression on a 56-core machine while
a stress test mapping 10000 VMAs and forking 5000 times in a tight loop
shows ~5% regression.  If such fork time regression is unacceptable,
disabling CONFIG_PER_VMA_LOCK should restore its performance.  Further
optimizations are possible if this regression proves to be problematic.

Suggested-by: David Hildenbrand <david@redhat.com>
Reported-by: Jiri Slaby <jirislaby@kernel.org>
Closes: https://lore.kernel.org/all/dbdef34c-3a07-5951-e1ae-e9c6e3cdf51b@kernel.org/
Reported-by: Holger Hoffstätte <holger@applied-asynchrony.com>
Closes: https://lore.kernel.org/all/b198d649-f4bf-b971-31d0-e8433ec2a34c@applied-asynchrony.com/
Reported-by: Jacob Young <jacobly.alt@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=217624
Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
Cc: stable@vger.kernel.org
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit fb49c455323ff8319a123dd312be9082c49a23a5)
Change-Id: Ic5aa9dc51a888b5b0319ec4ec6d2941424573ca0
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-27 12:19:09 +00:00
Andrii Nakryiko
4c8f30a2ad bpf: aggressively forget precise markings during state checkpointing
[ Upstream commit 7a830b53c17bbadcf99f778f28aaaa4e6c41df5f ]

Exploit the property of about-to-be-checkpointed state to be able to
forget all precise markings up to that point even more aggressively. We
now clear all potentially inherited precise markings right before
checkpointing and branching off into child state. If any of children
states require precise knowledge of any SCALAR register, those will be
propagated backwards later on before this state is finalized, preserving
correctness.

There is a single selftests BPF program change, but tremendous one: 25x
reduction in number of verified instructions and states in
trace_virtqueue_add_sgs.

Cilium results are more modest, but happen across wider range of programs.

SELFTESTS RESULTS
=================

$ ./veristat -C -e file,prog,insns,states ~/imprecise-early-results.csv ~/imprecise-aggressive-results.csv | grep -v '+0'
File                 Program                  Total insns (A)  Total insns (B)  Total insns (DIFF)  Total states (A)  Total states (B)  Total states (DIFF)
-------------------  -----------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
loop6.bpf.linked1.o  trace_virtqueue_add_sgs           398057            15114   -382943 (-96.20%)              8717               336      -8381 (-96.15%)
-------------------  -----------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------

CILIUM RESULTS
==============

$ ./veristat -C -e file,prog,insns,states ~/imprecise-early-results-cilium.csv ~/imprecise-aggressive-results-cilium.csv | grep -v '+0'
File           Program                           Total insns (A)  Total insns (B)  Total insns (DIFF)  Total states (A)  Total states (B)  Total states (DIFF)
-------------  --------------------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
bpf_host.o     tail_handle_nat_fwd_ipv4                    23426            23221       -205 (-0.88%)              1537              1515         -22 (-1.43%)
bpf_host.o     tail_handle_nat_fwd_ipv6                    13009            12904       -105 (-0.81%)               719               708         -11 (-1.53%)
bpf_host.o     tail_nodeport_nat_ingress_ipv6               5261             5196        -65 (-1.24%)               247               243          -4 (-1.62%)
bpf_host.o     tail_nodeport_nat_ipv6_egress                3446             3406        -40 (-1.16%)               203               198          -5 (-2.46%)
bpf_lxc.o      tail_handle_nat_fwd_ipv4                    23426            23221       -205 (-0.88%)              1537              1515         -22 (-1.43%)
bpf_lxc.o      tail_handle_nat_fwd_ipv6                    13009            12904       -105 (-0.81%)               719               708         -11 (-1.53%)
bpf_lxc.o      tail_ipv4_ct_egress                          5074             4897       -177 (-3.49%)               255               248          -7 (-2.75%)
bpf_lxc.o      tail_ipv4_ct_ingress                         5100             4923       -177 (-3.47%)               255               248          -7 (-2.75%)
bpf_lxc.o      tail_ipv4_ct_ingress_policy_only             5100             4923       -177 (-3.47%)               255               248          -7 (-2.75%)
bpf_lxc.o      tail_ipv6_ct_egress                          4558             4536        -22 (-0.48%)               188               187          -1 (-0.53%)
bpf_lxc.o      tail_ipv6_ct_ingress                         4578             4556        -22 (-0.48%)               188               187          -1 (-0.53%)
bpf_lxc.o      tail_ipv6_ct_ingress_policy_only             4578             4556        -22 (-0.48%)               188               187          -1 (-0.53%)
bpf_lxc.o      tail_nodeport_nat_ingress_ipv6               5261             5196        -65 (-1.24%)               247               243          -4 (-1.62%)
bpf_overlay.o  tail_nodeport_nat_ingress_ipv6               5261             5196        -65 (-1.24%)               247               243          -4 (-1.62%)
bpf_overlay.o  tail_nodeport_nat_ipv6_egress                3482             3442        -40 (-1.15%)               204               201          -3 (-1.47%)
bpf_xdp.o      tail_nodeport_nat_egress_ipv4               17200            15619      -1581 (-9.19%)              1111              1010        -101 (-9.09%)
-------------  --------------------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221104163649.121784-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-27 08:50:51 +02:00
Andrii Nakryiko
8b57a37d0e bpf: stop setting precise in current state
[ Upstream commit f63181b6ae79fd3b034cde641db774268c2c3acf ]

Setting reg->precise to true in current state is not necessary from
correctness standpoint, but it does pessimise the whole precision (or
rather "imprecision", because that's what we want to keep as much as
possible) tracking. Why is somewhat subtle and my best attempt to
explain this is recorded in an extensive comment for __mark_chain_precise()
function. Some more careful thinking and code reading is probably required
still to grok this completely, unfortunately. Whiteboarding and a bunch
of extra handwaiving in person would be even more helpful, but is deemed
impractical in Git commit.

Next patch pushes this imprecision property even further, building on top of
the insights described in this patch.

End results are pretty nice, we get reduction in number of total instructions
and states verified due to a better states reuse, as some of the states are now
more generic and permissive due to less unnecessary precise=true requirements.

SELFTESTS RESULTS
=================

$ ./veristat -C -e file,prog,insns,states ~/subprog-precise-results.csv ~/imprecise-early-results.csv | grep -v '+0'
File                                     Program                 Total insns (A)  Total insns (B)  Total insns (DIFF)  Total states (A)  Total states (B)  Total states (DIFF)
---------------------------------------  ----------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
bpf_iter_ksym.bpf.linked1.o              dump_ksym                           347              285       -62 (-17.87%)                20                19          -1 (-5.00%)
pyperf600_bpf_loop.bpf.linked1.o         on_event                           3678             3736        +58 (+1.58%)               276               285          +9 (+3.26%)
setget_sockopt.bpf.linked1.o             skops_sockopt                      4038             3947        -91 (-2.25%)               347               343          -4 (-1.15%)
test_l4lb.bpf.linked1.o                  balancer_ingress                   4559             2611     -1948 (-42.73%)               118               105        -13 (-11.02%)
test_l4lb_noinline.bpf.linked1.o         balancer_ingress                   6279             6268        -11 (-0.18%)               237               236          -1 (-0.42%)
test_misc_tcp_hdr_options.bpf.linked1.o  misc_estab                         1307             1303         -4 (-0.31%)               100                99          -1 (-1.00%)
test_sk_lookup.bpf.linked1.o             ctx_narrow_access                   456              447         -9 (-1.97%)                39                38          -1 (-2.56%)
test_sysctl_loop1.bpf.linked1.o          sysctl_tcp_mem                     1389             1384         -5 (-0.36%)                26                25          -1 (-3.85%)
test_tc_dtime.bpf.linked1.o              egress_fwdns_prio101                518              485        -33 (-6.37%)                51                46          -5 (-9.80%)
test_tc_dtime.bpf.linked1.o              egress_host                         519              468        -51 (-9.83%)                50                44         -6 (-12.00%)
test_tc_dtime.bpf.linked1.o              ingress_fwdns_prio101               842             1000      +158 (+18.76%)                73                88        +15 (+20.55%)
xdp_synproxy_kern.bpf.linked1.o          syncookie_tc                     405757           373173     -32584 (-8.03%)             25735             22882      -2853 (-11.09%)
xdp_synproxy_kern.bpf.linked1.o          syncookie_xdp                    479055           371590   -107465 (-22.43%)             29145             22207      -6938 (-23.81%)
---------------------------------------  ----------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------

Slight regression in test_tc_dtime.bpf.linked1.o/ingress_fwdns_prio101
is left for a follow up, there might be some more precision-related bugs
in existing BPF verifier logic.

CILIUM RESULTS
==============

$ ./veristat -C -e file,prog,insns,states ~/subprog-precise-results-cilium.csv ~/imprecise-early-results-cilium.csv | grep -v '+0'
File           Program                         Total insns (A)  Total insns (B)  Total insns (DIFF)  Total states (A)  Total states (B)  Total states (DIFF)
-------------  ------------------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
bpf_host.o     cil_from_host                               762              556      -206 (-27.03%)                43                37         -6 (-13.95%)
bpf_host.o     tail_handle_nat_fwd_ipv4                  23541            23426       -115 (-0.49%)              1538              1537          -1 (-0.07%)
bpf_host.o     tail_nodeport_nat_egress_ipv4             33592            33566        -26 (-0.08%)              2163              2161          -2 (-0.09%)
bpf_lxc.o      tail_handle_nat_fwd_ipv4                  23541            23426       -115 (-0.49%)              1538              1537          -1 (-0.07%)
bpf_overlay.o  tail_nodeport_nat_egress_ipv4             33581            33543        -38 (-0.11%)              2160              2157          -3 (-0.14%)
bpf_xdp.o      tail_handle_nat_fwd_ipv4                  21659            20920       -739 (-3.41%)              1440              1376         -64 (-4.44%)
bpf_xdp.o      tail_handle_nat_fwd_ipv6                  17084            17039        -45 (-0.26%)               907               905          -2 (-0.22%)
bpf_xdp.o      tail_lb_ipv4                              73442            73430        -12 (-0.02%)              4370              4369          -1 (-0.02%)
bpf_xdp.o      tail_lb_ipv6                             152114           151895       -219 (-0.14%)              6493              6479         -14 (-0.22%)
bpf_xdp.o      tail_nodeport_nat_egress_ipv4             17377            17200       -177 (-1.02%)              1125              1111         -14 (-1.24%)
bpf_xdp.o      tail_nodeport_nat_ingress_ipv6             6405             6397         -8 (-0.12%)               309               308          -1 (-0.32%)
bpf_xdp.o      tail_rev_nodeport_lb4                      7126             6934       -192 (-2.69%)               414               402         -12 (-2.90%)
bpf_xdp.o      tail_rev_nodeport_lb6                     18059            17905       -154 (-0.85%)              1105              1096          -9 (-0.81%)
-------------  ------------------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221104163649.121784-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-27 08:50:50 +02:00
Andrii Nakryiko
56675ddcb0 bpf: allow precision tracking for programs with subprogs
[ Upstream commit be2ef8161572ec1973124ebc50f56dafc2925e07 ]

Stop forcing precise=true for SCALAR registers when BPF program has any
subprograms. Current restriction means that any BPF program, as soon as
it uses subprograms, will end up not getting any of the precision
tracking benefits in reduction of number of verified states.

This patch keeps the fallback mark_all_scalars_precise() behavior if
precise marking has to cross function frames. E.g., if subprogram
requires R1 (first input arg) to be marked precise, ideally we'd need to
backtrack to the parent function and keep marking R1 and its
dependencies as precise. But right now we give up and force all the
SCALARs in any of the current and parent states to be forced to
precise=true. We can lift that restriction in the future.

But this patch fixes two issues identified when trying to enable
precision tracking for subprogs.

First, prevent "escaping" from top-most state in a global subprog. While
with entry-level BPF program we never end up requesting precision for
R1-R5 registers, because R2-R5 are not initialized (and so not readable
in correct BPF program), and R1 is PTR_TO_CTX, not SCALAR, and so is
implicitly precise. With global subprogs, though, it's different, as
global subprog a) can have up to 5 SCALAR input arguments, which might
get marked as precise=true and b) it is validated in isolation from its
main entry BPF program. b) means that we can end up exhausting parent
state chain and still not mark all registers in reg_mask as precise,
which would lead to verifier bug warning.

To handle that, we need to consider two cases. First, if the very first
state is not immediately "checkpointed" (i.e., stored in state lookup
hashtable), it will get correct first_insn_idx and last_insn_idx
instruction set during state checkpointing. As such, this case is
already handled and __mark_chain_precision() already handles that by
just doing nothing when we reach to the very first parent state.
st->parent will be NULL and we'll just stop. Perhaps some extra check
for reg_mask and stack_mask is due here, but this patch doesn't address
that issue.

More problematic second case is when global function's initial state is
immediately checkpointed before we manage to process the very first
instruction. This is happening because when there is a call to global
subprog from the main program the very first subprog's instruction is
marked as pruning point, so before we manage to process first
instruction we have to check and checkpoint state. This patch adds
a special handling for such "empty" state, which is identified by having
st->last_insn_idx set to -1. In such case, we check that we are indeed
validating global subprog, and with some sanity checking we mark input
args as precise if requested.

Note that we also initialize state->first_insn_idx with correct start
insn_idx offset. For main program zero is correct value, but for any
subprog it's quite confusing to not have first_insn_idx set. This
doesn't have any functional impact, but helps with debugging and state
printing. We also explicitly initialize state->last_insns_idx instead of
relying on is_state_visited() to do this with env->prev_insns_idx, which
will be -1 on the very first instruction. This concludes necessary
changes to handle specifically global subprog's precision tracking.

Second identified problem was missed handling of BPF helper functions
that call into subprogs (e.g., bpf_loop and few others). From precision
tracking and backtracking logic's standpoint those are effectively calls
into subprogs and should be called as BPF_PSEUDO_CALL calls.

This patch takes the least intrusive way and just checks against a short
list of current BPF helpers that do call subprogs, encapsulated in
is_callback_calling_function() function. But to prevent accidentally
forgetting to add new BPF helpers to this "list", we also do a sanity
check in __check_func_call, which has to be called for each such special
BPF helper, to validate that BPF helper is indeed recognized as
callback-calling one. This should catch any missed checks in the future.
Adding some special flags to be added in function proto definitions
seemed like an overkill in this case.

With the above changes, it's possible to remove forceful setting of
reg->precise to true in __mark_reg_unknown, which turns on precision
tracking both inside subprogs and entry progs that have subprogs. No
warnings or errors were detected across all the selftests, but also when
validating with veristat against internal Meta BPF objects and Cilium
objects. Further, in some BPF programs there are noticeable reduction in
number of states and instructions validated due to more effective
precision tracking, especially benefiting syncookie test.

$ ./veristat -C -e file,prog,insns,states ~/baseline-results.csv ~/subprog-precise-results.csv  | grep -v '+0'
File                                      Program                     Total insns (A)  Total insns (B)  Total insns (DIFF)  Total states (A)  Total states (B)  Total states (DIFF)
----------------------------------------  --------------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------
pyperf600_bpf_loop.bpf.linked1.o          on_event                               3966             3678       -288 (-7.26%)               306               276         -30 (-9.80%)
pyperf_global.bpf.linked1.o               on_event                               7563             7530        -33 (-0.44%)               520               517          -3 (-0.58%)
pyperf_subprogs.bpf.linked1.o             on_event                              36358            36934       +576 (+1.58%)              2499              2531         +32 (+1.28%)
setget_sockopt.bpf.linked1.o              skops_sockopt                          3965             4038        +73 (+1.84%)               343               347          +4 (+1.17%)
test_cls_redirect_subprogs.bpf.linked1.o  cls_redirect                          64965            64901        -64 (-0.10%)              4619              4612          -7 (-0.15%)
test_misc_tcp_hdr_options.bpf.linked1.o   misc_estab                             1491             1307      -184 (-12.34%)               110               100         -10 (-9.09%)
test_pkt_access.bpf.linked1.o             test_pkt_access                         354              349         -5 (-1.41%)                25                24          -1 (-4.00%)
test_sock_fields.bpf.linked1.o            egress_read_sock_fields                 435              375       -60 (-13.79%)                22                20          -2 (-9.09%)
test_sysctl_loop2.bpf.linked1.o           sysctl_tcp_mem                         1508             1501         -7 (-0.46%)                29                28          -1 (-3.45%)
test_tc_dtime.bpf.linked1.o               egress_fwdns_prio100                    468              435        -33 (-7.05%)                45                41          -4 (-8.89%)
test_tc_dtime.bpf.linked1.o               ingress_fwdns_prio100                   398              408        +10 (+2.51%)                42                39          -3 (-7.14%)
test_tc_dtime.bpf.linked1.o               ingress_fwdns_prio101                  1096              842      -254 (-23.18%)                97                73        -24 (-24.74%)
test_tcp_hdr_options.bpf.linked1.o        estab                                  2758             2408      -350 (-12.69%)               208               181        -27 (-12.98%)
test_urandom_usdt.bpf.linked1.o           urand_read_with_sema                    466              448        -18 (-3.86%)                31                28          -3 (-9.68%)
test_urandom_usdt.bpf.linked1.o           urand_read_without_sema                 466              448        -18 (-3.86%)                31                28          -3 (-9.68%)
test_urandom_usdt.bpf.linked1.o           urandlib_read_with_sema                 466              448        -18 (-3.86%)                31                28          -3 (-9.68%)
test_urandom_usdt.bpf.linked1.o           urandlib_read_without_sema              466              448        -18 (-3.86%)                31                28          -3 (-9.68%)
test_xdp_noinline.bpf.linked1.o           balancer_ingress_v6                    4302             4294         -8 (-0.19%)               257               256          -1 (-0.39%)
xdp_synproxy_kern.bpf.linked1.o           syncookie_tc                         583722           405757   -177965 (-30.49%)             35846             25735     -10111 (-28.21%)
xdp_synproxy_kern.bpf.linked1.o           syncookie_xdp                        609123           479055   -130068 (-21.35%)             35452             29145      -6307 (-17.79%)
----------------------------------------  --------------------------  ---------------  ---------------  ------------------  ----------------  ----------------  -------------------

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20221104163649.121784-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-27 08:50:50 +02:00
Mohamed Khalfella
61622fa379 tracing/histograms: Return an error if we fail to add histogram to hist_vars list
commit 4b8b3905165ef98386a3c06f196c85d21292d029 upstream.

Commit 6018b585e8c6 ("tracing/histograms: Add histograms to hist_vars if
they have referenced variables") added a check to fail histogram creation
if save_hist_vars() failed to add histogram to hist_vars list. But the
commit failed to set ret to failed return code before jumping to
unregister histogram, fix it.

Link: https://lore.kernel.org/linux-trace-kernel/20230714203341.51396-1-mkhalfella@purestorage.com

Cc: stable@vger.kernel.org
Fixes: 6018b585e8c6 ("tracing/histograms: Add histograms to hist_vars if they have referenced variables")
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-27 08:50:49 +02:00
Kumar Kartikeya Dwivedi
8620c53ced bpf: Repeat check_max_stack_depth for async callbacks
[ Upstream commit b5e9ad522c4ccd32d322877515cff8d47ed731b9 ]

While the check_max_stack_depth function explores call chains emanating
from the main prog, which is typically enough to cover all possible call
chains, it doesn't explore those rooted at async callbacks unless the
async callback will have been directly called, since unlike non-async
callbacks it skips their instruction exploration as they don't
contribute to stack depth.

It could be the case that the async callback leads to a callchain which
exceeds the stack depth, but this is never reachable while only
exploring the entry point from main subprog. Hence, repeat the check for
the main subprog *and* all async callbacks marked by the symbolic
execution pass of the verifier, as execution of the program may begin at
any of them.

Consider functions with following stack depths:
main: 256
async: 256
foo: 256

main:
    rX = async
    bpf_timer_set_callback(...)

async:
    foo()

Here, async is not descended as it does not contribute to stack depth of
main (since it is referenced using bpf_pseudo_func and not
bpf_pseudo_call). However, when async is invoked asynchronously, it will
end up breaching the MAX_BPF_STACK limit by calling foo.

Hence, in addition to main, we also need to explore call chains
beginning at all async callback subprogs in a program.

Fixes: 7ddc80a476 ("bpf: Teach stack depth check about async callbacks.")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230717161530.1238-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-27 08:50:44 +02:00
Kumar Kartikeya Dwivedi
d55ff358b0 bpf: Fix subprog idx logic in check_max_stack_depth
[ Upstream commit ba7b3e7d5f9014be65879ede8fd599cb222901c9 ]

The assignment to idx in check_max_stack_depth happens once we see a
bpf_pseudo_call or bpf_pseudo_func. This is not an issue as the rest of
the code performs a few checks and then pushes the frame to the frame
stack, except the case of async callbacks. If the async callback case
causes the loop iteration to be skipped, the idx assignment will be
incorrect on the next iteration of the loop. The value stored in the
frame stack (as the subprogno of the current subprog) will be incorrect.

This leads to incorrect checks and incorrect tail_call_reachable
marking. Save the target subprog in a new variable and only assign to
idx once we are done with the is_async_cb check which may skip pushing
of frame to the frame stack and subsequent stack depth checks and tail
call markings.

Fixes: 7ddc80a476 ("bpf: Teach stack depth check about async callbacks.")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230717161530.1238-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-27 08:50:44 +02:00
Yonghong Song
f4c0a6b8ce kallsyms: strip LTO-only suffixes from promoted global functions
[ Upstream commit 8cc32a9bbf2934d90762d9de0187adcb5ad46a11 ]

Commit 6eb4bd92c1 ("kallsyms: strip LTO suffixes from static functions")
stripped all function/variable suffixes started with '.' regardless
of whether those suffixes are generated at LTO mode or not. In fact,
as far as I know, in LTO mode, when a static function/variable is
promoted to the global scope, '.llvm.<...>' suffix is added.

The existing mechanism breaks live patch for a LTO kernel even if
no <symbol>.llvm.<...> symbols are involved. For example, for the following
kernel symbols:
  $ grep bpf_verifier_vlog /proc/kallsyms
  ffffffff81549f60 t bpf_verifier_vlog
  ffffffff8268b430 d bpf_verifier_vlog._entry
  ffffffff8282a958 d bpf_verifier_vlog._entry_ptr
  ffffffff82e12a1f d bpf_verifier_vlog.__already_done
'bpf_verifier_vlog' is a static function. '_entry', '_entry_ptr' and
'__already_done' are static variables used inside 'bpf_verifier_vlog',
so llvm promotes them to file-level static with prefix 'bpf_verifier_vlog.'.
Note that the func-level to file-level static function promotion also
happens without LTO.

Given a symbol name 'bpf_verifier_vlog', with LTO kernel, current mechanism will
return 4 symbols to live patch subsystem which current live patching
subsystem cannot handle it. With non-LTO kernel, only one symbol
is returned.

In [1], we have a lengthy discussion, the suggestion is to separate two
cases:
  (1). new symbols with suffix which are generated regardless of whether
       LTO is enabled or not, and
  (2). new symbols with suffix generated only when LTO is enabled.

The cleanup_symbol_name() should only remove suffixes for case (2).
Case (1) should not be changed so it can work uniformly with or without LTO.

This patch removed LTO-only suffix '.llvm.<...>' so live patching and
tracing should work the same way for non-LTO kernel.
The cleanup_symbol_name() in scripts/kallsyms.c is also changed to have the same
filtering pattern so both kernel and kallsyms tool have the same
expectation on the order of symbols.

 [1] https://lore.kernel.org/live-patching/20230615170048.2382735-1-song@kernel.org/T/#u

Fixes: 6eb4bd92c1 ("kallsyms: strip LTO suffixes from static functions")
Reported-by: Song Liu <song@kernel.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230628181926.4102448-1-yhs@fb.com
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-27 08:50:39 +02:00
Zhen Lei
28fdfda791 kallsyms: Improve the performance of kallsyms_lookup_name()
[ Upstream commit 60443c88f3a89fd303a9e8c0e84895910675c316 ]

Currently, to search for a symbol, we need to expand the symbols in
'kallsyms_names' one by one, and then use the expanded string for
comparison. It's O(n).

If we sort names in ascending order like addresses, we can also use
binary search. It's O(log(n)).

In order not to change the implementation of "/proc/kallsyms", the table
kallsyms_names[] is still stored in a one-to-one correspondence with the
address in ascending order.

Add array kallsyms_seqs_of_names[], it's indexed by the sequence number
of the sorted names, and the corresponding content is the sequence number
of the sorted addresses. For example:
Assume that the index of NameX in array kallsyms_seqs_of_names[] is 'i',
the content of kallsyms_seqs_of_names[i] is 'k', then the corresponding
address of NameX is kallsyms_addresses[k]. The offset in kallsyms_names[]
is get_symbol_offset(k).

Note that the memory usage will increase by (4 * kallsyms_num_syms)
bytes, the next two patches will reduce (1 * kallsyms_num_syms) bytes
and properly handle the case CONFIG_LTO_CLANG=y.

Performance test results: (x86)
Before:
min=234, max=10364402, avg=5206926
min=267, max=11168517, avg=5207587
After:
min=1016, max=90894, avg=7272
min=1014, max=93470, avg=7293

The average lookup performance of kallsyms_lookup_name() improved 715x.

Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Stable-dep-of: 8cc32a9bbf29 ("kallsyms: strip LTO-only suffixes from promoted global functions")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-27 08:50:39 +02:00
Suren Baghdasaryan
92cc015332 sched/psi: use kernfs polling functions for PSI trigger polling
[ Upstream commit aff037078ecaecf34a7c2afab1341815f90fba5e ]

Destroying psi trigger in cgroup_file_release causes UAF issues when
a cgroup is removed from under a polling process. This is happening
because cgroup removal causes a call to cgroup_file_release while the
actual file is still alive. Destroying the trigger at this point would
also destroy its waitqueue head and if there is still a polling process
on that file accessing the waitqueue, it will step on the freed pointer:

do_select
  vfs_poll
                           do_rmdir
                             cgroup_rmdir
                               kernfs_drain_open_files
                                 cgroup_file_release
                                   cgroup_pressure_release
                                     psi_trigger_destroy
                                       wake_up_pollfree(&t->event_wait)
// vfs_poll is unblocked
                                       synchronize_rcu
                                       kfree(t)
  poll_freewait -> UAF access to the trigger's waitqueue head

Patch [1] fixed this issue for epoll() case using wake_up_pollfree(),
however the same issue exists for synchronous poll() case.
The root cause of this issue is that the lifecycles of the psi trigger's
waitqueue and of the file associated with the trigger are different. Fix
this by using kernfs_generic_poll function when polling on cgroup-specific
psi triggers. It internally uses kernfs_open_node->poll waitqueue head
with its lifecycle tied to the file's lifecycle. This also renders the
fix in [1] obsolete, so revert it.

[1] commit c2dbe32d5db5 ("sched/psi: Fix use-after-free in ep_remove_wait_queue()")

Fixes: 0e94682b73 ("psi: introduce psi monitor")
Closes: https://lore.kernel.org/all/20230613062306.101831-1-lujialin4@huawei.com/
Reported-by: Lu Jialin <lujialin4@huawei.com>
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20230630005612.1014540-1-surenb@google.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-27 08:50:38 +02:00
Domenico Cerasuolo
d5dca19776 sched/psi: Allow unprivileged polling of N*2s period
[ Upstream commit d82caa273565b45fcf103148950549af76c314b0 ]

PSI offers 2 mechanisms to get information about a specific resource
pressure. One is reading from /proc/pressure/<resource>, which gives
average pressures aggregated every 2s. The other is creating a pollable
fd for a specific resource and cgroup.

The trigger creation requires CAP_SYS_RESOURCE, and gives the
possibility to pick specific time window and threshold, spawing an RT
thread to aggregate the data.

Systemd would like to provide containers the option to monitor pressure
on their own cgroup and sub-cgroups. For example, if systemd launches a
container that itself then launches services, the container should have
the ability to poll() for pressure in individual services. But neither
the container nor the services are privileged.

This patch implements a mechanism to allow unprivileged users to create
pressure triggers. The difference with privileged triggers creation is
that unprivileged ones must have a time window that's a multiple of 2s.
This is so that we can avoid unrestricted spawning of rt threads, and
use instead the same aggregation mechanism done for the averages, which
runs independently of any triggers.

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20230330105418.77061-5-cerasuolodomenico@gmail.com
Stable-dep-of: aff037078eca ("sched/psi: use kernfs polling functions for PSI trigger polling")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-27 08:50:38 +02:00
Domenico Cerasuolo
fb4bc32fc1 sched/psi: Extract update_triggers side effect
[ Upstream commit 4468fcae49f08e88fbbffe05b29496192df89991 ]

This change moves update_total flag out of update_triggers function,
currently called only in psi_poll_work.
In the next patch, update_triggers will be called also in psi_avgs_work,
but the total update information is specific to psi_poll_work.
Returning update_total value to the caller let us avoid differentiating
the implementation of update_triggers for different aggregators.

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20230330105418.77061-4-cerasuolodomenico@gmail.com
Stable-dep-of: aff037078eca ("sched/psi: use kernfs polling functions for PSI trigger polling")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-27 08:50:37 +02:00
Domenico Cerasuolo
c1623d4d0b sched/psi: Rename existing poll members in preparation
[ Upstream commit 65457b74aa9437418e552e8d52d7112d4f9901a6 ]

Renaming in PSI implementation to make a clear distinction between
privileged and unprivileged triggers code to be implemented in the
next patch.

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20230330105418.77061-3-cerasuolodomenico@gmail.com
Stable-dep-of: aff037078eca ("sched/psi: use kernfs polling functions for PSI trigger polling")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-27 08:50:37 +02:00
Domenico Cerasuolo
c176dda0a6 sched/psi: Rearrange polling code in preparation
[ Upstream commit 7fab21fa0d000a0ea32d73ce8eec68557c6c268b ]

Move a few functions up in the file to avoid forward declaration needed
in the patch implementing unprivileged PSI triggers.

Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Domenico Cerasuolo <cerasuolodomenico@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Link: https://lore.kernel.org/r/20230330105418.77061-2-cerasuolodomenico@gmail.com
Stable-dep-of: aff037078eca ("sched/psi: use kernfs polling functions for PSI trigger polling")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-27 08:50:37 +02:00
Chengming Zhou
7d8bba4da1 sched/psi: Fix avgs_work re-arm in psi_avgs_work()
[ Upstream commit 2fcd7bbae90a6d844da8660a9d27079281dfbba2 ]

Pavan reported a problem that PSI avgs_work idle shutoff is not
working at all. Because PSI_NONIDLE condition would be observed in
psi_avgs_work()->collect_percpu_times()->get_recent_times() even if
only the kworker running avgs_work on the CPU.

Although commit 1b69ac6b40 ("psi: fix aggregation idle shut-off")
avoided the ping-pong wake problem when the worker sleep, psi_avgs_work()
still will always re-arm the avgs_work, so shutoff is not working.

This patch changes to use PSI_STATE_RESCHEDULE to flag whether to
re-arm avgs_work in get_recent_times(). For the current CPU, we re-arm
avgs_work only when (NR_RUNNING > 1 || NR_IOWAIT > 0 || NR_MEMSTALL > 0),
for other CPUs we can just check PSI_NONIDLE delta. The new flag
is only used in psi_avgs_work(), so we check in get_recent_times()
that current_work() is avgs_work.

One potential problem is that the brief period of non-idle time
incurred between the aggregation run and the kworker's dequeue will
be stranded in the per-cpu buckets until avgs_work run next time.
The buckets can hold 4s worth of time, and future activity will wake
the avgs_work with a 2s delay, giving us 2s worth of data we can leave
behind when shut off the avgs_work. If the kworker run other works after
avgs_work shut off and doesn't have any scheduler activities for 2s,
this maybe a problem.

Reported-by: Pavan Kondeti <quic_pkondeti@quicinc.com>
Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Suren Baghdasaryan <surenb@google.com>
Tested-by: Chengming Zhou <zhouchengming@bytedance.com>
Link: https://lore.kernel.org/r/20221014110551.22695-1-zhouchengming@bytedance.com
Stable-dep-of: aff037078eca ("sched/psi: use kernfs polling functions for PSI trigger polling")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-27 08:50:37 +02:00
Miaohe Lin
45f739e8fb sched/fair: Use recent_used_cpu to test p->cpus_ptr
[ Upstream commit ae2ad293d6be143ad223f5f947cca07bcbe42595 ]

When checking whether a recently used CPU can be a potential idle
candidate, recent_used_cpu should be used to test p->cpus_ptr as
p->recent_used_cpu is not equal to recent_used_cpu and candidate
decision is made based on recent_used_cpu here.

Fixes: 89aafd67f2 ("sched/fair: Use prev instead of new target as recent_used_cpu")
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Phil Auld <pauld@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Link: https://lore.kernel.org/r/20230620080747.359122-1-linmiaohe@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-27 08:50:37 +02:00
Martin KaFai Lau
c006fe361c bpf: Address KCSAN report on bpf_lru_list
[ Upstream commit ee9fd0ac3017c4313be91a220a9ac4c99dde7ad4 ]

KCSAN reported a data-race when accessing node->ref.
Although node->ref does not have to be accurate,
take this chance to use a more common READ_ONCE() and WRITE_ONCE()
pattern instead of data_race().

There is an existing bpf_lru_node_is_ref() and bpf_lru_node_set_ref().
This patch also adds bpf_lru_node_clear_ref() to do the
WRITE_ONCE(node->ref, 0) also.

==================================================================
BUG: KCSAN: data-race in __bpf_lru_list_rotate / __htab_lru_percpu_map_update_elem

write to 0xffff888137038deb of 1 bytes by task 11240 on cpu 1:
__bpf_lru_node_move kernel/bpf/bpf_lru_list.c:113 [inline]
__bpf_lru_list_rotate_active kernel/bpf/bpf_lru_list.c:149 [inline]
__bpf_lru_list_rotate+0x1bf/0x750 kernel/bpf/bpf_lru_list.c:240
bpf_lru_list_pop_free_to_local kernel/bpf/bpf_lru_list.c:329 [inline]
bpf_common_lru_pop_free kernel/bpf/bpf_lru_list.c:447 [inline]
bpf_lru_pop_free+0x638/0xe20 kernel/bpf/bpf_lru_list.c:499
prealloc_lru_pop kernel/bpf/hashtab.c:290 [inline]
__htab_lru_percpu_map_update_elem+0xe7/0x820 kernel/bpf/hashtab.c:1316
bpf_percpu_hash_update+0x5e/0x90 kernel/bpf/hashtab.c:2313
bpf_map_update_value+0x2a9/0x370 kernel/bpf/syscall.c:200
generic_map_update_batch+0x3ae/0x4f0 kernel/bpf/syscall.c:1687
bpf_map_do_batch+0x2d9/0x3d0 kernel/bpf/syscall.c:4534
__sys_bpf+0x338/0x810
__do_sys_bpf kernel/bpf/syscall.c:5096 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5094 [inline]
__x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5094
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

read to 0xffff888137038deb of 1 bytes by task 11241 on cpu 0:
bpf_lru_node_set_ref kernel/bpf/bpf_lru_list.h:70 [inline]
__htab_lru_percpu_map_update_elem+0x2f1/0x820 kernel/bpf/hashtab.c:1332
bpf_percpu_hash_update+0x5e/0x90 kernel/bpf/hashtab.c:2313
bpf_map_update_value+0x2a9/0x370 kernel/bpf/syscall.c:200
generic_map_update_batch+0x3ae/0x4f0 kernel/bpf/syscall.c:1687
bpf_map_do_batch+0x2d9/0x3d0 kernel/bpf/syscall.c:4534
__sys_bpf+0x338/0x810
__do_sys_bpf kernel/bpf/syscall.c:5096 [inline]
__se_sys_bpf kernel/bpf/syscall.c:5094 [inline]
__x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5094
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x41/0xc0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x63/0xcd

value changed: 0x01 -> 0x00

Reported by Kernel Concurrency Sanitizer on:
CPU: 0 PID: 11241 Comm: syz-executor.3 Not tainted 6.3.0-rc7-syzkaller-00136-g6a66fdd29ea1 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/30/2023
==================================================================

Reported-by: syzbot+ebe648a84e8784763f82@syzkaller.appspotmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230511043748.1384166-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-27 08:50:34 +02:00
Kui-Feng Lee
10fa03a9c1 bpf: Print a warning only if writing to unprivileged_bpf_disabled.
[ Upstream commit fedf99200ab086c42a572fca1d7266b06cdc3e3f ]

Only print the warning message if you are writing to
"/proc/sys/kernel/unprivileged_bpf_disabled".

The kernel may print an annoying warning when you read
"/proc/sys/kernel/unprivileged_bpf_disabled" saying

  WARNING: Unprivileged eBPF is enabled with eIBRS on, data leaks possible
  via Spectre v2 BHB attacks!

However, this message is only meaningful when the feature is
disabled or enabled.

Signed-off-by: Kui-Feng Lee <kuifeng@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20230502181418.308479-1-kuifeng@meta.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-27 08:50:34 +02:00
Yicong Yang
78a5f711ef sched/fair: Don't balance task to its current running CPU
[ Upstream commit 0dd37d6dd33a9c23351e6115ae8cdac7863bc7de ]

We've run into the case that the balancer tries to balance a migration
disabled task and trigger the warning in set_task_cpu() like below:

 ------------[ cut here ]------------
 WARNING: CPU: 7 PID: 0 at kernel/sched/core.c:3115 set_task_cpu+0x188/0x240
 Modules linked in: hclgevf xt_CHECKSUM ipt_REJECT nf_reject_ipv4 <...snip>
 CPU: 7 PID: 0 Comm: swapper/7 Kdump: loaded Tainted: G           O       6.1.0-rc4+ #1
 Hardware name: Huawei TaiShan 2280 V2/BC82AMDC, BIOS 2280-V2 CS V5.B221.01 12/09/2021
 pstate: 604000c9 (nZCv daIF +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
 pc : set_task_cpu+0x188/0x240
 lr : load_balance+0x5d0/0xc60
 sp : ffff80000803bc70
 x29: ffff80000803bc70 x28: ffff004089e190e8 x27: ffff004089e19040
 x26: ffff007effcabc38 x25: 0000000000000000 x24: 0000000000000001
 x23: ffff80000803be84 x22: 000000000000000c x21: ffffb093e79e2a78
 x20: 000000000000000c x19: ffff004089e19040 x18: 0000000000000000
 x17: 0000000000001fad x16: 0000000000000030 x15: 0000000000000000
 x14: 0000000000000003 x13: 0000000000000000 x12: 0000000000000000
 x11: 0000000000000001 x10: 0000000000000400 x9 : ffffb093e4cee530
 x8 : 00000000fffffffe x7 : 0000000000ce168a x6 : 000000000000013e
 x5 : 00000000ffffffe1 x4 : 0000000000000001 x3 : 0000000000000b2a
 x2 : 0000000000000b2a x1 : ffffb093e6d6c510 x0 : 0000000000000001
 Call trace:
  set_task_cpu+0x188/0x240
  load_balance+0x5d0/0xc60
  rebalance_domains+0x26c/0x380
  _nohz_idle_balance.isra.0+0x1e0/0x370
  run_rebalance_domains+0x6c/0x80
  __do_softirq+0x128/0x3d8
  ____do_softirq+0x18/0x24
  call_on_irq_stack+0x2c/0x38
  do_softirq_own_stack+0x24/0x3c
  __irq_exit_rcu+0xcc/0xf4
  irq_exit_rcu+0x18/0x24
  el1_interrupt+0x4c/0xe4
  el1h_64_irq_handler+0x18/0x2c
  el1h_64_irq+0x74/0x78
  arch_cpu_idle+0x18/0x4c
  default_idle_call+0x58/0x194
  do_idle+0x244/0x2b0
  cpu_startup_entry+0x30/0x3c
  secondary_start_kernel+0x14c/0x190
  __secondary_switched+0xb0/0xb4
 ---[ end trace 0000000000000000 ]---

Further investigation shows that the warning is superfluous, the migration
disabled task is just going to be migrated to its current running CPU.
This is because that on load balance if the dst_cpu is not allowed by the
task, we'll re-select a new_dst_cpu as a candidate. If no task can be
balanced to dst_cpu we'll try to balance the task to the new_dst_cpu
instead. In this case when the migration disabled task is not on CPU it
only allows to run on its current CPU, load balance will select its
current CPU as new_dst_cpu and later triggers the warning above.

The new_dst_cpu is chosen from the env->dst_grpmask. Currently it
contains CPUs in sched_group_span() and if we have overlapped groups it's
possible to run into this case. This patch makes env->dst_grpmask of
group_balance_mask() which exclude any CPUs from the busiest group and
solve the issue. For balancing in a domain with no overlapped groups
the behaviour keeps same as before.

Suggested-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lore.kernel.org/r/20230530082507.10444-1-yangyicong@huawei.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-27 08:50:34 +02:00
Paul E. McKenney
896f4d6046 rcu: Mark additional concurrent load from ->cpu_no_qs.b.exp
[ Upstream commit 9146eb25495ea8bfb5010192e61e3ed5805ce9ef ]

The per-CPU rcu_data structure's ->cpu_no_qs.b.exp field is updated
only on the instance corresponding to the current CPU, but can be read
more widely.  Unmarked accesses are OK from the corresponding CPU, but
only if interrupts are disabled, given that interrupt handlers can and
do modify this field.

Unfortunately, although the load from rcu_preempt_deferred_qs() is always
carried out from the corresponding CPU, interrupts are not necessarily
disabled.  This commit therefore upgrades this load to READ_ONCE.

Similarly, the diagnostic access from synchronize_rcu_expedited_wait()
might run with interrupts disabled and from some other CPU.  This commit
therefore marks this load with data_race().

Finally, the C-language access in rcu_preempt_ctxt_queue() is OK as
is because interrupts are disabled and this load is always from the
corresponding CPU.  This commit adds a comment giving the rationale for
this access being safe.

This data race was reported by KCSAN.  Not appropriate for backporting
due to failure being unlikely.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-27 08:50:33 +02:00
Shigeru Yoshida
9027d69221 rcu-tasks: Avoid pr_info() with spin lock in cblist_init_generic()
[ Upstream commit 5fc8cbe4cf0fd34ded8045c385790c3bf04f6785 ]

pr_info() is called with rtp->cbs_gbl_lock spin lock locked.  Because
pr_info() calls printk() that might sleep, this will result in BUG
like below:

[    0.206455] cblist_init_generic: Setting adjustable number of callback queues.
[    0.206463]
[    0.206464] =============================
[    0.206464] [ BUG: Invalid wait context ]
[    0.206465] 5.19.0-00428-g9de1f9c8ca51 #5 Not tainted
[    0.206466] -----------------------------
[    0.206466] swapper/0/1 is trying to lock:
[    0.206467] ffffffffa0167a58 (&port_lock_key){....}-{3:3}, at: serial8250_console_write+0x327/0x4a0
[    0.206473] other info that might help us debug this:
[    0.206473] context-{5:5}
[    0.206474] 3 locks held by swapper/0/1:
[    0.206474]  #0: ffffffff9eb597e0 (rcu_tasks.cbs_gbl_lock){....}-{2:2}, at: cblist_init_generic.constprop.0+0x14/0x1f0
[    0.206478]  #1: ffffffff9eb579c0 (console_lock){+.+.}-{0:0}, at: _printk+0x63/0x7e
[    0.206482]  #2: ffffffff9ea77780 (console_owner){....}-{0:0}, at: console_emit_next_record.constprop.0+0x111/0x330
[    0.206485] stack backtrace:
[    0.206486] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.19.0-00428-g9de1f9c8ca51 #5
[    0.206488] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
[    0.206489] Call Trace:
[    0.206490]  <TASK>
[    0.206491]  dump_stack_lvl+0x6a/0x9f
[    0.206493]  __lock_acquire.cold+0x2d7/0x2fe
[    0.206496]  ? stack_trace_save+0x46/0x70
[    0.206497]  lock_acquire+0xd1/0x2f0
[    0.206499]  ? serial8250_console_write+0x327/0x4a0
[    0.206500]  ? __lock_acquire+0x5c7/0x2720
[    0.206502]  _raw_spin_lock_irqsave+0x3d/0x90
[    0.206504]  ? serial8250_console_write+0x327/0x4a0
[    0.206506]  serial8250_console_write+0x327/0x4a0
[    0.206508]  console_emit_next_record.constprop.0+0x180/0x330
[    0.206511]  console_unlock+0xf7/0x1f0
[    0.206512]  vprintk_emit+0xf7/0x330
[    0.206514]  _printk+0x63/0x7e
[    0.206516]  cblist_init_generic.constprop.0.cold+0x24/0x32
[    0.206518]  rcu_init_tasks_generic+0x5/0xd9
[    0.206522]  kernel_init_freeable+0x15b/0x2a2
[    0.206523]  ? rest_init+0x160/0x160
[    0.206526]  kernel_init+0x11/0x120
[    0.206527]  ret_from_fork+0x1f/0x30
[    0.206530]  </TASK>
[    0.207018] cblist_init_generic: Setting shift to 1 and lim to 1.

This patch moves pr_info() so that it is called without
rtp->cbs_gbl_lock locked.

Signed-off-by: Shigeru Yoshida <syoshida@redhat.com>
Tested-by: "Zhang, Qiang1" <qiang1.zhang@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-27 08:50:33 +02:00
xieliujie
544ae28cf6 ANDROID: Inherit "user-aware property" across rtmutex.
Since upstream commit 715f7f9ece ("locking/rtmutex: Squash !RT
tasks to DEFAULT_PRIO"), non-rt tasks do not inherit the
nice-priority values across rt_mutexes. This removes the minor
(and indirect) priority-inheritance that rt-mutexes provided for
CFS tasks.

Though without priority inheritance, time-bounded priority
inversion can occur between CFS tasks of different nice
priorities / cgroup limitations.  The proxy-execution efforts
are a work-in-progress to resolve this upstream, but in the
meantime it is left to vendor hooks to provide a near term
solution to avoid priority inversion between CFS tasks.

In our oem scheduler, if a  CFS thread has an "user-aware
property", we will always pick it even if it's vruntime is
bigger than the smallest one in runqueue. That's why the
trace_android_rvh_replace_next_task_fair vendorhook was added
previously in commit 53e809978443 ("ANDROID: vendor_hooks: Add
hooks for scheduler").

Thus for our oem scheduler, important CFS tasks(like
RenderThread) are marked with the "user-aware property" in their
struct task_struct. If those tasks are blocked on an rtmutex, we
want to allow the "user-aware property" to be inherited to lock
owner, so it will be selected to run immediately to release the
lock.

To support this, we need new hooks to map "user-aware property"
into different rtmutex_waiter prio and update the owner's
"user-aware property" if needed. Thus these additional vendor
hooks are needed.

In the future, once an generalized upstream solution for CFS
priority inheritance is in place, this will no longer be needed.

Bug: 290585456
Change-Id: I6521ed2086b147400a54da6b84a324baf16bc649
Signed-off-by: xieliujie <xieliujie@oppo.com>
2023-07-27 00:04:07 +00:00
Shaleen Agrawal
92e7a47840 sched/walt: Improve the scheduler
This change is for general scheduler improvement.

Change-Id: Ib435f0ef2992a6eb526c47639e3a2b9edaf1bd2c
Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com>
2023-07-25 14:43:45 -07:00
Greg Kroah-Hartman
1ef7816a50 Merge branch 'android14-6.1' into 'android14-6.1-lts'
Catches the android14-6.1-lts branch up with the android14-6.1 branch
which has had a lot of changes that are needed here to resolve future
LTS merges and to ensure that the ABI is kept stable.

It contains the following commits:

* 0ee75a672c UPSTREAM: fs/ntfs3: Check fields while reading
* 6eb48b89a5 ANDROID: GKI: Update abi_gki_aarch64_qcom
* 17a080d04e ANDROID: ABI: Update pixel symbol list
* 0abc74db1a ANDROID: GKI: Move GKI module headers to generated includes
* 15a4b0d726 ANDROID: set kmi_symbol_list_add_only for Kleaf builds.
* dd567c60ff ANDROID: GKI: Add Android ABI padding to wwan_port_ops
* 7ed895f6b7 ANDROID: GKI: Add Android ABI padding to wwan_ops
* 13e8071ce0 ANDROID: update symbol list for unisoc regmap vendor hook
* ca372ba9e7 ANDROID: GKI: Update mtk ABI symbol list
* 8bb470d637 UPSTREAM: media: dvb-core: Fix kernel WARNING for blocking operation in wait_event*()
* 701f85c2a1 ANDROID: abi_gki_aarch64_qcom: Update QCOM symbol list
* d51e21b394 ANDROID: ABI: Update pixel symbol list
* 25a11995fb ANDROID: GKI: add ABI symbol for xiaomi
* 7dd60ce804 ANDROID: vendor_hooks: add vendor hook to support SAGT
* f930b82d16 FROMLIST: fuse: revalidate: don't invalidate if interrupted
* 3a8999c683 ANDROID: GKI: Update pixel symbol list for thermal
* 6ca2ff04a1 ANDROID: thermal: Add vendor thermal genl check
* 62ef90de0d ANDROID: GKI: Update the pixel symbol list
* 7bfd71d298 ANDROID: GKI: Update protected exports
* 4a207efbe0 FROMGIT: mm: add missing VM_FAULT_RESULT_TRACE name for VM_FAULT_COMPLETED
* 77ae3e7bb8 FROMGIT: swap: remove remnants of polling from read_swap_cache_async

Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
Change-Id: I8d0b0242e4c5413e38e0561b6d7afcba94a8c09e
2023-07-25 09:42:58 +00:00
qctecmdr
bf01a58857 Merge "Merge keystone/android14-6.1-keystone-qcom-release.6.1.25 (8823053) into qcom-6.1" 2023-07-24 15:19:33 -07:00
Ramji Jiyani
0abc74db1a ANDROID: GKI: Move GKI module headers to generated includes
Change build time generated GKI module headers location
From :- kernel/module/gki_module_*.h
To :- include/generated/gki_module_*.h

This prevents the kernel source from being contaminated.
By placing the header files in a generated directory,
the default filters that ignore certain files will work
without any special handling required.

Bug: 286529877
Test: Manual verification & TH
Change-Id: Ie247d1c132ddae54906de2e2850e95d7ae9edd50
Signed-off-by: Ramji Jiyani <ramjiyani@google.com>
(cherry picked from commit e9cba885543fc50a5b59ff7234d02b74a380573c)
2023-07-24 17:56:59 +00:00
Randy Dunlap
ff06cd411a swiotlb: mark swiotlb_memblock_alloc() as __init
commit 9b07d27d0fbb7f7441aa986859a0f53ec93a0335 upstream.

swiotlb_memblock_alloc() calls memblock_alloc(), which calls
(__init) memblock_alloc_try_nid(). However, swiotlb_membloc_alloc()
can be marked as __init since it is only called by swiotlb_init_remap(),
which is already marked as __init. This prevents a modpost build
warning/error:

WARNING: modpost: vmlinux.o: section mismatch in reference: swiotlb_memblock_alloc (section: .text) -> memblock_alloc_try_nid (section: .init.text)
WARNING: modpost: vmlinux.o: section mismatch in reference: swiotlb_memblock_alloc (section: .text) -> memblock_alloc_try_nid (section: .init.text)

This fixes the build warning/error seen on ARM64, PPC64, S390, i386,
and x86_64.

Fixes: 8d58aa484920 ("swiotlb: reduce the swiotlb buffer size on allocation failure")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Alexey Kardashevskiy <aik@amd.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: iommu@lists.linux.dev
Cc: Mike Rapoport <rppt@kernel.org>
Cc: linux-mm@kvack.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-23 13:49:50 +02:00
Beau Belgrave
95e34129f3 tracing/user_events: Fix struct arg size match check
commit d0a3022f30629a208e5944022caeca3568add9e7 upstream.

When users register an event the name of the event and it's argument are
checked to ensure they match if the event already exists. Normally all
arguments are in the form of "type name", except for when the type
starts with "struct ". In those cases, the size of the struct is passed
in addition to the name, IE: "struct my_struct a 20" for an argument
that is of type "struct my_struct" with a field name of "a" and has the
size of 20 bytes.

The current code does not honor the above case properly when comparing
a match. This causes the event register to fail even when the same
string was used for events that contain a struct argument within them.
The example above "struct my_struct a 20" generates a match string of
"struct my_struct a" omitting the size field.

Add the struct size of the existing field when generating a comparison
string for a struct field to ensure proper match checking.

Link: https://lkml.kernel.org/r/20230629235049.581-2-beaub@linux.microsoft.com

Cc: stable@vger.kernel.org
Fixes: e6f89a1498 ("tracing/user_events: Ensure user provided strings are safely formatted")
Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-23 13:49:47 +02:00
Masami Hiramatsu (Google)
a95c1fede2 tracing/probes: Fix to update dynamic data counter if fetcharg uses it
commit e38e2c6a9efc435f9de344b7c91f7697e01b47d5 upstream.

Fix to update dynamic data counter ('dyndata') and max length ('maxlen')
only if the fetcharg uses the dynamic data. Also get out arg->dynamic
from unlikely(). This makes dynamic data address wrong if
process_fetch_insn() returns error on !arg->dynamic case.

Link: https://lore.kernel.org/all/168908494781.123124.8160245359962103684.stgit@devnote2/

Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Link: https://lore.kernel.org/all/20230710233400.5aaf024e@gandalf.local.home/
Fixes: 9178412ddf ("tracing: probeevent: Return consumed bytes of dynamic area")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-23 13:49:47 +02:00
Masami Hiramatsu (Google)
837f92d27f tracing/probes: Fix not to count error code to total length
commit b41326b5e0f82e93592c4366359917b5d67b529f upstream.

Fix not to count the error code (which is minus value) to the total
used length of array, because it can mess up the return code of
process_fetch_insn_bottom(). Also clear the 'ret' value because it
will be used for calculating next data_loc entry.

Link: https://lore.kernel.org/all/168908493827.123124.2175257289106364229.stgit@devnote2/

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/8819b154-2ba1-43c3-98a2-cbde20892023@moroto.mountain/
Fixes: 9b960a3883 ("tracing: probeevent: Unify fetch_insn processing common part")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-23 13:49:47 +02:00
Mateusz Stachyra
938d5b7a75 tracing: Fix null pointer dereference in tracing_err_log_open()
commit 02b0095e2fbbc060560c1065f86a211d91e27b26 upstream.

Fix an issue in function 'tracing_err_log_open'.
The function doesn't call 'seq_open' if the file is opened only with
write permissions, which results in 'file->private_data' being left as null.
If we then use 'lseek' on that opened file, 'seq_lseek' dereferences
'file->private_data' in 'mutex_lock(&m->lock)', resulting in a kernel panic.
Writing to this node requires root privileges, therefore this bug
has very little security impact.

Tracefs node: /sys/kernel/tracing/error_log

Example Kernel panic:

Unable to handle kernel NULL pointer dereference at virtual address 0000000000000038
Call trace:
 mutex_lock+0x30/0x110
 seq_lseek+0x34/0xb8
 __arm64_sys_lseek+0x6c/0xb8
 invoke_syscall+0x58/0x13c
 el0_svc_common+0xc4/0x10c
 do_el0_svc+0x24/0x98
 el0_svc+0x24/0x88
 el0t_64_sync_handler+0x84/0xe4
 el0t_64_sync+0x1b4/0x1b8
Code: d503201f aa0803e0 aa1f03e1 aa0103e9 (c8e97d02)
---[ end trace 561d1b49c12cf8a5 ]---
Kernel panic - not syncing: Oops: Fatal exception

Link: https://lore.kernel.org/linux-trace-kernel/20230703155237eucms1p4dfb6a19caa14c79eb6c823d127b39024@eucms1p4
Link: https://lore.kernel.org/linux-trace-kernel/20230704102706eucms1p30d7ecdcc287f46ad67679fc8491b2e0f@eucms1p3

Cc: stable@vger.kernel.org
Fixes: 8a062902be ("tracing: Add tracing error log")
Signed-off-by: Mateusz Stachyra <m.stachyra@samsung.com>
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-23 13:49:46 +02:00
Masami Hiramatsu (Google)
fbcd0c2b56 fprobe: Ensure running fprobe_exit_handler() finished before calling rethook_free()
commit 195b9cb5b288fec1c871ef89f78cc9a7461aad3a upstream.

Ensure running fprobe_exit_handler() has finished before
calling rethook_free() in the unregister_fprobe() so that caller can free
the fprobe right after unregister_fprobe().

unregister_fprobe() ensured that all running fprobe_entry/exit_handler()
have finished by calling unregister_ftrace_function() which synchronizes
RCU. But commit 5f81018753df ("fprobe: Release rethook after the ftrace_ops
is unregistered") changed to call rethook_free() after
unregister_ftrace_function(). So call rethook_stop() to make rethook
disabled before unregister_ftrace_function() and ensure it again.

Here is the possible code flow that can call the exit handler after
unregister_fprobe().

------
 CPU1                              CPU2
 call unregister_fprobe(fp)
 ...
                                   __fprobe_handler()
                                   rethook_hook() on probed function
 unregister_ftrace_function()
                                   return from probed function
                                   rethook hooks
                                   find rh->handler == fprobe_exit_handler
                                   call fprobe_exit_handler()
 rethook_free():
   set rh->handler = NULL;
 return from unreigster_fprobe;
                                   call fp->exit_handler() <- (*)
------

(*) At this point, the exit handler is called after returning from
unregister_fprobe().

This fixes it as following;
------
 CPU1                              CPU2
 call unregister_fprobe()
 ...
 rethook_stop():
   set rh->handler = NULL;
                                   __fprobe_handler()
                                   rethook_hook() on probed function
 unregister_ftrace_function()
                                   return from probed function
                                   rethook hooks
                                   find rh->handler == NULL
                                   return from rethook
 rethook_free()
 return from unreigster_fprobe;
------

Link: https://lore.kernel.org/all/168873859949.156157.13039240432299335849.stgit@devnote2/

Fixes: 5f81018753df ("fprobe: Release rethook after the ftrace_ops is unregistered")
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-23 13:49:46 +02:00
Jiri Olsa
ce3ec57faf fprobe: Release rethook after the ftrace_ops is unregistered
commit 5f81018753dfd4989e33ece1f0cb6b8aae498b82 upstream.

While running bpf selftests it's possible to get following fault:

  general protection fault, probably for non-canonical address \
  0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC NOPTI
  ...
  Call Trace:
   <TASK>
   fprobe_handler+0xc1/0x270
   ? __pfx_bpf_testmod_init+0x10/0x10
   ? __pfx_bpf_testmod_init+0x10/0x10
   ? bpf_fentry_test1+0x5/0x10
   ? bpf_fentry_test1+0x5/0x10
   ? bpf_testmod_init+0x22/0x80
   ? do_one_initcall+0x63/0x2e0
   ? rcu_is_watching+0xd/0x40
   ? kmalloc_trace+0xaf/0xc0
   ? do_init_module+0x60/0x250
   ? __do_sys_finit_module+0xac/0x120
   ? do_syscall_64+0x37/0x90
   ? entry_SYSCALL_64_after_hwframe+0x72/0xdc
   </TASK>

In unregister_fprobe function we can't release fp->rethook while it's
possible there are some of its users still running on another cpu.

Moving rethook_free call after fp->ops is unregistered with
unregister_ftrace_function call.

Link: https://lore.kernel.org/all/20230615115236.3476617-1-jolsa@kernel.org/

Fixes: 5b0ab78998 ("fprobe: Add exit_handler support")
Cc: stable@vger.kernel.org
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-23 13:49:46 +02:00
Chungkai Yang
9a2c57fd32 PM: QoS: Restore support for default value on frequency QoS
commit 3a8395b565b5b4f019b3dc182be4c4541eb35ac8 upstream.

Commit 8d36694245 ("PM: QoS: Add check to make sure CPU freq is
non-negative") makes sure CPU freq is non-negative to avoid negative
value converting to unsigned data type. However, when the value is
PM_QOS_DEFAULT_VALUE, pm_qos_update_target specifically uses
c->default_value which is set to FREQ_QOS_MIN/MAX_DEFAULT_VALUE when
cpufreq_policy_alloc is executed, for this case handling.

Adding check for PM_QOS_DEFAULT_VALUE to let default setting work will
fix this problem.

Fixes: 8d36694245 ("PM: QoS: Add check to make sure CPU freq is non-negative")
Link: https://lore.kernel.org/lkml/20230626035144.19717-1-Chung-kai.Yang@mediatek.com/
Link: https://lore.kernel.org/lkml/20230627071727.16646-1-Chung-kai.Yang@mediatek.com/
Link: https://lore.kernel.org/lkml/CAJZ5v0gxNOWhC58PHeUhW_tgf6d1fGJVZ1x91zkDdht11yUv-A@mail.gmail.com/
Signed-off-by: Chungkai Yang <Chung-kai.Yang@mediatek.com>
Cc: 6.0+ <stable@vger.kernel.org> # 6.0+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-23 13:49:45 +02:00
Zheng Yejian
99fe81d219 ftrace: Fix possible warning on checking all pages used in ftrace_process_locs()
commit 26efd79c4624294e553aeaa3439c646729bad084 upstream.

As comments in ftrace_process_locs(), there may be NULL pointers in
mcount_loc section:
 > Some architecture linkers will pad between
 > the different mcount_loc sections of different
 > object files to satisfy alignments.
 > Skip any NULL pointers.

After commit 20e5227e9f ("ftrace: allow NULL pointers in mcount_loc"),
NULL pointers will be accounted when allocating ftrace pages but skipped
before adding into ftrace pages, this may result in some pages not being
used. Then after commit 706c81f87f ("ftrace: Remove extra helper
functions"), warning may occur at:
  WARN_ON(pg->next);

To fix it, only warn for case that no pointers skipped but pages not used
up, then free those unused pages after releasing ftrace_lock.

Link: https://lore.kernel.org/linux-trace-kernel/20230712060452.3175675-1-zhengyejian1@huawei.com

Cc: stable@vger.kernel.org
Fixes: 706c81f87f ("ftrace: Remove extra helper functions")
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-23 13:49:44 +02:00
Zheng Yejian
8b0b63fdac ring-buffer: Fix deadloop issue on reading trace_pipe
commit 7e42907f3a7b4ce3a2d1757f6d78336984daf8f5 upstream.

Soft lockup occurs when reading file 'trace_pipe':

  watchdog: BUG: soft lockup - CPU#6 stuck for 22s! [cat:4488]
  [...]
  RIP: 0010:ring_buffer_empty_cpu+0xed/0x170
  RSP: 0018:ffff88810dd6fc48 EFLAGS: 00000246
  RAX: 0000000000000000 RBX: 0000000000000246 RCX: ffffffff93d1aaeb
  RDX: ffff88810a280040 RSI: 0000000000000008 RDI: ffff88811164b218
  RBP: ffff88811164b218 R08: 0000000000000000 R09: ffff88815156600f
  R10: ffffed102a2acc01 R11: 0000000000000001 R12: 0000000051651901
  R13: 0000000000000000 R14: ffff888115e49500 R15: 0000000000000000
  [...]
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007f8d853c2000 CR3: 000000010dcd8000 CR4: 00000000000006e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   __find_next_entry+0x1a8/0x4b0
   ? peek_next_entry+0x250/0x250
   ? down_write+0xa5/0x120
   ? down_write_killable+0x130/0x130
   trace_find_next_entry_inc+0x3b/0x1d0
   tracing_read_pipe+0x423/0xae0
   ? tracing_splice_read_pipe+0xcb0/0xcb0
   vfs_read+0x16b/0x490
   ksys_read+0x105/0x210
   ? __ia32_sys_pwrite64+0x200/0x200
   ? switch_fpu_return+0x108/0x220
   do_syscall_64+0x33/0x40
   entry_SYSCALL_64_after_hwframe+0x61/0xc6

Through the vmcore, I found it's because in tracing_read_pipe(),
ring_buffer_empty_cpu() found some buffer is not empty but then it
cannot read anything due to "rb_num_of_entries() == 0" always true,
Then it infinitely loop the procedure due to user buffer not been
filled, see following code path:

  tracing_read_pipe() {
    ... ...
    waitagain:
      tracing_wait_pipe() // 1. find non-empty buffer here
      trace_find_next_entry_inc()  // 2. loop here try to find an entry
        __find_next_entry()
          ring_buffer_empty_cpu();  // 3. find non-empty buffer
          peek_next_entry()  // 4. but peek always return NULL
            ring_buffer_peek()
              rb_buffer_peek()
                rb_get_reader_page()
                  // 5. because rb_num_of_entries() == 0 always true here
                  //    then return NULL
      // 6. user buffer not been filled so goto 'waitgain'
      //    and eventually leads to an deadloop in kernel!!!
  }

By some analyzing, I found that when resetting ringbuffer, the 'entries'
of its pages are not all cleared (see rb_reset_cpu()). Then when reducing
the ringbuffer, and if some reduced pages exist dirty 'entries' data, they
will be added into 'cpu_buffer->overrun' (see rb_remove_pages()), which
cause wrong 'overrun' count and eventually cause the deadloop issue.

To fix it, we need to clear every pages in rb_reset_cpu().

Link: https://lore.kernel.org/linux-trace-kernel/20230708225144.3785600-1-zhengyejian1@huawei.com

Cc: stable@vger.kernel.org
Fixes: a5fb833172 ("ring-buffer: Fix uninitialized read_stamp")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-23 13:49:44 +02:00
Zheng Yejian
be970e22c5 tracing: Fix memory leak of iter->temp when reading trace_pipe
commit d5a821896360cc8b93a15bd888fabc858c038dc0 upstream.

kmemleak reports:
  unreferenced object 0xffff88814d14e200 (size 256):
    comm "cat", pid 336, jiffies 4294871818 (age 779.490s)
    hex dump (first 32 bytes):
      04 00 01 03 00 00 00 00 08 00 00 00 00 00 00 00  ................
      0c d8 c8 9b ff ff ff ff 04 5a ca 9b ff ff ff ff  .........Z......
    backtrace:
      [<ffffffff9bdff18f>] __kmalloc+0x4f/0x140
      [<ffffffff9bc9238b>] trace_find_next_entry+0xbb/0x1d0
      [<ffffffff9bc9caef>] trace_print_lat_context+0xaf/0x4e0
      [<ffffffff9bc94490>] print_trace_line+0x3e0/0x950
      [<ffffffff9bc95499>] tracing_read_pipe+0x2d9/0x5a0
      [<ffffffff9bf03a43>] vfs_read+0x143/0x520
      [<ffffffff9bf04c2d>] ksys_read+0xbd/0x160
      [<ffffffff9d0f0edf>] do_syscall_64+0x3f/0x90
      [<ffffffff9d2000aa>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8

when reading file 'trace_pipe', 'iter->temp' is allocated or relocated
in trace_find_next_entry() but not freed before 'trace_pipe' is closed.

To fix it, free 'iter->temp' in tracing_release_pipe().

Link: https://lore.kernel.org/linux-trace-kernel/20230713141435.1133021-1-zhengyejian1@huawei.com

Cc: stable@vger.kernel.org
Fixes: ff895103a8 ("tracing: Save off entry when peeking at next entry")
Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-23 13:49:43 +02:00
Mohamed Khalfella
5fd32eb6fa tracing/histograms: Add histograms to hist_vars if they have referenced variables
commit 6018b585e8c6fa7d85d4b38d9ce49a5b67be7078 upstream.

Hist triggers can have referenced variables without having direct
variables fields. This can be the case if referenced variables are added
for trigger actions. In this case the newly added references will not
have field variables. Not taking such referenced variables into
consideration can result in a bug where it would be possible to remove
hist trigger with variables being refenced. This will result in a bug
that is easily reproducable like so

$ cd /sys/kernel/tracing
$ echo 'synthetic_sys_enter char[] comm; long id' >> synthetic_events
$ echo 'hist:keys=common_pid.execname,id.syscall:vals=hitcount:comm=common_pid.execname' >> events/raw_syscalls/sys_enter/trigger
$ echo 'hist:keys=common_pid.execname,id.syscall:onmatch(raw_syscalls.sys_enter).synthetic_sys_enter($comm, id)' >> events/raw_syscalls/sys_enter/trigger
$ echo '!hist:keys=common_pid.execname,id.syscall:vals=hitcount:comm=common_pid.execname' >> events/raw_syscalls/sys_enter/trigger

[  100.263533] ==================================================================
[  100.264634] BUG: KASAN: slab-use-after-free in resolve_var_refs+0xc7/0x180
[  100.265520] Read of size 8 at addr ffff88810375d0f0 by task bash/439
[  100.266320]
[  100.266533] CPU: 2 PID: 439 Comm: bash Not tainted 6.5.0-rc1 #4
[  100.267277] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-20220807_005459-localhost 04/01/2014
[  100.268561] Call Trace:
[  100.268902]  <TASK>
[  100.269189]  dump_stack_lvl+0x4c/0x70
[  100.269680]  print_report+0xc5/0x600
[  100.270165]  ? resolve_var_refs+0xc7/0x180
[  100.270697]  ? kasan_complete_mode_report_info+0x80/0x1f0
[  100.271389]  ? resolve_var_refs+0xc7/0x180
[  100.271913]  kasan_report+0xbd/0x100
[  100.272380]  ? resolve_var_refs+0xc7/0x180
[  100.272920]  __asan_load8+0x71/0xa0
[  100.273377]  resolve_var_refs+0xc7/0x180
[  100.273888]  event_hist_trigger+0x749/0x860
[  100.274505]  ? kasan_save_stack+0x2a/0x50
[  100.275024]  ? kasan_set_track+0x29/0x40
[  100.275536]  ? __pfx_event_hist_trigger+0x10/0x10
[  100.276138]  ? ksys_write+0xd1/0x170
[  100.276607]  ? do_syscall_64+0x3c/0x90
[  100.277099]  ? entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  100.277771]  ? destroy_hist_data+0x446/0x470
[  100.278324]  ? event_hist_trigger_parse+0xa6c/0x3860
[  100.278962]  ? __pfx_event_hist_trigger_parse+0x10/0x10
[  100.279627]  ? __kasan_check_write+0x18/0x20
[  100.280177]  ? mutex_unlock+0x85/0xd0
[  100.280660]  ? __pfx_mutex_unlock+0x10/0x10
[  100.281200]  ? kfree+0x7b/0x120
[  100.281619]  ? ____kasan_slab_free+0x15d/0x1d0
[  100.282197]  ? event_trigger_write+0xac/0x100
[  100.282764]  ? __kasan_slab_free+0x16/0x20
[  100.283293]  ? __kmem_cache_free+0x153/0x2f0
[  100.283844]  ? sched_mm_cid_remote_clear+0xb1/0x250
[  100.284550]  ? __pfx_sched_mm_cid_remote_clear+0x10/0x10
[  100.285221]  ? event_trigger_write+0xbc/0x100
[  100.285781]  ? __kasan_check_read+0x15/0x20
[  100.286321]  ? __bitmap_weight+0x66/0xa0
[  100.286833]  ? _find_next_bit+0x46/0xe0
[  100.287334]  ? task_mm_cid_work+0x37f/0x450
[  100.287872]  event_triggers_call+0x84/0x150
[  100.288408]  trace_event_buffer_commit+0x339/0x430
[  100.289073]  ? ring_buffer_event_data+0x3f/0x60
[  100.292189]  trace_event_raw_event_sys_enter+0x8b/0xe0
[  100.295434]  syscall_trace_enter.constprop.0+0x18f/0x1b0
[  100.298653]  syscall_enter_from_user_mode+0x32/0x40
[  100.301808]  do_syscall_64+0x1a/0x90
[  100.304748]  entry_SYSCALL_64_after_hwframe+0x6e/0xd8
[  100.307775] RIP: 0033:0x7f686c75c1cb
[  100.310617] Code: 73 01 c3 48 8b 0d 65 3c 10 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 21 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 35 3c 10 00 f7 d8 64 89 01 48
[  100.317847] RSP: 002b:00007ffc60137a38 EFLAGS: 00000246 ORIG_RAX: 0000000000000021
[  100.321200] RAX: ffffffffffffffda RBX: 000055f566469ea0 RCX: 00007f686c75c1cb
[  100.324631] RDX: 0000000000000001 RSI: 0000000000000001 RDI: 000000000000000a
[  100.328104] RBP: 00007ffc60137ac0 R08: 00007f686c818460 R09: 000000000000000a
[  100.331509] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000009
[  100.334992] R13: 0000000000000007 R14: 000000000000000a R15: 0000000000000007
[  100.338381]  </TASK>

We hit the bug because when second hist trigger has was created
has_hist_vars() returned false because hist trigger did not have
variables. As a result of that save_hist_vars() was not called to add
the trigger to trace_array->hist_vars. Later on when we attempted to
remove the first histogram find_any_var_ref() failed to detect it is
being used because it did not find the second trigger in hist_vars list.

With this change we wait until trigger actions are created so we can take
into consideration if hist trigger has variable references. Also, now we
check the return value of save_hist_vars() and fail trigger creation if
save_hist_vars() fails.

Link: https://lore.kernel.org/linux-trace-kernel/20230712223021.636335-1-mkhalfella@purestorage.com

Cc: stable@vger.kernel.org
Fixes: 067fe038e7 ("tracing: Add variable reference handling to hist triggers")
Signed-off-by: Mohamed Khalfella <mkhalfella@purestorage.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-23 13:49:43 +02:00
sunliming
5aea2ac374 tracing/user_events: Fix incorrect return value for writing operation when events are disabled
commit f6d026eea390d59787a6cdc2ef5c983d02e029d0 upstream.

The writing operation return the count of writes regardless of whether events
are enabled or disabled. Switch it to return -EBADF to indicates that the event
is disabled.

Link: https://lkml.kernel.org/r/20230626111344.19136-2-sunliming@kylinos.cn

Cc: stable@vger.kernel.org
7f5a08c79d ("user_events: Add minimal support for trace_event into ftrace")
Acked-by: Beau Belgrave <beaub@linux.microsoft.com>
Signed-off-by: sunliming <sunliming@kylinos.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-23 13:49:32 +02:00
Pu Lehui
b11a9b4f28 bpf: cpumap: Fix memory leak in cpu_map_update_elem
[ Upstream commit 4369016497319a9635702da010d02af1ebb1849d ]

Syzkaller reported a memory leak as follows:

BUG: memory leak
unreferenced object 0xff110001198ef748 (size 192):
  comm "syz-executor.3", pid 17672, jiffies 4298118891 (age 9.906s)
  hex dump (first 32 bytes):
    00 00 00 00 4a 19 00 00 80 ad e3 e4 fe ff c0 00  ....J...........
    00 b2 d3 0c 01 00 11 ff 28 f5 8e 19 01 00 11 ff  ........(.......
  backtrace:
    [<ffffffffadd28087>] __cpu_map_entry_alloc+0xf7/0xb00
    [<ffffffffadd28d8e>] cpu_map_update_elem+0x2fe/0x3d0
    [<ffffffffadc6d0fd>] bpf_map_update_value.isra.0+0x2bd/0x520
    [<ffffffffadc7349b>] map_update_elem+0x4cb/0x720
    [<ffffffffadc7d983>] __se_sys_bpf+0x8c3/0xb90
    [<ffffffffb029cc80>] do_syscall_64+0x30/0x40
    [<ffffffffb0400099>] entry_SYSCALL_64_after_hwframe+0x61/0xc6

BUG: memory leak
unreferenced object 0xff110001198ef528 (size 192):
  comm "syz-executor.3", pid 17672, jiffies 4298118891 (age 9.906s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffffadd281f0>] __cpu_map_entry_alloc+0x260/0xb00
    [<ffffffffadd28d8e>] cpu_map_update_elem+0x2fe/0x3d0
    [<ffffffffadc6d0fd>] bpf_map_update_value.isra.0+0x2bd/0x520
    [<ffffffffadc7349b>] map_update_elem+0x4cb/0x720
    [<ffffffffadc7d983>] __se_sys_bpf+0x8c3/0xb90
    [<ffffffffb029cc80>] do_syscall_64+0x30/0x40
    [<ffffffffb0400099>] entry_SYSCALL_64_after_hwframe+0x61/0xc6

BUG: memory leak
unreferenced object 0xff1100010fd93d68 (size 8):
  comm "syz-executor.3", pid 17672, jiffies 4298118891 (age 9.906s)
  hex dump (first 8 bytes):
    00 00 00 00 00 00 00 00                          ........
  backtrace:
    [<ffffffffade5db3e>] kvmalloc_node+0x11e/0x170
    [<ffffffffadd28280>] __cpu_map_entry_alloc+0x2f0/0xb00
    [<ffffffffadd28d8e>] cpu_map_update_elem+0x2fe/0x3d0
    [<ffffffffadc6d0fd>] bpf_map_update_value.isra.0+0x2bd/0x520
    [<ffffffffadc7349b>] map_update_elem+0x4cb/0x720
    [<ffffffffadc7d983>] __se_sys_bpf+0x8c3/0xb90
    [<ffffffffb029cc80>] do_syscall_64+0x30/0x40
    [<ffffffffb0400099>] entry_SYSCALL_64_after_hwframe+0x61/0xc6

In the cpu_map_update_elem flow, when kthread_stop is called before
calling the threadfn of rcpu->kthread, since the KTHREAD_SHOULD_STOP bit
of kthread has been set by kthread_stop, the threadfn of rcpu->kthread
will never be executed, and rcpu->refcnt will never be 0, which will
lead to the allocated rcpu, rcpu->queue and rcpu->queue->queue cannot be
released.

Calling kthread_stop before executing kthread's threadfn will return
-EINTR. We can complete the release of memory resources in this state.

Fixes: 6710e11269 ("bpf: introduce new bpf cpu map type BPF_MAP_TYPE_CPUMAP")
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Acked-by: Jesper Dangaard Brouer <hawk@kernel.org>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20230711115848.2701559-1-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-23 13:49:26 +02:00
Tzvetomir Stoyanov (VMware)
f92a82dc48 kernel/trace: Fix cleanup logic of enable_trace_eprobe
[ Upstream commit cf0a624dc706c306294c14e6b3e7694702f25191 ]

The enable_trace_eprobe() function enables all event probes, attached
to given trace probe. If an error occurs in enabling one of the event
probes, all others should be roll backed. There is a bug in that roll
back logic - instead of all event probes, only the failed one is
disabled.

Link: https://lore.kernel.org/all/20230703042853.1427493-1-tz.stoyanov@gmail.com/

Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Fixes: 7491e2c442 ("tracing: Add a probe that attaches to trace events")
Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-23 13:49:24 +02:00
Kumar Kartikeya Dwivedi
b2e74dedb0 bpf: Fix max stack depth check for async callbacks
[ Upstream commit 5415ccd50a8620c8cbaa32d6f18c946c453566f5 ]

The check_max_stack_depth pass happens after the verifier's symbolic
execution, and attempts to walk the call graph of the BPF program,
ensuring that the stack usage stays within bounds for all possible call
chains. There are two cases to consider: bpf_pseudo_func and
bpf_pseudo_call. In the former case, the callback pointer is loaded into
a register, and is assumed that it is passed to some helper later which
calls it (however there is no way to be sure), but the check remains
conservative and accounts the stack usage anyway. For this particular
case, asynchronous callbacks are skipped as they execute asynchronously
when their corresponding event fires.

The case of bpf_pseudo_call is simpler and we know that the call is
definitely made, hence the stack depth of the subprog is accounted for.

However, the current check still skips an asynchronous callback even if
a bpf_pseudo_call was made for it. This is erroneous, as it will miss
accounting for the stack usage of the asynchronous callback, which can
be used to breach the maximum stack depth limit.

Fix this by only skipping asynchronous callbacks when the instruction is
not a pseudo call to the subprog.

Fixes: 7ddc80a476 ("bpf: Teach stack depth check about async callbacks.")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20230705144730.235802-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-23 13:49:22 +02:00
Petr Tesarik
fd5b64c1cf swiotlb: reduce the number of areas to match actual memory pool size
[ Upstream commit 8ac04063354a01a484d2e55d20ed1958aa0d3392 ]

Although the desired size of the SWIOTLB memory pool is increased in
swiotlb_adjust_nareas() to match the number of areas, the actual allocation
may be smaller, which may require reducing the number of areas.

For example, Xen uses swiotlb_init_late(), which in turn uses the page
allocator. On x86, page size is 4 KiB and MAX_ORDER is 10 (1024 pages),
resulting in a maximum memory pool size of 4 MiB. This corresponds to 2048
slots of 2 KiB each. The minimum area size is 128 (IO_TLB_SEGSIZE),
allowing at most 2048 / 128 = 16 areas.

If num_possible_cpus() is greater than the maximum number of areas, areas
are smaller than IO_TLB_SEGSIZE and contiguous groups of free slots will
span multiple areas. When allocating and freeing slots, only one area will
be properly locked, causing race conditions on the unlocked slots and
ultimately data corruption, kernel hangs and crashes.

Fixes: 20347fca71 ("swiotlb: split up the global swiotlb lock")
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-23 13:49:20 +02:00
Alexey Kardashevskiy
fc3db7fbdf swiotlb: reduce the swiotlb buffer size on allocation failure
[ Upstream commit 8d58aa484920c4f9be4834a7aeb446cdced21a37 ]

At the moment the AMD encrypted platform reserves 6% of RAM for SWIOTLB
or 1GB, whichever is less. However it is possible that there is no block
big enough in the low memory which make SWIOTLB allocation fail and
the kernel continues without DMA. In such case a VM hangs on DMA.

This moves alloc+remap to a helper and calls it from a loop where
the size is halved on each iteration.

This updates default_nslabs on successful allocation which looks like
an oversight as not doing so should have broken callers of
swiotlb_size_or_default().

Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Pankaj Gupta <pankaj.gupta@amd.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Stable-dep-of: 8ac04063354a ("swiotlb: reduce the number of areas to match actual memory pool size")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-23 13:49:19 +02:00
Petr Tesarik
24b24863a0 swiotlb: always set the number of areas before allocating the pool
[ Upstream commit aabd12609f91155f26584508b01f548215cc3c0c ]

The number of areas defaults to the number of possible CPUs. However, the
total number of slots may have to be increased after adjusting the number
of areas. Consequently, the number of areas must be determined before
allocating the memory pool. This is even explained with a comment in
swiotlb_init_remap(), but swiotlb_init_late() adjusts the number of areas
after slots are already allocated. The areas may end up being smaller than
IO_TLB_SEGSIZE, which breaks per-area locking.

While fixing swiotlb_init_late(), move all relevant comments before the
definition of swiotlb_adjust_nareas() and convert them to kernel-doc.

Fixes: 20347fca71 ("swiotlb: split up the global swiotlb lock")
Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-23 13:49:19 +02:00
Linus Torvalds
2d57a1590f workqueue: clean up WORK_* constant types, clarify masking
commit afa4bb778e48d79e4a642ed41e3b4e0de7489a6c upstream.

Dave Airlie reports that gcc-13.1.1 has started complaining about some
of the workqueue code in 32-bit arm builds:

  kernel/workqueue.c: In function ‘get_work_pwq’:
  kernel/workqueue.c:713:24: error: cast to pointer from integer of different size [-Werror=int-to-pointer-cast]
    713 |                 return (void *)(data & WORK_STRUCT_WQ_DATA_MASK);
        |                        ^
  [ ... a couple of other cases ... ]

and while it's not immediately clear exactly why gcc started complaining
about it now, I suspect it's some C23-induced enum type handlign fixup in
gcc-13 is the cause.

Whatever the reason for starting to complain, the code and data types
are indeed disgusting enough that the complaint is warranted.

The wq code ends up creating various "helper constants" (like that
WORK_STRUCT_WQ_DATA_MASK) using an enum type, which is all kinds of
confused.  The mask needs to be 'unsigned long', not some unspecified
enum type.

To make matters worse, the actual "mask and cast to a pointer" is
repeated a couple of times, and the cast isn't even always done to the
right pointer, but - as the error case above - to a 'void *' with then
the compiler finishing the job.

That's now how we roll in the kernel.

So create the masks using the proper types rather than some ambiguous
enumeration, and use a nice helper that actually does the type
conversion in one well-defined place.

Incidentally, this magically makes clang generate better code.  That,
admittedly, is really just a sign of clang having been seriously
confused before, and cleaning up the typing unconfuses the compiler too.

Reported-by: Dave Airlie <airlied@gmail.com>
Link: https://lore.kernel.org/lkml/CAPM=9twNnV4zMCvrPkw3H-ajZOH-01JVh_kDrxdPYQErz8ZTdA@mail.gmail.com/
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Tejun Heo <tj@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-23 13:49:19 +02:00
lijun14
7dd60ce804 ANDROID: vendor_hooks: add vendor hook to support SAGT
Add vendor hook of android_rvh_before_do_sched_yield

Bug: 291726037

Change-Id: I1f2d65739a297812f279b83085e3680e40d4cb6e
Signed-off-by: lijun14 <lijun14@xiaomi.corp-partner.google.com>
2023-07-20 20:37:31 +00:00
qctecmdr
c1e64563dc Merge "sched/walt: Fix problem in do_freq_qos_request" 2023-07-19 13:43:30 -07:00
Stephen Dickey
ead247e5cc sched/walt: Combine common pipeline code for swapping with prime
In preparation for creating additional pipeline promotion cabilities,
simplify the existing code by combining common portions between
heavy and manual pipeline selection.

Change-Id: I356d596398ad6cb695248820d6440676a3f19477
Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com>
2023-07-19 13:02:19 -07:00
Greg Kroah-Hartman
524f946fbc Merge branch 'android14-6.1' into 'android14-6.1-lts'
Catches the android14-6.1-lts branch up with the android14-6.1 branch
which has had a lot of changes that are needed here to resolve future
LTS merges and to ensure that the ABI is kept stable.

It contains the following commits:

abb897fe2f8e Merge branch 'android14-6.1' into 'android14-6.1-lts'
a5e46b0f3c UPSTREAM: io_uring/poll: serialize poll linked timer start with poll removal
6c695fad68 ANDROID: fuse-bpf: Add partial flock support
9b655e9328 ANDROID: Incremental fs: Allocate data buffer based on input request size
facf08fa5f UPSTREAM: gfs2: Don't deref jdesc in evict
a16d62a296 ANDROID: KVM: arm64: Fix MMU context save/restore over TLB invalidation
7f0f58f97b ANDROID: Update symbol list for VIVO
1b7f110278 ANDROID: add initial symbol list file for ExynosAuto SoCs
f6707f352b ANDROID: sched: Export sched_domains_mutex for lockdep
a24911abfd ANDROID: Update symbol for Exynos SoC
5e7421101f ANDROID: ABI: Update symbol for Exynos SoC
270ca05882 ANDROID: Update symbol list for mtk
47e02fe1ef UPSTREAM: dma-remap: use kvmalloc_array/kvfree for larger dma memory remap
22e008d6d5 ANDROID: vendor_hooks: Supplement the missing hook call point.
214e6f268b ANDROID: GKI: Add WWAN as GKI protected module
8726a2d930 ANDROID: GKI: regmap: Add regmap vendor hook for of_syscon_register
7c2b6c7b56 UPSTREAM: kasan: suppress recursive reports for HW_TAGS
c0226bf0c7 UPSTREAM: kasan, arm64: add arch_suppress_tag_checks_start/stop
da926e6077 UPSTREAM: arm64: mte: rename TCO routines
553be6e70d BACKPORT: kasan, arm64: rename tagging-related routines
b39a3be50a UPSTREAM: kasan: drop empty tagging-related defines
44ee9eef21 ANDROID: usb: xhci-plat: Fix double-free in xhci_plat_remove
55679fd0a8 ANDROID: ABI: update symbol list for galaxy
30807bebbf ANDROID: GKI: update the ABI symbol list
f3c6324daa ANDROID: ABI: Update symbol for Exynos SoC
c75c8311c8 ANDROID: GKI: ABI: update whitelist for the kmsg_dump and native_hang symbols used by unisoc for kernel6.1
0a2e9dd65c ANDROID: ABI: Update symbols to unisoc whitelist for ims_bridge module
fc9c1ccbbf ANDROID: abi_gki_aarch64_qcom: Add drm_plane_from_index and drm_gem_prime_export
c480e4e576 ANDROID: abi_gki_aarch64_qcom: Update symbol list
8ecaef4d4b UPSTREAM: fsverity: reject FS_IOC_ENABLE_VERITY on mode 3 fds
d5feaf8163 UPSTREAM: fsverity: explicitly check for buffer overflow in build_merkle_tree()
711f5d5bfe ANDROID: update unisoc symbol list
dde9b1794c ANDROID: update symbol for unisoc whitelist
dfd6ca2517 UPSTREAM: f2fs: fix deadlock in i_xattr_sem and inode page lock
a3d8701485 ANDROID: GKI: update xiaomi symbol list
dfc69fd81c Revert "FROMLIST: f2fs: remove i_xattr_sem to avoid deadlock and fix the original issue"
2e2b1f4982 ANDROID: ABI: Update pixel symbol list
b57cdabd55 ANDROID: Set arch attribute for allmodconfig builds
f63b2625af UPSTREAM: usb: gadget: udc: renesas_usb3: Fix use after free bug in renesas_usb3_remove due to race condition
dc8c661b99 ANDROID: ABI: Add to QCOM symbols list
dd451f19f0 UPSTREAM: arm64: mm: pass original fault address to handle_mm_fault() in PER_VMA_LOCK block
39385f7568 UPSTREAM: media: rkvdec: fix use after free bug in rkvdec_remove
35a9539d66 ANDROID: GKI: Update symbol list for MediatTek
fcbb015efd UPSTREAM: scsi: ufs: core: Remove dedicated hwq for dev command
2eb4158749 BACKPORT: scsi: ufs: mcq: Fix the incorrect OCS value for the device command
dc64f5f480 FROMLIST: scsi: ufs: ufs-mediatek: Add MCQ support for MTK platform
8740a92b2e FROMLIST: scsi: ufs: core: Export symbols for MTK driver module
c9814a3af5 UPSTREAM: blk-mq: check on cpu id when there is only one ctx mapping
c413cf731a UPSTREAM: relayfs: fix out-of-bounds access in relay_file_read
e84e043a3c UPSTREAM: net/sched: flower: fix possible OOB write in fl_set_geneve_opt()
d2dfb4ee11 UPSTREAM: x86/mm: Avoid using set_pgd() outside of real PGD pages
3c60e58d7a UPSTREAM: iommu/amd: Add missing domain type checks
820f96cba5 UPSTREAM: tty: serial: qcom_geni: avoid duplicate struct member init
cbea99e1de UPSTREAM: scsi: ufs: core: bsg: Fix cast to restricted __be16 warning
c779836709 UPSTREAM: netfilter: nf_tables: incorrect error path handling with NFT_MSG_NEWRULE
ed2a228522 ANDROID: fix build error when use cpu_cgroup_online vh
8cd2dc493a ANDROID: ABI: add android_debug_symbol to whitelist
1047d4a5df ANDROID: defconfig: Enable debug_symbol driver
dfabd2e38b ANDROID: android: Create debug_symbols driver
f54778f021 ANDROID: ABI: update symbol list for exynos
58004e1d0e ANDROID: KVM: arm64: Remove 'struct kvm_vcpu' from the KMI
8a717a85c5 UPSTREAM: KVM: arm64: Restore GICv2-on-GICv3 functionality
b9d7d47d4a UPSTREAM: KVM: arm64: vgic: Wrap vgic_its_create() with config_lock
486a8ab3ad UPSTREAM: KVM: arm64: vgic: Fix a circular locking issue
b5e26cd12f UPSTREAM: KVM: arm64: vgic: Don't acquire its_lock before config_lock
b1bb8a0bc4 BACKPORT: KVM: arm64: Avoid lock inversion when setting the VM register width
b39849bde6 UPSTREAM: KVM: arm64: Avoid vcpu->mutex v. kvm->lock inversion in CPU_ON
04b12278ee BACKPORT: KVM: arm64: Use config_lock to protect data ordered against KVM_RUN
de6bb81c8b UPSTREAM: KVM: arm64: Use config_lock to protect vgic state
cf0e6c7e09 BACKPORT: KVM: arm64: Add helper vgic_write_guest_lock()
4bbcece823 ANDROID: sound: usb: Fix wrong behavior of vendor hooking
55f146682b ANDROID: GKI: USB: XHCI: add Android ABI padding to struct xhci_vendor_ops
e27c6490ba Revert "ANDROID: android: Create debug_symbols driver"
bb732365f7 ANDROID: android: Create debug_symbols driver
80ac923694 UPSTREAM: ipvlan:Fix out-of-bounds caused by unclear skb->cb
9a9c876461 ANDROID: update symbol list for unisoc vendor hook
e3a72785da ANDROID: thermal: Add hook to enable/disable thermal power throttle
05ba0cb850 ANDROID: ABI: Update symbol for Exynos SoC
251aa28d16 BACKPORT: FROMGIT: usb: gadget: udc: Handle gadget_connect failure during bind operation
5af5006061 FROMGIT: usb: dwc3: gadget: Bail out in pullup if soft reset timeout happens
79b7e0db16 ANDROID: GKI: Update symbol list for xiaomi
ff8496749d ANDROID: vendor_hooks: vendor hook for MM
43d7226c5f ANDROID: add a symbol to unisoc symbol list
51cb1e1cfd ANDROID: GKI: update symbol list file for xiaomi
1499ddcb78 UPSTREAM: net/sched: cls_u32: Fix reference counter leak leading to overflow
054ab3ab00 ANDROID: db845c: Fix build when using --kgdb
a39af6210e FROMGIT: usb: host: xhci-plat: Set XHCI_STATE_REMOVING before resuming XHCI HC
50c99c83e2 FROMGIT: usb: host: xhci: Do not re-initialize the XHCI HC if being removed
fa9645687e FROMLIST: kheaders: dereferences the source tree
21061b7d0f FROMLIST: f2fs: remove i_xattr_sem to avoid deadlock and fix the original issue
ec0fc55aa4 ANDROID: db845c: Local define for db845c targets
947e7c1d72 ANDROID: GKI: Update symbols to symbol list
9afd7b261a ANDROID: Export memcg functions to allow module to add new files
32c2d42ee1 ANDROID: rockpi4: Fix build when using --kgdb
275048c878 ANDROID: GKI: update symbol list file for xiaomi
64e4b4d31b ANDROID: kleaf: android/gki_system_dlkm_modules is generated.
734b06dabf ANDROID: ABI: Update pixel symbol list
9ea87136d1 ANDROID: fuse-bpf: Move FUSE_RELEASE to correct place
b8ef5bfbee ANDROID: fuse-bpf: Ensure bpf field can never be nulled
a97d54b54d ANDROID: GKI: Increase CMA areas to 32
d28f02c47b ANDROID: Delete MODULES_LIST from build configs.
97a56a07e9 ANDROID: ABI: Update symbols to unisoc whitelist
7668cef283 ANDROID: HID: Only utilise UHID provided exports if UHID is enabled
1c4d2aa0c7 UPSTREAM: memstick: r592: Fix UAF bug in r592_remove due to race condition
8aea35f109 UPSTREAM: xfs: verify buffer contents when we skip log replay
04b6079eae UPSTREAM: bluetooth: Perform careful capability checks in hci_sock_ioctl()
8f5a220975 FROMLIST: maple_tree: Adjust node allocation on mas_rebalance()
e835ffdfbc FROMLIST: maple_tree: Reduce resets during store setup
708234485a FROMLIST: BACKPORT: maple_tree: Refine mas_preallocate() node calculations
d766c8399b Revert "FROMLIST: BACKPORT: maple_tree: Refine mas_preallocate() node calculations"

Change-Id: I0c77dd36d8336542cbb66edceec28f36ce3d798f
Signed-off-by: Greg Kroah-Hartman <gregkh@google.com>
2023-07-19 18:11:20 +00:00
Siddh Raman Pant
219a9ec09d watch_queue: prevent dangling pipe pointer
commit 943211c87427f25bd22e0e63849fb486bb5f87fa upstream.

NULL the dangling pipe reference while clearing watch_queue.

If not done, a reference to a freed pipe remains in the watch_queue,
as this function is called before freeing a pipe in free_pipe_info()
(see line 834 of fs/pipe.c).

The sole use of wqueue->defunct is for checking if the watch queue has
been cleared, but wqueue->pipe is also NULLed while clearing.

Thus, wqueue->defunct is superfluous, as wqueue->pipe can be checked
for NULL. Hence, the former can be removed.

Tested with keyutils testsuite.

Cc: stable@vger.kernel.org # 6.1
Signed-off-by: Siddh Raman Pant <code@siddh.me>
Acked-by: David Howells <dhowells@redhat.com>
Message-Id: <20230605143616.640517-1-code@siddh.me>
Signed-off-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-07-19 16:22:10 +02:00
SeongJae Park
6baa6e4836 bpf, btf: Warn but return no error for NULL btf from __register_btf_kfunc_id_set()
[ Upstream commit 3de4d22cc9ac7c9f38e10edcf54f9a8891a9c2aa ]

__register_btf_kfunc_id_set() assumes .BTF to be part of the module's .ko
file if CONFIG_DEBUG_INFO_BTF is enabled. If that's not the case, the
function prints an error message and return an error. As a result, such
modules cannot be loaded.

However, the section could be stripped out during a build process. It would
be better to let the modules loaded, because their basic functionalities
have no problem [0], though the BTF functionalities will not be supported.
Make the function to lower the level of the message from error to warn, and
return no error.

  [0] https://lore.kernel.org/bpf/20220219082037.ow2kbq5brktf4f2u@apollo.legion

Fixes: c446fdacb1 ("bpf: fix register_btf_kfunc_id_set for !CONFIG_DEBUG_INFO_BTF")
Reported-by: Alexander Egorenkov <Alexander.Egorenkov@ibm.com>
Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: SeongJae Park <sj@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/87y228q66f.fsf@oc8242746057.ibm.com
Link: https://lore.kernel.org/bpf/20220219082037.ow2kbq5brktf4f2u@apollo.legion
Link: https://lore.kernel.org/bpf/20230701171447.56464-1-sj@kernel.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:22:05 +02:00
Christophe Leroy
081f642b31 kcsan: Don't expect 64 bits atomic builtins from 32 bits architectures
[ Upstream commit 353e7300a1db928e427462f2745f9a2cd1625b3d ]

Activating KCSAN on a 32 bits architecture leads to the following
link-time failure:

    LD      .tmp_vmlinux.kallsyms1
  powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_load':
  kernel/kcsan/core.c:1273: undefined reference to `__atomic_load_8'
  powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_store':
  kernel/kcsan/core.c:1273: undefined reference to `__atomic_store_8'
  powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_exchange':
  kernel/kcsan/core.c:1273: undefined reference to `__atomic_exchange_8'
  powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_fetch_add':
  kernel/kcsan/core.c:1273: undefined reference to `__atomic_fetch_add_8'
  powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_fetch_sub':
  kernel/kcsan/core.c:1273: undefined reference to `__atomic_fetch_sub_8'
  powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_fetch_and':
  kernel/kcsan/core.c:1273: undefined reference to `__atomic_fetch_and_8'
  powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_fetch_or':
  kernel/kcsan/core.c:1273: undefined reference to `__atomic_fetch_or_8'
  powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_fetch_xor':
  kernel/kcsan/core.c:1273: undefined reference to `__atomic_fetch_xor_8'
  powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_fetch_nand':
  kernel/kcsan/core.c:1273: undefined reference to `__atomic_fetch_nand_8'
  powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_compare_exchange_strong':
  kernel/kcsan/core.c:1273: undefined reference to `__atomic_compare_exchange_8'
  powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_compare_exchange_weak':
  kernel/kcsan/core.c:1273: undefined reference to `__atomic_compare_exchange_8'
  powerpc64-linux-ld: kernel/kcsan/core.o: in function `__tsan_atomic64_compare_exchange_val':
  kernel/kcsan/core.c:1273: undefined reference to `__atomic_compare_exchange_8'

32 bits architectures don't have 64 bits atomic builtins. Only
include DEFINE_TSAN_ATOMIC_OPS(64) on 64 bits architectures.

Fixes: 0f8ad5f2e9 ("kcsan: Add support for atomic builtins")
Suggested-by: Marco Elver <elver@google.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Marco Elver <elver@google.com>
Acked-by: Marco Elver <elver@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://msgid.link/d9c6afc28d0855240171a4e0ad9ffcdb9d07fceb.1683892665.git.christophe.leroy@csgroup.eu
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:21:37 +02:00
Zhen Lei
fd4f89302f kexec: fix a memory leak in crash_shrink_memory()
[ Upstream commit 1cba6c4309f03de570202c46f03df3f73a0d4c82 ]

Patch series "kexec: enable kexec_crash_size to support two crash kernel
regions".

When crashkernel=X fails to reserve region under 4G, it will fall back to
reserve region above 4G and a region of the default size will also be
reserved under 4G.  Unfortunately, /sys/kernel/kexec_crash_size only
supports one crash kernel region now, the user cannot sense the low memory
reserved by reading /sys/kernel/kexec_crash_size.  Also, low memory cannot
be freed by writing this file.

For example:
resource_size(crashk_res) = 512M
resource_size(crashk_low_res) = 256M

The result of 'cat /sys/kernel/kexec_crash_size' is 512M, but it should be
768M.  When we execute 'echo 0 > /sys/kernel/kexec_crash_size', the size
of crashk_res becomes 0 and resource_size(crashk_low_res) is still 256 MB,
which is incorrect.

Since crashk_res manages the memory with high address and crashk_low_res
manages the memory with low address, crashk_low_res is shrunken only when
all crashk_res is shrunken.  And because when there is only one crash
kernel region, crashk_res is always used.  Therefore, if all crashk_res is
shrunken and crashk_low_res still exists, swap them.

This patch (of 6):

If the value of parameter 'new_size' is in the semi-open and semi-closed
interval (crashk_res.end - KEXEC_CRASH_MEM_ALIGN + 1, crashk_res.end], the
calculation result of ram_res is:

	ram_res->start = crashk_res.end + 1
	ram_res->end   = crashk_res.end

The operation of insert_resource() fails, and ram_res is not added to
iomem_resource.  As a result, the memory of the control block ram_res is
leaked.

In fact, on all architectures, the start address and size of crashk_res
are already aligned by KEXEC_CRASH_MEM_ALIGN.  Therefore, we do not need
to round up crashk_res.start again.  Instead, we should round up
'new_size' in advance.

Link: https://lkml.kernel.org/r/20230527123439.772-1-thunder.leizhen@huawei.com
Link: https://lkml.kernel.org/r/20230527123439.772-2-thunder.leizhen@huawei.com
Fixes: 6480e5a092 ("kdump: add missing RAM resource in crash_shrink_memory()")
Fixes: 06a7f71124 ("kexec: premit reduction of the reserved memory size")
Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
Acked-by: Baoquan He <bhe@redhat.com>
Cc: Cong Wang <amwang@redhat.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:21:08 +02:00
Douglas Anderson
6525435d14 watchdog/perf: more properly prevent false positives with turbo modes
[ Upstream commit 4379e59fe5665cfda737e45b8bf2f05321ef049c ]

Currently, in the watchdog_overflow_callback() we first check to see if
the watchdog had been touched and _then_ we handle the workaround for
turbo mode.  This order should be reversed.

Specifically, "touching" the hardlockup detector's watchdog should avoid
lockups being detected for one period that should be roughly the same
regardless of whether we're running turbo or not.  That means that we
should do the extra accounting for turbo _before_ we look at (and clear)
the global indicating that we've been touched.

NOTE: this fix is made based on code inspection.  I am not aware of any
reports where the old code would have generated false positives.  That
being said, this order seems more correct and also makes it easier down
the line to share code with the "buddy" hardlockup detector.

Link: https://lkml.kernel.org/r/20230519101840.v5.2.I843b0d1de3e096ba111a179f3adb16d576bef5c7@changeid
Fixes: 7edaeb6841 ("kernel/watchdog: Prevent false positives with turbo modes")
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
Cc: Colin Cross <ccross@android.com>
Cc: Daniel Thompson <daniel.thompson@linaro.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Guenter Roeck <groeck@chromium.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masayoshi Mizuma <msys.mizuma@gmail.com>
Cc: Matthias Kaehlcke <mka@chromium.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Cc: Petr Mladek <pmladek@suse.com>
Cc: Pingfan Liu <kernelfans@gmail.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: "Ravi V. Shankar" <ravi.v.shankar@intel.com>
Cc: Ricardo Neri <ricardo.neri@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Boyd <swboyd@chromium.org>
Cc: Sumit Garg <sumit.garg@linaro.org>
Cc: Tzung-Bi Shih <tzungbi@chromium.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:21:08 +02:00
Yafang Shao
20109ddd5b bpf: Fix memleak due to fentry attach failure
[ Upstream commit 108598c39eefbedc9882273ac0df96127a629220 ]

If it fails to attach fentry, the allocated bpf trampoline image will be
left in the system. That can be verified by checking /proc/kallsyms.

This meamleak can be verified by a simple bpf program as follows:

  SEC("fentry/trap_init")
  int fentry_run()
  {
      return 0;
  }

It will fail to attach trap_init because this function is freed after
kernel init, and then we can find the trampoline image is left in the
system by checking /proc/kallsyms.

  $ tail /proc/kallsyms
  ffffffffc0613000 t bpf_trampoline_6442453466_1  [bpf]
  ffffffffc06c3000 t bpf_trampoline_6442453466_1  [bpf]

  $ bpftool btf dump file /sys/kernel/btf/vmlinux | grep "FUNC 'trap_init'"
  [2522] FUNC 'trap_init' type_id=119 linkage=static

  $ echo $((6442453466 & 0x7fffffff))
  2522

Note that there are two left bpf trampoline images, that is because the
libbpf will fallback to raw tracepoint if -EINVAL is returned.

Fixes: e21aa34178 ("bpf: Fix fexit trampoline.")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <song@kernel.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Link: https://lore.kernel.org/bpf/20230515130849.57502-2-laoar.shao@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:21:05 +02:00
Yafang Shao
8ea165e1f8 bpf: Remove bpf trampoline selector
[ Upstream commit 47e79cbeea4b3891ad476047f4c68543eb51c8e0 ]

After commit e21aa34178 ("bpf: Fix fexit trampoline."), the selector is only
used to indicate how many times the bpf trampoline image are updated and been
displayed in the trampoline ksym name. After the trampoline is freed, the
selector will start from 0 again. So the selector is a useless value to the
user. We can remove it.

If the user want to check whether the bpf trampoline image has been updated
or not, the user can compare the address. Each time the trampoline image is
updated, the address will change consequently. Jiri also pointed out another
issue that perf is still using the old name "bpf_trampoline_%lu", so this
change can fix the issue in perf.

Fixes: e21aa34178 ("bpf: Fix fexit trampoline.")
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <song@kernel.org>
Cc: Jiri Olsa <olsajiri@gmail.com>
Link: https://lore.kernel.org/bpf/ZFvOOlrmHiY9AgXE@krava
Link: https://lore.kernel.org/bpf/20230515130849.57502-3-laoar.shao@gmail.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:21:05 +02:00
Stanislav Fomichev
c6a9fc82fe bpf: Don't EFAULT for {g,s}setsockopt with wrong optlen
[ Upstream commit 29ebbba7d46136cba324264e513a1e964ca16c0a ]

With the way the hooks implemented right now, we have a special
condition: optval larger than PAGE_SIZE will expose only first 4k into
BPF; any modifications to the optval are ignored. If the BPF program
doesn't handle this condition by resetting optlen to 0,
the userspace will get EFAULT.

The intention of the EFAULT was to make it apparent to the
developers that the program is doing something wrong.
However, this inadvertently might affect production workloads
with the BPF programs that are not too careful (i.e., returning EFAULT
for perfectly valid setsockopt/getsockopt calls).

Let's try to minimize the chance of BPF program screwing up userspace
by ignoring the output of those BPF programs (instead of returning
EFAULT to the userspace). pr_info_once those cases to
the dmesg to help with figuring out what's going wrong.

Fixes: 0d01da6afc ("bpf: implement getsockopt and setsockopt hooks")
Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230511170456.1759459-2-sdf@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:21:05 +02:00
Qiuxu Zhuo
b8a6ba524d rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale
[ Upstream commit 23fc8df26dead16687ae6eb47b0561a4a832e2f6 ]

Running the 'kfree_rcu_test' test case [1] results in a splat [2].
The root cause is the kfree_scale_thread thread(s) continue running
after unloading the rcuscale module.  This commit fixes that isue by
invoking kfree_scale_cleanup() from rcu_scale_cleanup() when removing
the rcuscale module.

[1] modprobe rcuscale kfree_rcu_test=1
    // After some time
    rmmod rcuscale
    rmmod torture

[2] BUG: unable to handle page fault for address: ffffffffc0601a87
    #PF: supervisor instruction fetch in kernel mode
    #PF: error_code(0x0010) - not-present page
    PGD 11de4f067 P4D 11de4f067 PUD 11de51067 PMD 112f4d067 PTE 0
    Oops: 0010 [#1] PREEMPT SMP NOPTI
    CPU: 1 PID: 1798 Comm: kfree_scale_thr Not tainted 6.3.0-rc1-rcu+ #1
    Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015
    RIP: 0010:0xffffffffc0601a87
    Code: Unable to access opcode bytes at 0xffffffffc0601a5d.
    RSP: 0018:ffffb25bc2e57e18 EFLAGS: 00010297
    RAX: 0000000000000000 RBX: ffffffffc061f0b6 RCX: 0000000000000000
    RDX: 0000000000000000 RSI: ffffffff962fd0de RDI: ffffffff962fd0de
    RBP: ffffb25bc2e57ea8 R08: 0000000000000000 R09: 0000000000000000
    R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000000
    R13: 0000000000000000 R14: 000000000000000a R15: 00000000001c1dbe
    FS:  0000000000000000(0000) GS:ffff921fa2200000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    CR2: ffffffffc0601a5d CR3: 000000011de4c006 CR4: 0000000000370ee0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    Call Trace:
     <TASK>
     ? kvfree_call_rcu+0xf0/0x3a0
     ? kthread+0xf3/0x120
     ? kthread_complete_and_exit+0x20/0x20
     ? ret_from_fork+0x1f/0x30
     </TASK>
    Modules linked in: rfkill sunrpc ... [last unloaded: torture]
    CR2: ffffffffc0601a87
    ---[ end trace 0000000000000000 ]---

Fixes: e6e78b004f ("rcuperf: Add kfree_rcu() performance Tests")
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:21:02 +02:00
Qiuxu Zhuo
3506e64ec1 rcu/rcuscale: Move rcu_scale_*() after kfree_scale_cleanup()
[ Upstream commit bf5ddd736509a7d9077c0b6793e6f0852214dbea ]

This code-movement-only commit moves the rcu_scale_cleanup() and
rcu_scale_shutdown() functions to follow kfree_scale_cleanup().
This is code movement is in preparation for a bug-fix patch that invokes
kfree_scale_cleanup() from rcu_scale_cleanup().

Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org>
Stable-dep-of: 23fc8df26dea ("rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:21:02 +02:00
Paul E. McKenney
7a34922194 rcuscale: Move shutdown from wait_event() to wait_event_idle()
[ Upstream commit ef1ef3d47677dc191b88650a9f7f91413452cc1b ]

The rcu_scale_shutdown() and kfree_scale_shutdown() kthreads/functions
use wait_event() to wait for the rcuscale test to complete.  However,
each updater thread in such a test waits for at least 100 grace periods.
If each grace period takes more than 1.2 seconds, which is long, but
not insanely so, this can trigger the hung-task timeout.

This commit therefore replaces those wait_event() calls with calls to
wait_event_idle(), which do not trigger the hung-task timeout.

Reported-by: kernel test robot <yujie.liu@intel.com>
Reported-by: Liam Howlett <liam.howlett@oracle.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Yujie Liu <yujie.liu@intel.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
Stable-dep-of: 23fc8df26dea ("rcu/rcuscale: Stop kfree_scale_thread thread(s) after unloading rcuscale")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:21:02 +02:00
Paul E. McKenney
b1cdc56bc1 rcu-tasks: Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs
[ Upstream commit 401b0de3ae4fa49d1014c8941e26d9a25f37e7cf ]

The rcu_tasks_invoke_cbs() function relies on queue_work_on() to silently
fall back to WORK_CPU_UNBOUND when the specified CPU is offline.  However,
the queue_work_on() function's silent fallback mechanism relies on that
CPU having been online at some time in the past.  When queue_work_on()
is passed a CPU that has never been online, workqueue lockups ensue,
which can be bad for your kernel's general health and well-being.

This commit therefore checks whether a given CPU has ever been online,
and, if not substitutes WORK_CPU_UNBOUND in the subsequent call to
queue_work_on().  Why not simply omit the queue_work_on() call entirely?
Because this function is flooding callback-invocation notifications
to all CPUs, and must deal with possibilities that include a sparse
cpu_possible_mask.

This commit also moves the setting of the rcu_data structure's
->beenonline field to rcu_cpu_starting(), which executes on the
incoming CPU before that CPU has ever enabled interrupts.  This ensures
that the required workqueues are present.  In addition, because the
incoming CPU has not yet enabled its interrupts, there cannot yet have
been any softirq handlers running on this CPU, which means that the
WARN_ON_ONCE(!rdp->beenonline) within the RCU_SOFTIRQ handler cannot
have triggered yet.

Fixes: d363f833c6 ("rcu-tasks: Use workqueues for multiple rcu_tasks_invoke_cbs() invocations")
Reported-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:21:01 +02:00
Paul E. McKenney
d58f0f0ce6 rcu: Make rcu_cpu_starting() rely on interrupts being disabled
[ Upstream commit 15d44dfa40305da1648de4bf001e91cc63148725 ]

Currently, rcu_cpu_starting() is written so that it might be invoked
with interrupts enabled.  However, it is always called when interrupts
are disabled, either by rcu_init(), notify_cpu_starting(), or from a
call point prior to the call to notify_cpu_starting().

But why bother requiring that interrupts be disabled?  The purpose is
to allow the rcu_data structure's ->beenonline flag to be set after all
early processing has completed for the incoming CPU, thus allowing this
flag to be used to determine when workqueues have been set up for the
incoming CPU, while still allowing this flag to be used as a diagnostic
within rcu_core().

This commit therefore makes rcu_cpu_starting() rely on interrupts being
disabled.

Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Stable-dep-of: 401b0de3ae4f ("rcu-tasks: Stop rcu_tasks_invoke_cbs() from using never-onlined CPUs")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:21:01 +02:00
Wen Yang
77cc52f1b8 tick/rcu: Fix bogus ratelimit condition
[ Upstream commit a7e282c77785c7eabf98836431b1f029481085ad ]

The ratelimit logic in report_idle_softirq() is broken because the
exit condition is always true:

	static int ratelimit;

	if (ratelimit < 10)
		return false;  ---> always returns here

	ratelimit++;           ---> no chance to run

Make it check for >= 10 instead.

Fixes: 0345691b24 ("tick/rcu: Stop allowing RCU_SOFTIRQ in idle")
Signed-off-by: Wen Yang <wenyang.linux@foxmail.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/tencent_5AAA3EEAB42095C9B7740BE62FBF9A67E007@qq.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:20:59 +02:00
Thomas Gleixner
e7aff15ba2 posix-timers: Prevent RT livelock in itimer_delete()
[ Upstream commit 9d9e522010eb5685d8b53e8a24320653d9d4cbbf ]

itimer_delete() has a retry loop when the timer is concurrently expired. On
non-RT kernels this just spin-waits until the timer callback has completed,
except for posix CPU timers which have HAVE_POSIX_CPU_TIMERS_TASK_WORK
enabled.

In that case and on RT kernels the existing task could live lock when
preempting the task which does the timer delivery.

Replace spin_unlock() with an invocation of timer_wait_running() to handle
it the same way as the other retry loops in the posix timer code.

Fixes: ec8f954a40 ("posix-timers: Use a callback for cancel synchronization on PREEMPT_RT")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/87v8g7c50d.ffs@tglx
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-07-19 16:20:59 +02:00
qctecmdr
826f36ec77 Merge "sched/walt: Introduce shared rail sibling" 2023-07-18 19:06:28 -07:00
Shaleen Agrawal
dace646e99 sched/walt: Fix problem in do_freq_qos_request
Currently, there is an issue where the request to execute a QOS
request is not being serviced, as requests will only be honored
if a CPU is offline.

Fix this by ensuring QOS requests are made on online CPUs.

Change-Id: I454a5ddf50e373dc6d4fe5ac06674288aa529f1e
Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com>
2023-07-18 17:38:04 -07:00
qctecmdr
302ef6d2df Merge "sched/walt: create an api for cluster boost for pipeline" 2023-07-18 14:24:21 -07:00
qctecmdr
e766132452 Merge "sched/walt: Optimize single candidate EAS" 2023-07-17 18:21:16 -07:00
Shaleen Agrawal
dddf5ac2dd sched/walt: Introduce shared rail sibling
Treat Gold and Prime CPUs as shared rail siblings and ensure frequency
sync when pipeline feature is enabled. This also ensures sync of
gold/prime under intercluster migrations involving either clusters.

Change-Id: Iaaa8a7c0c23b71f3486539396f21c466a75a051c
Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com>
2023-07-17 16:00:02 -07:00
Stephen Dickey
aa882d2168 sched/walt: update pipeline code to selectively boost clusters
Use the core_ctl_set_cluster_boost api to choose which clusters
should be boosted. Specifically, the boost should be applied
to only those clusters which have a cpu selected by the
cpus_for_pipeline mask.

Change-Id: Ibaf77e0896bb834c014e51e0e97ba8d04978f027
Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com>
Signed-off-by: Shaleen Agrawal <quic_shalagra@quicinc.com>
2023-07-17 15:59:23 -07:00
Stephen Dickey
608b4af07d sched/walt: create an api for cluster boost for pipeline
Not all clusters should be unhalted when pipeline is enabled,
but all clusters are being unhalted now (through
core_ctl_sched_boost).

To prepare for only unhalting the clusters that are part
of the user selected pipeline cpus, create a per-cluster
boost api which controls the boost characteristics of
only the cluster idx specified.

Change-Id: I92fdc13bcb6ee84782b5ea1b5dcb74caabee8ccf
Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com>
2023-07-17 15:54:58 -07:00
Stephen Dickey
9902aeb581 sched/walt: specify cpus for pipeline threads
Cleanup the assignment of pipeline cpus such that the
value comes from a sysctl node, which specifies exactly
which cpus are to be used for pipeline tasks (not including
prime).

Change-Id: Ib108e1128c0eae4f18049f2328045f8637a79efb
Signed-off-by: Stephen Dickey <quic_dickey@quicinc.com>
Signed-off-by: Abhijeet Dharmapurikar <quic_adharmap@quicinc.com>
2023-07-17 15:54:58 -07:00
qctecmdr
0270812f0e Merge "sched/walt: Fix potential sleep under atomic context" 2023-07-15 02:08:20 -07:00
qctecmdr
e406d916fc Merge "sched: walt: add stalls to sched_switch_with_ctrs" 2023-07-14 23:21:13 -07:00