[ Upstream commit d14fa1fcf69db9d070e75f1c4425211fa619dfc8 ]
childregs represents the registers which are active for the new thread
in user context. For a kernel thread, childregs->gp is never used since
the kernel gp is not touched by switch_to. For a user mode helper, the
gp value can be observed in user space after execve or possibly by other
means.
[From the email thread]
The /* Kernel thread */ comment is somewhat inaccurate in that it is also used
for user_mode_helper threads, which exec a user process, e.g. /sbin/init or
when /proc/sys/kernel/core_pattern is a pipe. Such threads do not have
PF_KTHREAD set and are valid targets for ptrace etc. even before they exec.
childregs is the *user* context during syscall execution and it is observable
from userspace in at least five ways:
1. kernel_execve does not currently clear integer registers, so the starting
register state for PID 1 and other user processes started by the kernel has
sp = user stack, gp = kernel __global_pointer$, all other integer registers
zeroed by the memset in the patch comment.
This is a bug in its own right, but I'm unwilling to bet that it is the only
way to exploit the issue addressed by this patch.
2. ptrace(PTRACE_GETREGSET): you can PTRACE_ATTACH to a user_mode_helper thread
before it execs, but ptrace requires SIGSTOP to be delivered which can only
happen at user/kernel boundaries.
3. /proc/*/task/*/syscall: this is perfectly happy to read pt_regs for
user_mode_helpers before the exec completes, but gp is not one of the
registers it returns.
4. PERF_SAMPLE_REGS_USER: LOCKDOWN_PERF normally prevents access to kernel
addresses via PERF_SAMPLE_REGS_INTR, but due to this bug kernel addresses
are also exposed via PERF_SAMPLE_REGS_USER which is permitted under
LOCKDOWN_PERF. I have not attempted to write exploit code.
5. Much of the tracing infrastructure allows access to user registers. I have
not attempted to determine which forms of tracing allow access to user
registers without already allowing access to kernel registers.
Fixes: 7db91e57a0 ("RISC-V: Task implementation")
Cc: stable@vger.kernel.org
Signed-off-by: Stefan O'Rear <sorear@fastmail.com>
Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20240327061258.2370291-1-sorear@fastmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fea2fed201ee5647699018a56fbb6a5e8cc053a5 ]
This enables the use of per-task stack canary values if GCC has
support for emitting the stack canary reference relative to the
value of tp, which holds the task struct pointer in the riscv
kernel.
After compare arm64 and x86 implementations, seems arm64's is more
flexible and readable. The key point is how gcc get the offset of
stack_canary from gs/el0_sp.
x86: Use a fix offset from gs, not flexible.
struct fixed_percpu_data {
/*
* GCC hardcodes the stack canary as %gs:40. Since the
* irq_stack is the object at %gs:0, we reserve the bottom
* 48 bytes of the irq stack for the canary.
*/
char gs_base[40]; // :(
unsigned long stack_canary;
};
arm64: Use -mstack-protector-guard-offset & guard-reg
gcc options:
-mstack-protector-guard=sysreg
-mstack-protector-guard-reg=sp_el0
-mstack-protector-guard-offset=xxx
riscv: Use -mstack-protector-guard-offset & guard-reg
gcc options:
-mstack-protector-guard=tls
-mstack-protector-guard-reg=tp
-mstack-protector-guard-offset=xxx
GCC's implementation has been merged:
commit c931e8d5a96463427040b0d11f9c4352ac22b2b0
Author: Cooper Qu <cooper.qu@linux.alibaba.com>
Date: Mon Jul 13 16:15:08 2020 +0800
RISC-V: Add support for TLS stack protector canary access
In the end, these codes are inserted by gcc before return:
* 0xffffffe00020b396 <+120>: ld a5,1008(tp) # 0x3f0
* 0xffffffe00020b39a <+124>: xor a5,a5,a4
* 0xffffffe00020b39c <+126>: mv a0,s5
* 0xffffffe00020b39e <+128>: bnez a5,0xffffffe00020b61c <_do_fork+766>
0xffffffe00020b3a2 <+132>: ld ra,136(sp)
0xffffffe00020b3a4 <+134>: ld s0,128(sp)
0xffffffe00020b3a6 <+136>: ld s1,120(sp)
0xffffffe00020b3a8 <+138>: ld s2,112(sp)
0xffffffe00020b3aa <+140>: ld s3,104(sp)
0xffffffe00020b3ac <+142>: ld s4,96(sp)
0xffffffe00020b3ae <+144>: ld s5,88(sp)
0xffffffe00020b3b0 <+146>: ld s6,80(sp)
0xffffffe00020b3b2 <+148>: ld s7,72(sp)
0xffffffe00020b3b4 <+150>: addi sp,sp,144
0xffffffe00020b3b6 <+152>: ret
...
* 0xffffffe00020b61c <+766>: auipc ra,0x7f8
* 0xffffffe00020b620 <+770>: jalr -1764(ra) # 0xffffffe000a02f38 <__stack_chk_fail>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Cooper Qu <cooper.qu@linux.alibaba.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Stable-dep-of: d14fa1fcf69d ("riscv: process: Fix kernel gp leakage")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 22e0eb04837a63af111fae35a92f7577676b9bc8 ]
This is a backport of a fix that was done in OpenSBI: ec0559eb315b
("lib: sbi_misaligned_ldst: Fix handling of C.SWSP and C.SDSP").
Unlike C.LWSP/C.LDSP, these encodings can be used with the zero
register, so checking that the rs2 field is non-zero is unnecessary.
Additionally, the previous check was incorrect since it was checking
the immediate field of the instruction instead of the rs2 field.
Fixes: 956d705dd2 ("riscv: Unaligned load/store handling for M_MODE")
Signed-off-by: Clément Léger <cleger@rivosinc.com>
Link: https://lore.kernel.org/r/20231103090223.702340-1-cleger@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 8d736482749f6d350892ef83a7a11d43cd49981e upstream.
In a NOMMU kernel, sigreturn trampolines are generated on the user
stack by setup_rt_frame. Currently, these trampolines are not instruction
fenced, thus their visibility to ifetch is not guaranteed.
This patch adds a flush_icache_range in setup_rt_frame to fix this
problem.
Signed-off-by: Mathis Salmen <mathis.salmen@matsal.de>
Fixes: 6bd33e1ece ("riscv: add nommu support")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230406101130.82304-1-mathis.salmen@matsal.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 2a8db5ec4a28a0fce822d10224db9471a44b6925 ]
We're currently using stop_machine() to update ftrace & kprobes, which
means that the thread that takes text_mutex during may not be the same
as the thread that eventually patches the code. This isn't actually a
race because the lock is still held (preventing any other concurrent
accesses) and there is only one thread running during stop_machine(),
but it does trigger a lockdep failure.
This patch just elides the lockdep check during stop_machine.
Fixes: c15ac4fd60 ("riscv/ftrace: Add dynamic function tracer support")
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230303143754.4005217-1-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 130aee3fd9981297ff9354e5d5609cd59aafbbea ]
While working on something else, I noticed that the kernel would start
accepting interrupts again after crashing in an interrupt handler. Since
the kernel is already in inconsistent state, enabling interrupts is
dangerous and opens up risk of kernel state deteriorating further.
Interrupts do get enabled via what looks like an unintended side effect of
spin_unlock_irq, so switch to the more cautious
spin_lock_irqsave/spin_unlock_irqrestore instead.
Fixes: 76d2a0493a ("RISC-V: Init and Halt Code")
Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20230215144828.3370316-1-mnissler@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f2913d006fcdb61719635e093d1b5dd0dafecac7 ]
I don't think we can actually die() without a regs pointer, but the
compiler was warning about a NULL check after a dereference. It seems
prudent to just avoid the possibly-NULL dereference, given that when
die()ing the system is already toast so who knows how we got there.
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20220920200037.6727-1-palmer@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Stable-dep-of: 130aee3fd998 ("riscv: Avoid enabling interrupts in die()")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8b3b8fbb4896984b5564789a42240e4b3caddb61 ]
Similarly to commit 022eb8ae8b ("ARM: 8938/1: kernel: initialize
broadcast hrtimer based clock event device"), RISC-V needs to initiate
hrtimer based broadcast clock event device before C3STOP can be used.
Otherwise, the introduction of C3STOP for the RISC-V arch timer in
commit 232ccac1bd9b ("clocksource/drivers/riscv: Events are stopped
during CPU suspend") leaves us without any broadcast timer registered.
This prevents the kernel from entering oneshot mode, which breaks timer
behaviour, for example clock_nanosleep().
A test app that sleeps each cpu for 6, 5, 4, 3 ms respectively, HZ=250
& C3STOP enabled, the sleep times are rounded up to the next jiffy:
== CPU: 1 == == CPU: 2 == == CPU: 3 == == CPU: 4 ==
Mean: 7.974992 Mean: 7.976534 Mean: 7.962591 Mean: 3.952179
Std Dev: 0.154374 Std Dev: 0.156082 Std Dev: 0.171018 Std Dev: 0.076193
Hi: 9.472000 Hi: 10.495000 Hi: 8.864000 Hi: 4.736000
Lo: 6.087000 Lo: 6.380000 Lo: 4.872000 Lo: 3.403000
Samples: 521 Samples: 521 Samples: 521 Samples: 521
Link: https://lore.kernel.org/linux-riscv/YzYTNQRxLr7Q9JR0@spud/
Fixes: 232ccac1bd9b ("clocksource/drivers/riscv: Events are stopped during CPU suspend")
Suggested-by: Samuel Holland <samuel@sholland.org>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Samuel Holland <samuel@sholland.org>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230103141102.772228-2-apatel@ventanamicro.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 0e25498f8cd43c1b5aa327f373dd094e9a006da7 upstream.
There are two big uses of do_exit. The first is it's design use to be
the guts of the exit(2) system call. The second use is to terminate
a task after something catastrophic has happened like a NULL pointer
in kernel code.
Add a function make_task_dead that is initialy exactly the same as
do_exit to cover the cases where do_exit is called to handle
catastrophic failure. In time this can probably be reduced to just a
light wrapper around do_task_dead. For now keep it exactly the same so
that there will be no behavioral differences introducing this new
concept.
Replace all of the uses of do_exit that use it for catastraphic
task cleanup with make_task_dead to make it clear what the code
is doing.
As part of this rename rewind_stack_do_exit
rewind_stack_and_make_dead.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 5c3022e4a616d800cf5f4c3a981d7992179e44a1 ]
The 'retp' is a pointer to the return address on the stack, so we
must pass the current return address pointer as the 'retp'
argument to ftrace_push_return_trace(). Not parent function's
return address on the stack.
Fixes: b785ec129b ("riscv/ftrace: Add HAVE_FUNCTION_GRAPH_RET_ADDR_PTR support")
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20221109064937.3643993-2-guoren@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit f766f77a74f5784d8d4d3c36b1900731f97d08d0 ]
When a function doesn't have a callee, then it will not
push ra into the stack, such as lkdtm_BUG() function,
addi sp,sp,-16
sd s0,8(sp)
addi s0,sp,16
ebreak
The struct stackframe use {fp,ra} to get information from
stack, if walk_stackframe() with pr_regs, we will obtain
wrong value and bad stacktrace,
[<ffffffe00066c56c>] lkdtm_BUG+0x6/0x8
---[ end trace 18da3fbdf08e25d5 ]---
Correct the next fp and pc, after that, full stacktrace
shown as expects,
[<ffffffe00066c56c>] lkdtm_BUG+0x6/0x8
[<ffffffe0008b24a4>] lkdtm_do_action+0x14/0x1c
[<ffffffe00066c372>] direct_entry+0xc0/0x10a
[<ffffffe000439f86>] full_proxy_write+0x42/0x6a
[<ffffffe000309626>] vfs_write+0x7e/0x214
[<ffffffe00030992a>] ksys_write+0x98/0xc0
[<ffffffe000309960>] sys_write+0xe/0x16
[<ffffffe0002014bc>] ret_from_syscall+0x0/0x2
---[ end trace 61917f3d9a9fadcd ]---
Signed-off-by: Chen Huang <chenhuang5@huawei.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Stable-dep-of: 5c3022e4a616 ("riscv: stacktrace: Fixup ftrace_graph_ret_addr retp argument")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4727dc20e0422211a0e0c72b1ace4ed6096df8a6 ]
PF_IO_WORKER are kernel threads too, but they aren't PF_KTHREAD in the
sense that we don't assign ->set_child_tid with our own structure. Just
ensure that every arch sets up the PF_IO_WORKER threads like kthreads
in the arch implementation of copy_thread().
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 74f6bb55c834da6d4bac24f44868202743189b2b upstream.
lkp reported a build error, I tried the config and can reproduce
build error as below:
VDSOLD arch/riscv/kernel/vdso/vdso.so.dbg
ld.lld: error: section .note file range overlaps with .text
>>> .note range is [0x7C8, 0x803]
>>> .text range is [0x800, 0x1993]
ld.lld: error: section .text file range overlaps with .dynamic
>>> .text range is [0x800, 0x1993]
>>> .dynamic range is [0x808, 0x937]
ld.lld: error: section .note virtual address range overlaps with .text
>>> .note range is [0x7C8, 0x803]
>>> .text range is [0x800, 0x1993]
Fix it by setting DISABLE_BRANCH_PROFILING which will disable branch
tracing for vdso, thus avoid useless _ftrace_annotated_branch section
and _ftrace_branch section. Although we can also fix it by removing
the hardcoded .text begin address, but I think that's another story
and should be put into another patch.
Link: https://lore.kernel.org/lkml/202210122123.Cc4FPShJ-lkp@intel.com/#r
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20221102170254.1925-1-jszhang@kernel.org
Fixes: ad5d1122b8 ("riscv: use vDSO common flow to reduce the latency of the time-related functions")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit fcae44fd36d052e956e69a64642fc03820968d78 ]
Recently, ld.lld moved from '--undefined-version' to
'--no-undefined-version' as the default, which breaks the compat vDSO
build:
ld.lld: error: version script assignment of 'LINUX_4.15' to symbol '__vdso_gettimeofday' failed: symbol not defined
ld.lld: error: version script assignment of 'LINUX_4.15' to symbol '__vdso_clock_gettime' failed: symbol not defined
ld.lld: error: version script assignment of 'LINUX_4.15' to symbol '__vdso_clock_getres' failed: symbol not defined
These symbols are not present in the compat vDSO or the regular vDSO for
32-bit but they are unconditionally included in the version section of
the linker script, which is prohibited with '--no-undefined-version'.
Fix this issue by only including the symbols that are actually exported
in the version section of the linker script.
Link: https://github.com/ClangBuiltLinux/linux/issues/1756
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20221108171324.3377226-1-nathan@kernel.org/
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 50e63dd8ed92045eb70a72d7ec725488320fb68b ]
Currently, RISC-V sets up reserved memory using the "early" copy of the
device tree. As a result, when trying to get a reserved memory region
using of_reserved_mem_lookup(), the pointer to reserved memory regions
is using the early, pre-virtual-memory address which causes a kernel
panic when trying to use the buffer's name:
Unable to handle kernel paging request at virtual address 00000000401c31ac
Oops [#1]
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 6.0.0-rc1-00001-g0d9d6953d834 #1
Hardware name: Microchip PolarFire-SoC Icicle Kit (DT)
epc : string+0x4a/0xea
ra : vsnprintf+0x1e4/0x336
epc : ffffffff80335ea0 ra : ffffffff80338936 sp : ffffffff81203be0
gp : ffffffff812e0a98 tp : ffffffff8120de40 t0 : 0000000000000000
t1 : ffffffff81203e28 t2 : 7265736572203a46 s0 : ffffffff81203c20
s1 : ffffffff81203e28 a0 : ffffffff81203d22 a1 : 0000000000000000
a2 : ffffffff81203d08 a3 : 0000000081203d21 a4 : ffffffffffffffff
a5 : 00000000401c31ac a6 : ffff0a00ffffff04 a7 : ffffffffffffffff
s2 : ffffffff81203d08 s3 : ffffffff81203d00 s4 : 0000000000000008
s5 : ffffffff000000ff s6 : 0000000000ffffff s7 : 00000000ffffff00
s8 : ffffffff80d9821a s9 : ffffffff81203d22 s10: 0000000000000002
s11: ffffffff80d9821c t3 : ffffffff812f3617 t4 : ffffffff812f3617
t5 : ffffffff812f3618 t6 : ffffffff81203d08
status: 0000000200000100 badaddr: 00000000401c31ac cause: 000000000000000d
[<ffffffff80338936>] vsnprintf+0x1e4/0x336
[<ffffffff80055ae2>] vprintk_store+0xf6/0x344
[<ffffffff80055d86>] vprintk_emit+0x56/0x192
[<ffffffff80055ed8>] vprintk_default+0x16/0x1e
[<ffffffff800563d2>] vprintk+0x72/0x80
[<ffffffff806813b2>] _printk+0x36/0x50
[<ffffffff8068af48>] print_reserved_mem+0x1c/0x24
[<ffffffff808057ec>] paging_init+0x528/0x5bc
[<ffffffff808031ae>] setup_arch+0xd0/0x592
[<ffffffff8080070e>] start_kernel+0x82/0x73c
early_init_fdt_scan_reserved_mem() takes no arguments as it operates on
initial_boot_params, which is populated by early_init_dt_verify(). On
RISC-V, early_init_dt_verify() is called twice. Once, directly, in
setup_arch() if CONFIG_BUILTIN_DTB is not enabled and once indirectly,
very early in the boot process, by parse_dtb() when it calls
early_init_dt_scan_nodes().
This first call uses dtb_early_va to set initial_boot_params, which is
not usable later in the boot process when
early_init_fdt_scan_reserved_mem() is called. On arm64 for example, the
corresponding call to early_init_dt_scan_nodes() uses fixmap addresses
and doesn't suffer the same fate.
Move early_init_fdt_scan_reserved_mem() further along the boot sequence,
after the direct call to early_init_dt_verify() in setup_arch() so that
the names use the correct virtual memory addresses. The above supposed
that CONFIG_BUILTIN_DTB was not set, but should work equally in the case
where it is - unflatted_and_copy_device_tree() also updates
initial_boot_params.
Reported-by: Valentina Fernandez <valentina.fernandezalanis@microchip.com>
Reported-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
Link: https://lore.kernel.org/linux-riscv/f8e67f82-103d-156c-deb0-d6d6e2756f5e@microchip.com/
Fixes: 922b0375fc ("riscv: Fix memblock reservation for device tree blob")
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
Link: https://lore.kernel.org/r/20221107151524.3941467-1-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit cbd34f4bb37d62d8a027f54205bff07e73340da4 ]
Currently, we perform some memory init functions in paging init. But,
that will be an issue for NUMA support where DT needs to be flattened
before numa initialization and memblock_present can only be called
after numa initialization.
Move memory initialization related functions to a separate function.
Signed-off-by: Atish Patra <atish.patra@wdc.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Stable-dep-of: 50e63dd8ed92 ("riscv: fix reserved memory setup")
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 50f4dd657a0fcf90aa8da8dc2794a8100ff4c37c ]
Even after commit 89fd4a1df829 ("riscv: jump_label: mark arguments as
const to satisfy asm constraints"), building with CC_OPTIMIZE_FOR_SIZE
+ LLVM=1 can reproduce below build error:
CC arch/riscv/kernel/vdso/vgettimeofday.o
In file included from <built-in>:4:
In file included from lib/vdso/gettimeofday.c:5:
In file included from include/vdso/datapage.h:17:
In file included from include/vdso/processor.h:10:
In file included from arch/riscv/include/asm/vdso/processor.h:7:
In file included from include/linux/jump_label.h:112:
arch/riscv/include/asm/jump_label.h:42:3: error:
invalid operand for inline asm constraint 'i'
" .option push \n\t"
^
1 error generated.
I think the problem is when "-Os" is passed as CFLAGS, it's removed by
"CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os" which is
introduced in commit e05d57dcb8 ("riscv: Fixup __vdso_gettimeofday
broke dynamic ftrace"), thus no optimization at all for vgettimeofday.c
arm64 does remove "-Os" as well, but it forces "-O2" after removing
"-Os".
I compared the generated vgettimeofday.o with "-O2" and "-Os",
I think no big performance difference. So let's tell the kbuild not
to remove "-Os" rather than follow arm64 style.
vdso related performance can be improved a lot when building kernel with
CC_OPTIMIZE_FOR_SIZE after this commit, ("-Os" VS no optimization)
Fixes: e05d57dcb8 ("riscv: Fixup __vdso_gettimeofday broke dynamic ftrace")
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20221031182943.2453-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 6510c78490c490a6636e48b61eeaa6fb65981f4b ]
thread_struct's s[12] may contain random kernel memory content, which
may be finally leaked to userspace. This is a security hole. Fix it
by clearing the s[12] array in thread_struct when fork.
As for kthread case, it's better to clear the s[12] array as well.
Fixes: 7db91e57a0 ("RISC-V: Task implementation")
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Tested-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20221029113450.4027-1-jszhang@kernel.org
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/CAJF2gTSdVyAaM12T%2B7kXAdRPGS4VyuO08X1c7paE-n4Fr8OtRA@mail.gmail.com/
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit fbd92809997a391f28075f1c8b5ee314c225557c upstream.
RISC-V has no sane defaults to fall back on where there is no cpu-map
in the devicetree.
Without sane defaults, the package, core and thread IDs are all set to
-1. This causes user-visible inaccuracies for tools like hwloc/lstopo
which rely on the sysfs cpu topology files to detect a system's
topology.
On a PolarFire SoC, which should have 4 harts with a thread each,
lstopo currently reports:
Machine (793MB total)
Package L#0
NUMANode L#0 (P#0 793MB)
Core L#0
L1d L#0 (32KB) + L1i L#0 (32KB) + PU L#0 (P#0)
L1d L#1 (32KB) + L1i L#1 (32KB) + PU L#1 (P#1)
L1d L#2 (32KB) + L1i L#2 (32KB) + PU L#2 (P#2)
L1d L#3 (32KB) + L1i L#3 (32KB) + PU L#3 (P#3)
Adding calls to store_cpu_topology() in {boot,smp} hart bringup code
results in the correct topolgy being reported:
Machine (793MB total)
Package L#0
NUMANode L#0 (P#0 793MB)
L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
CC: stable@vger.kernel.org # 456797da792f: arm64: topology: move store_cpu_topology() to shared code
Fixes: 03f11f03db ("RISC-V: Parse cpu topology during boot.")
Reported-by: Brice Goglin <Brice.Goglin@inria.fr>
Link: https://github.com/open-mpi/hwloc/issues/536
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 10f6913c548b32ecb73801a16b120e761c6957ea ]
When CONFIG_CMDLINE_FORCE is enabled, cmdline provided by
CONFIG_CMDLINE are always used. This allows CONFIG_CMDLINE to be
used regardless of the result of device tree scanning.
This especially fixes the case where a device tree without the
chosen node is supplied to the kernel. In such cases,
early_init_dt_scan would return true. But inside
early_init_dt_scan_chosen, the cmdline won't be updated as there
is no chosen node in the device tree. As a result, CONFIG_CMDLINE
is not copied into boot_command_line even if CONFIG_CMDLINE_FORCE
is enabled. This commit allows properly update boot_command_line
in this situation.
Fixes: 8fd6e05c74 ("arch: riscv: support kernel command line forcing when no DTB passed")
Signed-off-by: Wenting Zhang <zephray@outlook.com>
Reviewed-by: Björn Töpel <bjorn@kernel.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/PSBPR04MB399135DFC54928AB958D0638B1829@PSBPR04MB3991.apcprd04.prod.outlook.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 46ad48e8a28da7cc37a16c7e7fc632ecf906e4bf ]
Add the machine name to kernel boot-up log, and install
the machine name to stack dump for DT boot mode.
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Stable-dep-of: 10f6913c548b ("riscv: always honor the CONFIG_CMDLINE_FORCE when parsing dtb")
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 9e2e6042a7ec6504fe8e366717afa2f40cf16488 upstream.
Commit 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is
invalid") made mmap() return EINVAL if PROT_WRITE was set wihtout
PROT_READ with the justification that a write-only PTE is considered a
reserved PTE permission bit pattern in the privileged spec. This check
is unnecessary since we let VM_WRITE imply VM_READ on RISC-V, and it is
inconsistent with other architectures that don't support write-only PTEs,
creating a potential software portability issue. Just remove the check
altogether and let PROT_WRITE imply PROT_READ as is the case on other
architectures.
Note that this also allows PROT_WRITE|PROT_EXEC mappings which were
disallowed prior to the aforementioned commit; PROT_READ is implied in
such mappings as well.
Fixes: 2139619bcad7 ("riscv: mmap with PROT_WRITE but no PROT_READ is invalid")
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Andrew Bresticker <abrestic@rivosinc.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20220915193702.2201018-3-abrestic@rivosinc.com/
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 762df359aa5849e010ef04c3ed79d57588ce17d9 upstream.
riscv has an equivalent of arm bug fixed by 653d48b221 ("arm: fix
really nasty sigreturn bug"); if signal gets caught by an interrupt that
hits when we have the right value in a0 (-513), *and* another signal
gets delivered upon sigreturn() (e.g. included into the blocked mask for
the first signal and posted while the handler had been running), the
syscall restart logics will see regs->cause equal to EXC_SYSCALL (we are
in a syscall, after all) and a0 already restored to its original value
(-513, which happens to be -ERESTARTNOINTR) and assume that we need to
apply the usual syscall restart logics.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Fixes: e2c0cdfba7 ("RISC-V: User-facing API")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/YxJEiSq%2FCGaL6Gm9@ZenIV/
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 3f1901110a89b0e2e13adb2ac8d1a7102879ea98 ]
Currently, almost all archs (x86, arm64, mips...) support fast call
of crash_kexec() when "regs && kexec_should_crash()" is true. But
RISC-V not, it can only enter crash system via panic(). However panic()
doesn't pass the regs of the real accident scene to crash_kexec(),
it caused we can't get accurate backtrace via gdb,
$ riscv64-linux-gnu-gdb vmlinux vmcore
Reading symbols from vmlinux...
[New LWP 95]
#0 console_unlock () at kernel/printk/printk.c:2557
2557 if (do_cond_resched)
(gdb) bt
#0 console_unlock () at kernel/printk/printk.c:2557
#1 0x0000000000000000 in ?? ()
With the patch we can get the accurate backtrace,
$ riscv64-linux-gnu-gdb vmlinux vmcore
Reading symbols from vmlinux...
[New LWP 95]
#0 0xffffffe00063a4e0 in test_thread (data=<optimized out>) at drivers/test_crash.c:81
81 *(int *)p = 0xdead;
(gdb)
(gdb) bt
#0 0xffffffe00064d5c0 in test_thread (data=<optimized out>) at drivers/test_crash.c:81
#1 0x0000000000000000 in ?? ()
Test code to produce NULL address dereference in test_crash.c,
void *p = NULL;
*(int *)p = 0xdead;
Reviewed-by: Guo Ren <guoren@kernel.org>
Tested-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Link: https://lore.kernel.org/r/20220606082308.2883458-1-xianting.tian@linux.alibaba.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2139619bcad7ac44cc8f6f749089120594056613 ]
As mentioned in Table 4.5 in RISC-V spec Volume 2 Section 4.3, write
but not read is "Reserved for future use.". For now, they are not valid.
In the current code, -wx is marked as invalid, but -w- is not marked
as invalid.
This patch refines that judgment.
Reported-by: xctan <xc-tan@outlook.com>
Co-developed-by: dram <dramforever@live.com>
Signed-off-by: dram <dramforever@live.com>
Co-developed-by: Ruizhe Pan <c141028@gmail.com>
Signed-off-by: Ruizhe Pan <c141028@gmail.com>
Signed-off-by: Celeste Liu <coelacanthus@outlook.com>
Link: https://lore.kernel.org/r/PH7PR14MB559464DBDD310E755F5B21E8CEDC9@PH7PR14MB5594.namprd14.prod.outlook.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit f2928e224d85e7cc139009ab17cefdfec2df5d11 upstream.
Set pm_power_off to NULL like on all other architectures, check if it
is set in machine_halt() and machine_power_off() and fallback to
default_power_off if no other power driver got registered.
This brings riscv architecture inline with all other architectures,
and allows to reuse exiting power drivers unmodified.
Kernels without legacy SBI v0.1 extensions (CONFIG_RISCV_SBI_V01 is
not set), do not set pm_power_off to sbi_shutdown(). There is no
support for SBI v0.3 system reset extension either. This prevents
using gpio_poweroff on SiFive HiFive Unmatched.
Tested on SiFive HiFive unmatched, with a dtb specifying gpio-poweroff
node and kernel complied without CONFIG_RISCV_SBI_V01.
BugLink: https://bugs.launchpad.net/bugs/1942806
Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Tested-by: Ron Economos <w6rz@comcast.net>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 630f972d76d6460235e84e1aa034ee06f9c8c3a9 ]
If EFI pages are marked as read-only,
we should remove the _PAGE_WRITE flag.
The current code overwrites an unused value.
Fixes: b91540d52a ("RISC-V: Add EFI runtime services")
Signed-off-by: Heinrich Schuchardt <heinrich.schuchardt@canonical.com>
Link: https://lore.kernel.org/r/20220528014132.91052-1-heinrich.schuchardt@canonical.com
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 35d33c76d68dfacc330a8eb477b51cc647c5a847 upstream.
Because of the stack canary feature that reads from the current task
structure the stack canary value, the thread pointer register "tp" must
be set before calling any C function from head.S: by chance, setup_vm
and all the functions that it calls does not seem to be part of the
functions where the canary check is done, but in the following commits,
some functions will.
Fixes: f2c9699f65 ("riscv: Add STACKPROTECTOR supported")
Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com>
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 8ec1442953c66a1d8462cccd8c20b7ba561f5915 upstream.
These patch_text implementations are using stop_machine_cpuslocked
infrastructure with atomic cpu_count. The original idea: When the
master CPU patch_text, the others should wait for it. But current
implementation is using the first CPU as master, which couldn't
guarantee the remaining CPUs are waiting. This patch changes the
last CPU as the master to solve the potential risk.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 043cb41a85 ("riscv: introduce interfaces to patch kernel code")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 23fc539e81295b14b50c6ccc5baeb4f3d59d822d ]
On some architectures, access_ok() does not do any argument type
checking, so replacing the definition with a generic one causes
a few warnings for harmless issues that were never caught before.
Fix the ones that I found either through my own test builds or
that were reported by the 0-day bot.
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Dinh Nguyen <dinguyen@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 0966d385830de3470b7131db8e86c0c5bc9c52dc upstream.
RISC-V can do PC-relative jumps with a 32bit range using the following
two instructions:
auipc t0, imm20 ; t0 = PC + imm20 * 2^12
jalr ra, t0, imm12 ; ra = PC + 4, PC = t0 + imm12
Crucially both the 20bit immediate imm20 and the 12bit immediate imm12
are treated as two's-complement signed values. For this reason the
immediates are usually calculated like this:
imm20 = (offset + 0x800) >> 12
imm12 = offset & 0xfff
..where offset is the signed offset from the auipc instruction. When
the 11th bit of offset is 0 the addition of 0x800 doesn't change the top
20 bits and imm12 considered positive. When the 11th bit is 1 the carry
of the addition by 0x800 means imm20 is one higher, but since imm12 is
then considered negative the two's complement representation means it
all cancels out nicely.
However, this addition by 0x800 (2^11) means an offset greater than or
equal to 2^31 - 2^11 would overflow so imm20 is considered negative and
result in a backwards jump. Similarly the lower range of offset is also
moved down by 2^11 and hence the true 32bit range is
[-2^31 - 2^11, 2^31 - 2^11)
Signed-off-by: Emil Renner Berthing <kernel@esmil.dk>
Fixes: e2c0cdfba7 ("RISC-V: User-facing API")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit ff083a2d972f56bebfd82409ca62e5dfce950961 upstream.
Protect perf_guest_cbs with RCU to fix multiple possible errors. Luckily,
all paths that read perf_guest_cbs already require RCU protection, e.g. to
protect the callback chains, so only the direct perf_guest_cbs touchpoints
need to be modified.
Bug #1 is a simple lack of WRITE_ONCE/READ_ONCE behavior to ensure
perf_guest_cbs isn't reloaded between a !NULL check and a dereference.
Fixed via the READ_ONCE() in rcu_dereference().
Bug #2 is that on weakly-ordered architectures, updates to the callbacks
themselves are not guaranteed to be visible before the pointer is made
visible to readers. Fixed by the smp_store_release() in
rcu_assign_pointer() when the new pointer is non-NULL.
Bug #3 is that, because the callbacks are global, it's possible for
readers to run in parallel with an unregisters, and thus a module
implementing the callbacks can be unloaded while readers are in flight,
resulting in a use-after-free. Fixed by a synchronize_rcu() call when
unregistering callbacks.
Bug #1 escaped notice because it's extremely unlikely a compiler will
reload perf_guest_cbs in this sequence. perf_guest_cbs does get reloaded
for future derefs, e.g. for ->is_user_mode(), but the ->is_in_guest()
guard all but guarantees the consumer will win the race, e.g. to nullify
perf_guest_cbs, KVM has to completely exit the guest and teardown down
all VMs before KVM start its module unload / unregister sequence. This
also makes it all but impossible to encounter bug #3.
Bug #2 has not been a problem because all architectures that register
callbacks are strongly ordered and/or have a static set of callbacks.
But with help, unloading kvm_intel can trigger bug #1 e.g. wrapping
perf_guest_cbs with READ_ONCE in perf_misc_flags() while spamming
kvm_intel module load/unload leads to:
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEMPT SMP
CPU: 6 PID: 1825 Comm: stress Not tainted 5.14.0-rc2+ #459
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:perf_misc_flags+0x1c/0x70
Call Trace:
perf_prepare_sample+0x53/0x6b0
perf_event_output_forward+0x67/0x160
__perf_event_overflow+0x52/0xf0
handle_pmi_common+0x207/0x300
intel_pmu_handle_irq+0xcf/0x410
perf_event_nmi_handler+0x28/0x50
nmi_handle+0xc7/0x260
default_do_nmi+0x6b/0x170
exc_nmi+0x103/0x130
asm_exc_nmi+0x76/0xbf
Fixes: 39447b386c ("perf: Enhance perf to allow for guest statistic collection from host")
Signed-off-by: Sean Christopherson <seanjc@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20211111020738.2512932-2-seanjc@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 64a19591a2938b170aa736443d5d3bf4c51e1388 upstream.
The trap vector marked by label .Lsecondary_park must align on a
4-byte boundary, as the {m,s}tvec is defined to require 4-byte
alignment.
Signed-off-by: Chen Lu <181250012@smail.nju.edu.cn>
Reviewed-by: Anup Patel <anup.patel@wdc.com>
Fixes: e011995e82 ("RISC-V: Move relocate and few other functions out of __init")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 8bb0ab3ae7a4dbe6cf32deb830cf2bdbf5736867 ]
riscv architectures relying on mmap_sem for write in their
arch_setup_additional_pages. If the waiting task gets killed by the oom
killer it would block oom_reaper from asynchronous address space reclaim
and reduce the chances of timely OOM resolving. Wait for the lock in
the killable mode and return with EINTR if the task got killed while
waiting.
Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Fixes: 76d2a0493a ("RISC-V: Init and Halt Code")
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4b92d4add5f6dcf21275185c997d6ecb800054cd ]
DEFINE_SMP_CALL_CACHE_FUNCTION() was usefel before the CPU hotplug rework
to ensure that the cache related functions are called on the upcoming CPU
because the notifier itself could run on any online CPU.
The hotplug state machine guarantees that the callbacks are invoked on the
upcoming CPU. So there is no need to have this SMP function call
obfuscation. That indirection was missed when the hotplug notifiers were
converted.
This also solves the problem of ARM64 init_cache_level() invoking ACPI
functions which take a semaphore in that context. That's invalid as SMP
function calls run with interrupts disabled. Running it just from the
callback in context of the CPU hotplug thread solves this.
Fixes: 8571890e15 ("arm64: Add support for ACPI based firmware tables")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Acked-by: Will Deacon <will@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/871r69ersb.ffs@tglx
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 67d945778099b14324811fe67c5aff2cda7a7ad5 upstream.
We must use $(CC_FLAGS_FTRACE) instead of directly using -pg. It
will cause -fpatchable-function-entry error.
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 379eb01c21795edb4ca8d342503bd2183a19ec3a upstream.
The value of FP registers in the core dump file comes from the
thread.fstate. However, kernel saves the FP registers to the thread.fstate
only before scheduling out the process. If no process switch happens
during the exception handling process, kernel will not have a chance to
save the latest value of FP registers to thread.fstate. It will cause the
value of FP registers in the core dump file may be incorrect. To solve this
problem, this patch force lets kernel save the FP register into the
thread.fstate if the target task_struct equals the current.
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Jisheng Zhang <jszhang@kernel.org>
Fixes: b8c8a9590e ("RISC-V: Add FP register ptrace support for gdb.")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit f1a0a376ca0c4ef1fc3d24e3e502acbb5b795674 ]
As pointed out by commit
de9b8f5dcb ("sched: Fix crash trying to dequeue/enqueue the idle thread")
init_idle() can and will be invoked more than once on the same idle
task. At boot time, it is invoked for the boot CPU thread by
sched_init(). Then smp_init() creates the threads for all the secondary
CPUs and invokes init_idle() on them.
As the hotplug machinery brings the secondaries to life, it will issue
calls to idle_thread_get(), which itself invokes init_idle() yet again.
In this case it's invoked twice more per secondary: at _cpu_up(), and at
bringup_cpu().
Given smp_init() already initializes the idle tasks for all *possible*
CPUs, no further initialization should be required. Now, removing
init_idle() from idle_thread_get() exposes some interesting expectations
with regards to the idle task's preempt_count: the secondary startup always
issues a preempt_disable(), requiring some reset of the preempt count to 0
between hot-unplug and hotplug, which is currently served by
idle_thread_get() -> idle_init().
Given the idle task is supposed to have preemption disabled once and never
see it re-enabled, it seems that what we actually want is to initialize its
preempt_count to PREEMPT_DISABLED and leave it there. Do that, and remove
init_idle() from idle_thread_get().
Secondary startups were patched via coccinelle:
@begone@
@@
-preempt_disable();
...
cpu_startup_entry(CPUHP_AP_ONLINE_IDLE);
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210512094636.2958515-1-valentin.schneider@arm.com
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 772d7891e8b3b0baae7bb88a294d61fd07ba6d15 ]
Running "make" on an already compiled kernel tree will rebuild the
kernel even without any modifications:
CALL linux/scripts/checksyscalls.sh
CALL linux/scripts/atomic/check-atomics.sh
CHK include/generated/compile.h
SO2S arch/riscv/kernel/vdso/vdso-syms.S
AS arch/riscv/kernel/vdso/vdso-syms.o
AR arch/riscv/kernel/vdso/built-in.a
AR arch/riscv/kernel/built-in.a
AR arch/riscv/built-in.a
GEN .version
CHK include/generated/compile.h
UPD include/generated/compile.h
CC init/version.o
AR init/built-in.a
LD vmlinux.o
The reason is "Any target that utilizes if_changed must be listed in
$(targets), otherwise the command line check will fail, and the target
will always be built" as explained by Documentation/kbuild/makefiles.rst
Fix this build bug by adding vdso-syms.S to $(targets)
At the same time, there are two trivial clean up modifications:
- the vdso-dummy.o is not needed any more after so remove it.
- vdso.lds is a generated file, so it should be prefixed with
$(obj)/ instead of $(src)/
Fixes: c2c81bb2f6 ("RISC-V: Fix the VDSO symbol generaton for binutils-2.35+")
Cc: stable@vger.kernel.org
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7ce04771503074a7de7f539cc43f5e1b385cb99b ]
Prior to clang 13.0.0, the RISC-V name for the mcount symbol was
"mcount", which differs from the GCC version of "_mcount", which results
in the following errors:
riscv64-linux-gnu-ld: init/main.o: in function `__traceiter_initcall_level':
main.c:(.text+0xe): undefined reference to `mcount'
riscv64-linux-gnu-ld: init/main.o: in function `__traceiter_initcall_start':
main.c:(.text+0x4e): undefined reference to `mcount'
riscv64-linux-gnu-ld: init/main.o: in function `__traceiter_initcall_finish':
main.c:(.text+0x92): undefined reference to `mcount'
riscv64-linux-gnu-ld: init/main.o: in function `.LBB32_28':
main.c:(.text+0x30c): undefined reference to `mcount'
riscv64-linux-gnu-ld: init/main.o: in function `free_initmem':
main.c:(.text+0x54c): undefined reference to `mcount'
This has been corrected in https://reviews.llvm.org/D98881 but the
minimum supported clang version is 10.0.1. To avoid build errors and to
gain a working function tracer, adjust the name of the mcount symbol for
older versions of clang in mount.S and recordmcount.pl.
Link: https://github.com/ClangBuiltLinux/linux/issues/1331
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 7f3d349065d0c643f7f7013fbf9bc9f2c90b675f ]
Currently, the VDSO is being linked through $(CC). This does not match
how the rest of the kernel links objects, which is through the $(LD)
variable.
When linking with clang, there are a couple of warnings about flags that
will not be used during the link:
clang-12: warning: argument unused during compilation: '-no-pie' [-Wunused-command-line-argument]
clang-12: warning: argument unused during compilation: '-pg' [-Wunused-command-line-argument]
'-no-pie' was added in commit 85602bea29 ("RISC-V: build vdso-dummy.o
with -no-pie") to override '-pie' getting added to the ld command from
distribution versions of GCC that enable PIE by default. It is
technically no longer needed after commit c2c81bb2f6 ("RISC-V: Fix the
VDSO symbol generaton for binutils-2.35+"), which removed vdso-dummy.o
in favor of generating vdso-syms.S from vdso.so with $(NM) but this also
resolves the issue in case it ever comes back due to having full control
over the $(LD) command. '-pg' is for function tracing, it is not used
during linking as clang states.
These flags could be removed/filtered to fix the warnings but it is
easier to just match the rest of the kernel and use $(LD) directly for
linking. See commits
fe00e50b2d ("ARM: 8858/1: vdso: use $(LD) instead of $(CC) to link VDSO")
691efbedc6 ("arm64: vdso: use $(LD) instead of $(CC) to link VDSO")
2ff906994b ("MIPS: VDSO: Use $(LD) instead of $(CC) to link VDSO")
2b2a25845d ("s390/vdso: Use $(LD) instead of $(CC) to link vDSO")
for more information.
The flags are converted to linker flags and '--eh-frame-hdr' is added to
match what is added by GCC implicitly, which can be seen by adding '-v'
to GCC's invocation.
Additionally, since this area is being modified, use the $(OBJCOPY)
variable instead of an open coded $(CROSS_COMPILE)objcopy so that the
user's choice of objcopy binary is respected.
Link: https://github.com/ClangBuiltLinux/linux/issues/803
Link: https://github.com/ClangBuiltLinux/linux/issues/970
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Fangrui Song <maskray@google.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 533b4f3a789d49574e7ae0f6ececed153f651f97 ]
We should return a negative error code upon failure in
riscv_hartid_to_cpuid() instead of NR_CPUS. This is also
aligned with all uses of riscv_hartid_to_cpuid() which
expect negative error code upon failure.
Fixes: 6825c7a80f ("RISC-V: Add logical CPU indexing for RISC-V")
Fixes: f99fb607fb ("RISC-V: Use Linux logical CPU number instead of hartid")
Signed-off-by: Anup Patel <anup.patel@wdc.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ac8d0b901f0033b783156ab2dc1a0e73ec42409b ]
In RV64, the size of each entry in excp_vect_table is 8 bytes. If the
base of the table is not 8-byte aligned, loading an entry in the table
will raise a misaligned exception. Although such exception will be
handled by opensbi/bbl, this still causes performance degradation.
Signed-off-by: Zihao Yu <yuzihao@ict.ac.cn>
Reviewed-by: Anup Patel <anup@brainfault.org>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit f3d60f2a25e4417e1676161fe42115de3e3f98a2 upstream.
We use the generic C VDSO implementations of a handful of clock-related
functions. When kasan is enabled this results in asan stub calls that
are unlikely to be resolved by userspace, this just disables KASAN
when building the VDSO.
Verified the fix on a kernel with KASAN enabled using vDSO selftests.
Link: https://lore.kernel.org/lkml/CACT4Y+ZNJBnkKHXUf=tm_yuowvZvHwN=0rmJ=7J+xFd+9r_6pQ@mail.gmail.com/
Tested-by: Tobias Klauser <tklauser@distanz.ch>
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Tested-by: Dmitry Vyukov <dvyukov@google.com>
[Palmer: commit text]
Fixes: ad5d1122b8 ("riscv: use vDSO common flow to reduce the latency of the time-related functions")
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>