lineage-22.1
2325 Commits
Author | SHA1 | Message | Date | |
---|---|---|---|---|
|
49a7b708da |
libsubcmd: Fix memory leak in uniq()
[ Upstream commit ad30469a841b50dbb541df4d6971d891f703c297 ] uniq() will write one command name over another causing the overwritten string to be leaked. Fix by doing a pass that removes duplicates and a second that removes the holes. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Chenyuan Mi <cymi20@fudan.edu.cn> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20231208000515.1693746-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
12473265f5 |
libbpf: Fix NULL pointer dereference in bpf_object__collect_prog_relos
[ Upstream commit fc3a5534e2a8855427403113cbeb54af5837bbe0 ] An issue occurred while reading an ELF file in libbpf.c during fuzzing: Program received signal SIGSEGV, Segmentation fault. 0x0000000000958e97 in bpf_object.collect_prog_relos () at libbpf.c:4206 4206 in libbpf.c (gdb) bt #0 0x0000000000958e97 in bpf_object.collect_prog_relos () at libbpf.c:4206 #1 0x000000000094f9d6 in bpf_object.collect_relos () at libbpf.c:6706 #2 0x000000000092bef3 in bpf_object_open () at libbpf.c:7437 #3 0x000000000092c046 in bpf_object.open_mem () at libbpf.c:7497 #4 0x0000000000924afa in LLVMFuzzerTestOneInput () at fuzz/bpf-object-fuzzer.c:16 #5 0x000000000060be11 in testblitz_engine::fuzzer::Fuzzer::run_one () #6 0x000000000087ad92 in tracing::span::Span::in_scope () #7 0x00000000006078aa in testblitz_engine::fuzzer::util::walkdir () #8 0x00000000005f3217 in testblitz_engine::entrypoint::main::{{closure}} () #9 0x00000000005f2601 in main () (gdb) scn_data was null at this code(tools/lib/bpf/src/libbpf.c): if (rel->r_offset % BPF_INSN_SZ || rel->r_offset >= scn_data->d_size) { The scn_data is derived from the code above: scn = elf_sec_by_idx(obj, sec_idx); scn_data = elf_sec_data(obj, scn); relo_sec_name = elf_sec_str(obj, shdr->sh_name); sec_name = elf_sec_name(obj, scn); if (!relo_sec_name || !sec_name)// don't check whether scn_data is NULL return -EINVAL; In certain special scenarios, such as reading a malformed ELF file, it is possible that scn_data may be a null pointer Signed-off-by: Mingyi Zhang <zhangmingyi5@huawei.com> Signed-off-by: Xin Liu <liuxin350@huawei.com> Signed-off-by: Changye Wu <wuchangye@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20231221033947.154564-1-liuxin350@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
d5ef7480d6 |
libapi: Add missing linux/types.h header to get the __u64 type on io.h
[ Upstream commit af76b2dec0984a079d8497bfa37d29a9b55932e1 ]
There are functions using __u64, so we need to have the linux/types.h
header otherwise we'll break when its not included before api/io.h.
Fixes:
|
||
|
d5feaef143 |
libbpf: Free btf_vmlinux when closing bpf_object
[ Upstream commit 29d67fdebc42af6466d1909c60fdd1ef4f3e5240 ] I hit a memory leak when testing bpf_program__set_attach_target(). Basically, set_attach_target() may allocate btf_vmlinux, for example, when setting attach target for bpf_iter programs. But btf_vmlinux is freed only in bpf_object_load(), which means if we only open bpf object but not load it, setting attach target may leak btf_vmlinux. So let's free btf_vmlinux in bpf_object__close() anyway. Signed-off-by: Hao Luo <haoluo@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230822193840.1509809-1-haoluo@google.com Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
0cfbadb153 |
libbpf: Fix realloc API handling in zero-sized edge cases
[ Upstream commit 8a0260dbf6553c969248b6530cafadac46562f47 ] realloc() and reallocarray() can either return NULL or a special non-NULL pointer, if their size argument is zero. This requires a bit more care to handle NULL-as-valid-result situation differently from NULL-as-error case. This has caused real issues before ([0]), and just recently bit again in production when performing bpf_program__attach_usdt(). This patch fixes 4 places that do or potentially could suffer from this mishandling of NULL, including the reported USDT-related one. There are many other places where realloc()/reallocarray() is used and NULL is always treated as an error value, but all those have guarantees that their size is always non-zero, so those spot don't need any extra handling. [0] d08ab82f59d5 ("libbpf: Fix double-free when linker processes empty sections") Fixes: |
||
|
05026e944b |
tools lib subcmd: Add dependency test to install_headers
commit 5d890591db6bed8ca69bd4bfe0cdaca372973033 upstream. Compute the headers to be installed from their source headers and make each have its own build target to install it. Using dependencies avoids headers being reinstalled and getting a new timestamp which then causes files that depend on the header to be rebuilt. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nicolas Schier <nicolas@fjasle.eu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Tom Rix <trix@redhat.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Link: https://lore.kernel.org/r/20221202045743.2639466-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
424fd56932 |
tools lib subcmd: Make install_headers clearer
commit 77dce6890a2a715b186bdc149c843571a5bb47df upstream. Add libsubcmd to the name so that this install_headers build appears different to similar targets in different libraries. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Hao Luo <haoluo@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nicolas Schier <nicolas@fjasle.eu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@google.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Link: https://lore.kernel.org/r/20221117004356.279422-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
a07388d1a7 |
tools lib subcmd: Add install target
commit 630ae80ea1dd253609cb50cff87f3248f901aca3 upstream. This allows libsubcmd to be installed as a dependency. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrii Nakryiko <andrii.nakryiko@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Nicolas Schier <nicolas@fjasle.eu> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: bpf@vger.kernel.org Link: http://lore.kernel.org/lkml/20221109184914.1357295-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
||
|
34fe7aa8ef |
libbpf: fix offsetof() and container_of() to work with CO-RE
[ Upstream commit bdeeed3498c7871c17465bb4f11d1bc67f9098af ]
It seems like __builtin_offset() doesn't preserve CO-RE field
relocations properly. So if offsetof() macro is defined through
__builtin_offset(), CO-RE-enabled BPF code using container_of() will be
subtly and silently broken.
To avoid this problem, redefine offsetof() and container_of() in the
form that works with CO-RE relocations more reliably.
Fixes:
|
||
|
a7434a4dcc |
libbpf: btf_dump_type_data_check_overflow needs to consider BTF_MEMBER_BITFIELD_SIZE
[ Upstream commit c39028b333f3a3a765c5c0b9726b8e38aedf0ba1 ]
The btf_dump/struct_data selftest is failing with:
[...]
test_btf_dump_struct_data:FAIL:unexpected return value dumping fs_context unexpected unexpected return value dumping fs_context: actual -7 != expected 264
[...]
The reason is in btf_dump_type_data_check_overflow(). It does not use
BTF_MEMBER_BITFIELD_SIZE from the struct's member (btf_member). Instead,
it is using the enum size which is 4. It had been working till the recent
commit 4e04143c869c ("fs_context: drop the unused lsm_flags member")
removed an integer member which also removed the 4 bytes padding at the
end of the fs_context. Missing this 4 bytes padding exposed this bug. In
particular, when btf_dump_type_data_check_overflow() reaches the member
'phase', -E2BIG is returned.
The fix is to pass bit_sz to btf_dump_type_data_check_overflow(). In
btf_dump_type_data_check_overflow(), it does a different size check when
bit_sz is not zero.
The current fs_context:
[3600] ENUM 'fs_context_purpose' encoding=UNSIGNED size=4 vlen=3
'FS_CONTEXT_FOR_MOUNT' val=0
'FS_CONTEXT_FOR_SUBMOUNT' val=1
'FS_CONTEXT_FOR_RECONFIGURE' val=2
[3601] ENUM 'fs_context_phase' encoding=UNSIGNED size=4 vlen=7
'FS_CONTEXT_CREATE_PARAMS' val=0
'FS_CONTEXT_CREATING' val=1
'FS_CONTEXT_AWAITING_MOUNT' val=2
'FS_CONTEXT_AWAITING_RECONF' val=3
'FS_CONTEXT_RECONF_PARAMS' val=4
'FS_CONTEXT_RECONFIGURING' val=5
'FS_CONTEXT_FAILED' val=6
[3602] STRUCT 'fs_context' size=264 vlen=21
'ops' type_id=3603 bits_offset=0
'uapi_mutex' type_id=235 bits_offset=64
'fs_type' type_id=872 bits_offset=1216
'fs_private' type_id=21 bits_offset=1280
'sget_key' type_id=21 bits_offset=1344
'root' type_id=781 bits_offset=1408
'user_ns' type_id=251 bits_offset=1472
'net_ns' type_id=984 bits_offset=1536
'cred' type_id=1785 bits_offset=1600
'log' type_id=3621 bits_offset=1664
'source' type_id=42 bits_offset=1792
'security' type_id=21 bits_offset=1856
's_fs_info' type_id=21 bits_offset=1920
'sb_flags' type_id=20 bits_offset=1984
'sb_flags_mask' type_id=20 bits_offset=2016
's_iflags' type_id=20 bits_offset=2048
'purpose' type_id=3600 bits_offset=2080 bitfield_size=8
'phase' type_id=3601 bits_offset=2088 bitfield_size=8
'need_free' type_id=67 bits_offset=2096 bitfield_size=1
'global' type_id=67 bits_offset=2097 bitfield_size=1
'oldapi' type_id=67 bits_offset=2098 bitfield_size=1
Fixes:
|
||
|
c997f28917 |
libbpf: Fix ld_imm64 copy logic for ksym in light skeleton.
[ Upstream commit a506d6ce1dd184051037dc9d26c3eb187c9fe625 ]
Unlike normal libbpf the light skeleton 'loader' program is doing
btf_find_by_name_kind() call at run-time to find ksym in the kernel and
populate its {btf_id, btf_obj_fd} pair in ld_imm64 insn. To avoid doing the
search multiple times for the same ksym it remembers the first patched ld_imm64
insn and copies {btf_id, btf_obj_fd} from it into subsequent ld_imm64 insn.
Fix a bug in copying logic, since it may incorrectly clear BPF_PSEUDO_BTF_ID flag.
Also replace always true if (btf_obj_fd >= 0) check with unconditional JMP_JA
to clarify the code.
Fixes:
|
||
|
8a53428784 |
libbpf: Fix single-line struct definition output in btf_dump
[ Upstream commit 872aec4b5f635d94111d48ec3c57fbe078d64e7d ] btf_dump APIs emit unnecessary tabs when emitting struct/union definition that fits on the single line. Before this patch we'd get: struct blah {<tab>}; This patch fixes this and makes sure that we get more natural: struct blah {}; Fixes: 44a726c3f23c ("bpftool: Print newline before '}' for struct with padding only fields") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20221212211505.558851-2-andrii@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
77e41187a3 |
bpftool: Print newline before '}' for struct with padding only fields
[ Upstream commit 44a726c3f23cf762ef4ce3c1709aefbcbe97f62c ] btf_dump_emit_struct_def attempts to print empty structures at a single line, e.g. `struct empty {}`. However, it has to account for a case when there are no regular but some padding fields in the struct. In such case `vlen` would be zero, but size would be non-zero. E.g. here is struct bpf_timer from vmlinux.h before this patch: struct bpf_timer { long: 64; long: 64;}; And after this patch: struct bpf_dynptr { long: 64; long: 64; }; Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20221001104425.415768-1-eddyz87@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
e5c5cb47a9 |
libbpf: Fix btf_dump's packed struct determination
[ Upstream commit 4fb877aaa179dcdb1676d55216482febaada457e ] Fix bug in btf_dump's logic of determining if a given struct type is packed or not. The notion of "natural alignment" is not needed and is even harmful in this case, so drop it altogether. The biggest difference in btf_is_struct_packed() compared to its original implementation is that we don't really use btf__align_of() to determine overall alignment of a struct type (because it could be 1 for both packed and non-packed struct, depending on specifci field definitions), and just use field's actual alignment to calculate whether any field is requiring packing or struct's size overall necessitates packing. Add two simple test cases that demonstrate the difference this change would make. Fixes: ea2ce1ba99aa ("libbpf: Fix BTF-to-C converter's padding logic") Reported-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20221215183605.4149488-1-andrii@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
524617e553 |
libbpf: Fix BTF-to-C converter's padding logic
[ Upstream commit ea2ce1ba99aa6a60c8d8a706e3abadf3de372163 ] Turns out that btf_dump API doesn't handle a bunch of tricky corner cases, as reported by Per, and further discovered using his testing Python script ([0]). This patch revamps btf_dump's padding logic significantly, making it more correct and also avoiding unnecessary explicit padding, where compiler would pad naturally. This overall topic turned out to be very tricky and subtle, there are lots of subtle corner cases. The comments in the code tries to give some clues, but comments themselves are supposed to be paired with good understanding of C alignment and padding rules. Plus some experimentation to figure out subtle things like whether `long :0;` means that struct is now forced to be long-aligned (no, it's not, turns out). Anyways, Per's script, while not completely correct in some known situations, doesn't show any obvious cases where this logic breaks, so this is a nice improvement over the previous state of this logic. Some selftests had to be adjusted to accommodate better use of natural alignment rules, eliminating some unnecessary padding, or changing it to `type: 0;` alignment markers. Note also that for when we are in between bitfields, we emit explicit bit size, while otherwise we use `: 0`, this feels much more natural in practice. Next patch will add few more test cases, found through randomized Per's script. [0] https://lore.kernel.org/bpf/85f83c333f5355c8ac026f835b18d15060725fcb.camel@ericsson.com/ Reported-by: Per Sundström XP <per.xp.sundstrom@ericsson.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20221212211505.558851-6-andrii@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
2de2830887 |
tools/lib/thermal: Fix thermal_sampling_exit()
[ Upstream commit a29cbd76aaf63f5493e962aa2fbaadcdc4615143 ]
thermal_sampling_init() suscribes to THERMAL_GENL_SAMPLING_GROUP_NAME group
so thermal_sampling_exit() should unsubscribe from the same group.
Fixes:
|
||
|
fbbe6ddb60 |
libbpf: Fix alen calculation in libbpf_nla_dump_errormsg()
[ Upstream commit 17bcd27a08a21397698edf143084d7c87ce17946 ]
The code assumes that everything that comes after nlmsgerr are nlattrs.
When calculating their size, it does not account for the initial
nlmsghdr. This may lead to accessing uninitialized memory.
Fixes:
|
||
|
700aec11cb |
libbpf: Fix invalid return address register in s390
[ Upstream commit 7244eb669397f309c3d014264823cdc9cb3f8e6b ]
There is currently an invalid register mapping in the s390 return
address register. As the manual[1] states, the return address can be
found at r14. In bpf_tracing.h, the s390 registers were named
gprs(general purpose registers). This commit fixes the problem by
correcting the mistyped mapping.
[1]: https://uclibc.org/docs/psABI-s390x.pdf#page=14
Fixes:
|
||
|
9e93b377dc |
libbpf: Fix btf__align_of() by taking into account field offsets
[ Upstream commit 25a4481b4136af7794e1df2d6c90ed2f354d60ce ]
btf__align_of() is supposed to be return alignment requirement of
a requested BTF type. For STRUCT/UNION it doesn't always return correct
value, because it calculates alignment only based on field types. But
for packed structs this is not enough, we need to also check field
offsets and struct size. If field offset isn't aligned according to
field type's natural alignment, then struct must be packed. Similarly,
if struct size is not a multiple of struct's natural alignment, then
struct must be packed as well.
This patch fixes this issue precisely by additionally checking these
conditions.
Fixes:
|
||
|
66d2f118d7 |
libbpf: Avoid enum forward-declarations in public API in C++ mode
[ Upstream commit b42693415b86f608049cf1b4870adc1dc65e58b0 ] C++ enum forward declarations are fundamentally not compatible with pure C enum definitions, and so libbpf's use of `enum bpf_stats_type;` forward declaration in libbpf/bpf.h public API header is causing C++ compilation issues. More details can be found in [0], but it comes down to C++ supporting enum forward declaration only with explicitly specified backing type: enum bpf_stats_type: int; In C (and I believe it's a GCC extension also), such forward declaration is simply: enum bpf_stats_type; Further, in Linux UAPI this enum is defined in pure C way: enum bpf_stats_type { BPF_STATS_RUN_TIME = 0; } And even though in both cases backing type is int, which can be confirmed by looking at DWARF information, for C++ compiler actual enum definition and forward declaration are incompatible. To eliminate this problem, for C++ mode define input argument as int, which makes enum unnecessary in libbpf public header. This solves the issue and as demonstrated by next patch doesn't cause any unwanted compiler warnings, at least with default warnings setting. [0] https://stackoverflow.com/questions/42766839/c11-enum-forward-causes-underlying-type-mismatch [1] Closes: https://github.com/libbpf/libbpf/issues/249 Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20221130200013.2997831-1-andrii@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org> |
||
|
8543ac4923 |
libbpf: Btf dedup identical struct test needs check for nested structs/arrays
[ Upstream commit f3c51fe02c55bd944662714e5b91b96dc271ad9f ]
When examining module BTF, it is common to see core kernel structures
such as sk_buff, net_device duplicated in the module. After adding
debug messaging to BTF it turned out that much of the problem
was down to the identical struct test failing during deduplication;
sometimes the compiler adds identical structs. However
it turns out sometimes that type ids of identical struct members
can also differ, even when the containing structs are still identical.
To take an example, for struct sk_buff, debug messaging revealed
that the identical struct matching was failing for the anon
struct "headers"; specifically for the first field:
__u8 __pkt_type_offset[0]; /* 128 0 */
Looking at the code in BTF deduplication, we have code that guards
against the possibility of identical struct definitions, down to
type ids, and identical array definitions. However in this case
we have a struct which is being defined twice but does not have
identical type ids since each duplicate struct has separate type
ids for the above array member. A similar problem (though not
observed) could occur for struct-in-struct.
The solution is to make the "identical struct" test check members
not just for matching ids, but to also check if they in turn are
identical structs or arrays.
The results of doing this are quite dramatic (for some modules
at least); I see the number of type ids drop from around 10000
to just over 1000 in one module for example.
For testing use latest pahole or apply [1], otherwise dedups
can fail for the reasons described there.
Also fix return type of btf_dedup_identical_arrays() as
suggested by Andrii to match boolean return type used
elsewhere.
Fixes:
|
||
|
ecb0f3d7dd |
libbpf: Fix null-pointer dereference in find_prog_by_sec_insn()
[ Upstream commit d0d382f95a9270dcf803539d6781d6bd67e3f5b2 ]
When there are no program sections, obj->programs is left unallocated,
and find_prog_by_sec_insn()'s search lands on &obj->programs[0] == NULL,
and will cause null-pointer dereference in the following access to
prog->sec_idx.
Guard the search with obj->nr_programs similar to what's being done in
__bpf_program__iter() to prevent null-pointer access from happening.
Fixes:
|
||
|
e0f5361d95 |
libbpf: Deal with section with no data gracefully
[ Upstream commit 35a855509e6ee3442477c8ebc6827b5b5d32a7b5 ]
ELF section data pointer returned by libelf may be NULL (if section has
SHT_NOBITS), so null check section data pointer before attempting to
copy license and kversion section.
Fixes:
|
||
|
622ff59742 |
libbpf: Use elf_getshdrnum() instead of e_shnum
[ Upstream commit 51deedc9b8680953437dfe359e5268120de10e30 ] This commit replace e_shnum with the elf_getshdrnum() helper to fix two oss-fuzz-reported heap-buffer overflow in __bpf_object__open. Both reports are incorrectly marked as fixed and while still being reproducible in the latest libbpf. # clusterfuzz-testcase-minimized-bpf-object-fuzzer-5747922482888704 libbpf: loading object 'fuzz-object' from buffer libbpf: sec_cnt is 0 libbpf: elf: section(1) .data, size 0, link 538976288, flags 2020202020202020, type=2 libbpf: elf: section(2) .data, size 32, link 538976288, flags 202020202020ff20, type=1 ================================================================= ==13==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x6020000000c0 at pc 0x0000005a7b46 bp 0x7ffd12214af0 sp 0x7ffd12214ae8 WRITE of size 4 at 0x6020000000c0 thread T0 SCARINESS: 46 (4-byte-write-heap-buffer-overflow-far-from-bounds) #0 0x5a7b45 in bpf_object__elf_collect /src/libbpf/src/libbpf.c:3414:24 #1 0x5733c0 in bpf_object_open /src/libbpf/src/libbpf.c:7223:16 #2 0x5739fd in bpf_object__open_mem /src/libbpf/src/libbpf.c:7263:20 ... The issue lie in libbpf's direct use of e_shnum field in ELF header as the section header count. Where as libelf implemented an extra logic that, when e_shnum == 0 && e_shoff != 0, will use sh_size member of the initial section header as the real section header count (part of ELF spec to accommodate situation where section header counter is larger than SHN_LORESERVE). The above inconsistency lead to libbpf writing into a zero-entry calloc area. So intead of using e_shnum directly, use the elf_getshdrnum() helper provided by libelf to retrieve the section header counter into sec_cnt. Fixes: |
||
|
13866e924a |
libbpf: Fix memory leak in parse_usdt_arg()
[ Upstream commit 0dc9254e03704c75f2ebc9cbef2ce4de83fba603 ]
In the arm64 version of parse_usdt_arg(), when sscanf returns 2, reg_name
is allocated but not freed. Fix it.
Fixes:
|
||
|
fbe08093fb |
libbpf: Fix use-after-free in btf_dump_name_dups
[ Upstream commit 93c660ca40b5d2f7c1b1626e955a8e9fa30e0749 ]
ASAN reports an use-after-free in btf_dump_name_dups:
ERROR: AddressSanitizer: heap-use-after-free on address 0xffff927006db at pc 0xaaaab5dfb618 bp 0xffffdd89b890 sp 0xffffdd89b928
READ of size 2 at 0xffff927006db thread T0
#0 0xaaaab5dfb614 in __interceptor_strcmp.part.0 (test_progs+0x21b614)
#1 0xaaaab635f144 in str_equal_fn tools/lib/bpf/btf_dump.c:127
#2 0xaaaab635e3e0 in hashmap_find_entry tools/lib/bpf/hashmap.c:143
#3 0xaaaab635e72c in hashmap__find tools/lib/bpf/hashmap.c:212
#4 0xaaaab6362258 in btf_dump_name_dups tools/lib/bpf/btf_dump.c:1525
#5 0xaaaab636240c in btf_dump_resolve_name tools/lib/bpf/btf_dump.c:1552
#6 0xaaaab6362598 in btf_dump_type_name tools/lib/bpf/btf_dump.c:1567
#7 0xaaaab6360b48 in btf_dump_emit_struct_def tools/lib/bpf/btf_dump.c:912
#8 0xaaaab6360630 in btf_dump_emit_type tools/lib/bpf/btf_dump.c:798
#9 0xaaaab635f720 in btf_dump__dump_type tools/lib/bpf/btf_dump.c:282
#10 0xaaaab608523c in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:236
#11 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875
#12 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062
#13 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697
#14 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308
#15 0xaaaab5d65990 (test_progs+0x185990)
0xffff927006db is located 11 bytes inside of 16-byte region [0xffff927006d0,0xffff927006e0)
freed by thread T0 here:
#0 0xaaaab5e2c7c4 in realloc (test_progs+0x24c7c4)
#1 0xaaaab634f4a0 in libbpf_reallocarray tools/lib/bpf/libbpf_internal.h:191
#2 0xaaaab634f840 in libbpf_add_mem tools/lib/bpf/btf.c:163
#3 0xaaaab636643c in strset_add_str_mem tools/lib/bpf/strset.c:106
#4 0xaaaab6366560 in strset__add_str tools/lib/bpf/strset.c:157
#5 0xaaaab6352d70 in btf__add_str tools/lib/bpf/btf.c:1519
#6 0xaaaab6353e10 in btf__add_field tools/lib/bpf/btf.c:2032
#7 0xaaaab6084fcc in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:232
#8 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875
#9 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062
#10 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697
#11 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308
#12 0xaaaab5d65990 (test_progs+0x185990)
previously allocated by thread T0 here:
#0 0xaaaab5e2c7c4 in realloc (test_progs+0x24c7c4)
#1 0xaaaab634f4a0 in libbpf_reallocarray tools/lib/bpf/libbpf_internal.h:191
#2 0xaaaab634f840 in libbpf_add_mem tools/lib/bpf/btf.c:163
#3 0xaaaab636643c in strset_add_str_mem tools/lib/bpf/strset.c:106
#4 0xaaaab6366560 in strset__add_str tools/lib/bpf/strset.c:157
#5 0xaaaab6352d70 in btf__add_str tools/lib/bpf/btf.c:1519
#6 0xaaaab6353ff0 in btf_add_enum_common tools/lib/bpf/btf.c:2070
#7 0xaaaab6354080 in btf__add_enum tools/lib/bpf/btf.c:2102
#8 0xaaaab6082f50 in test_btf_dump_incremental tools/testing/selftests/bpf/prog_tests/btf_dump.c:162
#9 0xaaaab6097530 in test_btf_dump tools/testing/selftests/bpf/prog_tests/btf_dump.c:875
#10 0xaaaab6314ed0 in run_one_test tools/testing/selftests/bpf/test_progs.c:1062
#11 0xaaaab631a0a8 in main tools/testing/selftests/bpf/test_progs.c:1697
#12 0xffff9676d214 in __libc_start_main ../csu/libc-start.c:308
#13 0xaaaab5d65990 (test_progs+0x185990)
The reason is that the key stored in hash table name_map is a string
address, and the string memory is allocated by realloc() function, when
the memory is resized by realloc() later, the old memory may be freed,
so the address stored in name_map references to a freed memory, causing
use-after-free.
Fix it by storing duplicated string address in name_map.
Fixes:
|
||
|
76c6303530 |
libbpf: Fix uninitialized warning in btf_dump_dump_type_data
commit dfd0afbf151d85411b371e841f62b81ee5d1ca54 upstream.
GCC 11.3.0 fails to compile btf_dump.c due to the following error,
which seems to originate in btf_dump_struct_data where the returned
value would be uninitialized if btf_vlen returns zero.
btf_dump.c: In function ‘btf_dump_dump_type_data’:
btf_dump.c:2363:12: error: ‘err’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
2363 | if (err < 0)
| ^
Fixes:
|
||
|
05c1558bfc |
libbpf: Check the validity of size in user_ring_buffer__reserve()
The top two bits of size are used as busy and discard flags, so reject the reservation that has any of these special bits in the size. With the addition of validity check, these is also no need to check whether or not total_size is overflowed. Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20221116072351.1168938-5-houtao@huaweicloud.com |
||
|
64176bff24 |
libbpf: Handle size overflow for user ringbuf mmap
Similar with the overflow problem on ringbuf mmap, in user_ringbuf_map()
2 * max_entries may overflow u32 when mapping writeable region.
Fixing it by casting the size of writable mmap region into a __u64 and
checking whether or not there will be overflow during mmap.
Fixes:
|
||
|
927cbb478a |
libbpf: Handle size overflow for ringbuf mmap
The maximum size of ringbuf is 2GB on x86-64 host, so 2 * max_entries
will overflow u32 when mapping producer page and data pages. Only
casting max_entries to size_t is not enough, because for 32-bits
application on 64-bits kernel the size of read-only mmap region
also could overflow size_t.
So fixing it by casting the size of read-only mmap region into a __u64
and checking whether or not there will be overflow during mmap.
Fixes:
|
||
|
689eb2f1ba |
libbpf: Use page size as max_entries when probing ring buffer map
Using page size as max_entries when probing ring buffer map, else the probe may fail on host with 64KB page size (e.g., an ARM64 host). After the fix, the output of "bpftool feature" on above host will be correct. Before : eBPF map_type ringbuf is NOT available eBPF map_type user_ringbuf is NOT available After : eBPF map_type ringbuf is available eBPF map_type user_ringbuf is available Signed-off-by: Hou Tao <houtao1@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20221116072351.1168938-2-houtao@huaweicloud.com |
||
|
5fd2a60aec |
libbpf: Use correct return pointer in attach_raw_tp
We need to pass '*link' to final libbpf_get_error,
because that one holds the return value, not 'link'.
Fixes:
|
||
|
d5e57375a5 |
libperf: Do not include non-UAPI linux/compiler.h header
Its just for that __packed define, so use it expanded as __attribute__((packed)), like the other files in /usr/include do. This was problem was preventing building the libperf examples on ALT Linux and Fedora 35, fix it. Reported-by: Vitaly Chikunov <vt@altlinux.org> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Dmitry Levin <ldv@altlinux.org Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/Y0lnpl2Ix7VljVDc@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
d465bff130 |
perf tools changes for v6.1: 1st batch
- Add support for AMD on 'perf mem' and 'perf c2c', the kernel enablement patches went via tip. Example: $ sudo perf mem record -- -c 10000 ^C[ perf record: Woken up 227 times to write data ] [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ] $ sudo perf mem report -F mem,sample,snoop Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762 Memory access Samples Snoop N/A 700620 N/A L1 hit 126675 N/A L2 hit 424 N/A L3 hit 664 HitM L3 hit 10 N/A Local RAM hit 2 N/A Remote RAM (1 hop) hit 8558 N/A Remote Cache (1 hop) hit 3 N/A Remote Cache (1 hop) hit 2 HitM Remote Cache (2 hops) hit 10 HitM Remote Cache (2 hops) hit 6 N/A Uncached hit 4 N/A $ - "perf lock" improvements: - Add -E/--entries option to limit the number of entries to display, say to ask for just the top 5 contended locks. - Add -q/--quiet option to suppress header and debug messages. - Add a 'perf test' kernel lock contention entry to test 'perf lock'. - "perf lock contention" improvements: - Ask BPF's bpf_get_stackid() to skip some callchain entries. The ones closer to the tooling are bpf related and not that interesting, the ones calling the locking function are the ones we're interested in, example of a full, unskipped callstack: - Allow changing the callstack depth and number of entries to skip. 1 10.74 us 10.74 us 10.74 us spinlock __bpf_trace_contention_begin+0xb 0xffffffffc03b5c47 bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117 0xffffffffc03b5c47 bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117 0xffffffffbb8b8e75 bpf_trace_run2+0x35 0xffffffffbb7eab9b __bpf_trace_contention_begin+0xb 0xffffffffbb7ebe75 queued_spin_lock_slowpath+0x1f5 0xffffffffbc1c26ff _raw_spin_lock+0x1f 0xffffffffbb841015 tick_do_update_jiffies64+0x25 0xffffffffbb8409ee tick_irq_enter+0x9e - Show full callstack in verbose mode (-v option), sometimes this is desirable instead of showing just one callstack entry. - Allow multiple time ranges in 'perf record --delay' to help in reducing the amount of data collected from hardware tracing (Intel PT, etc) when there is a rough idea of periods of time where events of interest take time. - Add Intel PT to record only decoder debug messages when error happens. - Improve layout of Intel PT man page. - Add new branch types: alignment, data and inst faults and arch specific ones, such as fiq, debug_halt, debug_exit, debug_inst and debug_data on arm64. Kernel enablement went thru the tip tree. - Fix 'perf probe' error log check in 'perf test' when no debuginfo is available. - Fix 'perf stat' aggregation mode logic, it should be looking at the CPU not at the core number. - Fix flags parsing in 'perf trace' filters. - Introduce compact encoding of CPU range encoding on perf.data, to avoid having a bitmap with all the CPUs. - Improvements to the 'perf stat' metrics, including adding "core_wide", and computing "smt" from the CPU topology. - Add support to the new PERF_FORMAT_LOST perf_event_attr.read_format, that allows tooling to ask for the precise number of lost samples for a given event. - Add 'addr' sort key to see just the address of sampled instructions: $ perf record -o- true | perf report -i- -s addr [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.000 MB - ] # Samples: 12 of event 'cycles:u' # Event count (approx.): 252512 # # Overhead Address # ........ .................. 42.96% 0x7f96f08443d7 29.55% 0x7f96f0859b50 14.76% 0x7f96f0852e02 8.30% 0x7f96f0855028 4.43% 0xffffffff8de01087 perf annotate: Toggle full address <-> offset display - Add 'f' hotkey to the 'perf annotate' TUI interface when in 'disassembler output' mode ('o' hotkey) to toggle showing full virtual address or just the offset. - Cache DSO build-ids when synthesizing PERF_RECORD_MMAP records for pre-existing threads, at the start of a 'perf record' session, speeding up that record startup phase. - Add a command line option to specify build ids in 'perf inject'. - Update JSON event files for the Intel alderlake, broadwell, broadwellde, broadwellx, cascadelakex, haswell, haswellx, icelake, icelakex, ivybridge, ivytown, jaketown, sandybridge, sapphirerapids, skylake, skylakex, and tigerlake processors. - Update vendor JSON event files for the ARM Neoverse V1 and E1 platforms. - Add a 'perf test' entry for 'perf mem' where a struct has false sharing and this gets detected in the 'perf mem' output, tested with Intel, AMD and ARM64 systems. - Add a 'perf test' entry to test the resolution of java symbols, where an output like this is expected: 8.18% jshell jitted-50116-29.so [.] Interpreter 0.75% Thread-1 jitted-83602-1670.so [.] jdk.internal.jimage.BasicImageReader.getString(int) - Add tests for the ARM64 CoreSight hardware tracing feature, with specially crafted pureloop, memcpy, thread loop and unroll tread that then gets traced and the output compared with expected output. Documentation explaining it is also included. - Add per thread Intel PT 'perf test' entry to check that PERF_RECORD_TEXT_POKE events are recorded per CPU, resulting in a mixture of per thread and per CPU events and mmaps, verify that this gets all recorded correctly. - Introduce pthread mutex wrappers to allow for building with clang's -Wthread-safety, i.e. using the "guarded_by" "pt_guarded_by" "lockable", "exclusive_lock_function", "exclusive_trylock_function", "exclusive_locks_required", and "no_thread_safety_analysis" compiler function attributes. - Fix empty version number when building outside of a git repo. - Improve feature detection display when multiple versions of a feature are present, such as for binutils libbfd, that has a mix of possible ways to detect according to the Linux distribution. Previously in some cases we had: Auto-detecting system features <SNIP> ... libbfd: [ on ] ... libbfd-liberty: [ on ] ... libbfd-liberty-z: [ on ] <SNIP> Now for this case we show just the main feature: Auto-detecting system features <SNIP> ... libbfd: [ on ] <SNIP> - Remove some unused structs, variables, macros, function prototypes and includes from various places. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCY0CKuAAKCRCyPKLppCJ+ JywwAQDWLForEnEZNk92Fd3y342Lh9W/8z1V51dKK7XdY1cV6AD/Rn5L57v7k/yG mG5w2Fd1J/xBjlsL/BvNlimUD2tbkQA= =XPMg -----END PGP SIGNATURE----- Merge tag 'perf-tools-for-v6.1-1-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools updates from Arnaldo Carvalho de Melo: - Add support for AMD on 'perf mem' and 'perf c2c', the kernel enablement patches went via tip. Example: $ sudo perf mem record -- -c 10000 ^C[ perf record: Woken up 227 times to write data ] [ perf record: Captured and wrote 58.760 MB perf.data (836978 samples) ] $ sudo perf mem report -F mem,sample,snoop Samples: 836K of event 'ibs_op//', Event count (approx.): 8418762 Memory access Samples Snoop N/A 700620 N/A L1 hit 126675 N/A L2 hit 424 N/A L3 hit 664 HitM L3 hit 10 N/A Local RAM hit 2 N/A Remote RAM (1 hop) hit 8558 N/A Remote Cache (1 hop) hit 3 N/A Remote Cache (1 hop) hit 2 HitM Remote Cache (2 hops) hit 10 HitM Remote Cache (2 hops) hit 6 N/A Uncached hit 4 N/A $ - "perf lock" improvements: - Add -E/--entries option to limit the number of entries to display, say to ask for just the top 5 contended locks. - Add -q/--quiet option to suppress header and debug messages. - Add a 'perf test' kernel lock contention entry to test 'perf lock'. - "perf lock contention" improvements: - Ask BPF's bpf_get_stackid() to skip some callchain entries. The ones closer to the tooling are bpf related and not that interesting, the ones calling the locking function are the ones we're interested in, example of a full, unskipped callstack: - Allow changing the callstack depth and number of entries to skip. 1 10.74 us 10.74 us 10.74 us spinlock __bpf_trace_contention_begin+0xb 0xffffffffc03b5c47 bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117 0xffffffffc03b5c47 bpf_prog_bf07ae9e2cbd02c5_contention_begin+0x117 0xffffffffbb8b8e75 bpf_trace_run2+0x35 0xffffffffbb7eab9b __bpf_trace_contention_begin+0xb 0xffffffffbb7ebe75 queued_spin_lock_slowpath+0x1f5 0xffffffffbc1c26ff _raw_spin_lock+0x1f 0xffffffffbb841015 tick_do_update_jiffies64+0x25 0xffffffffbb8409ee tick_irq_enter+0x9e - Show full callstack in verbose mode (-v option), sometimes this is desirable instead of showing just one callstack entry. - Allow multiple time ranges in 'perf record --delay' to help in reducing the amount of data collected from hardware tracing (Intel PT, etc) when there is a rough idea of periods of time where events of interest take time. - Add Intel PT to record only decoder debug messages when error happens. - Improve layout of Intel PT man page. - Add new branch types: alignment, data and inst faults and arch specific ones, such as fiq, debug_halt, debug_exit, debug_inst and debug_data on arm64. Kernel enablement went thru the tip tree. - Fix 'perf probe' error log check in 'perf test' when no debuginfo is available. - Fix 'perf stat' aggregation mode logic, it should be looking at the CPU not at the core number. - Fix flags parsing in 'perf trace' filters. - Introduce compact encoding of CPU range encoding on perf.data, to avoid having a bitmap with all the CPUs. - Improvements to the 'perf stat' metrics, including adding "core_wide", and computing "smt" from the CPU topology. - Add support to the new PERF_FORMAT_LOST perf_event_attr.read_format, that allows tooling to ask for the precise number of lost samples for a given event. - Add 'addr' sort key to see just the address of sampled instructions: $ perf record -o- true | perf report -i- -s addr [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.000 MB - ] # Samples: 12 of event 'cycles:u' # Event count (approx.): 252512 # # Overhead Address # ........ .................. 42.96% 0x7f96f08443d7 29.55% 0x7f96f0859b50 14.76% 0x7f96f0852e02 8.30% 0x7f96f0855028 4.43% 0xffffffff8de01087 perf annotate: Toggle full address <-> offset display - Add 'f' hotkey to the 'perf annotate' TUI interface when in 'disassembler output' mode ('o' hotkey) to toggle showing full virtual address or just the offset. - Cache DSO build-ids when synthesizing PERF_RECORD_MMAP records for pre-existing threads, at the start of a 'perf record' session, speeding up that record startup phase. - Add a command line option to specify build ids in 'perf inject'. - Update JSON event files for the Intel alderlake, broadwell, broadwellde, broadwellx, cascadelakex, haswell, haswellx, icelake, icelakex, ivybridge, ivytown, jaketown, sandybridge, sapphirerapids, skylake, skylakex, and tigerlake processors. - Update vendor JSON event files for the ARM Neoverse V1 and E1 platforms. - Add a 'perf test' entry for 'perf mem' where a struct has false sharing and this gets detected in the 'perf mem' output, tested with Intel, AMD and ARM64 systems. - Add a 'perf test' entry to test the resolution of java symbols, where an output like this is expected: 8.18% jshell jitted-50116-29.so [.] Interpreter 0.75% Thread-1 jitted-83602-1670.so [.] jdk.internal.jimage.BasicImageReader.getString(int) - Add tests for the ARM64 CoreSight hardware tracing feature, with specially crafted pureloop, memcpy, thread loop and unroll tread that then gets traced and the output compared with expected output. Documentation explaining it is also included. - Add per thread Intel PT 'perf test' entry to check that PERF_RECORD_TEXT_POKE events are recorded per CPU, resulting in a mixture of per thread and per CPU events and mmaps, verify that this gets all recorded correctly. - Introduce pthread mutex wrappers to allow for building with clang's -Wthread-safety, i.e. using the "guarded_by" "pt_guarded_by" "lockable", "exclusive_lock_function", "exclusive_trylock_function", "exclusive_locks_required", and "no_thread_safety_analysis" compiler function attributes. - Fix empty version number when building outside of a git repo. - Improve feature detection display when multiple versions of a feature are present, such as for binutils libbfd, that has a mix of possible ways to detect according to the Linux distribution. Previously in some cases we had: Auto-detecting system features <SNIP> ... libbfd: [ on ] ... libbfd-liberty: [ on ] ... libbfd-liberty-z: [ on ] <SNIP> Now for this case we show just the main feature: Auto-detecting system features <SNIP> ... libbfd: [ on ] <SNIP> - Remove some unused structs, variables, macros, function prototypes and includes from various places. * tag 'perf-tools-for-v6.1-1-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (169 commits) perf script: Add missing fields in usage hint perf mem: Print "LFB/MAB" for PERF_MEM_LVLNUM_LFB perf mem/c2c: Avoid printing empty lines for unsupported events perf mem/c2c: Add load store event mappings for AMD perf mem/c2c: Set PERF_SAMPLE_WEIGHT for LOAD_STORE events perf mem: Add support for printing PERF_MEM_LVLNUM_{CXL|IO} perf amd ibs: Sync arch/x86/include/asm/amd-ibs.h header with the kernel tools headers UAPI: Sync include/uapi/linux/perf_event.h header with the kernel perf stat: Fix cpu check to use id.cpu.cpu in aggr_printout() perf test coresight: Add relevant documentation about ARM64 CoreSight testing perf test: Add git ignore for tmp and output files of ARM CoreSight tests perf test coresight: Add unroll thread test shell script perf test coresight: Add unroll thread test tool perf test coresight: Add thread loop test shell scripts perf test coresight: Add thread loop test tool perf test coresight: Add memcpy thread test shell script perf test coresight: Add memcpy thread test tool perf test: Add git ignore for perf data generated by the ARM CoreSight tests perf test: Add arm64 asm pureloop test shell script perf test: Add asm pureloop test tool ... |
||
|
d4013bc4d4 |
bitmap patches for v6.1-rc1
From Phil Auld: drivers/base: Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES From me: cpumask: cleanup nr_cpu_ids vs nr_cpumask_bits mess This series cleans that mess and adds new config FORCE_NR_CPUS that allows to optimize cpumask subsystem if the number of CPUs is known at compile-time. From me: lib: optimize find_bit() functions Reworks find_bit() functions based on new FIND_{FIRST,NEXT}_BIT() macros. From me: lib/find: add find_nth_bit() Adds find_nth_bit(), which is ~70 times faster than bitcounting with for_each() loop: for_each_set_bit(bit, mask, size) if (n-- == 0) return bit; Also adds bitmap_weight_and() to let people replace this pattern: tmp = bitmap_alloc(nbits); bitmap_and(tmp, map1, map2, nbits); weight = bitmap_weight(tmp, nbits); bitmap_free(tmp); with a single bitmap_weight_and() call. From me: cpumask: repair cpumask_check() After switching cpumask to use nr_cpu_ids, cpumask_check() started generating many false-positive warnings. This series fixes it. From Valentin Schneider: bitmap,cpumask: Add for_each_cpu_andnot() and for_each_cpu_andnot() Extends the API with one more function and applies it in sched/core. -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEEi8GdvG6xMhdgpu/4sUSA/TofvsgFAmNBwmUACgkQsUSA/Tof vshPRwv+KlqnZlKtuSPgbo/Kgswworpi/7TqfnN9GWlb8AJ2uhjBKI3GFwv4TDow 7KV6wdKdXYLr4pktcIhWy3qLrT+bDDExfarHRo3QI1A1W42EJ+ZiUaGnQGcnVMzD 5q/K1YMJYq0oaesHEw5PVUh8mm6h9qRD8VbX1u+riW/VCWBj3bho9Dp4mffQ48Q6 hVy/SnMGgClQwNYp+sxkqYx38xUqUGYoU5MzeziUmoS6pZQh+4lF33MULnI3EKmc /ehXilPPtOV/Tm0RovDWFfm3rjNapV9FXHu8Ob2z/c+1A29EgXnE3pwrBDkAx001 TQrL9qbCANRDGPLzWQHw0dwFIaXvTdrSttCsfYYfU5hI4JbnJEe0Pqkaaohy7jqm r0dW/TlyOG5T+k8Kwdx9w9A+jKs8TbKKZ8HOaN8BpkXswVnpbzpQbj3TITZI4aeV 6YR4URBQ5UkrVLEXFXbrOzwjL2zqDdyNoBdTJmGLJ+5b/n0HHzmyMVkegNIwLLM3 GR7sMQae =Q/+F -----END PGP SIGNATURE----- Merge tag 'bitmap-6.1-rc1' of https://github.com/norov/linux Pull bitmap updates from Yury Norov: - Fix unsigned comparison to -1 in CPUMAP_FILE_MAX_BYTES (Phil Auld) - cleanup nr_cpu_ids vs nr_cpumask_bits mess (me) This series cleans that mess and adds new config FORCE_NR_CPUS that allows to optimize cpumask subsystem if the number of CPUs is known at compile-time. - optimize find_bit() functions (me) Reworks find_bit() functions based on new FIND_{FIRST,NEXT}_BIT() macros. - add find_nth_bit() (me) Adds find_nth_bit(), which is ~70 times faster than bitcounting with for_each() loop: for_each_set_bit(bit, mask, size) if (n-- == 0) return bit; Also adds bitmap_weight_and() to let people replace this pattern: tmp = bitmap_alloc(nbits); bitmap_and(tmp, map1, map2, nbits); weight = bitmap_weight(tmp, nbits); bitmap_free(tmp); with a single bitmap_weight_and() call. - repair cpumask_check() (me) After switching cpumask to use nr_cpu_ids, cpumask_check() started generating many false-positive warnings. This series fixes it. - Add for_each_cpu_andnot() and for_each_cpu_andnot() (Valentin Schneider) Extends the API with one more function and applies it in sched/core. * tag 'bitmap-6.1-rc1' of https://github.com/norov/linux: (28 commits) sched/core: Merge cpumask_andnot()+for_each_cpu() into for_each_cpu_andnot() lib/test_cpumask: Add for_each_cpu_and(not) tests cpumask: Introduce for_each_cpu_andnot() lib/find_bit: Introduce find_next_andnot_bit() cpumask: fix checking valid cpu range lib/bitmap: add tests for for_each() loops lib/find: optimize for_each() macros lib/bitmap: introduce for_each_set_bit_wrap() macro lib/find_bit: add find_next{,_and}_bit_wrap cpumask: switch for_each_cpu{,_not} to use for_each_bit() net: fix cpu_max_bits_warn() usage in netif_attrmask_next{,_and} cpumask: add cpumask_nth_{,and,andnot} lib/bitmap: remove bitmap_ord_to_pos lib/bitmap: add tests for find_nth_bit() lib: add find_nth{,_and,_andnot}_bit() lib/bitmap: add bitmap_weight_and() lib/bitmap: don't call __bitmap_weight() in kernel code tools: sync find_bit() implementation lib/find_bit: optimize find_next_bit() functions lib/find_bit: create find_first_zero_bit_le() ... |
||
|
1337b9dcb0 |
perf tools: Remove special handling of system-wide evsel
For system-wide evsels, the thread map should be dummy - i.e. it has a single entry of -1. But the code guarantees such a thread map, so no need to handle it specially. No functional change intended. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221003204647.1481128-6-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
7e2450bb75 |
libperf: Propagate maps only if necessary
The current code propagate evsel's cpu map settings to evlist when it's added to an evlist. But the evlist->all_cpus and each evsel's cpus will be updated in perf_evlist__set_maps() later. No need to do it before evlist's cpus are set actually. In fact it discards this intermediate all_cpus maps at the beginning of perf_evlist__set_maps(). Let's not do this. It's only needed when an evsel is added after the evlist cpu/thread maps are set. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221003204647.1481128-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
06b552ee37 |
libperf: Populate system-wide evsel maps
Setting proper cpu and thread maps for system wide evsels regardless of user requested cpu in __perf_evlist__propagate_maps(). Those evsels need to be active on all cpus always. Do it in the libperf so that we can guarantee it has proper maps. Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221003204647.1481128-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
0326074ff4 |
Networking changes for 6.1.
Core ---- - Introduce and use a single page frag cache for allocating small skb heads, clawing back the 10-20% performance regression in UDP flood test from previous fixes. - Run packets which already went thru HW coalescing thru SW GRO. This significantly improves TCP segment coalescing and simplifies deployments as different workloads benefit from HW or SW GRO. - Shrink the size of the base zero-copy send structure. - Move TCP init under a new slow / sleepable version of DO_ONCE(). BPF --- - Add BPF-specific, any-context-safe memory allocator. - Add helpers/kfuncs for PKCS#7 signature verification from BPF programs. - Define a new map type and related helpers for user space -> kernel communication over a ring buffer (BPF_MAP_TYPE_USER_RINGBUF). - Allow targeting BPF iterators to loop through resources of one task/thread. - Add ability to call selected destructive functions. Expose crash_kexec() to allow BPF to trigger a kernel dump. Use CAP_SYS_BOOT check on the loading process to judge permissions. - Enable BPF to collect custom hierarchical cgroup stats efficiently by integrating with the rstat framework. - Support struct arguments for trampoline based programs. Only structs with size <= 16B and x86 are supported. - Invoke cgroup/connect{4,6} programs for unprivileged ICMP ping sockets (instead of just TCP and UDP sockets). - Add a helper for accessing CLOCK_TAI for time sensitive network related programs. - Support accessing network tunnel metadata's flags. - Make TCP SYN ACK RTO tunable by BPF programs with TCP Fast Open. - Add support for writing to Netfilter's nf_conn:mark. Protocols --------- - WiFi: more Extremely High Throughput (EHT) and Multi-Link Operation (MLO) work (802.11be, WiFi 7). - vsock: improve support for SO_RCVLOWAT. - SMC: support SO_REUSEPORT. - Netlink: define and document how to use netlink in a "modern" way. Support reporting missing attributes via extended ACK. - IPSec: support collect metadata mode for xfrm interfaces. - TCPv6: send consistent autoflowlabel in SYN_RECV state and RST packets. - TCP: introduce optional per-netns connection hash table to allow better isolation between namespaces (opt-in, at the cost of memory and cache pressure). - MPTCP: support TCP_FASTOPEN_CONNECT. - Add NEXT-C-SID support in Segment Routing (SRv6) End behavior. - Adjust IP_UNICAST_IF sockopt behavior for connected UDP sockets. - Open vSwitch: - Allow specifying ifindex of new interfaces. - Allow conntrack and metering in non-initial user namespace. - TLS: support the Korean ARIA-GCM crypto algorithm. - Remove DECnet support. Driver API ---------- - Allow selecting the conduit interface used by each port in DSA switches, at runtime. - Ethernet Power Sourcing Equipment and Power Device support. - Add tc-taprio support for queueMaxSDU parameter, i.e. setting per traffic class max frame size for time-based packet schedules. - Support PHY rate matching - adapting between differing host-side and link-side speeds. - Introduce QUSGMII PHY mode and 1000BASE-KX interface mode. - Validate OF (device tree) nodes for DSA shared ports; make phylink-related properties mandatory on DSA and CPU ports. Enforcing more uniformity should allow transitioning to phylink. - Require that flash component name used during update matches one of the components for which version is reported by info_get(). - Remove "weight" argument from driver-facing NAPI API as much as possible. It's one of those magic knobs which seemed like a good idea at the time but is too indirect to use in practice. - Support offload of TLS connections with 256 bit keys. New hardware / drivers ---------------------- - Ethernet: - Microchip KSZ9896 6-port Gigabit Ethernet Switch - Renesas Ethernet AVB (EtherAVB-IF) Gen4 SoCs - Analog Devices ADIN1110 and ADIN2111 industrial single pair Ethernet (10BASE-T1L) MAC+PHY. - Rockchip RV1126 Gigabit Ethernet (a version of stmmac IP). - Ethernet SFPs / modules: - RollBall / Hilink / Turris 10G copper SFPs - HALNy GPON module - WiFi: - CYW43439 SDIO chipset (brcmfmac) - CYW89459 PCIe chipset (brcmfmac) - BCM4378 on Apple platforms (brcmfmac) Drivers ------- - CAN: - gs_usb: HW timestamp support - Ethernet PHYs: - lan8814: cable diagnostics - Ethernet NICs: - Intel (100G): - implement control of FCS/CRC stripping - port splitting via devlink - L2TPv3 filtering offload - nVidia/Mellanox: - tunnel offload for sub-functions - MACSec offload, w/ Extended packet number and replay window offload - significantly restructure, and optimize the AF_XDP support, align the behavior with other vendors - Huawei: - configuring DSCP map for traffic class selection - querying standard FEC statistics - querying SerDes lane number via ethtool - Marvell/Cavium: - egress priority flow control - MACSec offload - AMD/SolarFlare: - PTP over IPv6 and raw Ethernet - small / embedded: - ax88772: convert to phylink (to support SFP cages) - altera: tse: convert to phylink - ftgmac100: support fixed link - enetc: standard Ethtool counters - macb: ZynqMP SGMII dynamic configuration support - tsnep: support multi-queue and use page pool - lan743x: Rx IP & TCP checksum offload - igc: add xdp frags support to ndo_xdp_xmit - Ethernet high-speed switches: - Marvell (prestera): - support SPAN port features (traffic mirroring) - nexthop object offloading - Microchip (sparx5): - multicast forwarding offload - QoS queuing offload (tc-mqprio, tc-tbf, tc-ets) - Ethernet embedded switches: - Marvell (mv88e6xxx): - support RGMII cmode - NXP (felix): - standardized ethtool counters - Microchip (lan966x): - QoS queuing offload (tc-mqprio, tc-tbf, tc-cbs, tc-ets) - traffic policing and mirroring - link aggregation / bonding offload - QUSGMII PHY mode support - Qualcomm 802.11ax WiFi (ath11k): - cold boot calibration support on WCN6750 - support to connect to a non-transmit MBSSID AP profile - enable remain-on-channel support on WCN6750 - Wake-on-WLAN support for WCN6750 - support to provide transmit power from firmware via nl80211 - support to get power save duration for each client - spectral scan support for 160 MHz - MediaTek WiFi (mt76): - WiFi-to-Ethernet bridging offload for MT7986 chips - RealTek WiFi (rtw89): - P2P support Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmM7vtkACgkQMUZtbf5S Irvotg//dmh53rC+UMKO3OgOqPlSMnaqzbUdDEfN6mj4Mpox7Csb8zERVURHhBHY fvlXWsDgxmvgTebI5fvNC5+f1iW5xcqgJV2TWnNmDOKWwvQwb6qQfgixVmunvkpe IIukMXYt0dAf9bXeeEfbNXcCb85cPwB76stX0tMV6BX7osp3T0TL1fvFk0NJkL0j TeydLad/yAQtPb4TbeWYjNDoxPVDf0cVpUrevLGmWE88UMYmgTqPze+h1W5Wri52 bzjdLklY/4cgcIZClHQ6F9CeRWqEBxvujA5Hj/cwOcn/ptVVJWUGi7sQo3sYkoSs HFu+F8XsTec14kGNC0Ab40eVdqs5l/w8+E+4jvgXeKGOtVns8DwoiUIzqXpyty89 Ib04mffrwWNjFtHvo/kIsNwP05X2PGE9HUHfwsTUfisl/ASvMmQp7D7vUoqQC/4B AMVzT5qpjkmfBHYQQGuw8FxJhMeAOjC6aAo6censhXJyiUhIfleQsN0syHdaNb8q 9RZlhAgQoVb6ZgvBV8r8unQh/WtNZ3AopwifwVJld2unsE/UNfQy2KyqOWBES/zf LP9sfuX0JnmHn8s1BQEUMPU1jF9ZVZCft7nufJDL6JhlAL+bwZeEN4yCiAHOPZqE ymSLHI9s8yWZoNpuMWKrI9kFexVnQFKmA3+quAJUcYHNMSsLkL8= =Gsio -----END PGP SIGNATURE----- Merge tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "Core: - Introduce and use a single page frag cache for allocating small skb heads, clawing back the 10-20% performance regression in UDP flood test from previous fixes. - Run packets which already went thru HW coalescing thru SW GRO. This significantly improves TCP segment coalescing and simplifies deployments as different workloads benefit from HW or SW GRO. - Shrink the size of the base zero-copy send structure. - Move TCP init under a new slow / sleepable version of DO_ONCE(). BPF: - Add BPF-specific, any-context-safe memory allocator. - Add helpers/kfuncs for PKCS#7 signature verification from BPF programs. - Define a new map type and related helpers for user space -> kernel communication over a ring buffer (BPF_MAP_TYPE_USER_RINGBUF). - Allow targeting BPF iterators to loop through resources of one task/thread. - Add ability to call selected destructive functions. Expose crash_kexec() to allow BPF to trigger a kernel dump. Use CAP_SYS_BOOT check on the loading process to judge permissions. - Enable BPF to collect custom hierarchical cgroup stats efficiently by integrating with the rstat framework. - Support struct arguments for trampoline based programs. Only structs with size <= 16B and x86 are supported. - Invoke cgroup/connect{4,6} programs for unprivileged ICMP ping sockets (instead of just TCP and UDP sockets). - Add a helper for accessing CLOCK_TAI for time sensitive network related programs. - Support accessing network tunnel metadata's flags. - Make TCP SYN ACK RTO tunable by BPF programs with TCP Fast Open. - Add support for writing to Netfilter's nf_conn:mark. Protocols: - WiFi: more Extremely High Throughput (EHT) and Multi-Link Operation (MLO) work (802.11be, WiFi 7). - vsock: improve support for SO_RCVLOWAT. - SMC: support SO_REUSEPORT. - Netlink: define and document how to use netlink in a "modern" way. Support reporting missing attributes via extended ACK. - IPSec: support collect metadata mode for xfrm interfaces. - TCPv6: send consistent autoflowlabel in SYN_RECV state and RST packets. - TCP: introduce optional per-netns connection hash table to allow better isolation between namespaces (opt-in, at the cost of memory and cache pressure). - MPTCP: support TCP_FASTOPEN_CONNECT. - Add NEXT-C-SID support in Segment Routing (SRv6) End behavior. - Adjust IP_UNICAST_IF sockopt behavior for connected UDP sockets. - Open vSwitch: - Allow specifying ifindex of new interfaces. - Allow conntrack and metering in non-initial user namespace. - TLS: support the Korean ARIA-GCM crypto algorithm. - Remove DECnet support. Driver API: - Allow selecting the conduit interface used by each port in DSA switches, at runtime. - Ethernet Power Sourcing Equipment and Power Device support. - Add tc-taprio support for queueMaxSDU parameter, i.e. setting per traffic class max frame size for time-based packet schedules. - Support PHY rate matching - adapting between differing host-side and link-side speeds. - Introduce QUSGMII PHY mode and 1000BASE-KX interface mode. - Validate OF (device tree) nodes for DSA shared ports; make phylink-related properties mandatory on DSA and CPU ports. Enforcing more uniformity should allow transitioning to phylink. - Require that flash component name used during update matches one of the components for which version is reported by info_get(). - Remove "weight" argument from driver-facing NAPI API as much as possible. It's one of those magic knobs which seemed like a good idea at the time but is too indirect to use in practice. - Support offload of TLS connections with 256 bit keys. New hardware / drivers: - Ethernet: - Microchip KSZ9896 6-port Gigabit Ethernet Switch - Renesas Ethernet AVB (EtherAVB-IF) Gen4 SoCs - Analog Devices ADIN1110 and ADIN2111 industrial single pair Ethernet (10BASE-T1L) MAC+PHY. - Rockchip RV1126 Gigabit Ethernet (a version of stmmac IP). - Ethernet SFPs / modules: - RollBall / Hilink / Turris 10G copper SFPs - HALNy GPON module - WiFi: - CYW43439 SDIO chipset (brcmfmac) - CYW89459 PCIe chipset (brcmfmac) - BCM4378 on Apple platforms (brcmfmac) Drivers: - CAN: - gs_usb: HW timestamp support - Ethernet PHYs: - lan8814: cable diagnostics - Ethernet NICs: - Intel (100G): - implement control of FCS/CRC stripping - port splitting via devlink - L2TPv3 filtering offload - nVidia/Mellanox: - tunnel offload for sub-functions - MACSec offload, w/ Extended packet number and replay window offload - significantly restructure, and optimize the AF_XDP support, align the behavior with other vendors - Huawei: - configuring DSCP map for traffic class selection - querying standard FEC statistics - querying SerDes lane number via ethtool - Marvell/Cavium: - egress priority flow control - MACSec offload - AMD/SolarFlare: - PTP over IPv6 and raw Ethernet - small / embedded: - ax88772: convert to phylink (to support SFP cages) - altera: tse: convert to phylink - ftgmac100: support fixed link - enetc: standard Ethtool counters - macb: ZynqMP SGMII dynamic configuration support - tsnep: support multi-queue and use page pool - lan743x: Rx IP & TCP checksum offload - igc: add xdp frags support to ndo_xdp_xmit - Ethernet high-speed switches: - Marvell (prestera): - support SPAN port features (traffic mirroring) - nexthop object offloading - Microchip (sparx5): - multicast forwarding offload - QoS queuing offload (tc-mqprio, tc-tbf, tc-ets) - Ethernet embedded switches: - Marvell (mv88e6xxx): - support RGMII cmode - NXP (felix): - standardized ethtool counters - Microchip (lan966x): - QoS queuing offload (tc-mqprio, tc-tbf, tc-cbs, tc-ets) - traffic policing and mirroring - link aggregation / bonding offload - QUSGMII PHY mode support - Qualcomm 802.11ax WiFi (ath11k): - cold boot calibration support on WCN6750 - support to connect to a non-transmit MBSSID AP profile - enable remain-on-channel support on WCN6750 - Wake-on-WLAN support for WCN6750 - support to provide transmit power from firmware via nl80211 - support to get power save duration for each client - spectral scan support for 160 MHz - MediaTek WiFi (mt76): - WiFi-to-Ethernet bridging offload for MT7986 chips - RealTek WiFi (rtw89): - P2P support" * tag 'net-next-6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1864 commits) eth: pse: add missing static inlines once: rename _SLOW to _SLEEPABLE net: pse-pd: add regulator based PSE driver dt-bindings: net: pse-dt: add bindings for regulator based PoDL PSE controller ethtool: add interface to interact with Ethernet Power Equipment net: mdiobus: search for PSE nodes by parsing PHY nodes. net: mdiobus: fwnode_mdiobus_register_phy() rework error handling net: add framework to support Ethernet PSE and PDs devices dt-bindings: net: phy: add PoDL PSE property net: marvell: prestera: Propagate nh state from hw to kernel net: marvell: prestera: Add neighbour cache accounting net: marvell: prestera: add stub handler neighbour events net: marvell: prestera: Add heplers to interact with fib_notifier_info net: marvell: prestera: Add length macros for prestera_ip_addr net: marvell: prestera: add delayed wq and flush wq on deinit net: marvell: prestera: Add strict cleanup of fib arbiter net: marvell: prestera: Add cleanup of allocated fib_nodes net: marvell: prestera: Add router nexthops ABI eth: octeon: fix build after netif_napi_add() changes net/mlx5: E-Switch, Return EBUSY if can't get mode lock ... |
||
|
0584439345 |
perf subcmd: Set environment variable "PREFIX"
Set environment variable "PREFIX", it will be used by invoked shell script, e.g. the shell script uses it to find lib paths. Signed-off-by: Leo Yan <leo.yan@linaro.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20220925025835.70364-2-leo.yan@linaro.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
da4062021e |
perf tools: Add debug messages and comments for testing
Add debug messages to enable scripts to track aspects of 'perf record' behaviour. The messages will be consumed after 'perf record' has run, with the exception of "perf record has started" which is consequently flushed. Put comments so developers know which messages are also being used by test scripts. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20220912083412.7058-11-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
c7202d20fb |
perf cpumap: Add range data encoding
Often cpumaps encode a range of all CPUs, add a compact encoding that doesn't require a bit mask or list of all CPUs. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.king@intel.com> Cc: Dave Marchevsky <davemarchevsky@fb.com> Cc: German Gomez <german.gomez@arm.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Kees Kook <keescook@chromium.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20220614143353.1559597-7-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
d773c999b8 |
perf events: Prefer union over variable length array
It is possible for casts to introduce alignment issues, prefer a union for perf_record_event_update. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Colin Ian King <colin.king@intel.com> Cc: Dave Marchevsky <davemarchevsky@fb.com> Cc: German Gomez <german.gomez@arm.com> Cc: Gustavo A. R. Silva <gustavoars@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Kees Kook <keescook@chromium.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20220614143353.1559597-6-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> |
||
|
6562c9acb4 |
perf record: Fix way of handling non-perf-event pollfds
perf record __cmd_record() does not poll evlist pollfds. Instead it polls
thread_data[0].pollfd. That happens whether or not threads are being used.
perf record duplicates evlist mmap pollfds as needed for separate threads.
The non-perf-event represented by evlist->ctl_fd has to handled separately,
which is done explicitly, duplicating it into the thread_data[0] pollfds.
That approach neglects any other non-perf-event file descriptors. Currently
there is also done_fd which needs the same handling.
Add a new generalized approach.
Add fdarray_flag__non_perf_event to identify the file descriptors that
need the special handling. For those cases, also keep a mapping of the
evlist pollfd index and thread pollfd index, so that the evlist revents
can be updated.
Although this patch adds the new handling, it does not take it into use.
There is no functional change, but it is the precursor to a fix, so is
marked as a fix.
Fixes:
|
||
|
a08d97a193 |
Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says: ==================== pull-request: bpf-next 2022-10-03 We've added 143 non-merge commits during the last 27 day(s) which contain a total of 151 files changed, 8321 insertions(+), 1402 deletions(-). The main changes are: 1) Add kfuncs for PKCS#7 signature verification from BPF programs, from Roberto Sassu. 2) Add support for struct-based arguments for trampoline based BPF programs, from Yonghong Song. 3) Fix entry IP for kprobe-multi and trampoline probes under IBT enabled, from Jiri Olsa. 4) Batch of improvements to veristat selftest tool in particular to add CSV output, a comparison mode for CSV outputs and filtering, from Andrii Nakryiko. 5) Add preparatory changes needed for the BPF core for upcoming BPF HID support, from Benjamin Tissoires. 6) Support for direct writes to nf_conn's mark field from tc and XDP BPF program types, from Daniel Xu. 7) Initial batch of documentation improvements for BPF insn set spec, from Dave Thaler. 8) Add a new BPF_MAP_TYPE_USER_RINGBUF map which provides single-user-space-producer / single-kernel-consumer semantics for BPF ring buffer, from David Vernet. 9) Follow-up fixes to BPF allocator under RT to always use raw spinlock for the BPF hashtab's bucket lock, from Hou Tao. 10) Allow creating an iterator that loops through only the resources of one task/thread instead of all, from Kui-Feng Lee. 11) Add support for kptrs in the per-CPU arraymap, from Kumar Kartikeya Dwivedi. 12) Add a new kfunc helper for nf to set src/dst NAT IP/port in a newly allocated CT entry which is not yet inserted, from Lorenzo Bianconi. 13) Remove invalid recursion check for struct_ops for TCP congestion control BPF programs, from Martin KaFai Lau. 14) Fix W^X issue with BPF trampoline and BPF dispatcher, from Song Liu. 15) Fix percpu_counter leakage in BPF hashtab allocation error path, from Tetsuo Handa. 16) Various cleanups in BPF selftests to use preferred ASSERT_* macros, from Wang Yufen. 17) Add invocation for cgroup/connect{4,6} BPF programs for ICMP pings, from YiFei Zhu. 18) Lift blinding decision under bpf_jit_harden = 1 to bpf_capable(), from Yauheni Kaliuta. 19) Various libbpf fixes and cleanups including a libbpf NULL pointer deref, from Xin Liu. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (143 commits) net: netfilter: move bpf_ct_set_nat_info kfunc in nf_nat_bpf.c Documentation: bpf: Add implementation notes documentations to table of contents bpf, docs: Delete misformatted table. selftests/xsk: Fix double free bpftool: Fix error message of strerror libbpf: Fix overrun in netlink attribute iteration selftests/bpf: Fix spelling mistake "unpriviledged" -> "unprivileged" samples/bpf: Fix typo in xdp_router_ipv4 sample bpftool: Remove unused struct event_ring_info bpftool: Remove unused struct btf_attach_point bpf, docs: Add TOC and fix formatting. bpf, docs: Add Clang note about BPF_ALU bpf, docs: Move Clang notes to a separate file bpf, docs: Linux byteswap note bpf, docs: Move legacy packet instructions to a separate file selftests/bpf: Check -EBUSY for the recurred bpf_setsockopt(TCP_CONGESTION) bpf: tcp: Stop bpf_setsockopt(TCP_CONGESTION) in init ops to recur itself bpf: Refactor bpf_setsockopt(TCP_CONGESTION) handling into another function bpf: Move the "cdg" tcp-cc check to the common sol_tcp_sockopt() bpf: Add __bpf_prog_{enter,exit}_struct_ops for struct_ops trampoline ... ==================== Link: https://lore.kernel.org/r/20221003194915.11847-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
||
|
51e05a8cf8 |
libbpf: Fix overrun in netlink attribute iteration
I accidentally found that a change in commit
|
||
|
b8a94bfb33 |
kallsyms: increase maximum kernel symbol length to 512
Rust symbols can become quite long due to namespacing introduced by modules, types, traits, generics, etc. For instance, the following code: pub mod my_module { pub struct MyType; pub struct MyGenericType<T>(T); pub trait MyTrait { fn my_method() -> u32; } impl MyTrait for MyGenericType<MyType> { fn my_method() -> u32 { 42 } } } generates a symbol of length 96 when using the upcoming v0 mangling scheme: _RNvXNtCshGpAVYOtgW1_7example9my_moduleINtB2_13MyGenericTypeNtB2_6MyTypeENtB2_7MyTrait9my_method At the moment, Rust symbols may reach up to 300 in length. Setting 512 as the maximum seems like a reasonable choice to keep some headroom. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Co-developed-by: Alex Gaynor <alex.gaynor@gmail.com> Signed-off-by: Alex Gaynor <alex.gaynor@gmail.com> Co-developed-by: Wedson Almeida Filho <wedsonaf@google.com> Signed-off-by: Wedson Almeida Filho <wedsonaf@google.com> Co-developed-by: Gary Guo <gary@garyguo.net> Signed-off-by: Gary Guo <gary@garyguo.net> Co-developed-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Miguel Ojeda <ojeda@kernel.org> |
||
|
87dbdc230d |
libbpf: Don't require full struct enum64 in UAPI headers
Drop the requirement for system-wide kernel UAPI headers to provide full
struct btf_enum64 definition. This is an unexpected requirement that
slipped in libbpf 1.0 and put unnecessary pressure ([0]) on users to have
a bleeding-edge kernel UAPI header from unreleased Linux 6.0.
To achieve this, we forward declare struct btf_enum64. But that's not
enough as there is btf_enum64_value() helper that expects to know the
layout of struct btf_enum64. So we get a bit creative with
reinterpreting memory layout as array of __u32 and accesing lo32/hi32
fields as array elements. Alternative way would be to have a local
pointer variable for anonymous struct with exactly the same layout as
struct btf_enum64, but that gets us into C++ compiler errors complaining
about invalid type casts. So play it safe, if ugly.
[0] Closes: https://github.com/libbpf/libbpf/issues/562
Fixes:
|
||
|
6a4ab8869d |
libbpf: Fix the case of running as non-root with capabilities
When running rootless with special capabilities like: FOWNER / DAC_OVERRIDE / DAC_READ_SEARCH The "access" API will not make the proper check if there is really access to a file or not. >From the access man page: " The check is done using the calling process's real UID and GID, rather than the effective IDs as is done when actually attempting an operation (e.g., open(2)) on the file. Similarly, for the root user, the check uses the set of permitted capabilities rather than the set of effective capabilities; ***and for non-root users, the check uses an empty set of capabilities.*** " What that means is that for non-root user the access API will not do the proper validation if the process really has permission to a file or not. To resolve this this patch replaces all the access API calls with faccessat with AT_EACCESS flag. Signed-off-by: Jon Doron <jond@wiz.io> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220925070431.1313680-1-arilou@gmail.com |
||
|
dbdea9b36f |
libbpf: restore memory layout of bpf_object_open_opts
When attach_prog_fd field was removed in libbpf 1.0 and replaced with
`long: 0` placeholder, it actually shifted all the subsequent fields by
8 byte. This is due to `long: 0` promising to adjust next field's offset
to long-aligned offset. But in this case we were already long-aligned
as pin_root_path is a pointer. So `long: 0` had no effect, and thus
didn't feel the gap created by removed attach_prog_fd.
Non-zero bitfield should have been used instead. I validated using
pahole. Originally kconfig field was at offset 40. With `long: 0` it's
at offset 32, which is wrong. With this change it's back at offset 40.
While technically libbpf 1.0 is allowed to break backwards
compatibility and applications should have been recompiled against
libbpf 1.0 headers, but given how trivial it is to preserve memory
layout, let's fix this.
Reported-by: Grant Seltzer Richman <grantseltzer@gmail.com>
Fixes:
|