kmemtrace: Additional documentation.

Documented kmemtrace's ABI, purpose and design. Also includes a short usage
guide, FAQ, as well as a link to the userspace application's Git repository,
which is currently hosted at repo.or.cz.

Signed-off-by: Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
parent b9ce08c010
commit aa46a7e022
71
Documentation/ABI/testing/debugfs-kmemtrace
Normal file
@@ -0,0 +1,71 @@
What:           /sys/kernel/debug/kmemtrace/
Date:           July 2008
Contact:        Eduard - Gabriel Munteanu <eduard.munteanu@linux360.ro>
Description:

In kmemtrace-enabled kernels, the following files are created:

/sys/kernel/debug/kmemtrace/
        cpu<n>          (0400)  Per-CPU tracing data, see below. (binary)
        total_overruns  (0400)  Total number of bytes which were dropped from
                                cpu<n> files because of full buffer condition,
                                non-binary. (text)
        abi_version     (0400)  Kernel's kmemtrace ABI version. (text)

Each per-CPU file should be read according to the relay interface. That is,
the reader should set affinity to that specific CPU and, as currently done by
the userspace application (though there are other methods), use poll() with
an infinite timeout before every read(). Otherwise, erroneous data may be
read. The binary data has the following _core_ format:

        Event ID        (1 byte)        Unsigned integer, one of:
                0 - represents an allocation (KMEMTRACE_EVENT_ALLOC)
                1 - represents a freeing of previously allocated memory
                    (KMEMTRACE_EVENT_FREE)
        Type ID         (1 byte)        Unsigned integer, one of:
                0 - this is a kmalloc() / kfree()
                1 - this is a kmem_cache_alloc() / kmem_cache_free()
                2 - this is a __get_free_pages() et al.
        Event size      (2 bytes)       Unsigned integer representing the
                                        size of this event. Used to extend
                                        kmemtrace. Discard the bytes you
                                        don't know about.
        Sequence number (4 bytes)       Signed integer used to reorder data
                                        logged on SMP machines. Wraparound
                                        must be taken into account, although
                                        it is unlikely.
        Caller address  (8 bytes)       Return address to the caller.
        Pointer to mem  (8 bytes)       Pointer to target memory area. Can be
                                        NULL, but not all such calls might be
                                        recorded.
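
As a rough illustration of the core format above, the sketch below decodes
the 24 bytes it describes with Python's struct module. The names CORE_HEADER,
CoreHeader and parse_core_header are made up for this example. It assumes the
fields are laid out back to back in the order listed, and that the trace is
parsed on a machine with the same endianness as the one that recorded it.

```python
import struct
from typing import NamedTuple

# "=" means native byte order, standard sizes, no padding:
# B = 1 byte, B = 1 byte, H = 2 bytes, i = 4 bytes (signed), Q = 8 bytes, Q = 8 bytes.
CORE_HEADER = struct.Struct("=BBHiQQ")


class CoreHeader(NamedTuple):
    event_id: int    # 0 = KMEMTRACE_EVENT_ALLOC, 1 = KMEMTRACE_EVENT_FREE
    type_id: int     # 0 = kmalloc, 1 = kmem_cache, 2 = __get_free_pages et al.
    event_size: int  # size of this event in bytes
    seq_num: int     # signed, may wrap around
    call_site: int   # return address to the caller
    ptr: int         # pointer to the target memory area, may be 0 (NULL)


def parse_core_header(buf: bytes, offset: int = 0) -> CoreHeader:
    """Decode one core header starting at `offset` in `buf`."""
    return CoreHeader(*CORE_HEADER.unpack_from(buf, offset))
```

A caller would typically look at event_id first and only then decide whether
the allocation-specific fields are expected to follow.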

In case of KMEMTRACE_EVENT_ALLOC events, the next fields follow:

        Requested bytes (8 bytes)       Total number of requested bytes,
                                        unsigned, must not be zero.
        Allocated bytes (8 bytes)       Total number of actually allocated
                                        bytes, unsigned, must not be lower
                                        than requested bytes.
        Requested flags (4 bytes)       GFP flags supplied by the caller.
        Target CPU      (4 bytes)       Signed integer, valid for event id 1.
                                        If equal to -1, target CPU is the same
                                        as origin CPU, but the reverse might
                                        not be true.
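
The allocation-specific fields can be decoded in the same way. The following
is only a sketch: ALLOC_FIELDS, AllocFields and parse_alloc_fields are
hypothetical names, and the default offset of 24 assumes these fields start
immediately after the core fields listed earlier. The two checks mirror the
"must not be zero" and "must not be lower than requested bytes" constraints
stated in the table.

```python
import struct
from typing import NamedTuple

# Q = requested bytes, Q = allocated bytes, I = GFP flags, i = target CPU (signed).
ALLOC_FIELDS = struct.Struct("=QQIi")


class AllocFields(NamedTuple):
    bytes_req: int
    bytes_alloc: int
    gfp_flags: int
    target_cpu: int  # -1 means "same as origin CPU"


def parse_alloc_fields(buf: bytes, offset: int = 24) -> AllocFields:
    """Decode the allocation fields and enforce the documented constraints."""
    fields = AllocFields(*ALLOC_FIELDS.unpack_from(buf, offset))
    if fields.bytes_req == 0:
        raise ValueError("requested bytes must not be zero")
    if fields.bytes_alloc < fields.bytes_req:
        raise ValueError("allocated bytes lower than requested bytes")
    return fields
```

Raising an exception is just one way to surface a violation; a checking tool
could equally well count such events and carry on.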

The data is made available in the same endianness the machine has.
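
When events from several per-CPU files are merged, the signed 32-bit
sequence number has to be compared in a way that survives wraparound. One
common approach, shown here as a sketch and not necessarily what the
userspace tool does, is to compare the difference modulo 2**32. The function
names are invented for this example, and the ordering is only meaningful
while the events being compared are less than 2**31 sequence numbers apart.

```python
from functools import cmp_to_key


def seq_delta(a: int, b: int) -> int:
    """Signed distance from b to a, computed modulo 2**32."""
    d = (a - b) & 0xFFFFFFFF
    return d - 0x100000000 if d >= 0x80000000 else d


def seq_before(a: int, b: int) -> bool:
    """True if sequence number a was issued before b, allowing for wraparound."""
    return seq_delta(a, b) < 0


def sort_by_sequence(seq_nums):
    """Order a window of sequence numbers spanning fewer than 2**31 events."""
    return sorted(seq_nums, key=cmp_to_key(seq_delta))
```

With this comparison, the largest positive value sorts just before the most
negative one, which is the order in which a wrapping counter produces them.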

Other event ids and type ids may be defined and added. Other fields may be
added by increasing event size, but see below for details.
Every modification to the ABI, including new id definitions, is followed
by bumping the ABI version by one.
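
The event size field is what lets an older reader step over data it does not
understand. The sketch below walks a buffer of events and hands back each
event's trailing bytes untouched, so the caller can use the prefix it knows
and discard the rest. It assumes that event size counts the whole event,
core fields included, which the description above suggests but does not
spell out. iter_events is a hypothetical helper name.

```python
import struct

CORE = struct.Struct("=BBHiQQ")  # the 24 bytes of core fields, native byte order


def iter_events(buf: bytes):
    """Yield (event_id, type_id, seq, call_site, ptr, payload) for each event."""
    offset = 0
    while offset + CORE.size <= len(buf):
        event_id, type_id, size, seq, call_site, ptr = CORE.unpack_from(buf, offset)
        # Stop on a truncated or corrupt event instead of guessing.
        if size < CORE.size or offset + size > len(buf):
            break
        payload = buf[offset + CORE.size:offset + size]
        yield event_id, type_id, seq, call_site, ptr, payload
        # Advance by the recorded size, skipping any bytes we do not know about.
        offset += size
```

For an allocation event, a reader that only knows the 24 bytes of
allocation-specific fields would use payload[:24] and ignore whatever
follows.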

Adding new data to the packet (features) is done at the end of the mandatory
data:
        Feature size    (2 bytes)
        Feature ID      (1 byte)
        Feature data    (Feature size - 4 bytes)


Users:
        kmemtrace-user - git://repo.or.cz/kmemtrace-user.git
126
Documentation/vm/kmemtrace.txt
Normal file
@@ -0,0 +1,126 @@
                        kmemtrace - Kernel Memory Tracer

                          by Eduard - Gabriel Munteanu
                             <eduard.munteanu@linux360.ro>

I. Introduction
===============

kmemtrace helps kernel developers figure out two things:
1) how different allocators (SLAB, SLUB etc.) perform
2) how kernel code allocates memory and how much

To do this, we trace every allocation and export information to userspace
through the relay interface. We export things such as the number of requested
bytes, the number of bytes actually allocated (i.e. including internal
fragmentation), whether this is a slab allocation or a plain kmalloc() and so
on.

The actual analysis is performed by a userspace tool (see section III for
details on where to get it from). It logs the data exported by the kernel,
processes it and (as of writing this) can provide the following information:
- the total amount of memory allocated and fragmentation per call-site
- the amount of memory allocated and fragmentation per allocation
- total memory allocated and fragmentation in the collected dataset
- number of cross-CPU allocations and frees (makes sense in NUMA environments)
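
To make the per-call-site figures concrete, here is a small sketch of how
such a summary could be computed from already-decoded allocation events. It
is not the userspace tool's code. summarize_by_call_site is a hypothetical
name, and "fragmentation" is taken here to mean allocated bytes minus
requested bytes, expressed as a share of allocated bytes, which is one
plausible reading of the term.

```python
from collections import defaultdict


def summarize_by_call_site(allocs):
    """allocs: iterable of (call_site, bytes_req, bytes_alloc) tuples."""
    totals = defaultdict(lambda: [0, 0])  # call_site -> [requested, allocated]
    for call_site, bytes_req, bytes_alloc in allocs:
        totals[call_site][0] += bytes_req
        totals[call_site][1] += bytes_alloc

    summary = {}
    for call_site, (req, alloc) in totals.items():
        wasted = alloc - req
        summary[call_site] = {
            "requested": req,
            "allocated": alloc,
            "wasted": wasted,
            # Share of allocated memory lost to internal fragmentation.
            "frag_pct": 100.0 * wasted / alloc if alloc else 0.0,
        }
    return summary
```

Summing the "requested" and "allocated" columns over all call sites gives
the dataset-wide totals mentioned in the list.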

Moreover, it can potentially find inconsistent and erroneous behavior in
kernel code, such as using slab free functions on kmalloc'ed memory or
allocating less memory than requested (but not truly failed allocations).
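
The first kind of inconsistency can be spotted by remembering which type id
each live pointer was allocated with and comparing it on free. The sketch
below shows the idea under a few assumptions: events arrive already ordered,
each is reduced to an (event_id, type_id, ptr) tuple, and frees of pointers
never seen allocated are ignored because the allocation may predate tracing.
find_mismatched_frees is a hypothetical name.

```python
EVENT_ALLOC = 0
EVENT_FREE = 1


def find_mismatched_frees(events):
    """Return (ptr, alloc_type_id, free_type_id) for every mismatched free."""
    live = {}        # ptr -> type id used when it was allocated
    mismatches = []
    for event_id, type_id, ptr in events:
        if ptr == 0:
            continue  # NULL pointers carry no allocation to track
        if event_id == EVENT_ALLOC:
            live[ptr] = type_id
        elif event_id == EVENT_FREE:
            alloc_type = live.pop(ptr, None)
            if alloc_type is not None and alloc_type != type_id:
                mismatches.append((ptr, alloc_type, type_id))
    return mismatches
```

Each reported tuple names the pointer, the allocator family it came from and
the family whose free function was called on it.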

kmemtrace also makes provisions for tracing on one architecture and analysing
the data on another.

II. Design and goals
====================

kmemtrace was designed to handle rather large amounts of data. Thus, it uses
the relay interface to export whatever is logged to userspace, which then
stores it. Analysis and reporting are done asynchronously, that is, after the
data is collected and stored. By design, it allows one to log and analyse
on different machines and different arches.
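
The logging half of this design can be sketched in a few lines. The example
below follows the reading procedure given in
Documentation/ABI/testing/debugfs-kmemtrace: pin the reader to the CPU,
poll() with an infinite timeout, then read(). It is an illustration only and
not the code of the userspace daemon. The function names, the buffer size
and the idea of appending to a file chosen by the caller are all assumptions
made for this example. os.sched_setaffinity() is available on Linux only.

```python
import os
import select


def pin_to_cpu(cpu: int) -> None:
    """Restrict the calling process to a single CPU (Linux only)."""
    os.sched_setaffinity(0, {cpu})


def read_chunk(fd: int, bufsize: int = 65536) -> bytes:
    """Block in poll() until data is readable, then issue one read()."""
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    poller.poll(None)  # None means no timeout: wait indefinitely
    return os.read(fd, bufsize)


def log_cpu(cpu: int, out_path: str,
            debugfs_dir: str = "/sys/kernel/debug/kmemtrace") -> None:
    """Copy one per-CPU relay file to out_path until interrupted."""
    pin_to_cpu(cpu)
    fd = os.open(os.path.join(debugfs_dir, "cpu%d" % cpu), os.O_RDONLY)
    try:
        with open(out_path, "ab") as out:
            while True:
                data = read_chunk(fd)
                if data:
                    out.write(data)
    finally:
        os.close(fd)
```

A logger built this way would run one such loop per CPU, for instance in
separate processes, and leave all interpretation of the bytes to a later
analysis step.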

As of writing this, the ABI is not considered stable, though it might not
change much. However, no guarantees are made about compatibility yet. When
deemed stable, the ABI should still allow easy extension while maintaining
backward compatibility. This is described further in Documentation/ABI.

Summary of design goals:
        - allow logging and analysis to be done across different machines
        - be fast and anticipate usage in high-load environments (*)
        - be reasonably extensible
        - make it possible for GNU/Linux distributions to have kmemtrace
        included in their repositories

(*) - one of the reasons Pekka Enberg's original userspace data analysis
    tool's code was rewritten from Perl to C (although this is more than a
    simple conversion)


III. Quick usage guide
======================

1) Get a kernel that supports kmemtrace and build it accordingly (i.e. enable
CONFIG_KMEMTRACE and CONFIG_DEFAULT_ENABLED).

2) Get the userspace tool and build it:
$ git clone git://repo.or.cz/kmemtrace-user.git         # current repository
$ cd kmemtrace-user/
$ ./autogen.sh
$ ./configure
$ make

3) Boot the kmemtrace-enabled kernel if you haven't, preferably in the
'single' runlevel (so that relay buffers don't fill up easily), and run
kmemtrace:
# '$' does not mean user, but root here.
$ mount -t debugfs none /sys/kernel/debug
$ mount -t proc none /proc
$ cd path/to/kmemtrace-user/
$ ./kmemtraced
Wait a bit, then stop it with CTRL+C.
$ cat /sys/kernel/debug/kmemtrace/total_overruns        # Check if we didn't
                                                        # overrun, should
                                                        # be zero.
$ (Optionally) [Run kmemtrace_check separately on each cpu[0-9]*.out file to
   check its correctness]
$ ./kmemtrace-report

Now you should have a nice and short summary of how the allocator performs.

IV. FAQ and known issues
========================

Q: 'cat /sys/kernel/debug/kmemtrace/total_overruns' is non-zero, how do I fix
this? Should I worry?
A: If it's non-zero, this affects kmemtrace's accuracy, depending on how
large the number is. You can fix it by supplying a higher
'kmemtrace.subbufs=N' kernel parameter.
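
A script that collects traces unattended might want to perform this check
itself. The following sketch assumes the file holds a single decimal number,
which the "(text)" note in the ABI description suggests but does not
guarantee. read_total_overruns is a hypothetical helper name.

```python
def read_total_overruns(
        path: str = "/sys/kernel/debug/kmemtrace/total_overruns") -> int:
    """Return the overrun count, assuming the file holds one decimal number."""
    with open(path) as f:
        return int(f.read().strip())
```

A non-zero result means the collected data is incomplete and the run is
probably worth repeating with more sub-buffers.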
---

Q: kmemtrace_check reports errors, how do I fix this? Should I worry?
A: This is a bug and should be reported. It can occur for a variety of
reasons:
        - possible bugs in relay code
        - possible misuse of relay by kmemtrace
        - timestamps being collected out of order
Or you may fix it yourself and send us a patch.
---

Q: kmemtrace_report shows many errors, how do I fix this? Should I worry?
A: This is a known issue and I'm working on it. These might be true errors
in kernel code, which may have inconsistent behavior (e.g. allocating memory
with kmem_cache_alloc() and freeing it with kfree()). Pekka Enberg pointed
out this behavior may work with SLAB, but may fail with other allocators.

It may also be due to lack of tracing in some unusual allocator functions.

We don't want bug reports regarding this issue yet.
---

V. See also
===========

Documentation/kernel-parameters.txt
Documentation/ABI/testing/debugfs-kmemtrace