Diffstat (limited to 'src/trace_profile')
-rw-r--r-- | src/trace_profile/callgrind.md | 36 |
1 files changed, 32 insertions, 4 deletions
diff --git a/src/trace_profile/callgrind.md b/src/trace_profile/callgrind.md
index 1336a9e..6935ede 100644
--- a/src/trace_profile/callgrind.md
+++ b/src/trace_profile/callgrind.md
@@ -1,13 +1,16 @@
 # callgrind
 
-Callgrind is a tracing profiler to record the function call history of a target
-program. It is part of the [valgrind][callgrind] tool suite.
+Callgrind is a tracing profiler which records the function call history of a
+target program and collects the number of executed instructions. It is part of
+the [valgrind][callgrind] tool suite.
 
 Profiling data is collected by instrumentation rather than sampling of the
 target program.
 
 Callgrind does not capture the actual time spent in a function but computes the
-cost of a function based on the instructions fetched (`Ir = Instruction read`).
+*inclusive* & *exclusive* cost of a function based on the instructions fetched
+(`Ir = Instruction read`). This provides reproducibility and high-precision and
+is a major difference to sampling profilers like `perf` or `vtune`.
 Therefore effects like slow IO are not reflected, which should be kept in mind
 when analyzing callgrind results.
 
@@ -15,7 +18,7 @@ By default the profiler data is dumped when the target process is terminating,
 but [callgrind_control] allows for interactive control of callgrind.
 ```bash
 # Run a program under callgrind.
-valgrind --tool=callgrind -- <prog>
+valgrind --tool=callgrind -- <prog> [<args>]
 
 # Interactive control of callgrind.
 callgrind_control [opts]
@@ -29,6 +32,10 @@ callgrind_control [opts]
 
 Results can be analyzed by using one of the following tools
 - [callgrind_annotate] (cli)
+  ```sh
+  # Show only specific trace events (default is all).
+  callgrind_annotate --show=Ir,Dr,Dw [callgrind_out_file]
+  ```
 - [kcachegrind] (ui)
 
 The following is a collection of frequently used callgrind options.
@@ -44,8 +51,29 @@ valgrind --tool=callgrind [opts] -- <prog>
 
   --separate-threads=<yes|no> .... create separate output files per thread,
                                    appends -<thread_id> to the output file
+
+  --cache-sim=<yes|no> ........... control if cache simulation is enabled
 ```
 
+## Trace events
+
+By default callgrind collects following events:
+- `Ir`: Instruction read
+
+Callgrind also provides a functional cache simulation with their own model,
+which is enabled by passing `--cache-sim=yes`.
+This simulates a 2-level cache hierarchy with separate L1 *instruction* and
+*data* caches (`L1i`/ `L1d`) and a *unified* last level (`LL`) cache.
+When enabled, this collects the following additional events:
+- `I1mr`: L1 cache miss on instruction read
+- `ILmr`: LL cache miss on instruction read
+- `Dr`: Data reads access
+- `D1mr`: L1 cache miss on data read
+- `DLmr`: LL cache miss on data read
+- `Dw`: Data write access
+- `D1mw`: L1 cache miss on data write
+- `DLmw`: LL cache miss on data write
+
 ## Profile specific part of the target
 Programmatically enable/disable instrumentation using the macros defined in the
 callgrind header.
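
To see how the `--cache-sim` option and the `--show` filter touched by this change fit together, here is a minimal sketch. The output file name `callgrind.out.demo`, the `PROG` variable, and the `/bin/true` default are assumptions made for illustration only; they are not part of the patch. The run is skipped when valgrind is not installed.

```sh
# Sketch only: file name, PROG variable and default target are assumptions.
events="Ir,D1mr,DLmr"        # instruction reads plus L1/LL data read misses
out="callgrind.out.demo"     # fixed name instead of callgrind.out.<pid>
prog="${PROG:-/bin/true}"    # program to profile

if command -v valgrind >/dev/null 2>&1; then
    # Collect Ir plus the cache simulation events.
    valgrind --tool=callgrind --cache-sim=yes \
        --callgrind-out-file="$out" -- "$prog"
    # Report only the selected events.
    callgrind_annotate --show="$events" "$out"
else
    echo "valgrind not found, skipping"
fi
```

Without `--cache-sim=yes` only `Ir` is recorded, so the miss events requested via `--show` would not be present in the output file.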