Diffstat (limited to 'src/trace_profile')
-rw-r--r-- src/trace_profile/callgrind.md | 36
1 files changed, 32 insertions, 4 deletions
diff --git a/src/trace_profile/callgrind.md b/src/trace_profile/callgrind.md
index 1336a9e..6935ede 100644
--- a/src/trace_profile/callgrind.md
+++ b/src/trace_profile/callgrind.md
@@ -1,13 +1,16 @@
# callgrind
-Callgrind is a tracing profiler to record the function call history of a target
-program. It is part of the [valgrind][callgrind] tool suite.
+Callgrind is a tracing profiler which records the function call history of a
+target program and collects the number of executed instructions. It is part of
+the [valgrind][callgrind] tool suite.
Profiling data is collected by instrumentation rather than sampling of the
target program.
Callgrind does not capture the actual time spent in a function but computes the
-cost of a function based on the instructions fetched (`Ir = Instruction read`).
+*inclusive* & *exclusive* cost of a function based on the instructions fetched
+(`Ir = Instruction read`). This provides reproducibility and high precision and
+is a major difference from sampling profilers like `perf` or `vtune`.
Therefore effects like slow IO are not reflected, which should be kept in mind
when analyzing callgrind results.
@@ -15,7 +18,7 @@ By default the profiler data is dumped when the target process is terminating,
but [callgrind_control] allows for interactive control of callgrind.
```bash
# Run a program under callgrind.
-valgrind --tool=callgrind -- <prog>
+valgrind --tool=callgrind -- <prog> [<args>]
# Interactive control of callgrind.
callgrind_control [opts]
@@ -29,6 +32,10 @@ callgrind_control [opts]
Results can be analyzed by using one of the following tools
- [callgrind_annotate] (cli)
+ ```sh
+ # Show only specific trace events (default is all).
+ callgrind_annotate --show=Ir,Dr,Dw [callgrind_out_file]
+ ```
- [kcachegrind] (ui)
The following is a collection of frequently used callgrind options.
@@ -44,8 +51,29 @@ valgrind --tool=callgrind [opts] -- <prog>
--separate-threads=<yes|no> .... create separate output files per thread,
appends -<thread_id> to the output file
+
+ --cache-sim=<yes|no> ........... control if cache simulation is enabled
```
+## Trace events
+
+By default callgrind collects the following events:
+- `Ir`: Instruction read
+
+Callgrind also provides a functional cache simulation with its own model,
+which is enabled by passing `--cache-sim=yes`.
+This simulates a 2-level cache hierarchy with separate L1 *instruction* and
+*data* caches (`L1i`/`L1d`) and a *unified* last level (`LL`) cache.
+When enabled, this collects the following additional events:
+- `I1mr`: L1 cache miss on instruction read
+- `ILmr`: LL cache miss on instruction read
+- `Dr`: Data read access
+- `D1mr`: L1 cache miss on data read
+- `DLmr`: LL cache miss on data read
+- `Dw`: Data write access
+- `D1mw`: L1 cache miss on data write
+- `DLmw`: LL cache miss on data write
+
## Profile specific part of the target
Programmatically enable/disable instrumentation using the macros defined in
the callgrind header.