callgrind: update notes

author: Johannes Stoelp <johannes.stoelp@gmail.com> 2024-08-31 00:10:02 +0200
committer: Johannes Stoelp <johannes.stoelp@gmail.com> 2024-08-31 00:10:02 +0200
commit: 205fa3c6183d12a274d635810197044a6487ec3d (patch)
tree: 6d302e6b8529735be8915b898e22af28c7e2c230
parent: a77e6329cfd64147dba99621663b999094239ccf (diff)
download: notes-205fa3c6183d12a274d635810197044a6487ec3d.tar.gz
notes-205fa3c6183d12a274d635810197044a6487ec3d.zip
1 files changed, 32 insertions, 4 deletions
diff --git a/src/trace_profile/callgrind.md b/src/trace_profile/callgrind.md
index 1336a9e..6935ede 100644
--- a/src/trace_profile/callgrind.md
+++ b/src/trace_profile/callgrind.md
@@ -1,13 +1,16 @@
 # callgrind
 
-Callgrind is a tracing profiler to record the function call history of a target
-program. It is part of the [valgrind][callgrind] tool suite.
+Callgrind is a tracing profiler which records the function call history of a
+target program and collects the number of executed instructions. It is part of
+the [valgrind][callgrind] tool suite.
 
 Profiling data is collected by instrumentation rather than sampling of the
 target program.
 
 Callgrind does not capture the actual time spent in a function but computes the
-cost of a function based on the instructions fetched (`Ir = Instruction read`).
+*inclusive* and *exclusive* cost of a function based on the instructions fetched
+(`Ir` = instruction read). This makes results reproducible and precise, which
+is a major difference from sampling profilers like `perf` or `vtune`.
 Therefore effects like slow IO are not reflected, which should be kept in mind
 when analyzing callgrind results.
 
@@ -15,7 +18,7 @@ By default the profiler data is dumped when the target process is terminating,
 but [callgrind_control] allows for interactive control of callgrind.
 ```bash
 # Run a program under callgrind.
-valgrind --tool=callgrind -- <prog>
+valgrind --tool=callgrind -- <prog> [<args>]
 
 # Interactive control of callgrind.
 callgrind_control [opts]
@@ -29,6 +32,10 @@ callgrind_control [opts]
 
 Results can be analyzed by using one of the following tools
 - [callgrind_annotate] (cli)
+  ```sh
+  # Show only specific trace events (default is all).
+  callgrind_annotate --show=Ir,Dr,Dw [callgrind_out_file]
+  ```
 - [kcachegrind] (ui)
 
 The following is a collection of frequently used callgrind options.
@@ -44,8 +51,29 @@ valgrind --tool=callgrind [opts] -- <prog>
 
     --separate-threads=<yes|no> .... create separate output files per thread,
                                      appends -<thread_id> to the output file
+
+    --cache-sim=<yes|no> ........... control if cache simulation is enabled
 ```
 
+## Trace events
+
+By default callgrind collects the following event:
+- `Ir`: Instruction read
+
+Callgrind also provides a functional cache simulation with its own model,
+which is enabled by passing `--cache-sim=yes`.
+It simulates a 2-level cache hierarchy with separate L1 *instruction* and
+*data* caches (`L1i`/`L1d`) and a *unified* last-level (`LL`) cache.
+When enabled, the following additional events are collected:
+- `I1mr`: L1 cache miss on instruction read
+- `ILmr`: LL cache miss on instruction read
+- `Dr`: Data read access
+- `D1mr`: L1 cache miss on data read
+- `DLmr`: LL cache miss on data read
+- `Dw`: Data write access
+- `D1mw`: L1 cache miss on data write
+- `DLmw`: LL cache miss on data write
+
 ## Profile specific part of the target
 Programmatically enable/disable instrumentation using the macros defined in
 the callgrind header.