author | johannst <johannst@users.noreply.github.com> | 2023-08-22 21:38:08 +0000 |
---|---|---|
committer | johannst <johannst@users.noreply.github.com> | 2023-08-22 21:38:08 +0000 |
commit | f0cf514eb3ca30c5170e534c3861ad73996c7726 (patch) | |
tree | d5d1eb653716e4c01174f9bd86bf85f47e8ac15b /development/pgo.html | |
parent | 928b2fa4af916a5f75d1269620914f7bb4225e6e (diff) | |
download | notes-f0cf514eb3ca30c5170e534c3861ad73996c7726.tar.gz notes-f0cf514eb3ca30c5170e534c3861ad73996c7726.zip |
deploy: 9bb639287cae88b32fc1b17b7a4b494340e54434
Diffstat (limited to 'development/pgo.html')
-rw-r--r-- | development/pgo.html | 363 |
1 files changed, 363 insertions, 0 deletions
diff --git a/development/pgo.html b/development/pgo.html new file mode 100644 index 0000000..b928df6 --- /dev/null +++ b/development/pgo.html @@ -0,0 +1,363 @@ +<!DOCTYPE HTML> +<html lang="en" class="sidebar-visible no-js light"> + <head> + <!-- Book generated using mdBook --> + <meta charset="UTF-8"> + <title>pgo - Notes</title> + + + <!-- Custom HTML head --> + + <meta name="description" content=""> + <meta name="viewport" content="width=device-width, initial-scale=1"> + <meta name="theme-color" content="#ffffff" /> + + <link rel="icon" href="../favicon.svg"> + <link rel="shortcut icon" href="../favicon.png"> + <link rel="stylesheet" href="../css/variables.css"> + <link rel="stylesheet" href="../css/general.css"> + <link rel="stylesheet" href="../css/chrome.css"> + <link rel="stylesheet" href="../css/print.css" media="print"> + + <!-- Fonts --> + <link rel="stylesheet" href="../FontAwesome/css/font-awesome.css"> + <link rel="stylesheet" href="../fonts/fonts.css"> + + <!-- Highlight.js Stylesheets --> + <link rel="stylesheet" href="../highlight.css"> + <link rel="stylesheet" href="../tomorrow-night.css"> + <link rel="stylesheet" href="../ayu-highlight.css"> + + <!-- Custom theme stylesheets --> + + </head> + <body> + <div id="body-container"> + <!-- Provide site root to javascript --> + <script> + var path_to_root = "../"; + var default_theme = window.matchMedia("(prefers-color-scheme: dark)").matches ? 
"navy" : "light"; + </script> + + <!-- Work around some values being stored in localStorage wrapped in quotes --> + <script> + try { + var theme = localStorage.getItem('mdbook-theme'); + var sidebar = localStorage.getItem('mdbook-sidebar'); + + if (theme.startsWith('"') && theme.endsWith('"')) { + localStorage.setItem('mdbook-theme', theme.slice(1, theme.length - 1)); + } + + if (sidebar.startsWith('"') && sidebar.endsWith('"')) { + localStorage.setItem('mdbook-sidebar', sidebar.slice(1, sidebar.length - 1)); + } + } catch (e) { } + </script> + + <!-- Set the theme before any content is loaded, prevents flash --> + <script> + var theme; + try { theme = localStorage.getItem('mdbook-theme'); } catch(e) { } + if (theme === null || theme === undefined) { theme = default_theme; } + var html = document.querySelector('html'); + html.classList.remove('no-js') + html.classList.remove('light') + html.classList.add(theme); + html.classList.add('js'); + </script> + + <!-- Hide / unhide sidebar before it is displayed --> + <script> + var html = document.querySelector('html'); + var sidebar = null; + if (document.body.clientWidth >= 1080) { + try { sidebar = localStorage.getItem('mdbook-sidebar'); } catch(e) { } + sidebar = sidebar || 'visible'; + } else { + sidebar = 'hidden'; + } + html.classList.remove('sidebar-visible'); + html.classList.add("sidebar-" + sidebar); + </script> + + <nav id="sidebar" class="sidebar" aria-label="Table of contents"> + <div class="sidebar-scrollbox"> + <ol class="chapter"><li class="chapter-item expanded affix "><a href="../intro.html">Introduction</a></li><li class="chapter-item expanded "><a href="../tools/index.html"><strong aria-hidden="true">1.</strong> Tools</a></li><li><ol class="section"><li class="chapter-item expanded "><a href="../tools/zsh.html"><strong aria-hidden="true">1.1.</strong> zsh</a></li><li class="chapter-item expanded "><a href="../tools/bash.html"><strong aria-hidden="true">1.2.</strong> bash</a></li><li 
class="chapter-item expanded "><a href="../tools/fish.html"><strong aria-hidden="true">1.3.</strong> fish</a></li><li class="chapter-item expanded "><a href="../tools/tmux.html"><strong aria-hidden="true">1.4.</strong> tmux</a></li><li class="chapter-item expanded "><a href="../tools/git.html"><strong aria-hidden="true">1.5.</strong> git</a></li><li class="chapter-item expanded "><a href="../tools/awk.html"><strong aria-hidden="true">1.6.</strong> awk</a></li><li class="chapter-item expanded "><a href="../tools/emacs.html"><strong aria-hidden="true">1.7.</strong> emacs</a></li><li class="chapter-item expanded "><a href="../tools/gpg.html"><strong aria-hidden="true">1.8.</strong> gpg</a></li><li class="chapter-item expanded "><a href="../tools/gdb.html"><strong aria-hidden="true">1.9.</strong> gdb</a></li><li class="chapter-item expanded "><a href="../tools/gdbserver.html"><strong aria-hidden="true">1.10.</strong> gdbserver</a></li><li class="chapter-item expanded "><a href="../tools/radare2.html"><strong aria-hidden="true">1.11.</strong> radare2</a></li><li class="chapter-item expanded "><a href="../tools/qemu.html"><strong aria-hidden="true">1.12.</strong> qemu</a></li><li class="chapter-item expanded "><a href="../tools/pacman.html"><strong aria-hidden="true">1.13.</strong> pacman</a></li><li class="chapter-item expanded "><a href="../tools/dot.html"><strong aria-hidden="true">1.14.</strong> dot</a></li></ol></li><li class="chapter-item expanded "><a href="../monitor/index.html"><strong aria-hidden="true">2.</strong> Resource analysis & monitor</a></li><li><ol class="section"><li class="chapter-item expanded "><a href="../monitor/lsof.html"><strong aria-hidden="true">2.1.</strong> lsof</a></li><li class="chapter-item expanded "><a href="../monitor/ss.html"><strong aria-hidden="true">2.2.</strong> ss</a></li><li class="chapter-item expanded "><a href="../monitor/pidstat.html"><strong aria-hidden="true">2.3.</strong> pidstat</a></li><li class="chapter-item expanded 
"><a href="../monitor/pgrep.html"><strong aria-hidden="true">2.4.</strong> pgrep</a></li><li class="chapter-item expanded "><a href="../monitor/pmap.html"><strong aria-hidden="true">2.5.</strong> pmap</a></li><li class="chapter-item expanded "><a href="../monitor/pstack.html"><strong aria-hidden="true">2.6.</strong> pstack</a></li></ol></li><li class="chapter-item expanded "><a href="../trace_profile/index.html"><strong aria-hidden="true">3.</strong> Trace and Profile</a></li><li><ol class="section"><li class="chapter-item expanded "><a href="../trace_profile/strace.html"><strong aria-hidden="true">3.1.</strong> strace</a></li><li class="chapter-item expanded "><a href="../trace_profile/ltrace.html"><strong aria-hidden="true">3.2.</strong> ltrace</a></li><li class="chapter-item expanded "><a href="../trace_profile/perf.html"><strong aria-hidden="true">3.3.</strong> perf</a></li><li class="chapter-item expanded "><a href="../trace_profile/oprofile.html"><strong aria-hidden="true">3.4.</strong> OProfile</a></li><li class="chapter-item expanded "><a href="../trace_profile/time.html"><strong aria-hidden="true">3.5.</strong> time</a></li></ol></li><li class="chapter-item expanded "><a href="../binary/index.html"><strong aria-hidden="true">4.</strong> Binary</a></li><li><ol class="section"><li class="chapter-item expanded "><a href="../binary/od.html"><strong aria-hidden="true">4.1.</strong> od</a></li><li class="chapter-item expanded "><a href="../binary/xxd.html"><strong aria-hidden="true">4.2.</strong> xxd</a></li><li class="chapter-item expanded "><a href="../binary/readelf.html"><strong aria-hidden="true">4.3.</strong> readelf</a></li><li class="chapter-item expanded "><a href="../binary/objdump.html"><strong aria-hidden="true">4.4.</strong> objdump</a></li><li class="chapter-item expanded "><a href="../binary/nm.html"><strong aria-hidden="true">4.5.</strong> nm</a></li></ol></li><li class="chapter-item expanded "><a href="../development/index.html"><strong 
aria-hidden="true">5.</strong> Development</a></li><li><ol class="section"><li class="chapter-item expanded "><a href="../development/c++filt.html"><strong aria-hidden="true">5.1.</strong> c++filt</a></li><li class="chapter-item expanded "><a href="../development/c++.html"><strong aria-hidden="true">5.2.</strong> c++</a></li><li class="chapter-item expanded "><a href="../development/glibc.html"><strong aria-hidden="true">5.3.</strong> glibc</a></li><li class="chapter-item expanded "><a href="../development/gcc.html"><strong aria-hidden="true">5.4.</strong> gcc</a></li><li class="chapter-item expanded "><a href="../development/make.html"><strong aria-hidden="true">5.5.</strong> make</a></li><li class="chapter-item expanded "><a href="../development/ld.so.html"><strong aria-hidden="true">5.6.</strong> ld.so</a></li><li class="chapter-item expanded "><a href="../development/symbolver.html"><strong aria-hidden="true">5.7.</strong> symbol versioning</a></li><li class="chapter-item expanded "><a href="../development/python.html"><strong aria-hidden="true">5.8.</strong> python</a></li><li class="chapter-item expanded "><a href="../development/gcov.html"><strong aria-hidden="true">5.9.</strong> gcov</a></li><li class="chapter-item expanded "><a href="../development/pgo.html" class="active"><strong aria-hidden="true">5.10.</strong> pgo</a></li></ol></li><li class="chapter-item expanded "><a href="../linux/index.html"><strong aria-hidden="true">6.</strong> Linux</a></li><li><ol class="section"><li class="chapter-item expanded "><a href="../linux/systemd.html"><strong aria-hidden="true">6.1.</strong> systemd</a></li><li class="chapter-item expanded "><a href="../linux/coredump.html"><strong aria-hidden="true">6.2.</strong> coredump</a></li><li class="chapter-item expanded "><a href="../linux/ptrace_scope.html"><strong aria-hidden="true">6.3.</strong> ptrace_scope</a></li><li class="chapter-item expanded "><a href="../linux/cryptsetup.html"><strong 
aria-hidden="true">6.4.</strong> cryptsetup</a></li><li class="chapter-item expanded "><a href="../linux/swap.html"><strong aria-hidden="true">6.5.</strong> swap</a></li><li class="chapter-item expanded "><a href="../linux/input.html"><strong aria-hidden="true">6.6.</strong> input</a></li><li class="chapter-item expanded "><a href="../linux/acl.html"><strong aria-hidden="true">6.7.</strong> acl</a></li><li class="chapter-item expanded "><a href="../linux/zfs.html"><strong aria-hidden="true">6.8.</strong> zfs</a></li></ol></li><li class="chapter-item expanded "><a href="../network/index.html"><strong aria-hidden="true">7.</strong> Network</a></li><li><ol class="section"><li class="chapter-item expanded "><a href="../network/tcpdump.html"><strong aria-hidden="true">7.1.</strong> tcpdump</a></li><li class="chapter-item expanded "><a href="../network/firewall-cmd.html"><strong aria-hidden="true">7.2.</strong> firewall-cmd</a></li><li class="chapter-item expanded "><a href="../network/nftables.html"><strong aria-hidden="true">7.3.</strong> nftables</a></li></ol></li><li class="chapter-item expanded "><a href="../web/index.html"><strong aria-hidden="true">8.</strong> Web</a></li><li><ol class="section"><li class="chapter-item expanded "><a href="../web/html.html"><strong aria-hidden="true">8.1.</strong> html</a></li><li class="chapter-item expanded "><a href="../web/chartjs.html"><strong aria-hidden="true">8.2.</strong> chartjs</a></li></ol></li><li class="chapter-item expanded "><a href="../arch/index.html"><strong aria-hidden="true">9.</strong> Arch</a></li><li><ol class="section"><li class="chapter-item expanded "><a href="../arch/x86_64.html"><strong aria-hidden="true">9.1.</strong> x86_64</a></li><li class="chapter-item expanded "><a href="../arch/arm64.html"><strong aria-hidden="true">9.2.</strong> arm64</a></li><li class="chapter-item expanded "><a href="../arch/armv7.html"><strong aria-hidden="true">9.3.</strong> armv7</a></li><li class="chapter-item expanded 
"><a href="../arch/riscv.html"><strong aria-hidden="true">9.4.</strong> riscv</a></li></ol></li></ol> + </div> + <div id="sidebar-resize-handle" class="sidebar-resize-handle"></div> + </nav> + + <!-- Track and set sidebar scroll position --> + <script> + var sidebarScrollbox = document.querySelector('#sidebar .sidebar-scrollbox'); + sidebarScrollbox.addEventListener('click', function(e) { + if (e.target.tagName === 'A') { + sessionStorage.setItem('sidebar-scroll', sidebarScrollbox.scrollTop); + } + }, { passive: true }); + var sidebarScrollTop = sessionStorage.getItem('sidebar-scroll'); + sessionStorage.removeItem('sidebar-scroll'); + if (sidebarScrollTop) { + // preserve sidebar scroll position when navigating via links within sidebar + sidebarScrollbox.scrollTop = sidebarScrollTop; + } else { + // scroll sidebar to current active section when navigating via "next/previous chapter" buttons + var activeSection = document.querySelector('#sidebar .active'); + if (activeSection) { + activeSection.scrollIntoView({ block: 'center' }); + } + } + </script> + + <div id="page-wrapper" class="page-wrapper"> + + <div class="page"> + <div id="menu-bar-hover-placeholder"></div> + <div id="menu-bar" class="menu-bar sticky"> + <div class="left-buttons"> + <button id="sidebar-toggle" class="icon-button" type="button" title="Toggle Table of Contents" aria-label="Toggle Table of Contents" aria-controls="sidebar"> + <i class="fa fa-bars"></i> + </button> + <button id="theme-toggle" class="icon-button" type="button" title="Change theme" aria-label="Change theme" aria-haspopup="true" aria-expanded="false" aria-controls="theme-list"> + <i class="fa fa-paint-brush"></i> + </button> + <ul id="theme-list" class="theme-popup" aria-label="Themes" role="menu"> + <li role="none"><button role="menuitem" class="theme" id="light">Light</button></li> + <li role="none"><button role="menuitem" class="theme" id="rust">Rust</button></li> + <li role="none"><button role="menuitem" class="theme" 
id="coal">Coal</button></li> + <li role="none"><button role="menuitem" class="theme" id="navy">Navy</button></li> + <li role="none"><button role="menuitem" class="theme" id="ayu">Ayu</button></li> + </ul> + <button id="search-toggle" class="icon-button" type="button" title="Search. (Shortkey: s)" aria-label="Toggle Searchbar" aria-expanded="false" aria-keyshortcuts="S" aria-controls="searchbar"> + <i class="fa fa-search"></i> + </button> + </div> + + <h1 class="menu-title">Notes</h1> + + <div class="right-buttons"> + <a href="../print.html" title="Print this book" aria-label="Print this book"> + <i id="print-button" class="fa fa-print"></i> + </a> + <a href="https://github.com/johannst/notes" title="Git repository" aria-label="Git repository"> + <i id="git-repository-button" class="fa fa-github"></i> + </a> + + </div> + </div> + + <div id="search-wrapper" class="hidden"> + <form id="searchbar-outer" class="searchbar-outer"> + <input type="search" id="searchbar" name="searchbar" placeholder="Search this book ..." aria-controls="searchresults-outer" aria-describedby="searchresults-header"> + </form> + <div id="searchresults-outer" class="searchresults-outer hidden"> + <div id="searchresults-header" class="searchresults-header"></div> + <ul id="searchresults"> + </ul> + </div> + </div> + + <!-- Apply ARIA attributes after the sidebar and the sidebar toggle button are added to the DOM --> + <script> + document.getElementById('sidebar-toggle').setAttribute('aria-expanded', sidebar === 'visible'); + document.getElementById('sidebar').setAttribute('aria-hidden', sidebar !== 'visible'); + Array.from(document.querySelectorAll('#sidebar a')).forEach(function(link) { + link.setAttribute('tabIndex', sidebar === 'visible' ? 
0 : -1); + }); + </script> + + <div id="content" class="content"> + <main> + <h1 id="profile-guided-optimization-pgo"><a class="header" href="#profile-guided-optimization-pgo">Profile guided optimization (pgo)</a></h1> +<p><code>pgo</code> is a technique that optimizes a program for its usual +workload.</p> +<p>It is applied in two phases:</p> +<ol> +<li>Collect profiling data (best with representative benchmarks).</li> +<li>Optimize the program based on the collected profiling data.</li> +</ol> +<p>The following simple program is used as a demonstrator.</p> +<pre><code class="language-c">#include <stdio.h> + +#define NOINLINE __attribute__((noinline)) + +NOINLINE void foo() { puts("foo()"); } +NOINLINE void bar() { puts("bar()"); } + +int main(int argc, char *argv[]) { + if (argc == 2) { + foo(); + } else { + bar(); + } +} +</code></pre> +<h2 id="clang"><a class="header" href="#clang">clang</a></h2> +<p>On the current machine with <code>clang 15.0.7</code>, the following code is generated for +the <code>main()</code> function.</p> +<pre><code class="language-x86asm"># clang -o test test.cc -O3 + +0000000000001160 <main>: + 1160: 50 push rax + ; Jump if argc != 2. + 1161: 83 ff 02 cmp edi,0x2 + 1164: 75 09 jne 116f <main+0xf> + ; foo() is on the hot path (fall-through). + 1166: e8 d5 ff ff ff call 1140 <_Z3foov> + 116b: 31 c0 xor eax,eax + 116d: 59 pop rcx + 116e: c3 ret + ; bar() is on the cold path (branch). 
+ 116f: e8 dc ff ff ff call 1150 <_Z3barv> + 1174: 31 c0 xor eax,eax + 1176: 59 pop rcx + 1177: c3 ret +</code></pre> +<p>The following shows how to compile with profiling instrumentation and how to +optimize the final program with the collected profiling data (<a href="https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization">llvm +pgo</a>).</p> +<p>The arguments to <code>./test</code> are chosen such that <code>10/11</code> runs call <code>bar()</code>, which +is currently on the <code>cold path</code>.</p> +<pre><code class="language-bash"># Compile test program with profiling instrumentation. +clang -o test test.cc -O3 -fprofile-instr-generate + +# Collect profiling data from multiple runs. +for i in {0..10}; do + LLVM_PROFILE_FILE="prof.clang/%p.profraw" ./test $(seq 0 $i) +done + +# Merge raw profiling data into a single profile. +llvm-profdata merge -o pgo.profdata prof.clang/*.profraw + +# Optimize test program with profiling data. +clang -o test test.cc -O3 -fprofile-use=pgo.profdata +</code></pre> +<blockquote> +<p>NOTE: If <code>LLVM_PROFILE_FILE</code> is not given, the profile data is written to +<code>default.profraw</code>, which is overwritten on each run. If <code>LLVM_PROFILE_FILE</code> +contains a <code>%m</code> in the filename, a unique integer is generated and +consecutive runs update the same generated profraw file. Alternatively, +<code>LLVM_PROFILE_FILE</code> can specify a new file for every run; however, that +generally requires more storage.</p> +</blockquote> +<p>After optimizing the program with the profiling data, the <code>main()</code> function +looks as follows.</p> +<pre><code class="language-x86asm">0000000000001060 <main>: + 1060: 50 push rax + ; Jump if argc == 2. + 1061: 83 ff 02 cmp edi,0x2 + 1064: 74 09 je 106f <main+0xf> + ; bar() is on the hot path (fall-through). + 1066: e8 e5 ff ff ff call 1050 <_Z3barv> + 106b: 31 c0 xor eax,eax + 106d: 59 pop rcx + 106e: c3 ret + ; foo() is on the cold path (branch). 
+ 106f: e8 cc ff ff ff call 1040 <_Z3foov> + 1074: 31 c0 xor eax,eax + 1076: 59 pop rcx + 1077: c3 ret +</code></pre> +<h2 id="gcc"><a class="header" href="#gcc">gcc</a></h2> +<p>With <code>gcc 13.2.1</code> on the current machine, the optimizer puts <code>bar()</code> on the +<code>hot path</code> by default.</p> +<pre><code class="language-x86asm">0000000000001040 <main>: + 1040: 48 83 ec 08 sub rsp,0x8 + ; Jump if argc == 2. + 1044: 83 ff 02 cmp edi,0x2 + 1047: 74 0c je 1055 <main+0x15> + ; bar() is on the hot path (fall-through). + 1049: e8 22 01 00 00 call 1170 <_Z3barv> + 104e: 31 c0 xor eax,eax + 1050: 48 83 c4 08 add rsp,0x8 + 1054: c3 ret + ; foo() is on the cold path (branch). + 1055: e8 06 01 00 00 call 1160 <_Z3foov> + 105a: eb f2 jmp 104e <main+0xe> + 105c: 0f 1f 40 00 nop DWORD PTR [rax+0x0] + +</code></pre> +<p>The following shows how to compile with profiling instrumentation and how to +optimize the final program with the collected profiling data.</p> +<p>The arguments to <code>./test</code> are chosen such that <code>2/3</code> runs call <code>foo()</code>, which +is currently on the <code>cold path</code>.</p> +<pre><code class="language-bash">gcc -o test test.cc -O3 -fprofile-generate +./test 1 +./test 1 +./test 2 2 +gcc -o test test.cc -O3 -fprofile-use +</code></pre> +<blockquote> +<p>NOTE: Consecutive runs update the generated <code>test.gcda</code> profile data file +rather than overwriting it.</p> +</blockquote> +<p>After optimizing the program with the profiling data, the <code>main()</code> function looks as follows.</p> +<pre><code class="language-x86asm">0000000000001040 <main.cold>: + ; bar() is on the cold path (branch). + 1040: e8 05 00 00 00 call 104a <_Z3barv> + 1045: e9 25 00 00 00 jmp 106f <main+0xf> + +0000000000001060 <main>: + 1060: 51 push rcx + ; Jump if argc != 2. + 1061: 83 ff 02 cmp edi,0x2 + 1064: 0f 85 d6 ff ff ff jne 1040 <main.cold> + ; foo() is on the hot path (fall-through). 
+ 106a: e8 11 01 00 00 call 1180 <_Z3foov> + 106f: 31 c0 xor eax,eax + 1071: 5a pop rdx + 1072: c3 ret +</code></pre> + + </main> + + <nav class="nav-wrapper" aria-label="Page navigation"> + <!-- Mobile navigation buttons --> + <a rel="prev" href="../development/gcov.html" class="mobile-nav-chapters previous" title="Previous chapter" aria-label="Previous chapter" aria-keyshortcuts="Left"> + <i class="fa fa-angle-left"></i> + </a> + + <a rel="next" href="../linux/index.html" class="mobile-nav-chapters next" title="Next chapter" aria-label="Next chapter" aria-keyshortcuts="Right"> + <i class="fa fa-angle-right"></i> + </a> + + <div style="clear: both"></div> + </nav> + </div> + </div> + + <nav class="nav-wide-wrapper" aria-label="Page navigation"> + <a rel="prev" href="../development/gcov.html" class="nav-chapters previous" title="Previous chapter" aria-label="Previous chapter" aria-keyshortcuts="Left"> + <i class="fa fa-angle-left"></i> + </a> + + <a rel="next" href="../linux/index.html" class="nav-chapters next" title="Next chapter" aria-label="Next chapter" aria-keyshortcuts="Right"> + <i class="fa fa-angle-right"></i> + </a> + </nav> + + </div> + + + + + <script> + window.playground_copyable = true; + </script> + + + <script src="../elasticlunr.min.js"></script> + <script src="../mark.min.js"></script> + <script src="../searcher.js"></script> + + <script src="../clipboard.min.js"></script> + <script src="../highlight.js"></script> + <script src="../book.js"></script> + + <!-- Custom JS scripts --> + + + </div> + </body> +</html> |