From b737cc8ca5bb8ca5e07cd0151d678a7b4b10d5cb Mon Sep 17 00:00:00 2001
From: Johannes Stoelp <johannes.stoelp@gmail.com>
Date: Wed, 1 May 2024 14:57:52 +0200
Subject: cli: add new group for cli foo tools

---
 src/cli/awk.md | 197 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 197 insertions(+)
 create mode 100644 src/cli/awk.md

diff --git a/src/cli/awk.md b/src/cli/awk.md
new file mode 100644
index 0000000..d6f6c9c
--- /dev/null
+++ b/src/cli/awk.md
@@ -0,0 +1,197 @@
+# awk(1)
+
+```markdown
+awk [opt] program [input]
+    -F <sepstr>        field separator string (can be regex)
+    program            awk program
+    input              file or stdin if no file given
+```
+
+## Input processing
+
+Input is processed in two stages:
+1. Splitting input into a sequence of `records`.
+   By default split at `newline` character, but can be changed via the
+   builtin `RS` variable.
+2. Splitting a `record` into `fields`. By default fields are separated by runs
+   of `whitespace`, but this can be changed via the builtin variable `FS` or
+   the command line option `-F`.
+
+Fields are accessed as follows:
+- `$0` whole `record`
+- `$1` field one
+- `$2` field two
+- ...
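+
+As a minimal sketch of the two stages, the following splits input once with
+the defaults and once with custom separators. The sample input and the `;` and
+`,` separators are made up for illustration, and `-v` (assign a variable before
+the program runs) is used to set `RS`.
+```bash
+# Defaults: records end at newline, fields are separated by whitespace.
+out1=$(printf 'a b c\nd e f\n' | awk '{ print $2 }')
+echo "$out1"
+
+# Assumed custom separators: records end at ";", fields are split at ",".
+# The input has no trailing newline so the last field stays clean.
+out2=$(printf 'a,b;c,d' | awk -v RS=';' -F',' '{ print $1 "-" $2 }')
+echo "$out2"
+```
+The first command prints `b` and `e`, the second `a-b` and `c-d`.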
+
+## Program
+
+An `awk` program is composed of pairs of the form:
+```markdown
+pattern { action }
+```
+The program is run against each `record` in the input stream. If a `pattern`
+matches a `record` the corresponding `action` is executed and can access the
+`fields`.
+
+```markdown
+INPUT
+  |
+  v
+record ----> ∀ pattern matched
+  |                   |
+  v                   v
+fields ----> run associated action
+```
+
+Any valid awk `expr` can be a `pattern`.
+
+An example is the regex pattern `/abc/ { print $1 }` which prints the first
+field if the record matches the regex `/abc/`. This form is shorthand for
+`$0 ~ /abc/ { print $1 }`, see the regex comparison operator
+below.
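+
+The following sketch, run on made-up sample input, compares the short and the
+explicit regex form and also uses a plain comparison expression as pattern.
+```bash
+input=$(printf 'abc 1\nxyz 2\nxabcx 3\n')
+
+# Short regex form and the explicit form select the same records.
+short=$(echo "$input" | awk '/abc/ { print $1 }')
+long=$(echo "$input" | awk '$0 ~ /abc/ { print $1 }')
+echo "$short"
+[ "$short" = "$long" ] && echo "same output"
+
+# A comparison expression as pattern: records whose second field is > 1.
+expr=$(echo "$input" | awk '$2 > 1 { print $1 }')
+echo "$expr"
+```
+Both regex forms print `abc` and `xabcx`, the comparison prints `xyz` and
+`xabcx`.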
+
+### Special pattern
+
+awk provides two special patterns, `BEGIN` and `END`, which can be used
+multiple times. Actions with those patterns are **executed exactly once**.
+- `BEGIN` actions are run before processing the first record
+- `END` actions are run after processing the last record
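+
+A small sketch with two `BEGIN` blocks and one `END` block, counting the
+records of a made-up three line input (the variable name `n` is arbitrary):
+```bash
+out=$(printf 'x\ny\nz\n' | awk '
+    BEGIN { print "start" }          # runs once, before the first record
+    BEGIN { n = 0 }                  # second BEGIN block, runs after the first
+    { n++ }                          # no pattern: runs for every record
+    END   { print "records: " n }    # runs once, after the last record
+')
+echo "$out"
+```
+This prints `start` followed by `records: 3`.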
+
+### Special variables
+
+- `RS` _record separator_: first char is the record separator, by default
+  `<newline>`
+- `FS` _field separator_: regex to split records into fields, by default
+  `<space>`
+- `NR` _number record_: number of current record
+- `NF` _number fields_: number of fields in the current record
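+
+For illustration, the following prints `NR` and `NF` for each record of a
+made-up two line input, and uses `NF` to access the last field:
+```bash
+out=$(printf 'a b c\nd e\n' | awk '{ print NR ": " NF " fields, last=" $NF }')
+echo "$out"
+```
+This prints `1: 3 fields, last=c` and `2: 2 fields, last=e`.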
+
+### Special statements & functions
+
+- `printf "fmt", args...`
+
+  Print format string, args are comma separated.
+  - `%s` string
+  - `%d` decimal
+  - `%x` hex
+  - `%f` float
+
+  Width can be specified as `%Ns`, this reserves `N` chars for a string.
+  For floats one can use `%N.Mf`, where `N` is the total width (including the
+  `.` and the fractional digits) and `M` is the number of fractional digits.
+
+- `sprintf("fmt", expr, ...)`
+
+  Format the expressions according to the format string. Similar to `printf`,
+  but this is a function and the return value can be assigned to a variable.
+
+- `strftime("fmt")`
+
+  Return the current time formatted by `fmt` (not part of POSIX awk, but
+  available for example in gawk).
+  - `%Y` full year (e.g. 2020)
+  - `%m` month (01-12)
+  - `%d` day (01-31)
+  - `%F` alias for `%Y-%m-%d`
+  - `%H` hour (00-23)
+  - `%M` minute (00-59)
+  - `%S` second (00-59)
+  - `%T` alias for `%H:%M:%S`
+
+- `S ~ R`, `S !~ R`
+
+  The regex comparison operator, where the former returns true if the string
+  `S` matches the regex `R`, and the latter is the negated form.
+  The regex can be either a
+  [constant](https://www.gnu.org/software/gawk/manual/html_node/Regexp-Usage.html)
+  or [dynamic](
+  https://www.gnu.org/software/gawk/manual/html_node/Computed-Regexps.html)
+  regex.
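+
+A short sketch combining `printf`, `sprintf` and the regex operators on a
+made-up input line (the variable names `s` and `re` are arbitrary):
+```bash
+out=$(echo 'pi 3.14159 255' | awk '{
+    # width 5 string, width 8 float with 3 fractional digits, hex
+    printf "[%5s] [%8.3f] [%x]\n", $1, $2, $3
+    # sprintf returns the formatted string instead of printing it
+    s = sprintf("%s=%d", $1, $2)
+    # dynamic regex: the pattern is taken from a string variable
+    re = "^p"
+    if (s ~ re) print s
+    # constant regex with the negated operator
+    if (s !~ /^x/) print "no x"
+}')
+echo "$out"
+```
+This prints `[   pi] [   3.142] [ff]`, then `pi=3` and `no x`.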
+
+## Examples
+
+### Filter records
+```bash
+awk 'NR%2 == 0 { print $0 }' <file>
+```
+The pattern `NR%2 == 0` matches every second record and the action `{ print $0 }`
+prints the whole record.
+
+### Negative patterns
+```bash
+awk '!/^#/ { print $1 }' <file>
+```
+Matches records not starting with `#` and prints their first field.
+
+### Range patterns
+```bash
+echo -e "a\nFOO\nb\nc\nBAR\nd" | \
+    awk '/FOO/,/BAR/ { print }'
+```
+`/FOO/,/BAR/` defines a range pattern of `begin_pattern, end_pattern`. When
+`begin_pattern` is matched the range is **turned on** and when the
+`end_pattern` is matched the range is **turned off**. This matches every record
+in the range _inclusive_.
+
+An _exclusive_ range must be handled explicitly, for example as follows.
+```bash
+echo -e "a\nFOO\nb\nc\nBAR\nd" | \
+    awk '/FOO/,/BAR/ { if (!($1 ~ "FOO") && !($1 ~ "BAR")) { print } }'
+```
+
+### Access last fields in records
+```bash
+echo 'a b c d e f' | awk '{ print $NF $(NF-1) }'
+```
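+Access the last fields with arithmetic on the `NF` (number of fields) variable.
+`$NF` is the last field and `$(NF-1)` the second to last. Without a comma
+`print` concatenates its arguments, so this prints `fe`.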
+
+### Split on multiple tokens
+```bash
+echo 'a,b;c:d' | awk -F'[,;:]' '{ printf "1=%s | 4=%s\n", $1, $4 }'
+```
+Use regex as field separator.
+
+### Capture in variables
+```bash
+# /proc/<pid>/status
+#   Name:    cat
+#   ...
+#   VmRSS:   516 kB
+#   ...
+
+for f in /proc/*/status; do
+    # Pass the file to awk directly and quote it, no cat needed.
+    awk '
+        /^VmRSS/ { rss = $2/1024 }
+        /^Name/ { name = $2 }
+        END { printf "%16s %6d MB\n", name, rss }' "$f"
+done | sort -k2 -n
+```
+We capture values from `VmRSS` and `Name` into variables and print them at the
+`END` once processing all records is done.
+
+### Capture in array
+```bash
+echo 'a 10
+b 2
+b 4
+a 1' | awk '{
+    vals[$1] += $2
+    cnts[$1] += 1
+}
+END {
+    for (v in vals)
+        printf "%s %d\n", v, vals[v] / cnts[v]
+}'
+```
+Capture keys and values from different columns and sum up the values per key.
+At the `END` we compute the average for each key.
+
+### Run shell command and capture output
+```bash
+awk '/^Pid/ {
+         "ps --no-header -o user " $2 | getline user
+         print user
+     }' /proc/1/status
+```
+We build a `ps` command line, capture the first line of the command's output
+in the `user` variable and then print it.
-- 
cgit v1.2.3