aboutsummaryrefslogtreecommitdiffhomepage
path: root/src/cli/awk.md
diff options
context:
space:
mode:
authorJohannes Stoelp <johannes.stoelp@gmail.com>2024-05-01 14:57:52 +0200
committerJohannes Stoelp <johannes.stoelp@gmail.com>2024-05-01 14:57:52 +0200
commitb737cc8ca5bb8ca5e07cd0151d678a7b4b10d5cb (patch)
tree86814d8fb3557ea2cbf73892dd0ec4e590e854de /src/cli/awk.md
parent50e07a8bca68d2f568df44166fa94383141c2696 (diff)
downloadnotes-b737cc8ca5bb8ca5e07cd0151d678a7b4b10d5cb.tar.gz
notes-b737cc8ca5bb8ca5e07cd0151d678a7b4b10d5cb.zip
cli: add new group for cli foo tools
Diffstat (limited to 'src/cli/awk.md')
-rw-r--r--src/cli/awk.md197
1 files changed, 197 insertions, 0 deletions
diff --git a/src/cli/awk.md b/src/cli/awk.md
new file mode 100644
index 0000000..d6f6c9c
--- /dev/null
+++ b/src/cli/awk.md
@@ -0,0 +1,197 @@
+# awk(1)
+
+```markdown
+awk [opt] program [input]
+ -F <sepstr> field separator string (can be regex)
+ program awk program
+ input file or stdin if not file given
+```
+
+## Input processing
+
+Input is processed in two stages:
+1. Splitting input into a sequence of `records`.
+ By default split at `newline` character, but can be changed via the
+ builtin `RS` variable.
+2. Splitting a `record` into `fields`. By default strings without `whitespace`,
+ but can be changed via the builtin variable `FS` or command line option
+ `-F`.
+
+Fields are accessed as follows:
+- `$0` whole `record`
+- `$1` field one
+- `$2` field two
+- ...
+
+## Program
+
+An `awk` program is composed of pairs of the form:
+```markdown
+pattern { action }
+```
+The program is run against each `record` in the input stream. If a `pattern`
+matches a `record` the corresponding `action` is executed and can access the
+`fields`.
+
+```markdown
+INPUT
+ |
+ v
+record ----> ∀ pattern matched
+ | |
+ v v
+fields ----> run associated action
+```
+
+Any valid awk `expr` can be a `pattern`.
+
+An example is the regex pattern `/abc/ { print $1 }` which prints the first
+field if the record matches the regex `/abc/`. This form is actually a short
+version for `$0 ~ /abc/ { print $1 }`, see the regex comparison operator
+below.
+
+### Special pattern
+
+awk provides two special patterns, `BEGIN` and `END`, which can be used
+multiple times. Actions with those patterns are **executed exactly once**.
+- `BEGIN` actions are run before processing the first record
+- `END` actions are run after processing the last record
+
+### Special variables
+
+- `RS` _record separator_: first char is the record separator, by default
+ <newline>
+- `FS` _field separator_: regex to split records into fields, by default
+ <space>
+- `NR` _number record_: number of current record
+- `NF` _number fields_: number of fields in the current record
+
+### Special statements & functions
+
+- `printf "fmt", args...`
+
+ Print format string, args are comma separated.
+ - `%s` string
+ - `%d` decimal
+ - `%x` hex
+ - `%f` float
+
+ Width can be specified as `%Ns`, this reserves `N` chars for a string.
+ For floats one can use `%N.Mf`, `N` is the total number including `.` and
+ `M`.
+
+- `sprintf("fmt", expr, ...)`
+
+ Format the expressions according to the format string. Similar as `printf`,
+ but this is a function and return value can be assigned to a variable.
+
+- `strftime("fmt")`
+
+ Print time stamp formatted by `fmt`.
+ - `%Y` full year (eg 2020)
+ - `%m` month (01-12)
+ - `%d` day (01-31)
+ - `%F` alias for `%Y-%m-%d`
+ - `%H` hour (00-23)
+ - `%M` minute (00-59)
+ - `%S` second (00-59)
+ - `%T` alias for `%H:%M:%S`
+
+- `S ~ R`, `S !~ R`
+
+ The regex comparison operator, where the former returns true if the string
+ `S` matches the regex `R`, and the latter is the negated form.
+ The regex can be either a
+ [constant](https://www.gnu.org/software/gawk/manual/html_node/Regexp-Usage.html)
+ or [dynamic](
+ https://www.gnu.org/software/gawk/manual/html_node/Computed-Regexps.html)
+ regex.
+
+## Examples
+
+### Filter records
+```bash
+awk 'NR%2 == 0 { print $0 }' <file>
+```
+The pattern `NR%2 == 0` matches every second record and the action `{ print $0 }`
+prints the whole record.
+
+### Negative patterns
+```bash
+awk '!/^#/ { print $1 }' <file>
+```
+Matches records not starting with `#`.
+
+### Range patterns
+```bash
+echo -e "a\nFOO\nb\nc\nBAR\nd" | \
+ awk '/FOO/,/BAR/ { print }'
+```
+`/FOO/,/BAR/` define a range pattern of `begin_pattern, end_pattern`. When
+`begin_pattern` is matched the range is **turned on** and when the
+`end_pattern` is matched the range is **turned off**. This matches every record
+in the range _inclusive_.
+
+An _exclusive_ range must be handled explicitly, for example as follows.
+```bash
+echo -e "a\nFOO\nb\nc\nBAR\nd" | \
+ awk '/FOO/,/BAR/ { if (!($1 ~ "FOO") && !($1 ~ "BAR")) { print } }'
+```
+
+### Access last fields in records
+```bash
+echo 'a b c d e f' | awk '{ print $NF $(NF-1) }'
+```
+Access last fields with arithmetic on the `NF` number of fields variable.
+
+### Split on multiple tokens
+```bash
+echo 'a,b;c:d' | awk -F'[,;:]' '{ printf "1=%s | 4=%s\n", $1, $4 }'
+```
+Use regex as field separator.
+
+### Capture in variables
+```bash
+# /proc/<pid>/status
+# Name: cat
+# ...
+# VmRSS: 516 kB
+# ...
+
+for f in /proc/*/status; do
+ cat $f | awk '
+ /^VmRSS/ { rss = $2/1024 }
+ /^Name/ { name = $2 }
+ END { printf "%16s %6d MB\n", name, rss }';
+done | sort -k2 -n
+```
+We capture values from `VmRSS` and `Name` into variables and print them at the
+`END` once processing all records is done.
+
+### Capture in array
+```bash
+echo 'a 10
+b 2
+b 4
+a 1' | awk '{
+ vals[$1] += $2
+ cnts[$1] += 1
+}
+END {
+ for (v in vals)
+ printf "%s %d\n", v, vals[v] / cnts [v]
+}'
+```
+Capture keys and values from different columns and some up the values.
+At the `END` we compute the average of each key.
+
+### Run shell command and capture output
+```bash
+cat /proc/1/status | awk '
+ /^Pid/ {
+ "ps --no-header -o user " $2 | getline user;
+ print user
+ }'
+```
+We build a `ps` command line and capture the first line of the processes output
+in the `user` variable and then print it.