author | Johannes Stoelp <johannes.stoelp@gmail.com> | 2024-05-01 14:57:52 +0200 |
---|---|---|
committer | Johannes Stoelp <johannes.stoelp@gmail.com> | 2024-05-01 14:57:52 +0200 |
commit | b737cc8ca5bb8ca5e07cd0151d678a7b4b10d5cb (patch) | |
tree | 86814d8fb3557ea2cbf73892dd0ec4e590e854de /src/cli/awk.md | |
parent | 50e07a8bca68d2f568df44166fa94383141c2696 (diff) | |
cli: add new group for cli foo tools
Diffstat (limited to 'src/cli/awk.md')
-rw-r--r-- | src/cli/awk.md | 197 |
1 files changed, 197 insertions, 0 deletions
diff --git a/src/cli/awk.md b/src/cli/awk.md
new file mode 100644
index 0000000..d6f6c9c
--- /dev/null
+++ b/src/cli/awk.md
@@ -0,0 +1,197 @@
+# awk(1)
+
+```markdown
+awk [opt] program [input]
+    -F <sepstr>        field separator string (can be regex)
+    program            awk program
+    input              file or stdin if no file given
+```
+
+## Input processing
+
+Input is processed in two stages:
+1. Splitting input into a sequence of `records`.
+   By default input is split at the `newline` character, but this can be
+   changed via the builtin `RS` variable.
+2. Splitting a `record` into `fields`. By default fields are strings without
+   `whitespace`, but this can be changed via the builtin variable `FS` or the
+   command line option `-F`.
+
+Fields are accessed as follows:
+- `$0` whole `record`
+- `$1` field one
+- `$2` field two
+- ...
+
+## Program
+
+An `awk` program is composed of pairs of the form:
+```markdown
+pattern { action }
+```
+The program is run against each `record` in the input stream. If a `pattern`
+matches a `record` the corresponding `action` is executed and can access the
+`fields`.
+
+```markdown
+INPUT
+  |
+  v
+record ----> ∀ pattern matched
+  |                  |
+  v                  v
+fields ----> run associated action
+```
+
+Any valid awk `expr` can be a `pattern`.
+
+An example is the regex pattern `/abc/ { print $1 }`, which prints the first
+field if the record matches the regex `/abc/`. This form is actually shorthand
+for `$0 ~ /abc/ { print $1 }`, see the regex comparison operator
+below.
+
+### Special pattern
+
+awk provides two special patterns, `BEGIN` and `END`, which can be used
+multiple times. Actions with those patterns are **executed exactly once**.
+- `BEGIN` actions are run before processing the first record
+- `END` actions are run after processing the last record
+
+### Special variables
+
+- `RS` _record separator_: first char is the record separator, by default
+  <newline>
+- `FS` _field separator_: regex to split records into fields, by default
+  <space>, which splits on runs of whitespace
+- `NR` _number record_: number of the current record
+- `NF` _number fields_: number of fields in the current record
+
+### Special statements & functions
+
+- `printf "fmt", args...`
+
+  Print format string, args are comma separated.
+  - `%s` string
+  - `%d` decimal
+  - `%x` hex
+  - `%f` float
+
+  Width can be specified as `%Ns`, this reserves `N` chars for a string.
+  For floats one can use `%N.Mf`, where `N` is the total width (including the
+  `.` and the fractional digits) and `M` is the number of fractional digits.
+
+- `sprintf("fmt", expr, ...)`
+
+  Format the expressions according to the format string. Similar to `printf`,
+  but this is a function and the return value can be assigned to a variable.
+
+- `strftime("fmt")`
+
+  Return the current time stamp formatted by `fmt` (not POSIX, available in gawk).
+  - `%Y` full year (e.g. 2020)
+  - `%m` month (01-12)
+  - `%d` day (01-31)
+  - `%F` alias for `%Y-%m-%d`
+  - `%H` hour (00-23)
+  - `%M` minute (00-59)
+  - `%S` second (00-59)
+  - `%T` alias for `%H:%M:%S`
+
+- `S ~ R`, `S !~ R`
+
+  The regex comparison operator, where the former returns true if the string
+  `S` matches the regex `R`, and the latter is the negated form.
+  The regex can be either a
+  [constant](https://www.gnu.org/software/gawk/manual/html_node/Regexp-Usage.html)
+  or [dynamic](
+  https://www.gnu.org/software/gawk/manual/html_node/Computed-Regexps.html)
+  regex.
+
+## Examples
+
+### Filter records
+```bash
+awk 'NR%2 == 0 { print $0 }' <file>
+```
+The pattern `NR%2 == 0` matches every second record and the action `{ print $0 }`
+prints the whole record.
+
+### Negative patterns
+```bash
+awk '!/^#/ { print $1 }' <file>
+```
+Matches records not starting with `#`.
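
The special patterns, `NR`, `NF` and the `printf` widths described above can be combined with a negated pattern in one program. The following is only a minimal sketch: the sample input and the function name `report` are made up for illustration and are not part of the notes.

```shell
# Print the first field and the field count of every non-comment record.
# The four input records are hypothetical sample data.
report() {
  printf '# header\nalpha 1 2\n# note\nbeta 3\n' | awk '
    BEGIN { printf "%6s %2s\n", "first", "NF" }    # runs once, before the first record
    !/^#/ { printf "%6s %2d\n", $1, NF; n++ }      # only records not starting with "#"
    END   { printf "%d of %d records matched\n", n, NR }'
}
report
```

In the `END` action `NR` still holds the number of the last record read, so the final line reports `2 of 4 records matched`.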
+
+### Range patterns
+```bash
+echo -e "a\nFOO\nb\nc\nBAR\nd" | \
+    awk '/FOO/,/BAR/ { print }'
+```
+`/FOO/,/BAR/` defines a range pattern of `begin_pattern, end_pattern`. When
+`begin_pattern` is matched the range is **turned on** and when the
+`end_pattern` is matched the range is **turned off**. This matches every record
+in the range _inclusive_.
+
+An _exclusive_ range must be handled explicitly, for example as follows.
+```bash
+echo -e "a\nFOO\nb\nc\nBAR\nd" | \
+    awk '/FOO/,/BAR/ { if (!($1 ~ "FOO") && !($1 ~ "BAR")) { print } }'
+```
+
+### Access last fields in records
+```bash
+echo 'a b c d e f' | awk '{ print $NF $(NF-1) }'
+```
+Access the last fields with arithmetic on the `NF` (number of fields) variable.
+
+### Split on multiple tokens
+```bash
+echo 'a,b;c:d' | awk -F'[,;:]' '{ printf "1=%s | 4=%s\n", $1, $4 }'
+```
+Use a regex as the field separator.
+
+### Capture in variables
+```bash
+# /proc/<pid>/status
+# Name:    cat
+# ...
+# VmRSS:   516 kB
+# ...
+
+for f in /proc/*/status; do
+    cat $f | awk '
+             /^VmRSS/ { rss = $2/1024 }
+             /^Name/  { name = $2 }
+             END      { printf "%16s %6d MB\n", name, rss }';
+done | sort -k2 -n
+```
+We capture values from `VmRSS` and `Name` into variables and print them at the
+`END` once all records have been processed.
+
+### Capture in array
+```bash
+echo 'a 10
+b 2
+b 4
+a 1' | awk '{
+    vals[$1] += $2
+    cnts[$1] += 1
+}
+END {
+    for (v in vals)
+        printf "%s %d\n", v, vals[v] / cnts [v]
+}'
+```
+Capture keys and values from different columns and sum up the values.
+At the `END` we compute the average of each key.
+
+### Run shell command and capture output
+```bash
+cat /proc/1/status | awk '
+    /^Pid/ {
+        "ps --no-header -o user " $2 | getline user;
+        print user
+    }'
+```
+We build a `ps` command line and capture the first line of the command's output
+in the `user` variable and then print it.
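
When a command is run through `getline` for more than one record, the pipe stays open until it is closed, so a second read from the same command string continues on the old stream instead of starting the command again. The sketch below illustrates this with `close()`. It assumes a made-up three-line input, and `word_lengths` is a hypothetical helper name, not something from the notes.

```shell
# Look up the length of each input word by piping it through `wc -c`.
# The input words are fixed sample data, so building a command from $1 is safe here.
word_lengths() {
  printf 'a\nbb\na\n' | awk '{
    cmd = "printf %s " $1 " | wc -c"
    cmd | getline len     # capture the first line of the command output
    close(cmd)            # close the pipe so the same command string runs again
    print $1, len + 0     # "+ 0" turns the captured string into a plain number
  }'
}
word_lengths
```

Without the `close(cmd)` call, the third record reuses the command string of the first one, `getline` hits end of input on the already consumed pipe, and `len` keeps the stale value from the previous record.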