Diffstat (limited to 'src/tools/awk.md')
-rw-r--r-- src/tools/awk.md 197
1 files changed, 0 insertions, 197 deletions
diff --git a/src/tools/awk.md b/src/tools/awk.md
deleted file mode 100644
index d6f6c9c..0000000
--- a/src/tools/awk.md
+++ /dev/null
@@ -1,197 +0,0 @@
-# awk(1)
-
-```markdown
-awk [opt] program [input]
-  -F <sepstr>        field separator string (can be regex)
-  program            awk program
-  input              file or stdin if no file is given
-```
-
-## Input processing
-
-Input is processed in two stages:
-1. Splitting input into a sequence of `records`.
-   By default records are split at the `newline` character, but this can be
-   changed via the builtin `RS` variable.
-2. Splitting a `record` into `fields`. By default fields are separated by runs
-   of `whitespace`, but this can be changed via the builtin variable `FS` or
-   the command line option `-F`.
-
-Fields are accessed as follows:
-- `$0` whole `record`
-- `$1` field one
-- `$2` field two
-- ...
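The two stages can be sketched with a small, made-up input. This is only an illustration: `NR` (record number) and `NF` (number of fields) are the builtin variables described under "Special variables".

```bash
# Two records (split at newline), each split into fields at whitespace.
printf 'a b c\nd e\n' | awk '{ print NR, NF, $2 }'
# prints:
# 1 3 b
# 2 2 e
```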
-
-## Program
-
-An `awk` program is composed of pairs of the form:
-```markdown
-pattern { action }
-```
-The program is run against each `record` in the input stream. If a `pattern`
-matches a `record` the corresponding `action` is executed and can access the
-`fields`.
-
-```markdown
-INPUT
-  |
-  v
-record ----> ∀ pattern matched
-  |                   |
-  v                   v
-fields ----> run associated action
-```
-
-Any valid awk `expr` can be a `pattern`.
-
-An example is the regex pattern `/abc/ { print $1 }`, which prints the first
-field if the record matches the regex `/abc/`. This form is shorthand for
-`$0 ~ /abc/ { print $1 }`, see the regex comparison operator below.
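A minimal sketch of patterns as expressions, using a hypothetical two-column input (the `input` variable and its contents are assumptions for illustration only):

```bash
# Hypothetical sample input: two columns, a name and a number.
input='abc 1
xyz 2
abc 3'

# The regex pattern and its explicit form select the same records.
echo "$input" | awk '/abc/ { print $2 }'        # prints 1 and 3
echo "$input" | awk '$0 ~ /abc/ { print $2 }'   # prints 1 and 3

# A comparison expression used as pattern.
echo "$input" | awk '$2 > 1 { print $1 }'       # prints xyz and abc
```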
-
-### Special pattern
-
-awk provides two special patterns, `BEGIN` and `END`, which can be used
-multiple times. Actions with those patterns are **executed exactly once**.
-- `BEGIN` actions are run before processing the first record
-- `END` actions are run after processing the last record
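As a small sketch (the input is made up), `BEGIN` and `END` can frame a per-record action that counts records:

```bash
printf 'x\ny\nz\n' | awk '
    BEGIN { print "start" }          # runs once, before the first record
          { n++ }                    # no pattern: runs for every record
    END   { print "records:", n }    # runs once, after the last record
'
# prints:
# start
# records: 3
```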
-
-### Special variables
-
-- `RS` _record separator_: first char is the record separator, by default
- <newline>
-- `FS` _field separator_: regex to split records into fields, by default
-  <space>, which is special-cased to split at runs of whitespace
-- `NR` _number record_: number of current record
-- `NF` _number fields_: number of fields in the current record
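The separators can also be set from within the program, typically in a `BEGIN` action. A minimal sketch with an invented input that uses `;` between records and `:` between fields:

```bash
printf 'a:1;b:2;c:3' | awk 'BEGIN { RS = ";"; FS = ":" } { print NR, $1, $2 }'
# prints:
# 1 a 1
# 2 b 2
# 3 c 3
```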
-
-### Special statements & functions
-
-- `printf "fmt", args...`
-
- Print format string, args are comma separated.
- - `%s` string
- - `%d` decimal
- - `%x` hex
- - `%f` float
-
-  Width can be specified as `%Ns`, which reserves at least `N` chars for a
-  string. For floats one can use `%N.Mf`, where `N` is the minimum total width
-  (including the `.` and the decimals) and `M` is the number of decimals.
-
-- `sprintf("fmt", expr, ...)`
-
-  Format the expressions according to the format string. Similar to `printf`,
-  but this is a function whose return value can be assigned to a variable.
-
-- `strftime("fmt")`
-
-  Return the current time formatted by `fmt` (a gawk extension, not part of
-  POSIX awk).
- - `%Y` full year (eg 2020)
- - `%m` month (01-12)
- - `%d` day (01-31)
- - `%F` alias for `%Y-%m-%d`
- - `%H` hour (00-23)
- - `%M` minute (00-59)
- - `%S` second (00-59)
- - `%T` alias for `%H:%M:%S`
-
-- `S ~ R`, `S !~ R`
-
- The regex comparison operator, where the former returns true if the string
- `S` matches the regex `R`, and the latter is the negated form.
- The regex can be either a
- [constant](https://www.gnu.org/software/gawk/manual/html_node/Regexp-Usage.html)
- or [dynamic](
- https://www.gnu.org/software/gawk/manual/html_node/Computed-Regexps.html)
- regex.
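The statements above can be combined in one short sketch. The input line and the variable names `s` and `re` are assumptions for illustration; `strftime` is left out because its output depends on the current time.

```bash
echo 'pi 3.14159' | awk '{
    printf "[%5s] [%8.3f]\n", $1, $2   # pad to 5 chars, 8 wide with 3 decimals
    s = sprintf("%x", 255)             # returns the string instead of printing
    re = "^p"                          # dynamic regex stored in a string
    if ($1 ~ re) print s
}'
# prints:
# [   pi] [   3.142]
# ff
```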
-
-## Examples
-
-### Filter records
-```bash
-awk 'NR%2 == 0 { print $0 }' <file>
-```
-The pattern `NR%2 == 0` matches every second record and the action `{ print $0 }`
-prints the whole record.
-
-### Negative patterns
-```bash
-awk '!/^#/ { print $1 }' <file>
-```
-Matches records not starting with `#` and prints their first field.
-
-### Range patterns
-```bash
-echo -e "a\nFOO\nb\nc\nBAR\nd" | \
- awk '/FOO/,/BAR/ { print }'
-```
-`/FOO/,/BAR/` defines a range pattern of `begin_pattern, end_pattern`. When
-`begin_pattern` is matched the range is **turned on** and when the
-`end_pattern` is matched the range is **turned off**. This matches every record
-in the range _inclusive_.
-
-An _exclusive_ range must be handled explicitly, for example as follows.
-```bash
-echo -e "a\nFOO\nb\nc\nBAR\nd" | \
-    awk '/FOO/,/BAR/ { if ($0 !~ /FOO/ && $0 !~ /BAR/) print }'
-```
-
-### Access last fields in records
-```bash
-echo 'a b c d e f' | awk '{ print $NF $(NF-1) }'
-```
-Access the last fields with arithmetic on the `NF` (number of fields) variable.
-The example prints `fe`, since expressions without a separating `,` are
-concatenated.
-
-### Split on multiple tokens
-```bash
-echo 'a,b;c:d' | awk -F'[,;:]' '{ printf "1=%s | 4=%s\n", $1, $4 }'
-```
-Use regex as field separator.
-
-### Capture in variables
-```bash
-# /proc/<pid>/status
-# Name: cat
-# ...
-# VmRSS: 516 kB
-# ...
-
-for f in /proc/*/status; do
-    awk '
-        /^VmRSS/ { rss = $2/1024 }
-        /^Name/  { name = $2 }
-        END      { printf "%16s %6d MB\n", name, rss }' "$f"
-done | sort -k2 -n
-```
-We capture values from `VmRSS` and `Name` into variables and print them at the
-`END` once processing all records is done.
-
-### Capture in array
-```bash
-echo 'a 10
-b 2
-b 4
-a 1' | awk '{
- vals[$1] += $2
- cnts[$1] += 1
-}
-END {
- for (v in vals)
-        printf "%s %d\n", v, vals[v] / cnts[v]
-}'
-```
-Capture keys and values from different columns and sum up the values.
-At the `END` we compute the average of each key (truncated to an integer by
-`%d`).
-
-### Run shell command and capture output
-```bash
-awk '
-    /^Pid/ {
-        "ps --no-header -o user " $2 | getline user;
-        print user
-    }' /proc/1/status
-```
-We build a `ps` command line, capture the first line of the command's output
-in the `user` variable and then print it.