Diffstat (limited to 'src/tools/awk.md')
-rw-r--r-- | src/tools/awk.md | 197 |
1 files changed, 0 insertions, 197 deletions
diff --git a/src/tools/awk.md b/src/tools/awk.md
deleted file mode 100644
index d6f6c9c..0000000
--- a/src/tools/awk.md
+++ /dev/null
@@ -1,197 +0,0 @@
-# awk(1)
-
-```markdown
-awk [opt] program [input]
-    -F <sepstr>        field separator string (can be regex)
-    program            awk program
-    input              file or stdin if no file given
-```
-
-## Input processing
-
-Input is processed in two stages:
-1. Splitting input into a sequence of `records`.
-   By default split at `newline` character, but can be changed via the
-   builtin `RS` variable.
-2. Splitting a `record` into `fields`. By default strings without `whitespace`,
-   but can be changed via the builtin variable `FS` or command line option
-   `-F`.
-
-Fields are accessed as follows:
-- `$0` whole `record`
-- `$1` field one
-- `$2` field two
-- ...
-
-## Program
-
-An `awk` program is composed of pairs of the form:
-```markdown
-pattern { action }
-```
-The program is run against each `record` in the input stream. If a `pattern`
-matches a `record` the corresponding `action` is executed and can access the
-`fields`.
-
-```markdown
-INPUT
-  |
-  v
-record ----> ∀ pattern matched
-  |                   |
-  v                   v
-fields ----> run associated action
-```
-
-Any valid awk `expr` can be a `pattern`.
-
-An example is the regex pattern `/abc/ { print $1 }` which prints the first
-field if the record matches the regex `/abc/`. This form is actually a short
-version for `$0 ~ /abc/ { print $1 }`, see the regex comparison operator
-below.
-
-### Special pattern
-
-awk provides two special patterns, `BEGIN` and `END`, which can be used
-multiple times. Actions with those patterns are **executed exactly once**.
-- `BEGIN` actions are run before processing the first record
-- `END` actions are run after processing the last record
-
-### Special variables
-
-- `RS` _record separator_: first char is the record separator, by default
-  <newline>
-- `FS` _field separator_: regex to split records into fields, by default
-  <space>
-- `NR` _number record_: number of current record
-- `NF` _number fields_: number of fields in the current record
-
-### Special statements & functions
-
-- `printf "fmt", args...`
-
-  Print format string, args are comma separated.
-  - `%s` string
-  - `%d` decimal
-  - `%x` hex
-  - `%f` float
-
-  Width can be specified as `%Ns`, this reserves `N` chars for a string.
-  For floats one can use `%N.Mf`, `N` is the total number including `.` and
-  `M`.
-
-- `sprintf("fmt", expr, ...)`
-
-  Format the expressions according to the format string. Similar to `printf`,
-  but this is a function and the return value can be assigned to a variable.
-
-- `strftime("fmt")`
-
-  Print time stamp formatted by `fmt`.
-  - `%Y` full year (eg 2020)
-  - `%m` month (01-12)
-  - `%d` day (01-31)
-  - `%F` alias for `%Y-%m-%d`
-  - `%H` hour (00-23)
-  - `%M` minute (00-59)
-  - `%S` second (00-59)
-  - `%T` alias for `%H:%M:%S`
-
-- `S ~ R`, `S !~ R`
-
-  The regex comparison operator, where the former returns true if the string
-  `S` matches the regex `R`, and the latter is the negated form.
-  The regex can be either a
-  [constant](https://www.gnu.org/software/gawk/manual/html_node/Regexp-Usage.html)
-  or [dynamic](
-  https://www.gnu.org/software/gawk/manual/html_node/Computed-Regexps.html)
-  regex.
-
-## Examples
-
-### Filter records
-```bash
-awk 'NR%2 == 0 { print $0 }' <file>
-```
-The pattern `NR%2 == 0` matches every second record and the action `{ print $0 }`
-prints the whole record.
-
-### Negative patterns
-```bash
-awk '!/^#/ { print $1 }' <file>
-```
-Matches records not starting with `#`.
-
-### Range patterns
-```bash
-echo -e "a\nFOO\nb\nc\nBAR\nd" | \
-    awk '/FOO/,/BAR/ { print }'
-```
-`/FOO/,/BAR/` define a range pattern of `begin_pattern, end_pattern`. When
-`begin_pattern` is matched the range is **turned on** and when the
-`end_pattern` is matched the range is **turned off**. This matches every record
-in the range _inclusive_.
-
-An _exclusive_ range must be handled explicitly, for example as follows.
-```bash
-echo -e "a\nFOO\nb\nc\nBAR\nd" | \
-    awk '/FOO/,/BAR/ { if (!($1 ~ "FOO") && !($1 ~ "BAR")) { print } }'
-```
-
-### Access last fields in records
-```bash
-echo 'a b c d e f' | awk '{ print $NF $(NF-1) }'
-```
-Access last fields with arithmetic on the `NF` number of fields variable.
-
-### Split on multiple tokens
-```bash
-echo 'a,b;c:d' | awk -F'[,;:]' '{ printf "1=%s | 4=%s\n", $1, $4 }'
-```
-Use regex as field separator.
-
-### Capture in variables
-```bash
-# /proc/<pid>/status
-# Name:    cat
-# ...
-# VmRSS:   516 kB
-# ...
-
-for f in /proc/*/status; do
-    cat $f | awk '
-             /^VmRSS/ { rss = $2/1024 }
-             /^Name/ { name = $2 }
-             END { printf "%16s %6d MB\n", name, rss }';
-done | sort -k2 -n
-```
-We capture values from `VmRSS` and `Name` into variables and print them at the
-`END` once processing all records is done.
-
-### Capture in array
-```bash
-echo 'a 10
-b 2
-b 4
-a 1' | awk '{
-    vals[$1] += $2
-    cnts[$1] += 1
-}
-END {
-    for (v in vals)
-        printf "%s %d\n", v, vals[v] / cnts [v]
-}'
-```
-Capture keys and values from different columns and sum up the values.
-At the `END` we compute the average of each key.
-
-### Run shell command and capture output
-```bash
-cat /proc/1/status | awk '
-                     /^Pid/ {
-                        "ps --no-header -o user " $2 | getline user;
-                         print user
-                     }'
-```
-We build a `ps` command line and capture the first line of the process's output
-in the `user` variable and then print it.
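
Note on the removed "Run shell command and capture output" section: the `cmd | getline var` idiom it describes can be tried without `/proc` or `ps`. The following is only a minimal sketch under stated assumptions. The helper name `first_line` and the sample command are hypothetical and were not part of the deleted file. The `close(cmd)` call is an addition: it closes the pipe so the same command string could be run again later.

```bash
# Sketch: capture the first output line of a shell command inside awk.
# `first_line` is a hypothetical helper name, not part of the removed file.
first_line() {
    awk -v cmd="$1" 'BEGIN {
        # `cmd | getline line` returns 1 on success, 0 on EOF, -1 on error.
        if ((cmd | getline line) > 0)
            print line
        # Close the pipe so the command string could be run again later.
        close(cmd)
    }'
}

first_line 'echo one; echo two'
```

Because the program consists of a `BEGIN` action only, awk does not wait for input on stdin. Only the first line of the command's output is read, which matches the behaviour described for the `ps` example.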