# awk(1)

```markdown
awk [opt] program [input]
    -F          field separator string (can be regex)
    program     awk program
    input       file or stdin if no file given
```

## Input processing

Input is processed in two stages:
1. Splitting the input into a sequence of `records`.
   By default records are split at the `newline` character, but this can be
   changed via the builtin `RS` variable.
2. Splitting a `record` into `fields`.
   By default fields are strings without `whitespace`, but this can be changed
   via the builtin variable `FS` or the command line option `-F`.

Fields are accessed as follows:
- `$0` whole `record`
- `$1` field one
- `$2` field two
- ...

## Program

An `awk` program is composed of pairs of the form:
```markdown
pattern { action }
```
The program is run against each `record` in the input stream. If a `pattern`
matches a `record`, the corresponding `action` is executed and can access the
`fields`.

```markdown
INPUT
  |
  v
record ----> ∀ pattern matched
  |                   |
  v                   v
fields ----> run associated action
```

Any valid awk `expr` can be a `pattern`.

An example is the regex pattern `/abc/ { print $1 }`, which prints the first
field if the record matches the regex `/abc/`. This form is shorthand for
`$0 ~ /abc/ { print $1 }`, see the regex comparison operator below.

### Special pattern

awk provides two special patterns, `BEGIN` and `END`, which can be used
multiple times. Actions with those patterns are **executed exactly once**.
- `BEGIN` actions are run before processing the first record
- `END` actions are run after processing the last record

### Special variables

- `RS` _record separator_: first char is the record separator, by default
  a newline
- `FS` _field separator_: regex to split records into fields, by default
  a single space, which splits on runs of whitespace
- `NR` _number record_: number of the current record
- `NF` _number fields_: number of fields in the current record

### Special statements & functions

- `printf "fmt", args...`

    Print the format string; the arguments are comma separated.
    - `%s` string
    - `%d` decimal
    - `%x` hex
    - `%f` float

    The width can be specified as `%Ns`, which reserves `N` chars for a string.
    For floats one can use `%N.Mf`, where `N` is the total width (including the
    `.` and the `M` fractional digits) and `M` is the number of digits after
    the decimal point.

- `sprintf("fmt", expr, ...)`

    Format the expressions according to the format string. Similar to `printf`,
    but this is a function whose return value can be assigned to a variable.

- `strftime("fmt")`

    Return the current time formatted by `fmt` as a string (available in gawk,
    not part of POSIX awk).
    - `%Y` full year (eg 2020)
    - `%m` month (01-12)
    - `%d` day (01-31)
    - `%F` alias for `%Y-%m-%d`
    - `%H` hour (00-23)
    - `%M` minute (00-59)
    - `%S` second (00-59)
    - `%T` alias for `%H:%M:%S`

- `S ~ R`, `S !~ R`

    The regex comparison operators. The former returns true if the string `S`
    matches the regex `R`, the latter is the negated form. The regex can be
    either a
    [constant](https://www.gnu.org/software/gawk/manual/html_node/Regexp-Usage.html)
    or a
    [dynamic](https://www.gnu.org/software/gawk/manual/html_node/Computed-Regexps.html)
    regex.

## Examples

### Filter records
```bash
awk 'NR%2 == 0 { print $0 }'
```
The pattern `NR%2 == 0` matches every second record and the action
`{ print $0 }` prints the whole record.

### Negative patterns
```bash
awk '!/^#/ { print $1 }'
```
Matches records not starting with `#`.

### Range patterns
```bash
echo -e "a\nFOO\nb\nc\nBAR\nd" | \
    awk '/FOO/,/BAR/ { print }'
```
`/FOO/,/BAR/` defines a range pattern of the form `begin_pattern, end_pattern`.
When `begin_pattern` matches, the range is **turned on**, and when
`end_pattern` matches, the range is **turned off**. This matches every record
in the range _inclusive_.

An _exclusive_ range must be handled explicitly, for example as follows.
```bash
echo -e "a\nFOO\nb\nc\nBAR\nd" | \
    awk '/FOO/,/BAR/ { if (!($1 ~ "FOO") && !($1 ~ "BAR")) { print } }'
```

### Access last fields in records
```bash
echo 'a b c d e f' | awk '{ print $NF $(NF-1) }'
```
Access the last fields with arithmetic on the `NF` (number of fields) variable.
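Note that `print $NF $(NF-1)` concatenates the two fields without a separator, whereas a comma between the arguments inserts the output field separator (`OFS`, a space by default). The following is a minimal sketch of the comma form plus a reverse loop over all fields driven by `NF`; the shell variable `out` is only an illustrative name used to capture the result.

```bash
# Capture the awk output in a shell variable (name chosen for illustration).
out=$(echo 'a b c d e f' | awk '{
    # Comma separated arguments are joined with OFS (a space by default).
    print $(NF-1), $NF

    # Walk the fields from last to first using NF as the upper bound.
    for (i = NF; i >= 1; i--)
        printf "%s%s", $i, (i > 1 ? " " : "\n")
}')
echo "$out"
```

The first output line holds the last two fields in their original order, the second line holds all fields reversed.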
### Split on multiple tokens
```bash
echo 'a,b;c:d' | awk -F'[,;:]' '{ printf "1=%s | 4=%s\n", $1, $4 }'
```
Use a regex as field separator.

### Capture in variables
```bash
# /proc//status
#   Name:    cat
#   ...
#   VmRSS:   516 kB
#   ...

for f in /proc/*/status; do
    cat "$f" | awk '
               /^VmRSS/ { rss = $2/1024 }
               /^Name/ { name = $2 }
               END { printf "%16s %6d MB\n", name, rss }';
done | sort -k2 -n
```
We capture the values from `VmRSS` and `Name` into variables and print them in
the `END` action, once all records have been processed.

### Capture in array
```bash
echo 'a 10
b 2
b 4
a 1' | awk '{
    vals[$1] += $2
    cnts[$1] += 1
}
END {
    for (v in vals)
        printf "%s %d\n", v, vals[v] / cnts[v]
}'
```
Capture keys and values from different columns and sum up the values per key.
In the `END` action we compute the average for each key.

### Run shell command and capture output
```bash
cat /proc/1/status | awk '
    /^Pid/ {
        "ps --no-header -o user " $2 | getline user;
        print user
    }'
```
We build a `ps` command line, capture the first line of the process's output in
the `user` variable, and then print it.
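A single `cmd | getline var` reads only one line of the command's output. As a minimal sketch, assuming a command that prints several lines, `getline` can be called in a loop until it stops returning a positive value, followed by `close(cmd)` to release the pipe. The command string and the names `cmd`, `lines` and `out` are made up for illustration.

```bash
# Capture the awk output in a shell variable (name chosen for illustration).
out=$(echo 'x y z' | awk '{
    # Build a shell command that prints the three fields one per line,
    # sorted in reverse order: printf "%s\n" x y z | sort -r
    cmd = "printf \"%s\\n\" " $1 " " $2 " " $3 " | sort -r"

    # getline returns 1 while a line was read, 0 at end of output
    # and -1 on error, so loop only while the result is positive.
    n = 0
    while ((cmd | getline line) > 0)
        lines[++n] = line

    # Close the pipe so the same command string could be run again.
    close(cmd)

    for (i = 1; i <= n; i++)
        print i, lines[i]
}')
echo "$out"
```

The parentheses around `cmd | getline line` keep the comparison with `0` from being parsed as part of the `getline` expression.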