awk(1)

awk [opt] program [input]
    -F <sepstr>        field separator string (can be regex)
    program            awk program
    input              file or stdin if no file given

Input processing

Input is processed in two stages:

1. Splitting input into a sequence of records. By default records are split at the newline character, but this can be changed via the builtin RS variable.
2. Splitting a record into fields. By default fields are runs of non-whitespace characters, but this can be changed via the builtin variable FS or the command line option -F.

Fields are accessed as follows:

$0 whole record
$1 field one
$2 field two
...
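The two splitting stages can be sketched as follows. The input string and the separators ";" and ":" are made up for illustration; both variables are set in a BEGIN action so that they take effect before the first record is read.

```shell
# Records are split at ";" (RS), fields at ":" (FS).
printf 'a:b;c:d' | awk 'BEGIN { RS = ";"; FS = ":" } { print NR ": " $1 " " $2 }'
# 1: a b
# 2: c d
```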

An example is the regex pattern /abc/ { print $1 }, which prints the first field if the record matches the regex /abc/. This form is shorthand for $0 ~ /abc/ { print $1 }; see the regex comparison operator below.
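As a small sketch with made-up input, both forms select the same records:

```shell
# Short form: the regex is matched against the whole record ($0).
printf 'abc 1\nxyz 2\nxabcx 3\n' | awk '/abc/ { print $1 }'
# abc
# xabcx

# Explicit form with the regex comparison operator.
printf 'abc 1\nxyz 2\nxabcx 3\n' | awk '$0 ~ /abc/ { print $1 }'
# abc
# xabcx
```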

Special pattern

awk provides two special patterns, BEGIN and END, which can be used multiple times. Actions with those patterns are executed exactly once.

BEGIN actions are run before processing the first record
END actions are run after processing the last record
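A minimal sketch with made-up input: two BEGIN actions run in order before the first record, and the END action runs once after the last one.

```shell
printf '1\n2\n3\n' | awk '
    BEGIN { print "first BEGIN" }
    BEGIN { print "second BEGIN" }
          { sum += $1 }
    END   { print "sum = " sum }'
# first BEGIN
# second BEGIN
# sum = 6
```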

Special variables

RS record separator: first char is the record separator, by default <newline>
FS field separator: regex to split records into fields, by default <space> (which splits at runs of whitespace)
NR number record: number of current record
NF number fields: number of fields in the current record
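For illustration (the input is made up), NR and NF can be printed per record as follows:

```shell
# Print record number, number of fields and the last field.
printf 'a b c\nd e\n' | awk '{ print NR, NF, $NF }'
# 1 3 c
# 2 2 e
```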

Special statements & functions

printf "fmt", args...

Print the format string, args are comma separated.

- %s string
- %d decimal
- %x hex
- %f float

Width can be specified as %Ns, which reserves N chars for a string. For floats one can use %N.Mf, where N is the total width, including the . and the M fractional digits.

sprintf("fmt", expr, ...)

Format the expressions according to the format string. Similar to printf, but this is a function whose return value can be assigned to a variable.

strftime("fmt")

Return the current time stamp formatted by fmt (not part of POSIX awk, available in GNU awk).

- %Y full year (eg 2020)
- %m month (01-12)
- %d day (01-31)
- %F alias for %Y-%m-%d
- %H hour (00-23)
- %M minute (00-59)
- %S second (00-59)
- %T alias for %H:%M:%S

S ~ R, S !~ R

The regex comparison operator, where the former returns true if the string S matches the regex R, and the latter is the negated form. The regex can be either a constant or dynamic regex.
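The following sketch exercises printf widths, sprintf and the regex comparison operator. The input strings and the variable name re are made up for illustration; re holds a dynamic regex passed in with -v.

```shell
# %6s pads the string to 6 chars, %8.3f uses 8 chars in total with 3 fractional digits.
echo 'pi 3.14159' | awk '{ printf "[%6s] [%8.3f] [%x]\n", $1, $2, 255 }'
# [    pi] [   3.142] [ff]

# sprintf returns the formatted string instead of printing it.
echo 'pi 3.14159' | awk '{ s = sprintf("%s=%.2f", $1, $2); print s }'
# pi=3.14

# Constant regex (/^f/) versus dynamic regex (string in variable re).
echo 'foo bar' | awk -v re='^f' '{ a = ($1 ~ /^f/); b = ($2 ~ re); c = ($2 !~ re); print a, b, c }'
# 1 0 1
```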

Examples

Filter records

awk 'NR%2 == 0 { print $0 }' <file>

The pattern NR%2 == 0 matches every second record and the action { print $0 } prints the whole record.

Negative patterns

awk '!/^#/ { print $1 }' <file>

Matches records not starting with #.

Range patterns

echo -e "a\nFOO\nb\nc\nBAR\nd" | \
    awk '/FOO/,/BAR/ { print }'

/FOO/,/BAR/ defines a range pattern of the form begin_pattern, end_pattern. When begin_pattern matches, the range is turned on, and when end_pattern matches, the range is turned off. This matches every record in the range, inclusive.

An exclusive range must be handled explicitly, for example as follows.

echo -e "a\nFOO\nb\nc\nBAR\nd" | \
    awk '/FOO/,/BAR/ { if (!($1 ~ "FOO") && !($1 ~ "BAR")) { print } }'

Access last fields in records

echo 'a b c d e f' | awk '{ print $NF $(NF-1) }'

Access last fields with arithmetic on the NF number of fields variable.

Split on multiple tokens

echo 'a,b;c:d' | awk -F'[,;:]' '{ printf "1=%s | 4=%s\n", $1, $4 }'

Use regex as field separator.

Capture in variables

# /proc/<pid>/status
#   Name:    cat
#   ...
#   VmRSS:   516 kB
#   ...

for f in /proc/*/status; do
    cat "$f" | awk '
             /^VmRSS/ { rss = $2/1024 }
             /^Name/ { name = $2 }
             END { printf "%16s %6d MB\n", name, rss }';
done | sort -k2 -n

We capture values from VmRSS and Name into variables and print them at the END once processing all records is done.

Capture in array

echo 'a 10
b 2
b 4
a 1' | awk '{
    vals[$1] += $2
    cnts[$1] += 1
}
END {
    for (v in vals)
        printf "%s %d\n", v, vals[v] / cnts[v]
}'

Capture keys and values from different columns and sum up the values. At the END we compute the average for each key.

Run shell command and capture output

cat /proc/1/status | awk '
                     /^Pid/ {
                        "ps --no-header -o user " $2 | getline user;
                         print user
                     }'

We build a ps command line, capture the first line of the process's output in the user variable and then print it.


sed(1)

sed [opts] [script] [file]
    opts:
        -i                 edit file in place
        -i.bk              edit file in place and create backup file
                           (with .bk suffix, can be specified differently)
        --follow-symlinks  follow symlinks when editing in place
        -e SCRIPT          add SCRIPT to commands to be executed
                           (can be specified multiple times)
        -f FILE            add content of FILE to command to be executed

        --debug            annotate program execution
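The -i.bk and -f options can be tried out as sketched below, assuming GNU sed. The temporary directory and the file names file and script.sed are made up for illustration.

```shell
tmp=$(mktemp -d)

# Edit in place and keep the original content in a backup with .bk suffix.
printf 'foo\n' > "$tmp/file"
sed -i.bk 's/foo/bar/' "$tmp/file"
cat "$tmp/file"
# bar
cat "$tmp/file.bk"
# foo

# Read the commands from a script file, one command per line.
printf 's/foo/FOO/\ns/FOO/BAR/\n' > "$tmp/script.sed"
echo foo | sed -f "$tmp/script.sed"
# BAR

rm -r "$tmp"
```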

Examples

Delete lines

# Delete two lines.
echo -e 'aa\nbb\ncc\ndd' | sed '1d;3d'
# bb
# dd

# Delete last ($) line.
echo -e 'aa\nbb\ncc\ndd' | sed '$d'
# aa
# bb
# cc

# Delete range of lines.
echo -e 'aa\nbb\ncc\ndd' | sed '1,3d'
# dd

# Delete lines matching pattern.
echo -e 'aa\nbb\ncc\ndd' | sed '/bb/d'
# aa
# cc
# dd

# Delete lines NOT matching pattern.
echo -e 'aa\nbb\ncc\ndd' | sed '/bb/!d'
# bb

Insert lines

# Insert before line.
echo -e 'aa\nbb' | sed '2iABC'
# aa
# ABC
# bb

# Insert after line.
echo -e 'aa\nbb' | sed '2aABC'
# aa
# bb
# ABC

# Replace line.
echo -e 'aa\nbb' | sed '2cABC'
# aa
# ABC

# Insert before pattern match.
echo -e 'aa\nbb' | sed '/bb/i 123'
# aa
# 123
# bb

Substitute lines

# Substitute by regex.
echo -e 'aafooaa\ncc' | sed 's/foo/MOOSE/'
# aaMOOSEaa
# cc
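As a supplementary sketch (these variants are not part of the original examples), the g flag substitutes every match in a line instead of only the first one, and -E enables extended regex syntax so that a captured group can be referenced as \1 in the replacement.

```shell
# Only the first match per line is substituted by default.
echo 'foo foo' | sed 's/foo/X/'
# X foo

# With the g flag all matches are substituted.
echo 'foo foo' | sed 's/foo/X/g'
# X X

# Reference the captured group in the replacement.
echo 'aafooaa' | sed -E 's/(fo+)/<\1>/'
# aa<foo>aa
```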

Multiple scripts

echo -e 'foo\nbar' | sed -e 's/foo/FOO/' -e 's/FOO/BAR/'
# BAR
# bar

Edit inplace through symlink

touch file
ln -s file link
ls -l link
# lrwxrwxrwx 1 johannst johannst 4 Feb 7 23:02 link -> file

sed -i --follow-symlinks '1iabc' link
ls -l link
# lrwxrwxrwx 1 johannst johannst 4 Feb 7 23:02 link -> file

# Without --follow-symlinks the symlink itself is replaced by a regular file.
sed -i '1iabc' link
ls -l link
# -rw-r--r-- 1 johannst johannst 0 Feb 7 23:02 link

sort(1)

sort [opts] [file]
    opts:
        -r          reverse output
        -b          ignore leading blanks

        -n          sort by numeric
        -h          sort by human numeric
        -V          sort by version

        -k<N>       sort by key starting at the Nth field
        -t<S>       field separator

Examples

# Sort by directory sizes.
du -sh * | sort -h

# Sort numeric by second key.
# The default key separator is non-blank to blank transition.
echo 'a 4
d 10
c 21' | sort -k2 -n

# Sort numeric by second key, split at comma.
echo 'a,4
d,10
c,21' | sort -k2 -n -t,
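To illustrate why -n matters for the key, here is a short sketch with made-up input. LC_ALL=C is set only to make the plain byte-wise comparison predictable across locales.

```shell
# Without -n the second key is compared as a string, so "10" and "21" sort before "4".
printf 'a 4\nd 10\nc 21\n' | LC_ALL=C sort -k2
# d 10
# c 21
# a 4

# With -n the second key is compared by numeric value.
printf 'a 4\nd 10\nc 21\n' | LC_ALL=C sort -k2 -n
# a 4
# d 10
# c 21

# Version sort compares the numeric components individually.
printf '1.10\n1.9\n1.2\n' | sort -V
# 1.2
# 1.9
# 1.10
```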

Use --debug to annotate the part of each line that is used as sort key, which helps to debug the key usage.

tr(1)

Examples

To lower

echo MoOsE | tr '[:upper:]' '[:lower:]'
# output: moose

Replace characters

echo moose | tr 'o' '-'
# output: m--se

echo moose | tr 'os' '-'
# output: m---e

Remove specific characters

echo moose | tr -d 'o'
# output: mse

echo moose | tr -d 'os'
# output: me

Squeeze character sequences

echo moooooossse | tr -s 'os'
# output: mose
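The individual operations can also be chained in a pipeline. A small sketch with a made-up input string, which squeezes runs of spaces and then converts to lower case:

```shell
echo 'MoOsE   HeRd' | tr -s ' ' | tr '[:upper:]' '[:lower:]'
# output: moose herd
```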
