2021-05-13
I’m regularly asked to write something about the magic of shell scripting, so here goes. While I don’t expect deep understanding from the reader, I assume basic knowledge of how to work with the terminal itself, and that you have seen some scripts (while maybe being too scared to touch them yet).
Shell scripting is different from other scripting or programming in that we don’t have “libraries” we include. Instead, all programs we have installed serve as our huge library of tools we can invoke, chain together, loop over, etc. Thus, “learning shell scripting” consists of a) learning the tools commonly available on your regular UNIX/Linux workstation, and b) learning the language that chains together these tools.
For the language we will use the POSIX Shell subset 1 that is supported by virtually any shell, including Bash and Zsh, but also more modern incarnations of Ksh. This isn’t only a plus due to portability, but also because POSIX Shell is much simpler than the many different ways we can build an if in Bash, or iterate in Zsh. While those are definitely useful in some contexts, most often the multitude of syntaxes only confuses the user.
Probably the most esoteric part of the classic shell is if in combination with the test program. Ksh, Bash, Zsh and so on all set out to “fix” this; however, the added complexity arguably made things worse. And while it is definitely an idiosyncratic design, it’s rather easy to understand, so let’s start:
The if built-in keyword simply executes a program and checks its exit code. If the program exited with code 0, this is considered a true condition. Or, as described more verbosely in the standard under The if Conditional Construct:
The if compound-list shall be executed; if its exit status is zero, the then compound-list shall be executed and the command shall complete. Otherwise, each elif compound-list shall be executed, in turn, and if its exit status is zero, the then compound-list shall be executed and the command shall complete. Otherwise, the else compound-list shall be executed.
In most environments you will have two programs called true and false available at /bin/true and /bin/false, or /usr/bin/true and /usr/bin/false, respectively.
Let’s check what exit code they have! You can either enter sh to get a POSIX interactive shell and type the following directly, or save it as a file, e.g., foo.sh, and run it as sh ./foo.sh:
if /usr/bin/true; then
echo 'exit code 0!'
else
echo 'exit code non-zero!'
fi
It prints “exit code 0!” which makes sense since the executable is called “true”.
More commonly, however, we don’t want to check the exit code of a program, but the value of a variable. We can reduce this problem to checking an exit code if we have a program that takes an expression and exits with the appropriate exit code. Luckily for us, someone already went through the hassle of writing this and called the program test. Let’s give it a ride:
answer=42
if test "$answer" -eq 42; then
echo "The Answer is $answer"
fi
While this works, it looks a bit clumsy, so the shorthands [ and ] were created as an alternative name and delimiter for test:
answer=42
if [ "$answer" -eq 42 ]; then
echo "The Answer is $answer"
fi
Since [ is a program with the arguments "$answer" (shell-expanded to the value of the variable), -eq, 42 and ], you need to separate all of these with spaces. The following does not work!:
answer=42
if ["$answer" -eq 42]; then
echo "The Answer is $answer"
fi
In order to check the truth value of multiple conditions, we call test multiple times, chaining the results:
if test "$answer" -eq 42 && test "$earth" = "exploded"; then
echo "BOOOM"
fi
or, with the prettier []-syntax:
if [ "$answer" -eq 42 ] && [ "$earth" = "exploded" ]; then
echo "BOOOM"
fi
Depending on your previous knowledge, the && may already be known to you. While it acts as a logical and here, its semantics are slightly different: the command on the left-hand side of the && is executed; if it exited successfully (i.e., exit status is zero), the right-hand side is executed as well, with the exit status of the complete expression being that of the latter. But if the first command failed, the second command will not be executed, and failure is signaled with a non-zero exit status.
Analogously, we can produce a logical or using ||, which short-circuits as well, i.e., stops after the first command that exited successfully.
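A minimal sketch of both short-circuits, using the true and false programs from earlier:

```shell
# The right-hand side of && runs only if the left-hand side succeeded:
true && echo 'left succeeded'    # printed
false && echo 'never printed'    # skipped, overall status is non-zero

# With || it is the other way around: the right-hand side runs only on failure:
false || echo 'left failed'      # printed
true || echo 'never printed'     # skipped
```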
For completeness’ sake, there are also Sequential Lists using ;, which simply execute the commands in order, without exiting early, passing on the exit status of the last command in the list. More advanced are Asynchronous Lists (using a single &) which run commands in the background, (possibly) in parallel, but are not appropriate for use in if, since they always exit with 0.
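A small illustrative sketch of both list types (the sleep duration is arbitrary):

```shell
echo 'first'; echo 'second'   # a sequential list: runs strictly in order

sleep 1 &                     # an asynchronous list: sleep runs in the background
echo 'printed while sleep still runs'
wait                          # block until all background jobs have finished
echo 'sleep is done'
```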
We can also write else blocks as well as else-if blocks. However, typing is hard, and shell syntax is even worse, which is why:
if [ "$answer" -eq 42 ]; then
echo "Answer given in decimal"
else if [ "$answer" -eq 101010 ]; then
echo "Answer given in binary"
fi
doesn’t work, so we must type less and use the keyword elif instead:
if [ "$answer" -eq 42 ]; then
echo "Answer given in decimal"
elif [ "$answer" -eq 101010 ]; then
echo "Answer given in binary"
else
:
fi
Further, if we have an empty body, we cannot just leave it empty; the shell expects something. Luckily, the : serves as a no-op.
The while loop works almost identically to the if construct, with the slight adjustment that the specified command (i.e., most commonly test) is called multiple times:
while [ "$answer" -ne 42 ]; do
echo "Wrong answer... increasing"
answer="$((answer + 1))"
done
This uses Arithmetic Expansion via the $((...)) syntax. Within it we do not need $ to refer to variables, and we can now do pretty complex maths directly from the console, neat!
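For example, directly in an interactive sh session:

```shell
x=7
echo "$(( (x + 3) * 4 ))"   # parentheses and the usual operators work: prints 40
echo "$(( x % 2 ))"         # remainder operator: prints 1
```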
Similar to the while loop, we can use the until loop:
until [ "$answer" -eq 42 ]; do
echo "Wrong answer... increasing"
answer="$((answer + 1))"
done
The for loop is probably the most avant-garde construct of the shell, as it is a range-based for-in loop, unlike the three-expression style found in C:
for x in foo bar baz; do
echo "$x"
done
The syntax is quite easy to pick up, and using the program seq we can also iterate over indices:
for i in $(seq 1 42); do
echo "$i"
done
Since the for loop doesn’t expect to run a program as part of its “head” (unlike if), we need to explicitly ask the shell to do Command Substitution using the $(...) construct, which runs the program with the specified arguments and replaces the expression with its output (not its exit code; again, unlike if). Since seq produces a list of numbers from 1 to 42 inclusive when called as above, i will take precisely these values.
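Command substitution is not limited to for-loop heads; we can just as well capture a pipeline’s output in a variable:

```shell
# Capture the output of a whole pipeline in a variable:
last=$(seq 1 42 | tail -n 1)
echo "the last number is $last"   # prints: the last number is 42
```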
However, unlike our first example, the numbers produced aren’t delimited by spaces but by newlines! Indeed, tabs would’ve worked just as well. The shell does something called Field Splitting here, and, by default, fields are split at space, tab or newline. Again, quoting the standard:
any sequence of &lt;space&gt;, &lt;tab&gt;, or &lt;newline&gt; characters at the beginning or end of the input shall be ignored and any sequence of those characters within the input shall delimit a field.
We can actually modify at which positions fields are split, e.g., for parsing semicolon-delimited CSVs using IFS=';', but this is out of scope for this article :-)
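Just to give a tiny taste, here is a minimal sketch (the field names are invented for illustration): setting IFS only for the read built-in splits a semicolon-delimited line into fields:

```shell
# read splits its input at IFS; IFS=';' applies only to this one command:
printf 'alice;30;berlin\n' | while IFS=';' read -r name age city; do
    printf '%s (%s) lives in %s\n' "$name" "$age" "$city"
done
```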
In some cases you want to check the value of one variable against a whole range of patterns. In many languages this can be done using a switch-case or match construct. Since shell is a language for those who don’t like to type much, we only say case and terminate the construct with esac (“case” reversed).
case "$x" in
foo) echo "x is foo" ;;
bar|baz) echo "x is bar or baz" ;;
*) echo "x ain't no hoopy frood" ;;
esac
We can match everything using the glob character *.
You can see that I quoted the variable x in the body of the example for loop:
for x in foo bar baz; do
echo "$x"
done
In this case there’d have been no difference had I omitted the quotes, but it is often considered good style to use them wherever you can.
To demonstrate the difference, let’s replace the first value (foo) of the list we iterate over by the string “The world is ending” which contains spaces. In order to tell the loop that we consider this one item of the list (and not four), we put quotes around it:
for x in "The world is ending" bar baz; do
printf "Found: %s\n" $x
done
I also replaced the echo with a printf to highlight the issue we will now observe. The output is:
Found: The
Found: world
Found: is
Found: ending
Found: bar
Found: baz
But… didn’t we ask the for loop to consider this as just one item? We did, but I also sneakily removed the quotes around $x, leading to the following chain of executed commands:
printf "Found: %s\n" The world is ending
printf "Found: %s\n" bar
printf "Found: %s\n" baz
That is, the printf is executed with five arguments: the format string (first) plus the four additional strings. However, we only used one format specifier %s and thus expected just one string following the format. This is the culprit here, as printf has a rather unexpected behavior when passed more arguments than the format string allows for: it reuses the format string until all arguments are consumed.
The correct command execution would’ve been with quotes:
printf "Found: %s\n" "The world is ending"
printf "Found: %s\n" "bar"
printf "Found: %s\n" "baz"
This can be achieved by quoting the $x.
Indeed, I recommend quoting all variables by default, and only thinking of it as “when must I omit the quotes” instead of the other way around.
However, there are also single quotes, which we haven’t talked about yet. All strings within double quotes are subject to Word Expansions; that is, we could write:
echo "$answer"
to print the value of the variable answer, since the shell expanded it before passing the resulting string to echo. Sometimes we don’t want things like that to happen, and actually want to print, say, a dollar sign:
echo 'Your life is worth $0.02'
Had we used double quotes here, our shell would’ve been very confused. In fact, in many cases, like the printf format strings above, we could (and possibly should) use single quotes to prevent erroneous expansion(s).
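For example, single-quoting the format string keeps the shell from expanding anything inside it:

```shell
price=3
# Single quotes keep the $ in the format string literal for the shell;
# printf then substitutes only the %s with its argument:
printf 'Coffee costs $%s today\n' "$price"   # prints: Coffee costs $3 today
```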
We can also nest quotes, use escape sequences, etc., but this is again out-of-scope for this article.
We’ve now had a brief look at some of the most simple constructs of the POSIX Shell, but it is, by itself, not that powerful. We need the tools of the UNIX workbench in order to do any useful composition using the shell language.
While using echo is simple, unfortunately, for all more advanced usages the exact behavior of echo differs from platform to platform. Thus, if you do anything more than echoing a simple variable or printing simple text, use printf instead.
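A sketch of the portable pattern: keep the format in a (single-quoted) printf format string and pass everything variable as arguments:

```shell
name='world'
printf 'Hello, %s!\n' "$name"            # portable, unlike echo with escape sequences
printf '%d + %d = %d\n' 1 2 "$((1 + 2))" # prints: 1 + 2 = 3
```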
The tool grep has its origins in the line editor ed, from the editor command g/re/p, meaning: “work globally”, “match by the regular expression given as re”, and “print the resulting lines”.
Spun out as its own command-line tool, we can do just that, without learning The Standard Editor (which is the precursor to ex, precursor to vi, precursor to vim, precursor(?) to nvim). Most usage of grep boils down to learning regular expressions, which is out of scope of this article. However, I want to give some notes that many seem not to be aware of:
- Multiple patterns can be searched at once by repeating the -e flag: grep -e pattern1 -e pattern2
- Use -E to switch to extended regular expressions, which require less quoting.
- Invert the match with -v.

Whatever you do with grep, remember though that it works on lines, due to its heritage from ed.
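The flags above in action, on a small made-up fruit list (the fruit function is just a stand-in for a real input file):

```shell
fruit() { printf 'apple\nbanana\ncherry\n'; }   # stand-in for a real input file

fruit | grep -e apple -e cherry   # two patterns: matches apple and cherry
fruit | grep -E 'an+a'            # extended regexp, no escaping needed: banana
fruit | grep -v a                 # inverted match: only cherry has no 'a'
```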
The stream editor sed also shares a heritage with ed, basically being a simple scriptable version of it. Instead of searching for a pattern and printing the results, we can replace occurrences, delete them, list them, print them, etc.
The most common usage is probably replacement, using the syntax s/regexp/replacement/ with an optional trailing g to replace all matches globally.
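For example, with and without the trailing g:

```shell
echo 'foo bottles of foo' | sed 's/foo/bar/'    # first match per line: bar bottles of foo
echo 'foo bottles of foo' | sed 's/foo/bar/g'   # trailing g, all matches: bar bottles of bar
```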
A specialisation of sed/grep is tr. Instead of replacing one occurrence with another string, we can replace character ranges with other ranges. E.g., to capitalize all the letters in a given text:
echo 'The slow red tiger jumps over the energetic cat.' | tr 'a-z' 'A-Z'
Yielding
THE SLOW RED TIGER JUMPS OVER THE ENERGETIC CAT.
AWK supercharges the feature set of grep and sed by allowing us to execute arbitrary code if a certain pattern is matched. That is, the input is iterated over line by line and split into columns, and you can formulate patterns as well as conditions by referring to single columns or the whole line. This is best understood in action, and, since I cannot describe this any better, I copy this verbatim from the excellent book “The AWK Programming Language” 2 by Alfred V. Aho (The Dragon Book on compiler design), Peter J. Weinberger, and Brian W. Kernighan (“The C Programming Language”):
Useful awk programs are often short, just a line or two. Suppose you
have a file called emp.data
that contains the name, pay
rate in dollars per hour, and number of hours worked for your employees,
one employee per line, like this:
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
Now you want to print the name and pay (rate times hours) for everyone who worked more than zero hours. This is the kind of job that awk is meant for, so it’s easy. Just type this command line:
awk '$3 > 0 { print $1, $2 * $3 }' emp.data
You should get this output:
Kathy 40
Mark 100
Mary 121
Susie 76.5
Let’s analyze the program given in the single quotes: the $3 refers to the third column, and thus the pattern matches every line where the employee $1 worked more than 0 hours. In these cases we execute the action given in the {...}, printing the name as well as the pay.
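Actions can also update variables, and the special END pattern runs after the last line; as a sketch, summing the total pay for the same data (fed via a pipe instead of the emp.data file):

```shell
printf 'Beth 4.00 0\nDan 3.75 0\nKathy 4.00 10\nMark 5.00 20\nMary 5.50 22\nSusie 4.25 18\n' |
    awk '$3 > 0 { total += $2 * $3 } END { print "total pay:", total }'
```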
If awk matches patterns against lines in a file, find matches patterns against files in your file system. As with awk, it can execute code when a pattern is matched, for example printing the line count of every file with the extension .c in the current directory, or any subdirectory:
find . -name '*.c' -exec wc -l {} \;
The expression -name '*.c' matches, and the expression -exec wc -l {} ; executes the program wc with the option -l (count lines only), substituting the {} with each matched file. Note that we need to escape the ; since ; is a keyword in the shell language (';' would’ve worked as well, but is one more character to type). This results in, e.g., the following executions:
wc -l foo.c
wc -l src/bar.c
With the output being:
1312 foo.c
161 src/bar.c
This works, but wc is invoked once per matched file, and we get no grand total. Most command-line tools are built “intelligently” though: they change behavior depending on whether they are called with multiple arguments or just one. If we’d run:
wc -l foo.c src/bar.c
We’d get:
1312 foo.c
161 src/bar.c
1473 total
How do we achieve that with find? Well, asking find nicely would be a plus, so we replace the ; with a +, and behold:
find . -name '*.c' -exec wc -l {} +
Since the + is no shell keyword, we don’t need to escape it either, neat!
With this, we can build powerful meta-tools; many of my personal scripts are just wrappers around one powerful find construct. And we don’t need to sin and use the non-POSIX, GNU-grep-specific grep -R option; we can simply use the short:
find . -type f -exec grep pattern {} +
Easy!