Safer Bash Scripting
I was pairing with a colleague today on a moderately sized shell script, and during the session some of the best practices I try to follow when programming, be it in shell or any other language, came up. We took a little time to talk about two of the habits I've picked up, which encouraged me to share them here. The first has two parts: treating warnings as errors and logging what happens. The second concerns clean argument and variable handling in Unix (bash) shell scripts.
Enable All Warnings
This is a universal best practice in all my software development. For gcc you can enable warnings with -Wall and promote them to errors with -Werror. For perl you use -w, or, within the code, the strict pragma. The additional strictness is more work up front but pays off in not having to debug later. gcc even goes so far as to ensure your printf format strings have corresponding arguments of the correct types! As you learn what constitutes an error or warning to the compiler or run-time, you will write cleaner code; the strictness will no longer feel like a burden, and whole classes of errors will cease to happen in your software.
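As a quick illustration of that format-string checking (assuming gcc is available; the file name fmt_demo.c and the broken program are invented for this sketch):

```shell
# Hypothetical demo: a printf whose argument does not match its format string.
cd "$(mktemp -d)"
cat > fmt_demo.c <<'EOF'
#include <stdio.h>
int main(void) {
    printf("%d\n", "not an int");  /* %d expects an int, gets a string */
    return 0;
}
EOF

# -Wall enables the format check; -Werror promotes the warning to a hard
# error, so the compile fails with a non-zero exit status.
if gcc -Wall -Werror -o fmt_demo fmt_demo.c; then
    echo "compiled (unexpected)"
else
    echo "compile failed, as intended"
fi
```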
Bash supports a setting, enabled via set -e, that causes your shell script to exit immediately if any command it runs returns a non-zero exit status to the shell (a zero exit status is the standard indication of success for a program or command). Enabling this feature is similar to exception handling: you get a free 'if this errors, abort the program'.
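In newer scripts I tend to go a bit further: set -e pairs well with two related options, set -u and set -o pipefail. This is a common convention rather than anything specific to the examples below; a minimal sketch of what each one catches:

```shell
# A stricter preamble that complements set -e:
#   set -u           -> expanding an unset variable is a fatal error
#   set -o pipefail  -> a pipeline fails if any stage fails, not just the last

# Demonstrate set -u in a child shell so the failure is contained here:
bash -c 'set -u; echo "$undefined_variable"' 2>/dev/null \
    && echo "ran (unexpected)" \
    || echo "aborted on unset variable"

# Demonstrate pipefail: 'false | cat' succeeds by default because only the
# last command's status counts; with pipefail the failure is surfaced.
bash -c 'set -o pipefail; false | cat' \
    && echo "pipeline ok (unexpected)" \
    || echo "pipeline failure detected"
```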
As a pedagogical example, if you were writing a shell script to create a sub-directory and create a file with the current date and time as its contents, you could easily write:
kyleburton@indigo64 ~/tmp$ rm -rf logs/
kyleburton@indigo64 ~/tmp$ cat dlog.sh
mkdir logs
date >> logs/uptime.log
kyleburton@indigo64 ~/tmp$ bash dlog.sh
kyleburton@indigo64 ~/tmp$ cat logs/uptime.log
Fri Aug 7 22:30:18 EDT 2009
kyleburton@indigo64 ~/tmp$ bash dlog.sh
mkdir: cannot create directory `logs': File exists
kyleburton@indigo64 ~/tmp$ cat logs/uptime.log
Fri Aug 7 22:30:18 EDT 2009
Fri Aug 7 22:30:21 EDT 2009
kyleburton@indigo64 ~/tmp$
This would work every time it was run (permissions problems and insufficient disk space notwithstanding). The second time it is run, though, and every subsequent time, the mkdir program errors, informing you that the logs directory already exists. With set -e in place, the same script fails immediately at that line and does not append the current date and time to the log file:
kyleburton@indigo64 ~/tmp$ cat dlog.sh
set -e
mkdir logs
date >> logs/uptime.log
kyleburton@indigo64 ~/tmp$ bash dlog.sh
kyleburton@indigo64 ~/tmp$ cat logs/uptime.log
Fri Aug 7 22:31:07 EDT 2009
kyleburton@indigo64 ~/tmp$ bash dlog.sh
mkdir: cannot create directory `logs': File exists
kyleburton@indigo64 ~/tmp$ cat logs/uptime.log
Fri Aug 7 22:31:07 EDT 2009
kyleburton@indigo64 ~/tmp$
This protects us against the script continuing and, depending on its behavior and goals, potentially causing damage or corruption. You could wrap each command in an if/else branch, exiting when the command fails, but set -e inverts the default: instead of having to check for errors explicitly, errors abort processing automatically.
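When a non-zero exit is expected and benign, you can say so explicitly rather than giving up on set -e. For the dlog.sh example, mkdir -p succeeds whether or not the directory already exists; alternatively, appending || true to a command tells set -e to tolerate that one failure. A sketch of the idempotent version (run in a scratch directory for the demo):

```shell
# Idempotent variant of dlog.sh: mkdir -p does not error when the
# directory already exists, so set -e no longer aborts the second run.
set -e
cd "$(mktemp -d)"          # scratch directory, just for this demo
mkdir -p logs              # -p: no error if logs/ already exists
date >> logs/uptime.log
mkdir -p logs              # "second run": still succeeds under set -e
date >> logs/uptime.log
```

If you genuinely must ignore a failure, `some_command || true` has the same effect for that single command.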
The second declaration I often use at the top of my scripts is
set -x, which causes bash to echo each command that it executes:
kyleburton@indigo64 ~/tmp$ cat dlog.sh
set -x
set -e
mkdir logs
date >> logs/uptime.log
kyleburton@indigo64 ~/tmp$ bash dlog.sh
+ set -e
+ mkdir logs
+ date
kyleburton@indigo64 ~/tmp$
Combined with the tee command, set -x gives you a log of what your bash script (which is a program, after all) was trying to do. Such logs are often invaluable in determining where and why failures occurred.
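A minimal sketch of that combination (the script and log file names here are just illustrative). One detail worth knowing: set -x writes its trace to stderr, so stderr must be redirected into the pipe for tee to capture it:

```shell
# Run a traced script and capture stdout, stderr, and the '+ command'
# trace lines to a log file via tee.
cd "$(mktemp -d)"
cat > job.sh <<'EOF'
set -x
set -e
echo "doing work"
EOF

# 2>&1 merges the trace (stderr) into stdout before tee sees it.
bash job.sh 2>&1 | tee job.log
```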
Safer Variable Handling
Bash scripts, almost by definition, call other programs, some of them bash scripts themselves. In all but the most trivial scripts you will also use variables. The main gotcha with variables in the shell is word splitting: by default the shell uses whitespace to separate arguments, so if you're not careful about how you handle your variables, their values can be split in unexpected ways. You protect against this by wrapping your variable references in double quotes. Be diligent about this: anywhere you see a dollar sign, it should be wrapped in double quotes.
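A small illustration of what goes wrong without the quotes (the file name is just an example):

```shell
# A variable holding a file name that contains a space.
cd "$(mktemp -d)"
file="monthly report.txt"

touch "$file"            # quoted: one argument, creates 'monthly report.txt'
ls $file >/dev/null 2>&1 \
    && echo "found it (unexpected)" \
    || echo "unquoted: ls was handed two arguments, 'monthly' and 'report.txt'"
ls "$file" >/dev/null && echo "quoted: found it"
```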
In the case of a bash script that passes its arguments on to another program without modification (perhaps you're writing a wrapper script to log timing or other information about the execution), you use $@ to refer to all of the command-line arguments. To pass them on without any additional, accidental word splitting, you just surround it with double quotes ("$@"), as in the following example. Here we have two scripts; the first calls the second, first without the double quotes and then again with them:
kyleburton@indigo64 ~/tmp$ cat first.sh
set -e
bash second.sh not $@
bash second.sh with "$@"
kyleburton@indigo64 ~/tmp$ cat second.sh
set -e
echo 1 = $1
echo 2 = $2
echo 3 = $3
echo 4 = $4
echo 5 = $5
echo 6 = $6
echo 7 = $7
echo ""
kyleburton@indigo64 ~/tmp$ bash first.sh 1 2 "three for five"
1 = not
2 = 1
3 = 2
4 = three
5 = for
6 = five
7 =

1 = with
2 = 1
3 = 2
4 = three for five
5 =
6 =
7 =

kyleburton@indigo64 ~/tmp$
The first invocation of second.sh from first.sh does not use the quotes, which causes bash to re-split the arguments within first.sh, so "three for five" is broken into three distinct parameters before being passed to second.sh. In the second case, using "$@", the argument is passed through intact.
Writing robust bash shell scripts is helped significantly by enabling error checking and by being diligent when using and passing command-line arguments. I hope these tips help you in your scripting.