Categories
bash

Oppa Semicolon Style

[No relation to that infuriatingly viral video.]

A recent Reddit thread about ffmpeg encoding somehow sidetracked into an unrelated discussion about semicolons. In the process of answering the questions therein, I suddenly realized that most bash newbies get confused about semicolons in scripts, and that there’s in fact a striking similarity between compound statements in bash and C.

First, a simple fact: Semicolons punctuate compound statements, and punctuation promotes proper parsing. Just look at the descriptions of compound statements in the bash man page:

case word in [ [(] pattern [ | pattern ] ... ) list ;; ] ... esac

if list; then list; [ elif list; then list; ] ... [ else list; ] fi

for name [ [ in [ word ... ] ] ; ] do list ; done

and think about how any parser could be expected to figure out that:

for i in 1 2 3 do echo do you do this $i time\? done

actually means:

for i in 1 2 3; do echo do you do this $i time\?; done

as opposed to the programmer having a complete brain-fart.

Now, you’re probably going: “Hang on, I don’t see anyone writing code all in a line like that!” That leads me to my next point: Semicolons can generally be replaced by newlines in compound statements. Hence, you’re far more likely to see this:

for i in 1 2 3; do
  echo $i
  echo breaker
done

or this:

for i in 1 2 3
do
  echo $i
  echo breaker
done

Notice that the first form simply substitutes a newline for the semicolon before done, and the second form also substitutes a newline for the semicolon before do. Both are functionally identical to each other and to the single-line version (note the newline-for-semicolon substitution between the two echo statements as well):

for i in 1 2 3; do echo $i; echo breaker; done

as well as the odd duck:

for i in 1 2 3
do
          echo $i
  echo breaker; done

Don’t get crazy, though. The following is actually illegal in bash:

for i in 1
  2
     3
do
  echo $i; echo
    breaker; done

because it translates to the incorrect one-liner:

for i in 1; 2; 3; do echo $i; echo; breaker; done

“Wait a sec!” I hear you call, dear reader. “How about the double-semicolon in case clauses?”

Technically, bash treats this double-semicolon as a single token, so you can’t replace it with two newlines (or one, for that matter). You also can’t write:

case $i in
(1) echo yes ; ;
esac

That space between the two semicolons will cause bash to choke.
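For reference, the accepted form keeps the `;;` as one unbroken token:

```shell
#!/bin/bash
i=1
case $i in
  (1) echo yes ;;   # ';;' must be a single token; a newline before it is fine
  (*) echo no ;;
esac
```

(The `;;` may also sit on its own line, which is a common multi-line style.)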


Now, if you’re a C programmer, you probably recognized the “shape” of the first two examples as being the equivalent of the indentation styles¹ popularly known as K&R:

while (1) {
  do_something();
}

and Allman:

while (1)
{
  do_something();
}

I’ll therefore call the equivalents in bash K&R-style and Allman-style. Which one to use is purely a matter of preference; pick one and stick with it.

As for the all-in-one-liner, I’ll just call it No-style. You should avoid this style wherever possible, as it gets seriously unreadable very quickly. You’ll also regret it when you get hit by a dozen syntax errors on line 1; good luck trying to find them all.

And if you were working for me, and showed me that odd-duck code chunk…let’s just say you’d no longer be working for me. Call it NoJob-style.


  1. See Indentation Style – Wikipedia for more styles than you should ever care about. ↩︎
Categories
bash

$SHELL: The $0-sum Game

Q: Which shell am I currently running?
A: $SHELL.

NOPE.

The SHELL environment variable has a very specific meaning. Here’s what the POSIX standard has to say about it:

SHELL
This variable shall represent a pathname of the user’s preferred command language interpreter.

For every shell I’ve ever used, the phrase “preferred command language interpreter” has been interpreted as the user’s login shell, which may not be relevant to the currently running environment. For instance, cron specifically forces $SHELL in all cron jobs to /bin/sh by default, ostensibly for sanity and/or security reasons.
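By way of illustration, a crontab can override that default explicitly (a hypothetical fragment; the log path is made up):

```shell
# crontab fragment: cron runs every job with SHELL=/bin/sh unless told otherwise
SHELL=/bin/bash

# This job now runs under bash, whatever the owner's login shell happens to be
*/5 * * * * echo "cron sees SHELL=$SHELL" >> /tmp/shell.log
```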

Also:

~ $ echo $SHELL
/bin/bash
~ $ zsh
~ % echo $SHELL
/bin/bash

In everyday scripting, $SHELL should only ever be used to answer one question: “What shell should my process spawn if necessary?”

And so we return to the original question…

Q: Which shell am I currently running?
A: $0 from the command line.

~ $ echo $0
-bash
~ $ zsh
~ % echo $0
zsh

But I want to know what shell is running my script in the script itself. How do I do that?

On Linux, you could run this code snippet (credit: Evan Benn):

sh -c 'ps -p $$ -o ppid=' | xargs -I'{}' readlink -f '/proc/{}/exe'

But a much better way of going about it is to first ask yourself why you need to know. If, as is almost always the case, you want to do something that’s shell-specific, then simply test for a variable that’s unique to your desired shell (e.g. $BASH, $ZSH_NAME), then execute the desired code accordingly. Easy peasy.
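A minimal sketch of that approach ($BASH and $ZSH_NAME come from each shell’s own documentation; the messages and the fallback branch are mine):

```shell
#!/bin/sh
# Dispatch on shell-specific variables instead of trusting $SHELL
if [ -n "${BASH:-}" ]; then
  echo "running under bash ${BASH_VERSION}"
elif [ -n "${ZSH_NAME:-}" ]; then
  echo "running under zsh ${ZSH_VERSION}"
else
  echo "running under some other POSIX-ish shell"
fi
```

Source this (or paste it) into whatever shell is actually running your script, and the right branch fires without any process spelunking.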

Categories
bash

Process substitutions are pipes, NOT files

(Originally published on Reddit, archived here in lightly edited form for posterity.)

A recent question led to a suggestion to use process substitution to emulate the effects of a proper shebang invocation, but the OP eventually discovered that 8 characters of padding had to be added to the substitution’s output to get it working.

Here’s why: SBCL seems to scan the first 8 characters in an input script file to determine what sort of file it is, then rewinds to the beginning and parses the file anew.

But pipes are not seekable, so with a process substitution, this process fails miserably: SBCL fails to rewind, so it starts parsing from the 9th character onwards, leading to very odd errors.
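You can see the pipe-ness directly on Linux (assuming GNU coreutils’ `stat` and the usual procfs-backed `/dev/fd`):

```shell
#!/bin/bash
# A process substitution expands to a /dev/fd/N path...
echo <(:)            # e.g. /dev/fd/63
# ...and following that link reveals a FIFO (a pipe), not a regular file:
stat -L -c %F <(:)   # => fifo
```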

In addition, there’s one broad class of process substitutions that is guaranteed to fail when used: those that read or write ZIP/JAR archives, e.g. streamed via curl from a remote server. The problem stems from the ZIP file format itself; the central directory lives at the end of the archive, so commands expect to seek backwards through it. That’s just not gonna happen on a pipe.

So the next time you think “hey, I can use a process substitution for this”, and find that your chosen command chokes on it, but works just fine on a file with the exact same contents, it’s almost certainly seeking in its input, or doing something else that works with files but not with pipes.

Categories
bash

Shifting bash arrays

(Originally published on Reddit, archived here in lightly edited form for posterity.)

Someone just asked me how to shift a bash array. He knew how to shift positional parameters (the familiar shift N command), but when he tried:

shift arr N

he of course got the error: bash: shift: arr: numeric argument required

After reminding him to RTFbashM, I gave him the magic incantation:

arr=("${arr[@]:N}")

bash helpfully extends the ${parameter:offset:length} substring expansion syntax to array slicing as well, so the above simply says “take the contents of arr from index N onwards, create a new array out of it, then overwrite arr with this new value”.

But is it significant that I use @ instead of * as the array index, and are those double quotes really necessary?

YES. Here’s why:

a=(This are a "test with spaces")  # grammatically incorrect to avoid later confusion

echo ${a[0]:2}    => "is"  # substring of a[0]
echo ${a:2}       => "is"  # $a === $a[0], so we get the same result
echo ${a[1]:2}    => "e"   # substring of a[1]

echo ${a[@]:2}    => "a test with spaces"    # slice of a[2:end] as a single string
echo ${a[*]:2}    => "a test with spaces"    # ditto, but...
echo "${a[@]:2}"  => "a" "test with spaces"  # individual elements of a[2:end]
echo "${a[*]:2}"  => "a test with spaces"    # a[2:end] as a concatenated string

And here’s what happens when we try to assign the above array slice attempts to actual arrays:

$ b1=(${a[@]:2})            # expand as single string, bash then word-splits
$ c1=(${a[*]:2})            # ditto
$ b2=("${a[@]:2}")          # expand as individual elements, no bash word-splitting
$ c2=("${a[*]:2}")          # expand as single string, no bash word-splitting

$ declare -p b1 c1 b2 c2
declare -a b1=([0]="a" [1]="test" [2]="with" [3]="spaces")
declare -a c1=([0]="a" [1]="test" [2]="with" [3]="spaces")
declare -a b2=([0]="a" [1]="test with spaces")
declare -a c2=([0]="a test with spaces")

And, with the help of bash namerefs, we can actually create a shift_array function that mimics the shift command for indexed arrays:

# shift_array <arr_name> [<n>]
shift_array() {
  # Create nameref to real array
  local -n arr="$1"
  local n="${2:-1}"
  arr=("${arr[@]:${n}}")
}

$ a=(This is a "test with spaces")
$ shift_array a 2
$ declare -p a
declare -a a=([0]="a" [1]="test with spaces")

How to use the length parameter as a substring/slice length is left as an exercise to the reader.

NOTE: I’ve published shift_array in my bash_functions GitHub repo.

Categories
c

Libraries should NOT print diagnostic messages

(Originally published on reddit, archived here in lightly edited form for posterity.)

A recent (now deleted) discussion thread aroused some rather strong emotions w.r.t. how a key library embedded in a product used by many millions of folks reported errors via fprintf(stderr,...). (It turned out to be a false alarm: the stderr logging lines were for various command-line utilities and example code, and none of it was linked into the actual library.)

As a rule of thumb, general-use libraries should NOT do this, for reasons ranging from “stderr may have been inadvertently closed” to “printing might itself cause further errors”. The only exception: logging libraries. (D’oh!)

Instead, define an enum to properly scope the range of error codes you return with each function; that should cover 99% of your needs. If your users need more details, provide a function in the spirit of strerror(3) for callers to retrieve them.

There are certainly more complicated ways to handle errors, but the above should cover all but the most esoteric circumstances.

Categories
bash

The `lastpipe` Maneuver

Have you ever wanted to construct a long pipeline with a while read loop or a mapfile at the end of it? It’s so common that it’s practically a shell idiom.

Have you then (re)discovered that all pipeline components are run in separate shell environments?

#!/bin/bash
seq 20 | mapfile -t results
declare -p results  # => bash: declare: results: not found (WTF!)

Then someone told you to use a process substitution to get around it?

#!/bin/bash
mapfile -t results < <(seq 20)
declare -p results  # => declare -a results=([0]="1" [1]="2" [2]="3" [3]="4" [4]="5" [5]="6" [6]="7" [7]="8" [8]="9" [9]="10" [10]="11" [11]="12" [12]="13" [13]="14" [14]="15" [15]="16" [16]="17" [17]="18" [18]="19" [19]="20")

And you go away unsatisfied, because:

mapfile -t results < <(gargantuan | pipeline | that | stretches | into | infinity | and | beyond)

just looks bass-ackward?

That’s what the lastpipe shell option solves. From the bash man page:

lastpipe

If set, and job control is not active, the shell runs the last command of a pipeline not executed in the background in the current shell environment.

So this works just fine:

#!/bin/bash
shopt -s lastpipe
seq 20 | mapfile -t results
declare -p results  # => declare -a results=([0]="1" [1]="2" [2]="3" [3]="4" [4]="5" [5]="6" [6]="7" [7]="8" [8]="9" [9]="10" [10]="11" [11]="12" [12]="13" [13]="14" [14]="15" [15]="16" [16]="17" [17]="18" [18]="19" [19]="20")

There’s a catch: You can’t do this on the command line, because job control (a.k.a. suspend/resume) is active in interactive mode:

$ shopt -s lastpipe
$ seq 20 | mapfile -t results
$ declare -p results  # => bash: declare: results: not found

But in your scripts, you can now write your while read/mapfile pipelines the logical way.
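For instance, a running total accumulated in a while read loop now survives the pipeline (a minimal sketch; lastpipe requires bash 4.2+):

```shell
#!/bin/bash
shopt -s lastpipe   # only effective when job control is inactive, i.e. in scripts

total=0
seq 5 | while read -r n; do
  (( total += n ))
done
# The loop ran in the current shell, so total is still visible here:
echo "$total"   # => 15
```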

Conversely, if you really want the code in your while read consumer loop to not pollute your main shell environment, you can explicitly turn off lastpipe just in case:

#!/bin/bash -x
# Check current state of lastpipe option...
[[ $BASHOPTS == *lastpipe* ]] && old_lastpipe="-s" || old_lastpipe="-u"
# ...then force it off
shopt -u lastpipe

cmd=(echo)
seq 20 | while read n; do
  cmd+=($n)
  "${cmd[@]}"
done
# => 1
# => 1 2
# ...
# => 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

# Restore lastpipe state, in case you want to add more code below
shopt "${old_lastpipe}" lastpipe

declare -p cmd  # => declare -a cmd=([0]="echo")