Categories
bash

Process substitutions are pipes, NOT files

(Originally published on Reddit, archived here in lightly edited form for posterity.)

A recent question led to a suggestion to use process substitution to emulate the effects of a proper shebang invocation, but the OP eventually discovered that 8 characters of padding had to be prepended to the substitution’s output to get it working.

Here’s why: SBCL seems to scan the first 8 characters of an input script file to determine what sort of file it is, then rewind to the beginning and parse the file anew.

But pipes are not seekable, so with a process substitution, this process fails miserably: SBCL fails to rewind, so it starts parsing from the 9th character onwards, leading to very odd errors.
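
You can check the first claim for yourself, and sketch out the OP’s fix (the sbcl --script invocation and the one-line script are my stand-ins for the OP’s setup, so treat this as illustrative rather than verbatim):

# A process substitution expands to a /dev/fd path backed by a pipe
# (the fd number varies):
$ echo <(true)
/dev/fd/63
$ [[ -p <(true) ]] && echo "yep, a pipe"
yep, a pipe

# So this fails with odd parse errors: SBCL eats the first 8 bytes while
# sniffing the file type, can't rewind the pipe, and resumes at byte 9
$ sbcl --script <(echo '(write-line "hello")')

# The OP's workaround: prepend 8 bytes of harmless padding (seven
# semicolons plus a newline, i.e. one Lisp comment line) for the file-type
# sniff to consume
$ sbcl --script <(printf ';;;;;;;\n'; echo '(write-line "hello")')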

In addition, there’s one broad class of process substitutions that is guaranteed to fail: those that read or write ZIP/JAR archives, e.g. streamed via curl from a remote server. The problem stems from the ZIP file format: the central directory that indexes the archive sits at the end of the file, so commands expect to seek backwards through an archive. That’s just not gonna happen on a pipe.
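
Here’s that failure mode in miniature (the URL is a placeholder, and the exact error text varies by unzip version):

# Works: given a real file, unzip can seek back to the central directory
$ curl -so archive.zip https://example.com/archive.zip
$ unzip -l archive.zip

# Fails: the substitution is a pipe, and unzip can't seek backwards in it
$ unzip -l <(curl -s https://example.com/archive.zip)
# => "End-of-central-directory signature not found" (or similar)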

So the next time you think “hey, I can use a process substitution for this”, and find that your chosen command chokes on it, but works just fine on a file with the exact same contents, it’s almost certainly seeking in its input, or doing something else that works with files but not with pipes.

Categories
bash

Shifting bash arrays

(Originally published on Reddit, archived here in lightly edited form for posterity.)

Someone just asked me how to shift a bash array. He knew how to shift positional parameters (the familiar shift N command), but when he tried:

shift arr N

he of course got the error: bash: shift: arr: numeric argument required

After reminding him to RTFbashM, I gave him the magic incantation:

arr=("${arr[@]:N}")

bash helpfully extends the ${parameter:offset:length} substring expansion syntax to array slicing as well, so the above simply says “take the contents of arr from index N onwards, create a new array out of it, then overwrite arr with this new value”.

But is it significant that I use @ instead of * as the array index, and are those double quotes really necessary?

YES. Here’s why:

a=(This are a "test with spaces")  # deliberately ungrammatical, so the "is" below can only come from slicing a[0]

echo ${a[0]:2}    => "is"  # substring of a[0]
echo ${a:2}       => "is"  # $a === ${a[0]}, so we get the same result
echo ${a[1]:2}    => "e"   # substring of a[1]

echo ${a[@]:2}    => "a test with spaces"    # slice of a[2:end] as a single string
echo ${a[*]:2}    => "a test with spaces"    # ditto, but...
echo "${a[@]:2}"  => "a" "test with spaces"  # individual elements of a[2:end]
echo "${a[*]:2}"  => "a test with spaces"    # a[2:end] as a concatenated string

And here’s what happens when we try to assign the above array slice attempts to actual arrays:

$ b1=(${a[@]:2})            # unquoted: expanded elements get word-split
$ c1=(${a[*]:2})            # ditto
$ b2=("${a[@]:2}")          # expand as individual elements, no bash word-splitting
$ c2=("${a[*]:2}")          # expand as single string, no bash word-splitting

$ declare -p b1 c1 b2 c2
declare -a b1=([0]="a" [1]="test" [2]="with" [3]="spaces")
declare -a c1=([0]="a" [1]="test" [2]="with" [3]="spaces")
declare -a b2=([0]="a" [1]="test with spaces")
declare -a c2=([0]="a test with spaces")

And, with the help of bash namerefs, we can actually create a shift_array function that mimics the shift command for indexed arrays:

# shift_array <arr_name> [<n>]
shift_array() {
  # Create nameref to real array
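  # (namerefs require bash 4.3 or newer)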
  local -n arr="$1"
  local n="${2:-1}"
  arr=("${arr[@]:${n}}")
}

$ a=(This is a "test with spaces")
$ shift_array a 2
$ declare -p a
declare -a a=([0]="a" [1]="test with spaces")

How to use the length parameter as a substring/slice length is left as an exercise for the reader.

NOTE: I’ve published shift_array in my bash_functions GitHub repo.

Categories
c

Libraries should NOT print diagnostic messages

(Originally published on Reddit, archived here in lightly edited form for posterity.)

A recent (now deleted) discussion thread aroused some rather strong emotions w.r.t. how a key library embedded in a product used by many millions of folks reported errors via fprintf(stderr,...). (It turned out to be a false alarm: the stderr logging lines were for various command-line utilities and example code, and none of it was linked into the actual library.)

As a rule of thumb, general-use libraries should NOT do this, for reasons ranging from “stderr may have been inadvertently closed” to “writing there might itself cause further errors”. The only exception: logging libraries. (D’oh!)

Instead, define an enum to properly scope the range of error codes you return with each function; that should cover 99% of your needs. If your users need more details, provide a function in the spirit of strerror(3) for callers to retrieve them.
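
As a sketch of what that can look like (every name here is hypothetical, invented for illustration):

/* mylib.h -- hypothetical public header */
typedef enum {
    MYLIB_OK = 0,
    MYLIB_ERR_NOMEM,
    MYLIB_ERR_BAD_INPUT,
    MYLIB_ERR_IO,
} mylib_status;

/* Every public function reports failure through its return value... */
mylib_status mylib_frobnicate(const char *input);

/* ...and callers who want prose ask for it, in the spirit of strerror(3).
   The library itself never prints a thing. */
const char *mylib_strerror(mylib_status code);

/* mylib.c */
const char *mylib_strerror(mylib_status code)
{
    switch (code) {
    case MYLIB_OK:            return "success";
    case MYLIB_ERR_NOMEM:     return "out of memory";
    case MYLIB_ERR_BAD_INPUT: return "malformed input";
    case MYLIB_ERR_IO:        return "I/O error";
    }
    return "unknown error";
}

Whether, where, and how to report the failure is then entirely the application’s call:

/* Application code, NOT library code */
mylib_status rc = mylib_frobnicate(arg);
if (rc != MYLIB_OK)
    fprintf(stderr, "frobnicate: %s\n", mylib_strerror(rc));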

There are certainly more complicated ways to handle errors, but the above should cover all but the most esoteric circumstances.

Categories
bash

The `lastpipe` Maneuver

Have you ever wanted to construct a long pipeline with a while read loop or a mapfile at the end of it? It’s so common that it’s practically a shell idiom.

Have you then (re)discovered that every component of a pipeline runs in its own subshell, so any variables it sets evaporate when it exits?

#!/bin/bash
seq 20 | mapfile -t results
declare -p results  # => bash: declare: results: not found (WTF!)

Then someone told you to use a process substitution to get around it?

#!/bin/bash
mapfile -t results < <(seq 20)
declare -p results  # => declare -a results=([0]="1" [1]="2" [2]="3" [3]="4" [4]="5" [5]="6" [6]="7" [7]="8" [8]="9" [9]="10" [10]="11" [11]="12" [12]="13" [13]="14" [14]="15" [15]="16" [16]="17" [17]="18" [18]="19" [19]="20")

And you go away unsatisfied, because:

mapfile -t results < <(gargantuan | pipeline | that | stretches | into | infinity | and | beyond)

just looks bass-ackward?

That’s what the lastpipe shell option solves. From the bash man page:

lastpipe

If set, and job control is not active, the shell runs the last command of a pipeline not executed in the background in the current shell environment.

So this works just fine:

#!/bin/bash
shopt -s lastpipe
seq 20 | mapfile -t results
declare -p results  # => declare -a results=([0]="1" [1]="2" [2]="3" [3]="4" [4]="5" [5]="6" [6]="7" [7]="8" [8]="9" [9]="10" [10]="11" [11]="12" [12]="13" [13]="14" [14]="15" [15]="16" [16]="17" [17]="18" [18]="19" [19]="20")

There’s a catch: this doesn’t work at an interactive prompt, because job control (a.k.a. suspend/resume) is active there, and lastpipe only applies while job control is off:

$ shopt -s lastpipe
$ seq 20 | mapfile -t results
$ declare -p results  # => bash: declare: results: not found
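
For the record, lastpipe does kick in interactively if you first disable job control with set +m, though losing suspend/resume is rarely a good trade:

$ set +m
$ shopt -s lastpipe
$ seq 3 | mapfile -t results
$ declare -p results
declare -a results=([0]="1" [1]="2" [2]="3")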

But in your scripts, you can now write your while read/mapfile pipelines the logical way.
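
For instance, here’s the while read flavor, accumulating a sum that survives the loop (a minimal sketch):

#!/bin/bash
shopt -s lastpipe
total=0
seq 20 | while read -r n; do (( total += n )); done
echo "$total"  # => 210, because the loop ran in the current shell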

Conversely, if you really want the code in your while read consumer loop to not pollute your main shell environment, you can explicitly turn off lastpipe just in case:

#!/bin/bash
# Check current state of lastpipe option...
[[ $BASHOPTS == *lastpipe* ]] && old_lastpipe="-s" || old_lastpipe="-u"
# ...then force it off
shopt -u lastpipe

cmd=(echo)
seq 20 | while read -r n; do
  cmd+=("$n")
  "${cmd[@]}"
done
# => 1
# => 1 2
# ...
# => 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

# Restore lastpipe state, in case you want to add more code below
shopt "${old_lastpipe}" lastpipe

declare -p cmd  # => declare -a cmd=([0]="echo")