Path Functions

Manipulating paths using shell functions

The Problem

Is there a problem? After all

PATH=$PATH:/usr/local/bin

is not that hard to do.

That's true, but there is a subtle problem with it for a start. If the current value of PATH is empty or unset then the result will be

:/usr/local/bin

where the empty value in front of the colon means the current directory. That's not good if there are unexpected commands in the current directory.
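One way to sidestep the empty element is the ${NAME:+...} form of parameter expansion, which only produces the separator when the variable already has a value. A minimal sketch (DEMO_PATH is a made-up variable used so that the real PATH is left alone):

```bash
unset DEMO_PATH

# ${DEMO_PATH:+${DEMO_PATH}:} expands to "value:" only if DEMO_PATH is set
# and non-empty, otherwise to nothing at all.
DEMO_PATH=${DEMO_PATH:+${DEMO_PATH}:}/usr/local/bin
echo "${DEMO_PATH}"     # /usr/local/bin

DEMO_PATH=${DEMO_PATH:+${DEMO_PATH}:}/opt/bin
echo "${DEMO_PATH}"     # /usr/local/bin:/opt/bin
```

That fixes the leading colon but does nothing about duplicates or missing directories.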

Besides, how many times do you have /usr/local/bin on your PATH?

Hmm, you still aren't convinced. Well, let's write some functions anyway. We might be able to do some useful things on the side: check the directory exists; remove duplicates; insert elements before or after another. They might begin to look useful after all.

The Next Problem

We'll not just be playing with PATH but any similarly formed list:

MANPATH and LD_LIBRARY_PATH
PERL5LIB
CLASSPATH is a list of jar files as well as directories
TCLLIBPATH is a SPACE separated list of directories
Windows systems will use a SEMICOLON separator

So we'll need to pass the name of the PATH we want to manipulate and handle different path SEPARATORS.

First things first. If we're passing the name of the path we want to manipulate we have two problems: how do we get the value given the name and how do we set the value?

Variable Indirection

Variable indirection is our friend:

% var=PATH
% echo ${var}
PATH
% echo ${!var}
/bin:/usr/bin

We could do with all those elements separately. For that we want to play with IFS. Given that it is used as the field separator during the Word Splitting stage of expansion it looks like we can trivially split the value of the path into words by setting IFS to the value of the separator.

Note

Whenever you manipulate IFS you must remember to save the original value and put it back afterwards.

% OIFS="${IFS}"
% IFS=:
% echo ${!var}
/bin /usr/bin

which looks good. If we use that in an array initialization we'll be looking very good:

% dirs=( ${!var} )
% IFS="${OIFS}"

% echo "${#dirs[*]}: ${dirs[@]}"
2: /bin /usr/bin
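The split can be wrapped in a function so that IFS is put back automatically. The following is only a sketch and not one of the functions developed later: split_list and the SPLIT_RESULT array are invented names, and turning off pathname expansion with set -f is an extra precaution (so that an element such as /opt/* is not expanded) that the rest of this article does not take.

```bash
# split_list NAME [SEP]
# Hypothetical helper: split the value of the variable called NAME on SEP
# (default :) and leave the pieces in the global array SPLIT_RESULT.
split_list ()
{
    typeset var=$1
    typeset IFS="${2:-:}"   # typeset makes this IFS local to the function

    # Only turn pathname expansion off if it is currently on, so that the
    # caller's setting is preserved.
    typeset restore_glob=
    case $- in
    *f*) ;;
    *)   set -f ; restore_glob=1 ;;
    esac

    SPLIT_RESULT=( ${!var} )

    [[ ${restore_glob} ]] && set +f
    return 0
}

demo_path='/bin:/usr/bin:/opt/my tools/bin'
split_list demo_path
echo "${#SPLIT_RESULT[*]}"      # 3
echo "${SPLIT_RESULT[2]}"       # /opt/my tools/bin
```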

Modifying the PATH

To manipulate the PATH we can manipulate the elements of the array.

Note

Note the quoting in "${dirs[@]}" in the following sections to preserve the whitespace.

Prepending

dirs=( new value "${dirs[@]}" )

Appending

dirs=( "${dirs[@]}" new value )

Removing Elements

This is slightly more subtle as we can't simply walk over the list with:

for d in "${dirs[@]}" ; do

because, whilst we have the value ${d} we don't have a reference into the array. We'll have to walk the array the long way:

max=${#dirs[*]}
for ((i=0; i< max; i++ )) ; do
    if [[ ! -d "${dirs[i]}" ]] ; then
        unset dirs[i]
    fi
done

Note

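We have to calculate the length of the array before we start because ${#dirs[*]} only counts the elements that are still set. Had we used the length calculation in the for loop itself, the limit would shrink as we removed elements and the loop would stop before reaching the end of the array: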

for ((i=0; i < ${#dirs[*]}; i++ )) ; do
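Bash can also list the indices that are actually in use with "${!dirs[@]}", which avoids the length bookkeeping altogether. A small sketch of that alternative, using made-up sample data under a temporary directory:

```bash
# Sample data: two directories that exist and one that does not.
tmp=$(mktemp -d)
dirs=( "${tmp}" "${tmp}/no-such-dir" / )

# "${!dirs[@]}" expands to the list of indices once, before the loop
# starts, so unsetting elements inside the loop is safe.
for i in "${!dirs[@]}" ; do
    if [[ ! -d "${dirs[i]}" ]] ; then
        unset "dirs[i]"
    fi
done

echo "${#dirs[*]}"          # 2
echo "${!dirs[@]}"          # 0 2  (the array is now sparse)

# Re-pack the array if contiguous indices are wanted.
dirs=( "${dirs[@]}" )
echo "${!dirs[@]}"          # 0 1

rmdir "${tmp}"
```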

Getting the new value

Converting an array into a SEPARATOR separated string can be done with a similar IFS trick:

% echo "${dirs[*]}"
new value /bin /usr/bin

% IFS=:
% echo "${dirs[*]}"
new:value:/bin:/usr/bin

Note

Remember to set IFS back.
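If the join happens inside a function then a local IFS does the remembering for us. A minimal sketch, where join_list is an invented name:

```bash
# join_list SEP ELEMENT...
# Hypothetical helper: print the ELEMENTs joined by the first character of SEP.
join_list ()
{
    typeset IFS="$1"    # local to the function, so nothing to restore
    shift

    # "$*" joins the positional parameters with the first character of IFS
    printf '%s\n' "$*"
}

dirs=( new value /bin /usr/bin )
join_list : "${dirs[@]}"        # new:value:/bin:/usr/bin
```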

Setting the path

This turns out to be a real problem. The obvious thing to do is:

% ${var}=new:value
PATH=new:value: command not found

Hmm, it seems the shell isn't fooled by our hijinks and doesn't believe that ${var}= is a variable assignment because ${var} has several characters ($, { and }) that aren't allowed in an identifier.

eval considered bad

We could say:

% eval ${var}=new:value
% echo ${!var}
new:value

However, eval introduces a whole new world of pain.

% new='more;echo oops'
% echo ${new}
more;echo oops

% PATH=/bin:/usr/bin
% eval ${var}=${!var}:${new}
oops
% echo ${!var}
/bin:/usr/bin:more

The problem is that the eval line has been expanded to:

PATH=/bin:/usr/bin:more;echo oops

and you can see why oops is printed and the value of PATH isn't what you expect.

We can delay the inevitable by escaping, ie. preventing the expansion of ${new}:

% eval ${var}=${!var}:\${new}

Here, the expanded line that eval will evaluate is:

PATH=/bin:/usr/bin:${new}

which is safe enough:

% echo ${!var}
/bin:/usr/bin:more;echo oops

However, when we run this again:

% eval ${var}=${!var}:\${new}
oops:more;echo oops
% echo ${!var}
/bin:/usr/bin:more

This time it is the value expanded from ${!var}, /bin:/usr/bin:more;echo oops, which is causing the problem, not the escaped addition (\${new}). The expanded line eval will evaluate looks like:

PATH=/bin:/usr/bin:more;echo oops:${new}

We could start trying to enclose the variables in escaped double-quotes:

eval ${var}=\"${!var}:\${new}\"

but we will quickly run out of the will to live when attempting to escape all possible dangerous strings. As a trivial example, consider if ${new} contains a double-quote character.

declare

declare looks like a promising candidate:

declare ${var}="${!var}:${new}"

which it would be except for one small thing. If declare is used in a function it acts like local and NAME takes on local scope, ie. we won't be setting the value outside the function. (Bash 4.2 added declare -g to force global scope but that rules out older shells.)
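The scoping is easy to see with a throwaway function (set_with_declare and demo_var are names made up for this demonstration):

```bash
demo_var=original

set_with_declare ()
{
    typeset name=$1

    # Inside a function this creates a new *local* variable called demo_var
    declare ${name}="changed"
    echo "inside:  ${!name}"
}

set_with_declare demo_var       # inside:  changed
echo "outside: ${demo_var}"     # outside: original
```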

read and <<<

In the end we need to use a couple of shell tricks. read lets us set variables:

read NAME

but it reads from its stdin. To forge a stdin we can use a here string, <<<, the single-string version of a here document:

read NAME <<< VALUE

putting it together:

read ${var} <<< "${!var}:${new}"

read ${var} <<< "${dirs[*]}"
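Two asides, neither of which is used in the functions that follow. Without -r, read treats backslashes in its input as escape characters, so a value containing a backslash would be altered; and printf -v, which to the best of my knowledge arrived in Bash 3.1, can also assign to a variable named by another variable. A short sketch, with demo_list as a made-up variable:

```bash
var=demo_list           # made-up variable name for the demonstration
demo_list=/bin:/usr/bin
new='more;echo oops'

# No eval, so the ; in ${new} is just data. -r stops read from treating
# backslashes in the value as escape characters.
read -r ${var} <<< "${!var}:${new}"
echo "${demo_list}"     # /bin:/usr/bin:more;echo oops

# printf -v assigns to the named variable without reading any input
printf -v "${var}" '%s' "/bin:/usr/bin"
echo "${demo_list}"     # /bin:/usr/bin
```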

Prototype path_append

Putting all the parts together will give us something like:

path_append ()
{
   typeset var=$1
   typeset val="$2"
   typeset sep="${3:-:}"

   typeset OIFS
   OIFS="${IFS}"

   IFS="${sep}"
   typeset origdirs
   origdirs=( ${!var} )

   typeset newdirs
   newdirs=( ${val} )

   typeset vardirs
   vardirs=( "${origdirs[@]}" "${newdirs[@]}" )

   read ${var} <<< "${vardirs[*]}"

   IFS="${OIFS}"
}

Options

As suggested, we might want to check that entries exist before adding them (YMMV). Shell functions can use getopts just like shell scripts can:

typeset opt_op

OPTIND=1
while getopts "def" opt ; do
    case "${opt}" in
    d|e|f)
        opt_op=${opt}
        ;;
    ?)
        error "Unexpected argument"
        ;;
    esac
done

shift $(( $OPTIND - 1 ))

and

if [[ ${opt_op} ]] ; then
    typeset n
    typeset maxn=${#newdirs[*]}

    for (( n=0 ; n < ${maxn} ; n++ )) ; do

        # if ... ; then
        # where ... is a case statement!

        if
            case "${opt_op}" in
            d) [[ ! -d "${newdirs[n]}" ]] ;;
            e) [[ ! -e "${newdirs[n]}" ]] ;;
            f) [[ ! -f "${newdirs[n]}" ]] ;;
            esac
        then
            unset newdirs[n]
        fi
    done
fi

Note

We have to do this complex if + case statement (or something similar) because we're using [[ which insists on conditional operators being unquoted (and certainly not the result of variable expansion).

If we were using the non-preferred [ (or test) then we could have simply said:

if [ ! -${opt_op} "${newdirs[n]}" ]

although we definitely have to double quote the ${newdirs[n]} expression.
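To see the option handling and the [ form working together, here is a small self-contained sketch. filter_paths is an invented function, defaulting to -e is an assumption of the sketch, and it declares OPTIND local rather than resetting the global one:

```bash
# filter_paths [-d|-e|-f] PATHNAME...
# Hypothetical helper: print only the PATHNAMEs that pass the chosen test.
filter_paths ()
{
    typeset opt opt_op=e        # default to -e; an assumption of this sketch
    typeset OPTIND=1            # local OPTIND, so repeated calls start afresh

    while getopts "def" opt ; do
        case "${opt}" in
        d|e|f) opt_op=${opt} ;;
        *)     return 1 ;;
        esac
    done
    shift $(( OPTIND - 1 ))

    typeset p
    for p in "$@" ; do
        # With [ the operator can come from a variable expansion
        if [ ! -${opt_op} "${p}" ] ; then
            continue
        fi
        printf '%s\n' "${p}"
    done
}

tmp=$(mktemp -d)
touch "${tmp}/file"
filter_paths -d "${tmp}" "${tmp}/file" "${tmp}/missing"     # prints ${tmp} only
filter_paths -f "${tmp}" "${tmp}/file" "${tmp}/missing"     # prints ${tmp}/file only
rm -r "${tmp}"
```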

Prototype 2 path_append

path_append ()
{
   typeset opt_op

   OPTIND=1
   while getopts "def" opt ; do
       case "${opt}" in
       d|e|f)
           opt_op=${opt}
           ;;
       ?)
           error "Unexpected argument"
           ;;
       esac
   done

   shift $(( $OPTIND - 1 ))

   typeset var=$1
   typeset val="$2"
   typeset sep="${3:-:}"

   typeset OIFS
   OIFS="${IFS}"

   IFS="${sep}"
   typeset origdirs
   origdirs=( ${!var} )

   typeset newdirs
   newdirs=( ${val} )

   if [[ ${opt_op} ]] ; then
       typeset n
       typeset maxn=${#newdirs[*]}

       for (( n=0 ; n < ${maxn} ; n++ )) ; do

           if
               case "${opt_op}" in
               d) [[ ! -d "${newdirs[n]}" ]] ;;
               e) [[ ! -e "${newdirs[n]}" ]] ;;
               f) [[ ! -f "${newdirs[n]}" ]] ;;
               esac
           then
               unset newdirs[n]
           fi
       done
   fi

   if [[ ${#newdirs[*]} -eq 0 ]] ; then
       # nothing left to add: put IFS back before leaving
       IFS="${OIFS}"
       return 0
   fi

   typeset vardirs
   vardirs=( "${origdirs[@]}" "${newdirs[@]}" )

   read ${var} <<< "${vardirs[*]}"

   IFS="${OIFS}"
}

Other functions

path_append (ie. insert at the end), path_prepend (insert at the start) and path_insert all sound very similar. So do path_remove, path_replace and path_verify. It sounds like we want an all singing all dancing path_modify function and some wrapper functions to it.

path_remove and path_replace will probably want an option to do their action only once.

path_modify

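path_modify, then, takes the name of the list, the new value, an action (first, last, before, after, replace, remove or verify), the existing element the action is relative to (where that makes sense) and the separator: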

path_modify ()
{
    typeset opt_op opt_once

    OPTIND=1
    while getopts "1def" opt ; do
       case "${opt}" in
       1)
           opt_once=1
           ;;
       d|e|f)
           opt_op=${opt}
           ;;
       ?)
           error "Unexpected argument"
           ;;
       esac
    done

    shift $(( $OPTIND - 1 ))

    typeset var=$1
    typeset val="$2"
    typeset act="$3"
    typeset wrt="$4"
    typeset sep="${5:-:}"

    typeset OIFS
    OIFS="${IFS}"

    IFS="${sep}"
    typeset origdirs
    origdirs=( ${!var} )

    typeset newdirs
    newdirs=( ${val} )

    if [[ ${opt_op} ]] ; then
       typeset n
       typeset maxn=${#newdirs[*]}

       for (( n=0 ; n < ${maxn} ; n++ )) ; do

           if
               case "${opt_op}" in
               d) [[ ! -d "${newdirs[n]}" ]] ;;
               e) [[ ! -e "${newdirs[n]}" ]] ;;
               f) [[ ! -f "${newdirs[n]}" ]] ;;
               esac
           then
               unset newdirs[n]
           fi
       done
    fi

    if [[ ${#newdirs[*]} -eq 0 ]] ; then
       case "${act}" in
       verify|replace|remove)
           ;;
       *)
           IFS="${OIFS}"
           return 0
           ;;
       esac
    fi

    typeset vardirs
    case "${act}" in
    first|start)
       vardirs=( "${newdirs[@]}" "${origdirs[@]}" )
       ;;
    last|end)
       vardirs=( "${origdirs[@]}" "${newdirs[@]}" )
       ;;
    verify)
       vardirs=( "${newdirs[@]}" )
       ;;
    after|before|replace|remove)
       typeset todo=1
       typeset o
       typeset maxo=${#origdirs[*]}

       for (( o=0 ; o < ${maxo} ; o++ )) ; do
           if [[ "${todo}" && "${origdirs[o]}" = "${wrt}" ]] ; then
               case "${act}" in
               after)
                   vardirs=( "${vardirs[@]}" "${origdirs[o]}" "${newdirs[@]}" )
                   ;;
               before)
                   vardirs=( "${vardirs[@]}" "${newdirs[@]}" "${origdirs[o]}" )
                   ;;
               replace)
                   vardirs=( "${vardirs[@]}" "${newdirs[@]}" )
                   ;;
               remove)
                   ;;
               esac

               if [[ "${opt_once}" ]] ; then
                   todo=
               fi
           else
               vardirs=( "${vardirs[@]}" "${origdirs[o]}" )
           fi
       done
       ;;
    *)
       vardirs=( "${origdirs[@]}" )
       ;;
    esac

    read ${var} <<< "${vardirs[*]}"

    IFS="${OIFS}"
}

and therefore path_append can become a wrapper to path_modify:

path_append ()
{
    typeset opt_flags

    OPTIND=1
    while getopts "def" opt ; do
       case "${opt}" in
       d|e|f)
           opt_flags=-${opt}
           ;;
       ?)
           error "Unexpected argument"
           ;;
       esac
    done

    shift $(( $OPTIND - 1 ))

    path_modify ${opt_flags} "$1" "$2" last '' "${3:-:}"
}

and (with option handling removed) the other path functions look like:

path_prepend ()
{
    ...

    path_modify ${opt_flags} "$1" "$2" first '' "${3:-:}"
}

path_verify ()
{
    ...

    # As path_modify checks the paths to be added we pass the expansion of NAME, ie
    # our own value

    path_modify ${opt_flags} "$1" "${!1}" verify '' "${2:-:}"
}

path_replace ()
{
    ...

    # The call is path_replace NAME OLD NEW but path_modify takes the new
    # value before the element it is relative to

    path_modify ${opt_flags} "$1" "$3" replace "$2" "${4:-:}"
}

path_remove ()
{
    ...

    path_modify ${opt_flags} "$1" '' remove "$2" "${3:-:}"
}

path_trim

Depending on how we've gotten to where we are, we might well have /usr/local/bin, say, on our PATH more than once. We could do with trimming the cruft.

To do this we would use a set or a map in other languages but we're a bit short of those in the shell (Bash 4 does have associative arrays but we should look for something more portable). What we can do is string comparisons, in particular with case. For example:

case "${string}" in
*a*) ;;
esac

To make this work with our paths we have to construct a ${string} such that it can be matched against each individual element and if we've seen the element before do nothing and if we've not seen it before then add it to the path.

The constructed path itself is an obvious such string. However, we need to be very careful when matching as /bin can be matched against /bin, /usr/bin, /usr/local/bin etc.. We'll need to include the separator as part of the match:

case "${path}" in
*${sep}${dir}${sep}*) ;;
esac

But that's not quite all as there are a couple of other issues:

quoting - we need to quote the patterns:
```
*"${sep}${dir}${sep}"*)
```
as both ${sep} and ${dir} can contain whitespace
if the element we are trying to match against is the first (or last) element in the path then ${sep}${dir}${sep} won't match it (as the path (probably) won't have a leading/trailing ${sep}). To fix that we need to augment the string being compared to:
```
case "${sep}${path}${sep}" in
```

This leads us to the visually confusing (but quite simple):

case "${sep}${path}${sep}" in
*"${sep}${dir}${sep}"*) ;;
esac
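Wrapped up as a stand-alone test it might look like the sketch below. list_contains is an invented name and is not used by path_trim, which inlines the same case statement:

```bash
# list_contains LIST DIR [SEP]
# Hypothetical helper: succeed if DIR is an element of the SEP-separated LIST.
list_contains ()
{
    typeset list=$1
    typeset dir=$2
    typeset sep="${3:-:}"

    case "${sep}${list}${sep}" in
    *"${sep}${dir}${sep}"*) return 0 ;;
    esac
    return 1
}

list_contains /usr/bin:/usr/local/bin /bin && echo yes || echo no   # no
list_contains /usr/bin:/bin /bin           && echo yes || echo no   # yes
list_contains 'C:\bin;D:\tools' 'D:\tools' ';' && echo yes || echo no   # yes
```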

Brought together it looks like:

path_trim ()
{
    typeset var=$1
    typeset sep="${2:-:}"

    typeset OIFS
    OIFS="${IFS}"

    IFS="${sep}"
    typeset origdirs
    origdirs=( ${!var} )

    IFS="${OIFS}"

    typeset o
    typeset maxo=${#origdirs[*]}
    typeset seen=
    for (( o=0 ; o < ${maxo} ; o++ )) ; do
       case "${sep}${seen}${sep}" in
       *"${sep}${origdirs[o]:-.}${sep}"*)
           unset origdirs[o]
           ;;
       *)
           seen="${seen:+${seen}${sep}}${origdirs[o]:-.}"
           ;;
       esac
    done

    IFS="${sep}"
    read ${var} <<< "${origdirs[*]}"

    IFS="${OIFS}"
}
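For comparison, if portability is not a concern, the same idea can be written with a Bash 4 associative array acting as the set. This is only a sketch under that assumption (it will not run on Bash 3), and path_trim_assoc is an invented name:

```bash
# path_trim_assoc NAME [SEP]
# Sketch: remove duplicate elements using an associative array (Bash 4+).
path_trim_assoc ()
{
    typeset var=$1
    typeset sep="${2:-:}"

    typeset IFS="${sep}"        # local to the function
    typeset -a origdirs newdirs
    origdirs=( ${!var} )
    newdirs=()

    typeset -A seen
    typeset dir key
    for dir in "${origdirs[@]}" ; do
        key="${dir:-.}"         # treat an empty element as .
        if [[ -z "${seen[${key}]}" ]] ; then
            seen[${key}]=1
            newdirs+=( "${dir}" )
        fi
    done

    read -r ${var} <<< "${newdirs[*]}"
}

demo_path=/bin:/usr/bin:/bin:/usr/local/bin:/usr/bin
path_trim_assoc demo_path
echo "${demo_path}"     # /bin:/usr/bin:/usr/local/bin
```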

Convenience Wrappers

When we're adding distributions of code we're quite likely to be performing the same steps repeatedly:

path_append PATH /usr/local/bin
path_append MANPATH /usr/local/man

and in some cases

path_append LD_LIBRARY_PATH /usr/local/lib

It's fairly obvious we should be writing some shortcuts, std_paths_append, say, for PATH and MANPATH and all_paths_append for the same plus LD_LIBRARY_PATH.

path_append and path_prepend behave in much the same way and it would be a shame to have to write both std_paths_append and std_paths_prepend and thanks to the way the shell processes lines we don't:

% base=/usr/local
% act=prepend
% path_${act} PATH "${base}"/bin

Variable expansion occurs before Word Splitting and therefore before the shell decides what the command name is [1] (or even if there is a command). path_${act} is expanded to path_prepend and away we go.
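The dispatch is easy to try out with a pair of stub functions (demo_append and demo_prepend are made-up stand-ins so that the real path functions aren't disturbed):

```bash
# Stub functions standing in for the real ones, so the dispatch is visible.
demo_append ()  { echo "append:  $*" ; }
demo_prepend () { echo "prepend: $*" ; }

base=/usr/local
for act in append prepend ; do
    # demo_${act} is expanded first and only then looked up as a command
    demo_${act} PATH "${base}"/bin
done
# append:  PATH /usr/local/bin
# prepend: PATH /usr/local/bin
```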

std_paths (and all_paths) should therefore take an argument which is the action it should be performing. On top of which they both can do some small checks that the relevant directories exist or perhaps check a few (eg. man or share/man). For example:

std_paths ()
{
    typeset act="$1"
    typeset val="$2"
    typeset sep="${3:-:}"

    typeset OIFS
    OIFS="${IFS}"

    IFS="${sep}"
    typeset origdirs
    origdirs=( ${val} )

    IFS="${OIFS}"

    typeset dir
    for dir in "${origdirs[@]}" ; do
       path_${act} PATH "${dir}/bin"
       typeset md
       for md in man share/man ; do
           if [[ -d "${dir}/${md}" ]] ; then
               path_${act} MANPATH "${dir}/${md}"
           fi
       done
    done
}

pathname_flatten

While we're messing about with paths we could write a useful little function to flatten pathnames. As we play with pathnames automatically, particularly with wrappers, we're likely to encounter pathnames with embedded ., .. directories and multiple / separators, eg. ///full//./path/to/../to/bin/.

There's nothing wrong with that, it is a perfectly valid pathname, but if we're going to set some environment variables with it then it is hard to scan and wastes a few bytes.

What do we need to look at?

/ - we don't need multiple ones. We're going to be using an IFS trick again and split the pathname up by the directory separator, /. In the resultant array, the / will disappear but if you have two adjacent separators in your pathname then you will get an empty string element in the array, ie. foo//bar would become a three element array, ( foo '' bar ).

There is one special case, if the pathname is an absolute pathname, ie. begins with a /, then we need to preserve the empty string in the array.

While we're here, if flattening the pathname results in just the empty string in the array (eg., /.. would become ( '' )) then the usual IFS trick of recombining array elements will not give us a /. We'll have to handle that case specially.
. - we can simply junk this element
.. - we need to remove the last element in the array, ie. go back up a directory, unless:
1. it is the first element, eg. ../bin, in which case there is no directory (in the pathname) to go back up
2. you are already at the top of the directory tree, ie. /.. is /

The resultant function looks like:

pathname_flatten ()
{
    typeset val=$1
    typeset sep="${2:-/}"

    typeset OIFS
    OIFS="${IFS}"

    IFS="${sep}"
    typeset origdirs
    origdirs=( ${val} )

    IFS="${OIFS}"

    typeset newdirs
    newdirs=()

    typeset o
    typeset maxo=${#origdirs[*]}
    for (( o=0 ; o < ${maxo} ; o++ )) ; do
       case "${origdirs[o]}" in
       '')
           # ///foo -> ( '' '' '' foo )
           # but we still need the first!
           if [[ $o -eq 0 ]] ; then
               newdirs=( '' )
           fi
           ;;
       .)
           ;;
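       ..)
           typeset last=$(( ${#newdirs[*]} - 1 ))
           if [[ ${last} -lt 0 || "${newdirs[last]}" = .. ]] ; then
               # nothing (or only ..) before us: this .. cannot be flattened
               newdirs=( "${newdirs[@]}" "${origdirs[o]}" )
           elif [[ ${last} -gt 0 || "${newdirs[0]}" != "" ]] ; then
               # remove the last element but never the leading '' (ie. /)
               unset "newdirs[last]"
           fi
           ;;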
       *)
           newdirs=( "${newdirs[@]}" "${origdirs[o]}" )
           ;;
       esac
    done

    # If all we are left with in newdirs is '' (ie /) then the IFS
    # trick fails us, we need to handle this case specially

    if [[ ${#newdirs[*]} -eq 1 && "${newdirs[0]}" = "" ]] ; then
       echo /
    else
       IFS="${sep}"
       echo "${newdirs[*]}"
       IFS="${OIFS}"
    fi
}

[1]	Unfortunately, the shell identifies variable assignments before any expansion takes place, which is why the `${var}="${dirs[*]}"` trick did not work earlier.
