Bash wait command ignoring specified process IDs

DIRECTORIES=( group1 group2 group3 group4 group5 )
PIDS=()

function GetFileSpace() {
    shopt -s nullglob
    TARGETS=(/home/${1}/data/*)
    for ITEM in "${TARGETS[@]}"
    do
            # Here we launch du on a user in the background
            # And then add their process id to PIDS
            du -hs $ITEM >> ./${1}_filespace.txt &
            PIDS+=($!)
    done
}

# Here I launch function GetFileSpace for each group.
for GROUP in "${DIRECTORIES[@]}"
do
    echo $GROUP
    # Store standard error to collect files with bad permissions
    GetFileSpace $GROUP 2>> ./${GROUP}_permission_denied.txt &
done

for PID in "${PIDS[@]}"
do
    wait $PID
done

echo "Formatting Results..."
# The script continues after this, but the rest isn't relevant.

I am trying to write a script that monitors storage volume and file permissions of individual users across 5 groups.

|_home          # For additional reference to understand my code,
  |_group1      # directories are laid out like this
  | |_data
  |   |_user1
  |   |_user2
  |   |_user3
  |
  |_group2
    |_data
      |_user4
      |_user5

First, I use a loop to iteratively launch a function, GetFileSpace, for each group in DIRECTORIES. This function then runs du -sh for each user found within a group.

To speed up the whole process, I launch each instance of GetFileSpace, and the du -sh subprocesses it starts, in the background with &. That way everything runs more or less simultaneously, which takes much less time.

My issue is that after I launch these processes I want my script to wait for every background instance of du -sh to finish before moving on to the next step.

To do this, I collect the process ID of each task in the array PIDS right after launching it. Then I loop through the array and wait for each PID until all subprocesses finish. Unfortunately, this doesn't seem to work: the script correctly launches du -sh for each user, but then immediately moves on to the next step, which breaks it.
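For comparison, here is a minimal sketch of the collect-and-wait pattern described above working when a single shell both starts the jobs and waits for them. The subshells with sleep are stand-ins for du, and exit code 3 is an arbitrary failure chosen only to show that wait hands back each job's own exit status.

```shell
#!/usr/bin/env bash
# Sketch: collect PIDs with $! and wait on each one from the SAME shell.
# The ( sleep; exit ) subshells stand in for du; exit code 3 is arbitrary.
pids=()
for code in 0 3 0; do
    ( sleep 1; exit "$code" ) &   # start a background job
    pids+=("$!")                  # $! is the PID of the job just started
done

statuses=()
for pid in "${pids[@]}"; do
    status=0
    wait "$pid" || status=$?      # blocks until that child exits
    statuses+=("$status")         # wait returns the child's exit status
done

echo "all ${#pids[@]} jobs finished, statuses: ${statuses[*]}"
```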

My question, then, is: why does my script not wait for my background tasks to finish, and how can I implement this behavior?

As a final note, I have tried several other methods to accomplish this from this SO post, but haven’t been able to get them working either.

>Solution :

GetFileSpace ... &

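You are running the whole function as a subprocess, so the PIDS+=($!) inside it only updates that subprocess's copy of the array. In the parent shell PIDS stays empty, the wait loop has nothing to wait for, and the script immediately moves on to the next step.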
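A small sketch of the effects at play, using a hypothetical helper add_pid and sleep in place of du. A function started with & runs in a subshell, so its PIDS+=() never reaches the parent; and even if a PID were passed back, the parent cannot wait for a process that is not its own child (bash's wait returns 127 in that case).

```shell
#!/usr/bin/env bash
# Sketch: why the question's PIDS array stays empty, and why waiting on a
# grandchild fails. add_pid is a hypothetical helper; sleep stands in for du.

add_pid() {
    sleep 1 &
    PIDS+=("$!")
}

# 1) Function started with &: it runs in a subshell, so the PIDS+=()
#    inside it never reaches this (parent) shell.
PIDS=()
add_pid &
wait "$!"                  # waits for the subshell, not for its sleep
bg_count=${#PIDS[@]}       # still 0

# 2) Function called in the foreground: PIDS is updated in this shell.
PIDS=()
add_pid
fg_count=${#PIDS[@]}       # 1
wait "${PIDS[@]}"

# 3) A PID passed back from a subshell belongs to the subshell's child,
#    not to this shell, so wait refuses it and returns 127.
grandchild=$(sleep 1 >/dev/null & echo "$!")
gc_status=0
wait "$grandchild" 2>/dev/null || gc_status=$?

echo "background: $bg_count, foreground: $fg_count, grandchild wait: $gc_status"
```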

Do not run it in the background.

GetFileSpace ...   # no & on the end.
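Applied to the question's script, the fix could look like the sketch below. Only the GetFileSpace call loses its &; the du calls are still backgrounded, and their PIDs now land in the parent's PIDS array. To keep it runnable anywhere, it assumes a throwaway directory from mktemp -d stands in for /home, with made-up group and user names, and it writes its output files there instead of in ./.

```shell
#!/usr/bin/env bash
# Sketch of the question's script with the one fix from the answer:
# GetFileSpace runs in the foreground, so PIDS+=() updates this shell.
# Assumption: a throwaway tree under $base stands in for /home so the
# sketch can run anywhere; the group/user names are made up.
shopt -s nullglob

base=$(mktemp -d)
DIRECTORIES=( group1 group2 )
for g in "${DIRECTORIES[@]}"; do
    mkdir -p "$base/$g/data/user_a" "$base/$g/data/user_b"
done

PIDS=()

GetFileSpace() {
    local item
    for item in "$base/$1"/data/*; do
        # Only du goes to the background; its PID is recorded here.
        du -hs "$item" >> "$base/${1}_filespace.txt" &
        PIDS+=("$!")
    done
}

for GROUP in "${DIRECTORIES[@]}"; do
    echo "$GROUP"
    # No & here. The 2>> still applies to the du jobs started inside.
    GetFileSpace "$GROUP" 2>> "$base/${GROUP}_permission_denied.txt"
done

for PID in "${PIDS[@]}"; do
    wait "$PID"
done

echo "Formatting Results... (${#PIDS[@]} du jobs finished)"
```

The 2>> on the function call still captures du's permission errors, because the background jobs inherit the function's redirected stderr.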

Notes: Consider using xargs or GNU parallel. Prefer lower case for script local variables. Quote variable expansions. Use shellcheck to check for such errors.

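directories=( group1 group2 group3 group4 group5 )

work() {
   # $1 = group, $2 = user directory; du's line is appended in one write
   tmp=$(du -hs "$2" 2>> "./${1}_permission_denied.txt")
   echo "$tmp" >> "./${1}_filespace.txt"
}
export -f work
shopt -s nullglob
for i in "${directories[@]}"; do
   for item in /home/"$i"/data/*; do
      printf '%s\0%s\0' "$i" "$item"   # NUL-separated pairs survive spaces in names
   done
done | xargs -0 -n2 -P"$(nproc)" bash -c 'work "$@"' _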
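If you would rather keep one background job per group, another option is for each backgrounded GetFileSpace to wait for its own du children, with the top level calling a bare wait for the per-group jobs. A bare wait with no arguments waits for all of the current shell's children, and each group's du processes are children of that group's subshell. This is a sketch that assumes a throwaway mktemp -d tree stands in for /home, with hypothetical group and user names.

```shell
#!/usr/bin/env bash
# Alternative sketch: keep one background job per group, but have each
# backgrounded GetFileSpace wait for its OWN du children, and have the
# top level wait for the per-group jobs with a bare `wait`.
# Assumption: a throwaway tree under $base stands in for /home.
shopt -s nullglob

base=$(mktemp -d)
DIRECTORIES=( group1 group2 group3 )
for g in "${DIRECTORIES[@]}"; do
    mkdir -p "$base/$g/data/user_a" "$base/$g/data/user_b"
done

GetFileSpace() {
    local item
    for item in "$base/$1"/data/*; do
        du -hs "$item" >> "$base/${1}_filespace.txt" &
    done
    wait    # no arguments: wait for every du started by this subshell
}

for GROUP in "${DIRECTORIES[@]}"; do
    GetFileSpace "$GROUP" 2>> "$base/${GROUP}_permission_denied.txt" &
done
wait        # wait for every per-group subshell started by this shell

echo "Formatting Results..."
```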

Note that when the job is I/O-bound, running multiple processes (especially with no upper bound) doesn't really help much if everything is on one disk.
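If you stay in plain bash rather than using xargs -P, one way to put an upper bound on the number of concurrent du jobs is wait -n, which waits for any single background job to finish (it needs bash 4.3 or newer). A minimal sketch; max_jobs=2 is an arbitrary example value, and a mktemp -d tree stands in for /home/group1/data.

```shell
#!/usr/bin/env bash
# Sketch: cap how many du jobs run at once, in plain bash.
# Requires bash 4.3+ for `wait -n` (wait for any one job to finish).
# Assumptions: max_jobs=2 is arbitrary, and a throwaway tree under
# $base stands in for /home/group1/data.
shopt -s nullglob

base=$(mktemp -d)
for u in a b c d e; do
    mkdir -p "$base/group1/data/user_$u"
done

max_jobs=2
running=0   # jobs started but not yet collected by wait -n
started=0
for item in "$base"/group1/data/*; do
    if (( running >= max_jobs )); then
        wait -n || true        # block until any one job exits
        running=$((running - 1))
    fi
    du -hs "$item" >> "$base/group1_filespace.txt" &
    running=$((running + 1))
    started=$((started + 1))
done
wait    # collect whatever is still running

echo "ran $started du jobs, at most $max_jobs at a time"
```

Counting jobs in a variable, rather than parsing jobs output, keeps the cap conservative: each wait -n accounts for at least one finished job, so the loop never starts more than max_jobs at once.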
