Pyradiomics: Multithreading not working properly

Created on 4 Jan 2020  ·  7 comments  ·  Source: AIM-Harvard/pyradiomics

Dear all, thank you for this nice tool; I really appreciate it.

I'd like to ask a question, since I cannot find an answer elsewhere:

I'm working with a batch of about 400 image+mask combinations (listed in a CSV file), and I'm using a small computer for the computation. Since I want to scan about 50 binWidth/binCount settings, I'm trying to optimize the extraction by running it in parallel.

Specifically, I wrote a bash script to scan the different configurations, and I run it with the `-j 24` parameter (for 24 workers). I have 2x Xeon 6-core with Hyper-Threading (24 logical cores in total) and about 200 GB of RAM.

However, I noticed that the CPU is at 100% only a fraction of the time... is this normal? Is there a bottleneck, i.e. some part of the code that cannot be executed in parallel?

To be clear, I do NOT start the computations with different parameters in parallel, i.e. I start the computation with the next parameter only once the previous batch has finished.
Example:

```shell
bC="8 12 16 22 32 64 43 128 171 256 341 512 682 1024"
for i in $bC
do
  pyradiomics ./input/input_dyn.csv --param ./json/RadiomicsLogicParams_all.json \
    --setting "binCount:${i}" --setting "normalize:false" \
    -o ./output/all_stackedOut.csv -f csv -j 24 -v 4
done
```

[Screenshot: CPU usage, 2020-01-04 at 13:20:14]

Thank you for your attention!!

All 7 comments

> However, I noticed that the CPU is at 100% only a fraction of the time... is this normal? Is there a bottleneck, i.e. some part of the code that cannot be executed in parallel?

Yes, it's normal. This is unrelated to PyRadiomics.

Thank you for the cryptic response. Is there a way to solve it? Is there a reason?
Is it common?

Yes to all three of those questions. How's that for cryptic?

I'm using Ubuntu 19.04. Is it related to the OS? If so, I can close the thread.
I've already checked the UEFI for power-saving settings to disable...

What could I do? Thank you, and happy new cryptic year.

It's related to how CPUs and software work in general.

100% CPU core utilisation is uncommon because processes typically need to wait for each other. Even on a single-core machine you probably wouldn't see a persistent 100%, because code has to wait for memory access, the network, etc. And in the case of your Python script, the kernel is probably also scheduling the cores to other processes (otherwise your screen would freeze).

In summary: just because you don't see 100% utilisation doesn't mean the code isn't running in parallel. Lastly, depending on the code and the problem, the benefit of parallelization rarely scales linearly with thread count.

So you're telling me that if I start 24 different pyradiomics instances with `-j 1`, I'll obtain the same effect, right?
I'm not so sure about that; I suspect it's something related to disk access for reading/writing, or something else...
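For reference, the "many single-threaded instances" idea mentioned above could be sketched as follows. This is only an illustration, not a tested recipe: the `input_part_*.csv` files are hypothetical and would have to be created beforehand by splitting the batch CSV.

```shell
# Sketch: one background single-threaded pyradiomics process per
# pre-split input file, then wait for all of them to finish.
# The input_part_*.csv files are hypothetical (e.g. made with split).
for f in ./input/input_part_*.csv
do
  pyradiomics "$f" --param ./json/RadiomicsLogicParams_all.json \
    -o "./output/$(basename "$f" .csv)_out.csv" -f csv -j 1 &
done
wait  # block until every background extraction has finished
```

The `&`/`wait` pattern keeps all instances independent, so one slow case cannot stall the others the way a shared worker pool can.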

In the end, I decided to use a manual check that launches a new pyradiomics extraction whenever total CPU usage drops below 70%.
Now I always have reasonable CPU usage (>50% on all the logical cores, i.e. effectively full utilization).

```shell
# Block until total CPU usage (user + system) has stayed below 70%
# for 5 consecutive 5-second samples.
wait4CpuReady () {
  cpuUsed=5 # debounce counter
  while [ ${cpuUsed} -gt 0 ]
  do
    sleep 5
    # top reports percentages as floats; truncate to an integer
    # so the shell's -ge comparison doesn't fail
    if [ "$(top -b -n1 | grep "Cpu(s)" | awk '{print int($2 + $4)}')" -ge 70 ]
    then
      cpuUsed=5 # CPU busy again: reset the debouncer
    else
      ((cpuUsed--))
      echo "CPU not used, debouncer = ${cpuUsed}"
    fi
  done
}

# binWidth values to scan: 1-9, then 10-28 in steps of 2, then 30-60 in steps of 3
bW="$(seq -s ' ' 1 1 9) $(seq -s ' ' 10 2 28) $(seq -s ' ' 30 3 60)"

for i in $bW
do
  wait4CpuReady

  echo -e "\e[41m#########################################################################################\e[0m"
  echo -e "\e[41m#### Starting computing all features for all_times and SUB - NONnorm and binWidth $i ####\e[0m"
  echo -e "\e[41m#########################################################################################\e[0m"

  pyradiomics ./input/input_all_msub.csv \
    --param ./json/RadiomicsLogicParams_all_noWarningGLCM.json \
    --setting "binWidth:${i}.0" --setting "normalize:false" \
    -o ./output/manualOut/all_allTmSub_stackedOut_bw${i}.csv -f csv -j 3 -v 4 &
done
```
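A simpler alternative to polling CPU usage is to let `xargs -P` cap the number of concurrent extractions: a new instance starts as soon as a slot frees up, with no manual check. This is a sketch under the same paths and parameters as the script above, not a tested replacement; the concurrency level of 8 is an arbitrary choice.

```shell
# Run at most 8 pyradiomics instances at a time; xargs launches the
# next binWidth value as soon as a running instance finishes.
bW="$(seq -s ' ' 1 1 9) $(seq -s ' ' 10 2 28) $(seq -s ' ' 30 3 60)"
printf '%s\n' $bW | xargs -P 8 -I{} \
  pyradiomics ./input/input_all_msub.csv \
    --param ./json/RadiomicsLogicParams_all_noWarningGLCM.json \
    --setting "binWidth:{}.0" --setting "normalize:false" \
    -o "./output/manualOut/all_allTmSub_stackedOut_bw{}.csv" -f csv -j 3 -v 4
```

Unlike the CPU-polling approach, this bounds the number of processes rather than the load, so it avoids overshooting when several extractions start in the same polling window.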

[Screenshot: CPU usage, 2020-01-05 at 21:12:25]
