Hi! I have taken on a project that was previously started by someone who was using pyradiomics v1.2. Since I had just started, I opted for the latest version (v1.3). To make sure I was on the right track, I compared the features from my runs with those obtained by the person working with v1.2, and I found some significant discrepancies in the features calculated from the wavelet-filtered images. I have been able to pinpoint two issues:
Issue 1:
In version 1.3, I found that some of the decomposition names were not attributed to the correct decomposition image. I discovered this when I saw an almost perfect symmetry with version 1.2.
This is specific to the wavelet decompositions whose names are not palindromes (e.g. “LLL” features were the same across versions whereas HHL(v1.3) = LHH(v.1.2), LLH (v1.3) = HHL (v.1.2), etc). Digging a bit into the getWaveletImage function in imageoperations.py, I saw that there was the addition of the following lines in v1.3:
axes = {2, 1, 0}  # set
if kwargs.get('force2D', False):
    axes -= {kwargs.get('force2Ddimension', 0)}  # set
approx, ret = _swt3(inputImage, kwargs.get('wavelet', 'coif1'), kwargs.get('level', 1), kwargs.get('start_level', 0), axes=tuple(axes))
From my understanding of PyWavelets, axes is the variable whose order determines the order of the letters in the decomposition name. Why is this a set? It seems to me that its order would then be unpredictable when it is converted to a tuple for the _swt3 call, which could change the decomposition names and feature labels. When I change the axes variable to a list, the issue goes away.
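To illustrate why the axis order matters for the labels, here is a minimal sketch. The helper `decomposition_name` and the per-axis filter mapping are hypothetical and only mimic the naming idea; they are not how pyradiomics or PyWavelets build the names internally.

```python
def decomposition_name(filter_per_axis, axes):
    """Join the per-axis filter letters in the order given by `axes`."""
    return "".join(filter_per_axis[ax] for ax in axes)


# Hypothetical decomposition: high-pass on axes 2 and 1, low-pass on axis 0.
filter_per_axis = {2: "H", 1: "H", 0: "L"}

intended_axes = [2, 1, 0]    # a list keeps the order exactly as written
from_set = tuple({2, 1, 0})  # a set carries no ordering guarantee

name_intended = decomposition_name(filter_per_axis, intended_axes)
name_reversed = decomposition_name(filter_per_axis, (0, 1, 2))
name_from_set = decomposition_name(filter_per_axis, from_set)

print(name_intended)  # HHL
print(name_reversed)  # LHH
# Depends on the set's iteration order, which the language does not specify;
# for small integers it will often come out ascending rather than as written.
print(name_from_set)
```

If the set iterates in ascending order, every name is read backwards, which would explain why only the palindromic names (such as LLL) agree between versions.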
Issue 2:
When I fixed the above issue, I started getting the same wavelet feature values as v1.2 for some of the images but not others. And this time the discrepancies were random (and significant).
Again looking at the getWaveletImage function in imageoperations.py v1.3, it seems that the images are padded if their dimensions are not divisible by 2. This seems to be a constraint imposed by PyWavelets, where the signal length must be divisible by 2**level (2 in this case). I don't think the same padding occurs in v1.2. Indeed, when I look at the dimensions of the images that I tested, the images that match v1.2 perfectly had even dimensions (no padding in v1.3), whereas those that did not had at least one odd dimension (padding in v1.3). Is it safe to say that v1.3 has the most accurate feature values for the wavelet-filtered images (with the exception of the first issue)?
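As a rough sketch of the divisibility constraint described above, the following computes how many samples each dimension would need so that its length becomes a multiple of 2**level. The helper name `padding_for_swt` is hypothetical and is not part of pyradiomics or PyWavelets.

```python
def padding_for_swt(shape, level=1):
    """Return, per dimension, how many samples must be added so that the
    length becomes a multiple of 2**level."""
    multiple = 2 ** level
    return tuple((-n) % multiple for n in shape)


print(padding_for_swt((64, 64, 32)))           # (0, 0, 0)
print(padding_for_swt((65, 64, 33)))           # (1, 0, 1)
print(padding_for_swt((65, 64, 33), level=2))  # (3, 0, 3)
```

With level 1, only images that have at least one odd dimension need any padding, which matches the pattern of matching and non-matching images described above.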
@sandfis thank you for investigating this!
Two more related commits that happened between 1.2 and 1.3:
cc: @michaelschwier
@sandfis about your notes:
Actually issue 2 is most definitely related to 7ff05482d3615e26ff7439e8f9044aefcba50a9a as well. Padding for images with odd dimensions was not correct before!
Great, thanks @fedorov! I'll wait for @JoostJM to weigh in too
I have to say, if indeed this is the source of the discrepancy, it is remarkable how much difference that change in padding strategy introduced.
@sandfis do you think you could continue your investigations, and see if reverting the commit referenced above would make the values match? This would be super helpful.
@fedorov @sandfis Sorry, I should have explained this in more detail before, because it is not obvious from the code change alone unless you know what numpy's resize does: the way padding was done before (with resize) literally scrambled the image! The resize adds zeros to the end of the array/matrix buffer (not to the end of each dimension). See example 3, "Enlarging an array", here: https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.ndarray.resize.html#numpy.ndarray.resize
Hence the big difference: wavelet computation for odd-dimension images was unfortunately completely wrong before 7ff05482d3615e26ff7439e8f9044aefcba50a9a.
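A small sketch of the effect, using a 3x3 toy array rather than a real image. The constant zero padding via np.pad is only there for a like-for-like comparison and is an assumption; it is not necessarily the mode pyradiomics uses.

```python
import numpy as np

image = np.arange(9).reshape(3, 3)

# In-place ndarray.resize grows the flat buffer and appends zeros at its end,
# so the rows are re-cut at the new width and the pixels change position.
scrambled = image.copy()
scrambled.resize((4, 4), refcheck=False)

# np.pad appends zeros at the end of each dimension, leaving pixels in place.
padded = np.pad(image, ((0, 1), (0, 1)), mode="constant")

print(scrambled)
# [[0 1 2 3]
#  [4 5 6 7]
#  [8 0 0 0]
#  [0 0 0 0]]
print(padded)
# [[0 1 2 0]
#  [3 4 5 0]
#  [6 7 8 0]
#  [0 0 0 0]]
```

In the resized array, the original second row (3, 4, 5) is split across two rows, so neighbouring voxels no longer line up along the columns.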
@michaelschwier thanks for clarifying this, indeed I completely missed it in https://github.com/Radiomics/pyradiomics/commit/7ff05482d3615e26ff7439e8f9044aefcba50a9a. I only looked at the second relevant commit, and thought that the change is in switching from pad to wrap. Indeed, now that you explain it, it was completely wrong. So I think this should be listed in the "Bug fixes" section of https://github.com/Radiomics/pyradiomics/releases/tag/1.3.0.
Now, what do we do about all those papers that managed to develop novel radiomics signatures predicting disease and eradicating cancer based on the wavelets features as implemented in v1.2.0? 🤣
Thanks @michaelschwier
Indeed we can see that the features line up with v1.2 when reverting 7ff0548
Sorry for the late reply!
I agree with both @fedorov and @michaelschwier, although I think this should be documented in the upcoming release, as this change is in the current master and was made after the release of v1.3.0.
Am I correct to assume v1.3 here means the current master?
Additionally, regarding issue 1: a set is indeed incorrect here. However, changing to a tuple does not work, as you cannot delete elements, which is necessary when extracting features in 2D (in that case the wavelet is also calculated in 2D, which requires removing the between-plane axis).
I fixed this issue by using a list and list.remove(); I will push this bugfix to the master shortly.
see 4027a52
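For readers following along, a minimal sketch of the list-based approach described above. The function name `wavelet_axes` is hypothetical, and this is not the actual code from 4027a52, only an illustration of the idea using the same keyword names as the snippet quoted in the original report.

```python
def wavelet_axes(force2D=False, force2Ddimension=0):
    """Return the axes to decompose, in a fixed order.

    A list preserves the order (2, 1, 0) and still allows removing the
    between-plane axis when features are extracted in 2D.
    """
    axes = [2, 1, 0]
    if force2D:
        axes.remove(force2Ddimension)
    return tuple(axes)


print(wavelet_axes())                                   # (2, 1, 0)
print(wavelet_axes(force2D=True))                       # (2, 1)
print(wavelet_axes(force2D=True, force2Ddimension=2))   # (1, 0)
```

The returned tuple always has the same order, whichever axis is removed.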
Thank you @JoostJM! Indeed I was working with the current master.