Pytorch: Bilinear interpolation behavior inconsistent with TF, CoreML and Caffe

Created on 17 Aug 2018  ·  4 Comments  ·  Source: pytorch/pytorch

Issue description

While comparing and transferring models between Caffe, TF and Pytorch, I found that the output of bilinear interpolation differs between all three. Caffe uses depthwise transposed convolutions instead of a straightforward resize, so that behavior is easy to reimplement in both TF and Pytorch.
However, the outputs of TF and Pytorch also differ with align_corners=False, which is the default for both.

Code example

import cv2
import numpy as np
import tensorflow as tf  # TF 1.x API (tf.Session, tf.image.resize_bilinear)
import torch
import torch.nn as nn

# Load the image as RGB, resize to 256x256, scale to [0, 1] (NHWC layout).
img = cv2.resize(cv2.imread('./lenna.png')[:, :, ::-1], (256, 256))
img = img.reshape(1, 256, 256, 3).astype('float32') / 255.

# TensorFlow: bilinear upsampling to 512x512.
output_size = [512, 512]
output = tf.image.resize_bilinear(tf.convert_to_tensor(img), output_size, align_corners=True)
with tf.Session() as sess:
    values = sess.run([output])
out_tf = values[0].astype('float32')[0]

# Pytorch: same input in NCHW layout (already scaled to [0, 1] above).
nimg = np.ascontiguousarray(img.transpose(0, 3, 1, 2))
out_pt = nn.functional.interpolate(torch.from_numpy(nimg),
                                   scale_factor=2,
                                   mode='bilinear',
                                   align_corners=True)
out_pt = out_pt.numpy().transpose(0, 2, 3, 1)[0]

print(np.max(np.abs(out_pt - out_tf)))
# output 5.6624413e-06

But

import cv2
import numpy as np
import tensorflow as tf  # TF 1.x API (tf.Session, tf.image.resize_bilinear)
import torch
import torch.nn as nn

# Load the image as RGB, resize to 256x256, scale to [0, 1] (NHWC layout).
img = cv2.resize(cv2.imread('./lenna.png')[:, :, ::-1], (256, 256))
img = img.reshape(1, 256, 256, 3).astype('float32') / 255.

# TensorFlow: bilinear upsampling to 512x512, this time with align_corners=False.
output_size = [512, 512]
output = tf.image.resize_bilinear(tf.convert_to_tensor(img), output_size, align_corners=False)
with tf.Session() as sess:
    values = sess.run([output])
out_tf = values[0].astype('float32')[0]

# Pytorch: same input in NCHW layout (already scaled to [0, 1] above).
nimg = np.ascontiguousarray(img.transpose(0, 3, 1, 2))
out_pt = nn.functional.interpolate(torch.from_numpy(nimg),
                                   scale_factor=2,
                                   mode='bilinear',
                                   align_corners=False)
out_pt = out_pt.numpy().transpose(0, 2, 3, 1)[0]

print(np.max(np.abs(out_pt - out_tf)))
# output 0.22745097

Output diff * 10: [image]

The output of CoreML is consistent with TF, so it seems that there is a bug in the Pytorch implementation of bilinear interpolation with align_corners=False.

The diff is reproducible on both CPU and CUDA (cuDNN 7.1, CUDA 9.1).

All 4 comments

Thanks for the report. We will look into this!

I've run a few experiments and it seems that Pytorch with align_corners=False is consistent with python-opencv, PIL, and skimage with mode='edge'. I'll take a closer look at CoreML today.

For the Pytorch implementation, here's the logic that computes the src_idx <-> dst_idx mapping.
https://github.com/pytorch/pytorch/blob/master/aten/src/THNN/generic/linear_upsampling.h#L22
For the TF implementation, the scale is calculated in the same way. https://github.com/tensorflow/tensorflow/blob/f66daa493e7383052b2b44def2933f61faf196e0/tensorflow/core/kernels/image_resizer_state.h#L41
But the src_idx <-> dst_idx mapping is different. https://github.com/tensorflow/tensorflow/blob/6795a8c3a3678fb805b6a8ba806af77ddfe61628/tensorflow/core/kernels/resize_bilinear_op.cc#L85
I think Pytorch takes the pixel center into account when scaling, while TF doesn't.
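
To make the difference concrete, here is a minimal 1D sketch in pure Python of the two mappings as I understand them for align_corners=False: with the pixel-center offset, src = (dst + 0.5) * scale - 0.5 (clamped at 0), and without it, src = dst * scale, where scale = in_size / out_size. The function names source_coords and resize_linear_1d are made up for illustration and are not part of either library; this is not a copy of either implementation.

```python
def source_coords(in_size, out_size, half_pixel):
    """Map each output index to a (fractional) source coordinate."""
    scale = in_size / out_size
    coords = []
    for dst in range(out_size):
        if half_pixel:
            # Pixel-center mapping (the Pytorch-style behavior described above).
            src = max((dst + 0.5) * scale - 0.5, 0.0)
        else:
            # Plain scaling without the half-pixel offset (the TF-style behavior).
            src = dst * scale
        coords.append(src)
    return coords


def resize_linear_1d(values, out_size, half_pixel):
    """Linearly interpolate a list of floats to out_size samples."""
    last = len(values) - 1
    out = []
    for src in source_coords(len(values), out_size, half_pixel):
        lo = min(int(src), last)   # left neighbor
        hi = min(lo + 1, last)     # right neighbor, clamped at the edge
        frac = src - lo            # weight of the right neighbor
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


row = [0.0, 1.0]
print(resize_linear_1d(row, 4, half_pixel=True))   # [0.0, 0.25, 0.75, 1.0]
print(resize_linear_1d(row, 4, half_pixel=False))  # [0.0, 0.5, 1.0, 1.0]
```

Even on a two-pixel row the two mappings disagree at the interior samples, which is the kind of mismatch reported in the issue.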

So do you think there is any workaround in Pytorch to make its interpolation consistent with TensorFlow's?
I thought it might be possible to use torch.nn.functional.grid_sample() in Pytorch.
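
One possible direction, sketched here without grid_sample, is to do the bilinear resize by hand with the mapping src = dst * (in_size / out_size) discussed above. The helper name resize_bilinear_no_offset is hypothetical, and I have not verified this against actual TF output, so treat it as a starting point rather than a drop-in replacement.

```python
import torch


def resize_bilinear_no_offset(x, out_h, out_w):
    """Bilinear resize of an NCHW tensor using src = dst * (in / out).

    Hypothetical helper: it assumes the no-half-pixel-offset mapping
    described in this thread and has not been checked against TF.
    """
    in_h, in_w = x.shape[2], x.shape[3]

    def axis_weights(in_size, out_size):
        # Source coordinate for every output index along one axis.
        src = torch.arange(out_size, dtype=x.dtype, device=x.device) * (in_size / out_size)
        lo = src.floor().long().clamp(max=in_size - 1)  # lower neighbor
        hi = (lo + 1).clamp(max=in_size - 1)            # upper neighbor, clamped
        frac = src - lo.to(x.dtype)                     # weight of the upper neighbor
        return lo, hi, frac

    y0, y1, fy = axis_weights(in_h, out_h)
    x0, x1, fx = axis_weights(in_w, out_w)

    # Interpolate along the height axis, then along the width axis.
    fy = fy.view(1, 1, -1, 1)
    rows = x[:, :, y0, :] * (1 - fy) + x[:, :, y1, :] * fy
    return rows[:, :, :, x0] * (1 - fx) + rows[:, :, :, x1] * fx


x = torch.arange(4.0).view(1, 1, 2, 2)
print(resize_bilinear_no_offset(x, 4, 4)[0, 0])
```

Because the interpolation is separable, the helper computes per-axis indices and weights once and applies them with plain tensor indexing, which works on both CPU and CUDA tensors.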
