Caffe: Understanding the deconvolution in FCN-32.

Created on 5 Nov 2016  ·  4 comments  ·  Source: BVLC/caffe

Hello,

I am trying to understand the design of the FCN-32 model and especially the parameters of the
deconvolutional layer (convolution transposed).

Specifically, why was the stride chosen to be 32 and the kernel size 64?

For example, suppose the input image is of size 768 by 1024.
After the input is processed by all the pooling layers, we get 24 by 32 subsampled predictions.

The goal is then to go from those subsampled predictions back to the input image size.
Using the equation from the chapter "No zero padding, non-unit strides, transposed" from here, with stride 32 and kernel 64, I get an output of size 800 by 1056. Is this how it is actually done in the current implementation?
I understand that afterwards we can simply crop that output to the original input size.
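As a sanity check on the numbers above, here is a minimal sketch of the transposed-convolution output-size formula from that chapter, o = (i − 1)·s − 2·p + k (the function name is my own, for illustration):

```python
def deconv_output_size(i, k, s, p=0):
    """Spatial output size of a transposed convolution.

    i: input size, k: kernel size, s: stride, p: padding.
    With no padding this reduces to o = (i - 1) * s + k.
    """
    return (i - 1) * s - 2 * p + k

# 24 x 32 subsampled predictions, stride 32, kernel 64:
h = deconv_output_size(24, 64, 32)   # -> 800
w = deconv_output_size(32, 64, 32)   # -> 1056
print(h, w)
```

So the 768 by 1024 input indeed comes back as 800 by 1056 before cropping.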

My main question is: how did the authors come up with the stride-32 and kernel-64 parameters?
I know that after all the pooling layers the input gets downsampled by a factor of 32, but why is the kernel size 64?
Is it because the paper initializes the filters to a bilinear interpolation filter, and they wanted the kernel to cover the 4 closest points?
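For reference, a minimal sketch of the bilinear-interpolation filter initialization described in the paper (this mirrors the weight-filler logic used for FCN deconvolution layers; the function name here is my own):

```python
import numpy as np

def bilinear_kernel(size):
    """2D bilinear upsampling filter of shape (size, size).

    For an even size k, the center falls between pixels and each
    output point is interpolated from its 4 nearest inputs.
    """
    factor = (size + 1) // 2
    if size % 2 == 1:
        center = factor - 1
    else:
        center = factor - 0.5
    og = np.ogrid[:size, :size]
    return ((1 - abs(og[0] - center) / factor) *
            (1 - abs(og[1] - center) / factor))

# kernel 64 pairs with stride 32: factor = 32, i.e. a 32x upsampling filter
filt = bilinear_kernel(64)
print(filt.shape)  # (64, 64)
```

With stride s, a bilinear upsampling filter has support 2s, which is one common explanation for the kernel = 2 × stride choice.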

Sorry for posting this here; I just couldn't find the answer to this question in the paper or anywhere else.


Questions like these should be asked on the Caffe user mailing list or on another website, such as http://stackoverflow.com/ or http://stats.stackexchange.com/.

From https://github.com/BVLC/caffe/blob/master/CONTRIBUTING.md:

_Please do not post usage, installation, or modeling questions, or other requests for help to Issues._
Use the caffe-users list instead. This helps developers maintain a clear, uncluttered, and efficient view of the state of Caffe.

@warmspringwinds thank you very much!
