Pytorch: [Feature Request] Implement "same" padding for convolution operations?

Created on 25 Nov 2017  ·  59 Comments  ·  Source: pytorch/pytorch

The implementation would be easy, but it could help many people who suffer from the headache of calculating how much padding they need.

cc @ezyang @gchanan @zou3519 @albanD @mruberry

enhancement high priority convolution nn triaged

Most helpful comment

Is there any plan of implementing a similar api in pytorch in the near future? People coming from a tensorflow / keras background will certainly appreciate it.

All 59 comments

This seems worth doing. What is the interface you are proposing? like nn.Conv2d(..., padding="same") ?

Note that if you are looking for the same behavior as TensorFlow, the implementation will not be that straightforward, because the number of pixels to add depends on the input size. See https://github.com/caffe2/caffe2/blob/master/caffe2/proto/caffe2_legacy.proto for reference.
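
As a rough illustration of that size dependence (sketch only; the helper name is made up for the example), the TF-style calculation gives a different total padding for different input sizes once the stride is larger than 1:

# Sketch only: total "SAME"-style padding along one dimension, TF-style.
def tf_same_total_padding(input_size, kernel_size, stride=1, dilation=1):
    effective_kernel = (kernel_size - 1) * dilation + 1
    output_size = (input_size + stride - 1) // stride  # ceil(input_size / stride)
    return max(0, (output_size - 1) * stride + effective_kernel - input_size)

# With stride=2 and a 3x3 kernel, the padding depends on the input size:
print(tf_same_total_padding(7, 3, stride=2))  # 2
print(tf_same_total_padding(8, 3, stride=2))  # 1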

Thank you for indicating the issue and the reference.
To resolve the issue stated by @fmassa, I propose two interfaces.
First, as @soutmith mentioned, the first interface would be like nn.Conv*d(..., padding="same"), calculating the padding on every forward() call.
However, that would be inefficient when the input shape is already known in the initialization phase. Therefore, I suggest an interface like nn.CalcPadConv*d(<almost same parameters as Conv*d>). Using it, a user can calculate the padding from the known width and height at initialization, and pass the output (the shape of the padding) to the padding parameter of nn.Conv2d(...).
I'm not sure whether the second proposal is a premature optimization.
What do you think about these? Is there a better name?

I think the biggest source of inefficiency will come from the fact that we will need to add an F.pad layer before every convolution that requires the padding=same case (because the amount of padding might not be the same on the left and right sides), see for example how TensorFlow has to handle that in the cudnn case. So that means that the nn.CalcPadConv*d would normally be as expensive as an nn.Conv*d(..., padding="same").

This could be made more efficient if we supported different paddings for each side of the convolution (like in Caffe2, so left, right, top, bottom), but cudnn still doesn't support that so we would require the extra padding in those cases.
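
To make the asymmetric case concrete, here is a minimal sketch (the helper name and signature are illustrative only, not a proposed API): symmetric padding can be passed straight to F.conv2d, while asymmetric padding needs an explicit F.pad first, because conv2d (and cuDNN) only accept one padding value per dimension.

import torch.nn.functional as F

def conv2d_with_explicit_pad(x, weight, pad_h, pad_w, **kwargs):
    # pad_h and pad_w are (before, after) amounts for height and width
    (top, bottom), (left, right) = pad_h, pad_w
    if top == bottom and left == right:
        # symmetric: conv2d can apply the padding itself (cuDNN-friendly)
        return F.conv2d(x, weight, padding=(top, left), **kwargs)
    # asymmetric: pad explicitly first; F.pad takes (left, right, top, bottom) for 4-D input
    x = F.pad(x, (left, right, top, bottom))
    return F.conv2d(x, weight, **kwargs)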

Also, I think if we add the padding="same" to nn.Conv*d, we should probably do the same for nn.*Pool*d, right?

I think what bothers me a bit is that users might expect the behavior of padding=same to be equivalent to TF, but they might not be expecting a performance drop.

What do you think?

Why would that be inefficient? Couldn't we just compute the padding at every forward step? The cost should be tiny, so there's no need to optimize that. Maybe I don't fully understand the semantics, but I can't see why F.pad would be needed.

Making padding dependent on the input size is quite bad. We just had an internal discussion about this, with @Yangqing outlining why this is a bad idea for a variety of serialization and efficiency reasons.

@fmassa, what I intended was to calculate a "constant" padding shape in __init__() using nn.CalcPadConv*d(). As you said, this won't work when the calculated padding is odd. Therefore, either an F.pad layer needs to be added, or support in F.conv*d for odd (asymmetric) paddings would help.

EDIT: Then what I suggested should be a function and placed in, say, torch.nn.utils or torch.utils.

As a result, what I suggest is a simple utility function, like (pseudocode):

def calc_pad_conv1d(width, padding='same', check_symmetric=True, ... <params that conv1d has>):
    shape = <calculate padding>

    assert not check_symmetric or <shape is symmetric>, \
        'Calculated padding shape is asymmetric, which is not supported by conv1d. ' \ 
        'If you just want to get the value, consider using check_symmetric=False.'

    return shape


width = 100  # for example
padding = calc_pad_conv1d(width, ...)
m = nn.Conv1d(..., padding=padding)

Also, the function could be used together with F.pad at the user's discretion.

@qbx2 maybe I don't understand fully your proposal, but if we want to replicate TensorFlow behavior I don't think this is enough.

Here is a snippet of what I think mimics TensorFlow SAME padding (I'm writing it down into the functional interface, so that nn.Conv2d can just call into F.conv2d_same_padding):

def conv2d_same_padding(input, weight, bias=None, stride=1, dilation=1, groups=1):
  input_rows = input.size(2)
  filter_rows = weight.size(2)
  effective_filter_size_rows = (filter_rows - 1) * dilation[0] + 1
  out_rows = (input_rows + stride[0] - 1) // stride[0]
  padding_needed = max(0, (out_rows - 1) * stride[0] +
                       effective_filter_size_rows - input_rows)
  padding_rows = max(0, (out_rows - 1) * stride[0] +
                     (filter_rows - 1) * dilation[0] + 1 - input_rows)
  rows_odd = (padding_rows % 2 != 0)
  # same for padding_cols

  if rows_odd or cols_odd:
    input = F.pad(input, [0, int(cols_odd), 0, int(rows_odd)])

  return F.conv2d(input, weight, bias, stride,
                  padding=(padding_rows // 2, padding_cols // 2),
                  dilation=dilation, groups=groups)

It was mostly copy-pasted from TensorFlow code in here and here.

As you can see, there are a lot of hidden things going on there, and that's why I think it might not be worth adding a padding='same'. But not replicating the SAME behavior of TensorFlow isn't ideal either.

Thoughts?

@fmassa Yes, you're right. It may be inefficient to calculate the padding on every forward().

However, my proposal is NOT to calculate the padding on every forward() call. A researcher (developer) may know the sizes of the images fed to nn.Conv2d before runtime. And if he/she wants 'same' padding, he/she can use the function to calculate the padding required to mimic 'SAME'.

For example, consider the case where a researcher has images of 200x200, 300x300, and 400x400. Then he/she can calculate the paddings for the three cases in the initialization phase and just pass the images to F.pad() with the corresponding padding. Or he/she can just change the padding field of nn.Conv2d before the forward() call. Refer to this:

>>> import torch
>>> import torch.nn as nn
>>> from torch.autograd import Variable
>>> m = nn.Conv2d(1,1,1)
>>> m(Variable(torch.randn(1,1,2,2))).shape
torch.Size([1, 1, 2, 2])
>>> m.padding = (1, 1)
>>> m(Variable(torch.randn(1,1,2,2))).shape
torch.Size([1, 1, 4, 4])

Yes, I just want to add the "padding calculating utility function" in pytorch core.

When the researcher wants the padding to depend on each input image size, he/she can combine the function with F.pad() before passing the image to nn.Conv2d. I want to let the code writer decide whether or not to pad the inputs on every forward() call.

Is there any plan of implementing a similar api in pytorch in the near future? People coming from a tensorflow / keras background will certainly appreciate it.

So, a basic padding calculation strategy (which does not give the same results as TensorFlow, but produces similar shapes) is to have

def _get_padding(padding_type, kernel_size):
    assert padding_type in ['SAME', 'VALID']
    if padding_type == 'SAME':
        return tuple((k - 1) // 2 for k in kernel_size)
    return tuple(0 for _ in kernel_size)

Is that what you have in mind @im9uri ?

It's similar to what I had in mind, but as you mentioned previously the calculation gets complicated with stride and dilation.

Also having such an api in other convolution operations such as ConvTranspose2d would be great.

I think that "sliding-window operators" should all support asymmetric padding.

About the "same" argument...
@soumith Can you explain why making the padding dependent on the input size is bad, please?
If that's a problem, a pragmatic solution could be to require stride == 1 when using "same". For stride == 1, the padding doesn't depend on the input size and can be computed a single time; the constructor should raise a ValueError if the user attempts to use padding='same' with stride > 1 (see the sketch after the list below).

I know, it's not the cleanest solution but the constraint sounds reasonable enough to me given that:

  1. the original semantics of the label "same" were introduced for non-strided convolutions: the output has the _same_ size as the input; of course, this is not true in TensorFlow for stride > 1, which makes the use of the word "same" a bit misleading IMO;
  2. it would cover 99% of the cases where one wants to use "same"; I can barely imagine a case where someone really needs the TensorFlow behavior for stride > 1, while if we give "same" its original semantics, it of course makes no sense to use a strided convolution when you want the output to have the same size as the input.
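
A minimal sketch of that constraint, assuming an int, odd kernel size for brevity (Conv2dSameStride1 is a hypothetical wrapper, not an existing PyTorch class):

import torch.nn as nn

class Conv2dSameStride1(nn.Conv2d):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1,
                 padding='same', dilation=1, **kwargs):
        if padding == 'same':
            if stride != 1:
                raise ValueError("padding='same' is only supported for stride == 1")
            # for stride == 1 the padding is static and input-independent
            padding = dilation * (kernel_size - 1) // 2
        super().__init__(in_channels, out_channels, kernel_size, stride=stride,
                         padding=padding, dilation=dilation, **kwargs)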

conv2d documentation gives the explicit formulas for output sizes. Equating e.g. Hout with Hin one can solve for the padding:

def _get_padding(size, kernel_size, stride, dilation):
    padding = ((size - 1) * (stride - 1) + dilation * (kernel_size - 1)) //2
    return padding

Since same padding means padding = (kernel_size - stride) // 2, what if padding="same" is introduced so that, when specified, it automatically reads the kernel size and stride (both of which are already given to nn.Conv2d) and applies the padding accordingly?

Here is a very simple Conv2d layer with same padding for reference. It only supports square kernels and stride=1, dilation=1, groups=1.

class Conv2dSame(torch.nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, bias=True, padding_layer=torch.nn.ReflectionPad2d):
        super().__init__()
        ka = kernel_size // 2
        kb = ka - 1 if kernel_size % 2 == 0 else ka
        self.net = torch.nn.Sequential(
            padding_layer((ka,kb,ka,kb)),
            torch.nn.Conv2d(in_channels, out_channels, kernel_size, bias=bias)
        )
    def forward(self, x):
        return self.net(x)

c = Conv2dSame(1,3,5)
print(c(torch.rand((16,1,10,10))).shape)

# torch.Size([16, 3, 10, 10])

If this is still being evaluated for being added to PyTorch, then regarding the tradeoffs between complexity / inefficiency vs. ease-of-use for developers:

In the road to 1.0 blog post, it states:

PyTorch’s central goal is to provide a great platform for research and hackability. So, while we add all these [production-use] optimizations, we’ve been working with a hard design constraint to never trade these off against usability.

Anecdotally, I come from a background of using Keras as well as the original tf.layers / estimator APIs. All have support for same padding. I'm currently reimplementing a convnet I had originally written in TF with PyTorch, and the fact that I've had to build in the arithmetic for zero-padding myself has cost me about a half-day of time.

If the "central goal" really is focused on usability, than I'd argue that even if there's an efficiency hit to computing zero-padding on every forward pass (as mentioned above), the time saved in terms of developer efficiency and maintainability (e.g. not having to write custom code to compute zero padding) may be worth the tradeoff. Thoughts?

I would use this feature

It doesn't make sense to me why an optional padding=SAME API can't be offered. If someone is willing to incur the additional cost of padding then let them do so. For many researchers, quick prototyping is a requirement.

Yes, if someone can please add and approve this, it would be great.

Definitely add this, conner wants it.

Does pytorch support it now? Can it use the same approach as the first group in VGG, i.e. set padding = (kernel_size-1)/2? The VGG network keeps the output size unchanged in the first group, and then uses stride to resize the feature map. Does that sound OK?

Here is one example of calling a same-padding conv2d, from deepfakes:

# modify conv2d function to use same padding
# code referred to @fmassa in 'https://github.com/pytorch/pytorch/issues/3867'
# and tensorflow source code

import torch.utils.data
from torch.nn import functional as F

import math
import torch
from torch.nn.parameter import Parameter
from torch.nn.functional import pad
from torch.nn.modules import Module
from torch.nn.modules.utils import _single, _pair, _triple


class _ConvNd(Module):

    def __init__(self, in_channels, out_channels, kernel_size, stride,
                 padding, dilation, transposed, output_padding, groups, bias):
        super(_ConvNd, self).__init__()
        if in_channels % groups != 0:
            raise ValueError('in_channels must be divisible by groups')
        if out_channels % groups != 0:
            raise ValueError('out_channels must be divisible by groups')
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = kernel_size
        self.stride = stride
        self.padding = padding
        self.dilation = dilation
        self.transposed = transposed
        self.output_padding = output_padding
        self.groups = groups
        if transposed:
            self.weight = Parameter(torch.Tensor(
                in_channels, out_channels // groups, *kernel_size))
        else:
            self.weight = Parameter(torch.Tensor(
                out_channels, in_channels // groups, *kernel_size))
        if bias:
            self.bias = Parameter(torch.Tensor(out_channels))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self):
        n = self.in_channels
        for k in self.kernel_size:
            n *= k
        stdv = 1. / math.sqrt(n)
        self.weight.data.uniform_(-stdv, stdv)
        if self.bias is not None:
            self.bias.data.uniform_(-stdv, stdv)

    def __repr__(self):
        s = ('{name}({in_channels}, {out_channels}, kernel_size={kernel_size}'
             ', stride={stride}')
        if self.padding != (0,) * len(self.padding):
            s += ', padding={padding}'
        if self.dilation != (1,) * len(self.dilation):
            s += ', dilation={dilation}'
        if self.output_padding != (0,) * len(self.output_padding):
            s += ', output_padding={output_padding}'
        if self.groups != 1:
            s += ', groups={groups}'
        if self.bias is None:
            s += ', bias=False'
        s += ')'
        return s.format(name=self.__class__.__name__, **self.__dict__)


class Conv2d(_ConvNd):

    def __init__(self, in_channels, out_channels, kernel_size, stride=1,
                 padding=0, dilation=1, groups=1, bias=True):
        kernel_size = _pair(kernel_size)
        stride = _pair(stride)
        padding = _pair(padding)
        dilation = _pair(dilation)
        super(Conv2d, self).__init__(
            in_channels, out_channels, kernel_size, stride, padding, dilation,
            False, _pair(0), groups, bias)

    def forward(self, input):
        return conv2d_same_padding(input, self.weight, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)


# custom conv2d, because pytorch doesn't have a padding='same' option.
def conv2d_same_padding(input, weight, bias=None, stride=1, padding=1, dilation=1, groups=1):

    # rows (height): dimension 2
    input_rows = input.size(2)
    filter_rows = weight.size(2)
    effective_filter_size_rows = (filter_rows - 1) * dilation[0] + 1
    out_rows = (input_rows + stride[0] - 1) // stride[0]
    padding_rows = max(0, (out_rows - 1) * stride[0] +
                       effective_filter_size_rows - input_rows)
    rows_odd = (padding_rows % 2 != 0)

    # cols (width): dimension 3, note stride[1] / dilation[1]
    input_cols = input.size(3)
    filter_cols = weight.size(3)
    effective_filter_size_cols = (filter_cols - 1) * dilation[1] + 1
    out_cols = (input_cols + stride[1] - 1) // stride[1]
    padding_cols = max(0, (out_cols - 1) * stride[1] +
                       effective_filter_size_cols - input_cols)
    cols_odd = (padding_cols % 2 != 0)

    if rows_odd or cols_odd:
        # put the extra pixel of padding on the right/bottom, as TensorFlow does
        input = pad(input, [0, int(cols_odd), 0, int(rows_odd)])

    return F.conv2d(input, weight, bias, stride,
                    padding=(padding_rows // 2, padding_cols // 2),
                    dilation=dilation, groups=groups)

Just dropping by to say I'd also very much appreciate this. Currently porting a simple model over from tensorflow and the calculations are taking a very long time for me to figure out...

Looks like this thread just died out. Given the number of thumbs up here, it would be really great to add this feature for faster prototyping.

I'll write a proposal for this and we can find someone to implement it.
I'm putting this against the v1.1 milestone.

Thank you, you are awesome! I also filed a separate feature request to make the padding argument accept a 4-tuple. That would allow asymmetric as well as symmetric padding, which is also a good low-cost route to get halfway there.

@soumith It would be nice to have a padding mode SAME in the pytorch.

@soumith How about using a compile type interface ?

model=torch.compile(model,input_shape=(3,224,224))

I made a Conv2d with same padding that supports dilation and strides, based on how TensorFlow does it. This one calculates the padding in real time, though; if you want to precalculate it, just move the padding into __init__() and add an input size parameter.

import torch as tr
import math

class Conv2dSame(tr.nn.Module):

    def __init__(self, in_channels, out_channels, kernel_size, stride=1, dilation=1):
        super(Conv2dSame, self).__init__()
        self.F = kernel_size
        self.S = stride
        self.D = dilation
        self.layer = tr.nn.Conv2d(in_channels, out_channels, kernel_size, stride, dilation=dilation)

    def forward(self, x_in):
        N, C, H, W = x_in.shape
        H2 = math.ceil(H / self.S)
        W2 = math.ceil(W / self.S)
        Pr = (H2 - 1) * self.S + (self.F - 1) * self.D + 1 - H
        Pc = (W2 - 1) * self.S + (self.F - 1) * self.D + 1 - W
        # ZeroPad2d takes (left, right, top, bottom): width padding (Pc) first, then height (Pr)
        x_pad = tr.nn.ZeroPad2d((Pc//2, Pc - Pc//2, Pr//2, Pr - Pr//2))(x_in)
        x_out = self.layer(x_pad)
        return x_out

Ex1:
Input shape: (1, 3, 96, 96)
Filters: 64
Size: 9x9

Conv2dSame(3, 64, 9)

Padded shape: (1, 3, 104, 104)
Output shape: (1, 64, 96, 96)

Ex2:
Same as before, but with stride=2

Conv2dSame(3, 64, 9, 2)

Padded shape = (1, 3, 103, 103)
Output shape = (1, 64, 48, 48)

@jpatts I believe your output shape calculation is wrong, it should be ceil(input_dimension / stride). Integer division in python is floor division - your code should have a different result from tensorflow for e.g. h=w=28, stride=3, kernel_size=1.

Here is a variant that does the calculation beforehand:

def pad_same(in_dim, ks, stride, dilation=1):
    """
    References:
          https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/common_shape_fns.h
          https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/common_shape_fns.cc#L21
    """
    assert stride > 0
    assert dilation >= 1
    effective_ks = (ks - 1) * dilation + 1
    out_dim = (in_dim + stride - 1) // stride
    p = max(0, (out_dim - 1) * stride + effective_ks - in_dim)

    padding_before = p // 2
    padding_after = p - padding_before
    return padding_before, padding_after

If the input dimension is known and not calculated on the fly, this can be used e.g.:

# Pass this to nn.Sequential
def conv2d_samepad(in_dim, in_ch, out_ch, ks, stride, dilation=1, bias=True):
    pad_before, pad_after = pad_same(in_dim, ks, stride, dilation)
    if pad_before == pad_after:
        return [nn.Conv2d(in_ch, out_ch, ks, stride, pad_after, dilation, bias=bias)]
    else:
        return [nn.ZeroPad2d((pad_before, pad_after, pad_before, pad_after)),
                nn.Conv2d(in_ch, out_ch, ks, stride, 0, dilation, bias=bias)]

However, in this case some book-keeping needs to be done for the input dimension (this is the core issue), so if you use the above you may find this useful:

def conv_outdim(in_dim, padding, ks, stride, dilation):
    if isinstance(padding, int) or isinstance(padding, tuple):
        return conv_outdim_general(in_dim, padding, ks, stride, dilation)
    elif isinstance(padding, str):
        assert padding in ['same', 'valid']
        if padding == 'same':
            return conv_outdim_samepad(in_dim, stride)
        else:
            return conv_outdim_general(in_dim, 0, ks, stride, dilation)
    else:
        raise TypeError('Padding can be int/tuple or str=same/valid')


def conv_outdim_general(in_dim, padding, ks, stride, dilation=1):
    # See https://arxiv.org/pdf/1603.07285.pdf, eq (15)
    return ((in_dim + 2 * padding - ks - (ks - 1) * (dilation - 1)) // stride) + 1


def conv_outdim_samepad(in_dim, stride):
    return (in_dim + stride - 1) // stride

@mirceamironenco thanks for pointing that out, I made this quick and dirty and never checked. Updated to use ceiling instead

@harritaylor Agree, this feature would definitely simplify porting of Keras/TF models into PyTorch. Every once in a while, I still use "manual" calculations of padding size to build my same-padded layers.

@kylemcdonald

Here is a very simple Conv2d layer with same padding for reference. It only supports square kernels and stride=1, dilation=1, groups=1.

class Conv2dSame(torch.nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, bias=True, padding_layer=torch.nn.ReflectionPad2d):
        super().__init__()
        ka = kernel_size // 2
        kb = ka - 1 if kernel_size % 2 == 0 else ka
        self.net = torch.nn.Sequential(
            padding_layer((ka,kb,ka,kb)),
            torch.nn.Conv2d(in_channels, out_channels, kernel_size, bias=bias)
        )
    def forward(self, x):
        return self.net(x)

c = Conv2dSame(1,3,5)
print(c(torch.rand((16,1,10,10))).shape)

# torch.Size([16, 3, 10, 10])

Should it be kb = ka - 1 if kernel_size % 2 else ka, or not?

Will this also apply to Conv1d?

Maybe adding a new padding method to the ConvNd class would be an elegant choice; by overloading the method, the padding schedule could easily be extended.
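
A rough sketch of that idea, assuming a hypothetical _compute_padding hook (not an existing method on nn.Conv2d) and a simple size-independent rule with odd kernels:

import torch.nn as nn
import torch.nn.functional as F

class SamePadConv2d(nn.Conv2d):
    def _compute_padding(self, x):
        # subclasses could override this to implement other padding schedules
        return [d * (k - 1) // 2 for k, d in zip(self.kernel_size, self.dilation)]

    def forward(self, x):
        pad_h, pad_w = self._compute_padding(x)
        x = F.pad(x, (pad_w, pad_w, pad_h, pad_h))
        return F.conv2d(x, self.weight, self.bias, self.stride,
                        padding=0, dilation=self.dilation, groups=self.groups)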

I can probably take this if @soumith ever wrote that proposal or if someone summarizes what needs to be done. There's been a lot of discussion above and I'm not sure what we've settled on. Are we calculating padding dependent on input data or not, do we need to implement padding="same" for pool as well, etc.?

I'd like to add causal padding as well, and please also add this to conv1d.
I stopped following the comments at some point, but I think this feature is very well done in Keras. You should follow it exactly.

@Chillee here you go:

Scope

We should add padding to the following layers:

  • Conv*d
  • MaxPool*d
  • AvgPool*d

For the first PR, let's keep it simple and just stick to Conv*d.

Complexity and Downsides

The complexity discussed above is around the layer becoming dynamic in nature once a same padding option is added. That is, it goes from the parameters of the layer being statically known, which is great for model export (for example ONNX export), to the parameters of the layer being dynamic. In this case, the dynamic parameter is padding.
While this looks pretty harmless, non-staticness gets pretty important in limited runtimes, like mobile or exotic-hardware runtimes, where for example you want to do static shape analysis and optimization.

The other practical downside is that this dynamically calculated padding is not always symmetric anymore: depending on the kernel size, stride, dilation factor, and input size, the padding might have to be asymmetric (i.e. a different padding amount on the left side vs the right). That would mean you cannot use cuDNN kernels, for example.

Design

Currently, the signature of Conv2d is:

torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')

Here, we support padding to be an int or tuple of ints (i.e. for each dimension of height / width).
We should support an additional overload for padding that would take a string, with value same.

The same padding should pad the input in such a way before giving it to the convolution that the output size is the same as the input size.

Implementation details

When 'same' is given to padding, we have to calculate the amount of left and right padding needed in each dimension.

There are two cases to consider after the required L (left) and R (right) padding is calculated:

  • L == R: in this case it is symmetric padding. One can simply call F.conv2d with a padding value equal to L
  • L != R: In this case, the padding is asymmetric, and it has significant performance and memory implications. We do the following:

    • we call input_padded = F.pad(input, ...) and send the input_padded into the F.conv2d.

    • we throw a warning for this case (at least for the initial release, and we can revisit whether the warning is needed) about the performance implication.

    • I don't remember the details of the formulation and where we enter this case, but if I remember, it might be as simple as having an even-sized kernel. If that's the case, the warning can have an easy fix on the user end.

Needless to say, it has to be tested to also work on the JIT path.
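
A rough sketch of the dispatch described above (the function name and the way the per-side amounts are passed in are illustrative only, not the proposed API):

import warnings
import torch.nn.functional as F

def _conv2d_same(input, weight, bias, stride, dilation, groups, pad_h, pad_w):
    # pad_h and pad_w are (L, R) amounts computed from the 'same' rule
    if pad_h[0] == pad_h[1] and pad_w[0] == pad_w[1]:
        # L == R: symmetric, pass padding directly to conv2d (cuDNN-friendly)
        return F.conv2d(input, weight, bias, stride,
                        padding=(pad_h[0], pad_w[0]),
                        dilation=dilation, groups=groups)
    # L != R: asymmetric, pad explicitly and warn about the extra cost
    warnings.warn("padding='same' requires asymmetric padding here; "
                  "falling back to an explicit F.pad, which costs extra memory")
    input = F.pad(input, (pad_w[0], pad_w[1], pad_h[0], pad_h[1]))
    return F.conv2d(input, weight, bias, stride,
                    padding=0, dilation=dilation, groups=groups)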

@Chillee for reference, here is a potential implementation to get inspiration from: https://github.com/mlperf/inference/blob/master/others/edge/object_detection/ssd_mobilenet/pytorch/utils.py#L40

It matched the TF implementation for the configurations that were tested, but the testing was not exhaustive.

@soumith Some quick questions:

  1. Is there any reason we shouldn't implement this through functional.conv2d? The design you wrote seems to imply that it shouldn't. There's nothing about padding = "same" that seems like it should be specific to layers. (EDIT: Nvm, didn't realize the F.conv2d impl I was looking at was the quantized one).
  2. I think Tensorflow's valid padding mode is simply equivalent to ours with padding=0, right?

Also, it doesn't seem that there will be an easy fix for the user to deal with asymmetric padding. The full rule for determining the amount of padding needed along a dimension is
(ceil(x/stride) - 1)*stride + (filter - 1)*dilation + 1 - x. In particular, we will need to do asymmetric padding whenever this is not a multiple of 2. As a counterexample to your hope that this only happens with even-sized filters, take input = 8, stride = 3, filter = 3, dilation = 1. I don't see any simple rule characterizing the situations in which this can happen.

Furthermore, we won't be able to statically determine the padding except when stride=1, since then ceil(x/stride) = x and the total padding equals (filter-1)*dilation.
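
A quick numeric check of that rule (sketch only):

from math import ceil

def total_pad(x, stride, k, dilation=1):
    # total padding needed along one dimension for TF-style 'same'
    return max(0, (ceil(x / stride) - 1) * stride + (k - 1) * dilation + 1 - x)

print(total_pad(8, 3, 3))   # 1 -> odd, so the padding must be asymmetric
print(total_pad(10, 1, 3))  # 2 -> even; for stride=1 this is (k-1)*dilation, independent of x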

@Chillee about (1), no reason, I hadn't thought through the implications -- perf or otherwise.

(2) Yes.

Furthermore, we won't be able to statically determine the padding except when stride=1, since then ceil(x/stride) = x and the total padding equals (filter-1)*dilation.

Yes, but stride=1 is common enough and the benefits of static padding are good enough that we should definitely handle it specially.

About asymmetric padding, oh well...

It doesn't make sense to me why an optional padding=SAME API can't be offered. If someone is willing to incur the additional cost of padding then let them do so. For many researchers, quick prototyping is a requirement.

Yes,

It doesn't make sense to me why an optional padding=SAME API can't be offered. If someone is willing to incur the additional cost of padding then let them do so. For many researchers, quick prototyping is a requirement.

Agree! I got stuck in this fuckin “padding” for 4 hours.

Do we have any update on a solution for this issue?

Wow and here I thought that Pytorch would be easier than Keras/Tensorflow 2.0...

@zwep there is a bit more effort in getting started. You have to write your training loop, which can be annoying, and you have to write layers more explicitly. Once you get that done (once), you can advance much farther on the actual improvements beyond that.

My rule of thumb is: use Keras if it's something you have done a million times / super standard;
use PyTorch any time there is research and development involved.

Here is my code for padded 1d convs:

import torch
from torch import nn
import torch.nn.functional as F

class Conv1dSamePad(nn.Module):
    def __init__(self, in_channels, out_channels, filter_len, stride=1, **kwargs):
        super(Conv1dSamePad, self).__init__()
        self.filter_len = filter_len
        self.conv = nn.Conv1d(in_channels, out_channels, filter_len, padding=(self.filter_len // 2), stride=stride,
                              **kwargs)
        nn.init.xavier_uniform_(self.conv.weight)
        # nn.init.constant_(self.conv.bias, 1 / out_channels)

    def forward(self, x):
        if self.filter_len % 2 == 1:
            return self.conv(x)
        else:
            return self.conv(x)[:, :, :-1]


class Conv1dCausalPad(nn.Module):
    def __init__(self, in_channels, out_channels, filter_len, **kwargs):
        super(Conv1dCausalPad, self).__init__()
        self.filter_len = filter_len
        self.conv = nn.Conv1d(in_channels, out_channels, filter_len, **kwargs)
        nn.init.xavier_uniform_(self.conv.weight)

    def forward(self, x):
        padding = (self.filter_len - 1, 0)
        return self.conv(F.pad(x, padding))


class Conv1dPad(nn.Module):
    def __init__(self, in_channels, out_channels, filter_len, padding="same", groups=1):
        super(Conv1dPad, self).__init__()
        if padding not in ["same", "causal"]:
            raise Exception("invalid padding type %s" % padding)
        self.conv = Conv1dCausalPad(in_channels, out_channels, filter_len, groups=groups) \
            if padding == "causal" else Conv1dSamePad(in_channels, out_channels, filter_len, groups=groups)

    def forward(self, x):
        return self.conv(x)

@danFromTelAviv Hey man, thanks for the code. Will keep that pytorch philosophy in mind!

It's 2020. Still no padding='same' in Pytorch?

This is one way to get same padding working for any kernel size, stride and dilation (even kernel sizes work too).

import math
import torch.nn as nn

class Conv1dSame(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size, stride=1, dilation=1):
        super().__init__()
        self.cut_last_element = (kernel_size % 2 == 0 and stride == 1 and dilation % 2 == 1)
        self.padding = math.ceil((1 - stride + dilation * (kernel_size - 1)) / 2)
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size, padding=self.padding, stride=stride, dilation=dilation)

    def forward(self, x):
        if self.cut_last_element:
            return self.conv(x)[:, :, :-1]
        else:
            return self.conv(x)

I want the "same padding" feature in nn.Conv2d too.

BTW, in addition to the perf/serialization concerns discussed above, there are correctness/accuracy reasons why the size-dependent "same" padding mode in TF is not a good default. I've discussed this in https://github.com/tensorflow/tensorflow/issues/18213 and showed that many of Google's own models actually use a size-independent "same" padding mode instead.

It seems there is no ongoing work on this issue right now, but if there is, I hope it's a size-independent solution.
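
For reference, a minimal sketch of the size-independent variant being argued for (the helper name is made up): the padding is derived from the kernel size and dilation alone, so it never depends on the input shape.

import torch.nn as nn

def size_independent_same_pad(kernel_size, dilation=1):
    # assumes an odd effective kernel size
    return dilation * (kernel_size - 1) // 2

# padding=1 here regardless of the input size, even though stride=2
conv = nn.Conv2d(16, 32, kernel_size=3, stride=2,
                 padding=size_independent_same_pad(3))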

Hi @ppwwyyxx Yuxin, thank you for the response.
I think the implementation from @McHughes288 is good, and I wonder what your opinion of it is.

Here is my solution for Conv1d SAME padding (it only works correctly when dilation == 1 and groups == 1; it gets more complicated when you consider dilation and groups):

import torch.nn.functional as F
from torch import nn

class Conv1dSamePadding(nn.Conv1d):
    """Represents the "Same" padding functionality from Tensorflow.
    NOTE: Only works correctly when dilation == 1, groups == 1 !!!
    """
    def forward(self, input):
        size, kernel, stride = input.size(-1), self.weight.size(
            2), self.stride[0]
        remainder = size % stride
        # total padding so that the output length is ceil(size / stride), as in TF
        padding = max(kernel - stride, 0) if remainder == 0 else max(kernel - remainder, 0)
        if padding != 0:
            # pad left by padding // 2, pad right by padding - padding // 2
            # in Tensorflow, the extra padding value (default: 0) goes on the right when needed
            input = F.pad(input, (padding // 2, padding - padding // 2))
        return F.conv1d(input=input,
                        weight=self.weight,
                        bias=self.bias,
                        stride=stride,
                        dilation=1,
                        groups=1)

@Chillee did you intend to continue working on this feature? I'm going to unassign you for now so that we can better track progress of this issue, please feel free to reassign if you are still working on it.

After reading the code of @wizcheu, I created another version of conv1d with padding='same':

import math
import torch
from torch import nn
import torch.nn.functional as F

class Conv1dPaddingSame(nn.Module):
    '''pytorch version of padding=='same'
    ============== ATTENTION ================
    Only works when dilation == 1, groups == 1
    =========================================
    '''
    def __init__(self, in_channels, out_channels, kernel_size, stride):
        super(Conv1dPaddingSame, self).__init__()
        self.kernel_size = kernel_size
        self.stride = stride
        self.weight = nn.Parameter(torch.rand((out_channels,
                                               in_channels, kernel_size)))
        # nn.Conv1d sets bias=True by default, so create this parameter as well
        self.bias = nn.Parameter(torch.rand(out_channels))

    def forward(self, x):
        batch_size, num_channels, length = x.shape
        if length % self.stride == 0:
            out_length = length // self.stride
        else:
            out_length = length // self.stride + 1

        pad = math.ceil((out_length * self.stride + 
                         self.kernel_size - length - self.stride) / 2)
        out = F.conv1d(input=x, 
                       weight = self.weight,
                       stride = self.stride, 
                       bias = self.bias,
                       padding=pad)
        return out

Is there any update on this?

any updates??

@peterbell10 has linked a draft PR that you can follow.
