Pytorch: RuntimeError: cuda runtime error (2) : out of memory at /data/users/soumith/miniconda2/conda-bld/pytorch-0.1.9_1487346124464/work/torch/lib/THC/generic/THCStorage.cu:66

Created on 8 Mar 2017  ·  41 Comments  ·  Source: pytorch/pytorch

I encountered an error:

THCudaCheck FAIL file=/data/users/soumith/miniconda2/conda-bld/pytorch-0.1.9_1487346124464/work/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
  File "main_snli.py", line 293, in <module>
    experiment=BaseExperiment()
  File "main_snli.py", line 74, in __init__
    self.model.cuda()
  File "/home/bbbian/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 143, in cuda
    return self._apply(lambda t: t.cuda(device_id))
  File "/home/bbbian/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 114, in _apply
    module._apply(fn)
  File "/home/bbbian/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 114, in _apply
    module._apply(fn)
  File "/home/bbbian/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 120, in _apply
    param.data = fn(param.data)
  File "/home/bbbian/anaconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 143, in <lambda>
    return self._apply(lambda t: t.cuda(device_id))
  File "/home/bbbian/anaconda3/lib/python3.6/site-packages/torch/_utils.py", line 51, in _cuda
    return self.type(getattr(torch.cuda, self.__class__.__name__), async)
  File "/home/bbbian/anaconda3/lib/python3.6/site-packages/torch/_utils.py", line 24, in _type
    return new_type(self.size()).copy_(self, async)
RuntimeError: cuda runtime error (2) : out of memory at /data/users/soumith/miniconda2/conda-bld/pytorch-0.1.9_1487346124464/work/torch/lib/THC/generic/THCStorage.cu:66

How can I solve this error?

Most helpful comment

You're running out of memory on the GPU. It's not a bug.

All 41 comments

You're running out of memory on the GPU. It's not a bug.

@apaszke
I wrote a simple test like the one below, and it raises the 'out of memory' error when the input data dimension is 49200.
But when I lower the dimension from 49200 to 1000, the code runs fine.
Is there any PyTorch setting or parameter I have to change?

import torch
import torch.nn as nn
from torch.autograd import Variable

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.dropout = nn.Dropout(p=0.2)
        self.relu = nn.ReLU()
        self.fc1 = nn.Linear(49200, 49200)
        self.fc2 = nn.Linear(49200, 49200)
        self.fc3 = nn.Linear(49200, 3)
        self.out = nn.Sequential(
            self.fc1,
            self.relu,
            self.dropout,
            self.fc2,  # the second 49200x49200 layer
            self.relu,
            self.dropout,
            self.fc3
            )

    def forward(self, premise, hypothesis):
        return self.out(torch.cat([premise, hypothesis], 1))

net = Net().cuda()
print (net)
premise = Variable(torch.randn(64, 82, 300))
hypothesis = Variable(torch.randn(64, 82, 300))
premise = premise.cuda()
hypothesis = hypothesis.cuda()
out = net(premise.contiguous().view(64,-1), hypothesis.contiguous().view(64,-1))
print(out)
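
Before calling .cuda(), it can help to estimate how much GPU memory the parameters alone will need; a quick sketch on recent PyTorch versions, assuming float32 (4 bytes per element):

# Quick sketch: count the parameters of the Net above and estimate their
# size in GiB, assuming float32. Gradients roughly double this, and
# activations come on top.
n_params = sum(p.numel() for p in net.parameters())
print(n_params, "parameters ->", n_params * 4 / 1024**3, "GB for the parameters alone")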

Between the parameters and their gradients of the two big FC layers, your network (with size 49200) requires 40GB of memory...

@jekbradbury Can you explain your calculations? how much memory each layer takes with respect to the parameters and gradients? Thanks.

If you only consider the weights of a single Linear layer from that model, you get

49200^2 = 2,420,640,000

elements, and each element takes 4 bytes, which gives you

2,420,640,000 * 4 / 1024^3 ≈ 9.01 GB

for the weights alone. Then you need another chunk of the same size to store the gradients, and you also need to keep the intermediate results so the gradients can be computed.
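
The same arithmetic in a few lines of Python, assuming float32 (4 bytes per element) and ignoring biases:

# Back-of-the-envelope memory estimate for one nn.Linear(49200, 49200),
# assuming float32 weights; biases and optimizer state are ignored.
in_features = out_features = 49200
weight_elements = in_features * out_features      # 2,420,640,000
weight_gb = weight_elements * 4 / 1024**3         # bytes -> GiB
print("weights:         %.2f GB" % weight_gb)     # ~9.01 GB
print("weights + grads: %.2f GB" % (2 * weight_gb))  # ~18 GB per big layer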

Hi, I got the same error, but only during validation. The whole training process worked completely fine. I'm trying to do transfer learning with Inception v3. Can anyone help me out, please? Thanks

@tabibusairam I also encountered the same issue: training works fine (using about 6 GB of CUDA memory on a 12 GB GPU), but evaluation, which goes through the same network, fails with the following error:

THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1503965122592/work/torch/lib/THC/generic/THCStorage.cu line=66 error=2 : out of memory
Traceback (most recent call last):
  File "evaluate.py", line 132, in <module>
    evaluate(pnet, args)
  File "evaluate.py", line 94, in evaluate
    predictions = pnet(X_test, initial_states)
  File "/home/zcrwind/.conda/envs/condapython3.6/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/zcrwind/workspace/pro/predict/zcr/pnet.py", line 497, in forward
    output, hidden_states = self.step(A0, hidden_states)
  File "/home/zcrwind/workspace/pro/predict/zcr/pnet.py", line 377, in step
    forget_gate = hard_sigmoid(self.conv_layers['f'][lay](inputs))
  File "/home/zcrwind/workspace/pro/predict/zcr/pnet.py", line 28, in hard_sigmoid
    x = F.threshold(-x, 0, 0)
  File "/home/zcrwind/.conda/envs/condapython3.6/lib/python3.6/site-packages/torch/nn/functional.py", line 459, in threshold
    return _functions.thnn.Threshold.apply(input, threshold, value, inplace)
  File "/home/zcrwind/.conda/envs/condapython3.6/lib/python3.6/site-packages/torch/nn/_functions/thnn/auto.py", line 174, in forward
    getattr(ctx._backend, update_output.name)(ctx._backend.library_state, input, output, *args)
RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1503965122592/work/torch/lib/THC/generic/THCStorage.cu:66

Have you worked it out? Thanks.

The computation graph during validation is different from the one during training, since the parameters are not updated in validation. Try using the command nvidia-smi to see the GPU memory usage during validation.

Also try reducing the batch size (if you are working on a single GPU); smaller batch sizes need less memory.
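
If you would rather check from inside the script than with nvidia-smi, newer PyTorch versions (0.4+) expose simple counters; a minimal sketch:

import torch

# Minimal sketch (PyTorch 0.4+): report GPU memory held by tensors, in MB,
# from inside the script. Call it before/after the validation loop.
def print_gpu_memory(tag=""):
    allocated = torch.cuda.memory_allocated() / 1024**2
    peak = torch.cuda.max_memory_allocated() / 1024**2
    print("[%s] allocated: %.1f MB, peak: %.1f MB" % (tag, allocated, peak))

print_gpu_memory("before validation")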

@tabibusairam Thanks a lot. I reduced the batch size and the evaluation code works fine now.

@tabibusairam Did you write your transfer learning code following the example on pytorch.org? If so, I have another idea for tackling it.

Yes, I wrote the code in that format.
I also added nn.DataParallel to the model.
Any other idea is surely welcome.

@tabibusairam The same error occurred to me in the same situation. It was solved by setting "volatile" in Variable() at inference time. Unless we set volatile=True, the computational graph is retained during inference, and we don't need it there; keeping it is very memory consuming.
You can simply set the volatile flag to True like this: Variable(x, volatile=True).
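
A minimal sketch of that pattern on PyTorch 0.3 and earlier (the tiny Linear model here is just a placeholder):

import torch
import torch.nn as nn
from torch.autograd import Variable

# Minimal sketch (PyTorch <= 0.3): volatile=True tells autograd not to build
# a graph for anything computed from this Variable, which saves a lot of
# memory at inference time. The model below is a toy placeholder.
model = nn.Linear(300, 3).cuda()
x = torch.randn(8, 300).cuda()

inputs = Variable(x, volatile=True)
outputs = model(inputs)
print(outputs.size())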

In that example, two models are created, one for training and one for validation. With this setup a second model runs on the GPU during validation, and the GPU runs out of memory even if you wrap the validation data with the volatile parameter.
I solved this by using a single model and wrapping the validation data with the volatile parameter to reduce the computation. @tabibusairam

Thanks, @TommeyChang. I checked the transfer learning sample, but I couldn't figure out where a model is also set up for validation. Could you show us where the model is set in the code?

This problem may be caused by PyTorch itself rather than by the code. The relevant code is:

if phase == 'train':
    scheduler.step()
    model.train(True)   # Set model to training mode
else:
    model.train(False)  # Set model to evaluate mode

If you watch the GPU with watch -n 1 -d nvidia-smi, you will see the memory usage increase during the first validation epoch.

How did you use the same model for both training and validation?

If we don't set the model's mode, it is implicitly in training mode. So we don't need the mode-setting lines; instead, wrap the tensors in Variables with the volatile parameter during the validation phase. My code is as follows:

if phase == 'train':
    scheduler.step()

........

for data in dataloaders[phase]:          ## Iterate over data.

    inputs, labels = data                ## get the inputs

    if use_gpu:                          ## pass them onto the GPU
        inputs = inputs.cuda()
        labels = labels.cuda()

    if phase == 'train':                 ## wrap them in Variables
        inputs, labels = Variable(inputs), Variable(labels)
    else:
        inputs = Variable(inputs, volatile=True)
        labels = Variable(labels, volatile=True)

Thanks. But I'm afraid that if we don't set the train flag to False during validation, we can't get correct results, since BatchNorm and Dropout behave differently in training and evaluation.

Yes, I agree with you. I tested my model with the train flag set to False and the performance improves. Thanks for your advice.

I tried volatile=True and it works for me. Thanks for teaching me this @jekbradbury

@TommeyChang @tabibusairam I am hitting the same error, but in a different case. I am adding a new regularization term to my model through this function:

def l2_reg(mdl):
    l2_reg = None
    for W in mdl.parameters():
        if W.ndimension() < 2:
            continue
        else:
            if l2_reg is None:
                l2_reg = (torch.max(torch.abs(W))) ** 2
            else:
                l2_reg = l2_reg + (torch.max(torch.abs(W))) ** 2

    return l2_reg

What I observe is that even if I change my batch size from 128 to 8, the error comes right after the first epoch, and if I simply switch the regularization back to a plain L2 term, I don't get this error.
Any suggestions/comments would be really appreciated!

@TommeyChang you usually want to differentiate the regularization term (after all, it is there to affect the value of the gradient), so you probably don't want to do it the way you suggested.
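
For what it's worth, a minimal sketch of that pattern, assuming the l2_reg function from the comment above; the model, data, and regularization weight are toy placeholders, and .item() requires 0.4+:

import torch
import torch.nn as nn

# Sketch: add the regularizer to the loss so it is differentiated, and only
# convert it to a plain Python number for logging, so no graph is kept alive
# across iterations. Everything here except l2_reg() is a toy placeholder.
model = nn.Linear(16, 4)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 16)
y = torch.randint(0, 4, (8,))

reg = l2_reg(model)
loss = criterion(model(x), y) + 1e-4 * reg

optimizer.zero_grad()
loss.backward()
optimizer.step()

running_reg = reg.item()   # a float; the graph behind `reg` can now be freed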

@apaszke hello~ I have the same problem, but I can train the model correctly at the beginning. After maybe 600 steps, I get the error "RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1518243271935/work/torch/lib/THC/generic/THCStorage.cu:58".

While training, the memory used is only about 7 GB (my GPU has 11 GB). In my opinion, training correctly at the beginning normally means my code is correct. Is that right? Is there something else that accumulates as training goes forward? Thank you very much!!

Some variable is accumulating and taking more and more space as your model trains further. Try to find such variables and make sure you are not keeping any unwanted ones.

@tabibusairam Thank you very much, first of all. That is a common cause of many OOM problems, but I couldn't find such a problem in my network. When I train my network (an image transformation network) without validation, the GPU memory stays stable the whole time. But when there is a validation step, the memory on my first GPU (the one validation runs on) rises twice.
For example, at the beginning of the first epoch my GPU memory use is 7 GB; it goes to 9 GB once the first epoch ends and validation begins. Going from validation into the second epoch, the memory consumption becomes 10 GB. After that the memory stabilizes. I am very confused...

Are you running validation with volatile Variables (on 0.3) or in a torch.no_grad() context (if using master)?
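
For reference, a minimal validation-loop sketch on 0.4+ using torch.no_grad() (the model and data are toy placeholders):

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Minimal sketch (PyTorch 0.4+): torch.no_grad() replaces volatile Variables
# and prevents any graph from being built during validation.
model = nn.Linear(10, 2).cuda()
criterion = nn.CrossEntropyLoss()
val_loader = DataLoader(TensorDataset(torch.randn(64, 10),
                                      torch.randint(0, 2, (64,))),
                        batch_size=16)

model.eval()
val_loss = 0.0
with torch.no_grad():
    for inputs, labels in val_loader:
        inputs, labels = inputs.cuda(), labels.cuda()
        val_loss += criterion(model(inputs), labels).item()
print(val_loss / len(val_loader))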

@apaszke @tabibusairam Hello, I hit this error while using PyTorch to build a GAN with a gradient penalty, and I have been stuck on it for 2 days. I have already tried several things to solve it, but none of them work. I really need some help, please.
The error is:

RuntimeError: cuda runtime error (2) : out of memory at xx\torch\lib\thc\generic/THCStorage.cu:66

and it occurs when I call backward:

_File "xxx/train_extractor.py", line 128, in
gradient_penalty.backward()
File "xxx\lib\site-packages\torch\autograd\variable.py", line 156, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "xxx\lib\site-packages\torch\autograd__init__.py", line 98, in backward
variables, grad_variables, retain_graph)_

It happens at the 12th epoch of my training process every time, and I have already reduced the batch size and the size of my nets.
There is no validation step.
Here is a small segment of my code:

alpha = torch.rand(conf.batch_size, 1).expand(X.size())
x_hat = autograd.Variable(alpha * real.data.cpu() + (1 - alpha) * (real.data.cpu() + 0.5 * real.data.std() * torch.rand(real.size())), requires_grad=True)
x_hat = x_hat.cuda() if conf.cuda else x_hat
pred_hat, _ = Dis(x_hat)
label = torch.ones(pred_hat.size())
label = label.cuda() if conf.cuda else label
gradients = autograd.grad(outputs=pred_hat, inputs=x_hat, grad_outputs=label, create_graph=True, retain_graph=True, only_inputs=True)[0]
gradient_penalty = conf.gp_lambda * ((gradients.norm(2, dim=1) - 1) ** 2).mean()

gradient_penalty.backward()

Reducing the batch size from 64 to 32 worked for me.

@lyakaap Variable(x, volatile=True) works for me. Thanks a lot.

@EricKani,
Hello, have you solved this problem?
I have the same question.
Can you tell me how you fixed it?

@qlwang25 @EricKani The most likely situation is that loss tensors (and the graphs attached to them) are inadvertently accumulated while tracking losses, like below.

loss = criterion(y_, y)
loss.backward()
loss_meter += loss  # incorrect
# loss_meter += loss.item()  # correct
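
To make the difference concrete, a self-contained toy loop (toy model and data; .item() requires 0.4+):

import torch
import torch.nn as nn

# Toy loop: accumulating `loss` itself would keep every iteration's graph
# alive; accumulating loss.item() stores only a Python float.
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
loss_meter = 0.0
for _ in range(100):
    x, y = torch.randn(4, 10), torch.randn(4, 1)
    model.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    loss_meter += loss.item()
print(loss_meter / 100)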

@lyakaap
Thank you very much, first of all.
I am already writing it the way you mentioned.
After each validation batch, the GPU memory consumption increases, so the next training step errors out:

THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCStorage.cu line=58 error=2 : out of memory
Traceback (most recent call last):
  File "train.py", line 290, in <module>
    main()
  File "train.py", line 263, in main
    train(i)
  File "train.py", line 152, in train
    loss, num_total, num_correct = model.train_model(src, src_len, src_sent_len, tgt, tgt_len, optim)
  File "/home/wangqianlong/model/bytecup/models/seq2seq.py", line 110, in train_model
    loss.backward()
  File "/home/wangqianlong/.local/lib/python3.6/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/wangqianlong/.local/lib/python3.6/site-packages/torch/autograd/__init__.py", line 89, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: cuda runtime error (2) : out of memory at /pytorch/aten/src/THC/generic/THCStorage.cu:58

General flow of the code:

def train_model(self, data):
    outputs = self(data)
    loss = self.criterion(outputs, y)
    loss.backward()
    optim.step()
    return loss

def sample(self, data):
    src, src_len = data
    with torch.no_grad():
        bos = torch.ones(src.size(0)).long().fill_(dict.BOS)
        if self.use_cuda:
            src = src.cuda()
            src_len = src_len.cuda()
            bos = bos.cuda()

        contexts = other_function(src, src_len)
        samples = self.decoder.sample([bos], contexts)
        return samples

def train(i):
    model.train()
    global train_dataloader
    for data in train_dataloader:
        model.zero_grad()
        loss = model.train_model(data)

        count_loss += loss.item()
        if ...:
            # not important
            print(count_loss)

def eval(i):
    model.eval()
    for batch in eval_dataloader:
        samples = model.sample(data)
        print(samples)

def main():
    global train_dataloader
    for i in range(epoch):
        train_dataloader = load(data_(i%9)) 
        train(i)

        eval(i)

The training set is relatively large, so I have divided it into nine parts (data_0, data_1, ..., data_8).
Can you give me some suggestions?
Thank you very much.

@qlwang25 I checked your code but I couldn't figure out which part is wrong.
I can guess at two possibilities:

  1. use optimizer.zero_grad() instead of model.zero_grad()
  2. some variables on the GPU keep persistent references and therefore never release their GPU memory. How about reviewing your sample()? (A sketch follows below.)
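
A minimal sketch of point 2 (the model and data here are placeholders, not the poster's actual code):

import torch

# Sketch: build no graph during sampling and return a CPU copy, so no
# reference to GPU tensors survives after the call.
def sample(model, src):
    with torch.no_grad():
        src = src.cuda()
        out = model(src)
    return out.cpu()

model = torch.nn.Linear(10, 2).cuda()
print(sample(model, torch.randn(4, 10)).size())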

@lyakaap
First of all, thank you for replying so quickly.
I understand your first point.
However, "these variables never release the GPU memory" confuses me.
Which variables? Can you give an example?
How do I release these variables? Is torch.cuda.empty_cache() useful?

@qlwang25
I don't know exactly which variables, but src and bos are likely candidates.
AFAIK, torch.cuda.empty_cache() doesn't release memory that is still referenced. You should locate which variables are the cause and write del {var_name} before calling this function.
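
For example (a minimal sketch on 0.4+; the tensor here is a placeholder):

import torch

# Minimal sketch: drop the Python reference first, then ask the caching
# allocator to hand the freed blocks back. empty_cache() alone cannot free
# memory that is still referenced somewhere.
big = torch.randn(1024, 1024, device="cuda")
del big
torch.cuda.empty_cache()
print(torch.cuda.memory_allocated())   # close to zero again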

@lyakaap
Thanks very much!
I understand your suggestion now.
Thank you again for your reply.

@ladyrick
Thanks very much!
I understand your suggestion now.
Thank you again for your reply.

I think you mean @lyakaap, right?
haha

I tried volatile and it didn't work (later I figured out that's because I am on PyTorch 1.0.1, which warns "UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.").
But a simple restart also fixed things for me...

I had the same problem, so I decreased the batch size to the smallest my model could still train with. You can also increase the number of epochs, the learning rate, or the number of training samples to maintain the trade-off in terms of accuracy.

This problem may be caused by the huge size of the validation dataset; you could start with a small dataset and then test by feeding in the huge one.

The computation graph during validation is different from the one during training, since the parameters are not updated in validation. Try using the command nvidia-smi to see the GPU memory usage during validation. Try reducing the batch size (if you are working on a single GPU); smaller batch sizes need less memory.

This worked. Thank you so much.
