Pytorch: BrokenPipeError: [Errno 32] Broken pipe

Created on 8 Aug 2017  ·  35 Comments  ·  Source: pytorch/pytorch

Hi, I use PyTorch to run a triplet network on the GPU, but whenever the DataLoader fetches data I get a BrokenPipeError: [Errno 32] Broken pipe.

I thought something was wrong in the following code:

for batch_idx, (data1, data2, data3) in enumerate(test_loader):
    if args.cuda:
        data1, data2, data3 = data1.cuda(), data2.cuda(), data3.cuda()
    data1, data2, data3 = Variable(data1), Variable(data2), Variable(data3)

Can you give me some suggestions? Thank you so much.

Most helpful comment

@mjchen611 You can set num_workers to 0 to see the actual error. Did you have your plotter correctly configured?

All 35 comments

Would you be able to post a snippet of code that can reproduce this?

@alykhantejani

1) The code is from: https://github.com/andreasveit/triplet-network-pytorch/blob/master/train.py

2) The error occurred in train.py, line 136

3) The error was:

runfile('G:/researchWork2/pytorch/triplet-network-pytorch-master/train.py', wdir='G:/researchWork2/pytorch/triplet-network-pytorch-master')
Reloaded modules: triplet_mnist_loader, triplet_image_loader, tripletnet

Number of params: 21840
Traceback (most recent call last):
  File "", line 1, in <module>
    runfile('G:/researchWork2/pytorch/triplet-network-pytorch-master/train.py', wdir='G:/researchWork2/pytorch/triplet-network-pytorch-master')
  File "D:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
    execfile(filename, namespace)
  File "D:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "G:/researchWork2/pytorch/triplet-network-pytorch-master/train.py", line 258, in <module>
    main()
  File "G:/researchWork2/pytorch/triplet-network-pytorch-master/train.py", line 116, in main
    train(train_loader, tnet, criterion, optimizer, epoch)
  File "G:/researchWork2/pytorch/triplet-network-pytorch-master/train.py", line 137, in train
    for batch_idx, (data1, data2) in enumerate(train_loader):
  File "D:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 303, in __iter__
    return DataLoaderIter(self)
  File "D:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 162, in __init__
    w.start()
  File "D:\Anaconda3\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "D:\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "D:\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "D:\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "D:\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)

BrokenPipeError: [Errno 32] Broken pipe

4) Part of the related training code follows:

def train(train_loader, tnet, criterion, optimizer, epoch):
    losses = AverageMeter()
    accs = AverageMeter()
    emb_norms = AverageMeter()

    # switch to train mode
    tnet.train()
    for batch_idx, (data1, data2, data3) in enumerate(train_loader):
        if args.cuda:
            data1, data2, data3 = data1.cuda(), data2.cuda(), data3.cuda()
        data1, data2, data3 = Variable(data1), Variable(data2), Variable(data3)

        # compute output
        dista, distb, embedded_x, embedded_y, embedded_z = tnet(data1, data2, data3)
        # 1 means dista should be larger than distb
        target = torch.FloatTensor(dista.size()).fill_(1)
        if args.cuda:
            target = target.cuda()
        target = Variable(target)

        loss_triplet = criterion(dista, distb, target)
        loss_embedd = embedded_x.norm(2) + embedded_y.norm(2) + embedded_z.norm(2)
        loss = loss_triplet + 0.001 * loss_embedd

        # measure accuracy and record loss
        acc = accuracy(dista, distb)
        losses.update(loss_triplet.data[0], data1.size(0))
        accs.update(acc, data1.size(0))
        emb_norms.update(loss_embedd.data[0] / 3, data1.size(0))

        # compute gradient and do optimizer step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{}]\t'
                  'Loss: {:.4f} ({:.4f}) \t'
                  'Acc: {:.2f}% ({:.2f}%) \t'
                  'Emb_Norm: {:.2f} ({:.2f})'.format(
                epoch, batch_idx * len(data1), len(train_loader.dataset),
                losses.val, losses.avg,
                100. * accs.val, 100. * accs.avg, emb_norms.val, emb_norms.avg))

    # log avg values to somewhere
    plotter.plot('acc', 'train', epoch, accs.avg)
    plotter.plot('loss', 'train', epoch, losses.avg)
    plotter.plot('emb_norms', 'train', epoch, emb_norms.avg)

Thank you so much.

@alykhantejani
And I run it on Windows 8.1 with CUDA.

We do not support Windows officially yet. Maybe @peterjc123 knows what's wrong.

@mjchen611 You can set num_workers to 0 to see the actual error. Did you have your plotter correctly configured?
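A minimal sketch of that debugging step (train_dataset here is an assumption, standing in for whatever Dataset the script already builds): with num_workers=0 the batches load in the main process, so the real exception surfaces directly instead of as a BrokenPipeError from a dead worker.

from torch.utils.data import DataLoader

# train_dataset: whatever Dataset your script already builds (assumed here).
# num_workers=0 loads batches in the main process, so an exception raised
# inside the dataset shows up directly instead of as a BrokenPipeError.
debug_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=0)
first_batch = next(iter(debug_loader))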

I can verify that setting num_workers to 0 or 1 helped. With any higher value, the DataLoader always failed for me, regardless of the dataset. The error has to do with multiprocessing in the DataLoader:

  File "D:/Opiskelu/PyTorch Tutorials/cnn_transfer_learning_cuda.py", line 76, in <module>
    inputs, classes = next(iter(dataloaders['train']))

  File "C:\Anaconda3\envs\ml\lib\site-packages\torch\utils\data\dataloader.py", line 301, in __iter__
    return DataLoaderIter(self)

  File "C:\Anaconda3\envs\ml\lib\site-packages\torch\utils\data\dataloader.py", line 158, in __init__
    w.start()

  File "C:\Anaconda3\envs\ml\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)

  File "C:\Anaconda3\envs\ml\lib\multiprocessing\context.py", line 212, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)

  File "C:\Anaconda3\envs\ml\lib\multiprocessing\context.py", line 313, in _Popen
    return Popen(process_obj)

  File "C:\Anaconda3\envs\ml\lib\multiprocessing\popen_spawn_win32.py", line 66, in __init__
    reduction.dump(process_obj, to_child)

  File "C:\Anaconda3\envs\ml\lib\multiprocessing\reduction.py", line 59, in dump
    ForkingPickler(file, protocol).dump(obj)

BrokenPipeError: [Errno 32] Broken pipe

@karmus89 Actually this error only occurs when you try to do multiprocessing on code that has errors in it. It's unexpected that you hit this issue when your code is right. I don't know which version you are using. Can you send a small piece of code that reproduces your issue?

Will do! And remember, I'm using a Windows machine. The code is directly copied from the tutorial PyTorch: Transfer Learning Tutorial, which means the dataset has to be downloaded and extracted as instructed.

The code to reproduce the error:

import torch
import torchvision
from torchvision import datasets, models, transforms
import os

data_transforms = {
    'train': transforms.Compose([
        transforms.RandomSizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Scale(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

data_dir = 'hymenoptera_data'
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
                                             shuffle=True, num_workers=4)
              for x in ['train', 'val']}

# The code will fail here, trying to iterate over the DataLoader with multiple num_workers (Windows only)
inputs, classes = next(iter(dataloaders['train']))

And I just made some PyTorch forum posts regarding this. The problem lies with Python's multiprocessing on Windows: there is no fork, so a fresh interpreter is spawned that re-imports the main module, and any module-level code that starts workers runs again in each child. Please see this PyTorch discussion reply as I don't want to copy-paste too much here.

Edit:

Here's the code that doesn't crash, which at the same time complies with Python's multiprocessing programming guidelines for Windows machines:

import torch
import torchvision
from torchvision import datasets, models, transforms
import os

if __name__ == "__main__":

    data_transforms = {
        'train': transforms.Compose([
            transforms.RandomSizedCrop(224),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ]),
        'val': transforms.Compose([
            transforms.Scale(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ]),
    }

    data_dir = 'hymenoptera_data'
    image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                              data_transforms[x])
                      for x in ['train', 'val']}
    dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
                                                 shuffle=True, num_workers=4)
                  for x in ['train', 'val']}

    inputs, classes = next(iter(dataloaders['train']))

@karmus89 Well, I think I have stated it where the package was published. I'm so sad that you installed the package without reading the notice.

@peterjc123 Please see my edited response, where I did exactly that. The requirement to wrap the code inside an if __name__ == '__main__' guard isn't immediately obvious, as it is only needed on Windows machines.

__Edit__:
Regarding the stating of the requirement, I indeed missed it. I used conda to install the package directly, so I never came across any introductory notes. But thanks anyway! And sorry for making you sad!

__Edit 2__:
Wow, couldn't have known even where to look for that 😄 👍

A question regarding the above: I am running into this problem within a Jupyter notebook. How do you solve it in a Jupyter notebook? Wrapping the code in if __name__ == '__main__' does not change a thing. Does someone know how to translate the fix to Jupyter notebooks?

@Dehde What about setting num_workers of the DataLoader to zero?

@peterjc123
Thanks for the quick reply! I did not fully make myself clear, sorry: is there a way to run PyTorch on Windows in a Jupyter notebook and still use the worker functionality, i.e., not set num_workers to zero? I definitely need parallelized preprocessing. Thanks for your time!

Could you show me minimal code so that I can reproduce it?

@peterjc123
I will edit it into this post on Monday; I don't have access to the code right now. Thank you!

As promised, the code I use:

if __name__ == '__main__':

    batch_size = 256

    size = (128, 128)
    image_datasets = {}
    image_datasets["train"] = WaterbodyDataset(masks=train_masks, images=train_imgs,
                                               transform_img=transforms.Compose([
                                                   RandomCrop(size),
                                                   transforms.ToTensor(),
                                               ]),
                                               transform_mask=transforms.Compose([
                                                   RandomCrop(size),
                                                   transforms.ToTensor(),
                                               ]))

    image_datasets["val"] = WaterbodyDataset(masks=val_masks, images=val_imgs,
                                             transform_img=transforms.Compose([
                                                 transforms.ToTensor(),
                                             ]),
                                             transform_mask=transforms.Compose([
                                                 transforms.ToTensor()
                                             ]))

    dataloaders = {'train': torch.utils.data.DataLoader(image_datasets['train'], batch_size=batch_size,
                                                        shuffle=True, num_workers=1),
                   'val': torch.utils.data.DataLoader(image_datasets['val'], batch_size=batch_size,
                                                      shuffle=False, num_workers=1)}

    dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}

    hps = HyperParams()
    hps.update("name=resnet34_128_deconv_pret00rained_bs32_adam_lr0.0001_wd0_pat5,"
               "arch=resnet34,input_channel=4,freeze=0,deconv=1,opt=adam,debug=0,"
               "weight_decay=0.0,patience=100,pretrained=1,lr=0.0001,print_freq=10,every_x_epoch_eval=1")
    pprint(attr.asdict(hps))

    model = Model(hps)
    model.train(dataloaders)

The WaterbodyDataset inherits from the PyTorch Dataset class.

I also got the same error. When I set num_workers to 0, the error does not appear again. However, when I set num_workers to 1, the error is still there.

When I set num_workers to 0, there is no error.

Please, I need assistance with this error: "BrokenPipeError: [Errno 32] Broken pipe"
The code is from: https://github.com/higgsfield/np-hard-deep-reinforcement-learning/blob/master/Neural%20Combinatorial%20Optimization.ipynb
I am using Windows 10.

  1. Wrap the code in if __name__ == '__main__':, though for me the error nonetheless sometimes appears again. I know it sounds silly, but what helps me then is just
  2. rebooting the computer.

Windows 10 here.

I found that the issue is still present, but only when I use a custom collate_fn.

For me, just changing num_workers from 2 to 0 made the code work properly...

Had the same issue when I ran the PyTorch Data Loading and Processing Tutorial. Changing num_workers from 2 to 0 solved the problem, but num_workers = 2 worked fine with other datasets. I use Windows.

num_workers > 0 doesn't work on Windows, even with the new IterableDataset.

I hit this same error, and while I was trying to find a way to solve it, the program resumed running on its own (after waiting about 10 minutes). Amazing :confused:

I've run the exact same code multiple times with different results. I've also copied code that causes a broken pipe to a new file (the contents being exactly the same) and it would run fine. I think there's an external factor at play here. I can't reproduce the bug anymore, but maybe try deleting your __pycache__ directory if there is one.

I have the same problem on Windows 10. I don't know why, but I think the problem is the DataLoader (setting num_workers to 0 doesn't help) and multiprocessing.

After using Ubuntu for quite some time, I have been trying Windows 10 lately (just for prototyping before using the cluster machine) and bumped into the same error; setting num_workers to 0 helped. Make sure you set it on all dataloaders: train, test, and validate.

I also have the same problem on Windows 10. I get the error message '[Errno 32] Broken pipe' when I set num_workers greater than 0, and my code is downloaded from the official PyTorch tutorial.

I guess this is a bug on Windows 10, and I am looking forward to seeing it fixed in the next release.

Same error; num_workers=0 worked, but I want multiprocessing to speed up data loading.

It seems the only way for this to work is to use Linux. I am using Windows 10 for prototyping and then pushing everything to the cluster, which is Linux-based.

import platform
if platform.system() == 'Windows': n_cpu = 0

I also encountered a similar problem on Windows 10 when defining my custom torchvision dataset and trying to run it in JupyterLab. Apparently the custom dataset does not get registered as an attribute of the __main__ module, which is where multiprocessing's spawn.py looks it up when the DataLoader starts its workers. I fixed it by writing the dataset into a module and then importing it, as mentioned here:

https://stackoverflow.com/questions/41385708/multiprocessing-example-giving-attributeerror

  File "C:\Users\johndoe\Anaconda3\envs\PyTorch15\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\johndoe\Anaconda3\envs\PyTorch15\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'RandomPatchExtractor' on <module '__main__' (built-in)>
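For reference, a minimal sketch of that workaround; the module name my_datasets.py is hypothetical, and RandomPatchExtractor is only a stand-in for the custom dataset named in the traceback above:

# my_datasets.py -- define the dataset in a real module so spawn workers can import it
from torch.utils.data import Dataset

class RandomPatchExtractor(Dataset):
    # minimal stand-in for the custom dataset from the traceback above
    def __init__(self, samples):
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]

# notebook cell -- the class is now pickled by reference to my_datasets,
# so the spawned worker processes can find it on import
from torch.utils.data import DataLoader
from my_datasets import RandomPatchExtractor

dataset = RandomPatchExtractor(samples=list(range(100)))
loader = DataLoader(dataset, batch_size=10, num_workers=2)
first_batch = next(iter(loader))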

> @mjchen611 You can set num_workers to 0 to see the actual error. Did you have your plotter correctly configured?

Setting num_workers to 0 worked for me. Could you explain why this causes an error?

I have noticed this issue is closed, but I do not think it is fixed. Is there any effort to fix the multiprocessing dataloader on Windows? Currently there are two options as far as I know (a combined sketch follows below):

  1. wrap it in if __name__ == '__main__':, which does not always work.
  2. do not use multiprocessing on Windows: if platform.system() == 'Windows': n_cpu = 0

So the first one is an imperfect fix, while the second one amounts to just giving up. Is there any effort to fix multi-process dataloading on Windows going on somewhere else, or should we re-open this one?
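A minimal sketch combining both options, under the assumption of a toy TensorDataset standing in for a real dataset: guard the entry point so spawn's re-import is harmless, and fall back to single-process loading on Windows only.

import platform

import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    # option 2: fall back to single-process loading on Windows only
    num_workers = 0 if platform.system() == 'Windows' else 4
    dataset = TensorDataset(torch.randn(100, 3), torch.randn(100))  # stand-in dataset
    loader = DataLoader(dataset, batch_size=10, shuffle=True, num_workers=num_workers)
    for x, y in loader:
        pass  # training step would go here

# option 1: guard the entry point so spawn's re-import doesn't re-run it
if __name__ == '__main__':
    main()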

Use
if __name__ == '__main__' and '__file__' in globals(): instead of if __name__ == '__main__':
That works for me. I use Jupyter notebook and Windows 10.

this is the reference

I got this problem when trying to train on my custom COCO dataset (which is a little different from the default CocoDetection PyTorch class). Adding the parameter collate_fn=utils.collate_fn worked for me:
trainloader = torch.utils.data.DataLoader(coco_train, batch_size=2, shuffle=False, num_workers=1, collate_fn=utils.collate_fn)
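If you don't have that utils module at hand, the collate_fn used in the torchvision detection examples is essentially a pass-through that groups the samples of a batch into tuples instead of stacking them; a minimal equivalent sketch (coco_train as in the snippet above):

def collate_fn(batch):
    # keep variable-sized targets (boxes, masks, ...) as tuples instead of
    # letting the default collate try to stack them into fixed-size tensors
    return tuple(zip(*batch))

trainloader = torch.utils.data.DataLoader(coco_train, batch_size=2, shuffle=False,
                                          num_workers=1, collate_fn=collate_fn)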
