Pytorch: BrokenPipeError: [Errno 32] Broken pipe

Created on 8 Aug 2017  ·  35 Comments  ·  Source: pytorch/pytorch

Hi, I use PyTorch to run a triplet network on the GPU, but whenever the DataLoader fetches data I get a BrokenPipeError: [Errno 32] Broken pipe.

I thought something was wrong in the following code:

for batch_idx, (data1, data2, data3) in enumerate(test_loader):
    if args.cuda:
        data1, data2, data3 = data1.cuda(), data2.cuda(), data3.cuda()
    data1, data2, data3 = Variable(data1), Variable(data2), Variable(data3)

Can you give me some suggestions? Thank you so much.

Most helpful comment

@mjchen611 You can set num_workers to 0 to see the actual error. Did you have your plotter correctly configured?

All 35 comments

Would you be able to post a snippet of code that can reproduce this?

@alykhantejani

1) The code is from: https://github.com/andreasveit/triplet-network-pytorch/blob/master/train.py

2) The error occurred in train.py, line 136

3) The error was:

runfile('G:/researchWork2/pytorch/triplet-network-pytorch-master/train.py', wdir='G:/researchWork2/pytorch/triplet-network-pytorch-master')
Reloaded modules: triplet_mnist_loader, triplet_image_loader, tripletnet

Number of params: 21840
Traceback (most recent call last):
  File "", line 1, in <module>
    runfile('G:/researchWork2/pytorch/triplet-network-pytorch-master/train.py', wdir='G:/researchWork2/pytorch/triplet-network-pytorch-master')
  File "D:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 880, in runfile
    execfile(filename, namespace)
  File "D:\Anaconda3\lib\site-packages\spyder\utils\site\sitecustomize.py", line 102, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)
  File "G:/researchWork2/pytorch/triplet-network-pytorch-master/train.py", line 258, in <module>
    main()
  File "G:/researchWork2/pytorch/triplet-network-pytorch-master/train.py", line 116, in main
    train(train_loader, tnet, criterion, optimizer, epoch)
  File "G:/researchWork2/pytorch/triplet-network-pytorch-master/train.py", line 137, in train
    for batch_idx, (data1, data2) in enumerate(train_loader):
  File "D:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 303, in __iter__
    return DataLoaderIter(self)
  File "D:\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py", line 162, in __init__
    w.start()
  File "D:\Anaconda3\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "D:\Anaconda3\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "D:\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "D:\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
    reduction.dump(process_obj, to_child)
  File "D:\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)

BrokenPipeError: [Errno 32] Broken pipe

4) Part of the related training code follows:

def train(train_loader, tnet, criterion, optimizer, epoch):
    losses = AverageMeter()
    accs = AverageMeter()
    emb_norms = AverageMeter()

    # switch to train mode
    tnet.train()
    for batch_idx, (data1, data2, data3) in enumerate(train_loader):
        if args.cuda:
            data1, data2, data3 = data1.cuda(), data2.cuda(), data3.cuda()
        data1, data2, data3 = Variable(data1), Variable(data2), Variable(data3)

        # compute output
        dista, distb, embedded_x, embedded_y, embedded_z = tnet(data1, data2, data3)
        # 1 means dista should be larger than distb
        target = torch.FloatTensor(dista.size()).fill_(1)
        if args.cuda:
            target = target.cuda()
        target = Variable(target)

        loss_triplet = criterion(dista, distb, target)
        loss_embedd = embedded_x.norm(2) + embedded_y.norm(2) + embedded_z.norm(2)
        loss = loss_triplet + 0.001 * loss_embedd

        # measure accuracy and record loss
        acc = accuracy(dista, distb)
        losses.update(loss_triplet.data[0], data1.size(0))
        accs.update(acc, data1.size(0))
        emb_norms.update(loss_embedd.data[0] / 3, data1.size(0))

        # compute gradient and do optimizer step
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if batch_idx % args.log_interval == 0:
            print('Train Epoch: {} [{}/{}]\t'
                  'Loss: {:.4f} ({:.4f}) \t'
                  'Acc: {:.2f}% ({:.2f}%) \t'
                  'Emb_Norm: {:.2f} ({:.2f})'.format(
                epoch, batch_idx * len(data1), len(train_loader.dataset),
                losses.val, losses.avg,
                100. * accs.val, 100. * accs.avg, emb_norms.val, emb_norms.avg))

    # log avg values to somewhere
    plotter.plot('acc', 'train', epoch, accs.avg)
    plotter.plot('loss', 'train', epoch, losses.avg)
    plotter.plot('emb_norms', 'train', epoch, emb_norms.avg)

Thank you so much.

@alykhantejani
And I run it on Windows 8.1 with CUDA.

We do not support Windows officially yet. Maybe @peterjc123 knows what's wrong.

@mjchen611 You can set num_workers to 0 to see the actual error. Did you have your plotter correctly configured?
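A minimal sketch of that debugging step (train_dataset here is an assumption, standing in for whatever Dataset the script already builds): with num_workers=0 the batches load in the main process, so the real exception surfaces directly instead of as a BrokenPipeError from a dead worker.

from torch.utils.data import DataLoader

# train_dataset: whatever Dataset your script already builds (assumed here).
# num_workers=0 loads batches in the main process, so an exception raised
# inside the dataset shows up directly instead of as a BrokenPipeError.
debug_loader = DataLoader(train_dataset, batch_size=32, shuffle=True, num_workers=0)
first_batch = next(iter(debug_loader))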

I can verify that setting num_workers to 0 or 1 helped. With any higher value, the DataLoader always failed for me, regardless of the dataset. The error has to do with multiprocessing in the DataLoader:

  File "D:/Opiskelu/PyTorch Tutorials/cnn_transfer_learning_cuda.py", line 76, in <module>
    inputs, classes = next(iter(dataloaders['train']))

  File "C:\Anaconda3\envs\ml\lib\site-packages\torch\utils\data\dataloader.py", line 301, in __iter__
    return DataLoaderIter(self)

  File "C:\Anaconda3\envs\ml\lib\site-packages\torch\utils\data\dataloader.py", line 158, in __init__
    w.start()

  File "C:\Anaconda3\envs\ml\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)

  File "C:\Anaconda3\envs\ml\lib\multiprocessing\context.py", line 212, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)

  File "C:\Anaconda3\envs\ml\lib\multiprocessing\context.py", line 313, in _Popen
    return Popen(process_obj)

  File "C:\Anaconda3\envs\ml\lib\multiprocessing\popen_spawn_win32.py", line 66, in __init__
    reduction.dump(process_obj, to_child)

  File "C:\Anaconda3\envs\ml\lib\multiprocessing\reduction.py", line 59, in dump
    ForkingPickler(file, protocol).dump(obj)

BrokenPipeError: [Errno 32] Broken pipe

@karmus89 Actually this error only occurs when you try to do multiprocessing on code that has errors in it. It's unexpected that you hit this issue when your code is right. I don't know which version you are using. Can you send a small piece of code that reproduces your issue?

Will do! And remember, I'm using a Windows machine. The code is directly copied from the tutorial PyTorch: Transfer Learning Tutorial, which means the dataset has to be downloaded and extracted as instructed.

The code to reproduce the error:

import torch
import torchvision
from torchvision import datasets, models, transforms
import os

data_transforms = {
    'train': transforms.Compose([
        transforms.RandomSizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Scale(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

data_dir = 'hymenoptera_data'
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'val']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
                                             shuffle=True, num_workers=4)
              for x in ['train', 'val']}

# The code will fail here, trying to iterate over the DataLoader with multiple num_workers (Windows only)
inputs, classes = next(iter(dataloaders['train']))

And I just made some PyTorch forum posts regarding this. The problem lies with Python's multiprocessing on Windows: there is no fork, so a fresh interpreter is spawned that re-imports the main module, and any module-level code that starts workers runs again in each child. Please see this PyTorch discussion reply as I don't want to copy-paste too much here.

Edit:

Here's the code that doesn't crash, which at the same time complies with Python's multiprocessing programming guidelines for Windows machines:

import torch
import torchvision
from torchvision import datasets, models, transforms
import os

if __name__ == "__main__":

    data_transforms = {
        'train': transforms.Compose([
            transforms.RandomSizedCrop(224),
            transforms.RandomHorizontalFlip(),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ]),
        'val': transforms.Compose([
            transforms.Scale(256),
            transforms.CenterCrop(224),
            transforms.ToTensor(),
            transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
        ]),
    }

    data_dir = 'hymenoptera_data'
    image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                              data_transforms[x])
                      for x in ['train', 'val']}
    dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
                                                 shuffle=True, num_workers=4)
                  for x in ['train', 'val']}

    inputs, classes = next(iter(dataloaders['train']))

@karmus89 Well, I think I have stated it where the package was published. I'm so sad that you installed the package without reading the notice.

@peterjc123 Please see my edited response, where I did exactly that. The requirement to wrap the code inside an if __name__ == '__main__' guard isn't immediately obvious, as it is only needed on Windows machines.

__Edit__:
Regarding the stating of the requirement, I indeed missed it. I used conda to install the package directly, so I never came across any introductory notes. But thanks anyway! And sorry for making you sad!

__Edit 2__:
Wow, couldn't have known even where to look for that 😄 👍

A question regarding the above: I am running into this problem within a Jupyter notebook. How do you solve it in a Jupyter notebook? Wrapping the code in if __name__ == '__main__' does not change a thing. Does someone know how to translate the fix to Jupyter notebooks?

@Dehde What about setting num_workers of the DataLoader to zero?

@peterjc123
Thanks for the quick reply! I did not fully make myself clear, sorry: is there a way to run PyTorch on Windows in a Jupyter notebook and still use the worker functionality, i.e., not set num_workers to zero? I definitely need parallelized preprocessing. Thanks for your time!

Could you show me minimal code so that I can reproduce it?

@peterjc123
I will edit it into this post on Monday; I don't have access to the code right now. Thank you!

As promised, the code I use:

if __name__ == '__main__':

    batch_size = 256

    size = (128, 128)
    image_datasets = {}
    image_datasets["train"] = WaterbodyDataset(masks=train_masks, images=train_imgs,
                                               transform_img=transforms.Compose([
                                                   RandomCrop(size),
                                                   transforms.ToTensor(),
                                               ]),
                                               transform_mask=transforms.Compose([
                                                   RandomCrop(size),
                                                   transforms.ToTensor(),
                                               ]))

    image_datasets["val"] = WaterbodyDataset(masks=val_masks, images=val_imgs,
                                             transform_img=transforms.Compose([
                                                 transforms.ToTensor(),
                                             ]),
                                             transform_mask=transforms.Compose([
                                                 transforms.ToTensor()
                                             ]))

    dataloaders = {'train': torch.utils.data.DataLoader(image_datasets['train'], batch_size=batch_size,
                                                        shuffle=True, num_workers=1),
                   'val': torch.utils.data.DataLoader(image_datasets['val'], batch_size=batch_size,
                                                      shuffle=False, num_workers=1)}

    dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val']}

    hps = HyperParams()
    hps.update("name=resnet34_128_deconv_pret00rained_bs32_adam_lr0.0001_wd0_pat5,"
               "arch=resnet34,input_channel=4,freeze=0,deconv=1,opt=adam,debug=0,"
               "weight_decay=0.0,patience=100,pretrained=1,lr=0.0001,print_freq=10,every_x_epoch_eval=1")
    pprint(attr.asdict(hps))

    model = Model(hps)
    model.train(dataloaders)

The WaterbodyDataset inherits from the PyTorch Dataset class.

I also got the same error. When I set num_workers to 0, the error does not appear again. However, when I set num_workers to 1, the error is still there.

When I set num_workers to 0, there is no error.

Please, I need assistance with this error: "BrokenPipeError: [Errno 32] Broken pipe"
The code is from: https://github.com/higgsfield/np-hard-deep-reinforcement-learning/blob/master/Neural%20Combinatorial%20Optimization.ipynb
I am using Windows 10.

  1. Wrap the code in if __name__ == '__main__':, though for me the error nonetheless sometimes appears again. I know it sounds silly, but what helps me then is just
  2. rebooting the computer.

Windows 10 here.

I found that the issue is still present, but only when I use a custom collate_fn.

For me, just changing num_workers from 2 to 0 made the code work properly...

Had the same issue when I ran the PyTorch Data Loading and Processing Tutorial. Changing num_workers from 2 to 0 solved the problem, but num_workers = 2 worked fine with other datasets. I use Windows.

num_workers > 0 doesn't work on Windows, even with the new IterableDataset.

I hit this same error, and while I was trying to find a way to solve it, the program resumed running on its own (after waiting about 10 minutes). Amazing :confused:

I've run the exact same code multiple times with different results. I've also copied code that causes a broken pipe to a new file (the contents being exactly the same) and it would run fine. I think there's an external factor at play here. I can't reproduce the bug anymore, but maybe try deleting your __pycache__ directory if there is one.

I have the same problem on Windows 10. I don't know why, but I think the problem is the DataLoader (setting num_workers to 0 doesn't help) and multiprocessing.

After using Ubuntu for quite some time, I have been trying Windows 10 lately (just for prototyping before using the cluster machine) and bumped into the same error; setting num_workers to 0 helped. Make sure you set it on all dataloaders: train, test, and validate.

I also have the same problem on Windows 10. I get the error message '[Errno 32] Broken pipe' when I set num_workers greater than 0, and my code is downloaded from the official PyTorch tutorial.

I guess this is a bug on Windows 10, and I am looking forward to seeing it fixed in the next release.

Same error; num_workers=0 worked, but I want multiprocessing to speed up data loading.

It seems the only way for this to work is to use Linux. I am using Windows 10 for prototyping and then pushing everything to the cluster, which is Linux-based.

import platform
if platform.system() == 'Windows': n_cpu = 0

I also encountered a similar problem on Windows 10 when defining my custom torchvision dataset and trying to run it in JupyterLab. Apparently the custom dataset does not get registered as an attribute of the __main__ module, which is where multiprocessing's spawn.py looks it up when the DataLoader starts its workers. I fixed it by writing the dataset into a module and then importing it, as mentioned here:

https://stackoverflow.com/questions/41385708/multiprocessing-example-giving-attributeerror

  File "C:\Users\johndoe\Anaconda3\envs\PyTorch15\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "C:\Users\johndoe\Anaconda3\envs\PyTorch15\lib\multiprocessing\spawn.py", line 115, in _main
    self = reduction.pickle.load(from_parent)
AttributeError: Can't get attribute 'RandomPatchExtractor' on <module '__main__' (built-in)>
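For reference, a minimal sketch of that workaround; the module name my_datasets.py is hypothetical, and RandomPatchExtractor is only a stand-in for the custom dataset named in the traceback above:

# my_datasets.py -- define the dataset in a real module so spawn workers can import it
from torch.utils.data import Dataset

class RandomPatchExtractor(Dataset):
    # minimal stand-in for the custom dataset from the traceback above
    def __init__(self, samples):
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]

# notebook cell -- the class is now pickled by reference to my_datasets,
# so the spawned worker processes can find it on import
from torch.utils.data import DataLoader
from my_datasets import RandomPatchExtractor

dataset = RandomPatchExtractor(samples=list(range(100)))
loader = DataLoader(dataset, batch_size=10, num_workers=2)
first_batch = next(iter(loader))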

> @mjchen611 You can set num_workers to 0 to see the actual error. Did you have your plotter correctly configured?

Setting num_workers to 0 worked for me. Could you explain why this causes an error?

I have noticed this issue is closed, but I do not think it is fixed. Is there any effort to fix the multiprocessing dataloader on Windows? Currently there are two options as far as I know (a combined sketch follows below):

  1. wrap it in if __name__ == '__main__':, which does not always work.
  2. do not use multiprocessing on Windows: if platform.system() == 'Windows': n_cpu = 0

So the first one is an imperfect fix, while the second one amounts to just giving up. Is there any effort to fix multi-process dataloading on Windows going on somewhere else, or should we re-open this one?
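A minimal sketch combining both options, under the assumption of a toy TensorDataset standing in for a real dataset: guard the entry point so spawn's re-import is harmless, and fall back to single-process loading on Windows only.

import platform

import torch
from torch.utils.data import DataLoader, TensorDataset

def main():
    # option 2: fall back to single-process loading on Windows only
    num_workers = 0 if platform.system() == 'Windows' else 4
    dataset = TensorDataset(torch.randn(100, 3), torch.randn(100))  # stand-in dataset
    loader = DataLoader(dataset, batch_size=10, shuffle=True, num_workers=num_workers)
    for x, y in loader:
        pass  # training step would go here

# option 1: guard the entry point so spawn's re-import doesn't re-run it
if __name__ == '__main__':
    main()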

Use
if __name__ == '__main__' and '__file__' in globals(): instead of if __name__ == '__main__':
That works for me. I use Jupyter notebook and Windows 10.

this is the reference

I got this problem when trying to train on my custom COCO dataset (which is a little different from the default CocoDetection PyTorch class). Adding the parameter collate_fn=utils.collate_fn worked for me:
trainloader = torch.utils.data.DataLoader(coco_train, batch_size=2, shuffle=False, num_workers=1, collate_fn=utils.collate_fn)
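If you don't have that utils module at hand, the collate_fn used in the torchvision detection examples is essentially a pass-through that groups the samples of a batch into tuples instead of stacking them; a minimal equivalent sketch (coco_train as in the snippet above):

def collate_fn(batch):
    # keep variable-sized targets (boxes, masks, ...) as tuples instead of
    # letting the default collate try to stack them into fixed-size tensors
    return tuple(zip(*batch))

trainloader = torch.utils.data.DataLoader(coco_train, batch_size=2, shuffle=False,
                                          num_workers=1, collate_fn=collate_fn)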
