Hello,
I have successfully trained a Faster R-CNN model on my custom dataset (my config file is based on e2e_faster_rcnn_X-101-32x8d-FPN_1x.yaml, with modifications to the DATASETS lines). However, when using tools/infer_simple.py to test the model:
python2 tools/infer_simple.py \
--cfg configs/mydataset/mydataset_faster_rcnn_X-101-32x8d-FPN_1x.yaml \
--output-dir /home/ubuntu/tmp/mydataset_faster_rcnn_X-101-32x8d-FPN_1x/ \
--image-ext jpg \
--wts /home/ubuntu/detectron/output/train/mydataset_train:mydataset_val/generalized_rcnn/model_final.pkl \
/home/ubuntu//test-data/
it runs without any errors, but the results are very poor and, more strangely, the names of the predicted objects are from COCO (e.g. car, airplane, etc.) and not from my custom dataset.
Could you please help?
Thank you very much in advance!
@ir413 Thanks a lot!
Unfortunately I still cannot make it work :(
As you suggested, I replaced the line dummy_coco_dataset = dummy_datasets.get_coco_dataset() with dummy_coco_dataset = dummy_datasets.get_custom_dataset(), where my get_custom_dataset() is defined as follows:
def get_custom_dataset():
    """A dummy COCO dataset that includes only the 'classes' field."""
    ds = AttrDict()
    classes = [
        '__background__', 'class1', 'class2', 'class3', 'class4', 'class5', 'class6'
    ]
    ds.classes = {i: name for i, name in enumerate(classes)}
    return ds
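For reference, here is a minimal standalone sketch of the same index-to-name mapping, written without AttrDict so it runs outside Detectron. The helper name build_class_map is hypothetical and not part of Detectron. It also shows that six object classes plus the background entry give seven entries in total; the assumption here is that the class count in the config is meant to include the background class.

```python
def build_class_map(class_names):
    """Return {index: name}, with index 0 reserved for the background class.

    Hypothetical helper for illustration only; not part of Detectron.
    """
    classes = ['__background__'] + list(class_names)
    return {i: name for i, name in enumerate(classes)}


custom_names = ['class1', 'class2', 'class3', 'class4', 'class5', 'class6']
class_map = build_class_map(custom_names)

# Six object classes plus background: seven entries in total.
num_classes_with_background = len(class_map)
```

The visualization code only needs this mapping to label boxes, so the mapping affects the displayed names and not the network itself.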
Then after executing the command (as shown in the original question), I got the following error:
INFO infer_qopius.py: 132: Processing /home/ubuntu//test-data/IMG_0560.jpg -> /home/ubuntu/tmp/mydataset_faster_rcnn_X-101-32x8d-FPN_1x/IMG_0560.jpg.pdf
E0604 13:19:32.223304 71160 net_dag.cc:195] Exception from operator chain starting at '' (type 'Conv'): caffe2::EnforceNotMet: [enforce fail at conv_op_cudnn.cc:546] filter.dim32(1) == C / group_. 64 vs 2 Error from operator:
input: "gpu_0/res2_0_branch2a" input: "gpu_0/res2_0_branch2b_w" output: "gpu_0/res2_0_branch2b" name: "" type: "Conv" arg { name: "kernel" i: 3 } arg { name: "group" i: 32 } arg { name: "exhaustive_search" i: 0 } arg { name: "stride" i: 1 } arg { name: "pad" i: 1 } arg { name: "order" s: "NCHW" } arg { name: "dilation" i: 1 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN"
WARNING workspace.py: 185: Original python traceback for operator 7 in network generalized_rcnn in exception above (most recent call last):
WARNING workspace.py: 190: File "tools/infer_qopius.py", line 168, in <module>
WARNING workspace.py: 190: File "tools/infer_qopius.py", line 118, in main
WARNING workspace.py: 190: File "/home/ubuntu/detectron/lib/core/test_engine.py", line 328, in initialize_model_from_cfg
WARNING workspace.py: 190: File "/home/ubuntu/detectron/lib/modeling/model_builder.py", line 124, in create
WARNING workspace.py: 190: File "/home/ubuntu/detectron/lib/modeling/model_builder.py", line 89, in generalized_rcnn
WARNING workspace.py: 190: File "/home/ubuntu/detectron/lib/modeling/model_builder.py", line 229, in build_generic_detection_model
WARNING workspace.py: 190: File "/home/ubuntu/detectron/lib/modeling/optimizer.py", line 54, in build_data_parallel_model
WARNING workspace.py: 190: File "/home/ubuntu/detectron/lib/modeling/model_builder.py", line 169, in _single_gpu_build_func
WARNING workspace.py: 190: File "/home/ubuntu/detectron/lib/modeling/FPN.py", line 62, in add_fpn_ResNet101_conv5_body
WARNING workspace.py: 190: File "/home/ubuntu/detectron/lib/modeling/FPN.py", line 103, in add_fpn_onto_conv_body
WARNING workspace.py: 190: File "/home/ubuntu/detectron/lib/modeling/ResNet.py", line 46, in add_ResNet101_conv5_body
WARNING workspace.py: 190: File "/home/ubuntu/detectron/lib/modeling/ResNet.py", line 101, in add_ResNet_convX_body
WARNING workspace.py: 190: File "/home/ubuntu/detectron/lib/modeling/ResNet.py", line 83, in add_stage
WARNING workspace.py: 190: File "/home/ubuntu/detectron/lib/modeling/ResNet.py", line 181, in add_residual_block
WARNING workspace.py: 190: File "/home/ubuntu/detectron/lib/modeling/ResNet.py", line 255, in bottleneck_transformation
WARNING workspace.py: 190: File "/home/ubuntu/detectron/lib/modeling/detector.py", line 406, in ConvAffine
WARNING workspace.py: 190: File "/usr/local/lib/python2.7/dist-packages/caffe2/python/cnn.py", line 97, in Conv
WARNING workspace.py: 190: File "/usr/local/lib/python2.7/dist-packages/caffe2/python/brew.py", line 107, in scope_wrapper
WARNING workspace.py: 190: File "/usr/local/lib/python2.7/dist-packages/caffe2/python/helpers/conv.py", line 186, in conv
WARNING workspace.py: 190: File "/usr/local/lib/python2.7/dist-packages/caffe2/python/helpers/conv.py", line 139, in _ConvBase
Traceback (most recent call last):
  File "tools/infer_qopius.py", line 168, in <module>
    main(args)
  File "tools/infer_qopius.py", line 138, in main
    model, im, None, timers=timers
  File "/home/ubuntu/detectron/lib/core/test.py", line 66, in im_detect_all
    model, im, cfg.TEST.SCALE, cfg.TEST.MAX_SIZE, boxes=box_proposals
  File "/home/ubuntu/detectron/lib/core/test.py", line 158, in im_detect_bbox
    workspace.RunNet(model.net.Proto().name)
  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/workspace.py", line 217, in RunNet
    StringifyNetName(name), num_iter, allow_fail,
  File "/usr/local/lib/python2.7/dist-packages/caffe2/python/workspace.py", line 178, in CallWithExceptionIntercept
    return func(*args, **kwargs)
RuntimeError: [enforce fail at conv_op_cudnn.cc:546] filter.dim32(1) == C / group_. 64 vs 2 Error from operator:
input: "gpu_0/res2_0_branch2a" input: "gpu_0/res2_0_branch2b_w" output: "gpu_0/res2_0_branch2b" name: "" type: "Conv" arg { name: "kernel" i: 3 } arg { name: "group" i: 32 } arg { name: "exhaustive_search" i: 0 } arg { name: "stride" i: 1 } arg { name: "pad" i: 1 } arg { name: "order" s: "NCHW" } arg { name: "dilation" i: 1 } device_option { device_type: 1 cuda_gpu_id: 0 } engine: "CUDNN"
As you can see, my custom dataset has only 6 classes. Maybe this is the reason?
The error looks unrelated to the number of classes; it's thrown by a Conv op with grouped convolution where there's a mismatch between the number of input channels, the number of groups, and the number of filter channels. This indicates that you have accidentally used mismatched config and model files.
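To make the failed check concrete, here is a rough sketch of the shape constraint quoted in the error message, filter.dim32(1) == C / group_, using the numbers from the log above (group 32 from the operator arguments, 64 vs 2 from the enforce message). The function names are hypothetical and this is plain arithmetic, not Caffe2 code.

```python
def expected_filter_channels(input_channels, groups):
    """Channels per group that a grouped conv filter must have (C / group)."""
    if input_channels % groups != 0:
        raise ValueError("input channels must be divisible by the number of groups")
    return input_channels // groups


def filter_matches(filter_in_channels, input_channels, groups):
    """True when the filter's second dimension equals C / group."""
    return filter_in_channels == expected_filter_channels(input_channels, groups)


# Numbers from the log: group = 32 and C / group = 2, so C = 64,
# while the loaded filter has 64 channels in its second dimension.
groups_in_net = 32
input_channels = 2 * groups_in_net   # C = 64
loaded_filter_channels = 64

mismatch = not filter_matches(loaded_filter_channels, input_channels, groups_in_net)

# The same filter would pass the check on a 64-channel input with group = 1.
fits_ungrouped = filter_matches(loaded_filter_channels, input_channels, 1)
```

A filter with 64 channels in that dimension fits an ungrouped layer on a 64-channel input but not one built with 32 groups. That is an inference from the numbers alone, but it is in line with the weights and the config describing different architectures.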
@rbgirshick Thank you very much for your answer! To make sure everything is done properly, I have started the training again, and now it works. Sorry for the stupid question.