Detectron: RetinaNet: workspace blob weight shape does not match pretrained model weight shape on custom dataset

Created on 1 Feb 2018  ·  11 Comments  ·  Source: facebookresearch/Detectron

When I try to train a RetinaNet model (X-101-32x8d-FPN) on my own dataset, I get the error below:
AssertionError: Workspace blob retnet_cls_pred_fpn3_w with shape (72, 256, 3, 3) does not match weights file shape (720, 256, 3, 3)

Any solution?



All 11 comments

I think this is because your new dataset has a different number of categories. The "cls_pred" (and also "bbox_reg") layers are class-dependent, and so are their shapes. If you load a pre-trained Detectron model and fine-tune it on another dataset, you may either change the names of these layers during fine-tuning, or not load these weight blobs when initializing.
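[Editor's note] This matches the shapes in the error: RetinaNet's classification head has (anchors per location) × (foreground classes) output channels, so the pretrained COCO file has 720 = 9 × 80 channels while the custom workspace expects 72 = 9 × 8. Below is a minimal sketch of the second option (not loading those blobs), assuming the Detectron weights file is a pickled dict of numpy arrays (some files nest the arrays under a 'blobs' key); file names are placeholders and the blob names are the RetinaNet ones from the error above:

import pickle

with open('model_final.pkl', 'rb') as f:
    src = pickle.load(f)
# Some weight files nest the arrays under a 'blobs' key
blobs = src['blobs'] if 'blobs' in src else src

# Drop the class-dependent layers (and their SGD momentum blobs) so the
# fresh, correctly shaped initialization in the workspace is kept
for name in ['retnet_cls_pred_fpn3_w', 'retnet_cls_pred_fpn3_b']:
    blobs.pop(name, None)
    blobs.pop(name + '_momentum', None)

with open('model_trimmed.pkl', 'wb') as f:
    pickle.dump(src, f, pickle.HIGHEST_PROTOCOL)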

@KaimingHe Sorry for asking, but can you explain the fine-tuning process further? I understand that I have to change the number of classes in the /configs/pre-trained-network.yaml and have to modify the output layers of the network.

My assumed steps for training on own data:

  1. Create a COCO-like dataset
  2. Add it to the dataset catalog (lib/datasets)
  3. Load the pre-trained model and modify the output layers
  4. Train the model using train_net.py

Now I'm at the third step and need some help. In my understanding of Detectron, the models are defined by a configuration file (.yaml) and a .pkl file with their weights. I can edit the .yaml, but how can I change the names of the output-related layers in the .pkl file for fine-tuning? It's a binary file, isn't it?
Regarding your last suggestion: how can I prevent loading these weight blobs while initializing?

Thanks for any help!
(I'm sorry for possibly stupid questions, but I need to do this for a school project and I can't find any good explanations on the Internet.)

@mattifrind We have not exposed this as a config option. Basically, what you may try is to revise the lines in https://github.com/facebookresearch/Detectron/blob/master/lib/utils/net.py#L88, for example removing those class-dependent layers from the dst list.
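[Editor's note] A hypothetical sketch of what that edit might look like, using the variable names that appear in the snippets later in this thread:

# Inside the weight-loading loop of lib/utils/net.py: skipping the
# class-dependent blobs keeps the workspace's fresh, correctly shaped
# initialization. These blob names are for RetinaNet; Fast R-CNN heads
# use cls_score_w/b and bbox_pred_w/b instead.
if dst_name in ('gpu_0/retnet_cls_pred_fpn3_w',
                'gpu_0/retnet_cls_pred_fpn3_b'):
    continue  # do not copy the pre-trained weights for this blob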

@KaimingHe Sorry, I didn't get it. I have to ask again...
I don't understand what you mean by the dst list. The function initialize_gpu_from_weights_file is called with the argument model, which is initialized as configured in the configuration .yaml (with the needed number of classes). My interpretation of your tip is to prevent this model structure from being overridden by the weights of the .pkl file. But cls_score_w(/b) and bbox_pred_w(/b) aren't found when I run the demo (I thought these were the only class-dependent layers), so the code continues to the next iteration. Conclusion: I never reach the line mentioned in your comment.
So I took the next list that I could find (unscoped_param_names) and prevented cls_score_w(/b) and bbox_pred_w(/b) from being added to it.
To do this I added these lines to this for loop:

for blob in model.params:
    keyname = c2_utils.UnscopeName(str(blob))
    # Skip the class-dependent output layers so they are never matched
    # against the pre-trained weights
    if keyname in ('cls_score_w', 'cls_score_b', 'bbox_pred_w', 'bbox_pred_b'):
        continue
    unscoped_param_names[keyname] = True

That changed nothing. The demo still runs without problems, and my dataset with just 2 categories plus the corresponding .yaml file (the demo file with NUM_CLASSES: 2 and my dataset) still throws this error:
E0207 18:21:57.894165 2256 pybind_state.h:422] Exception encountered running PythonOp function: ValueError: could not broadcast input array from shape (4) into shape (0)
While reading the issues I discovered that I'm not the only one trying to fine-tune with a different number of categories. Thank you for your patience!

Hi @mattifrind, I have an idea for using the pre-trained model and fine-tuning it on your own dataset without removing the class-dependent layers.
Here is my idea:
As we know, the COCO 2014 dataset has 80 categories.

[image: list of the COCO 2014 categories]

Suppose you have a dataset with 3 categories, none of which appear in COCO 2014.
Then what you have to do is remove 3 categories and their related images from the COCO 2014 dataset.
After that you can convert your dataset into a COCO-like format and merge it into COCO 2014.
Now you have a new "COCO 2014" dataset which includes your own categories and their training samples while keeping the number of categories at 80. I think that should be enough to get past Detectron's dimension-checking strategy; see the sketch below.
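[Editor's note] A rough sketch of that merging idea (file names are placeholders, and the id bookkeeping is simplified; a real merge also has to avoid image and annotation id collisions between the two files):

import json

with open('instances_train2014.json') as f:
    coco = json.load(f)
with open('my_dataset_coco.json') as f:  # your data, already in COCO format
    mine = json.load(f)

# Pick 3 COCO categories to replace with your own
drop_ids = sorted(c['id'] for c in coco['categories'])[:3]
coco['categories'] = [c for c in coco['categories'] if c['id'] not in drop_ids]

# Remove annotations of the dropped categories, then images left without any
coco['annotations'] = [a for a in coco['annotations']
                       if a['category_id'] not in drop_ids]
used = {a['image_id'] for a in coco['annotations']}
coco['images'] = [im for im in coco['images'] if im['id'] in used]

# Give your categories the freed ids and remap your annotations accordingly
id_map = {c['id']: new for c, new in zip(mine['categories'], drop_ids)}
for c in mine['categories']:
    c['id'] = id_map[c['id']]
for a in mine['annotations']:
    a['category_id'] = id_map[a['category_id']]

coco['categories'] += mine['categories']
coco['images'] += mine['images']
coco['annotations'] += mine['annotations']

with open('instances_train2014_merged.json', 'w') as f:
    json.dump(coco, f)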

By the way, in our experience a pre-trained model is very important for getting good performance, so you should not abandon the pre-trained model and retrain a new one from scratch.

Now I have a problem: how can I convert my label information, which is given as coordinates, into a COCO-like format like this:

[image: example of COCO-format annotations]

@SniperZhao Thanks! It isn't really a beautiful solution, because I don't need the other 77 classes, but it should work. Have you tested it?

I don't want to abandon the idea of a pre-trained model. I thought that, for example, the demo config .yaml loads a .pkl file with pre-trained weights except for the weights of the output layers (Detectron prints out "not found" for those). So if I run train_net.py I keep the whole model (except the blobs that aren't found) and fine-tune the last layers for my new task. Am I wrong?

I'm sorry, but I'm not sure I understand your last problem correctly. Do you have trouble converting your data? I used Java with a JSON library so I could easily bring the data into the needed format.

@mattifrind Yes! You are right, it's not beautiful. But I am not so sure the other 77 classes wouldn't help. Some papers say that multi-class detection performs better than single-class detection; maybe the other classes offer some extra feature information for modeling the targets you are interested in. But yes, that's another story and not a perfect solution to your question.

Apparently you have read more of the Detectron code than I have, so I can't answer your question about the code right now. I'll debug the code later and hopefully we can figure it out. And I'm pretty sure your theory about fine-tuning is correct.

As for my question: I have a dataset in which all the labeled samples have bounding-box coordinates as integers, while COCO's labels use floating-point coordinates. I don't know the correspondence between them, so I can't convert my data into the right format. I also don't know the meaning of the segmentation field, which is a list of floating-point numbers in the COCO annotations.

Ok, maybe multi-class detection helps. I'm grateful for any solution.
Good luck with your test!
About your question: I didn't think much about the floating-point numbers. I just placed the integers of my data into the COCO format, and I couldn't test it because of our number-of-classes problem. But you're right that there may be a more complex relationship between the pixel coordinates and the floating-point numbers.
About the segmentation: I have seen that there are two arrays of numbers of the same length in the segmentation tag. I guess they represent the x and y coordinates.
Maybe you can find some more information in the COCOAPI repository or on the COCO website.
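[Editor's note] For reference, a typical COCO object-detection annotation looks like the dict below (values are made up). bbox is [x, y, width, height] in pixels, and since integers are valid JSON numbers you can use integer coordinates directly; a polygon segmentation is a single interleaved list [x1, y1, x2, y2, ...] rather than separate x and y arrays:

# Illustrative COCO-style annotation entry (values made up)
annotation = {
    'id': 1,
    'image_id': 42,
    'category_id': 1,
    'bbox': [100.0, 50.0, 80.0, 120.0],  # [x, y, width, height] in pixels
    'area': 9600.0,                      # here simply width * height
    'segmentation': [[100.0, 50.0, 180.0, 50.0,
                      180.0, 170.0, 100.0, 170.0]],  # polygon: x1,y1,x2,y2,...
    'iscrowd': 0,
}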

I was able to fine-tune RetinaNet (model X-101-64x4d-FPN_2x) on my own dataset as follows:

  1. Converting my dataset to COCO format and adding it to the lib/datasets catalogue as described by @mattifrind
  2. Downloading the pretrained model, providing the path to it in TRAIN.WEIGHTS, and modifying NUM_CLASSES, BASE_LR, MAX_ITER etc., all in the appropriate .yaml file
  3. Printing out the dst_names and the shapes of the loaded weights (ws_blob.shape, src_blobs[src_name].shape) in this for loop to find out exactly which layers were class-dependent (in my case, retnet_cls_pred_fpn3_w and retnet_cls_pred_fpn3_b)
  4. Manually reinitializing those layers to the same shape as ws_blob (following the paper's guidelines on initial layer values), and resetting their corresponding "momentum" layers in the same loop. (From what I understand, the momentum blobs hold the velocity vector of momentum SGD from the COCO optimization, so I believe they should be set to 0 given we're using a different dataset.)
classes_layers_list = ['gpu_0/retnet_cls_pred_fpn3_w', 'gpu_0/retnet_cls_pred_fpn3_b']

# Paper says:
# - weights initialized to a Gaussian
#   (training was only stable for me with a scale of 0.0001),
# - biases of the last conv layer in the subnet initialized to
#   -log((1 - pi) / pi), where pi is the probability of an anchor being
#   foreground at the start of fine-tuning; for me this was ~0.00001
#   (you can check this from the output at the first iteration)
if dst_name in classes_layers_list:
    if dst_name == 'gpu_0/retnet_cls_pred_fpn3_w':
        src_blobs[src_name] = 0.0001 * np.random.randn(*ws_blob.shape)
    else:
        src_blobs[src_name] = -np.log((1 - 0.00001) / 0.00001) * np.ones(ws_blob.shape)

    # Zero the SGD momentum (velocity) blob of the reinitialized layer
    src_blobs[src_name + '_momentum'] = np.zeros(ws_blob.shape)

This got my network training, with a steadily decreasing overall loss. The individual FPN focal-loss values started out at 0 but gradually increased after a few thousand iterations.
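[Editor's note] The bias formula in the snippet comes from requiring the classification subnet's initial sigmoid output to equal the foreground prior $\pi$ (the focal loss paper uses $\pi = 0.01$):

$$\sigma(b) = \pi \iff b = \log\frac{\pi}{1-\pi} = -\log\frac{1-\pi}{\pi}$$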

@djr2015 Have you done the fine-tuning on Mask R-CNN? I have followed your steps (1-3), but the Mask R-CNN paper does not clearly explain the initialization of the layers.

Hi @gabriellap,
Were you able to fine-tune Mask R-CNN?
If yes, can you please list the steps to follow and the changes that need to be made to the files?

Thank you!
