{"accuracy_cls": 0.923639,
"eta": "1 day, 13:30:44",
"iter": 120000,
"loss": 0.562384,
"loss_bbox": 0.107754,
"loss_cls": 0.243266,
"loss_rpn_bbox": 0.085141,
"loss_rpn_cls": 0.100456,
"lr": 0.010000,
"mb_qsize": 64,
"mem": 6407,
"time": 0.562685}
Q1: What do "accuracy_cls", "mb_qsize", and "mem" mean?
Q2: "loss_bbox" (0.107754) + "loss_cls" (0.243266) + "loss_rpn_bbox" (0.085141) + "loss_rpn_cls" (0.100456) = 0.536617, which is not equal to "loss" (0.562384). Why not?
accuracy_cls
is the classification accuracy on the RoI mini-batches seen in training. Note that these are balanced 1:3 foreground:background by default, so if a network is just guessing background its accuracy will be 75% (which doesn't mean it's good).
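The 75% baseline can be checked with a few lines of Python. This is a hypothetical illustration (the RoI counts are made up); it just shows the arithmetic of always guessing "background" on a 1:3 foreground:background batch:

```python
# Hypothetical RoI minibatch with the default 1:3 fg:bg balance.
labels = ["fg"] * 128 + ["bg"] * 384          # 512 RoIs total
predictions = ["bg"] * len(labels)            # degenerate classifier: always "background"

# Fraction of RoIs where the guess matches the label.
accuracy = sum(p == l for p, l in zip(predictions, labels)) / len(labels)
print(accuracy)  # 0.75
```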
mb_qsize
is the dataloader's output queue size. This should be close to 64 (full) most of the time. If it drops to 0 then the GPUs will be stalled waiting for data.
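The relationship between mb_qsize and GPU stalls can be sketched with a bounded queue. This is not Detectron's actual loader code, just a minimal model of it: the logged mb_qsize is essentially the queue's current size, and a blocking `get()` on an empty queue is the stall:

```python
import queue

# Minimal sketch: a bounded queue of preloaded minibatches.
# "mb_qsize" in the json_stats line corresponds to q.qsize().
q = queue.Queue(maxsize=64)

# A loader that keeps up fills the queue; the GPUs never wait.
for i in range(64):
    q.put({"iter": i})
print(q.qsize())  # 64 (full): healthy

# If the trainer consumes faster than the loader refills,
# the queue drains, and the next blocking q.get() stalls the GPUs.
while not q.empty():
    q.get()
print(q.qsize())  # 0: GPUs would be waiting for data
```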
mem
is the max amount of GPU memory on a single GPU used by caffe2 (note that this does not reflect some GPU memory usage outside of what caffe2 can track, e.g., cudnn caches, etc.).
The sum of the losses is not equal to the total loss because each logged value is median filtered separately to smooth the displayed output values.
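The mismatch follows from a general fact: the median of a sum is not the sum of the medians. A small sketch (with made-up loss values and a hypothetical window size, not Detectron's actual logging code) shows how smoothing each scalar separately breaks the additive relationship:

```python
from collections import deque
from statistics import median

WINDOW = 20  # hypothetical smoothing window size

# Made-up per-iteration losses; each series has an outlier at a different step.
raw = {
    "loss_cls":  [0.30, 0.21, 0.25, 0.95, 0.24],
    "loss_bbox": [0.10, 0.12, 0.80, 0.11, 0.10],
}
# The total loss at each iteration really is the sum of its parts.
raw["loss"] = [a + b for a, b in zip(raw["loss_cls"], raw["loss_bbox"])]

# Each logged value is median-filtered over its own sliding window.
smoothed = {k: median(deque(v, maxlen=WINDOW)) for k, v in raw.items()}

print(smoothed["loss_cls"] + smoothed["loss_bbox"])  # sum of the medians
print(smoothed["loss"])                              # median of the sums: differs
```

Because the outliers land on different iterations, the per-series medians discard them independently, while the median of the summed series sees a different distribution, so the displayed components need not add up to the displayed total.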
That's great!
Any explanation, or is it normal that it grows this fast?
json_stats: {"accuracy_cls": 0.351562, "eta": "121 days, 0:34:07", "iter": 0, "loss": 6.128010, "loss_bbox": 0.007976, "loss_cls": 3.535384, "loss_rpn_bbox_fpn2": 0.000000, "loss_rpn_bbox_fpn3": 0.000000, "loss_rpn_bbox_fpn4": 0.000000, "loss_rpn_bbox_fpn5": 0.063088, "loss_rpn_bbox_fpn6": 0.000000, "loss_rpn_cls_fpn2": 2.373329, "loss_rpn_cls_fpn3": 0.110721, "loss_rpn_cls_fpn4": 0.032310, "loss_rpn_cls_fpn5": 0.005202, "loss_rpn_cls_fpn6": 0.000000, "lr": 0.000333, "mb_qsize": 64, "mem": 2965, "time": 174.274131}
I0409 23:48:21.545917 14708 context_gpu.cu:305] GPU 0: 3037 MB
I0409 23:48:21.586969 14708 context_gpu.cu:309] Total: 3037 MB
I0409 23:48:23.049207 14711 context_gpu.cu:305] GPU 0: 3169 MB
I0409 23:48:23.049262 14711 context_gpu.cu:309] Total: 3169 MB
json_stats: {"accuracy_cls": 0.972342, "eta": "7 days, 4:53:41", "iter": 20, "loss": 16.139348, "loss_bbox": 0.666931, "loss_cls": 11.527749, "loss_rpn_bbox_fpn2": 0.000000, "loss_rpn_bbox_fpn3": 0.000000, "loss_rpn_bbox_fpn4": 0.000000, "loss_rpn_bbox_fpn5": 0.074818, "loss_rpn_bbox_fpn6": 0.000000, "loss_rpn_cls_fpn2": 0.000000, "loss_rpn_cls_fpn3": 0.000000, "loss_rpn_cls_fpn4": 0.000000, "loss_rpn_cls_fpn5": 0.052323, "loss_rpn_cls_fpn6": 0.000000, "lr": 0.000360, "mb_qsize": 64, "mem": 3254, "time": 10.377150}
@RafaRuiz I am also getting the same results...
Did you figure out why it grows so fast?