{"accuracy_cls": 0.923639,
"eta": "1 day, 13:30:44",
"iter": 120000,
"loss": 0.562384,
"loss_bbox": 0.107754,
"loss_cls": 0.243266,
"loss_rpn_bbox": 0.085141,
"loss_rpn_cls": 0.100456,
"lr": 0.010000,
"mb_qsize": 64,
"mem": 6407,
"time": 0.562685}
Q1: What do "accuracy_cls", "mb_qsize", and "mem" mean?
Q2: "loss_bbox" (0.107754) + "loss_cls" (0.243266) + "loss_rpn_bbox" (0.085141) + "loss_rpn_cls" (0.100456) = 0.536617, which is not equal to "loss" (0.562384). Why not?
accuracy_cls
is the classification accuracy on the RoI mini-batches seen in training. Note that these are balanced 1:3 foreground:background by default, so if a network is just guessing background its accuracy will be 75% (which doesn't mean it's good).
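The 75% baseline can be checked with a few lines of Python. This is a hypothetical illustration (the RoI counts are made up); it just shows the arithmetic of always guessing "background" on a 1:3 foreground:background batch:

```python
# Hypothetical RoI minibatch with the default 1:3 fg:bg balance.
labels = ["fg"] * 128 + ["bg"] * 384          # 512 RoIs total
predictions = ["bg"] * len(labels)            # degenerate classifier: always "background"

# Fraction of RoIs where the guess matches the label.
accuracy = sum(p == l for p, l in zip(predictions, labels)) / len(labels)
print(accuracy)  # 0.75
```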
mb_qsize
is the dataloader's output queue size. This should be close to 64 (full) most of the time. If it drops to 0 then the GPUs will be stalled waiting for data.
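The relationship between mb_qsize and GPU stalls can be sketched with a bounded queue. This is not Detectron's actual loader code, just a minimal model of it: the logged mb_qsize is essentially the queue's current size, and a blocking `get()` on an empty queue is the stall:

```python
import queue

# Minimal sketch: a bounded queue of preloaded minibatches.
# "mb_qsize" in the json_stats line corresponds to q.qsize().
q = queue.Queue(maxsize=64)

# A loader that keeps up fills the queue; the GPUs never wait.
for i in range(64):
    q.put({"iter": i})
print(q.qsize())  # 64 (full): healthy

# If the trainer consumes faster than the loader refills,
# the queue drains, and the next blocking q.get() stalls the GPUs.
while not q.empty():
    q.get()
print(q.qsize())  # 0: GPUs would be waiting for data
```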
mem
is the max amount of GPU memory on a single GPU used by caffe2 (note that this does not reflect some GPU memory usage outside of what caffe2 can track, e.g., cudnn caches, etc.).
The sum of the losses is not equal to the total loss because each logged value is median filtered separately to smooth the displayed output values.
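The mismatch follows from a general fact: the median of a sum is not the sum of the medians. A small sketch (with made-up loss values and a hypothetical window size, not Detectron's actual logging code) shows how smoothing each scalar separately breaks the additive relationship:

```python
from collections import deque
from statistics import median

WINDOW = 20  # hypothetical smoothing window size

# Made-up per-iteration losses; each series has an outlier at a different step.
raw = {
    "loss_cls":  [0.30, 0.21, 0.25, 0.95, 0.24],
    "loss_bbox": [0.10, 0.12, 0.80, 0.11, 0.10],
}
# The total loss at each iteration really is the sum of its parts.
raw["loss"] = [a + b for a, b in zip(raw["loss_cls"], raw["loss_bbox"])]

# Each logged value is median-filtered over its own sliding window.
smoothed = {k: median(deque(v, maxlen=WINDOW)) for k, v in raw.items()}

print(smoothed["loss_cls"] + smoothed["loss_bbox"])  # sum of the medians
print(smoothed["loss"])                              # median of the sums: differs
```

Because the outliers land on different iterations, the per-series medians discard them independently, while the median of the summed series sees a different distribution, so the displayed components need not add up to the displayed total.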
That's great!
Any explanation, or is it normal that it grows this fast?
json_stats: {"accuracy_cls": 0.351562, "eta": "121 days, 0:34:07", "iter": 0, "loss": 6.128010, "loss_bbox": 0.007976, "loss_cls": 3.535384, "loss_rpn_bbox_fpn2": 0.000000, "loss_rpn_bbox_fpn3": 0.000000, "loss_rpn_bbox_fpn4": 0.000000, "loss_rpn_bbox_fpn5": 0.063088, "loss_rpn_bbox_fpn6": 0.000000, "loss_rpn_cls_fpn2": 2.373329, "loss_rpn_cls_fpn3": 0.110721, "loss_rpn_cls_fpn4": 0.032310, "loss_rpn_cls_fpn5": 0.005202, "loss_rpn_cls_fpn6": 0.000000, "lr": 0.000333, "mb_qsize": 64, "mem": 2965, "time": 174.274131}
I0409 23:48:21.545917 14708 context_gpu.cu:305] GPU 0: 3037 MB
I0409 23:48:21.586969 14708 context_gpu.cu:309] Total: 3037 MB
I0409 23:48:23.049207 14711 context_gpu.cu:305] GPU 0: 3169 MB
I0409 23:48:23.049262 14711 context_gpu.cu:309] Total: 3169 MB
json_stats: {"accuracy_cls": 0.972342, "eta": "7 days, 4:53:41", "iter": 20, "loss": 16.139348, "loss_bbox": 0.666931, "loss_cls": 11.527749, "loss_rpn_bbox_fpn2": 0.000000, "loss_rpn_bbox_fpn3": 0.000000, "loss_rpn_bbox_fpn4": 0.000000, "loss_rpn_bbox_fpn5": 0.074818, "loss_rpn_bbox_fpn6": 0.000000, "loss_rpn_cls_fpn2": 0.000000, "loss_rpn_cls_fpn3": 0.000000, "loss_rpn_cls_fpn4": 0.000000, "loss_rpn_cls_fpn5": 0.052323, "loss_rpn_cls_fpn6": 0.000000, "lr": 0.000360, "mb_qsize": 64, "mem": 3254, "time": 10.377150}
@RafaRuiz I am also getting the same results...
Did you figure out why it grows so fast?