Caffe: choosing batch sizes and tuning SGD

Created on 16 Mar 2014 · 2 Comments · Source: BVLC/caffe

Hello,

I've noticed that in the training prototxt file, if we set the batch size too small or the scale too large, the predicted label distribution eventually becomes identical for every training example within a batch. That is, the weights still change from batch to batch, but within any one batch the network predicts the same label distribution for all examples. The result is similar to the one in https://github.com/BVLC/caffe/issues/59, where the validation accuracy is just 1/C, with C the number of classes.

For example, in the MNIST example, if we set the batch size to 4, then after a while the predicted label distribution is the same for every training example within each batch, and the validation accuracy is around 10%. However, if we set the batch size to 6, training works fine.

Could anyone tell me why this is happening? Furthermore, how do we decide a good batch size and scale?

Thanks!

EDIT: I suppose that scale is chosen so that the features lie in [0,1), but my question remains: why do small batch sizes lead to the behaviour described above?
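
To make the [0,1) remark concrete, here is a minimal sketch of what a scale factor does to raw pixel values. It assumes 8-bit inputs in 0..255 and a scale of 0.00390625 (1/256); the thread does not state the value being used, and `scale_pixels` is a hypothetical helper for illustration, not part of Caffe.

```python
# Assumed scale factor: 1/256. The thread does not state the actual value used.
SCALE = 0.00390625


def scale_pixels(pixels, scale=SCALE):
    """Multiply each raw pixel value by the scale factor (hypothetical helper)."""
    return [p * scale for p in pixels]


if __name__ == "__main__":
    # 8-bit pixels range over 0..255, so the scaled values stay below 1.
    print(scale_pixels([0, 128, 255]))  # [0.0, 0.5, 0.99609375]
```

Under these assumptions, a scale much larger than 1/256 would push the inputs well outside [0,1), which is one way the "scale too large" case could arise.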


All 2 comments

If you choose a batch size that is too small, the gradients become noisier and less stable, so you need to reduce the learning rate. Batch size and learning rate are therefore linked.
Conversely, if you use a batch size that is too large, the gradients become less noisy, but training takes longer to converge.

I would recommend reading http://leon.bottou.org/research/stochastic and Bottou's tricks for SGD.
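
The point about gradient noise can be illustrated with a small sketch. It is a toy setup, not Caffe code: a one-parameter least-squares model on synthetic data, with mini-batches sampled with replacement. The function names, data, and constants are all assumptions made for illustration. It estimates how much the mini-batch gradient fluctuates at a fixed weight for several batch sizes.

```python
import random
import statistics


def minibatch_gradient(w, xs, ys, batch_size, rng):
    """Gradient of the mean of 0.5 * (w*x - y)^2 over one random mini-batch."""
    idx = [rng.randrange(len(xs)) for _ in range(batch_size)]
    return sum((w * xs[i] - ys[i]) * xs[i] for i in idx) / batch_size


def gradient_std(batch_size, trials=2000, seed=0):
    """Standard deviation of the mini-batch gradient at w = 0 (toy estimate)."""
    rng = random.Random(seed)
    # Synthetic data (assumed for illustration): y = 3x + Gaussian noise.
    xs = [rng.uniform(-1.0, 1.0) for _ in range(1000)]
    ys = [3.0 * x + rng.gauss(0.0, 0.5) for x in xs]
    grads = [minibatch_gradient(0.0, xs, ys, batch_size, rng) for _ in range(trials)]
    return statistics.pstdev(grads)


if __name__ == "__main__":
    for b in (4, 16, 64, 256):
        print(f"batch_size={b:4d}  gradient std ~ {gradient_std(b):.4f}")
```

In this toy setting the spread of the gradient shrinks roughly as 1/sqrt(batch size), so a batch of 4 gives updates several times noisier than a batch of 64. That is the sense in which a small batch usually needs a smaller learning rate to stay stable.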

Thanks a lot for your help!
