Caffe: ImageDataLayer enhancement: images as labels

Created on 21 Mar 2016 · 3 Comments · Source: BVLC/caffe

Can we extend the implementation of the ImageData layer?
Currently it takes one image and its integer label as input. But in use cases like segmentation (SegNet), motion flow (FlowNet), and super-resolution (SRCNN), the output is also an image. To handle these cases we have to resort to two ImageData layers, each with fake labels.

Instead, can we add a parameter to the layer that controls the label format? For example:

LabelFormat := 0 -> label is a digit (default)
LabelFormat := 1 -> label is also an image
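
To make the proposal concrete, here is a minimal Python sketch of how one line of the layer's source list might be read under each setting. It is only an illustration: the constant names and parse_source_line are hypothetical and not part of Caffe, the two-path line format for image labels is an assumption, and paths are assumed to contain no whitespace.

```python
# Hypothetical illustration of the proposed LabelFormat switch.
# None of these names exist in Caffe.
LABEL_IS_DIGIT = 0  # current behaviour: "<image path> <integer label>"
LABEL_IS_IMAGE = 1  # proposed (assumed format): "<image path> <label image path>"


def parse_source_line(line, label_format=LABEL_IS_DIGIT):
    """Split one source-list line into (image_path, label)."""
    image_path, label = line.split()
    if label_format == LABEL_IS_DIGIT:
        # The label is a class index, as in the existing layer.
        return image_path, int(label)
    if label_format == LABEL_IS_IMAGE:
        # The label is a second path, to be loaded as an image later.
        return image_path, label
    raise ValueError(f"unknown label format: {label_format}")
```

With the default setting, existing list files would keep working unchanged; only lists whose second column is a path would need LabelFormat set to 1.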

If the idea is acceptable, then I can do the implementation.

enhancement

malreddysid

Most helpful comment

Making the label optional seems fine to me. I think it would be more intuitive to use an enum, rather than a number constant that has to be documented. For example:

label_type: INTEGER (for the current setup)
label_type: IMAGE (to allow for multiple images)

If you are interested in further expanding the layer, you could make the field repeated to allow for multiple images or a combination of images and integer labels.

For example, if each sample has an input image plus one image label and one integer label (e.g. a siamese network), then you could specify:

label_type: IMAGE
label_type: IMAGE
label_type: INTEGER

which would map to 3 top blobs. To maintain backwards compatibility, you could default to [IMAGE, INTEGER] if label_type is not specified.
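
A rough Python sketch of the repeated-enum idea, including the backwards-compatible default, might look like the following. The names LabelType, DEFAULT_LABEL_TYPES, and parse_entry are hypothetical, and the assumption that each source line carries one whitespace-separated field per label_type entry is mine, not something the existing layer does.

```python
from enum import Enum


class LabelType(Enum):
    """Hypothetical enum mirroring the suggested label_type values."""
    INTEGER = "INTEGER"
    IMAGE = "IMAGE"


# Assumed default when label_type is not specified: one image, one integer.
DEFAULT_LABEL_TYPES = [LabelType.IMAGE, LabelType.INTEGER]


def parse_entry(line, label_types=None):
    """Return one value per label_type entry, i.e. one per top blob."""
    types = label_types or DEFAULT_LABEL_TYPES
    fields = line.split()
    if len(fields) != len(types):
        raise ValueError(
            f"expected {len(types)} fields, got {len(fields)}: {line!r}"
        )
    # INTEGER fields become ints; IMAGE fields stay as paths to load later.
    return [
        int(field) if kind is LabelType.INTEGER else field
        for field, kind in zip(fields, types)
    ]
```

Calling parse_entry without label_types reproduces the current image-plus-integer layout, while passing [IMAGE, IMAGE, INTEGER] yields three values for three top blobs.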

Regarding whether or not this idea will get merged, that depends on how much interest there is. Having unit tests is also important. I personally am very interested in this kind of modification (and have already modified my fork to do something similar), but it would be good to see more feedback from others.

See #2108.

seanbell on 23 Mar 2016

👍3

All 3 comments

@seanbell Can I go ahead with the implementation?

malreddysid on 23 Mar 2016

Could I get your feedback, @naibaf7 @shelhamer @jeffdonahue? Thanks in advance.

malreddysid on 24 Mar 2016
