TensorFlow: Support streaming from HDF5

Created on 18 Jul 2016  ·  3 Comments  ·  Source: tensorflow/tensorflow

It would be nice if TensorFlow supported streaming from HDF5 files, which is required in out-of-core situations where the dataset does not fit in memory.

All 3 comments

This feature request is very broad, and we will likely not work on it in the foreseeable future. To keep the issue tracker focused, I will close this issue.

Well, what I'm actually asking for is something along the lines of a tf.TextLineReader that supports both streaming and random access. The request has come up before, e.g. in #2089. The problem with always closing these feature requests is that people who are looking for easy, new contributions might not see them, although they might be a good first step into the TF code base.

+1. For reference, according to https://www.tensorflow.org/api_guides/python/reading_data, the only supported file formats are CSV, binary, and TFRecord, but HDF5 is a pretty common format. For big datasets, it is not possible to load a whole .hdf5 dataset at once the way this example does: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/learn/hdf5_classification.py. Instead, we use a small HDF5 file for each sample.

The only feasible way to deal with this is to convert the HDF5 files to TFRecord or binary files first.
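
For anyone landing here, a minimal sketch of what batch-wise streaming from a single large HDF5 file could look like outside of TensorFlow, assuming the h5py package is installed. This is not a TensorFlow reader and not something the TF team has proposed; the helper names (`batch_bounds`, `iter_batches`, `stream_hdf5`) are made up for illustration. It relies on the fact that slicing an `h5py.Dataset` reads only the requested rows from disk.

```python
from typing import Iterator, Tuple


def batch_bounds(n_rows: int, batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (start, stop) index pairs that cover range(n_rows) in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_rows, batch_size):
        yield start, min(start + batch_size, n_rows)


def iter_batches(dataset, batch_size: int):
    """Yield consecutive slices of any object that supports len() and slicing.

    An h5py.Dataset qualifies: each slice is read from disk on demand,
    so the full array is never held in memory at once.
    """
    for start, stop in batch_bounds(len(dataset), batch_size):
        yield dataset[start:stop]


def stream_hdf5(path: str, dataset_name: str, batch_size: int):
    """Stream batches from one dataset in an HDF5 file (hypothetical helper)."""
    import h5py  # imported lazily so the helpers above work without it

    # The file stays open only while the generator is being consumed.
    with h5py.File(path, "r") as f:
        yield from iter_batches(f[dataset_name], batch_size)
```

Usage would be something like `for batch in stream_hdf5("train.h5", "features", 128): ...`, where the file name and dataset name are placeholders. In later TensorFlow releases a generator like this can probably be wrapped with `tf.data.Dataset.from_generator`, but that is a workaround, not the native reader requested in this issue.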
