Keras: New to Keras, how to format image data in numpy arrays for training?

Created on 12 May 2016  ·  3 Comments  ·  Source: keras-team/keras

Hello!

I have a pickle file that contains three sets—train, test, and validation.

Each of these sets contains two arrays. The first is a Numpy ndarray of ndarrays holding the image data: there are X image arrays, each with the shape (300,300,3). The second is a set of labels, one per image array, so the number of image data arrays and the number of labels are the same.

In theory (at least, in my naive theory), I should be able to pass my train data and train labels to model.fit. However, when I do so, I get this error:

Error when checking model input: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 1 arrays but instead got the following list of 12 arrays:

The message is then followed by a listing of my image data arrays.

Here is the relevant code for the creation of my Pickle file and the attempted fitting of the model.

Each image is read by PIL.Image, and resized to the same dimensions (300,300).

image = Image.open(os.path.join(root, dirname, file))
print "Creating numpy representation of image %s " % file
resize = image.resize((300,300), Image.NEAREST) 
resize.load()
data = np.asarray( resize, dtype="uint8" )
print(data.shape)
master_dataset.append(data)

(Eventually, I create the test and validation datasets by pulling elements out of the master dataset).

Here is how I create my pkl file:

train_set = master_dataset, np.asarray(master_labels)

valid_set = valid_data, np.asarray(valid_labels)

test_set = test_data, np.asarray(test_labels)

dataset = [train_set, valid_set, test_set]

print("Creating pickle file")
f = gzip.open('data.pkl.gz', 'wb')
cPickle.dump(dataset, f, protocol=2)
f.close()

And finally, here is how I load and (attempt to) use the pickle within Keras.

data = cPickle.load(f)

train = data[0]
valid = data[1]
test = data[2]

train_x, train_y = data[0]

model.fit(train_x, np.asarray(train_y), nb_epoch=1, batch_size=1, verbose=1)

If anyone is able to help out a machine learning newbie, I'd be eternally grateful!


All 3 comments

I guess that train_x and train_y are lists of ndarrays rather than single arrays. Try:

train_x = np.array(train_x)
# train_x.shape should be (nb_sample, height, width, channel)
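
A minimal sketch of why this conversion helps, using zero-filled dummy arrays in place of the real images (the count of 12 comes from the error message; the names `images` and `stacked` are only illustrative). A plain Python list of arrays is read by Keras as 12 separate model inputs, while `np.array` stacks them into one 4-D array:

```python
import numpy as np

# Stand-ins for the resized PIL images: 12 arrays of shape (300, 300, 3).
images = [np.zeros((300, 300, 3), dtype="uint8") for _ in range(12)]

# A plain list has no shape of its own, only a length.
print(type(images), len(images))

# Stacking produces a single array with a leading sample axis.
stacked = np.array(images)
print(stacked.shape)  # (12, 300, 300, 3)
```

This only yields a regular 4-D array when every image has exactly the same shape, so a grayscale or RGBA file mixed in with RGB ones would need converting first.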

joelthchao's axis ordering is correct if you're using TensorFlow.
Theano uses (nb_sample, channels, height, width).
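
If the data is already stacked in the TensorFlow-style order, one way to get the Theano-style order is to move the channel axis with `np.transpose`. This is a sketch on a dummy batch, and the names `batch_tf` and `batch_th` are made up for the example:

```python
import numpy as np

# Dummy batch in (nb_sample, height, width, channels) order.
batch_tf = np.zeros((12, 300, 300, 3), dtype="uint8")

# Move the channel axis to position 1: (nb_sample, channels, height, width).
batch_th = np.transpose(batch_tf, (0, 3, 1, 2))
print(batch_th.shape)  # (12, 3, 300, 300)
```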

@joelthchao if we have many images, does master_dataset.append(data) create a memory problem? Instead of loading all the images into memory, could we read them from disk?
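
One possible approach, sketched here under assumptions rather than taken from the thread, is a Python generator that loads only one batch of images at a time. `batch_generator` and `load_image` are hypothetical names; `load_image` stands for any function that turns a file path into an array of shape (height, width, channels), such as the PIL code in the question. A fake loader is used so the sketch runs without image files:

```python
import numpy as np

def batch_generator(paths, labels, batch_size, load_image):
    """Yield (x, y) batches forever, loading images only when needed.

    load_image is a hypothetical callable mapping a path to an array
    of shape (height, width, channels).
    """
    n = len(paths)
    while True:
        for start in range(0, n, batch_size):
            end = min(start + batch_size, n)
            # Only the images of the current batch are held in memory.
            x = np.array([load_image(p) for p in paths[start:end]])
            y = np.asarray(labels[start:end])
            yield x, y

# Fake loader standing in for real disk reads.
def fake_loader(path):
    return np.zeros((300, 300, 3), dtype="uint8")

gen = batch_generator(["a.jpg", "b.jpg", "c.jpg"], [0, 1, 0], 2, fake_loader)
x, y = next(gen)
print(x.shape, y.shape)  # (2, 300, 300, 3) (2,)
```

Keras releases from around the time of this thread accepted a generator like this through model.fit_generator; the exact arguments depend on the installed version, so check its documentation.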
