Keras: Understanding stateful_lstm.py

Created on 21 Jun 2016  ·  3 comments  ·  Source: keras-team/keras

Hi, I am quite new to Keras and have a question about the example stateful_lstm.py.


from keras.models import Sequential
from keras.layers import Dense, LSTM

tsteps = 1
batch_size = 25

model = Sequential()
model.add(LSTM(50,
               batch_input_shape=(batch_size, tsteps, 1),
               return_sequences=True,
               stateful=True))
model.add(LSTM(50,
               batch_input_shape=(batch_size, tsteps, 1),
               return_sequences=False,
               stateful=True))
model.add(Dense(1))
model.compile(loss='mse', optimizer='rmsprop')

Sorry if these are trivial questions; I don't know where else to ask them.

Q1. What does 'return_sequences' mean?

I read in the docs: "Whether to return the last output in the output sequence, or the full sequence."
But I am having a hard time understanding how the LSTM's behavior changes with this.
As far as I know, the model has two LSTM layers.
What is the output _sequence_?
One possible explanation I came up with is that the LSTM produces output per batch, and that
setting return_sequences=True produces one 50-dimensional vector per batch (size=25),
while return_sequences=False produces 25 50-dimensional vectors per batch.
Is that right?

Q2. About Dimensions of stateful_lstm.py

This may be related to Q1. As far as I understand, the model in stateful_lstm looks like

LSTM1
|
LSTM2
|
Dense

and the output from LSTM1 is 50-dimensional, but LSTM2 takes input of shape 1x1. How can this happen?

Q3. I can't understand the doc for batch_input_shape: "This is useful for specifying a fixed batch size (e.g. with stateful RNNs)."

What does a 'fixed batch size' have to do with stateful RNNs? I posted a related question yesterday:
How is the 'batch_size' parameter used internally in the LSTM?
Is it related to initializing the state after processing one batch, or something like that?

Q4. How do I specify the LSTM cell size?

According to colah.github.io/posts/2015-08-Understanding-LSTMs/, an LSTM has a cell state, but I couldn't find a way to specify the dimension of the cell state of an LSTM in Keras.

I hope these questions can help other people who have trouble understanding LSTMs and Keras.

I guess understanding the basic concepts, formulas, and usage of RNNs and LSTMs is easy, but
understanding the real process of how an RNN or LSTM is trained via batches and state is harder.
Does anyone know of material that explains the whole training process of LSTMs/RNNs step by step for beginners like me? T T..


All 3 comments

Q1.
The input of a recurrent network is three-dimensional: each batch of data consists of a number of samples, and each sample consists of a number of _timesteps._ A "sequence" is basically a sample that has an additional timestep dimension.

An LSTM with return_sequences=True will return one vector for each timestep in each sample of the batch. An LSTM with return_sequences=False will only return one vector for each sample; it basically squashes the time dimension. (Note that neither of those returns just one vector per batch, which would be ... strange, I think.)
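
To make the difference concrete, here is a minimal sketch (Keras Sequential API; the 10 timesteps and 50 units are arbitrary illustration values, not from the example) that just prints the output shapes:

from keras.models import Sequential
from keras.layers import LSTM

# return_sequences=True: one 50-dim vector per timestep per sample
m1 = Sequential()
m1.add(LSTM(50, input_shape=(10, 1), return_sequences=True))
print(m1.output_shape)  # (None, 10, 50) -> (batch, timesteps, units)

# return_sequences=False: one 50-dim vector per sample; the time axis is gone
m2 = Sequential()
m2.add(LSTM(50, input_shape=(10, 1), return_sequences=False))
print(m2.output_shape)  # (None, 50) -> (batch, units)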

Q2.
From a quick glance, I believe your confusion is justified: it doesn't quite make sense. The input_shape and batch_input_shape arguments are usually only relevant for the _first_ layer, since Keras can usually infer the shape automatically for all subsequent ones. It might actually just ignore this parameter here...
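
For example, the model from the question could just as well be written with the shape argument only on the first layer (a hedged sketch, not the original script):

from keras.models import Sequential
from keras.layers import Dense, LSTM

tsteps = 1
batch_size = 25

model = Sequential()
model.add(LSTM(50,
               batch_input_shape=(batch_size, tsteps, 1),
               return_sequences=True,
               stateful=True))
model.add(LSTM(50, return_sequences=False, stateful=True))  # shape inferred from the layer above
model.add(Dense(1))
model.compile(loss='mse', optimizer='rmsprop')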

Q3.
This is explained in the docs: "the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch."
So the position of a sample in the batch is important with stateful RNNs, since hidden state is remembered per index in the batch. To make this work, batch size must be constant.
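
In practice this is why stateful examples train with shuffle=False and a fixed batch size, and reset the states manually. A rough sketch of that loop (dummy data, assuming the stateful model defined in the question above; mirrors the pattern used in stateful_lstm.py):

import numpy as np

batch_size = 25
x = np.random.random((batch_size * 40, 1, 1))  # (samples, tsteps, features), matches batch_input_shape
y = np.random.random((batch_size * 40, 1))

for epoch in range(10):
    # shuffle=False so sample i of each batch continues sample i of the previous batch
    model.fit(x, y, batch_size=batch_size, nb_epoch=1, shuffle=False)
    model.reset_states()  # clear the carried-over hidden/cell states between epochs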

Q4.
I just glanced over that tutorial, but isn't this just the output_dim, i.e. the "50" in your code example? (In the docs it's described as the "dimension of the internal projections and the final output.")

Guys, I wrote a big article on how stateful LSTMs work: http://philipperemy.github.io/keras-stateful-lstm/
Check it out!

Q4:

@mbollmann is right; according to the code, the cell weight matrix dimensions are directly determined by the input and output dimensions.
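
A quick (hedged) way to check this yourself is to print the weight shapes of the first LSTM layer of the question's model: every recurrent weight matrix is sized by the output dimension (50), which is also the size of the hidden and cell states. The exact grouping of the arrays depends on the Keras version.

for w in model.layers[0].get_weights():
    print(w.shape)  # every shape here is built from input_dim=1 and output_dim=50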
