Hi, I am quite new to Keras and have some questions about the example stateful_lstm.py.
tsteps = 1
batch_size = 25

model = Sequential()
model.add(LSTM(50,
               batch_input_shape=(batch_size, tsteps, 1),
               return_sequences=True,
               stateful=True))
model.add(LSTM(50,
               batch_input_shape=(batch_size, tsteps, 1),
               return_sequences=False,
               stateful=True))
model.add(Dense(1))
model.compile(loss='mse', optimizer='rmsprop')
Sorry for the possibly trivial questions; I don't know where else to ask these things.
Q1. I read in the doc: "Whether to return the last output in the output sequence, or the full sequence." But I'm having a hard time understanding how the LSTM's behavior changes with this setting. As far as I know, the model has two LSTM layers. What is an 'output sequence'? One possible explanation I thought of is that the LSTM produces output in batch units: setting return_sequences=True produces one 50-dimensional vector per batch (size=25), and return_sequences=False produces 25 50-dimensional vectors per batch. Is that true?
Q2. This may be related to Q1. As far as I understand, the model in stateful_lstm looks like:
LSTM1
|
LSTM2
|
Dense
The output from LSTM1 is 50-dimensional, but LSTM2 takes input of shape 1x1. How can this happen?
Q3. What does a 'fixed batch size' have to do with stateful RNNs? I posted a related question yesterday: how is the batch_size parameter internally used in an LSTM? Is it related to initializing the state after processing one batch, or something like that?
Q4. From colah.github.io/posts/2015-08-Understanding-LSTMs/, an LSTM has a cell state, and I couldn't find a way to specify the cell state dimension of an LSTM in Keras.
I hope these questions can help other people who have trouble understanding LSTMs and Keras. I guess understanding the basic concepts, formulas, and usage of RNNs and LSTMs is easy, but understanding the actual process of how an RNN or LSTM is trained, with batches and states and all that, is much harder. Does anyone know of material that explains the whole training process of LSTMs and RNNs step by step for beginners like me? T T..
Q1.
The input of a recurrent network is three-dimensional: each batch of data consists of a number of samples, and each sample consists of a number of _timesteps._ A "sequence" is basically a sample that has an additional timestep dimension.
An LSTM with return_sequences=True will return one vector for each timestep in each sample of the batch. An LSTM with return_sequences=False will only return one vector for each sample; it basically squashes the time dimension. (Note that neither of those returns just one vector per batch, which would be ... strange, I think.)
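To make this concrete, here is a minimal pure-Python sketch of what return_sequences controls. The step function is a made-up stand-in, not a real LSTM update; only the output shapes are the point:

```python
def run_rnn(sample, units, return_sequences):
    """Toy recurrent loop over one sample (a list of timestep vectors)."""
    state = [0.0] * units
    outputs = []
    for x in sample:
        # Stand-in for the real LSTM step: just mixes the input into the state.
        state = [s + sum(x) for s in state]
        outputs.append(list(state))
    # return_sequences=True -> one output vector per timestep;
    # return_sequences=False -> only the last timestep's vector.
    return outputs if return_sequences else outputs[-1]

sample = [[1.0], [2.0], [3.0]]  # 3 timesteps, 1 feature each
full = run_rnn(sample, units=50, return_sequences=True)
last = run_rnn(sample, units=50, return_sequences=False)
print(len(full), len(full[0]))  # 3 50 -> one 50-dim vector per timestep
print(len(last))                # 50   -> a single 50-dim vector per sample
```

So with the batch dimension included, the real shapes are (25, 1, 50) vs. (25, 50) for your example.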
Q2.
From a quick glance, I believe you're right to be confused; it doesn't quite make sense. The input_shape and batch_input_shape arguments are usually only relevant for the _first_ layer, since Keras can usually infer the shape automatically for all subsequent ones. It might actually just ignore this parameter here...
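A hypothetical sketch of that shape inference (the function below is an illustration I made up, not Keras internals): each layer's output shape becomes the next layer's inferred input shape, which is why only the first layer needs batch_input_shape.

```python
def lstm_output_shape(input_shape, units, return_sequences):
    """Illustrative shape rule for an LSTM layer (not actual Keras code)."""
    batch, timesteps, _features = input_shape
    if return_sequences:
        return (batch, timesteps, units)  # one vector per timestep
    return (batch, units)                 # time dimension squashed

shape = (25, 1, 1)  # batch_input_shape given to the first layer
shape = lstm_output_shape(shape, units=50, return_sequences=True)
print(shape)  # (25, 1, 50) -- what the second LSTM actually receives,
              # regardless of the batch_input_shape written on it
```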
Q3.
This is explained in the docs: "the last state for each sample at index i in a batch will be used as initial state for the sample of index i in the following batch."
So the position of a sample in the batch is important with stateful RNNs, since hidden state is remembered per index in the batch. To make this work, batch size must be constant.
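Here is a toy sketch of that per-index statefulness (the update rule is a stand-in, not a real LSTM): the state at batch position i is carried over, so sample i of the next batch continues where sample i of the previous batch left off.

```python
batch_size = 3
states = [0.0] * batch_size  # one persistent state per position in the batch

def process_batch(batch):
    """Toy stateful update: state i accumulates input i across batches."""
    for i, x in enumerate(batch):
        states[i] += x  # stand-in for the real recurrent update
    return list(states)

process_batch([1.0, 10.0, 100.0])        # batch 1
out = process_batch([1.0, 10.0, 100.0])  # batch 2 continues per index
print(out)  # [2.0, 20.0, 200.0] -- index i continued from the previous batch
```

If the batch size changed between batches, there would be no well-defined state to hand to each position, which is why stateful models require a fixed batch size.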
Q4.
I just glanced over that tutorial, but isn't this just the output_dim, i.e. the "50" in your code example? (In the docs it's described as the "dimension of the internal projections and the final output.")
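In other words, in a standard LSTM the cell state and the hidden state (output) have the same dimension, so there is no separate parameter for the cell state size; a trivial sketch:

```python
units = 50          # the "50" in the code example (output_dim)
h = [0.0] * units   # hidden state / output vector
c = [0.0] * units   # cell state -- same size, fixed by the same parameter
print(len(h), len(c))  # 50 50
```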
Guys, I wrote a big article on how stateful works: http://philipperemy.github.io/keras-stateful-lstm/
Check it out!
Q4:
@mbollmann is right; according to the code, the cell weight matrix dimensions are directly determined by the input and output dimensions.