tensorflow 🚀 - ValueError: Attempt to reuse RNNCell with a different variable scope than its first use.

I am getting the same error when trying to run the translate example (even when doing the small self test) which can be found here: https://github.com/tensorflow/models/tree/master/tutorials/rnn/translate

ghost on 8 Mar 2017

👍22

I ran into the same issue. If you are all using a version compiled from the master branch, I believe we are hitting the same issue, caused by a recent commit. As the commit message says:

Make all RNNCells in tf.contrib.rnn act like tf.layers Layers, but with stricter semantics for now:

Upon first use of __call__, the used scope is stored in the cell. The RNNCell tries to create weights in that scope but if some are already set, an error is raised unless the RNNCell was constructed with argument reuse=True.

A subsequent use of __call__ of the same cell instance must be in the same scope.
If it is not, an error is raised.
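
The rule in the commit message can be illustrated with a small toy model. This is a plain-Python sketch, not TensorFlow code: `ToyCell` and its `scope` argument are hypothetical stand-ins for an RNNCell and the enclosing variable scope.

```python
class ToyCell:
    """Toy stand-in for an RNNCell that remembers the scope of its first call."""

    def __init__(self):
        self._first_scope = None  # set on first use

    def __call__(self, inputs, scope):
        if self._first_scope is None:
            # First use: store the scope the "weights" would be created in.
            self._first_scope = scope
        elif scope != self._first_scope:
            # Later use under a different scope is rejected.
            raise ValueError(
                "Attempt to reuse cell with a different scope: first use was "
                "%r, this attempt is with %r" % (self._first_scope, scope))
        return inputs


cell = ToyCell()
cell(1.0, scope="rnn/cell_0")      # first use records the scope
cell(2.0, scope="rnn/cell_0")      # same scope again: accepted

try:
    cell(3.0, scope="rnn/cell_1")  # same instance, different scope
    scope_error = None
except ValueError as err:
    scope_error = str(err)
```

Under this model, the third call fails because the same instance is called under a second scope, which is the situation the error message in this thread reports.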

In my case, which is running the ptb tutorial, the solution was just to add a reuse parameter like this at line 112:

def lstm_cell():
  return tf.contrib.rnn.BasicLSTMCell(
      size, forget_bias=0.0, state_is_tuple=True, reuse=tf.get_variable_scope().reuse)

Then it works.

tongda on 8 Mar 2017

👍28 ❤7 🎉7 😄5

@ebrevdo Could you please take a look at this?

prb12 on 8 Mar 2017

The issue replicates for me when using the Windows/GPU build 105 on the Shakespeare RNN Repo (https://github.com/martin-gorner/tensorflow-rnn-shakespeare).

When running the code with the Win 1.0.0/GPU Release, there is no issue.

tomwanzek on 9 Mar 2017

That repo looks like it's targeted at tf 1.0, not intermediate releases.

ebrevdo on 9 Mar 2017

@tongda , I am using the Release Version of Tensorflow 1.0, working on MacOS in cpu mode. I will switch to the master branch to see if it works after adding the "reuse" parameter, thanks.

doncat99 on 9 Mar 2017

doncat99: if you do, please ensure your code queries the tensorflow version and raises a flag if the version is lower than the master branch version. You may need to check against:

from tensorflow.python.framework import versions
versions.GIT_VERSION
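
A check along those lines might look like the sketch below. It compares release numbers rather than git versions, and it ignores pre-release suffixes such as `-rc2`. The helper names `parse_version` and `require_min_version` and the threshold of (1, 1, 0) are assumptions for illustration; in real code the string would come from `tf.__version__`.

```python
import re


def parse_version(version):
    """Turn a string such as '1.1.0-rc2' into a comparable tuple (1, 1, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("Unrecognized version string: %r" % version)
    return tuple(int(part or 0) for part in match.groups())


def require_min_version(version, minimum=(1, 1, 0)):
    """Raise if `version` is older than `minimum` (a hypothetical threshold)."""
    if parse_version(version) < minimum:
        raise RuntimeError(
            "TensorFlow %s is older than the required %s"
            % (version, ".".join(str(p) for p in minimum)))


# In real code the argument would be tf.__version__.
require_min_version("1.1.0-rc2")   # passes silently
```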

ebrevdo on 9 Mar 2017

@ebrevdo So what would be the suggested changes to the Shakespeare RNN to allow it to work with the intermediate stable release?

Here is the key architectural section of the code, which now fails with build#105:

#
# the model (see FAQ in README.md)
#
lr = tf.placeholder(tf.float32, name='lr')  # learning rate
pkeep = tf.placeholder(tf.float32, name='pkeep')  # dropout parameter
batchsize = tf.placeholder(tf.int32, name='batchsize')

# inputs
X = tf.placeholder(tf.uint8, [None, None], name='X')    # [ BATCHSIZE, SEQLEN ]
Xo = tf.one_hot(X, ALPHASIZE, 1.0, 0.0)                 # [ BATCHSIZE, SEQLEN, ALPHASIZE ]
# expected outputs = same sequence shifted by 1 since we are trying to predict the next character
Y_ = tf.placeholder(tf.uint8, [None, None], name='Y_')  # [ BATCHSIZE, SEQLEN ]
Yo_ = tf.one_hot(Y_, ALPHASIZE, 1.0, 0.0)               # [ BATCHSIZE, SEQLEN, ALPHASIZE ]
# input state
Hin = tf.placeholder(tf.float32, [None, INTERNALSIZE*NLAYERS], name='Hin')  # [ BATCHSIZE, INTERNALSIZE * NLAYERS]

# using a NLAYERS=3 layers of GRU cells, unrolled SEQLEN=30 times
# dynamic_rnn infers SEQLEN from the size of the inputs Xo

onecell = rnn.GRUCell(INTERNALSIZE)
dropcell = rnn.DropoutWrapper(onecell, input_keep_prob=pkeep)
multicell = rnn.MultiRNNCell([dropcell for _ in range(NLAYERS)], state_is_tuple=False)
multicell = rnn.DropoutWrapper(multicell, output_keep_prob=pkeep)
Yr, H = tf.nn.dynamic_rnn(multicell, Xo, dtype=tf.float32, initial_state=Hin)
# Yr: [ BATCHSIZE, SEQLEN, INTERNALSIZE ]
# H:  [ BATCHSIZE, INTERNALSIZE*NLAYERS ] # this is the last state in the sequence

I do not seem to find any documentation regarding a reuse flag?

Thanks in advance.

tomwanzek on 10 Mar 2017

Use:

multicell = rnn.MultiRNNCell(
    [rnn.DropoutWrapper(rnn.GRUCell(INTERNALSIZE), input_keep_prob=pkeep)
     for _ in range(NLAYERS)],
    state_is_tuple=False)

This creates a separate GRUCell object for each layer.
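
The difference between repeating one cell object and creating one per layer can be seen without TensorFlow. In this sketch `Layer` is a hypothetical placeholder class standing in for `rnn.GRUCell`.

```python
class Layer:
    """Hypothetical placeholder for an RNN cell such as rnn.GRUCell."""


NLAYERS = 3

# One object repeated NLAYERS times: every slot refers to the same instance.
onecell = Layer()
shared = [onecell for _ in range(NLAYERS)]

# One constructor call per layer: every slot holds its own instance.
separate = [Layer() for _ in range(NLAYERS)]

num_shared_objects = len({id(c) for c in shared})      # 1
num_separate_objects = len({id(c) for c in separate})  # 3
```

The first form matches the failing `[dropcell for _ in range(NLAYERS)]` pattern, where a single cell ends up being called under several layer scopes.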

ebrevdo on 15 Mar 2017

I don't understand why I am getting this error with the seq2seq tutorial model:

cell = tf.contrib.rnn.MultiRNNCell([single_cell() for _ in range(num_layers)])

Source

where the cell is created with

def single_cell():
    return tf.contrib.rnn.GRUCell(size)

BSVogler on 15 Mar 2017

👍20 😄1

@ebrevdo Thanks for getting back to this issue. Unfortunately, the suggested change leaves matters as they are, with the aforementioned error. Given the above comment regarding the seq2seq tutorial, I suspect we are all in the same boat?

tomwanzek on 15 Mar 2017

Are you sure it's the exact same error? Please copy and paste it here.

ebrevdo on 16 Mar 2017

My bad, I just went through the change process to the relevant code again (from scratch) and re-ran it as proposed. The error has indeed been removed and the Old Bard is hallucinating just fine now 👍

So, thx, not sure where I went wrong yesterday, but it was clearly on me.

tomwanzek on 16 Mar 2017

I met the same problem when using the Release Version of Tensorflow 1.0 on MacOS in cpu mode, even after adding the "reuse" parameter:

def cell():
    return tf.contrib.rnn.BasicLSTMCell(rnn_size,state_is_tuple=True,reuse=tf.get_variable_scope().reuse)

muticell = tf.contrib.rnn.MultiRNNCell([cell for _ in range(num_layers)], state_is_tuple=True)

bingfengyiren on 17 Mar 2017

your multicell looks wrong... you should be using "cell() for _ in range(...)"
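
The mistake is easy to miss because the list comprehension is valid Python either way. A small sketch, with `make_cell` as a hypothetical factory standing in for the `cell()` function in the question, shows what each form puts in the list:

```python
class DummyCell:
    """Hypothetical stand-in for tf.contrib.rnn.BasicLSTMCell."""


def make_cell():
    return DummyCell()


num_layers = 2

# Without parentheses the list holds the factory function itself.
wrong = [make_cell for _ in range(num_layers)]

# With parentheses the factory is called once per layer.
right = [make_cell() for _ in range(num_layers)]

wrong_are_cells = all(isinstance(c, DummyCell) for c in wrong)  # False
right_are_cells = all(isinstance(c, DummyCell) for c in right)  # True
```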

ebrevdo on 17 Mar 2017

I was trying to run the translate example: python2.7 translate.py --data_dir data/ --train_dir train/ --size=256 --num_layers=2 --steps_per_checkpoint=50

It seems the way to use MultiRNNCell is correct:
cell = tf.contrib.rnn.MultiRNNCell([single_cell() for _ in range(num_layers)])

But I got the same error:
ValueError: Attempt to reuse RNNCell with a different variable scope than its first use. First use of cell was with scope 'embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/multi_rnn_cell/cell_0/gru_cell', this attempt is with scope 'embedding_attention_seq2seq/rnn/multi_rnn_cell/cell_0/gru_cell'. Please create a new instance of the cell if you would like it to use a different set of weights. If before you were using: MultiRNNCell([GRUCell(...)] * num_layers), change to: MultiRNNCell([GRUCell(...) for _ in range(num_layers)]). If before you were using the same cell instance as both the forward and reverse cell of a bidirectional RNN, simply create two instances (one for forward, one for reverse). In May 2017, we will start transitioning this cell's behavior to use existing stored weights, if any, when it is called with scope=None (which can lead to silent model degradation, so this error will remain until then.)

bowu on 26 Mar 2017

👍10

@bowu - did you have any luck with this? if you haven't tried it yet, reinstall tensorflow from the latest source. there were some changes to some of the core_rnn files, among a few others. works for me now.

robmsylvester on 29 Mar 2017

@robmsylvester I reinstalled tensorflow from the latest source and still get the same error. I was on branch master and the latest commit is commit 2a4811054a9e6b83e1f5a2705a92aab50e151b13. What was the latest commit when you built your repo?

oxwsds on 30 Mar 2017

Hi, I am using Tensorflow r1.0 with GPU support, built from source. I am trying to follow the unmodified Seq2Seq translation tutorial, but I'm getting the same error, i.e.

ValueError: Attempt to reuse RNNCell with a different variable scope than its first use. First use of cell was with scope 'embedding_attention_seq2seq/embedding_attention_decoder/attention_decoder/multi_rnn_cell/cell_0/gru_cell', this attempt is with scope 'embedding_attention_seq2seq/rnn/multi_rnn_cell/cell_0/gru_cell'.....

The relevant portion of the code in my seq2seq_model.py is:

    # Create the internal multi-layer cell for our RNN.
    def single_cell():
      return tf.contrib.rnn.GRUCell(size)
    if use_lstm:
      def single_cell():
        return tf.contrib.rnn.BasicLSTMCell(size)
    cell = single_cell()
    if num_layers > 1:
      cell = tf.contrib.rnn.MultiRNNCell([single_cell() for _ in range(num_layers)])

What can I do to solve the problem?

adding "reuse=tf.get_variable_scope().reuse" to the call where the GRUCell is created doesn't help.

Thanks a ton!

prashantserai on 3 Apr 2017

👍5

@prashantserai - see what happens if you remove the MultiRNNCell line from above, effectively making your network just one layer. Does it work then? It might be a bug somewhere in MultiRNNCell. I've read about that somewhere recently, probably on stack overflow.

If you implement the stacked lstm/gru yourself, you don't get this error, and you can implement the same functionality (actually more, because you're free to do whatever you want with bidirectional architectures, weird residual and skip connections, etc.)

robmsylvester on 4 Apr 2017

@robmsylvester The same error persisted even when I tried with num_layers=1 which should effectively skip that line. Any other ideas? Thanks for the input.

prashantserai on 4 Apr 2017

👍3 👎1

Hmmm. One thing that stands out to me is in the referenced legacy seq2seq file:

encoder_cell = copy.deepcopy(cell)

This line appears to be used because the same architecture is used on both the encoder and decoder side. They make a copy of the cell, then pass the cell argument along to the attention decoder embedding function, then to the attention decoder itself.

What happens if you explicitly create the encoder cell AND the decoder cell in your seq2seq model file and pass both along to the legacy library file, making the small adjustments to the functions and their arguments?

robmsylvester on 4 Apr 2017

👍1

@robmsylvester shouldn't making changes in the scopes of the cells work? It's working for the other two examples as well. In my opinion, this would be a very ugly workaround; a cleaner solution must exist; maybe we are missing something? ( I got the same error on the seq2seq tutorial as well, tried all of the above solutions).

iamgroot42 on 4 Apr 2017

@iamgroot42 - Yeah, that 'solution' is admittedly very ugly, but more so just trying to hunt down where an issue might be. I'll play with it in a few hours and see if I can track something down.

robmsylvester on 4 Apr 2017

In fact, the copy.deepcopy is there because these are legacy functions and we don't have the resources to maintain/update them. If you'd like to introduce a backwards-compatible change that allows the user to provide a second cell for the decoding step, and if it's None then to fall back on the deepcopy, then I would be happy to review the PR. Keep in mind it would have to be a backwards compatible change.
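
A change of that kind could take roughly the following shape. This is a plain-Python sketch and not the actual legacy seq2seq code: `build_seq2seq`, `encoder_cell`, and `FakeCell` are hypothetical names used only for illustration. Here the optional argument is the encoder cell, but the same pattern would apply if the optional argument were the decoder cell.

```python
import copy


class FakeCell:
    """Hypothetical stand-in for an RNNCell."""

    def __init__(self, size):
        self.size = size


def build_seq2seq(cell, encoder_cell=None):
    """Return the (encoder, decoder) cells a seq2seq function would use.

    If the caller does not supply `encoder_cell`, fall back to a deep copy of
    `cell`, so existing callers keep the old behavior.
    """
    if encoder_cell is None:
        encoder_cell = copy.deepcopy(cell)
    return encoder_cell, cell


# Old call style: the encoder cell is a copy of the decoder cell.
enc_old, dec_old = build_seq2seq(FakeCell(256))

# New call style: the caller provides a distinct encoder cell.
my_encoder = FakeCell(256)
enc_new, dec_new = build_seq2seq(FakeCell(256), encoder_cell=my_encoder)
```

Defaulting the new argument to `None` is what keeps the change backwards compatible: callers that pass only `cell` see no difference.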

ebrevdo on 6 Apr 2017

@ebrevdo - I'll think about it. I do have a translator that works pretty similar to this one but creates cells through a separate class that allows for inserting bidirectional layers where you want, residuals where you want, merging inputs with concat vs. sum, and a few other things. I think I could migrate my class over to this tutorial pretty easily by using static RNN's. I'll let you know.

robmsylvester on 6 Apr 2017

@ebrevdo I am running Tensorflow r1.0 (tensorflow-1.0.1-cp36-cp36m-linux_x86_64) on Red Hat and have the latest version of the translation tutorial from Github. Is there a way you know of to make this work currently?

prashantserai on 6 Apr 2017

It's unfortunate that the translation tutorial does not work with TF 1.0. We should fix that. @lukaszkaiser can you take a look? We're working on a new tutorial but it's still a few weeks off and will require a nightly version of TensorFlow (or TF 1.1 or 1.2) to work.

ebrevdo on 6 Apr 2017

(lukasz; it's hard for me to identify from the various comments which part of the tutorial is faulty in TF 1.0. any chance you could identify the line and i can help get it working?)

ebrevdo on 6 Apr 2017

@ebrevdo It's this tutorial. The error is in this cluster of lines. The cells passed here are used for both the backward and forward phase of the legacy seq2seq model, which throws an error because of same cells being used with different scopes.

iamgroot42 on 6 Apr 2017

@iamgroot42 do you want to make a PR with the needed changes? That would be great, I currently don't have the cycles to do that myself. Thanks!

lukaszkaiser on 6 Apr 2017

I noticed that TF 1.0 works fine with the newest version of the translation tutorial if compiled from source on branch remotes/origin/r1.0:

$ git clone https://github.com/tensorflow/tensorflow
$ cd tensorflow
$ git checkout remotes/origin/r1.0

Then build and install TensorFlow; it works fine.

On branch remotes/origin/r1.1 it has the "different variable scope" error.
I modified the code as @robmsylvester suggested

What happens if you explicitly create the encoder cell AND the decoder cell in your seq2seq model file and pass both along to the legacy library file, making the small adjustments to the functions and their arguments?

and it works for me now.

oxwsds on 6 Apr 2017

👍1

@oxwsds the Tensorflow I'm using is 1.0.1 so maybe that's having an error..

I had tried what @robmsylvester suggested then actually.. and the training had begun (2 days 13 hours done now).. it fails during decoding though with the error:

  File "/homes/3/serai/.conda/envs/tensorflow_r1.0_gpu/lib/python3.6/site-packages/tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py", line 883, in embedding_attention_seq2seq
    initial_state_attention=initial_state_attention)
  File "/homes/3/serai/.conda/envs/tensorflow_r1.0_gpu/lib/python3.6/site-packages/tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py", line 787, in embedding_attention_decoder
    initial_state_attention=initial_state_attention)
  File "/homes/3/serai/.conda/envs/tensorflow_r1.0_gpu/lib/python3.6/site-packages/tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py", line 686, in attention_decoder
    cell_output, state = cell(x, state)
  File "/homes/3/serai/.conda/envs/tensorflow_r1.0_gpu/lib/python3.6/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 796, in __call__
    % (len(self.state_size), state))
ValueError: Expected state to be a tuple of length 3, but received: Tensor("model_with_buckets/embedding_attention_seq2seq/rnn/gru_cell_4/add:0", shape=(?, 1024), dtype=float32)

Did you try decoding?

prashantserai on 8 Apr 2017

@prashantserai Don't exactly know, but what you met seems to be another issue.

oxwsds on 10 Apr 2017

@prashantserai If it fails only when you decode, perhaps it has something to do with using a batch size of one? Does the model still train if you lower the batch size to one during training?

robmsylvester on 10 Apr 2017

@bowu Same error here. macOS Sierra, TensorFlow 1.1.0-rc1, Python 2.7.10 & Python 3.6.1.

soloice on 10 Apr 2017

@robmsylvester it did train successfully with a batch size of one too, but failed during decoding in the same way or similar way.. here's a full traceback.. the reason I was thinking of this as a connected error was because of the reference to seq2seq_f (which was one of the modified functions) (the #prashant comment from my code to signify a modified line is part of the trace)

2017-04-10 11:32:27.447042: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties: 
name: GeForce GTX 780 Ti
major: 3 minor: 5 memoryClockRate (GHz) 0.928
pciBusID 0000:42:00.0
Total memory: 2.95GiB
Free memory: 2.88GiB
2017-04-10 11:32:27.447094: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0 
2017-04-10 11:32:27.447102: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0:   Y 
2017-04-10 11:32:27.447118: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 780 Ti, pci bus id: 0000:42:00.0)
Traceback (most recent call last):
  File "translate.py", line 322, in <module>
    tf.app.run()
  File "/homes/3/serai/.conda/envs/tensorflow_r1.0_gpu/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 48, in run
    _sys.exit(main(_sys.argv[:1] + flags_passthrough))
  File "translate.py", line 317, in main
    decode()
  File "translate.py", line 248, in decode
    model = create_model(sess, True)
  File "translate.py", line 136, in create_model
    dtype=dtype)
  File "/data/data6/scratch/serai/models/tutorials/rnn/translate/seq2seq_model.py", line 168, in __init__
    softmax_loss_function=softmax_loss_function)
  File "/homes/3/serai/.conda/envs/tensorflow_r1.0_gpu/lib/python3.6/site-packages/tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py", line 1203, in model_with_buckets
    decoder_inputs[:bucket[1]])
  File "/data/data6/scratch/serai/models/tutorials/rnn/translate/seq2seq_model.py", line 167, in <lambda>
    self.target_weights, buckets, lambda x, y: seq2seq_f(x, y, True),
  File "/data/data6/scratch/serai/models/tutorials/rnn/translate/seq2seq_model.py", line 144, in seq2seq_f
    dtype=dtype) #prashant
  File "/homes/3/serai/.conda/envs/tensorflow_r1.0_gpu/lib/python3.6/site-packages/tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py", line 883, in embedding_attention_seq2seq
    initial_state_attention=initial_state_attention)
  File "/homes/3/serai/.conda/envs/tensorflow_r1.0_gpu/lib/python3.6/site-packages/tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py", line 787, in embedding_attention_decoder
    initial_state_attention=initial_state_attention)
  File "/homes/3/serai/.conda/envs/tensorflow_r1.0_gpu/lib/python3.6/site-packages/tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py", line 686, in attention_decoder
    cell_output, state = cell(x, state)
  File "/homes/3/serai/.conda/envs/tensorflow_r1.0_gpu/lib/python3.6/site-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 796, in __call__
    % (len(self.state_size), state))
ValueError: Expected state to be a tuple of length 3, but received: Tensor("model_with_buckets/embedding_attention_seq2seq/rnn/gru_cell_4/add:0", shape=(?, 1024), dtype=float32)

@oxwsds does your opinion change on the basis of the full trace above?

prashantserai on 10 Apr 2017

@prashantserai I tried decoding and it works fine. I simply added an encoder_cell arg to the function tf.contrib.legacy_seq2seq.embedding_attention_seq2seq, and in translate/seq2seq_model.py I create the cell and pass it to that function, which is called in the function seq2seq_f. How did you change your code?

oxwsds on 11 Apr 2017

👍1

@oxwsds @robmsylvester @ebrevdo
I finally have something that's working now (I mean, results for my single layer 256 unit network are kind of appalling, but that's probably just because the network is ultra light weight and I didn't tune params AT ALL)
Thank you so much everyone...!!!!!

_Here's my thoughts at the end of this:_

@oxwsds comment that the tutorial (in its current form) works without any need for modification when Tensorflow is compiled from the branch remotes/origin/r1.0 was TRUE. The sad bit, though, was that the version of Tensorflow I had, for which modifications within the Tensorflow code were needed, and the version in remotes/origin/r1.0 were both identically labelled.

@robmsylvester 's fix in the comment (copied below) DID WORK for my version of Tensorflow where the Tutorial didn't work out of the box (and should work for TF 1.1 too I guess). It is slightly messy to implement, but I could do it, which is saying something :-P
The error in my last two comments before this was due to my mistake. Like a dummy, I was specifying the layers and hidden units parameters only during training, I was leaving the code to use defaults during decoding. (this portion of the tutorial could be slightly more dummy proof: https://www.tensorflow.org/tutorials/seq2seq#lets_run_it )

Hmmm. One thing that stands out to me is in the referenced legacy seq2seq file:

encoder_cell = copy.deepcopy(cell)

This line appears to be used because the same architecture is used on both the encoder and decoder side. They make a copy of the cell, then pass the cell argument along to the attention decoder embedding function, then to the attention decoder itself.

What happens if you explicitly create the encoder cell AND the decoder cell in your seq2seq model file and pass both along to the legacy library file, making the small adjustments to the functions and their arguments?

prashantserai on 11 Apr 2017

👍1

Thanks for the feedback! Seems there's something different between the TF on pypi and at that tag? Gunhan, is that possible?

ebrevdo on 11 Apr 2017

For information, I had this issue while trying to stack LSTM cells.
My original code was:

    lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(hidden_size, forget_bias=0.0, state_is_tuple=True)
    if is_training and keep_prob < 1:
      lstm_cell = tf.nn.rnn_cell.DropoutWrapper(
          lstm_cell, output_keep_prob=keep_prob)
    cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * num_layers, state_is_tuple=True)

Then, with the following code, creating the model was ok, but I couldn't share the variable with another model. (for instance if you create a train_model and a valid_model supposed to share tensors, it will fail)

    lstm_creator = lambda: tf.contrib.rnn.BasicLSTMCell(
                                        hidden_size, 
                                        forget_bias=0.0, state_is_tuple=True)
    if is_training and keep_prob < 1:
      cell_creator = lambda:tf.contrib.rnn.DropoutWrapper(
          lstm_creator(), output_keep_prob=keep_prob)
    else:
      cell_creator = lstm_creator

    cell = tf.contrib.rnn.MultiRNNCell([cell_creator() for _ in range(num_layers)], state_is_tuple=True)

So finally I made lstm_creator the same function as lstm_cell in tensorflow/models/tutorials/rnn/ptb/ptb_word_lm.py#L112. I now have:

    def lstm_cell():
      # With the latest TensorFlow source code (as of Mar 27, 2017),
      # the BasicLSTMCell will need a reuse parameter which is unfortunately not
      # defined in TensorFlow 1.0. To maintain backwards compatibility, we add
      # an argument check here:
      if 'reuse' in inspect.getargspec(
          tf.contrib.rnn.BasicLSTMCell.__init__).args:
        return tf.contrib.rnn.BasicLSTMCell(
            size, forget_bias=0.0, state_is_tuple=True,
            reuse=tf.get_variable_scope().reuse)
      else:
        return tf.contrib.rnn.BasicLSTMCell(
            size, forget_bias=0.0, state_is_tuple=True)
    attn_cell = lstm_cell

    lstm_creator = lstm_cell
    if is_training and keep_prob < 1:
      cell_creator = lambda:tf.contrib.rnn.DropoutWrapper(
          lstm_creator(), output_keep_prob=keep_prob)
    else:
      cell_creator = lstm_creator

    cell = tf.contrib.rnn.MultiRNNCell([cell_creator() for _ in range(num_layers)], state_is_tuple=True)

It is now fully working
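
The argument check above generalizes to a small helper. The sketch below uses `inspect.signature` rather than `inspect.getargspec`, because `getargspec` was removed in Python 3.11. The names `make_cell`, `OldCell`, and `NewCell` are hypothetical; the two classes stand in for `BasicLSTMCell` before and after the `reuse` argument was added.

```python
import inspect


class OldCell:
    """Stand-in for a cell class whose constructor has no `reuse` argument."""

    def __init__(self, size):
        self.size = size
        self.reuse = None


class NewCell:
    """Stand-in for a cell class whose constructor accepts `reuse`."""

    def __init__(self, size, reuse=None):
        self.size = size
        self.reuse = reuse


def make_cell(cell_cls, size, reuse):
    """Pass `reuse` only if the constructor of `cell_cls` accepts it."""
    params = inspect.signature(cell_cls.__init__).parameters
    if "reuse" in params:
        return cell_cls(size, reuse=reuse)
    return cell_cls(size)


old = make_cell(OldCell, 128, reuse=True)   # `reuse` is silently dropped
new = make_cell(NewCell, 128, reuse=True)   # `reuse` is forwarded
```

With a helper like this, the same model code could run against both constructor variants. Whether dropping `reuse` silently is acceptable depends on the caller.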

pltrdy on 13 Apr 2017

👍3

trying to get this thing running, which results in the same error:

https://gist.github.com/danijar/c7ec9a30052127c7a1ad169eeb83f159#file-blog_tensorflow_sequence_classification-py-L38

@pltrdy 's solution didn't do it for me oddly. I'm getting

ValueError: Variable rnn/multi_rnn_cell/cell_0/basic_lstm_cell/weights does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?

aep on 16 Apr 2017

@aep did you use the function from https://github.com/tensorflow/models/blob/master/tutorials/rnn/ptb/ptb_word_lm.py#L112 that I mention at the end of my post (now edited to be more clear)?

pltrdy on 18 Apr 2017

cells=[]
for _ in range(15):
    cell = create_lstm_cell(config)
    cells.append(cell)
lsmt_layers = rnn.MultiRNNCell(cells)

it solved my problem

Tshzzz on 28 Apr 2017

Managed to fix this issue by installing older version of Tensorflow:
pip install -Iv tensorflow==1.0

I was receiving the error when executing the seq2seq tutorial

dsoiM on 28 Apr 2017

👍10 ❤1 🎉1

In regards to what @ebrevdo said, I think the solution is not to fix the legacy seq2seq code, but to update the tutorial to use the contrib.seq2seq package instead, which is actively maintained. It is quite demoralizing when the first tensorflow program you ever run spits out a bunch of errors. If I have some time this week, I'll submit a PR.

kyteague on 1 May 2017

👍2

We're working on a new seq2seq tutorial. We had hoped to release by end of last month but are getting delayed. It will use the new API.

ebrevdo on 1 May 2017

@ebrevdo I hit the same error when running the sequence_to_sequence model from the tensorflow 1.1 website. I tried using the 'reuse' parameter, but it failed. Could you tell me when the new seq2seq tutorial will be released?

njuzrs on 5 May 2017

Looks like at the same time as tf 1.2, since we will rely on some new
features of that release.

ebrevdo on 5 May 2017

@ebrevdo I am also facing the same issue and am unable to make progress with seq2seq. It would be really helpful if you could let us know a probable date for the new tutorial.
Thanks a lot for your help.

PratsBhatt on 8 May 2017

👍7

Installing using pip install tensorflow==1.0 (Tensorflow 1.0) is working for me (translate tutorial).

tanmayb123 on 9 May 2017

I have version 1.1.0-rc2.

PratsBhatt on 9 May 2017

Will TF 1.2 solve this problem? Please help me continue training the model. TF 1.0 works but doesn't have the DeviceWrapper API for multiple GPUs.

MingCong18 on 15 May 2017

Having the same problem with TensorFlow 1.1. Still working on a solution.

thomasqjohns on 19 May 2017

I tried several things; in the end I was able to use TensorFlow 1.1, but had to make these changes (based on Tshzzz above):

Remove this:
multicell = rnn.MultiRNNCell([dropcell]*NLAYERS, state_is_tuple=False)

And add this:
cells=[]
for _ in range(NLAYERS):
    cell = rnn.DropoutWrapper(tf.contrib.rnn.GRUCell(INTERNALSIZE), input_keep_prob=pkeep)
    cells.append(cell)
multicell = rnn.MultiRNNCell(cells, state_is_tuple=False)

jtubert on 20 May 2017

👍3

@ebrevdo Congratulations, TF 1.2 just got released - was the new tutorial also released somewhere or is it being released anytime soon?

Thanks

prashantserai on 20 May 2017

We'll plan to have an announcement when it's released. Working on it.

ebrevdo on 20 May 2017

👍2

For anyone using tensorflow-gpu==1.1.0 and getting this error, switching to 1.0.0 via pip install tensorflow-gpu==1.0.0 is not going to fix the problem, at least didn't work for me.

I ran into this issue on both mac and ubuntu and compiling from source worked both times. So:
pip install https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.0.0-cp34-cp34m-linux_x86_64.whl

ajaanbaahu on 24 May 2017

@ajaanbaahu Still waiting for tf1.2 new seq2seq tutorial.

MingCong18 on 25 May 2017

👍3

It worked for me using pip install tensorflow==1.0.

saching270 on 26 May 2017

With tf r1.2, I got a deepcopy error, as listed in "sequence to sequence model error" #1050.

Vimos on 26 May 2017

As a rookie, I'd like to share my experience.
The following code triggers this same error:
(A piece of my code)

lstm_cell = self.LSTMCell(self.num_hidden)
lstm_entity = tf.contrib.rnn.DropoutWrapper(lstm_cell, output_keep_prob=0.5)
layer = tf.contrib.rnn.MultiRNNCell([lstm_entity] * self.num_layer)
__, _ = tf.nn.dynamic_rnn(layer, self.data, dtype=tf.float64)

The error dump is as follows:

Traceback (most recent call last):
  File "IntentNet.py", line 71, in <module>
    net = Net(data, target, 5, 1)
  File "IntentNet.py", line 45, in __init__
    __, _ = tf.nn.dynamic_rnn(layer, self.data, dtype=tf.float64)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 553, in dynamic_rnn
    dtype=dtype)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 720, in _dynamic_rnn_loop
    swap_memory=swap_memory)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2623, in while_loop
    result = context.BuildLoop(cond, body, loop_vars, shape_invariants)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2456, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/control_flow_ops.py", line 2406, in _BuildLoop
    body_result = body(*packed_vars_for_body)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 705, in _time_step
    (output, new_state) = call_cell()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/rnn.py", line 691, in <lambda>
    call_cell = lambda: cell(input_t, state)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 953, in __call__
    cur_inp, new_state = cell(cur_inp, cur_state)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 713, in __call__
    output, new_state = self._cell(inputs, state, scope)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 235, in __call__
    with _checked_scope(self, scope or "basic_lstm_cell", reuse=self._reuse):
  File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/rnn/python/ops/core_rnn_cell_impl.py", line 77, in _checked_scope
    type(cell).__name__))
ValueError: Attempt to reuse RNNCell <tensorflow.contrib.rnn.python.ops.core_rnn_cell_impl.BasicLSTMCell object at 0x7fe4fc7bd150> with a different variable scope than its first use.  First use of cell was with scope 'rnn/multi_rnn_cell/cell_0/basic_lstm_cell', this attempt is with scope 'rnn/multi_rnn_cell/cell_1/basic_lstm_cell'.  Please create a new instance of the cell if you would like it to use a different set of weights.  If before you were using: MultiRNNCell([BasicLSTMCell(...)] * num_layers), change to: MultiRNNCell([BasicLSTMCell(...) for _ in range(num_layers)]).  If before you were using the same cell instance as both the forward and reverse cell of a bidirectional RNN, simply create two instances (one for forward, one for reverse).  In May 2017, we will start transitioning this cell's behavior to use existing stored weights, if any, when it is called with scope=None (which can lead to silent model degradation, so this error will remain until then.)

But after I made the following revision, it works.

"""
lstm_cell = self.LSTMCell(self.num_hidden)
lstm_entity = tf.contrib.rnn.DropoutWrapper(lstm_cell, output_keep_prob=0.5)
layer = tf.contrib.rnn.MultiRNNCell([lstm_entity] * self.num_layer)
"""
layer = []
for i in range(self.num_layer):
    lstm_cell = self.LSTMCell(self.num_hidden)
    lstm_entity = tf.contrib.rnn.DropoutWrapper(lstm_cell, output_keep_prob=0.5)
    layer.append(lstm_entity)
layer = tf.contrib.rnn.MultiRNNCell(layer)
__, _ = tf.nn.dynamic_rnn(layer, self.data, dtype=tf.float64)
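
The error message above describes the rule: a cell remembers the scope of its first call and refuses a later call under a different scope. A toy model in plain Python, assuming nothing about TensorFlow's internals (`ScopeCheckedCell` and `run_layers` are hypothetical names, not TensorFlow APIs), suggests why repeating one instance trips the check while fresh instances do not:

```python
class ScopeCheckedCell(object):
    """Toy model of the check described in the error message.

    Hypothetical class for illustration only; TensorFlow's real
    implementation lives in core_rnn_cell_impl.py and differs in detail.
    """

    def __init__(self):
        self._first_scope = None  # filled in on the first call

    def __call__(self, scope):
        if self._first_scope is None:
            self._first_scope = scope
        elif scope != self._first_scope:
            raise ValueError(
                "Attempt to reuse cell with a different variable scope: "
                "first use was %r, this attempt is %r"
                % (self._first_scope, scope))
        return scope


def run_layers(cells):
    """Call each layer's cell under its own per-layer scope name."""
    return [cell("cell_%d" % i) for i, cell in enumerate(cells)]


# Distinct instances: each cell only ever sees one scope.
print(run_layers([ScopeCheckedCell() for _ in range(2)]))

# One instance repeated: the second layer calls it under a new scope.
try:
    run_layers([ScopeCheckedCell()] * 2)
except ValueError as err:
    print("ValueError: %s" % err)
```

The per-layer scope names mirror the `cell_0` / `cell_1` suffixes visible in the traceback; everything else is a simplification.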

SunnerLi on 29 May 2017

None of those workarounds worked for me with TensorFlow 1.1.

I'm using a seq2seq model with MultiRNNCell cells.

I had to revert to 1.0.1: pip3 install tensorflow==1.0

philipperemy on 2 Jun 2017

👍6

Anyone have these issues when working with legacy_seq2seq.rnn_decoder()?

rileyedmunds on 6 Jun 2017

@oxwsds As you said, I changed the cell input argument of tf.contrib.legacy_seq2seq.embedding_attention_seq2seq to two different cells {encoder_cells, decoder_cells}. Finally, I got the seq2seq model working. After 73200 steps, I get perplexity 5.54.
Then I ran the decode part:

Who is the president of the United States?
Qui est le président des États-Unis ?

Problem solved. Thanks.

supermeatboy82 on 7 Jun 2017

@doncat99
It seems that copy.deepcopy(cell) in seq2seq.py has no effect.
So I changed the related part in seq2seq_model.py to:

    if num_layers > 1:
      cell_enc = tf.contrib.rnn.MultiRNNCell([single_cell() for _ in range(num_layers)])
      cell_dec = tf.contrib.rnn.MultiRNNCell([single_cell() for _ in range(num_layers)])

    # The seq2seq function: we use embedding for the input and attention.
    def seq2seq_f(encoder_inputs, decoder_inputs, do_decode):
      return seq2seq.embedding_attention_seq2seq(
          encoder_inputs,
          decoder_inputs,
          cell_enc,
          cell_dec,
          num_encoder_symbols=source_vocab_size,
          num_decoder_symbols=target_vocab_size,
          embedding_size=size,
          output_projection=output_projection,
          feed_previous=do_decode,
          dtype=dtype)

ypruan on 15 Jun 2017

👍2

@supermeatboy82 , Could you share your code?

martinambition on 19 Jun 2017

Upgrading to Tensorflow 1.2.0 and generating the cells in a loop instead of list multiplication fixed this for me.

cpury on 21 Jun 2017

Got the error with TF1.2 when running translate.py, details:
name: GeForce GTX 1080 Ti
major: 6 minor: 1 memoryClockRate (GHz) 1.582
pciBusID 0000:02:00.0
Total memory: 10.91GiB
Free memory: 10.76GiB
2017-06-22 09:15:04.485252: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2017-06-22 09:15:04.485256: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0: Y
2017-06-22 09:15:04.485265: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:02:00.0)
Creating 3 layers of 1024 units.
Traceback (most recent call last):
File "translate.py", line 322, in
tf.app.run()
File "/home/lscm/opt/anaconda2/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "translate.py", line 319, in main
train()
File "translate.py", line 178, in train
model = create_model(sess, False)
File "translate.py", line 136, in create_model
dtype=dtype)
File "/data/research/github/dl/tensorflow/tensorflow/models/tutorials/rnn/translate/seq2seq_model.py", line 179, in __init__
softmax_loss_function=softmax_loss_function)
File "/home/lscm/opt/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py", line 1206, in model_with_buckets
decoder_inputs[:bucket[1]])
File "/data/research/github/dl/tensorflow/tensorflow/models/tutorials/rnn/translate/seq2seq_model.py", line 178, in
lambda x, y: seq2seq_f(x, y, False),
File "/data/research/github/dl/tensorflow/tensorflow/models/tutorials/rnn/translate/seq2seq_model.py", line 142, in seq2seq_f
dtype=dtype)
File "/home/lscm/opt/anaconda2/lib/python2.7/site-packages/tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py", line 848, in embedding_attention_seq2seq
encoder_cell = copy.deepcopy(cell)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 174, in deepcopy
y = copier(memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/site-packages/tensorflow/python/layers/base.py", line 476, in __deepcopy__
setattr(result, k, copy.deepcopy(v, memo))
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 230, in _deepcopy_list
y.append(deepcopy(a, memo))
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 190, in deepcopy
y = _reconstruct(x, rv, 1, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 334, in _reconstruct
state = deepcopy(state, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 257, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 190, in deepcopy
y = _reconstruct(x, rv, 1, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 334, in _reconstruct
state = deepcopy(state, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 257, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 190, in deepcopy
y = _reconstruct(x, rv, 1, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 334, in _reconstruct
state = deepcopy(state, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 257, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 190, in deepcopy
y = _reconstruct(x, rv, 1, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 334, in _reconstruct
state = deepcopy(state, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 257, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 257, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 230, in _deepcopy_list
y.append(deepcopy(a, memo))
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 190, in deepcopy
y = _reconstruct(x, rv, 1, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 334, in _reconstruct
state = deepcopy(state, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 257, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 190, in deepcopy
y = _reconstruct(x, rv, 1, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 334, in _reconstruct
state = deepcopy(state, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 257, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 190, in deepcopy
y = _reconstruct(x, rv, 1, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 334, in _reconstruct
state = deepcopy(state, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 257, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 230, in _deepcopy_list
y.append(deepcopy(a, memo))
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 237, in _deepcopy_tuple
y.append(deepcopy(a, memo))
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 257, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 190, in deepcopy
y = _reconstruct(x, rv, 1, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 334, in _reconstruct
state = deepcopy(state, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 163, in deepcopy
y = copier(x, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 257, in _deepcopy_dict
y[deepcopy(key, memo)] = deepcopy(value, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 190, in deepcopy
y = _reconstruct(x, rv, 1, memo)
File "/home/lscm/opt/anaconda2/lib/python2.7/copy.py", line 343, in _reconstruct
y.__dict__.update(state)
AttributeError: 'NoneType' object has no attribute 'update'

syw2014 on 22 Jun 2017

I also hit the error caused by copy.deepcopy(cell) in embedding_attention_seq2seq() when running self_test() in the translate model from the tutorial.
I tried changing the code in seq2seq_f() in Seq2SeqModel as follows:

    def seq2seq_f(encoder_inputs, decoder_inputs, do_decode=False):
        tmp_cell = copy.deepcopy(cell) #new
        return tf.contrib.legacy_seq2seq.embedding_attention_seq2seq(
            encoder_inputs,
            decoder_inputs,
            tmp_cell, #new
            num_encoder_symbols=source_vocab_size,
            num_decoder_symbols=target_vocab_size,
            embedding_size=size,
            output_projection=output_projection,
            feed_previous=do_decode,
            dtype=dtype)

With this change the error goes away.
BUT as a rookie I don't know whether the code still behaves as before, and the change seems to make the model run slower.

Miopas on 23 Jun 2017

😄3 👍1

I would like to update everyone: I downgraded TensorFlow to 1.0.0 (tensorflow-GPU) and it is working for me. The models are performing as expected. I assume that the CPU version of 1.0.0 should also function as expected?
Thanks :)

PratsBhatt on 23 Jun 2017

Hi guys, I don't know if you're still interested in this, but I found that the problem is related to copying the cell passed as a parameter to the embedding_attention_seq2seq function. This happens because the same cell definition is used for both the encoder and the decoder. I think the tutorial is deprecated, since it uses a seq2seq model with bucketing rather than a dynamic seq2seq. Still, I'm pasting a modified function that works. The function to update is in the file tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py.

thanks,
Fabio

```python
def embedding_attention_seq2seq(encoder_inputs,
                                decoder_inputs,
                                enc_cell,
                                dec_cell,
                                num_encoder_symbols,
                                num_decoder_symbols,
                                embedding_size,
                                num_heads=1,
                                output_projection=None,
                                feed_previous=False,
                                dtype=None,
                                scope=None,
                                initial_state_attention=False):
  """Embedding sequence-to-sequence model with attention.

  This model first embeds encoder_inputs by a newly created embedding (of shape
  [num_encoder_symbols x input_size]). Then it runs an RNN to encode
  embedded encoder_inputs into a state vector. It keeps the outputs of this
  RNN at every step to use for attention later. Next, it embeds decoder_inputs
  by another newly created embedding (of shape [num_decoder_symbols x
  input_size]). Then it runs attention decoder, initialized with the last
  encoder state, on embedded decoder_inputs and attending to encoder outputs.

  Warning: when output_projection is None, the size of the attention vectors
  and variables will be made proportional to num_decoder_symbols, can be large.

  Args:
    encoder_inputs: A list of 1D int32 Tensors of shape [batch_size].
    decoder_inputs: A list of 1D int32 Tensors of shape [batch_size].
    enc_cell: tf.nn.rnn_cell.RNNCell defining the cell function and size
      for the encoder.
    dec_cell: tf.nn.rnn_cell.RNNCell defining the cell function and size
      for the decoder.
    num_encoder_symbols: Integer; number of symbols on the encoder side.
    num_decoder_symbols: Integer; number of symbols on the decoder side.
    embedding_size: Integer, the length of the embedding vector for each symbol.
    num_heads: Number of attention heads that read from attention_states.
    output_projection: None or a pair (W, B) of output projection weights and
      biases; W has shape [output_size x num_decoder_symbols] and B has
      shape [num_decoder_symbols]; if provided and feed_previous=True, each
      fed previous output will first be multiplied by W and added B.
    feed_previous: Boolean or scalar Boolean Tensor; if True, only the first
      of decoder_inputs will be used (the "GO" symbol), and all other decoder
      inputs will be taken from previous outputs (as in embedding_rnn_decoder).
      If False, decoder_inputs are used as given (the standard decoder case).
    dtype: The dtype of the initial RNN state (default: tf.float32).
    scope: VariableScope for the created subgraph; defaults to
      "embedding_attention_seq2seq".
    initial_state_attention: If False (default), initial attentions are zero.
      If True, initialize the attentions from the initial state and attention
      states.

  Returns:
    A tuple of the form (outputs, state), where:
      outputs: A list of the same length as decoder_inputs of 2D Tensors with
        shape [batch_size x num_decoder_symbols] containing the generated
        outputs.
      state: The state of each decoder cell at the final time-step.
        It is a 2D Tensor of shape [batch_size x cell.state_size].
  """
  with variable_scope.variable_scope(
      scope or "embedding_attention_seq2seq", dtype=dtype) as scope:
    dtype = scope.dtype
    # Encoder.
    encoder_cell = enc_cell

    encoder_cell = core_rnn_cell.EmbeddingWrapper(
        encoder_cell,
        embedding_classes=num_encoder_symbols,
        embedding_size=embedding_size)
    encoder_outputs, encoder_state = rnn.static_rnn(
        encoder_cell, encoder_inputs, dtype=dtype)

    # First calculate a concatenation of encoder outputs to put attention on.
    top_states = [
        array_ops.reshape(e, [-1, 1, encoder_cell.output_size]) for e in encoder_outputs
    ]
    attention_states = array_ops.concat(top_states, 1)

    # Decoder.
    output_size = None
    if output_projection is None:
      dec_cell = core_rnn_cell.OutputProjectionWrapper(dec_cell, num_decoder_symbols)
      output_size = num_decoder_symbols

    if isinstance(feed_previous, bool):
      return embedding_attention_decoder(
          decoder_inputs,
          encoder_state,
          attention_states,
          dec_cell,
          num_decoder_symbols,
          embedding_size,
          num_heads=num_heads,
          output_size=output_size,
          output_projection=output_projection,
          feed_previous=feed_previous,
          initial_state_attention=initial_state_attention)

    # If feed_previous is a Tensor, we construct 2 graphs and use cond.
    def decoder(feed_previous_bool):
      reuse = None if feed_previous_bool else True
      with variable_scope.variable_scope(
          variable_scope.get_variable_scope(), reuse=reuse):
        outputs, state = embedding_attention_decoder(
            decoder_inputs,
            encoder_state,
            attention_states,
            dec_cell,
            num_decoder_symbols,
            embedding_size,
            num_heads=num_heads,
            output_size=output_size,
            output_projection=output_projection,
            feed_previous=feed_previous_bool,
            update_embedding_for_previous=False,
            initial_state_attention=initial_state_attention)
        state_list = [state]
        if nest.is_sequence(state):
          state_list = nest.flatten(state)
        return outputs + state_list

    outputs_and_state = control_flow_ops.cond(feed_previous,
                                              lambda: decoder(True),
                                              lambda: decoder(False))
    outputs_len = len(decoder_inputs)  # Outputs length same as decoder inputs.
    state_list = outputs_and_state[outputs_len:]
    state = state_list[0]
    if nest.is_sequence(encoder_state):
      state = nest.pack_sequence_as(
          structure=encoder_state, flat_sequence=state_list)
    return outputs_and_state[:outputs_len], state
```

fabiofumarola on 25 Jun 2017

👍14 ❤4

@fabiofumarola Thank you for the function. It looks really helpful. I also saw that the tutorial is deprecated, and I am still waiting for an official tutorial release. It looks like you have used the new API. Do you have any code I could look at to start coding against the new API?
Any help is much appreciated. Thank you once again :)

PratsBhatt on 26 Jun 2017

@syw2014 Did you fix your issue?

w268wang on 26 Jun 2017

@w268wang Not yet; I'm still waiting for other solutions. @Miopas's suggestion may be worth a try, and I am trying @fabiofumarola's solution.

syw2014 on 27 Jun 2017

It says TypeError: embedding_attention_seq2seq() missing 1 required positional argument: 'dec_cell'
after using the update that @fabiofumarola posted. Can you guys please help me?

sachinh35 on 2 Jul 2017

Yes, because the update I proposed requires you to change the
embedding_attention_seq2seq function. If you go to the source file in your
TensorFlow release, you can change the function definition yourself.

fabiofumarola on 2 Jul 2017

Yes, I did the same thing. I changed the function in the seq2seq.py file in the TensorFlow release. I am still getting the same error. Is there one more argument to the function?

sachinh35 on 2 Jul 2017

Yes, now in your code you need to specify two rnn_cells: one for the encoder
and another for the decoder.
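
One way to obtain the two cells is to call a cell factory once per layer for each side, so that no instance is shared. A minimal sketch in plain Python, runnable without TensorFlow (`make_encoder_decoder_cells`, `cell_factory`, and `DummyCell` are hypothetical names introduced here, not TensorFlow APIs):

```python
def make_encoder_decoder_cells(cell_factory, num_layers):
    """Build two independent layer lists from a zero-argument factory.

    In TensorFlow 1.x the factory could be something like
    `lambda: tf.contrib.rnn.GRUCell(size)`, and each returned list could
    then be wrapped in `tf.contrib.rnn.MultiRNNCell(...)`. That usage is
    an assumption of this sketch and is not exercised here.
    """
    enc_layers = [cell_factory() for _ in range(num_layers)]
    dec_layers = [cell_factory() for _ in range(num_layers)]
    return enc_layers, dec_layers


class DummyCell(object):
    """Placeholder so the sketch runs without TensorFlow installed."""


enc, dec = make_encoder_decoder_cells(DummyCell, num_layers=3)
all_cells = enc + dec
# No object appears twice, within a stack or across the two stacks.
print(len(set(id(c) for c in all_cells)) == len(all_cells))  # True
```

The two results would then be passed as the enc_cell and dec_cell arguments of the modified embedding_attention_seq2seq function.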

fabiofumarola on 2 Jul 2017

I am totally new to this. Maybe this is a pretty basic question, but could you tell me what argument should be passed as the decoder cell in this code? I am trying to develop the seq2seq model as shown in the TensorFlow tutorial, using my own dataset.

`
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import random

import numpy as np
from six.moves import xrange # pylint: disable=redefined-builtin
import tensorflow as tf

import data_utils

class Seq2SeqModel(object):
def __init__(self,
source_vocab_size,
target_vocab_size,
buckets,
size,
num_layers,
max_gradient_norm,
batch_size,
learning_rate,
learning_rate_decay_factor,
use_lstm=False,
num_samples=512,
forward_only=False,
dtype=tf.float32):

self.source_vocab_size = source_vocab_size
self.target_vocab_size = target_vocab_size
self.buckets = buckets
self.batch_size = batch_size
self.learning_rate = tf.Variable(
    float(learning_rate), trainable=False, dtype=dtype)
self.learning_rate_decay_op = self.learning_rate.assign(
    self.learning_rate * learning_rate_decay_factor)
self.global_step = tf.Variable(0, trainable=False)


output_projection = None
softmax_loss_function = None

if num_samples > 0 and num_samples < self.target_vocab_size:
  w_t = tf.get_variable("proj_w", [self.target_vocab_size, size], dtype=dtype)
  w = tf.transpose(w_t)
  b = tf.get_variable("proj_b", [self.target_vocab_size], dtype=dtype)
  output_projection = (w, b)

  def sampled_loss(labels, inputs):
    labels = tf.reshape(labels, [-1, 1])

    local_w_t = tf.cast(w_t, tf.float32)
    local_b = tf.cast(b, tf.float32)
    local_inputs = tf.cast(inputs, tf.float32)
    return tf.cast(
        tf.nn.sampled_softmax_loss(local_w_t, local_b, local_inputs, labels,
                                   num_samples, self.target_vocab_size),
        dtype)
  softmax_loss_function = sampled_loss


def single_cell():
  return tf.nn.rnn_cell.GRUCell(size)
if use_lstm:
  def single_cell():
    return tf.nn.rnn_cell.BasicLSTMCell(size)
cell = single_cell()
if num_layers > 1:
  cell = tf.nn.rnn_cell.MultiRNNCell([single_cell() for _ in range(num_layers)])


def seq2seq_f(encoder_inputs, decoder_inputs, do_decode):
  return tf.contrib.legacy_seq2seq.embedding_attention_seq2seq(
      encoder_inputs,
      decoder_inputs,
      cell,
      num_encoder_symbols=source_vocab_size,
      num_decoder_symbols=target_vocab_size,
      embedding_size=size,
      output_projection=output_projection,
      feed_previous=do_decode,
      dtype=dtype)


self.encoder_inputs = []
self.decoder_inputs = []
self.target_weights = []
for i in xrange(buckets[-1][0]):  # Last bucket is the biggest one.
  self.encoder_inputs.append(tf.placeholder(tf.int32, shape=[None],
                                            name="encoder{0}".format(i)))
for i in xrange(buckets[-1][1] + 1):
  self.decoder_inputs.append(tf.placeholder(tf.int32, shape=[None],
                                            name="decoder{0}".format(i)))
  self.target_weights.append(tf.placeholder(dtype, shape=[None],
                                            name="weight{0}".format(i)))

# Our targets are decoder inputs shifted by one.
targets = [self.decoder_inputs[i + 1]
           for i in xrange(len(self.decoder_inputs) - 1)]

# Training outputs and losses.
if forward_only:
  self.outputs, self.losses = tf.contrib.legacy_seq2seq.model_with_buckets(
      self.encoder_inputs, self.decoder_inputs, targets,
      self.target_weights, buckets, lambda x, y: seq2seq_f(x, y, True),
      softmax_loss_function=softmax_loss_function)
  # If we use output projection, we need to project outputs for decoding.
  if output_projection is not None:
    for b in xrange(len(buckets)):
      self.outputs[b] = [
          tf.matmul(output, output_projection[0]) + output_projection[1]
          for output in self.outputs[b]
      ]
else:
  self.outputs, self.losses = tf.contrib.legacy_seq2seq.model_with_buckets(
      self.encoder_inputs, self.decoder_inputs, targets,
      self.target_weights, buckets,
      lambda x, y: seq2seq_f(x, y, False),
      softmax_loss_function=softmax_loss_function)

# Gradients and SGD update operation for training the model.
params = tf.trainable_variables()
if not forward_only:
  self.gradient_norms = []
  self.updates = []
  opt = tf.train.GradientDescentOptimizer(self.learning_rate)
  for b in xrange(len(buckets)):
    gradients = tf.gradients(self.losses[b], params)
    clipped_gradients, norm = tf.clip_by_global_norm(gradients,
                                                     max_gradient_norm)
    self.gradient_norms.append(norm)
    self.updates.append(opt.apply_gradients(
        zip(clipped_gradients, params), global_step=self.global_step))

self.saver = tf.train.Saver(tf.global_variables())

def step(self, session, encoder_inputs, decoder_inputs, target_weights,
bucket_id, forward_only):

# Check if the sizes match.
encoder_size, decoder_size = self.buckets[bucket_id]
if len(encoder_inputs) != encoder_size:
  raise ValueError("Encoder length must be equal to the one in bucket,"
                   " %d != %d." % (len(encoder_inputs), encoder_size))
if len(decoder_inputs) != decoder_size:
  raise ValueError("Decoder length must be equal to the one in bucket,"
                   " %d != %d." % (len(decoder_inputs), decoder_size))
if len(target_weights) != decoder_size:
  raise ValueError("Weights length must be equal to the one in bucket,"
                   " %d != %d." % (len(target_weights), decoder_size))

# Input feed: encoder inputs, decoder inputs, target_weights, as provided.
input_feed = {}
for l in xrange(encoder_size):
  input_feed[self.encoder_inputs[l].name] = encoder_inputs[l]
for l in xrange(decoder_size):
  input_feed[self.decoder_inputs[l].name] = decoder_inputs[l]
  input_feed[self.target_weights[l].name] = target_weights[l]

# Since our targets are decoder inputs shifted by one, we need one more.
last_target = self.decoder_inputs[decoder_size].name
input_feed[last_target] = np.zeros([self.batch_size], dtype=np.int32)

# Output feed: depends on whether we do a backward step or not.
if not forward_only:
  output_feed = [self.updates[bucket_id],  # Update Op that does SGD.
                 self.gradient_norms[bucket_id],  # Gradient norm.
                 self.losses[bucket_id]]  # Loss for this batch.
else:
  output_feed = [self.losses[bucket_id]]  # Loss for this batch.
  for l in xrange(decoder_size):  # Output logits.
    output_feed.append(self.outputs[bucket_id][l])

outputs = session.run(output_feed, input_feed)
if not forward_only:
  return outputs[1], outputs[2], None  # Gradient norm, loss, no outputs.
else:
  return None, outputs[0], outputs[1:]  # No gradient norm, loss, outputs.

def get_batch(self, data, bucket_id):

encoder_size, decoder_size = self.buckets[bucket_id]
encoder_inputs, decoder_inputs = [], []

# Get a random batch of encoder and decoder inputs from data,
# pad them if needed, reverse encoder inputs and add GO to decoder.
for _ in xrange(self.batch_size):
  encoder_input, decoder_input = random.choice(data[bucket_id])

  # Encoder inputs are padded and then reversed.
  encoder_pad = [data_utils.PAD_ID] * (encoder_size - len(encoder_input))
  encoder_inputs.append(list(reversed(encoder_input + encoder_pad)))

  # Decoder inputs get an extra "GO" symbol, and are padded then.
  decoder_pad_size = decoder_size - len(decoder_input) - 1
  decoder_inputs.append([data_utils.GO_ID] + decoder_input +
                        [data_utils.PAD_ID] * decoder_pad_size)

# Now we create batch-major vectors from the data selected above.
batch_encoder_inputs, batch_decoder_inputs, batch_weights = [], [], []

# Batch encoder inputs are just re-indexed encoder_inputs.
for length_idx in xrange(encoder_size):
  batch_encoder_inputs.append(
      np.array([encoder_inputs[batch_idx][length_idx]
                for batch_idx in xrange(self.batch_size)], dtype=np.int32))

# Batch decoder inputs are re-indexed decoder_inputs, we create weights.
for length_idx in xrange(decoder_size):
  batch_decoder_inputs.append(
      np.array([decoder_inputs[batch_idx][length_idx]
                for batch_idx in xrange(self.batch_size)], dtype=np.int32))

  # Create target_weights to be 0 for targets that are padding.
  batch_weight = np.ones(self.batch_size, dtype=np.float32)
  for batch_idx in xrange(self.batch_size):
    # We set weight to 0 if the corresponding target is a PAD symbol.
    # The corresponding target is decoder_input shifted by 1 forward.
    if length_idx < decoder_size - 1:
      target = decoder_inputs[batch_idx][length_idx + 1]
    if length_idx == decoder_size - 1 or target == data_utils.PAD_ID:
      batch_weight[batch_idx] = 0.0
  batch_weights.append(batch_weight)
return batch_encoder_inputs, batch_decoder_inputs, batch_weights`

sachinh35 on 3 Jul 2017

This is a good question for stack overflow.

ebrevdo on 3 Jul 2017

👍1

Okay! thanks though! :)

sachinh35 on 3 Jul 2017

@ebrevdo is there any update on when the new seq2seq tutorial using the new API will come out?
Thank you. Amazing work!

PratsBhatt on 3 Jul 2017

👍1

Yeah, waiting for the new tutorial too. It would be great to know if it's planned to be released anytime soon, @ebrevdo.

I tried to take the code in the kernel tests and retrofit the beam search onto the legacy seq2seq, but it was challenging.

prashantserai on 3 Jul 2017

We're hoping for this coming week!

ebrevdo on 3 Jul 2017

👍5 ❤1

Hi guys,

Any update on this issue? I'm experiencing the same error with tensorflow 1.1-gpu on Mac OS X.

tshi1983 on 20 Jul 2017

@tshi1983
I had the same problem with tensorflow 1.1-gpu on Ubuntu.
I upgraded to tf 1.2, and it still didn't work.
Then I changed the function embedding_attention_seq2seq in the file
tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py
to the version that @fabiofumarola suggested above.
Now training starts. I haven't tested decoding yet.

selinachenxi on 24 Jul 2017

Move the code on cell definition into seq2seq_f:

def seq2seq_f(encoder_inputs, decoder_inputs, do_decode):
  # Build the cells inside seq2seq_f so that every call gets new instances.
  def single_cell():
    return tf.contrib.rnn.GRUCell(size)
  if use_lstm:
    def single_cell():
      return tf.contrib.rnn.BasicLSTMCell(size)
  cell = single_cell()
  if num_layers > 1:
    cell = tf.contrib.rnn.MultiRNNCell([single_cell() for _ in range(num_layers)])
  return tf.contrib.legacy_seq2seq.embedding_attention_seq2seq(
      ...
  )

With this change, "python translate.py --data_dir data/ --train_dir checkpoint/ --size=256 --num_layers=2 --steps_per_checkpoint=50" runs.
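
A toy sketch in plain Python (no TensorFlow) of one plausible reason this helps. It assumes two things: that the seq2seq function is called once per bucket, and that embedding_attention_seq2seq deep-copies the cell for the encoder, as the `encoder_cell = copy.deepcopy(cell)` frame in a traceback posted later in this thread suggests. `ToyCell`, `toy_seq2seq`, and the scope names "encoder" and "decoder" are made up for illustration; the real scoping is more involved.

```python
import copy


class ToyCell(object):
    """Stand-in for an RNN cell. Hypothetical; not a TensorFlow class."""

    def __init__(self):
        self.first_scope = None  # remembered on the first call

    def __call__(self, scope):
        if self.first_scope is None:
            self.first_scope = scope
        elif self.first_scope != scope:
            raise ValueError("cell first used in %r, now used in %r"
                             % (self.first_scope, scope))


def toy_seq2seq(cell):
    # The encoder gets a deep copy of the cell; the decoder uses the original.
    encoder_cell = copy.deepcopy(cell)
    encoder_cell("encoder")
    cell("decoder")


def build_shared(num_buckets):
    cell = ToyCell()  # created once, outside the per-bucket function
    for _ in range(num_buckets):
        toy_seq2seq(cell)


def build_fresh(num_buckets):
    for _ in range(num_buckets):
        toy_seq2seq(ToyCell())  # created inside, once per bucket
```

In this model `build_shared(1)` succeeds, but `build_shared(2)` raises ValueError: on the second bucket the deep copy inherits the "decoder" scope that the original cell picked up during the first bucket. `build_fresh` never copies a cell that has already been used, so it succeeds for any number of buckets.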

huxuanlai on 31 Jul 2017

👍12 🎉1

@huxuanlai it works! At least it's training now, thx!

a111xushuai on 31 Jul 2017

@huxuanlai Works for me as well.

nathan-standafer on 5 Aug 2017

I am receiving the same AttributeError: 'NoneType' object has no attribute 'update' but with tf.contrib.legacy_seq2seq.model_with_buckets. I am running tf 1.2.1 (GPU) on ubuntu 16.04 lts.

This only seems to occur when I have more than 1 bucket.

full traceback:

Traceback (most recent call last):
  File "chatbot.py", line 262, in <module>
    main()
  File "chatbot.py", line 257, in main
    train()
  File "chatbot.py", line 138, in train
    model.build_graph()
  File "/home/jkarimi91/Projects/cs20/code/hw/a3/model.py", line 134, in build_graph
    self._create_loss()
  File "/home/jkarimi91/Projects/cs20/code/hw/a3/model.py", line 102, in _create_loss
    softmax_loss_function=self.softmax_loss_function)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/site-packages/tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py", line 1206, in model_with_buckets
    decoder_inputs[:bucket[1]])
  File "/home/jkarimi91/Projects/cs20/code/hw/a3/model.py", line 101, in <lambda>
    lambda x, y: _seq2seq_f(x, y, False),
  File "/home/jkarimi91/Projects/cs20/code/hw/a3/model.py", line 76, in _seq2seq_f
    feed_previous=do_decode)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/site-packages/tensorflow/contrib/legacy_seq2seq/python/ops/seq2seq.py", line 848, in embedding_attention_seq2seq
    encoder_cell = copy.deepcopy(cell)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 174, in deepcopy
    y = copier(memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/site-packages/tensorflow/python/layers/base.py", line 476, in __deepcopy__
    setattr(result, k, copy.deepcopy(v, memo))
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 230, in _deepcopy_list
    y.append(deepcopy(a, memo))
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 190, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 334, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 190, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 334, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 190, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 334, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 190, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 334, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 230, in _deepcopy_list
    y.append(deepcopy(a, memo))
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 190, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 334, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 190, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 334, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 190, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 334, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 230, in _deepcopy_list
    y.append(deepcopy(a, memo))
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 237, in _deepcopy_tuple
    y.append(deepcopy(a, memo))
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 190, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 334, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 163, in deepcopy
    y = copier(x, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 257, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 190, in deepcopy
    y = _reconstruct(x, rv, 1, memo)
  File "/home/jkarimi91/Apps/anaconda2/envs/tf/lib/python2.7/copy.py", line 343, in _reconstruct
    y.__dict__.update(state)
AttributeError: 'NoneType' object has no attribute 'update'

jkarimi91 on 10 Aug 2017

@Tshzzz @jtubert
Thanks, your solution worked for me. My tf version is 1.1.0.

I changed from:

    lstm_cell = tf.contrib.rnn.BasicLSTMCell(HIDDEN_SIZE, state_is_tuple=True)
    cell = tf.contrib.rnn.MultiRNNCell([lstm_cell() for _ in range(NUM_LAYERS)])
    output, _ = tf.nn.dynamic_rnn(cell, X, dtype=tf.float32)

to:

    cells=[]
    for _ in range(NUM_LAYERS):
        cell = tf.contrib.rnn.BasicLSTMCell(HIDDEN_SIZE, state_is_tuple=True)
        cells.append(cell)
    multicell = tf.contrib.rnn.MultiRNNCell(cells, state_is_tuple=True)
    output, _ = tf.nn.dynamic_rnn(multicell, X, dtype=tf.float32)
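
The loop above creates one new cell object per layer. A minimal plain-Python sketch of why that matters, using a placeholder `Cell` class and a `count_distinct` helper that are assumptions for illustration and not part of TensorFlow: repeating one object in a list, as in the `[GRUCell(...)] * num_layers` pattern named in the error message, produces several references to the same instance.

```python
class Cell(object):
    """Placeholder for an RNN cell; hypothetical, not a TensorFlow class."""


def count_distinct(cells):
    """Number of different objects in the list, compared by identity."""
    return len(set(id(c) for c in cells))


NUM_LAYERS = 3

one = Cell()
repeated = [one] * NUM_LAYERS                     # same object three times
also_repeated = [one for _ in range(NUM_LAYERS)]  # still the same object

fresh = []
for _ in range(NUM_LAYERS):
    fresh.append(Cell())                          # a new object per layer

print(count_distinct(repeated))       # 1
print(count_distinct(also_repeated))  # 1
print(count_distinct(fresh))          # 3
```

Only the last list gives each layer its own object, which is what a stack of layers with separate weights needs.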

LevineHuang on 16 Aug 2017

❤2 👍2

This is still not fixed. I tried all the possible solutions, both the ones mentioned in this thread and the ones on Stack Overflow, and it doesn't work with tensorflow 1.3, 1.2, or 1.1.

saurabhvyas on 18 Aug 2017

I'm facing this error:
TypeError: embedding_attention_seq2seq() missing 1 required positional argument: 'dec_cell'

The error points to this function, at line 142 of seq2seq_model.py:

    def seq2seq_f(encoder_inputs, decoder_inputs, do_decode):
      return tf.contrib.legacy_seq2seq.embedding_attention_seq2seq(
          encoder_inputs,
          decoder_inputs,
          cell,
          num_encoder_symbols=source_vocab_size,
          num_decoder_symbols=target_vocab_size,
          embedding_size=size,
          output_projection=output_projection,
          feed_previous=do_decode,
          dtype=dtype)

Has anyone come across this error and managed to solve it? Please help me fix this issue.

comsian106 on 18 Aug 2017

👍1

ValueError: Attempt to reuse RNNCell with a different variable scope than its first use. First use of cell was with scope 'rnn/multi_rnn_cell/cell_0/gru_cell', this attempt is with scope 'rnn/multi_rnn_cell/cell_1/gru_cell'. Please create a new instance of the cell if you would like it to use a different set of weights. If before you were using: MultiRNNCell([GRUCell(...)] * num_layers), change to: MultiRNNCell([GRUCell(...) for _ in range(num_layers)]). If before you were using the same cell instance as both the forward and reverse cell of a bidirectional RNN, simply create two instances (one for forward, one for reverse). In May 2017, we will start transitioning this cell's behavior to use existing stored weights, if any, when it is called with scope=None (which can lead to silent model degradation, so this error will remain until then.)

The original code:

    from tensorflow.contrib import rnn
    inputs = tf.placeholder(dtype=tf.int32, shape=[None, None], name="inputs")
    keep_prob = tf.placeholder(dtype=tf.float32, name="keep_prob")
    cell = rnn.GRUCell(10)
    cell = rnn.DropoutWrapper(cell=cell, input_keep_prob=keep_prob)
    cell = rnn.MultiRNNCell([cell for _ in range(5)], state_is_tuple=True)

    outs, states = tf.nn.dynamic_rnn(cell=cell, inputs=look_up, dtype=tf.float32)

Solution:

    inputs = tf.placeholder(dtype=tf.int32, shape=[None, None], name="inputs")
    keep_prob = tf.placeholder(dtype=tf.float32, name="keep_prob")
    cell = rnn.MultiRNNCell([rnn.DropoutWrapper(rnn.GRUCell(10), input_keep_prob=keep_prob) for _ in range(5)], state_is_tuple=True)
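
A toy model in plain Python of why both the wrapper and the wrapped cell are created inside the list comprehension. `ToyCell`, `ToyWrapper`, and `run_layers` are hypothetical stand-ins, not TensorFlow classes, and the sketch assumes that a wrapper simply forwards each call to its inner cell and that layer i is called under a scope named cell_i, as the scope names in the error message suggest.

```python
class ToyCell(object):
    """Remembers the scope of its first call. Hypothetical; not TensorFlow."""

    def __init__(self):
        self.first_scope = None

    def __call__(self, scope):
        if self.first_scope is None:
            self.first_scope = scope
        elif self.first_scope != scope:
            raise ValueError("first use was in %r, this attempt is in %r"
                             % (self.first_scope, scope))


class ToyWrapper(object):
    """Forwards every call to the wrapped cell."""

    def __init__(self, cell):
        self.cell = cell

    def __call__(self, scope):
        self.cell(scope)


def run_layers(layers):
    """Call layer i under the scope name 'cell_i'; return True if no error."""
    try:
        for i, layer in enumerate(layers):
            layer("cell_%d" % i)
    except ValueError:
        return False
    return True


inner = ToyCell()
shared_inner = [ToyWrapper(inner) for _ in range(3)]   # new wrappers, one cell
all_fresh = [ToyWrapper(ToyCell()) for _ in range(3)]  # new wrapper and new cell

print(run_layers(shared_inner))  # False
print(run_layers(all_fresh))     # True
```

Creating a new wrapper per layer is not enough in this model if every wrapper still points at the same inner cell; the inner cell has to be created per layer as well.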

ybdx on 25 Aug 2017

👍3

Do you have this issue with the tf nightlies?

On Oct 1, 2017 8:34 AM, "Baohua Zhou" notifications@github.com wrote:

I have the same issue when using tensorflow 1.1 on cpu with ios.

ebrevdo on 1 Oct 2017

AttributeError: 'NoneType' object has no attribute 'update'

in tf=1.3

PR-Iyyer on 23 Oct 2017

ValueError: Attempt to reuse RNNCell with a different variable scope than its first use. First use of cell was with scope 'embedding_attention_seq2seq/rnn/multi_rnn_cell/cell_0/gru_cell', this attempt is with scope 'embedding_attention_seq2seq/rnn/multi_rnn_cell/cell_1/gru_cell'. Please create a new instance of the cell if you would like it to use a different set of weights. If before you were using: MultiRNNCell([GRUCell(...)] * num_layers), change to: MultiRNNCell([GRUCell(...) for _ in range(num_layers)]). If before you were using the same cell instance as both the forward and reverse cell of a bidirectional RNN, simply create two instances (one for forward, one for reverse). In May 2017, we will start transitioning this cell's behavior to use existing stored weights, if any, when it is called with scope=None (which can lead to silent model degradation, so this error will remain until then.)

rashmishrm on 27 Nov 2017

It has been 14 days with no activity and the awaiting tensorflower label was assigned. Please update the label and/or status accordingly.

tensorflowbutler on 22 Dec 2017

Nagging Awaiting TensorFlower: It has been 14 days with no activity and the awaiting tensorflower label was assigned. Please update the label and/or status accordingly.

tensorflowbutler on 5 Jan 2018

The solution is to move to a newer version of TF. This thread has drastically diverged from its original issue. Closing.

ebrevdo on 5 Jan 2018

If you want an instant solution, you can try what I tried:

pip install tensorflow==1.0
The issue is with tensorflow version 1.1; downgrading worked for me.

monk1337 on 13 Apr 2018
