tensorflow/tensorflow/python/ops/seq2seq.py
, there are still a couple of tricks used in our translation model, models/tutorials/rnn/translate/seq2seq_model.py, that are worth mentioning. The sampled softmax loss and the output projection are both constructed by the following code in seq2seq_model.py:
```python
if num_samples > 0 and num_samples < self.target_vocab_size:
  w_t = tf.get_variable("proj_w", [self.target_vocab_size, size], dtype=dtype)
  w = tf.transpose(w_t)
  b = tf.get_variable("proj_b", [self.target_vocab_size], dtype=dtype)
  output_projection = (w, b)

  def sampled_loss(labels, inputs):
    labels = tf.reshape(labels, [-1, 1])
    # We need to compute the sampled_softmax_loss using 32bit floats to
    # avoid numerical instabilities.
    local_w_t = tf.cast(w_t, tf.float32)
    local_b = tf.cast(b, tf.float32)
    local_inputs = tf.cast(inputs, tf.float32)
    return tf.cast(
        tf.nn.sampled_softmax_loss(
            weights=local_w_t,
            biases=local_b,
            labels=labels,
            inputs=local_inputs,
            num_sampled=num_samples,
            num_classes=self.target_vocab_size),
        dtype)
```
Note that the sampled softmax is only constructed when the number of samples is smaller than the target vocabulary size. The output projection itself is a pair of a weight matrix and a bias vector. When it is used, the RNN cell returns vectors of shape batch-size by size rather than batch-size by target_vocab_size. To recover the logits, you need to multiply by the weight matrix and add the bias, which is what happens in lines 124-126 of seq2seq_model.py:
```python
if output_projection is not None:
  for b in xrange(len(buckets)):
    self.outputs[b] = [tf.matmul(output, output_projection[0]) +
                       output_projection[1] for ...]
```
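As a standalone illustration, here is a minimal sketch of that projection with made-up toy shapes (the `_demo` variable names and sizes are ours, not the tutorial's); it shows how a size-dimensional RNN output is mapped to target_vocab_size logits:

```python
import tensorflow as tf

# Toy shapes for illustration only.
batch_size, size, target_vocab_size = 2, 4, 6

output = tf.random_normal([batch_size, size])  # stand-in for an RNN cell output
w = tf.get_variable("proj_w_demo", [size, target_vocab_size])
b = tf.get_variable("proj_b_demo", [target_vocab_size])
output_projection = (w, b)

# Recover full-vocabulary logits, exactly as in the loop above.
logits = tf.matmul(output, output_projection[0]) + output_projection[1]

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(logits).shape)  # (2, 6): batch_size x target_vocab_size
```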
The English sentence is fed in as encoder_inputs, and the French sentence comes in as decoder_inputs (prefixed with the GO symbol), so in principle a separate seq2seq model would be needed for every pair (L1, L2 + 1) of English and French sentence lengths. That would produce a huge graph consisting of many similar subgraphs. On the other hand, we could "pad" every sentence with special PAD symbols; then only one seq2seq model, for the padded lengths, would be needed. But such a model would be inefficient on short sentences, since it would have to encode and decode many useless PAD symbols. As a compromise, sentence pairs are grouped into a few buckets and padded to the size of their bucket. In translate.py we use the following default buckets:

```python
buckets = [(5, 10), (10, 15), (20, 25), (40, 50)]
```
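To make the bucketing rule concrete, here is a small hypothetical helper (not taken from the tutorial's code) that picks the smallest bucket a tokenized sentence pair fits into:

```python
buckets = [(5, 10), (10, 15), (20, 25), (40, 50)]

def pick_bucket(source_len, target_len, buckets=buckets):
    """Return the index of the smallest bucket that fits the pair."""
    for bucket_id, (source_size, target_size) in enumerate(buckets):
        # The target side needs head-room for the GO prefix and the EOS symbol,
        # which is one reason the target sizes are larger than the source sizes.
        if source_len < source_size and target_len < target_size:
            return bucket_id
    raise ValueError("sentence pair is too long for every bucket")

print(pick_bucket(3, 3))   # -> 0: the pair goes into the (5, 10) bucket
print(pick_bucket(8, 18))  # -> 2: 18 tokens do not fit (10, 15), so (20, 25) is used
```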
Recall that when building the decoder inputs we prepend the special GO symbol. This happens in the get_batch() function in seq2seq_model.py, which also reverses the English input sentence. Reversing the inputs was shown to improve the results of the neural translation model in Sutskever et al., 2014 (pdf). To put it all together, imagine the input sentence "I go.", tokenized as ["I", "go", "."], and the output sentence "Je vais.", tokenized as ["Je", "vais", "."]. The pair will be placed in the (5, 10) bucket, with encoder inputs [PAD PAD "." "go" "I"] and decoder inputs [GO "Je" "vais" "." EOS PAD PAD PAD PAD PAD].
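The same example can be reproduced in a few lines of Python. This is a simplified sketch of what get_batch() does conceptually, not the actual implementation (the real function works on integer token ids and whole batches, not single token lists):

```python
PAD, GO, EOS = "PAD", "GO", "EOS"

def make_example(source_tokens, target_tokens, bucket=(5, 10)):
    encoder_size, decoder_size = bucket
    # Pad the source up to the bucket size, then reverse it (reversing the
    # inputs is the trick from Sutskever et al., 2014).
    encoder_pad = [PAD] * (encoder_size - len(source_tokens))
    encoder_input = list(reversed(source_tokens + encoder_pad))
    # Prefix the target with GO, terminate it with EOS, then pad to bucket size.
    decoder_input = ([GO] + target_tokens + [EOS] +
                     [PAD] * (decoder_size - len(target_tokens) - 2))
    return encoder_input, decoder_input

enc, dec = make_example(["I", "go", "."], ["Je", "vais", "."])
print(enc)  # ['PAD', 'PAD', '.', 'go', 'I']
print(dec)  # ['GO', 'Je', 'vais', '.', 'EOS', 'PAD', 'PAD', 'PAD', 'PAD', 'PAD']
```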
Checkpoints will be saved in train_dir when the following command is run:

```
python translate.py --data_dir [your_data_directory] --train_dir [checkpoints_directory] --en_vocab_size=40000 --fr_vocab_size=40000
```
The corpus is unpacked, vocabulary files are created in data_dir, and then the corpus is tokenized and converted to integer ids. Note the parameters that control the vocabulary sizes. In the example above, all words outside the 40,000 most frequent ones will be converted to a UNK token, which stands for an unknown word. So if you change the vocabulary size, the binary will re-map the corpus to token ids again. After the data is prepared, training begins.
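As an illustration of the UNK behaviour, here is a tiny sketch of the word-to-id mapping. The toy vocabulary and the helper are ours, not the tutorial's data_utils code, and the exact reserved ids are an assumption:

```python
# Toy vocabulary: a few reserved symbols plus two "frequent" words.
vocab = {"_PAD": 0, "_GO": 1, "_EOS": 2, "_UNK": 3, "the": 4, "president": 5}
UNK_ID = vocab["_UNK"]

def sentence_to_ids(words, vocab):
    # Any word missing from the vocabulary collapses to the single UNK id.
    return [vocab.get(w, UNK_ID) for w in words]

print(sentence_to_ids(["the", "president", "xylograph"], vocab))
# -> [4, 5, 3]: "xylograph" is outside the vocabulary, so it becomes UNK
```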
The default parameters of translate are set to quite large values. Large models trained for a long time give good results, but they can take too much time or too much GPU memory. You can train a smaller model instead, as in the example below:

```
python translate.py --data_dir [your_data_directory] --train_dir [checkpoints_directory] --size=256 --num_layers=2 --steps_per_checkpoint=50
```
During training, every steps_per_checkpoint steps the binary prints statistics from the recent steps. With the default settings (3 layers of size 1024), the first messages look like this:

```
global step 200 learning rate 0.5000 step-time 1.39 perplexity 1720.62
  eval: bucket 0 perplexity 184.97
  eval: bucket 1 perplexity 248.81
  eval: bucket 2 perplexity 341.64
  eval: bucket 3 perplexity 469.04
global step 400 learning rate 0.5000 step-time 1.38 perplexity 379.89
  eval: bucket 0 perplexity 151.32
  eval: bucket 1 perplexity 190.36
  eval: bucket 2 perplexity 227.46
  eval: bucket 3 perplexity 238.66
```
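To our understanding, the perplexities printed here are simply the exponent of the average per-token cross-entropy loss; a one-line illustration with a made-up loss value:

```python
import math

# Perplexity is exp(average cross-entropy loss per target token).
loss = 7.45                  # illustrative average loss over recent steps
perplexity = math.exp(loss)
print(round(perplexity, 2))  # ~1719.9, the same ballpark as the 1720.62 above
```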
Once the model has trained, it can be used to translate English sentences into French with the --decode option:

```
python translate.py --decode --data_dir [your_data_directory] --train_dir [checkpoints_directory]
Reading model parameters from /tmp/translate.ckpt-340000
> Who is the president of the United States?
 Qui est le président des États-Unis ?
```
The basic_tokenizer function in data_utils is a very simple tokenizer. A better tokenizer can be found on the WMT'15 website; using it together with a larger vocabulary should improve your translations. You can also replace the default GradientDescentOptimizer in seq2seq_model.py with something more advanced, for example AdagradOptimizer, as sketched below.
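A minimal sketch of that swap on a toy problem, assuming (as in the TF 1.x tutorial code) that the model builds its update ops from a tf.train optimizer object; the stand-in loss here is ours, not the translation model's:

```python
import tensorflow as tf

learning_rate = tf.Variable(0.5, trainable=False)

# Swapping GradientDescentOptimizer for AdagradOptimizer is a one-line change,
# since both expose the same compute_gradients()/apply_gradients() interface.
# opt = tf.train.GradientDescentOptimizer(learning_rate)
opt = tf.train.AdagradOptimizer(learning_rate)

# Tiny stand-in problem just to show the optimizer in action.
x = tf.Variable(3.0)
loss = tf.square(x)
train_op = opt.minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(5):
        sess.run(train_op)
    print(sess.run(loss))  # noticeably smaller than the initial 9.0
```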
Try it and see how your results improve!

Source: https://habr.com/ru/post/432302/