
Sequence-to-Sequence Models, Part 1

Good day everyone!

We have once again opened a new run of the updated “Data Scientist” course: another excellent instructor, a slightly revised program based on recent updates, and, as usual, interesting open lessons and collections of useful materials. Today we begin our walkthrough of seq2seq models from TensorFlow.

Go.
As already discussed in the RNN tutorial (we recommend reading it before this article), recurrent neural networks can be trained to model language. This raises an interesting question: can we train a network on certain data to generate a meaningful response? For example, can we teach a neural network to translate from English to French? It turns out that we can.

This guide will show you how to build and train such an end-to-end system. Clone the main TensorFlow repository and the TensorFlow models repository from GitHub. Then you can start by running the translation program:

cd models/tutorials/rnn/translate
python translate.py --data_dir [your_data_directory]



It will download the English-French translation data from the WMT'15 website, prepare it for training, and train the model. This requires about 20 GB of disk space and quite a lot of time for downloading and preprocessing, so you can start the process now and keep reading this tutorial while it runs.

This tutorial refers to the following files:

File                                             | What is in it?
tensorflow/tensorflow/python/ops/seq2seq.py      | Library for building sequence-to-sequence models
models/tutorials/rnn/translate/seq2seq_model.py  | Sequence-to-sequence neural translation model
models/tutorials/rnn/translate/data_utils.py     | Helper functions for preparing translation data
models/tutorials/rnn/translate/translate.py      | Binary that trains and runs the translation model

The basics of sequence-to-sequence

The basic sequence-to-sequence model, as introduced in Cho et al., 2014 ( pdf ), consists of two recurrent neural networks (RNNs): an encoder, which processes the input, and a decoder, which generates the output. The basic architecture is shown below:



Each box in the picture above represents a cell of the RNN, usually a GRU cell (gated recurrent unit) or an LSTM cell (long short-term memory; see the RNN tutorial for details). The encoder and decoder can share weights or, more commonly, use separate sets of parameters. Multi-layer cells have also been used successfully in sequence-to-sequence models, for example for translation in Sutskever et al., 2014 ( pdf ).

In the basic model described above, the entire input has to be encoded into a single fixed-size state vector, since that is the only thing passed to the decoder. To give the decoder more direct access to the input, an attention mechanism was introduced in Bahdanau et al., 2014 ( pdf ). We will not go into the details of the attention mechanism (for that, see the linked paper); it is enough to say that it lets the decoder look into the input at every decoding step. A multi-layer sequence-to-sequence network with LSTM cells and attention in the decoder looks like this:



The TensorFlow seq2seq library

As you can see above, there are different sequence-to-sequence models. Each of them can use a different RNN cell, but all of them accept encoder inputs and decoder inputs. This is the basis of the interface of the TensorFlow seq2seq library (tensorflow/tensorflow/python/ops/seq2seq.py). The basic RNN encoder-decoder sequence-to-sequence model works as follows.

 outputs, states = basic_rnn_seq2seq(encoder_inputs, decoder_inputs, cell) 

In the call above, encoder_inputs is a list of tensors representing the encoder inputs, corresponding to the letters A, B, C in the first picture above. Similarly, decoder_inputs is a list of tensors representing the decoder inputs: GO, W, X, Y, Z from the same picture.

The cell argument is an instance of the tf.contrib.rnn.RNNCell class, which determines which cell is used inside the model. You can use an existing cell, such as GRUCell or LSTMCell , or write your own. In addition, tf.contrib.rnn provides wrappers for building multi-layer cells, adding dropout to cell inputs and outputs, and other transformations. See the RNN tutorial for examples.
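For illustration, here is a minimal sketch of building such a cell, assuming the TF 1.x tf.contrib.rnn API; the layer count, size, and keep probability are arbitrary example values:

import tensorflow as tf  # TF 1.x-era tf.contrib.rnn API assumed

size, num_layers = 256, 2  # illustrative sizes

def make_cell():
    # A single GRU cell; tf.contrib.rnn.LSTMCell(size) would work the same way.
    cell = tf.contrib.rnn.GRUCell(size)
    # Wrapper that applies dropout to the cell's outputs.
    return tf.contrib.rnn.DropoutWrapper(cell, output_keep_prob=0.9)

# Wrapper that stacks several cells into a single multi-layer cell.
cell = tf.contrib.rnn.MultiRNNCell([make_cell() for _ in range(num_layers)])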

Calling basic_rnn_seq2seq returns two values: outputs and states . Both are lists of tensors of the same length as decoder_inputs . outputs corresponds to the decoder outputs at each time step; in the first picture these are W, X, Y, Z, EOS. The returned states represent the internal state of the decoder at each time step.
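To make the shapes concrete, here is a self-contained sketch of the call above. It assumes a TF 1.x installation where the library functions are also exposed as tf.contrib.legacy_seq2seq; all sizes are illustrative.

import tensorflow as tf

seq_len, batch_size, input_size, cell_size = 5, 32, 10, 64  # illustrative

# One tensor per time step: A, B, C for the encoder, GO, W, X, Y, Z for the decoder.
encoder_inputs = [tf.placeholder(tf.float32, [batch_size, input_size])
                  for _ in range(seq_len)]
decoder_inputs = [tf.placeholder(tf.float32, [batch_size, input_size])
                  for _ in range(seq_len)]

cell = tf.contrib.rnn.GRUCell(cell_size)

outputs, states = tf.contrib.legacy_seq2seq.basic_rnn_seq2seq(
    encoder_inputs, decoder_inputs, cell)

# One output tensor of shape [batch_size, cell_size] per decoder time step.
print(len(outputs), outputs[0].get_shape().as_list())  # 5 [32, 64]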

In many applications of the sequence-to-sequence model, the decoder output at time t is fed back as the decoder input at time t + 1. At test time, when decoding a sequence, this is how the new sequence is constructed. During training, on the other hand, it is common to feed the decoder the correct input at every time step, even if the decoder made a mistake earlier. The functions in seq2seq.py support both modes via the feed_previous argument. For example, consider the following use of the embedding RNN model.

outputs, states = embedding_rnn_seq2seq(
    encoder_inputs, decoder_inputs, cell,
    num_encoder_symbols, num_decoder_symbols,
    embedding_size, output_projection=None,
    feed_previous=False)

In the embedding_rnn_seq2seq model, all inputs (both encoder_inputs and decoder_inputs ) are integer tensors representing discrete values. They will be embedded into a dense representation (for details on embeddings, see the Vector Representations tutorial), but to create these embeddings you need to specify the maximum number of discrete symbols: num_encoder_symbols on the encoder side and num_decoder_symbols on the decoder side.

In the call above, we set feed_previous to False. This means the decoder will use the decoder_inputs tensors exactly as provided. If we set feed_previous to True, the decoder will use only the first element of decoder_inputs . All other tensors in the list will be ignored, and the previous decoder output will be used instead. This is how translations are decoded in our translation model, but it can also be used during training to make the model more robust to its own mistakes, similar to Bengio et al., 2015 ( pdf ).
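One common pattern, shown as a sketch under the same TF 1.x assumptions as above (vocabulary and layer sizes are made up for the example), is to build the graph twice with shared variables: once with feed_previous=False for training and once with feed_previous=True for decoding.

import tensorflow as tf

seq_len, batch_size = 5, 32
num_symbols, embedding_size, cell_size = 10000, 128, 256  # illustrative

# Integer symbol ids, one [batch_size] tensor per time step.
encoder_inputs = [tf.placeholder(tf.int32, [batch_size]) for _ in range(seq_len)]
decoder_inputs = [tf.placeholder(tf.int32, [batch_size]) for _ in range(seq_len)]

def build(feed_previous):
    return tf.contrib.legacy_seq2seq.embedding_rnn_seq2seq(
        encoder_inputs, decoder_inputs, tf.contrib.rnn.GRUCell(cell_size),
        num_encoder_symbols=num_symbols, num_decoder_symbols=num_symbols,
        embedding_size=embedding_size, feed_previous=feed_previous)

with tf.variable_scope("seq2seq_model"):
    train_outputs, _ = build(feed_previous=False)   # teacher forcing during training
with tf.variable_scope("seq2seq_model", reuse=True):
    decode_outputs, _ = build(feed_previous=True)   # feed back its own predictions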

Another important argument used above is output_projection . If it is not specified, the outputs of the embedding model are tensors of shape batch size by num_decoder_symbols , since they represent the logits for every generated symbol. When training models with large output vocabularies, i.e. with a large num_decoder_symbols , storing these large tensors becomes impractical. Instead, it is better to return smaller output tensors, which are later projected onto the large output tensor using output_projection . This lets us use our seq2seq models with a sampled softmax loss, as described in Jean et al., 2014 ( pdf ).
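In the translation model this is handled roughly as in the following sketch; the pattern mirrors seq2seq_model.py, but the sizes and num_samples here are illustrative assumptions.

import tensorflow as tf

size, target_vocab_size, num_samples = 256, 40000, 512  # illustrative

# Projection from the small decoder output (size) to the full vocabulary.
w = tf.get_variable("proj_w", [size, target_vocab_size])
b = tf.get_variable("proj_b", [target_vocab_size])
output_projection = (w, b)  # pass this as output_projection=... in the call above

def sampled_loss(labels, logits):
    # Here "logits" are the un-projected decoder outputs of shape [batch, size].
    labels = tf.reshape(labels, [-1, 1])
    return tf.nn.sampled_softmax_loss(
        weights=tf.transpose(w), biases=b, labels=labels, inputs=logits,
        num_sampled=num_samples, num_classes=target_vocab_size)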

In addition to basic_rnn_seq2seq and embedding_rnn_seq2seq , seq2seq.py contains several more sequence-to-sequence models; take a look at them. They all have a similar interface, so we will not go into their details. For our translation model below, we use embedding_attention_seq2seq .
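For reference, here is a minimal sketch of a call to embedding_attention_seq2seq under the same TF 1.x assumptions, with a two-layer GRU cell and made-up vocabulary and layer sizes:

import tensorflow as tf

seq_len, batch_size = 5, 32
src_vocab, tgt_vocab = 40000, 40000            # illustrative vocabulary sizes
embedding_size, cell_size, num_layers = 128, 256, 2

encoder_inputs = [tf.placeholder(tf.int32, [batch_size]) for _ in range(seq_len)]
decoder_inputs = [tf.placeholder(tf.int32, [batch_size]) for _ in range(seq_len)]

cell = tf.contrib.rnn.MultiRNNCell(
    [tf.contrib.rnn.GRUCell(cell_size) for _ in range(num_layers)])

outputs, states = tf.contrib.legacy_seq2seq.embedding_attention_seq2seq(
    encoder_inputs, decoder_inputs, cell,
    num_encoder_symbols=src_vocab, num_decoder_symbols=tgt_vocab,
    embedding_size=embedding_size, feed_previous=False)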

To be continued.

Source: https://habr.com/ru/post/430780/

