
Google's free tensor processors in the Colaboratory cloud

Google recently provided free access to its tensor processing units (TPUs) in its cloud machine learning environment, Colaboratory. A tensor processor is an application-specific integrated circuit (ASIC) developed by Google for machine learning tasks using the TensorFlow library. I decided to try training a Keras convolutional network on a TPU to recognize objects in CIFAR-10 images. The complete solution code can be viewed and run in a notebook.


Photo: cloud.google.com

Tensor processors


TPU internals have already been covered on Habr (here, here and here), along with why TPUs are well suited for training neural networks. I will therefore not go into the details of the TPU architecture and will consider only the features that matter when training neural networks.

There are currently three generations of tensor processors. The latest, third generation delivers 420 TFlops (trillions of floating point operations per second) and contains 128 GB of High Bandwidth Memory. However, only second-generation TPUs, with 180 TFlops of performance and 64 GB of memory, are available in Colaboratory. These are the TPUs I will discuss below.
A tensor processor consists of four chips, each containing two cores, for a total of eight cores per TPU. Training on a TPU runs in parallel on all cores using replication: each core runs a copy of the TensorFlow graph on one eighth of the data.

At the heart of the tensor processor is the matrix unit (MXU). It uses a clever data structure, a 128x128 systolic array, to implement matrix operations efficiently. Therefore, to make the best use of the TPU hardware, the mini-batch size and the feature dimensions should be multiples of 128 (source). In addition, because of the peculiarities of the TPU memory system, it is desirable that the mini-batch size and feature dimensions be multiples of 8.
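As a small illustration (not from the original article), here is a helper that rounds a desired mini-batch size up to the nearest multiple of 128, following the heuristic described above:

    # A minimal sketch: round a desired mini-batch size up to a multiple of 128
    # so that the 128x128 MXU systolic array is fully utilized.
    def round_up_to_multiple(value, base=128):
        return ((value + base - 1) // base) * base

    print(round_up_to_multiple(1000))  # 1024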

The Colaboratory platform


Colaboratory is Google's cloud platform for promoting machine learning technologies. It gives you a free virtual machine with popular libraries pre-installed: TensorFlow, Keras, sklearn, pandas, etc. The most convenient part is that you can run Jupyter-style notebooks on Colaboratory. Notebooks are stored on Google Drive, and you can share them and even collaborate on them. This is what a notebook in Colaboratory looks like:



You write code in a notebook in your browser, and it runs on a virtual machine in the Google cloud. The machine is given to you for 12 hours, after which it stops. Nothing prevents you from starting another virtual machine and working for another 12 hours, but keep in mind that all data on a virtual machine is deleted when it stops. So do not forget to save the data you need to your computer or to Google Drive, and upload it again after restarting the virtual machine.
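One common way to keep results across restarts is to mount Google Drive inside the virtual machine. A minimal sketch (the mount path is just the convention used by Colaboratory):

    # Mount Google Drive so that saved files survive a virtual machine restart.
    from google.colab import drive

    drive.mount('/content/drive')  # asks for an authorization code on first use
    # Anything written under /content/drive/My Drive/ ends up in Google Drive.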

Detailed instructions for working with the Colaboratory platform are available here, here and here.

We connect the tensor processor in Colaboratory


By default, Colaboratory does not use a GPU or TPU accelerator. You can enable one in the menu Runtime -> Change runtime type -> Hardware accelerator. In the list that appears, select "TPU":

After you choose the accelerator type, the virtual machine to which the Colaboratory notebook connects will restart and the TPU will become available.

If you have uploaded any data to the virtual machine, it will be deleted during the restart. You will have to upload the data again.
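Before going further, it is easy to check from the notebook whether a TPU is actually attached: Colaboratory exposes its address in the COLAB_TPU_ADDR environment variable (a quick sketch, not part of the original code):

    import os

    # If the runtime has a TPU, Colaboratory sets the COLAB_TPU_ADDR variable.
    if 'COLAB_TPU_ADDR' in os.environ:
        print('TPU found at', os.environ['COLAB_TPU_ADDR'])
    else:
        print('No TPU found: check Runtime -> Change runtime type')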

Keras neural network for recognition of CIFAR-10


As an example, let's try to train a Keras neural network on a TPU to recognize images from the CIFAR-10 dataset. This is a popular dataset containing small images of objects from 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck. The classes do not overlap: each object in a picture belongs to exactly one class.

Load the CIFAR-10 dataset with Keras:

    from tensorflow.keras.datasets import cifar10

    (x_train, y_train), (x_test, y_test) = cifar10.load_data()
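A quick sanity check of what was loaded (not in the original notebook; the shapes are the standard ones for CIFAR-10):

    # 50,000 training and 10,000 test images of size 32x32x3, labels as class indices
    print(x_train.shape, y_train.shape)  # (50000, 32, 32, 3) (50000, 1)
    print(x_test.shape, y_test.shape)    # (10000, 32, 32, 3) (10000, 1)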

To create the neural network, I wrote a separate function. We will create the same model twice: the first copy of the model is for the TPU, where we will train it, and the second is for the CPU, where we will recognize objects.

    import tensorflow as tf
    from tensorflow.keras.layers import (Input, BatchNormalization, Conv2D,
                                         MaxPooling2D, Dropout, Flatten, Dense)
    from tensorflow.keras.models import Model

    def create_model():
        input_layer = Input(shape=(32, 32, 3), dtype=tf.float32, name='Input')
        x = BatchNormalization()(input_layer)
        x = Conv2D(32, (3, 3), padding='same', activation='relu')(x)
        x = Conv2D(32, (3, 3), activation='relu', padding='same')(x)
        x = MaxPooling2D(pool_size=(2, 2))(x)
        x = Dropout(0.25)(x)
        x = BatchNormalization()(x)
        x = Conv2D(64, (3, 3), padding='same', activation='relu')(x)
        x = Conv2D(64, (3, 3), activation='relu')(x)
        x = MaxPooling2D(pool_size=(2, 2))(x)
        x = Dropout(0.25)(x)
        x = Flatten()(x)
        x = Dense(512, activation='relu')(x)
        x = Dropout(0.5)(x)
        output_layer = Dense(10, activation='softmax')(x)
        model = Model(inputs=[input_layer], outputs=[output_layer])
        model.compile(
            optimizer=tf.train.AdamOptimizer(0.001),
            loss=tf.keras.losses.sparse_categorical_crossentropy,
            metrics=['sparse_categorical_accuracy'])
        return model

For now, Keras optimizers cannot be used on the TPU, so the optimizer from TensorFlow is specified when compiling the model.

Create a Keras model for the CPU, which in the next step will be converted into a model for TPU:

 cpu_model = create_model() 
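If you want to inspect the architecture before the conversion, Keras can print a layer-by-layer summary (an optional check, not in the original code):

    # Print the layers, output shapes and parameter counts of the CPU model.
    cpu_model.summary()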

We convert the Keras neural network into a model for the TPU


Keras and TensorFlow models can be trained on a GPU without any changes. For now, this is not possible on a TPU, so we will have to convert the model we have created into a TPU model.

First, we need to find out where the available TPU is. On the Colaboratory platform, this can be done with the following command:

    import os

    TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']

In my case, the TPU address turned out to be grpc://10.102.233.146:8470. The address was different for different launches.

Now you can get a model for the TPU using the keras_to_tpu_model function:

    tf.logging.set_verbosity(tf.logging.INFO)

    tpu_model = tf.contrib.tpu.keras_to_tpu_model(
        cpu_model,
        strategy=tf.contrib.tpu.TPUDistributionStrategy(
            tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)))

The first line enables logging at the INFO level. Here is what appears in the model conversion log:

INFO:tensorflow:Querying Tensorflow master (b'grpc://10.102.233.146:8470') for TPU system metadata.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
...
WARNING:tensorflow:tpu_model (from tensorflow.contrib.tpu.python.tpu.keras_support) is experimental and may change or be removed at any time, and without warning.


You can see that the TPU was found at the address we specified earlier, and that it has 8 cores. We also see a warning that tpu_model is experimental and can be changed or removed at any time. I hope that in time it will be possible to train Keras models directly on a TPU without any conversion.

We train the model on the TPU


The model for the TPU can be trained in the usual Keras way, by calling the fit method:

 history = tpu_model.fit(x_train, y_train, batch_size=128*8, epochs=50, verbose=2) 

What are the peculiarities here? We remember that to use the TPU efficiently, the mini-batch size must be a multiple of 128. In addition, each TPU core processes one eighth of the data in the mini-batch. Therefore, the mini-batch size for training is set to 128 * 8, which gives 128 images for each TPU core. You can use a larger size, for example 256 or 512, and the performance will be higher.

In my case, one training epoch takes 6 seconds on average.

Training quality at epoch 50:
Epoch 50/50
- 6s - loss: 0.2727 - sparse_categorical_accuracy: 0.9006
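To see how training progressed over all 50 epochs, you can plot the metrics accumulated in the history object returned by fit (a sketch, not part of the original notebook):

    import matplotlib.pyplot as plt

    # Plot the training accuracy recorded after each epoch.
    plt.plot(history.history['sparse_categorical_accuracy'])
    plt.xlabel('Epoch')
    plt.ylabel('Training accuracy')
    plt.show()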


The accuracy on the training data was 90.06%. Let's check the quality on the test data using the TPU:

    scores = tpu_model.evaluate(x_test, y_test, verbose=0, batch_size=128 * 8)
    print("Accuracy on test data: %.2f%%" % (scores[1] * 100))

Accuracy on test data: 80.79%

Now let's save the weights of the trained model:

 tpu_model.save_weights("cifar10_model.h5") 

TensorFlow will give us a message that the weights are transferred from the TPU to the CPU:
INFO:tensorflow:Copying TPU weights to the CPU

Note that the weights of the trained network are saved to the disk of the Colaboratory virtual machine. When the virtual machine stops, all data on it will be erased. If you do not want to lose the trained weights, save them to your computer:

    from google.colab import files

    files.download("cifar10_model.h5")
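After the virtual machine is restarted, the saved file can be brought back with the complementary upload dialog (a sketch; the file name is the one used above):

    from google.colab import files

    # Opens a file picker in the browser; the chosen files are copied
    # into the working directory of the virtual machine.
    uploaded = files.upload()
    print(list(uploaded.keys()))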

Recognize objects on the CPU


Now let's use the model trained on the TPU to recognize objects in images on the CPU. To do this, we create a new model and load the TPU-trained weights into it:

    model = create_model()
    model.load_weights("cifar10_model.h5")

The model is ready to be used on the CPU. Let's try to recognize one of the images from the CIFAR-10 test set with it:

    import matplotlib.pyplot as plt
    from scipy.misc import toimage

    index = 111
    plt.imshow(toimage(x_test[index]))
    plt.show()



The picture is small, but you can see that it is an airplane. Let's run the recognition:

    import numpy as np

    # Class names from the CIFAR-10 dataset
    classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']
    x = x_test[index]
    # Add a batch dimension, because Keras expects an array of images
    x = np.expand_dims(x, axis=0)
    # Run the recognition
    prediction = model.predict(x)
    # Print the raw network outputs
    print(prediction)
    # Print the name of the predicted class
    prediction = np.argmax(prediction)
    print(classes[prediction])

We get a list of the output values of the neurons; almost all of them are close to zero, except for the first value, which corresponds to the airplane class.

[[9.81738389e-01 2.91262069e-07 1.82225723e-02 9.78524668e-07
5.89265142e-07 6.76223244e-10 1.03252004e-10 9.23009047e-09
3.71878523e-05 3.16599618e-08]]


Recognition was successful!
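As an additional check (not in the original article), the reconstructed CPU model can be evaluated on the whole test set; the accuracy should match the roughly 80% obtained on the TPU:

    # Evaluate the CPU model on the full test set; scores[1] is the accuracy.
    scores = model.evaluate(x_test, y_test, verbose=0)
    print("Accuracy on test data (CPU): %.2f%%" % (scores[1] * 100))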

Results


We have demonstrated that the TPU on the Colaboratory platform works and can be used to train Keras neural networks. However, the CIFAR-10 dataset is too small to fully load the TPU resources. The speed-up compared with a GPU turned out to be small (you can check this yourself by choosing GPU instead of TPU as the accelerator and retraining the model).

There is an article on Habr that measured the performance of a TPU and a V100 GPU on training a ResNet-50 network. On that task, the TPU showed the same performance as four V100 GPUs. It is nice that Google provides such a powerful accelerator for training neural networks for free!

A video demonstration of training a Keras neural network on a TPU.


Useful links


  1. Colaboratory notebook with the complete code for training a Keras model on a TPU.
  2. Colaboratory notebook with an example of training Keras on a TPU to recognize clothing and footwear from Fashion MNIST.
  3. Tensor processors in the Google cloud.
  4. Features of the architecture and use of tensor processors.

Source: https://habr.com/ru/post/428117/

