
Making an image recognition service using TensorFlow Serving


There always comes a time when a trained model needs to be released to production. To do this, people often end up reinventing the wheel by writing their own wrappers around machine learning libraries. But if your model is implemented in Tensorflow, I have good news for you: you won't have to reinvent anything, because you can use Tensorflow Serving.


In this article we will look at how to use Tensorflow Serving to quickly build a production-ready service for image recognition.


Tensorflow Serving is a system for deploying Tensorflow models in production. Among its features are serving models over gRPC, hot swapping of model versions, and batching of incoming requests.



An additional advantage is the ability to convert a Keras model into a Tensorflow model and serve it through Serving (provided, of course, that Keras uses the Tensorflow backend).


How Tensorflow Serving Works



The main component of Tensorflow Serving is the model server.


Let's consider how the model server operates. After launch, the model server loads the model from the path specified at startup and starts listening on the specified port. The server communicates with clients via remote procedure calls using the gRPC library, which allows you to write a client application in any language that supports gRPC.


When the model server receives a request, it can perform the following actions:



As mentioned earlier, Tensorflow Serving supports hot swapping of models. The model server constantly scans the path specified at launch for new models, and when a new version appears, it loads it automatically. This allows you to roll out new versions of a model without stopping the model server.
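For example, if the model is exported to ./model as in the export section below, the watched directory ends up looking roughly like this, with each numbered subdirectory holding one SavedModel version:

model/
  1/
    saved_model.pb
    variables/
  2/
    saved_model.pb
    variables/

To roll out a new version, you simply write a new numbered subdirectory next to the existing ones, and the server picks it up.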


Thus, Tensorflow Serving has more than enough functionality for serving models in production. Given that, writing your own wrapper around the model looks unjustified: Tensorflow Serving offers the same features and more, without the need to write and maintain a home-grown solution.


Installation


Building Tensorflow Serving is probably the hardest part of using this tool. There is nothing fundamentally difficult about it, but there are a few pitfalls, which I will describe in this section.


The Bazel build system is used for the build.


The installation of Tensorflow Serving is described on the official website: https://tensorflow.imtqy.com/serving/setup . I will not describe each step in detail, but will instead focus on the problems that may arise during installation.


There should be no problems with any of the steps up to configuring Tensorflow ( ./configure ).


When configuring Tensorflow, you can leave the default values for almost all parameters. But if you choose to install with CUDA support, the configurator will ask for the cuDNN version. You need to enter the full cuDNN version (5.1.5 in my case).


Now we get to the build itself ( bazel build tensorflow_serving/... ).


First, you need to determine which instruction-set optimizations your processor supports and specify them at build time, since Bazel cannot detect them automatically.
The build command thus grows into the following:


bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 tensorflow_serving/...


Check which of these optimizations your processor actually supports (on Linux, for example, you can look at the flags field in /proc/cpuinfo). My processor does not support AVX2 and FMA, so I compiled with the following command:


bazel build -c opt --copt=-mavx --copt=-mfpmath=both --copt=-msse4.2 tensorflow_serving/...


By default, the Tensorflow build consumes a lot of memory, so if you don't have much of it, you need to limit resource consumption. You can do this with the flag --local_resources availableRAM,availableCPU,availableIO (RAM in MB, CPU in cores, I/O capacity with 1.0 being an average workstation), for example --local_resources 2048,.5,1.0 .
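Putting it together with the optimization flags above (my own combination of the flags already discussed, not a command from the official guide), the build command might look like this:


bazel build -c opt --copt=-mavx --copt=-mfpmath=both --copt=-msse4.2 --local_resources 2048,.5,1.0 tensorflow_serving/...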


If you want to build Tensorflow Serving with GPU support, you need to add the flag --config=cuda . You end up with roughly the following command:


bazel build -c opt --copt=-mavx --copt=-mfpmath=both --copt=-msse4.2 --config=cuda tensorflow_serving/...


The following error may occur during the build.


Error text

ERROR: no such target '@org_tensorflow//third_party/gpus/crosstool:crosstool': target 'crosstool' not declared in package 'third_party/gpus/crosstool' defined by /home/movchan/.cache/bazel/_bazel_movchan/835a50f8a234772a7d7dac38871b88e9/external/org_tensorflow/third_party/gpus/crosstool/BUILD.


To fix this error, replace @org_tensorflow//third_party/gpus/crosstool with @local_config_cuda//crosstool:toolchain in tools/bazel.rc .


Another error may appear.


Error text

ERROR: /home/movchan/.cache/bazel/_bazel_movchan/835a50f8a234772a7d7dac38871b88e9/external/org_tensorflow/tensorflow/contrib/nccl/BUILD:23:1: C++ compilation of rule '@org_tensorflow//tensorflow/contrib/nccl:python/ops/_nccl_ops.so' failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE '-D_FORTIFY_SOURCE=1' -fstack-protector -fPIE -Wall -Wunused-but-set-parameter ... (remaining 80 argument(s) skipped): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
In file included from external/org_tensorflow/tensorflow/contrib/nccl/kernels/nccl_manager.cc:15:0:
external/org_tensorflow/tensorflow/contrib/nccl/kernels/nccl_manager.h:23:44: fatal error: external/nccl_archive/src/nccl.h: No such file or directory
compilation terminated.


To fix it, remove the external/nccl_archive/ prefix from the line #include "external/nccl_archive/src/nccl.h" (so that it reads #include "src/nccl.h") in the following files:
tensorflow/tensorflow/contrib/nccl/kernels/nccl_ops.cc
tensorflow/tensorflow/contrib/nccl/kernels/nccl_manager.h


Hooray! It finally built!


Export Model


Exporting a model from Tensorflow is described in detail at https://tensorflow.imtqy.com/serving/serving_basic in the section "Train And Export TensorFlow Model".


The SavedModelBuilder class is used for export. I use Keras to train Tensorflow models, so I will describe the process of exporting a model from Keras for Serving using this class.


Here is the export code for ResNet-50 trained on ImageNet.


import os
import tensorflow as tf
from keras.applications.resnet50 import ResNet50
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input, decode_predictions
from tensorflow.contrib.session_bundle import exporter
import keras.backend as K

# Switch Keras to test time (inference) mode.
K.set_learning_phase(0)

# Load the model with ImageNet weights.
model = ResNet50(weights='imagenet')
sess = K.get_session()

# Path and version under which the model will be exported.
export_path_base = './model'
export_version = 1
export_path = os.path.join(
    tf.compat.as_bytes(export_path_base),
    tf.compat.as_bytes(str(export_version)))
print('Exporting trained model to', export_path)
builder = tf.saved_model.builder.SavedModelBuilder(export_path)

# Build tensor info for the model input and output.
model_input = tf.saved_model.utils.build_tensor_info(model.input)
model_output = tf.saved_model.utils.build_tensor_info(model.output)

# Create the prediction signature that names the inputs and outputs.
prediction_signature = (
    tf.saved_model.signature_def_utils.build_signature_def(
        inputs={'images': model_input},
        outputs={'scores': model_output},
        method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME))

# Add the graph and variables to the SavedModelBuilder and save.
legacy_init_op = tf.group(tf.tables_initializer(), name='legacy_init_op')
builder.add_meta_graph_and_variables(
    sess, [tf.saved_model.tag_constants.SERVING],
    signature_def_map={
        'predict': prediction_signature,
    },
    legacy_init_op=legacy_init_op)
builder.save()

Instead of 'images' and 'scores' , you can use any names for the inputs and outputs. These names will be used later.
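As a quick sanity check (my own addition, not part of the original workflow), you can load the exported SavedModel back into a clean session before pointing the model server at it:

import tensorflow as tf

# Load version 1 of the model exported to ./model above to verify that the
# export step produced a valid SavedModel.
with tf.Session(graph=tf.Graph()) as sess:
    tf.saved_model.loader.load(
        sess, [tf.saved_model.tag_constants.SERVING], './model/1')
    print('SavedModel loaded successfully')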


If the model has several inputs and/or outputs, you need to declare them in tf.saved_model.signature_def_utils.build_signature_def , using model.inputs and model.outputs . The code that sets up the inputs and outputs then looks like this:


# Build tensor info for the main and auxiliary inputs and outputs.
model_input = tf.saved_model.utils.build_tensor_info(model.inputs[0])
model_output = tf.saved_model.utils.build_tensor_info(model.outputs[0])
model_aux_input = tf.saved_model.utils.build_tensor_info(model.inputs[1])
model_aux_output = tf.saved_model.utils.build_tensor_info(model.outputs[1])

# Create the prediction signature with all inputs and outputs.
prediction_signature = (
    tf.saved_model.signature_def_utils.build_signature_def(
        inputs={'images': model_input, 'aux_input': model_aux_input},
        outputs={'scores': model_output, 'aux_output': model_aux_output},
        method_name=tf.saved_model.signature_constants.PREDICT_METHOD_NAME))

It is also worth noting that signature_def_map lists all available methods (signatures), and there can be more than one of them. In the example above only one method was added, predict . The method name will be used later.
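For illustration, here is a minimal sketch of registering more than one method; aux_prediction_signature is a hypothetical second SignatureDef, built with build_signature_def in the same way as prediction_signature:

# Hypothetical sketch: registering two signatures in signature_def_map.
builder.add_meta_graph_and_variables(
    sess, [tf.saved_model.tag_constants.SERVING],
    signature_def_map={
        'predict': prediction_signature,
        'predict_aux': aux_prediction_signature,  # assumed second signature
    },
    legacy_init_op=legacy_init_op)
builder.save()

A client would then select the second method by setting request.model_spec.signature_name = 'predict_aux'.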


Running the model server


The model server is started with the following command:


./bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --enable_batching --port=9001 --model_name=resnet50 --model_base_path=/home/movchan/ml/serving_post/model


Let's consider what the flags in this command mean:

- --enable_batching enables batching of incoming requests;
- --port is the port on which the model server listens for gRPC requests;
- --model_name is the name of the model, which clients will use to address it;
- --model_base_path is the path to the directory containing the exported model versions.



Using Tensorflow Serving from Python


To begin with, we will install the grpcio package via pip.


sudo pip3 install grpcio


The tutorial on the official website suggests building the Python client scripts with Bazel. I do not like this idea, so I found another way.


To use the Python API, you can copy (or symlink) the bazel-bin/tensorflow_serving/example/inception_client.runfiles/tf_serving/tensorflow_serving directory; it contains everything needed for the Python API to work. I usually just copy it into the directory containing the script that uses this API.
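An alternative I find convenient (my own workaround, not from the official tutorial) is to leave the generated package where Bazel put it and point sys.path at it:

import sys

# Directory produced by the Bazel build; it contains the tensorflow_serving
# package mentioned above.
sys.path.append('bazel-bin/tensorflow_serving/example/'
                'inception_client.runfiles/tf_serving')

# After this, the generated gRPC modules can be imported as usual.
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2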


Let's look at an example of using the Python API.


import numpy as np
import tensorflow as tf
from grpc.beta import implementations
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2

# Create a connection (channel and stub) to the Serving model server.
host = '127.0.0.1'
port = 9001
channel = implementations.insecure_channel(host, port)
stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

# Create a request.
request = predict_pb2.PredictRequest()
# Name of the model we are addressing (see the --model_name flag).
request.model_spec.name = 'resnet50'
# Name of the method (signature) we are calling (see signature_def_map).
request.model_spec.signature_name = 'predict'
# Set the input; 'images' is the input name chosen during export.
# Here `image` is a preprocessed numpy array (see the full example below).
request.inputs['images'].CopyFrom(
    tf.contrib.util.make_tensor_proto(image, shape=image.shape))
# Send the request. The second argument is the timeout in seconds.
result = stub.Predict(request, 10.0)
# Read the result; 'scores' is the output name chosen during export.
prediction = np.array(result.outputs['scores'].float_val)

Full code of the python API usage example
import time
import sys

import tensorflow as tf
import numpy as np
from grpc.beta import implementations
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input, decode_predictions


def preprocess_image(img_path):
    img = image.load_img(img_path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    return x


def get_prediction(host, port, img_path):
    image = preprocess_image(img_path)
    start_time = time.time()
    channel = implementations.insecure_channel(host, port)
    stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)
    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'resnet50'
    request.model_spec.signature_name = 'predict'
    request.inputs['images'].CopyFrom(
        tf.contrib.util.make_tensor_proto(image, shape=image.shape))
    result = stub.Predict(request, 10.0)
    prediction = np.array(result.outputs['scores'].float_val)
    return prediction, (time.time() - start_time) * 1000.


if __name__ == "__main__":
    if len(sys.argv) != 4:
        print('usage: serving_test.py <host> <port> <img_path>')
        print('example: serving_test.py 127.0.0.1 9001 ~/elephant.jpg')
        exit()
    host = sys.argv[1]
    port = int(sys.argv[2])
    img_path = sys.argv[3]
    for i in range(10):
        prediction, elapsed_time = get_prediction(host, port, img_path)
        if i == 0:
            print('Predicted:', decode_predictions(np.atleast_2d(prediction), top=3)[0])
        print('Elapsed time:', elapsed_time, 'ms')

Let's compare the speed of Tensorflow Serving with the Keras version.


Keras code
import sys
import time

from keras.applications.resnet50 import ResNet50
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input, decode_predictions
import numpy as np


def preprocess_image(img_path):
    img = image.load_img(img_path, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    return x


def get_prediction(model, img_path):
    image = preprocess_image(img_path)
    start_time = time.time()
    prediction = model.predict(image)
    return prediction, (time.time() - start_time) * 1000.


if __name__ == "__main__":
    if len(sys.argv) != 2:
        print('usage: keras_test.py <img_path>')
        print('example: keras_test.py ~/elephant.jpg')
        exit()
    img_path = sys.argv[1]
    model = ResNet50(weights='imagenet')
    for i in range(10):
        prediction, elapsed_time = get_prediction(model, img_path)
        if i == 0:
            print('Predicted:', decode_predictions(np.atleast_2d(prediction), top=3)[0])
        print('Elapsed time:', elapsed_time, 'ms')

All measurements were made on the CPU.


For testing, let's take this photo of a cat from Pexels.com, which I found through https://everypixel.com .



Keras


Predicted: [('n02127052', 'lynx', 0.59509182), ('n02128385', 'leopard', 0.050437182), ('n02123159', 'tiger_cat', 0.049577814)]
Elapsed time: 419.47126388549805 ms
Elapsed time: 125.33354759216309 ms
Elapsed time: 122.70569801330566 ms
Elapsed time: 122.8172779083252 ms
Elapsed time: 122.3604679107666 ms
Elapsed time: 116.24360084533691 ms
Elapsed time: 116.51420593261719 ms
Elapsed time: 113.5416030883789 ms
Elapsed time: 112.34736442565918 ms
Elapsed time: 110.09907722473145 ms

Serving


Predicted: [('n02127052', 'lynx', 0.59509176015853882), ('n02128385', 'leopard', 0.050437178462743759), ('n02123159', 'tiger_cat', 0.049577809870243073)]
Elapsed time: 117.71702766418457 ms
Elapsed time: 75.67715644836426 ms
Elapsed time: 72.94225692749023 ms
Elapsed time: 71.62714004516602 ms
Elapsed time: 71.4271068572998 ms
Elapsed time: 74.54872131347656 ms
Elapsed time: 70.8014965057373 ms
Elapsed time: 70.94025611877441 ms
Elapsed time: 70.58024406433105 ms
Elapsed time: 68.82333755493164 ms

As you can see, Serving turned out to be even faster than the Keras version. The difference will be even more noticeable with a large number of requests.


Implementing a REST API for Tensorflow Serving with Flask


First install Flask.


sudo pip3 install flask


Full REST service code
from flask import Flask
from flask import request
from flask import jsonify

import tensorflow as tf
from grpc.beta import implementations
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2
from keras.preprocessing import image
from keras.applications.resnet50 import preprocess_input, decode_predictions
import numpy as np

application = Flask(__name__)

# Address of the Tensorflow Serving model server.
host = '127.0.0.1'
port = 9001


def preprocess_image(img):
    img = image.load_img(img, target_size=(224, 224))
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    return x


def get_prediction(img):
    # Preprocess the image and send it to the model server over gRPC.
    image = preprocess_image(img)
    channel = implementations.insecure_channel(host, port)
    stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)
    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'resnet50'
    request.model_spec.signature_name = 'predict'
    request.inputs['images'].CopyFrom(
        tf.contrib.util.make_tensor_proto(image, shape=image.shape))
    result = stub.Predict(request, 10.0)
    prediction = np.array(result.outputs['scores'].float_val)
    return decode_predictions(np.atleast_2d(prediction), top=3)[0]


@application.route('/predict', methods=['POST'])
def predict():
    # Expect the image as a multipart file upload under the 'data' field.
    if request.files.get('data'):
        img = request.files['data']
        resp = get_prediction(img)
        response = jsonify(resp)
        return response
    else:
        return jsonify({'status': 'error'})


if __name__ == "__main__":
    application.run()

Run the service.


python3 serving_service.py


Let's test the service by sending a request with curl.


curl '127.0.0.1:5000/predict' -X POST -F "data=@./cat.jpeg"


We get the following response.


[ [ "n02127052", "lynx", 0.5950918197631836 ], [ "n02128385", "leopard", 0.05043718218803406 ], [ "n02123159", "tiger_cat", 0.04957781359553337 ] ]


Great, it works!
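If you prefer to call the service from Python rather than curl, a minimal sketch using the requests package (an extra dependency, not used elsewhere in this article) could look like this:

import requests

# Send the same image to the Flask service started above
# (Flask's development server listens on port 5000 by default).
with open('cat.jpeg', 'rb') as f:
    response = requests.post('http://127.0.0.1:5000/predict',
                             files={'data': f})

# The response is the same JSON list of (class id, class name, score) triples.
print(response.json())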


Conclusion


In this article, we looked at how Tensorflow Serving can be used to deploy models in production. We also looked at how you can implement a simple REST service on Flask that accesses the model server.


Links


Tensorflow Serving official website
Code of all article scripts



Source: https://habr.com/ru/post/332584/

