```bash
docker pull tensorflow/serving:latest
```
Download the ResNet-50 v2 model, specifically the channels_last (NHWC) variant in SavedModel format: as a rule, it works better on CPU.

```
models/
  1/
    saved_model.pb
    variables/
      variables.data-00000-of-00001
      variables.index
```
The `1/` directory corresponds to model version 1; it contains the model architecture in `saved_model.pb` along with a snapshot of the model weights (variables). Start the serving container, mounting the model directory and publishing the gRPC port:

```bash
docker run -d -p 9000:8500 \
  -v $(pwd)/models:/models/resnet -e MODEL_NAME=resnet \
  -t tensorflow/serving:latest
```
ModelServer now serves the `resnet` model at the gRPC and HTTP endpoints, as the container logs confirm:

```
...
I tensorflow_serving/core/loader_harness.cc:86] Successfully loaded servable version {name: resnet version: 1}
I tensorflow_serving/model_servers/server.cc:286] Running gRPC ModelServer at 0.0.0.0:8500 ...
I tensorflow_serving/model_servers/server.cc:302] Exporting HTTP/REST API at:localhost:8501 ...
```
The prediction client needs `grpcio` for the RPC call, `opencv-python` to read the image, and the `tensorflow-serving-api` package together with `tensorflow` for service functions. Set up a virtual environment and install the dependencies:

```bash
virtualenv .env && source .env/bin/activate && \
  pip install numpy grpcio opencv-python tensorflow tensorflow-serving-api
```
The ResNet-50 v2 model expects floating-point input tensors in the channels_last (NHWC) format. Therefore, the input image is read with opencv-python and loaded into a numpy array (height × width × channels) as float32. The script below creates a prediction service client stub, loads the JPEG into a numpy array, converts it into a TensorProto, and makes a gRPC prediction request:

```python
#!/usr/bin/env python
from __future__ import print_function

import argparse
import numpy as np
import time
tt = time.time()

import cv2
import tensorflow as tf

from grpc.beta import implementations
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2

parser = argparse.ArgumentParser(description='inception grpc client flags.')
parser.add_argument('--host', default='0.0.0.0', help='inception serving host')
parser.add_argument('--port', default='9000', help='inception serving port')
parser.add_argument('--image', default='', help='path to JPEG image file')
FLAGS = parser.parse_args()


def main():
    # create prediction service client stub
    channel = implementations.insecure_channel(FLAGS.host, int(FLAGS.port))
    stub = prediction_service_pb2.beta_create_PredictionService_stub(channel)

    # create request
    request = predict_pb2.PredictRequest()
    request.model_spec.name = 'resnet'
    request.model_spec.signature_name = 'serving_default'

    # read image into numpy array
    img = cv2.imread(FLAGS.image).astype(np.float32)

    # convert to tensor proto and make request
    # shape is in NHWC (num_samples x height x width x channels) format
    tensor = tf.contrib.util.make_tensor_proto(img, shape=[1] + list(img.shape))
    request.inputs['input'].CopyFrom(tensor)
    resp = stub.Predict(request, 30.0)

    print('total time: {}s'.format(time.time() - tt))


if __name__ == '__main__':
    main()
```
```bash
python tf_serving_client.py --image=images/pupper.jpg
total time: 2.56152906418s
```
The response contains the predicted class index and the class probabilities:

```
outputs {
  key: "classes"
  value {
    dtype: DT_INT64
    tensor_shape {
      dim {
        size: 1
      }
    }
    int64_val: 238
  }
}
outputs {
  key: "probabilities"
...
```
The serving logs also contain a warning that the prebuilt binary was not compiled to use all of the CPU's instruction sets:

```
I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
```
TensorFlow Serving can be rebuilt from source with the instruction sets the host CPU supports, enabled through the following compiler flags:

| Instruction set | Flags |
|---|---|
| AVX | `--copt=-mavx` |
| AVX2 | `--copt=-mavx2` |
| FMA | `--copt=-mfma` |
| SSE 4.1 | `--copt=-msse4.1` |
| SSE 4.2 | `--copt=-msse4.2` |
| All supported by the processor | `--copt=-march=native` |
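To check which of the flags in the table apply to a particular machine, the CPU feature flags reported by the OS can be inspected; a quick Linux-only sketch (not from the original article):

```python
# print which of the instruction sets from the table the host CPU reports
# (parses the "flags" line of /proc/cpuinfo, so Linux only)
with open('/proc/cpuinfo') as f:
    cpu_flags = next(line for line in f if line.startswith('flags')).split()

for isa in ('avx', 'avx2', 'fma', 'sse4_1', 'sse4_2'):
    print(isa, 'supported' if isa in cpu_flags else 'not reported')
```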
The build script below takes a Docker Hub user and tag (`USER=$1`, `TAG=$2`), clones the `r1.13` branch of the TensorFlow Serving repository, and passes the instruction-set flags through `TF_SERVING_BUILD_OPTIONS`. If build resources are a concern, the options can additionally include `--local_resources=2048,.5,1.0` to cap what Bazel uses. For more information on the flags, see the TensorFlow Serving and Docker help, as well as the Bazel documentation.

```bash
#!/bin/bash

USER=$1
TAG=$2
TF_SERVING_VERSION_GIT_BRANCH="r1.13"

git clone --branch="${TF_SERVING_VERSION_GIT_BRANCH}" https://github.com/tensorflow/serving

TF_SERVING_BUILD_OPTIONS="--copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-msse4.1 --copt=-msse4.2"

cd serving && \
  docker build --pull -t $USER/tensorflow-serving-devel:$TAG \
    --build-arg TF_SERVING_VERSION_GIT_BRANCH="${TF_SERVING_VERSION_GIT_BRANCH}" \
    --build-arg TF_SERVING_BUILD_OPTIONS="${TF_SERVING_BUILD_OPTIONS}" \
    -f tensorflow_serving/tools/docker/Dockerfile.devel .

docker build -t $USER/tensorflow-serving:$TAG \
  --build-arg TF_SERVING_BUILD_IMAGE=$USER/tensorflow-serving-devel:$TAG \
  -f tensorflow_serving/tools/docker/Dockerfile .
```
Two more parameters affect CPU performance: `intra_op_parallelism_threads` (the thread pool used to parallelize a single operation) and `inter_op_parallelism_threads` (how many operations may run concurrently). Both default to `0`, which means the system itself selects the appropriate number, most often one thread per core. However, the parameters can be set manually for multi-core concurrency:

```bash
docker run -d -p 9000:8500 \
  -v $(pwd)/models:/models/resnet -e MODEL_NAME=resnet \
  -t $USER/tensorflow-serving:$TAG \
  --tensorflow_intra_op_parallelism=4 \
  --tensorflow_inter_op_parallelism=4
```
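For context, these are the same knobs that plain TensorFlow exposes through its session configuration; a minimal TF 1.x sketch of the equivalent settings, useful for sanity-checking thread counts outside of Serving (not part of the original article):

```python
import tensorflow as tf

# intra_op: threads used to parallelize a single operation
# inter_op: how many operations may run concurrently
config = tf.ConfigProto(
    intra_op_parallelism_threads=4,
    inter_op_parallelism_threads=4)

with tf.Session(config=config) as sess:
    pass  # run the graph here
```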
Running the same client against the optimized image:

```bash
python tf_serving_client.py --image=images/pupper.jpg
total time: 1.64234706879s
```
Importing the `tensorflow_serving` and `tensorflow` libraries makes a significant contribution to the delay, and each unnecessary `tf.contrib.util.make_tensor_proto` call also adds a split second. But the client does not actually need the `tensorflow_serving` and `tensorflow` packages: the prediction request can be built directly from the protobuf definitions, without `tensorflow` and `tensorflow_serving`, and then there is no need to pull the entire (heavy) TensorFlow library onto the client.

First, remove `tensorflow` and `tensorflow_serving` and add the `grpcio-tools` package:

```bash
pip uninstall tensorflow tensorflow-serving-api && \
  pip install grpcio-tools==1.0.0
```
Next, clone the `tensorflow/tensorflow` and `tensorflow/serving` repositories and copy the following protobuf files into the client project:

```
tensorflow/serving/
  tensorflow_serving/apis/model.proto
  tensorflow_serving/apis/predict.proto
  tensorflow_serving/apis/prediction_service.proto

tensorflow/tensorflow/
  tensorflow/core/framework/resource_handle.proto
  tensorflow/core/framework/tensor_shape.proto
  tensorflow/core/framework/tensor.proto
  tensorflow/core/framework/types.proto
```
Copy them into the project's `protos/` directory while maintaining the original paths:

```
protos/
  tensorflow_serving/
    apis/
      *.proto
  tensorflow/
    core/
      framework/
        *.proto
```
To keep things simple, `prediction_service.proto` can be reduced to just the `Predict` RPC, so that the nested dependencies of the other RPCs defined in the service do not have to be copied as well. Then compile the protobuf definitions into Python gRPC code with `grpcio.tools.protoc`:

```bash
PROTOC_OUT=protos/
PROTOS=$(find . | grep "\.proto$")

for p in $PROTOS; do
  python -m grpc.tools.protoc -I . --python_out=$PROTOC_OUT --grpc_python_out=$PROTOC_OUT $p
done
```
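Depending on the Python setup, the generated packages may also need empty `__init__.py` files so that the `protos.` imports used below resolve (Python 2 in particular; newer Python 3 namespace packages often work without them). A small sketch:

```python
import os

# drop an empty __init__.py into protos/ and every subdirectory so the
# generated modules can be imported as regular packages
for root, dirs, files in os.walk('protos'):
    open(os.path.join(root, '__init__.py'), 'a').close()
```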
Now the entire `tensorflow_serving` module can be removed:

```python
from tensorflow_serving.apis import predict_pb2
from tensorflow_serving.apis import prediction_service_pb2
```

and replaced with the generated modules from `protos/tensorflow_serving/apis`:

```python
from protos.tensorflow_serving.apis import predict_pb2
from protos.tensorflow_serving.apis import prediction_service_pb2
```
The TensorFlow library was being imported only for the helper function `make_tensor_proto`, which is needed to wrap a Python/numpy object as a `TensorProto` object:

```python
import tensorflow as tf
...
tensor = tf.contrib.util.make_tensor_proto(features)
request.inputs['inputs'].CopyFrom(tensor)
```

It can be replaced by constructing the `TensorProto` directly from the generated protobuf classes:

```python
from protos.tensorflow.core.framework import tensor_pb2
from protos.tensorflow.core.framework import tensor_shape_pb2
from protos.tensorflow.core.framework import types_pb2

...

# ensure NHWC shape and build tensor proto
tensor_shape = [1] + list(img.shape)
dims = [tensor_shape_pb2.TensorShapeProto.Dim(size=dim) for dim in tensor_shape]
tensor_shape = tensor_shape_pb2.TensorShapeProto(dim=dims)
tensor = tensor_pb2.TensorProto(
    dtype=types_pb2.DT_FLOAT,
    tensor_shape=tensor_shape,
    float_val=list(img.reshape(-1)))
request.inputs['inputs'].CopyFrom(tensor)
```
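A possible further micro-optimization (not from the original text): `TensorProto` also has a `tensor_content` bytes field, and filling it with the raw float32 buffer avoids building a large Python list of floats. A sketch, reusing `img` and `tensor_shape` from the snippet above:

```python
# ship the raw little-endian float32 bytes instead of a float_val list
tensor = tensor_pb2.TensorProto(
    dtype=types_pb2.DT_FLOAT,
    tensor_shape=tensor_shape,
    tensor_content=img.astype(np.float32).tobytes())
request.inputs['inputs'].CopyFrom(tensor)
```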
With the TensorFlow dependency removed from the client, the latency drops further:

```bash
python tf_inception_grpc_client.py --image=images/pupper.jpg
total time: 0.58314920859s
```
Server-side batching is enabled with the `--enable_batching` and `--batching_parameters_file` flags. Parameters are set according to `SessionBundleConfig` (a sketch of such a parameters file is shown after the client example below). For CPU-only systems, set `num_batch_threads` to the number of available cores; for GPUs, see the appropriate options in the TensorFlow Serving batching documentation.

On the client side, several images can be packed into a single batched request:

```python
...
batch = []
for jpeg in os.listdir(FLAGS.images_path):
    path = os.path.join(FLAGS.images_path, jpeg)
    img = cv2.imread(path).astype(np.float32)
    batch.append(img)
...

# shape is NHWC: num_samples x height x width x channels
batch_np = np.array(batch).astype(np.float32)
dims = [tensor_shape_pb2.TensorShapeProto.Dim(size=dim) for dim in batch_np.shape]
t_shape = tensor_shape_pb2.TensorShapeProto(dim=dims)
tensor = tensor_pb2.TensorProto(
    dtype=types_pb2.DT_FLOAT,
    tensor_shape=t_shape,
    float_val=list(batch_np.reshape(-1)))
request.inputs['inputs'].CopyFrom(tensor)
```
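For reference, the file passed via `--batching_parameters_file` is a text-format protobuf. The field names below follow the TensorFlow Serving batching configuration; the values are placeholders to tune per workload, not recommendations from the original article:

```
max_batch_size { value: 32 }
batch_timeout_micros { value: 1000 }
max_enqueued_batches { value: 100 }
num_batch_threads { value: 4 }
```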
The response to a batched request contains a prediction for each image:

```
outputs {
  key: "classes"
  value {
    dtype: DT_INT64
    tensor_shape {
      dim {
        size: 2
      }
    }
    int64_val: 238
    int64_val: 121
  }
}
...
```
Source: https://habr.com/ru/post/445928/