
Tracking the Millennium Falcon with TensorFlow



At the time of this writing, most major technology companies (IBM, Google, Microsoft, Amazon, and others) offer easy-to-use visual recognition APIs. Smaller companies, such as Clarifai, offer similar tools. But none of them offers object detection, that is, the ability to locate individual objects within an image.

Both pictures show tagging by the standard Watson Visual Recognition classifier; the first picture, however, was first run through an object detection model.



Object detection can greatly outperform “plain” visual recognition, but implementing it takes real work.

Depending on your use case, you may not need a custom object detection model at all. The TensorFlow Object Detection API provides several models with different speed and accuracy trade-offs, trained on the COCO dataset.

For your convenience, I have compiled a complete list of objects that COCO models can detect:



If you want to detect logos or something else that is not on the list, you will have to build your own custom detector. I wanted the machine to detect Han Solo's ship, the Millennium Falcon, and Imperial TIE fighters. Obviously, this is a very important task, because you never know when it might come in handy in life...

Image annotation


Training a model is time consuming. You may be thinking: “Hey, I don't want to sink that much effort into this!” If so, you can read my other article on using a ready-made model; that path is much easier.

You will need to collect a lot of images and annotate them. An annotation specifies the bounding-box coordinates of each object in the picture along with its label. For a picture with two TIE fighters, the annotation might look like this:

    <annotation>
        <folder>images</folder>
        <filename>image1.jpg</filename>
        <size>
            <width>1000</width>
            <height>563</height>
        </size>
        <segmented>0</segmented>
        <object>
            <name>Tie Fighter</name>
            <bndbox>
                <xmin>112</xmin>
                <ymin>281</ymin>
                <xmax>122</xmax>
                <ymax>291</ymax>
            </bndbox>
        </object>
        <object>
            <name>Tie Fighter</name>
            <bndbox>
                <xmin>87</xmin>
                <ymin>260</ymin>
                <xmax>95</xmax>
                <ymax>268</ymax>
            </bndbox>
        </object>
    </annotation>

For my Star Wars model I collected 308 images, each with 2-3 objects. I recommend collecting at least 200-300 examples of each object.

“Wow,” you are probably thinking, “do I have to plow through hundreds of images and write a pile of XML for every one of them?”

Of course not! There are quite a few annotation tools, such as labelImg or RectLabel. I used RectLabel, but it is macOS only. Even with a tool, though, you will still sweat, believe me: annotating my entire dataset took me 3-4 hours of continuous work.

If you have money, you can hire someone to do it, or use something like Mechanical Turk. If you are a poor student like me, and/or you enjoy many hours of monotonous work, you can do the annotation yourself.

If you do not want to write your own conversion script, make sure your annotations are exported in the PASCAL VOC format. It is used by many people, including me, so you can simply reuse my script (which I myself borrowed from someone else).
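If you want to peek inside one of these PASCAL VOC files programmatically before converting anything, here is a minimal sketch using only the standard library (the path annotations/xmls/image1.xml is just an illustration):

    import xml.etree.ElementTree as ET

    # Read one PASCAL VOC annotation and print its labels and bounding boxes.
    tree = ET.parse("annotations/xmls/image1.xml")  # hypothetical example path
    for obj in tree.getroot().findall("object"):
        name = obj.find("name").text
        box = obj.find("bndbox")
        xmin, ymin = int(box.find("xmin").text), int(box.find("ymin").text)
        xmax, ymax = int(box.find("xmax").text), int(box.find("ymax").text)
        print(name, (xmin, ymin, xmax, ymax))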

Before running the script, you need to prepare the data for TensorFlow to process.

Cloning the repository


First, clone my repository. The directory structure should look like this:

    models
    |-- annotations
    |   |-- label_map.pbtxt
    |   |-- trainval.txt
    |   `-- xmls
    |       |-- 1.xml
    |       |-- 2.xml
    |       |-- 3.xml
    |       `-- ...
    |-- images
    |   |-- 1.jpg
    |   |-- 2.jpg
    |   |-- 3.jpg
    |   `-- ...
    |-- object_detection
    |   `-- ...
    `-- ...

I included my training data there, so you can run everything out of the box. If you want to train a model on your own data, however, you will have to add your training images to images/, your XML annotations to annotations/xmls/, and update trainval.txt and label_map.pbtxt.

trainval.txt is a list of file names that lets the script find and pair up the JPG and XML files. The following trainval.txt would cover abc.jpg, abc.xml, 123.jpg, 123.xml, xyz.jpg and xyz.xml:

    abc
    123
    xyz

Note: make sure the JPG and XML file names match, apart from the extension.
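If you already have the annotations in place, you do not have to type this list by hand. A small sketch like the following generates it from the annotations/xmls directory (assuming the matching JPGs live in images/):

    import os

    # List the base name of every XML annotation, one name per line.
    xml_dir = "annotations/xmls"
    names = sorted(os.path.splitext(f)[0] for f in os.listdir(xml_dir) if f.endswith(".xml"))
    with open("annotations/trainval.txt", "w") as out:
        out.write("\n".join(names) + "\n")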

label_map.pbtxt is the list of objects we want to detect. It should look something like this:

    item {
      id: 1
      name: 'Millennium Falcon'
    }

    item {
      id: 2
      name: 'Tie Fighter'
    }
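The Object Detection API has its own utilities for reading this file, but if you just want to inspect it from plain Python, a quick-and-dirty parse might look like this (a sketch, not the library's own loader):

    import re

    # Turn label_map.pbtxt into a simple {id: name} dictionary.
    with open("annotations/label_map.pbtxt") as f:
        text = f.read()
    label_map = {int(i): name for i, name in re.findall(r"id:\s*(\d+)\s*name:\s*'([^']+)'", text)}
    print(label_map)  # expected: {1: 'Millennium Falcon', 2: 'Tie Fighter'}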

Running the script

Install Python and pip, and then install the script requirements:

 pip install -r requirements.txt 

Add models and models/slim to your PYTHONPATH:

 export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim 

Important: this command must be run every time you open a new terminal, or you can add it to your ~/.bashrc.

Run the script:

 python object_detection/create_tf_record.py 

When it finishes, you will have two files, train.record and val.record. We will use them to train the model.
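If you want to sanity-check the records before training, a short sketch like this counts the serialized examples in each file (it assumes TensorFlow 1.x, which this version of the Object Detection API targets):

    import tensorflow as tf  # TensorFlow 1.x

    def count_records(path):
        """Count the serialized examples in a TFRecord file."""
        return sum(1 for _ in tf.python_io.tf_record_iterator(path))

    print("train.record:", count_records("train.record"))
    print("val.record:  ", count_records("val.record"))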

Download the base model


Training an object detector from scratch can take days, even on multiple GPUs. To speed up the process, we take a detector that has already been trained on a different dataset and use some of its parameters to initialize our new model.

You can download a model from here. The models differ in accuracy and speed. I used faster_rcnn_resnet101_coco.

Extract the archive and move all the model.ckpt files to the root directory of your repository.

You should already have the file faster_rcnn_resnet101.config; it contains the settings for training with the faster_rcnn_resnet101_coco model. If you chose a different model, you can find the corresponding configuration file here.
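Before launching training, it can save time to verify that everything the pipeline expects is actually in place. A tiny check, assuming the config and checkpoint files sit in the repository root as described above and that train.record and val.record were written there too:

    import glob
    import os

    # Make sure the config, the record files, and the checkpoint parts are present.
    for required in ["faster_rcnn_resnet101.config", "train.record", "val.record"]:
        print(required, "found" if os.path.exists(required) else "MISSING")
    print("checkpoint parts:", glob.glob("model.ckpt*"))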

Time to train


Run the script, and training should begin!

    python object_detection/train.py \
        --logtostderr \
        --train_dir=train \
        --pipeline_config_path=faster_rcnn_resnet101.config

Note: replace the pipeline_config_path with the path to your configuration file.

    global step 1:
    global step 2:
    global step 3:
    global step 4:
    ...

Hooray! It works!

10 minutes later.

    global step 41:
    global step 42:
    global step 43:
    global step 44:
    ...

The computer starts to smoke.

    global step 71:
    global step 72:
    global step 73:
    global step 74:
    ...

How long will this go on?

The model I used for the GIF at the top of this article went through roughly 22,000 training steps.

What?

I used a MacBook Pro. If you run this model on a similar machine, each step will take about 15 seconds. At that rate, getting a decent model would take 3-4 days of continuous training.
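For a rough sense of where that 3-4 day estimate comes from, here is the back-of-the-envelope arithmetic (the ~15 seconds per step is simply my laptop's observed rate):

    # back-of-the-envelope estimate, assuming ~15 s per training step
    steps = 22000
    seconds = steps * 15           # 330,000 s
    hours = seconds / 3600.0       # ~91.7 h
    days = hours / 24              # ~3.8 days of continuous training
    print(hours, days)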

That's ridiculous, I don't have that much time

PowerAI to the rescue!

PowerAI


PowerAI lets us train our model on IBM Power Systems with P100 GPUs!

It took me only about an hour to run 10,000 steps, and that was on a single GPU. The real power of PowerAI lies in its ability to do distributed training across hundreds of GPUs at up to 95 percent efficiency.

Thanks to PowerAI, IBM recently set a new image-recognition training record: 33.8% accuracy in 7 hours. The previous record belonged to Microsoft: 29.9% accuracy in 10 days.

Sooo quick!

Since I'm not training on millions of images, I don't really need that kind of firepower. A single GPU is enough.

Nimbix Account Creation


Nimbix provides developers with trial accounts that include 10 hours of free processing time on the PowerAI platform. You can register here.

Note: Registration is not automatic, so approval may take up to 24 hours.

After your registration is approved, you will receive an email with instructions for confirming and creating your account. You will be asked for a promotional code; leave that field blank.

Now you can log in here.

Deploy PowerAI Notebooks Application


Search for PowerAI Notebooks.



Click on the result and select TensorFlow.



Select the machine type: 32 thread POWER8, 128GB RAM, 1x P100 GPU w/NVLink (np8g1).



A panel will appear. When the server status becomes Processing, you can access it.

Get the password by clicking on (click to show).

Then, to open the Notebooks, click "Click here to connect".



Log in with the username nimbix and the password you received.



Getting started


Click on the New drop down menu and select Terminal to open a new terminal window.



You will be greeted by a familiar interface:



Note: the terminal may not work in Safari.

Training runs the same way here as on your local computer. If you are using my training data, just clone my repository:

    git clone https://github.com/bourdakos1/Custom-Object-Detection.git

Or clone your own repository. Then cd into its root directory:

 cd Custom-Object-Detection 

Run the commands below to download the pre-trained faster_rcnn_resnet101_coco model (the same one we used locally) and move its checkpoint files into the repository root:

    wget http://storage.googleapis.com/download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_11_06_2017.tar.gz
    tar -xvf faster_rcnn_resnet101_coco_11_06_2017.tar.gz
    mv faster_rcnn_resnet101_coco_11_06_2017/model.ckpt.* .

Then update PYTHONPATH again, since you have a new terminal:

 export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim 

Now you can run the command to start training:

    python object_detection/train.py \
        --logtostderr \
        --train_dir=train \
        --pipeline_config_path=faster_rcnn_resnet101.config

Downloading your model


When will the model be ready? It depends on your training data: the more you have, the more steps you will need. I got a reasonably good model after about 4,500 steps, and it peaked at around 20,000 steps. I let it run for 200,000 steps in total, but it did not get any better.

I recommend downloading your model every 5,000 steps or so and evaluating it, to make sure you are moving in the right direction. Click the Jupyter logo in the upper left corner, then navigate the file tree to Custom-Object-Detection/train.

Download the model.ckpt files with the highest step number in their names:

• model.ckpt-STEP_NUMBER.data-00000-of-00001
• model.ckpt-STEP_NUMBER.index
• model.ckpt-STEP_NUMBER.meta

Note: you can download only one file at a time.



Note: when you are done training, click the red shutdown button; otherwise the clock on your free hours will keep running.

Export the resulting graph


To use your model in code, you need to convert the checkpoint files (model.ckpt-STEP_NUMBER.*) into a frozen inference graph.

Transfer the downloaded checkpoint files to the root folder of your repository.

Then run this command:

    python object_detection/export_inference_graph.py \
        --input_type image_tensor \
        --pipeline_config_path faster_rcnn_resnet101.config \
        --trained_checkpoint_prefix model.ckpt-STEP_NUMBER \
        --output_directory output_inference_graph

Do not forget to run export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim first.

A new output_inference_graph directory should appear, containing the file frozen_inference_graph.pb. That is the file we need.

Model testing


Now run the command:

 python object_detection/object_detection_runner.py 

It will apply your object detection model, located at output_inference_graph/frozen_inference_graph.pb, to all the images in the test_images directory and write the results to the output/test_images directory.
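If you would rather call the frozen graph from your own code, a minimal sketch looks roughly like the following. It assumes the standard tensor names the Object Detection API gives to graphs exported with --input_type image_tensor; test_images/example.jpg is only a placeholder:

    import numpy as np
    import tensorflow as tf  # TensorFlow 1.x
    from PIL import Image    # Pillow, used here just to load the test image

    # Load the frozen graph exported in the previous step.
    graph = tf.Graph()
    with graph.as_default():
        graph_def = tf.GraphDef()
        with tf.gfile.GFile("output_inference_graph/frozen_inference_graph.pb", "rb") as f:
            graph_def.ParseFromString(f.read())
        tf.import_graph_def(graph_def, name="")

    with tf.Session(graph=graph) as sess:
        # image_tensor expects a batch of uint8 images: [1, height, width, 3]
        image = np.expand_dims(np.array(Image.open("test_images/example.jpg")), axis=0)
        boxes, scores, classes, num = sess.run(
            ["detection_boxes:0", "detection_scores:0",
             "detection_classes:0", "num_detections:0"],
            feed_dict={"image_tensor:0": image})
        for box, score, cls in zip(boxes[0], scores[0], classes[0]):
            if score > 0.5:  # keep only confident detections
                # box is [ymin, xmin, ymax, xmax] in normalized coordinates
                print(int(cls), float(score), box)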

Results


This is what we get when we run the model on every frame of a clip from "Star Wars: The Force Awakens."

Source: https://habr.com/ru/post/345250/

