November 23, 2017

Object Detection on a Raspberry Pi

By

Theta

Image recognition has become a part of our daily lives, and the technology behind it is advancing at a steady pace. We thought it'd be cool to use the increasing speed and tiny size of lightweight computers like the Raspberry Pi, as well as the efficiency and portability of machine learning libraries such as Tensorflow, to create a standalone, handheld object detector.

The first step is to find out whether running live object detection on a small device such as the Raspberry Pi is viable; until recently the technology to detect multiple objects at the speed we require just wasn’t there. Luckily for us, the folks at Google Brain were kind enough to open-source their object detection API, which does just this.

The use cases for a portable object detector are many and varied - there are places where having a full PC set-up isn't viable, and where an internet connection may not be available for outsourcing the detection to the cloud. The Raspberry Pi is so lightweight that you can even mount it on a drone.

Initial Setup

To get started with object detection on the Raspberry Pi, you of course need to have a Raspberry Pi. We used a Raspberry Pi 3, running Raspbian Jessie. You also need a camera attached to the Pi. Once the Pi is up and running and connected to a monitor (or accessed through SSH), you can open up the terminal and install the Python camera library by entering the following commands:

sudo apt-get update
sudo apt-get install python3-picamera

You can then test out the camera by running the raspistill command-line tool like so:

raspistill -o filename.jpg
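If you'd prefer to test from Python, a minimal picamera sketch like the following should also capture a still (assuming the camera interface has been enabled via raspi-config):

import time
import picamera

# Capture a single still image to test.jpg
with picamera.PiCamera() as camera:
    camera.start_preview()
    time.sleep(2)  # give the sensor a moment to adjust exposure
    camera.capture('test.jpg')
    camera.stop_preview()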

Once we’ve confirmed that the hardware is working, we have to make sure we’ve got pip, the Python package installer, available:

sudo apt-get install python3-pip

This will allow us to install most of the required packages, though because the OS we’re using has a slightly limited package index, we’ll have to do a couple by hand.

Software Installation

There are a number of libraries you need to install to get object detection up and running, the main ones being Tensorflow, OpenCV, and the Object Detection API. Installing these on the Raspberry Pi is a little different to installing them on desktop Unix-like environments, so take care that any tutorials you’re following are compatible with the version of Raspbian that you’re using.

Tensorflow

Tensorflow is an open-source machine learning library developed by the Google Brain team. It’s the core of our object detection, and should be installed first. The regular Tensorflow distribution doesn’t run on the Raspberry Pi’s ARM processor, so we’re going to use Sam Abrahams’ TensorFlow on Raspberry Pi 3 build.

Detailed instructions are available on the Github page, but the main commands required are as follows:

sudo apt-get update
sudo apt-get install python3-pip python3-dev
wget https://github.com/samjabrahams/tensorflow-on-raspberry-pi/releases/download/v1.1.0/tensorflow-1.1.0-cp34-cp34m-linux_armv7l.whl
sudo pip3 install tensorflow-1.1.0-cp34-cp34m-linux_armv7l.whl
sudo pip3 uninstall mock

If you run into any errors, check out the official Github page.

Object Detection API

The Object Detection API comes as part of the official Tensorflow research models. Its purpose is to detect and classify multiple objects within a single image.

Clone this repository somewhere handy:

git clone https://github.com/tensorflow/models.git

The Object Detection API relies on a variety of other libraries, so we first need to install all of these:

sudo apt-get install protobuf-compiler
sudo pip3 install pillow
sudo pip3 install lxml
sudo pip3 install jupyter
sudo pip3 install matplotlib

If any of these fail, you may need to download the wheel file manually. These can be found on pypi.python.org.

I needed to do this for lxml, as well as for a couple of other dependencies for OpenCV. These packages may have dependencies of their own, so if you have any trouble installing them through pip, go to their official installation instructions and install them manually.

Official installation instructions for the above packages:

Pillow, lxml, Jupyter, Matplotlib

We now need to compile the Object Detection API using Protobuf. Navigate to your tensorflow/models/research/ directory, and run the following:

sudo protoc object_detection/protos/*.proto --python_out=.
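If the compilation succeeded, the generated Python modules should import cleanly. As a quick sanity check, you can try importing one of them (string_int_label_map_pb2 is one of the modules the above command should generate):

python3 -c "from object_detection.protos import string_int_label_map_pb2"

If that runs without an ImportError, the protos have been compiled correctly.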

To run object detection, you’ll need to append two directories to your PYTHONPATH. This can be added to the end of your ~/.bashrc file, or run manually in each new terminal you open (from the tensorflow/models/research/ directory):

export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/slim
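If you’d rather not modify your environment, a script can append the same two directories itself. A minimal sketch, assuming the models repository was cloned into your home directory (RESEARCH_DIR is a placeholder; adjust it to wherever you cloned the repository):

import os
import sys

# Placeholder path to your clone of tensorflow/models; adjust as needed
RESEARCH_DIR = os.path.expanduser('~/models/research')
sys.path.append(RESEARCH_DIR)
sys.path.append(os.path.join(RESEARCH_DIR, 'slim'))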

Finally, test the installation with the following command:

python3 object_detection/builders/model_builder_test.py

Note: I needed to add a copy of object_detection/data into my project's src/ directory for my project to be able to find the required files.

OpenCV

OpenCV is a powerful computer vision framework, containing a huge number of algorithms for processing and analysing images. We’re going to be using it for some of its simpler features, but having the full set of tools available means we can do further processing on our images later if we want to. OpenCV is perhaps one of the more error-prone libraries to install on the Raspberry Pi, so take care during this step. You should be able to find a solution on Stack Overflow for most issues, but I’ll note the points that caused us grief.

Initial Commands:

sudo apt-get update
sudo apt-get upgrade
sudo rpi-update
sudo reboot
sudo apt-get install build-essential git cmake pkg-config
sudo apt-get install libjpeg-dev libtiff5-dev libjasper-dev libpng12-dev
sudo apt-get install libavcodec-dev libavformat-dev libswscale-dev libv4l-dev
sudo apt-get install libxvidcore-dev libx264-dev
sudo apt-get install pkg-config
sudo apt-get install libgtk2.0-dev
sudo apt-get install libatlas-base-dev gfortran
cd ~
git clone https://github.com/Itseez/opencv.git
cd opencv
git checkout 3.1.0
cd ~
git clone https://github.com/Itseez/opencv_contrib.git
cd opencv_contrib
git checkout 3.1.0

Install:

sudo apt-get install python3-dev
wget https://bootstrap.pypa.io/get-pip.py
sudo python3 get-pip.py
pip3 install numpy
cd ~/opencv
mkdir build
cd build
cmake -D CMAKE_BUILD_TYPE=RELEASE \
    -D CMAKE_INSTALL_PREFIX=/usr/local \
    -D INSTALL_C_EXAMPLES=OFF \
    -D INSTALL_PYTHON_EXAMPLES=ON \
    -D OPENCV_EXTRA_MODULES_PATH=~/opencv_contrib/modules \
    -D BUILD_EXAMPLES=ON ..
make
sudo make install
sudo ldconfig

Warning: the make step can take a while (a few hours on a Pi is normal).

Tip: Depending on your environment, you may need to add sudo to the start of more of the commands above. For example, if pip3 install numpy doesn’t work, try sudo pip3 install numpy instead.

If you receive an error about the GTK version you are running when you attempt to use OpenCV:

  1. Navigate to your matplotlibrc file in /usr/local/lib/python3.4/dist-packages/matplotlib/mpl-data/
  2. Find the backend line, remove the #, and change the line to: backend : TkAgg
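Alternatively, you can select the backend from inside your script. A small sketch; the call has to happen before matplotlib.pyplot is imported anywhere:

import matplotlib
matplotlib.use('TkAgg')  # must run before any import of matplotlib.pyplot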

The fun stuff

If all went well, we’ve now got the Object Detection API all ready to go, and we’ve got OpenCV available to display our detection. The Tensorflow team have provided a great tutorial for getting this up and running, which I have adapted here.

We can now write a small Python program to:

  1. Initialize object detection with a pre-trained model (a frozen inference graph)
  2. Stream camera frames
  3. Run object detection on each frame
  4. Output resulting image to an OpenCV window

First we import all that stuff we just installed.

import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
import cv2
from picamera.array import PiRGBArray
import picamera
from collections import defaultdict
from io import StringIO
from PIL import Image
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as vis_util

We now download a frozen inference graph, pre-trained on the COCO (Common Objects in Context) dataset. There are multiple models you can use here, with varying trade-offs between speed and accuracy. Because we want to stream object detection, and are doing so on a Raspberry Pi, we use the fastest one: ssd_mobilenet_v1_coco_11_06_2017. You can find the rest here.

MODEL_NAME = 'ssd_mobilenet_v1_coco_11_06_2017'  # fast
#MODEL_NAME = 'faster_rcnn_resnet101_coco_11_06_2017'  # medium speed
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'
PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')
NUM_CLASSES = 90
IMAGE_SIZE = (12, 8)

fileAlreadyExists = os.path.isfile(PATH_TO_CKPT)
if not fileAlreadyExists:
    print('Downloading frozen inference graph')
    opener = urllib.request.URLopener()
    opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
    tar_file = tarfile.open(MODEL_FILE)
    for file in tar_file.getmembers():
        file_name = os.path.basename(file.name)
        if 'frozen_inference_graph.pb' in file_name:
            tar_file.extract(file, os.getcwd())

We can now use this to create our detection graph, the final piece of Tensorflow setup before we can begin our detection.

detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')

label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)

Set up the PiCamera, and create an array to store our streaming data in.

camera = picamera.PiCamera()
camera.resolution = (1280, 960)
camera.vflip = True
camera.framerate = 30
rawCapture = PiRGBArray(camera, size=(1280, 960))

Our main loop. This initializes each image, runs the Tensorflow session on it, and visualizes the detection boxes onto the image. It then displays the image in an OpenCV window.

Note: The bottom part of this loop where we exit on a ‘q’ press is required for the OpenCV window to work, so make sure it’s there.

with detection_graph.as_default():
    with tf.Session(graph=detection_graph) as sess:
        for frame in camera.capture_continuous(rawCapture, format="bgr"):
            image_np = np.array(frame.array)
            # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
            image_np_expanded = np.expand_dims(image_np, axis=0)
            # Input and output tensors for detection_graph
            image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
            # Each box represents a part of the image where a particular object was detected.
            detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
            # Each score represents the level of confidence for each of the objects.
            # The score is shown on the result image, together with the class label.
            detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
            detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
            num_detections = detection_graph.get_tensor_by_name('num_detections:0')
            print('Running detection..')
            (boxes, scores, classes, num) = sess.run(
                [detection_boxes, detection_scores, detection_classes, num_detections],
                feed_dict={image_tensor: image_np_expanded})
            print('Done. Visualizing..')
            vis_util.visualize_boxes_and_labels_on_image_array(
                image_np,
                np.squeeze(boxes),
                np.squeeze(classes).astype(np.int32),
                np.squeeze(scores),
                category_index,
                use_normalized_coordinates=True,
                line_thickness=8)
            cv2.imshow('object detection', cv2.resize(image_np, (1280, 960)))
            # Clear the stream so the next frame can be captured
            rawCapture.truncate(0)
            if cv2.waitKey(25) & 0xFF == ord('q'):
                break

print('exiting')
camera.close()
cv2.destroyAllWindows()

Result

Objects detected: tv (91%), keyboard (54%), cup (76%), bed (64%)

As you can see, the result is pretty good! The object detection isn’t quite as accurate close up (it seems to think my hand is a bed), but everything else is pretty accurate. Keep in mind that the COCO dataset we're using covers around 90 classes of common objects, so you may need to further train this model on your own images if you want to detect something more specific.

The above runs at about one frame per second on our Raspberry Pi, which isn’t too bad for real-time object detection on such a small device. This could likely be optimised too; for example, the frames could be shrunk and pre-processed before being passed into Tensorflow.
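As a rough sketch of that kind of optimisation (reusing the variable names from the main loop above), each frame could be downscaled with OpenCV before being fed to the network; the SSD model resizes its input internally anyway, so little is lost by shrinking the frame first:

# Downscale the frame before running detection on it
small_np = cv2.resize(image_np, (320, 240), interpolation=cv2.INTER_LINEAR)
small_expanded = np.expand_dims(small_np, axis=0)
(boxes, scores, classes, num) = sess.run(
    [detection_boxes, detection_scores, detection_classes, num_detections],
    feed_dict={image_tensor: small_expanded})

Since the returned boxes use normalized coordinates, they can still be drawn onto the original full-resolution frame afterwards.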

All in all, we thought this was a pretty good result for a first attempt. With a bit of optimisation, this could work in a variety of scenarios. You could mount a Raspberry Pi inside a robot, allowing it to navigate and recognise objects or people. You could attach a camera to the bottom of a drone, monitoring crops as it flies over. You could mount it onto a beehive, warning you of potential wasp attacks. I’m sure there are plenty of things you could do with portable object detection that we haven’t thought of, and I’m excited to see what this technology can offer in the future.