
Monday, May 4, 2015

The Jetson TK1 and the Caffe Deep Learning Project

I have been very interested in starting to use OpenCV to add vision to my interactive space projects. I recently bought a Jetson TK1 DevKit to help me with this. The Jetson TK1 is a development board built around NVIDIA's Tegra K1, a mobile chip that includes a Kepler-class GPU. The GPU has 192 CUDA cores, which means I can not only use a CUDA-accelerated OpenCV for computer vision processing, but can also start experimenting with parallel programming on a GPU. The book CUDA By Example by Sanders and Kandrot has been very useful in learning how to program the cores, and I imagine I will be talking more about GPU programming in future posts. The TK1 has OpenCV libraries available that take advantage of the CUDA cores and can do quite rapid image processing.
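
Just to give a flavor of what that looks like, here is a minimal sketch in the spirit of the early chapters of CUDA By Example: a kernel that adds two arrays, with the data copied to the GPU and back by hand. The array size and names are my own choices, not anything from the book.

#include <cstdio>
#include <cuda_runtime.h>

// Each GPU thread adds one element of a and b into c.
__global__ void add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Fill two input arrays on the CPU.
    float *h_a = new float[n], *h_b = new float[n], *h_c = new float[n];
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Copy the inputs to GPU memory.
    float *d_a, *d_b, *d_c;
    cudaMalloc((void**)&d_a, bytes);
    cudaMalloc((void**)&d_b, bytes);
    cudaMalloc((void**)&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    add<<<blocks, threads>>>(d_a, d_b, d_c, n);

    // Copy the result back and check one value.
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", h_c[0]);  // should print 3.000000

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    delete[] h_a; delete[] h_b; delete[] h_c;
    return 0;
}

It builds with the nvcc compiler that comes with the CUDA toolkit, for example nvcc -o add add.cu.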

The TK1 also has a quad-core ARM Cortex-A15 as its main CPU, plus GPIO, I2C, SPI, and a ton of other features. Not bad for $192, or just $1 per CUDA core.

Here is a picture of the board along with the solid-state drive I attached to it.




There are easy-to-use instructions for getting started with the TK1 at https://developer.nvidia.com/get-started-jetson, including how to flash the operating system. The site http://www.elinux.org/Jetson_TK1 has a lot of great information as well, including how to install the CUDA libraries and the CUDA-accelerated OpenCV. The only other thing I recommend is a reasonably fast Internet connection, as some of the system components are quite large; for example, you get the entire Ubuntu 14.04 system image to flash onto the board in one download.
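
Once everything is installed, using the CUDA-accelerated OpenCV mostly comes down to uploading images into cv::gpu::GpuMat objects and calling the cv::gpu versions of the usual functions. Here is a rough sketch, assuming an OpenCV 2.4 style gpu module; the file names are just placeholders of my own.

#include <opencv2/opencv.hpp>
#include <opencv2/gpu/gpu.hpp>

int main() {
    // Load an image on the CPU, then upload it to GPU memory.
    cv::Mat h_img = cv::imread("frame.jpg");   // placeholder file name
    if (h_img.empty()) return 1;
    cv::gpu::GpuMat d_img, d_gray, d_edges;
    d_img.upload(h_img);

    // CUDA-accelerated versions of the usual OpenCV calls.
    cv::gpu::cvtColor(d_img, d_gray, CV_BGR2GRAY);
    cv::gpu::Canny(d_gray, d_edges, 50.0, 150.0);

    // Bring the result back to the CPU to save or display.
    cv::Mat h_edges;
    d_edges.download(h_edges);
    cv::imwrite("edges.jpg", h_edges);
    return 0;
}

The first GPU call pays a one-time startup cost while the CUDA context is created, so if you are timing things it is worth measuring a second pass.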

One thing I am very interested in is adding much more intelligence to the interactive spaces I want to build. I don't remember exactly how I found it, but I recently discovered the Caffe Deep Learning Framework, an open source project. You can get a lot of information about it at its website, http://caffe.berkeleyvision.org/. Deep learning is a machine learning technique that learns how to extract representations of data and then recognize patterns in those representations. These representations help recognize things like faces or, as I saw in an ACM article today, solve the cocktail party problem. That problem is something many people are familiar with: you are at a party, or some other place with lots of people, and you can tune out everyone else's voices and hear just one conversation. This has been a very hard problem for computers to solve, but deep learning systems have made it possible.

Caffe supplies a general framework for deep learning. It comes with a set of reference models for object recognition in images, and I hope to find it useful for a variety of other deep learning tasks.
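
The heart of using Caffe, once it is built, is loading a network definition plus trained weights and running a forward pass. Here is a rough C++ sketch of that flow; the weight file path is a placeholder (the pretrained models are downloaded separately, they are not part of the git checkout), and the exact API details vary a bit between Caffe versions.

#include <caffe/caffe.hpp>
#include <cstdio>

int main() {
    // Run on the GPU; switch to caffe::Caffe::CPU to compare against the ARM cores.
    caffe::Caffe::set_mode(caffe::Caffe::GPU);

    // The prototxt describes the network, the caffemodel holds the trained weights.
    caffe::Net<float> net("models/bvlc_alexnet/deploy.prototxt", caffe::TEST);
    net.CopyTrainedLayersFrom("models/bvlc_alexnet/bvlc_alexnet.caffemodel");

    // A real application would copy preprocessed image data into the input blob first.
    net.ForwardPrefilled();

    // The output blob holds one score per class for each image in the batch.
    caffe::Blob<float>* out = net.output_blobs()[0];
    printf("output has %d values\n", out->count());
    return 0;
}

The caffe time command used below does essentially this in a loop, filling the blobs with dummy data and averaging the per-layer and whole-network timings.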

When I installed CUDA on my TK1 I got version 6.5. The standard installs from the instructions on the above web sites gave me version 4.8 of the GNU C and C++ compilers.

To install Caffe I first installed the following dependencies:

sudo apt-get install libprotobuf-dev protobuf-compiler gfortran \
libboost-dev cmake libleveldb-dev libsnappy-dev \
libboost-thread-dev libboost-system-dev \
libatlas-base-dev libhdf5-serial-dev libgflags-dev \
libgoogle-glog-dev liblmdb-dev


Next I installed the Caffe source. If you don't have it already, first install git.

sudo apt-get install -y git

After git is installed, go to the directory where you want Caffe to live and clone the source.

git clone https://github.com/BVLC/caffe.git
cd caffe

git checkout dev
cp Makefile.config.example Makefile.config


Now you can build Caffe.

make -j 8 all

Once the build is complete, it is a good idea to run the tests and make sure your install worked. This worked for me the first time, so it seems hard to screw it up.

make -j 8 runtest


If all the tests pass, you can then run the timing tests.

build/tools/caffe time --model=models/bvlc_alexnet/deploy.prototxt --gpu=0

The --gpu=0 at the end tells the code to run the test on the GPU. The command

build/tools/caffe time --model=models/bvlc_alexnet/deploy.prototxt

will run it on the CPU of your computer, not the GPU.

The model's deploy configuration runs a batch of 10 images per forward pass, so to figure out how much time is taken per image, look at the 'Average Forward pass' time and divide by 10 (an average forward pass of roughly 250 msec, for example, works out to about 25 msec per image).

On the TK1, I found approximately 25 msec per image on the GPU, while on the CPU I got 602 msec per image. Quite the difference.

Next I am trying to get the NVIDIA cuDNN library working with Caffe. I have version 2 of the cuDNN library, and apparently that means I need to follow special instructions to get Caffe to use it.

But once I get it compiled in, I will start figuring out how to create my own deep learning neural network models and will write about how that goes.

