Friday, April 9, 2010

Candle Seeking Vision Software

Pokey, the firefighting robot, absolutely must find the candle this time!  Not like last time when he completely ignored the candle right in front of him. (sigh)

While waiting for a Game Boy camera to show up in my mailbox, I figured I better see how hard it would be to cook up some code that could reliably detect a candle at various distances.

So the next proverbial bite of the elephant was to do some code prototyping in an environment that's comfortable and easy.  To wit, C in Cygwin on my PC (yes despite all my posts referencing Macintosh, and a house full of them, I have--and use--a PC, too, because it was faster than my G4/450 and it cost $5).

Simulating Pictures
The Game Boy camera outputs 128 x 123 pixel, 8-bit grayscale images.  To simulate contest scenarios, I shot pics with my DSLR of a candle in various spots around the room, uploaded them, then batch converted the images using Irfanview to approximately 128x123 pixels, 8-bit greyscale, and saved as an easy-to-work-with Windows BMP (bitmap) file:

Greyscale 200x123 bitmap of candle

Reading a Bitmap File
Then I coded up a simple C program to reprint the BMP as ASCII art, to verify that I can access each and every bit and it's brightness value.  Of course, the aspect ratio is a little skewed but... clearly the program works!  (Click on the image for a much larger, clearer, and hopefully brighter version if you're skeptical).  I will hereby confess that my C skills were pretty rusty.  How could I forget the proper way to malloc() a char ** type??  That's just sad.  Perl has made me soft and weak...

Converted to ASCII art

Notice in the detail shot below, that the candle flame is, in fact, the brightest thing in the picture, represented by the character X (assigned to any pixel with a value greater than 240 out of 255); the next brightest thing is indicated by the character +, like the white candle, itself. Clearly the flame is the brightest thing in the picture. Cool!

Detail of candle; flame is brightest

So that tells me there is actually some hope of detecting bright spots in a snapshot image.  I didn't use any IR filtering, which "should" improve things even more by eliminating most everything in the image except the flame or reflected IR.

Some Difficult Scenarios
This test photo above represents an easy scenario.  I'll need to anticipate the possibility of multiple bright spots of different sizes: sun shining on a wall, or the reflection of the flame on the wall behind it.  The algorithm will have to key in on the brightest spots that are the size and/or proportions of a candle flame.

Candle flame and distant, sunlit door

If that happens, the robot will have to somehow evaluate each candidate candle flame. Maybe with other sensors, maybe by going up closer and taking another 'look'. The robot also has to be able to recognize a flame despite sizes varying in size, whether because of distance, drafts, length of candle wick, type of candle, or whatever the cause.

Candle flame and reflection off of HP LaserJet

Some Experiments
Now that I had the "lab" set up, it was time to experiment with some statistical analysis, perhaps try out some published algorithms for finding bright spots, or whatever else came to mind.

First, I plotted a histogram for each of the images. Roughly speaking, the bright pixels accounted for a pretty small percentage of the intensities represented in the images. My thinking is that histogram statistics might help to yield an optimal exposure so there's more work to do with that.  I'd rather wait on that until I have a better sense of what the camera sensor can do.

Next, I tried simply projecting (summing) the bright spots vertically and horizontally. In the case of one bright candle object, this approach would yield a quick way to identify a bounding box around the object.

Prototyping Flood-Fill
Then I decided to play around with multiple object detection. After some research, the flood-fill algorithm caught my fancy.  It was simple enough to play with and hopefully could be efficient enough to support analysis of multiple objects at a reasonable frame rate (10-30fps). Here's what I did.

The image from the camera will be coming in serially. Likewise, my simple C program reads the bitmap pixels sequentially.

Scenario 1
A two-dimensional array of unsigned integers represents each pixel's object assignment. (Inefficient, but quick to prototype). When the code encounters the first bright pixel (above a set threshold) after one or more dark pixels, it assigns that pixel to the next available object number (essentially, object_mask_array[x][y] = nextavailableobj). All subsequent, contiguous bright pixels are assigned that same object number. Like this.

.. .. 01 01 01 .. .. 02 02 .. XX XX XX

The ".." is a dark pixel. The XX hasn't been processed yet. Two objects identified so far, and the final three pixels will be assigned to 03.

Scenario 2
That's the simple scenario. But if there's a bright pixel above the current bright pixel, the two are contiguous.  So whatever object was previously assigned to the pixel above should be assigned to the current one.  The simplest scenario follows.

.. .. 01 01 01 .. .. 02 02 .. 03 03 03
.. .. .. XX XX .. .. .. .. .. .. .. ..

When the first XX is encountered, it is contiguous to the pixel above, assigned to 01.  So the current pixel is assigned to 01 also, as well as all subsequent, contiguous bright pixels, like this:

.. .. 01 01 01 .. .. 02 02 .. 03 03 03
.. .. .. 01 01 .. .. .. .. .. .. .. ..

Scenario 3
If the above pixels 'start' before the bottom set of pixels do, it's easy. A harder scenario, below, occurs when one's already assigned an object to a row of pixels only to discover part way through that the line is contiguous with an object above.

.. .. 01 01 01 .. .. 02 02 .. 03 03 03
04 04 XX XX XX .. .. .. .. .. .. .. ..

The current pixel (leftmost XX) is contiguous with 01 above, but we've already assigned 04 to this object. Since I was only prototyping, my inefficient solution was simply to stop where I was and re-do the prior pixels.

.. .. 01 01 01 .. .. 02 02 .. 03 03 03
01 01 XX XX XX .. .. .. .. .. .. .. ..

And then I could continue assigning subsequent pixels to the 01 object.

.. .. 01 01 01 .. .. 02 02 .. 03 03 03
01 01 01 01 01 .. .. .. .. .. .. .. ..

Scenario 4
The hardest scenario, which I didn't address in my prototype code, was that of a pair of bunny ears. In other words, the object has two lumps at the top that are not contiguous themselves, but a subsequent row ties them both together. One has to go back and redo the object above.  Like this.

.. .. 01 01 01 .. .. 02 02 .. 03 03 03
01 01 01 01 01 01 01 XX XX .. .. .. ..

The 02 object has to be reassigned to the 01 object.  If it's just one row that isn't even all that hard.  But what if it's several rows.  And what if some of those rows 'start' earlier than the ones below?  You can easily come up with additional tricky situations.

.. .. .. .. .. .. .. 01 01 .. .. .. ..
.. .. .. .. .. .. 01 01 01 .. .. .. ..
.. .. 02 02 02 .. .. 01 01 .. 03 03 03
02 02 02 02 02 02 02 XX XX .. .. .. ..

This complexity is an artifact of processing pixels on the fly -- versus reading everything first, and processing after.  I wanted to see if the former approach was even possible in case the vision system turns out to be memory constrained.

Flood Fill Results
Once again this was just a proof-of-concept to see if there was any chance in the world that I might be able to identify separate bright objects in an image and the experiments successfully showed that it is possible even with a relatively simple algorithm.

Of course to do this 'for real' the algorithm would then have to keep track of the bounding box coordinates for each object and eventually some code would have to determine which objects were likely to be candle flames. All in due time.

A Difficult Scenario

At least for now I can take a pretty tough scenario like the above, with a candle in front of a sunlit door, and identify that the candle and the swath of sunlight are separate objects.  Click on the text image to see that the swath of light is assigned to object 05 and the candle flame is assigned object 03.

The Algorithm Works!

My astute readers will no doubt notice the lower left part of the swath of light is assigned to object 01.  The algorithm processes the bitmap pixels upside down, the order in which they're stored in the file. So it runs into the bunny ears scenario (4 above) and ends up assigning the second bunny ear to 05 then assigns the line connecting 01 and 05, and all subsequent lines, to object 05, leaving a the first bunny ear still assigned to object 01.

Bounding Box
Writing code to calculate the bounding box of each object was pretty straightforward.  The hard stuff was already completed (above).  A "C" struct represents an object and contains an "exists" flag to indicate if the object has been created or deleted, as well as bounding box coordinates for top, bottom, left and right.

One simple function adds a pixel to an object: if the pixel lies outside the bounding box, the box's coordinates are changed to encompass the new pixel.

A function to delete an object is called when encountering scenario 3 above. Pixels that were originally assigned to a new object are later discovered to be connected to a second object. This new object can be discarded because all of its pixels have to be reassigned to the second object.

Finally, a print function displays info about each object, including calculating size, aspect ratio and midpoint, and then printing out the bitmap within the bounding box.  Here's the results from the simple test image:

-------- Candle006.bmp --------
Width: 185 Height: 123 Planes: 1
BitCount: 8 Colors: 256 SizeImage: 23124

Object 02
  Box: (97, 66) (100, 60)
  Size: (4, 7)
  Ratio: 57%
  Mid: (99, 63)

....02..
..0202..
020202..
..0202..
..0202..
02020202
02020202

Recall that the y coordinates are upside down due to the BMP file format.The midpoint coordinates are for pointing the robot at the flame.  The width-to-height proportion may help filter out non-flame objects.  From here, I can add any other info or calculations that are needed, like average intensity within the bounding box.

Also, I could add pixel coordinates to each object struct to enable recreation of the original image or the bright/dark processed image without having to store the entire bitmap in memory.

Whee!
Maybe it seems silly but I'm incredibly excited that I got all this working. The vision approach is starting to look pretty hopeful...

...notwithstanding the mountain of electronics interfacing work yet to do...

No comments:

Post a Comment