Sight Beyond Sight: Dave Grossman comments on Blob Analysis Library

This post is meant as an archive, particularly because it is useful and easier to refer to.
Taken from unstable hosting at osdir: http://osdir.com/ml/lib.opencv/2005-11/msg00200.html?rfp=dta

RE: Re: help for CVBLOBSLIB!: msg#00200
Subject:
RE: Re: help for CVBLOBSLIB!

RE: Re: help for CVBLOBSLIB!: msg#00200

Subject: 
RE: Re: help for CVBLOBSLIB!
By now I imagine that rickypetit understands the truth of the old adage that "no
good deed goes unpunished!" He deserves a lot of credit for converting my blob
analysis code into cvblobslib. The result has been many more users, and
therefore many more questions. So, I will try to help him out here by providing
some answers. However, these answers all relate to my original code, which was
deprecated about 2 years ago when cvblobslib was created. And I've never used
cvblobslib. So, my answers are probably somewhat obsolete... 
I don't have any additional documentation on the blob analysis algorithm. I
first saw this algorithm about 33 years ago in a presentation by Gerry Agin at
SRI International (then called Stanford Research Institute). I have been unable
to find it in any journal articles, so I implemented it by memory. The basic
idea is to raster scan, numbering any new regions that are encountered, but also
merging old regions when they prove to be connected on a lower row. The
computations for area, moments, and bounding box are very straightforward. The
computation of perimeter is very complicated. The way I implemented it was in
two passes. The first pass converts to run code. The second pass processes the
run codes for two consecutive rows. It looks at the starting and ending columns
of a region in each row. It also looks at their colors (B or W), whether or not
they were previously encountered, and whether a region is new or old, or a
bridge between two old regions. There are lots of possible states, but I deal
explicitly with the only states that are important. 
The x centroid Xbar is determined by adding the contribution for each row. In
each row, if the pixels of a region are at y1, y2, ..., yn, then just add up all
these values y1+y2+...+yn. (Or equivalently, n[y1+yn]/2.) After finishing all
the rows, you have to divide the accumulated value by the area A. 
The y centroid Ybar is determined as follows: For each row x, let the length of
the run be n. Then just add x*n for all rows. When finished, divide by the area
A. 
Perimeter is really complicated for 2 reasons: 
(1) A region may contain an interior region, so there is an interior perimeter
as well as an exterior perimeter. Since people usually want perimeter to be only
the exterior perimeter, the perimeter of the interior region has to be
subtracted at the end of the computation. 
(2) When analyzing a particular row, you can only compute the perimeter
contribution of the prior row. (This is different from area and centroid, where
the computation is for the current row.) I found the perimeter computation very
complicated, and I no longer remember how it works. The moment computation
accumulates X^2, XY, and Y^2. But you really want (X-Xbar)^2, (X-Xbar)*(Y-Ybar),
and (Y-Ybar)^2. So there is an adjustment at the end of the computation once the
centroid values Xbar and Ybar are known. 
// Case        1       2       3       4       5       6       7       8 
// LastRow |xxx    |xxxxoo |xxxxxxx|xxxxxxx|ooxxxxx|ooxxx  |ooxxxxx|    xxx| 
// ThisRow |    yyy|    yyy|  yyyy |  yyyyy|yyyyyyy|yyyyyyy|yyyy   |yyyy   | 
// Here o is optional 
At each stage, the algorithm is scanning through two consecutive rows, termed
LastRow and ThisRow. During this scan, it sees a region in each row. 
In Cases 1-4, the region in LastRow starts 2 or more columns before the region
in ThisRow. 
In Case 1, the region in LastRow ends one or more columns before the region in
ThisRow starts. Therefore, these regions are NOT connected. In Case 2, the
region in LastRow starts before the region in ThisRow, but the LastRow region
continues at least to the column just before the region in ThisRow starts. Or it
continues further, but it ends before the region in ThisRow ends. Therefore, if
these regions have the same color, then they are connected. 
(Note that I am using 6/2 connectivity, which means that at a corner like this 
01 
10 
the 0's are connected but the 1s are not connected. In other words, the
connectivity of a pixel is given by this mask: 
110 
1X1 
011 
i.e., pixel X is connected to the 6 pixels with a 1 but is not connected to the
2 pixels with a 0. This introduces a slight bias in favor of regions that slope
to the left, but I consider it preferable to using either 4-connectivity or
8-connectivity. 
*** I believe that the rickypetit cvblobslib generalizes this to allow the user
to choose the form of connectivity.) 
In Cases 3 and 4, the region in LastRow starts before the region in ThisRow. 
In Case 3, the region in LastRow continues beyond the region in ThisRow; in Case
4 they end at the same column. In Cases 5, 6, 7, the region in LastRow starts at
or after the column of the region in ThisRow. The distinction among these three
cases is which region ends first. 
In all of Cases 3 - 7, if the colors match then the regions are connected.
Finally, Case 8 has the region in ThisRow end one column before the region in
LastRow. 
These are ALL the possible cases. Depending on the case, the algorithm then
advances to the next region either in LastRow, or ThisRow, or both. When it
exhausts both rows, it then increments the row by one, so LastRow <- ThisRow,
and ThisRow is the next row encountered.
Sight Beyond Sight

Friday, February 22, 2008

Dave Grossman comments on Blob Analysis Library

No comments:

People

Lightdraw Videos

References

Inspiration

Links

Urban Digital Artists for Freedom

C++ Cheat Sheet

Archives

To Do List