7/16/2013

SLAM (Simultaneous Localization And Mapping) and the AUTONOMOUS ROBOT

Simultaneous Localization And Mapping (SLAM) is a technique used by robots and autonomous vehicles to build up a map within an unknown environment (without a priori knowledge), or to update a map within a known environment (with a priori knowledge from a given map), while at the same time keeping track of their current location.

Operational definition

Maps are used to determine a location within an environment and to depict an environment for planning and navigation; they support the assessment of actual location by recording information obtained from a form of perception and comparing it to a current set of perceptions. The benefit of a map in aiding the assessment of a location increases as the precision and quality of the current perceptions decrease. Maps generally represent the state at the time that the map is drawn; this is not necessarily consistent with the state of the environment at the time the map is used.

The complexity of the technical processes of locating and mapping under conditions of errors and noise does not allow for a coherent solution of both tasks at once. Simultaneous localization and mapping (SLAM) is a concept that binds these processes in a loop and thereby supports the continuity of both aspects as separate processes; iterative feedback from one process to the other enhances the results of both consecutive steps.
Mapping is the problem of integrating the information gathered by a set of sensors into a consistent model and depicting that information as a given representation. It can be described by the first characteristic question: What does the world look like? Central aspects in mapping are the representation of the environment and the interpretation of sensor data.
In contrast to this, localization is the problem of estimating the place (and pose) of the robot relative to a map; in other words, the robot has to answer the second characteristic question: Where am I? Typically, solutions comprise tracking, where the initial place of the robot is known, and global localization, in which no or just some a priori knowledge of the environmental characteristics of the starting position is given.
SLAM is therefore defined as the problem of building a model leading to a new map, or repetitively improving an existing map, while at the same time localizing the robot within that map. In practice, the answers to the two characteristic questions cannot be delivered independently of each other.
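In probabilistic terms, this joint problem is commonly written (a standard formulation from the SLAM literature, added here for concreteness rather than drawn from the text above) as estimating the posterior

p(x_{1:t}, m \mid z_{1:t}, u_{1:t})

where x_{1:t} is the sequence of robot poses, m is the map, z_{1:t} are the sensor observations, and u_{1:t} are the control or odometry inputs.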
Before a robot can contribute to answering the question of what the environment looks like, given a set of observations, it needs to know e.g.:
  • the robot's own kinematics,
  • which qualities the autonomous acquisition of information has, and,
  • from which sources additional supporting observations have been made.
It is a complex task to estimate the robot's current location without a map or without a directional reference.[1] "Location" may refer to simply the position of the robot or might also include its orientation.

Technical problems

SLAM can be thought of as a chicken or egg problem: An unbiased map is needed for localization while an accurate pose estimate is needed to build that map. This is the starting condition for iterative mathematical solution strategies.
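A minimal sketch of that iterative structure in C++ is shown below; the types and the empty correct() step are placeholder stubs for illustration, not a particular published algorithm.

#include <vector>
#include <cstdio>

struct Pose        { double x, y, theta; };
struct Landmark    { double x, y; };
struct Map         { std::vector<Landmark> landmarks; };
struct Odometry    { double dx, dy, dtheta; };
struct Observation { double range, bearing; };

// Motion model: predict the new pose from the previous estimate and odometry.
Pose predictPose(const Pose& p, const Odometry& u)
{
    Pose q = p;
    q.x += u.dx;  q.y += u.dy;  q.theta += u.dtheta;
    return q;
}

// Correction step: data association and filtering (EKF, particle filter, ...)
// would refine both the pose and the map here.  Left as a stub.
void correct(Pose& pose, Map& map, const std::vector<Observation>& z)
{
    (void)pose; (void)map; (void)z;
}

// One SLAM iteration: predict with the old estimate, then correct with new data.
void slamStep(Pose& pose, Map& map,
              const Odometry& u, const std::vector<Observation>& z)
{
    pose = predictPose(pose, u);
    correct(pose, map, z);
}

int main()
{
    Pose pose = {0.0, 0.0, 0.0};
    Map  map;
    for (int t = 0; t < 3; ++t)
    {
        Odometry u = {1.0, 0.0, 0.0};   // placeholder odometry
        std::vector<Observation> z;     // placeholder observations
        slamStep(pose, map, u, z);
        std::printf("t=%d  pose=(%.1f, %.1f)\n", t, pose.x, pose.y);
    }
    return 0;
}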

Beyond that, answering the two characteristic questions is not as straightforward as it might sound, owing to inherent uncertainties in discerning the robot's relative movement from its various sensors. Because of the amount of noise present in any real system, SLAM is generally not solved by a single compact method, but by a combination of physical and statistical concepts that each contribute to the result.
If, at the next iteration of map building, the measured distance and direction traveled carry some inaccuracy, driven by the limited inherent precision of the sensors and by additional ambient noise, then any features being added to the map will contain corresponding errors. Over time and motion, locating and mapping errors build cumulatively, grossly distorting the map and therefore the robot's ability to determine its actual location and heading with sufficient accuracy.
There are various techniques to compensate for errors, such as recognizing features that it has come across previously (i.e., data association or loop closure detection), and re-skewing recent parts of the map to make sure the two instances of that feature become one. Statistical techniques used in SLAM include Kalman filters, particle filters (a.k.a. Monte Carlo methods) and scan matching of range data. They provide an estimation of the posterior probability function for the pose of the robot and for the parameters of the map. Set-membership techniques are mainly based on interval constraint propagation.[2][3] They provide a set which encloses the pose of the robot and a set approximation of the map.
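As a concrete illustration of the filtering idea (a toy example, not taken from the references above): a one-dimensional Kalman filter that fuses noisy odometry with a noisy absolute range measurement. The noise variances and readings below are made up.

#include <cstdio>

int main()
{
    double x = 0.0, P = 1.0;          // state estimate and its variance
    const double Q = 0.05, R = 0.20;  // odometry / sensor noise variances

    const double u[] = {1.0, 1.0, 1.0};    // odometry steps
    const double z[] = {0.9, 2.1, 3.05};   // noisy absolute range readings

    for (int k = 0; k < 3; ++k)
    {
        // Predict: propagate the estimate through the motion model.
        x += u[k];
        P += Q;

        // Update: blend in the measurement, weighted by the Kalman gain.
        double K = P / (P + R);
        x += K * (z[k] - x);
        P *= (1.0 - K);

        std::printf("step %d: x = %.3f, var = %.3f\n", k, x, P);
    }
    return 0;
}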

Mapping

SLAM in the mobile robotics community generally refers to the process of creating geometrically consistent maps of the environment. Topological maps are a method of environment representation which capture the connectivity (i.e., topology) of the environment rather than creating a geometrically accurate map. As a result, algorithms that create topological maps are not referred to as SLAM.

SLAM is tailored to the available resources, hence not aimed at perfection, but at operational compliance. The published approaches are employed in unmanned aerial vehicles, autonomous underwater vehicles, planetary rovers, newly emerging domestic robots and even inside the human body.[4]
It is generally considered that "solving" the SLAM problem has been one of the notable achievements of robotics research in the past decades.[5] The related problems of data association and computational complexity are among the problems yet to be fully resolved.
A significant recent advance in the feature-based SLAM literature is the re-examination of the probabilistic foundation of SLAM, in which it is posed as multi-object Bayesian filtering with random finite sets; this formulation provides superior performance to leading feature-based SLAM algorithms in challenging measurement scenarios with high false-alarm and high missed-detection rates, without the need for data association.[6]

Sensing

SLAM will always use several different types of sensors to acquire data with statistically independent errors. Statistical independence is the mandatory requirement to cope with metric bias and with measurement noise.

Such sensors may be one-dimensional (single beam) or 2D (sweeping) laser rangefinders, 3D Flash LIDAR, 2D or 3D sonar sensors, and one or more 2D cameras. Since 2005, there has been intense research into VSLAM (visual SLAM) using primarily visual (camera) sensors, because of the increasing ubiquity of cameras such as those in mobile devices.[7]
Recent approaches apply quasi-optical wireless ranging for multi-lateration (RTLS) or multi-angulation in conjunction with SLAM, to account for the erratic nature of wireless measurements.
A special kind of SLAM for human pedestrians uses a shoe mounted inertial measurement unit as the main sensor and relies on the fact that pedestrians are able to avoid walls. This approach called FootSLAM can be used to automatically build floor plans of buildings that can then be used by an indoor positioning system.[8]
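To make the independence argument at the top of this section concrete, here is a small sketch (illustrative values only) of fusing two statistically independent range measurements by inverse-variance weighting; the fused estimate has lower variance than either sensor alone.

#include <cstdio>

int main()
{
    double z1 = 4.10, var1 = 0.09;   // e.g. laser rangefinder reading and variance
    double z2 = 3.85, var2 = 0.25;   // e.g. sonar reading and variance

    double w1 = 1.0 / var1, w2 = 1.0 / var2;
    double fused    = (w1 * z1 + w2 * z2) / (w1 + w2);   // inverse-variance weighting
    double fusedVar = 1.0 / (w1 + w2);                   // always < min(var1, var2)

    std::printf("fused = %.3f m, variance = %.3f (vs %.2f and %.2f)\n",
                fused, fusedVar, var1, var2);
    return 0;
}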

Locating

The results from sensing will feed the algorithms for locating. According to propositions of geometry, any sensing must include at least one lateration and (n+1) determining equations for an n-dimensional problem. In addition, there must be some additional a priori knowledge about orienting the results versus absolute or relative systems of coordinates with rotation and mirroring.
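A hedged example of the "(n+1) equations for an n-dimensional problem" remark: 2-D trilateration from three range measurements to beacons at known positions. The beacon coordinates and ranges below are made-up values consistent with a robot near (3, 4).

#include <cstdio>

int main()
{
    double bx[3] = {0.0, 10.0, 0.0};     // beacon x coordinates (known)
    double by[3] = {0.0,  0.0, 8.0};     // beacon y coordinates (known)
    double r [3] = {5.00, 8.06, 5.00};   // measured ranges to the robot

    // Subtracting the first circle equation from the other two gives a
    // linear 2x2 system A * [x y]^T = b.
    double A11 = 2.0*(bx[1]-bx[0]), A12 = 2.0*(by[1]-by[0]);
    double A21 = 2.0*(bx[2]-bx[0]), A22 = 2.0*(by[2]-by[0]);
    double b1  = r[0]*r[0] - r[1]*r[1] + bx[1]*bx[1] + by[1]*by[1]
                                       - bx[0]*bx[0] - by[0]*by[0];
    double b2  = r[0]*r[0] - r[2]*r[2] + bx[2]*bx[2] + by[2]*by[2]
                                       - bx[0]*bx[0] - by[0]*by[0];

    double det = A11*A22 - A12*A21;      // solve by Cramer's rule
    double x = (b1*A22 - A12*b2) / det;
    double y = (A11*b2 - b1*A21) / det;

    std::printf("estimated position: (%.2f, %.2f)\n", x, y);
    return 0;
}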

Modeling

Mapping may be done with a 2D model and a corresponding 2D representation, or with a 3D model and a 2D projective representation. As part of the model, the kinematics of the robot is included, to improve estimates of sensing under conditions of inherent and ambient noise. The dynamic model balances the contributions from the various sensors and the various partial error models, and finally combines them into a single virtual depiction: a map containing the location and heading of the robot as a cloud of probability. Mapping is the final depiction of such a model; the map is either that depiction or the abstract term for the model itself.
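A sketch of the kinematic part of such a model (noise terms omitted, numbers purely illustrative): a simple unicycle-style odometry update that propagates the robot's pose estimate.

#include <cmath>
#include <cstdio>

struct Pose { double x, y, theta; };

// Drive forward d units along the current heading, then turn by dtheta radians.
Pose integrateOdometry(Pose p, double d, double dtheta)
{
    p.x     += d * std::cos(p.theta);
    p.y     += d * std::sin(p.theta);
    p.theta += dtheta;
    return p;
}

int main()
{
    const double PI = 3.14159265358979;
    Pose p = {0.0, 0.0, 0.0};

    double d[]      = {1.0, 1.0, 1.0};        // forward distances (made up)
    double dtheta[] = {0.0, PI / 2.0, 0.0};   // heading changes   (made up)

    for (int k = 0; k < 3; ++k)
    {
        p = integrateOdometry(p, d[k], dtheta[k]);
        std::printf("pose: x=%.2f  y=%.2f  theta=%.2f rad\n", p.x, p.y, p.theta);
    }
    return 0;
}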


BoofCV

BoofCV is an open source Java library for real-time computer vision and robotics applications. Written from scratch for ease of use and high performance, it often outperforms even native libraries. Functionality includes optimized low-level image processing routines, feature tracking, and geometric computer vision. BoofCV has been released under an Apache license for both academic and commercial use.
BoofCV is organized into several packages: image processing, features, geometric vision, calibration, visualize, and IO. Image processing contains commonly used image processing functions which operate directly on pixels. Features contains feature extraction algorithms for use in higher level operations. Calibration has routines for determining the camera's intrinsic and extrinsic parameters. Geometric vision is composed of routines for processing extracted image features using 2D and 3D geometry. Visualize has routines for rendering and displaying extracted features. IO stands for input/output and contains common routines for reading in images from various input sources.

Open SLAM

The simultaneous localization and mapping (SLAM) problem has been intensively studied in the robotics community in the past. Different techniques have been proposed but only a few of them are available as implementations to the community. The goal of OpenSLAM.org is to provide a platform for SLAM researchers which gives them the possibility to publish their algorithms. OpenSLAM.org provides every interested SLAM researcher with a Subversion (svn) repository and a small webpage in order to publish and promote their work. In the repository, only the authors have full access to the files; other users are restricted to read-only access. OpenSLAM.org does not aim to provide a repository for the daily development process of early SLAM implementations. Published algorithms should have a certain degree of robustness.

OpenSLAM.org does not force the authors to give away the copyright for their code. We only require that the algorithms are provided as source code and that the authors allow users to use and modify the source code for their own research. Any commercial application, redistribution, etc., has to be arranged between users and authors individually.



VIDEO FEEDS OF SLAM

6/26/2013

Sidewalk Video

Original Video:                            sidewalk.avi
“Difference Image” Video (shows motion):   sidewalk_absdiff.avi
Result Video:                              sidewalk_output.avi

(yes, the “Difference Image” video may run fast (29.97FPS), but this has been corrected via dynamic FPS matching of source to destination video. I leave this video here only for illustration, as it is, of course, half the size of the other video).

Theory

  1. Mask out the sidewalk region of the image.
  2. For each frame, take a difference image against the starting “background” scene.
  3. When a person enters the scene from the left or right hand side of the screen, it will break the background. When the background becomes visible again “behind” the person (i.e., the person is bounded and at least 1 pixel is “on-screen”), the person is completely represented.
  4. For the extra credit (differentiating between people and the motorcycle rider): people are limited to two main speeds: “walk” and “run”. Anything faster than this difference in position between frames must be a “ride” action.
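A compressed sketch of steps 1–2, using the same OpenCV 1.x C API as the full program further down this page (display, error handling, and the counting logic are omitted; the ROI coordinates match the ones used below):

#include <cv.h>
#include <highgui.h>

int main()
{
    CvCapture* cap   = cvCaptureFromFile("sidewalk.avi");
    IplImage*  frame = cvQueryFrame(cap);                 // frame 0 = background
    IplImage*  gray  = cvCreateImage(cvGetSize(frame), IPL_DEPTH_8U, 1);
    IplImage*  bg    = cvCreateImage(cvGetSize(frame), IPL_DEPTH_8U, 1);
    IplImage*  diff  = cvCreateImage(cvGetSize(frame), IPL_DEPTH_8U, 1);
    cvCvtColor(frame, bg, CV_BGR2GRAY);

    while ((frame = cvQueryFrame(cap)) != NULL)
    {
        cvCvtColor(frame, gray, CV_BGR2GRAY);
        cvAbsDiff(gray, bg, diff);                                  // step 2: difference image
        cvSetImageROI(diff, cvRect(72, 148, 265 - 72, 214 - 148));  // step 1: sidewalk region only
        // ... look for motion inside the sidewalk ROI here ...
        cvResetImageROI(diff);
    }

    cvReleaseImage(&gray);
    cvReleaseImage(&bg);
    cvReleaseImage(&diff);
    cvReleaseCapture(&cap);
    return 0;
}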

Messing Around (Feel free to grab the modified code)

  1. I decided to see what would happen using the HMI example as an overlay. Now, although I suppose I could just grab this wholesale, I’ll stick with what I’ve got and go from there (ROI, HMI detection, figure detection).
(Although I’ll tell you what I will use: the realization that a center-of-mass measurement can be taken over time via HMI. And by its motion, I ought to be able to tell if it’s moving at a “riding”, “running”, or “walking” speed. Also, I can maybe get an area (e.g. BWAREA) and make blobs over a certain size “cars” and those under the threshold “people”. And “people” moving at a “riding” speed will yield “motorcycle”.)
So goes the theory, anyway.
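A hedged sketch of that classification idea: track the blob's center of mass from frame to frame and label it by area and speed. All of the thresholds below are made-up placeholders, not tuned values.

#include <cmath>
#include <cstdio>

struct Centroid { double x, y; };

// Displacement of the centre of mass between two consecutive frames.
double pixelsPerFrame(Centroid prev, Centroid curr)
{
    return std::sqrt((curr.x - prev.x) * (curr.x - prev.x) +
                     (curr.y - prev.y) * (curr.y - prev.y));
}

const char* classifyBlob(double area, double speed)
{
    if (area  > 2000.0) return "car";          // large blob -> vehicle
    if (speed > 12.0)   return "motorcycle";   // person-sized blob at "riding" speed
    if (speed > 5.0)    return "running person";
    return "walking person";
}

int main()
{
    Centroid prev = {100.0, 180.0}, curr = {115.0, 181.0};
    double speed = pixelsPerFrame(prev, curr);
    std::printf("speed = %.1f px/frame -> %s\n", speed, classifyBlob(900.0, speed));
    return 0;
}

With real tracking, the speed would be averaged over several frames before classifying.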

OpenCV Code that Counts People Walking on the Sidewalk

Note: I tried to use contours and then do SeqPops off the sequence stack, but I just couldn’t get contours to work. So here we have a wildly inaccurate algorithm which finds when there is motion within the scene and then plots a “tracer” accordingly.
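For reference, here is a hedged sketch of how the contour attempt might be made to work (it is not used in the program below): cvFindContours wants an 8-bit, single-channel binary image, it modifies that image in place, and it takes the address of a CvSeq* rather than an uninitialized CvSeq**. The threshold and minimum-area values are assumptions.

#include <cv.h>
#include <highgui.h>
#include <cmath>

// diffImage: the grayscale difference image; canvas: a BGR image to draw on.
// Returns the number of contours found.
int drawMotionContours(IplImage* diffImage, IplImage* canvas)
{
    IplImage* binary = cvCloneImage(diffImage);              // work on a copy
    cvThreshold(binary, binary, 40, 255, CV_THRESH_BINARY);  // binarize the motion image

    CvMemStorage* storage = cvCreateMemStorage(0);
    CvSeq* contour = 0;
    int found = cvFindContours(binary, storage, &contour, sizeof(CvContour),
                               CV_RETR_EXTERNAL, CV_CHAIN_APPROX_SIMPLE);

    for (CvSeq* c = contour; c != 0; c = c->h_next)
    {
        if (std::fabs(cvContourArea(c)) > 50.0)              // ignore tiny blobs
            cvDrawContours(canvas, c, CV_RGB(0, 255, 0), CV_RGB(0, 255, 0), 0, 1, 8);
    }

    cvReleaseMemStorage(&storage);
    cvReleaseImage(&binary);
    return found;
}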
/*
 Author:  Chris Pilson, cpilson@iastate.edu
 Program Name: hw3_sidewalk
 Description: This program will read a visual scene (devoid of motion at first)
     and then use a region within this scene to detect people walking or 
     running through the scene.  The people will be counted, and this 
     count presented on-screen.
 High-Level Analysis:
     Tasks:
      DONE - (1) Use background scene as a "baseline"; display rest of video
      against the difference image created by ABSDIFFing(currentFrame,baseFrame)
      to get movement through the video.
      (2) Define a region of interest that covers the sidewalk; just look
      within this ROI poly to find motion.
      (3) For each blob in the difference image (e.g. each thing "different"
      from the baseline frame), see if it fits with the size and speed data
      that would be a:
       [optional] Vehicle (car/bus)
       Motorcycle/Bike (human, but moving quickly)
       Humanoid (running)
       Humanoid (walking)
      (4) Keep a running count of each category
*/
// MSVC++ .NET include.
#include "stdafx.h" 
 
// C++ includes.
#include <iostream>   // std::cout
#include <cstdio>     // sprintf
#include <cstdlib>
#include <windows.h>  // Sleep() (only needed if the commented-out frame-rate code is re-enabled)
 
// OpenCV includes.
#include <cv.h>       // OpenCV 1.x core
#include <highgui.h>  // capture, display, and video writing
 
#define DEBUG 1
 
// Set up the AVI Writer object
typedef struct CvAVIWriter CvAVIWriter;
 
int main(int argc, char* argv[])
{
  // STEP 1:
 // Bring the video file (AVI) in.
 CvCapture* VideoFile = cvCaptureFromFile("sidewalk.avi");
 if (VideoFile == NULL)
 {
  std::cout << "Uh-oh.  Either the input file doesn't exist, or OpenCV cannot read it." << std::endl;
  return 1;
 }
 
 // Now let's set up the frame size so that we can vomit out a video...
 CvSize frame_size;
 frame_size.height = cvGetCaptureProperty(VideoFile, CV_CAP_PROP_FRAME_HEIGHT);
 frame_size.width = cvGetCaptureProperty(VideoFile, CV_CAP_PROP_FRAME_WIDTH);
 // We'll go ahead and say that the AVI file is loaded now:
 if(DEBUG)
 {std::cout << "Brought in AVI file." << std::endl;}
 
 // Figure out what our incoming movie file looks like
 double FPS = cvGetCaptureProperty(VideoFile, CV_CAP_PROP_FPS);
 double FOURCC = cvGetCaptureProperty(VideoFile, CV_CAP_PROP_FOURCC);
 if(DEBUG)
 {
  std::cout << "FPS:  " << FPS << std::endl;
  std::cout << "FOURCC:  " << FOURCC << std::endl;
 }
 
 // Create a CvVideoWriter.  The arguments are the name of the output file (must be .avi), 
 // a macro for a four-character video codec installed on your system, the desired frame 
 // rate of the video, and the video dimensions.
 CvVideoWriter* videoWriter = cvCreateVideoWriter("sidewalk_output.avi",CV_FOURCC('D', 'I', 'V', 'X'), FPS, cvSize(frame_size.width, frame_size.height));
 // Now we can say that the VideoWriter is created:
 if(DEBUG)
 {std::cout << "videoWriter is made." << std::endl;}
 
 // Make display windows
 cvNamedWindow("background Frame", CV_WINDOW_AUTOSIZE);
 cvNamedWindow("current Frame", CV_WINDOW_AUTOSIZE);
 cvNamedWindow("diff Frame", CV_WINDOW_AUTOSIZE);
 cvNamedWindow("output Frame", CV_WINDOW_AUTOSIZE);
 cvNamedWindow("ROI Frame", CV_WINDOW_AUTOSIZE);
 cvNamedWindow("ROI Frame (Color)", CV_WINDOW_AUTOSIZE);
 
 // Keep track of frames
 static int imageCount = 0;
 
 // Set up images.
 IplImage* diffFrame = cvCreateImage(cvSize(frame_size.width, frame_size.height), IPL_DEPTH_8U, 1);
 IplImage* backgroundFrame, *eig_image, *temp_image;
 IplImage* currentFrame = cvCreateImage(cvSize(frame_size.width, frame_size.height), IPL_DEPTH_8U, 1);
 IplImage* outFrame = cvCreateImage(cvSize(frame_size.width, frame_size.height), IPL_DEPTH_8U, 3);
 IplImage* tempFrameBGR = cvCreateImage(cvSize(frame_size.width, frame_size.height), IPL_DEPTH_8U, 3);
 IplImage* ROIFrame = cvCreateImage(cvSize((265-72), (214-148)), IPL_DEPTH_8U, 1);
 IplImage* ROIFrame2 = cvCreateImage(cvSize((265-72), (214-148)), IPL_DEPTH_8U, 1);
 IplImage* ROIFrameBGR = cvCreateImage(cvSize((265-72), (214-148)), IPL_DEPTH_8U, 3);
 IplImage* ROIFrameBGRPrior = cvCreateImage(cvSize((265-72), (214-148)), IPL_DEPTH_8U, 3);
 
 // And now set up the data for MinMaxLoc (for ROI image)
 double minVal, maxVal;
 CvPoint minLoc, maxLoc, outPoint;
 
 // Initialize our contour information...
 int contours=0;
 CvMemStorage* storage = cvCreateMemStorage(0);
 CvSeq** firstContour;
 int headerSize;
 CvSeq* contour = 0;
 int color = (0, 0, 0);
 CvContourScanner ContourScanner=0;
 
 // Zero out the people-counting image...
 cvZero(ROIFrameBGR);
 
 int people=0;
 int MOVEMENT=0;
 
 // There's gotta be a better way to do this... like with threading?
 while(1)
 {
  // Let's try to threshold this at 15FPS - the input rate.
  // 66 is used as it's 1/15 * 1000...  
  // Wait a second!  I have the FPS here.  *sigh*  Let's do this dynamically:
  //Sleep((1000/FPS)-10);
  // Awesome.  The video runs in actual time now, after subtracting out the 10ms from WaitKey().
 
  IplImage* tempFrame = cvQueryFrame(VideoFile);
  // If the video HAS a current frame...
  if (tempFrame != NULL)
  {
   // The video is BGR-space.  I wish there were a cvGetColorSpace command or something...
   cvCvtColor(tempFrame, currentFrame, CV_BGR2GRAY);
   // Grrr ... flipped.
   cvFlip(currentFrame);
   // Get initial "background" image...
   if (imageCount==0)
   {
    //IplImage* backgroundFrame = cvCloneImage(currentFrame);
    backgroundFrame = cvCloneImage(currentFrame);
   }
   cvShowImage("background Frame", backgroundFrame);
   cvShowImage("current Frame", currentFrame);
   cvAbsDiff(currentFrame,backgroundFrame,diffFrame);
   if(DEBUG)
   {std::cout << "Pulled in video grab of frame " << imageCount << "." << std::endl;}
 
   // Back to color ...
   cvCvtColor(diffFrame, outFrame, CV_GRAY2BGR);
   // Now let's go ahead and put up a box (rect, actually) for our ROI.
   // (72, 148)+-----------------------+(265, 148)
   //   |      |
   // (72, 214)+-----------------------+(265, 214)
   //MotionRegion cvRect(72, 148, (265-72), (214-148));
   //cvRectangle(outFrame, cvPoint(72, 148), cvPoint(265, 214), CV_RGB(255, 0, 255), 1);
   cvRectangle(diffFrame, cvPoint(72, 148), cvPoint(265, 214), CV_RGB(255, 0, 255), 1);
   cvShowImage("diff Frame", diffFrame);
   cvCvtColor(backgroundFrame, tempFrameBGR, CV_GRAY2BGR);
   cvFlip(tempFrame);
   // ROIFrame is BW.
   ROIFrame = cvCloneImage(outFrame);
   cvSetImageROI(ROIFrame, cvRect(72, 148, (265-72), (214-148)));
   //cvOr(outFrame, tempFrame, outFrame);
   cvShowImage("ROI Frame", ROIFrame);
   // Great.  The ROI Frame works, almost as an "inset".
   // Now let's find when motion exists within the ROI.
   // First:  the cumbersome way...
   
   cvSetImageCOI(ROIFrame, 1);
   cvMinMaxLoc(ROIFrame, &minVal, &maxVal, &minLoc, &maxLoc, NULL);
   if (maxVal < 100)
   {
    // Zero out the LAST people-counting image...
    cvZero(ROIFrameBGRPrior);
    MOVEMENT=0;
   }
   if(maxVal > 100)
   {
    cvSetImageCOI(ROIFrameBGRPrior, 1);
    // We are starting a motion sequence...
    if( (MOVEMENT==0) && (cvCountNonZero(ROIFrameBGRPrior)==0) )
    {
     // Zero out the people-counting image...
     cvZero(ROIFrameBGR);
     MOVEMENT=1;
     people++;
     if(DEBUG)
     {std::cout << "ROI has counted " << people << " people." << std::endl;}
    }
 
    if(DEBUG)
    {std::cout << "We have motion in the ROI!  maxVal: " << maxVal << " minVal: " << minVal << std::endl;}
    // Phew.  Okay, we can figure out when there's motion within the ROI.  Good.
    // Now let's see what we can do with contours.
    //contours = cvFindContours(ROIFrame, storage, firstContour, headerSize=sizeof(CvContour), CV_RETR_LIST, CV_CHAIN_APPROX_SIMPLE);
    //contours = cvFindContours(ROIFrame, storage, &contour, sizeof(CvContour), CV_RETR_CCOMP, CV_CHAIN_APPROX_SIMPLE);
    // Bah.  Couldn't do anything with them.  :(
    // Let's instead try to put a dot on people that are moving...
    //cvCircle( CvArr* img, CvPoint center, int radius, double color, int thickness=1 )
    cvCircle(ROIFrameBGR, maxLoc, 1, CV_RGB(255, 0, 0), 1);
    ROIFrameBGRPrior = cvCloneImage(ROIFrameBGR);
   }
   cvShowImage("ROI Frame (Color)", ROIFrameBGR);
   /*
   // Now:  a better way - we'll know there's motion if contours>0.
   ROIFrameBGR = cvCloneImage(ROIFrame);
   cvCvtColor(ROIFrame, ROIFrameBGR, CV_GRAY2BGR);
   cvSetImageCOI(ROIFrame, 1);
 
   contours = cvFindContours(ROIFrame, storage, &contour, sizeof(CvContour), CV_RETR_CCOMP, CV_CHAIN_APPROX_SIMPLE);
   if (contours > 0)
   {
    if(DEBUG)
    {std::cout << "We have motion in the ROI!" << std::endl;}
   }
   
   // Draw out the contours
   for( ; contour != 0; contour = contour->h_next )
   {
    // replace CV_FILLED with 1 to see the outlines
    cvDrawContours( ROIFrameBGR, contour, CV_RGB( rand(), rand(), rand() ), CV_RGB( rand(), rand(), rand() ), -1, CV_FILLED, 8 );
   }
   cvShowImage("ROI Frame (Color)", ROIFrameBGR);
   */
 
   // Write the current frame to an output movie.
   //cvWriteFrame(videoWriter, diffFrame);
   // Build up the output ...
   cvOr(outFrame, tempFrame, outFrame);
   // ... and draw the ROI rectangle.
   cvRectangle(outFrame, cvPoint(72, 148), cvPoint(265, 214), CV_RGB(255, 0, 255), 1);
   char peopleCount[32];
   if (people==1)
   {sprintf(peopleCount, "%d person", people);}
    else
    {sprintf(peopleCount, "%d people", people);}
   CvFont font;
   cvInitFont(&font, CV_FONT_HERSHEY_SIMPLEX, 0.8, 0.8, 0, 2);
   cvPutText(outFrame, peopleCount, cvPoint(0, 25), &font, cvScalar(0, 0, 300));
   cvShowImage("output Frame", outFrame);
   cvWriteFrame(videoWriter, outFrame);
   if(DEBUG)
   {std::cout << "Wrote frame to output AVI file." << std::endl;}
   imageCount++;
   } // end if (image != NULL) loop
  
  // This will return the code of the pressed key or -1 if
  // nothing was pressed before 10 ms elapsed.
  int keyCode = cvWaitKey(10);
  if ( (keyCode == 's') || (keyCode == 'S') )
  {
   while(1)
   {
    keyCode = cvWaitKey(10);
    if ( (keyCode == 's') || (keyCode == 'S') )
    {
     keyCode = 999;
     break;
    }
   }
  }
 
  // But the video may have ended...
   if( ((tempFrame == NULL) || (keyCode >= 0)) && (keyCode != 999) )
  {
   // Either the video is over or a key was pressed.
   // Dump the video file.
   cvReleaseCapture(&VideoFile);
   // Release the videoWriter from memory.
   cvReleaseVideoWriter(&videoWriter);
   // Release images from memory...
   cvReleaseImage(&currentFrame);
   //cvReleaseImage(&diffFrame);
   // ... And destroy the windows.
   cvDestroyWindow("Video Frame");
   std::cout << "Released VideoFile and VideoWriter." << std::endl;
    return 0;
  }
 }// end while loop
 return 0;
}



Hands Video

Original Video:   hands.avi
Output Video:     hands_output.avi

Theory

  1. The 3 items are solidly colored and likely have “hard” edges.
  2. Each item has a shadow below it.
  3. Each item has a predominance of color associated with it - it’s not a completely single-colored object, but it’s “good enough”. I’ll likely want a Sobel edge detector on this.
  4. One item is “counted” when it becomes occluded by a flesh-colored object (the hand) that rests on the object for a thresholded period of time (0.5 second?). What this means is that the hand enters a ROI (e.g. the “book” area) and then leaves. Upon entry, a “count” is registered.
  5. We can tell which item was occluded when, in a frame, the object's area is less than its baseline area.
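A hedged sketch of step 5 (not implemented in the program below, which uses per-pixel color checks instead): count the bright pixels inside a book's rectangle on the thresholded frame and compare against a baseline count taken on the first frame. The half-baseline threshold is an assumption.

#include <cv.h>

// binaryFrame: the thresholded grayscale frame; rect: the book's rectangle.
// Returns true when less than half of the baseline pixel count is still visible.
bool bookOccluded(IplImage* binaryFrame, CvRect rect, int baselineCount)
{
    cvSetImageROI(binaryFrame, rect);
    int visibleNow = cvCountNonZero(binaryFrame);   // "area" still visible
    cvResetImageROI(binaryFrame);
    return visibleNow < baselineCount / 2;
}

Here baselineCount would simply be the same cvCountNonZero taken over the rectangle on frame 0.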

How Many Times Was Each Book Touched? Code in OpenCV


Notes: I have an issue with pointers right now; aside from that, I believe the code to be largely working. New code, largely working. Concessions on the “multi-add problem” are made inline — e.g., I need a skin detector here.
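Here is a hedged sketch of the skin detector mentioned above: threshold in HSV space to build a binary skin mask. The hue/saturation bounds are generic, untuned values, and the helper name is hypothetical.

#include <cv.h>

// frameBGR: the current color frame.  Returns a freshly allocated 8-bit mask
// (caller releases it) with skin-colored pixels set to 255.
IplImage* skinMask(IplImage* frameBGR)
{
    IplImage* hsv  = cvCreateImage(cvGetSize(frameBGR), IPL_DEPTH_8U, 3);
    IplImage* mask = cvCreateImage(cvGetSize(frameBGR), IPL_DEPTH_8U, 1);

    cvCvtColor(frameBGR, hsv, CV_BGR2HSV);
    // Rough skin range: low hue, moderate-to-high saturation and value.
    cvInRangeS(hsv, cvScalar(0, 40, 60, 0), cvScalar(25, 180, 255, 0), mask);

    cvReleaseImage(&hsv);
    return mask;
}

A hand would then count as "in" a book's rectangle when cvCountNonZero over that rectangle of the mask exceeds a small threshold, which should be more robust than the per-pixel BGR checks in DetectBlue/DetectRed/DetectYellow.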
/*
 Author:  Chris Pilson, cpilson@iastate.edu
 Program Name: hw3_hands
 Description: This program will read a visual scene and then 
     use a region within this scene to detect when an 
     onscreen object is being touched (occluded).  The 
     object will be counted, and this count presented 
     on-screen.
*/
// MSVC++ .NET include.
#include "stdafx.h" 
 
// C++ includes.
#include <iostream>   // std::cout
#include <cstdio>     // sprintf
#include <cstdlib>    // exit()
#include <string>
#include <windows.h>  // Sleep() (only needed if the commented-out frame-rate code is re-enabled)
 
// OpenCV includes.
#include <cv.h>       // OpenCV 1.x core
#include <highgui.h>  // capture, display, and video writing
 
// Do we want EXTREMELY verbose CLI output?
#define DEBUG 0
 
// Set up the AVI Writer object
typedef struct CvAVIWriter CvAVIWriter;
 
// Set up our 3 rectangles
  // BLUE BOOK:
  // (65, 62) +-----------------------+(80, 62)
  //   |      |
  // (65, 67) +-----------------------+(80, 67)
  // RED BOOK:
  // (47, 135)+-----------------------+(55, 135)
  //   |      |
  // (47, 140)+-----------------------+(55, 140)
  // YELLOW BOOK:
  // (148, 185)+----------------------+(153, 185)
  //    |      |
  // (148, 190)+----------------------+(153, 190)
#define BLUE_RECTANGLE cvRectangle(outFrame, cvPoint(65,62), cvPoint(80,67), CV_RGB(0, 0, 300), 1);
#define BLUE_RECTANGLE_FILLED cvRectangle(outFrame, cvPoint(65,62), cvPoint(80,67), CV_RGB(0, 0, 300), CV_FILLED);
int BLUE_RECT_WIDTH = (80-65);
int BLUE_RECT_HEIGHT = (67-62);
#define RED_RECTANGLE cvRectangle(outFrame, cvPoint(47, 135), cvPoint(55, 140), CV_RGB(300, 0, 0), 1);
#define RED_RECTANGLE_FILLED cvRectangle(outFrame, cvPoint(47, 135), cvPoint(55, 140), CV_RGB(300, 0, 0), CV_FILLED);
int RED_RECT_WIDTH = (55-47);
int RED_RECT_HEIGHT = (140-135);
#define YELLOW_RECTANGLE cvRectangle(outFrame, cvPoint(148, 185), cvPoint(153, 190), CV_RGB(255, 255, 128), 1);
#define YELLOW_RECTANGLE_FILLED cvRectangle(outFrame, cvPoint(148, 185), cvPoint(153, 190), CV_RGB(255, 255, 128), CV_FILLED);
int YELLOW_RECT_WIDTH = (153-148);
int YELLOW_RECT_HEIGHT = (190-185);
 
// Flag and count trackers...
// D'oh.  They're already globals.  *sigh*
static bool BLUEFLAG=false;
static bool REDFLAG=false;
static bool YELLOWFLAG=false;
static int BlueCount=0;
static int RedCount=0;
static int YellowCount=0;
 
 
// Function to detect when a hand is in the Blue Rectangle.  
// Returns BlueCount value (int), BLUEFLAG (bool - cast to int).
//std::string DetectBlue(IplImage* tempFrame, IplImage* outFrame, bool BLUEFLAG, int BlueCount)
int DetectBlue(IplImage* tempFrame, IplImage* outFrame, int BlueCount)
{
 uchar* temp_ptr;
 for (int i = 65; i < 65+BLUE_RECT_WIDTH; i++)
 {
  for (int j = 62; j < 62+BLUE_RECT_HEIGHT; j++)
  {
   // If the value at any (i,j) within the rectangles is flesh-colored, then set a flag...
   // This'll do if the Red value exceeds 150 at any pixel.
   temp_ptr = &((uchar*)(tempFrame->imageData + tempFrame->widthStep*j))[i*3];
   if (DEBUG)
   {std::cout << "Color values [(B, G, R)]: (" << (int)temp_ptr[0] << ", " << (int)temp_ptr[1] << ", " << (int)temp_ptr[2] << ")." << std::endl;}
   if ( (int)temp_ptr[2] > 200)
   {
    if (DEBUG)
    {std::cout << "BBBBBBBBBBBBBB Blue Book is being touched BBBBBBBBBBBB." << std::endl;}
    BLUE_RECTANGLE_FILLED;
    RED_RECTANGLE;
    //*REDFLAG=false;
    YELLOW_RECTANGLE;
    //*YELLOWFLAG=false;
    if (BLUEFLAG==false)
    {
     // Shove 'true' into BLUEFLAG and 'false' into other flags.
     BLUEFLAG = true;
     REDFLAG = false;
     YELLOWFLAG = false;
     BlueCount++;
      if (DEBUG)
      {std::cout << "--------- Blue Count: " << BlueCount << "." << std::endl;}
      return(BlueCount);
    }
   }
  }
 }
 return(BlueCount);
}
 
// Function to detect when a hand is in the Red Rectangle.  
// Returns RedCount value (int), REDFLAG (bool).
int DetectRed(IplImage* tempFrame, IplImage* outFrame, int RedCount)
{
 uchar* temp_ptr;
 for (int i = 47; i < 47+RED_RECT_WIDTH; i++)
 {
  for (int j = 135; j < 135+RED_RECT_HEIGHT; j++)
  {
   // If the value at any (i,j) within the rectangles is flesh-colored, then set a flag...
   // This'll do if the Blue value exceeds 100 at any pixel.
   temp_ptr = &((uchar*)(tempFrame->imageData + tempFrame->widthStep*j))[i*3];
   if (DEBUG)
   {std::cout << "Color values [(B, G, R)]: (" << (int)temp_ptr[0] << ", " << (int)temp_ptr[1] << ", " << (int)temp_ptr[2] << ")." << std::endl;}
   if ( (int)temp_ptr[0] > 100)
   {
    if (DEBUG)
    {std::cout << "RRRRRRRRRRRRR Red Book is being touched RRRRRRRRRRR." << std::endl;}
    BLUE_RECTANGLE;
    //FlagPointer = &BLUEFLAG;
    //*FlagPointer = false;
    RED_RECTANGLE_FILLED;
    YELLOW_RECTANGLE;
    //FlagPointer = &YELLOWFLAG;
    //*FlagPointer = false;
    if (REDFLAG==false)
    {
     // Shove 'true' into REDFLAG
     REDFLAG = true;
     BLUEFLAG=false;
     YELLOWFLAG=false;
     RedCount++;
      if (DEBUG)
      {std::cout << "--------- Red Count: " << RedCount << "." << std::endl;}
      return(RedCount);
    }
   }
  }
 }
 return(RedCount);
}
 
// Function to detect when a hand is in the Yellow Rectangle.  
// Returns YellowCount value (int), YELLOWFLAG (bool).
int DetectYellow(IplImage* tempFrame, IplImage* outFrame, int YellowCount)
{
 if(DEBUG)
 {std::cout << "YELLOWFLAG: " << YELLOWFLAG << std::endl;}
 
 uchar* temp_ptr;
 for (int i = 148; i < 148+YELLOW_RECT_WIDTH; i++)
 {
  for (int j = 185; j < 185+YELLOW_RECT_HEIGHT; j++)
  {
   // If the value at any (i,j) within the rectangles is flesh-colored, then set a flag...
   // This'll do if the Blue value exceeds 100 at any pixel.
   temp_ptr = &((uchar*)(tempFrame->imageData + tempFrame->widthStep*j))[i*3];
   if (DEBUG)
   {std::cout << "Color values [(B, G, R)]: (" << (int)temp_ptr[0] << ", " << (int)temp_ptr[1] << ", " << (int)temp_ptr[2] << ")." << std::endl;}
   if ( (int)temp_ptr[0] > 100)
   {
    if (DEBUG)
    {std::cout << "YYYYYYYYY Yellow Book is being touched YYYYYYYYYY." << std::endl;}
    BLUE_RECTANGLE;
    //*BLUEFLAG=false;
    RED_RECTANGLE;
    //*REDFLAG=false;
    YELLOW_RECTANGLE_FILLED;
    // If the flag is false, meaning that this isn't a continuation of contact...
    //if (YELLOWFLAG==false)
    if (!YELLOWFLAG)
    {
     // Shove 'true' into YELLOWFLAG
     YELLOWFLAG = true;
     REDFLAG=false;
     BLUEFLAG=false;
     YellowCount++;
      if (DEBUG)
      {
       std::cout << "YELLOWFLAG (YellowCount++): " << YELLOWFLAG << std::endl;
       std::cout << "--------- Yellow Count: " << YellowCount << "." << std::endl;
      }
      return(YellowCount);
    }
   }
  }
 }
 return(YellowCount);
}
int main(int argc, char* argv[])
{
  // STEP 1:
 // Bring the video file (AVI) in.
 CvCapture* VideoFile = cvCaptureFromFile("hands.avi");
 if (VideoFile == NULL)
 {
  std::cout << "Uh-oh.  Either the input file doesn't exist, or OpenCV cannot read it." << std::endl;
  return 1;
 }
 
 // Now let's set up the frame size so that we can vomit out a video...
 CvSize frame_size;
 frame_size.height = cvGetCaptureProperty(VideoFile, CV_CAP_PROP_FRAME_HEIGHT);
 frame_size.width = cvGetCaptureProperty(VideoFile, CV_CAP_PROP_FRAME_WIDTH);
 // We'll go ahead and say that the AVI file is loaded now:
 if(DEBUG)
 {std::cout << "Brought in AVI file." << std::endl;}
 
 // Figure out what our incoming movie file looks like
 double FPS = cvGetCaptureProperty(VideoFile, CV_CAP_PROP_FPS);
 double FOURCC = cvGetCaptureProperty(VideoFile, CV_CAP_PROP_FOURCC);
 if(DEBUG)
 {
  std::cout << "FPS:  " << FPS << std::endl;
  std::cout << "FOURCC:  " << FOURCC << std::endl;
 }
 
 // Create a CvVideoWriter.  The arguments are the name of the output file (must be .avi), 
 // a macro for a four-character video codec installed on your system, the desired frame 
 // rate of the video, and the video dimensions.
 CvVideoWriter* videoWriter = cvCreateVideoWriter("hands_output.avi",CV_FOURCC('D', 'I', 'V', 'X'), FPS, cvSize(frame_size.width, frame_size.height));
 // Now we can say that the VideoWriter is created:
 if(DEBUG)
 {std::cout << "videoWriter is made." << std::endl;}
 
 // Make display windows
 cvNamedWindow("current Frame", CV_WINDOW_AUTOSIZE);
 //cvNamedWindow("canny Frame", CV_WINDOW_AUTOSIZE);
 cvNamedWindow("output Frame", CV_WINDOW_AUTOSIZE);
 
 // Keep track of frames
 static int imageCount = 0;
 
 // Set up images.
 //IplImage* cannyFrame = cvCreateImage(cvSize(frame_size.width, frame_size.height), IPL_DEPTH_8U, 1);
 IplImage* currentFrame = cvCreateImage(cvSize(frame_size.width, frame_size.height), IPL_DEPTH_8U, 1);
 IplImage* outFrame = cvCreateImage(cvSize(frame_size.width, frame_size.height), IPL_DEPTH_8U, 3);
 
 // There's gotta be a better way to do this... like with threading?
 while(1)
 {
  // Let's try to threshold this at 15FPS - the input rate.
  // 66 is used as it's 1/15 * 1000...  
  // Wait a second!  I have the FPS here.  *sigh*  Let's do this dynamically:
//  Sleep((1000/FPS)-10);
  // Awesome.  The video runs in actual time now, after subtracting out the 10ms from WaitKey().
 
  IplImage* tempFrame = cvQueryFrame(VideoFile);
  // If the video HAS a current frame...
  if (tempFrame != NULL)
  {
   // The video is BGR-space.  I wish there were a cvGetColorSpace command or something...
   cvCvtColor(tempFrame, currentFrame, CV_BGR2GRAY);
   // Grrr ... flipped.
   cvFlip(currentFrame);
   // Grr... if I could detect skin, then AND the regions within the rectangles with the skin-highlighted image, then I could tell with certainty if a hand was there, rather than getting false counts.
   // cvThreshold is almost good enough, though.
   // AHA!  But I'd not have to check for skin at every pixel, just within the rectangles within DetectXXX function calls.
   // ... and if the area within the box has white elements (e.g. skin), then the hand is on the book.
   // this would be MUCH more reliable than using RGB colorspace detection.
   cvThreshold(currentFrame, currentFrame, 200, 255, CV_THRESH_BINARY);
   cvShowImage("current Frame", currentFrame);
   if(DEBUG)
   {std::cout << "Pulled in video grab of frame " << imageCount << "." << std::endl;}
 
   // Set up the output composite image...
   outFrame = cvCloneImage(tempFrame);
   // Reference the TopLeft as (0,0).
   tempFrame->origin=IPL_ORIGIN_TL;
   cvFlip(tempFrame);
   outFrame->origin=IPL_ORIGIN_TL;
   cvFlip(outFrame);
 
   // If a hand isn't in any of the rectangles...
   // Draw some rectangle overlays on the image.
   BLUE_RECTANGLE;
   RED_RECTANGLE;
   YELLOW_RECTANGLE;
 
   // But if a hand is in a rectangle (e.g. counting a book)...
   // we should be able to pick this up at the pixel level...
   // This is downright odd ... the pixel values here (the blue rectangle) change whenever ANY book is touched.
   // Erm ... yes, it would do this at (i, j)=0.
   // (Calls moved to DetectBlue, DetectRed, DetectYellow)
   if (DEBUG)
   {
    std::cout << "+ BlueCount: " << BlueCount << std::endl;
    std::cout << "+ BLUEFLAG: " << BLUEFLAG << std::endl;
    std::cout << "+ RedCount: " << RedCount << std::endl;
    std::cout << "+ REDFLAG: " << REDFLAG << std::endl;
    std::cout << "+ YellowCount: " << YellowCount << std::endl;
    std::cout << "+ YELLOWFLAG: " << YELLOWFLAG << std::endl;
   }
 
   BlueCount = DetectBlue(tempFrame, outFrame, BlueCount);
   RedCount = DetectRed(tempFrame, outFrame, RedCount);
   YellowCount = DetectYellow(tempFrame, outFrame, YellowCount);
 
   // Let's get some output on the screen...
   CvFont font;
   cvInitFont(&font, CV_FONT_HERSHEY_SIMPLEX, 0.8, 0.8, 0, 2);
   cvPutText(outFrame, "Counting Book-Touches", cvPoint(0, 20), &font, cvScalar(255, 255, 255));
   // Blue Counts
   char CountB[3];
   sprintf(CountB, "%d", BlueCount);
   cvPutText(outFrame, CountB, cvPoint(270, 50), &font, cvScalar(300, 0, 0));
   // Red Counts
   char CountR[3];
   sprintf(CountR, "%d", RedCount);
   cvPutText(outFrame, CountR, cvPoint(270, 125), &font, cvScalar(0, 0, 300));
   // Yellow Counts
   char CountY[3];
   sprintf(CountY, "%d", YellowCount);
   cvPutText(outFrame, CountY, cvPoint(270, 175), &font, cvScalar(128, 255, 255));
   
   // Holy freakin' crap, as Peter Griffin might say.  I've never been happier to count book touches.
 
   cvShowImage("output Frame", outFrame);
   
   // Write the current frame to an output movie.
   cvWriteFrame(videoWriter, outFrame);
   if(DEBUG)
   {std::cout << "Wrote frame to output AVI file." << std::endl;}
   imageCount++;
   } // end if (image != NULL) loop
  
  // This will return the code of the pressed key or -1 if
  // nothing was pressed before 10 ms elapsed.
  int keyCode = cvWaitKey(10);
  // "S" or "s" will pause playback.  
  if ( (keyCode == 's') || (keyCode == 'S') )
  {
   while(1)
   {
    keyCode = cvWaitKey(10);
    if ( (keyCode == 's') || (keyCode == 'S') )
    {
     keyCode = 999;
     break;
    }
   }
  }
 
  // But the video may have ended...
  if( ((tempFrame == NULL) || (keyCode >= 0)) && (keyCode != 999) )
  {
   // Either the video is over or a key was pressed.
   // Dump the video file.
   cvReleaseCapture(&VideoFile);
   // Release the videoWriter from memory.
   cvReleaseVideoWriter(&videoWriter);
   // Release images from memory...
   cvReleaseImage(&currentFrame);
    // Do NOT release tempFrame: frames returned by cvQueryFrame are owned by
    // the capture and are freed by cvReleaseCapture.
   cvReleaseImage(&outFrame);
   // ... And destroy the windows.
   cvDestroyWindow("current Frame");
   cvDestroyWindow("output Frame");
   std::cout << "Released VideoFile and VideoWriter." << std::endl;
   exit(0);
  }
 }// end while loop
 return 0;
}