Many applications in computer vision, surveillance and other domains require high-resolution close-ups of details of a scene. High-resolution cameras often do not provide enough resolution over a wide enough area to meet these demands. Consequently, we have designed an automatic foveation system that mimics the way eyes operate, steering a high-resolution sensor to an area of interest.


Learning the mapping
The first part of the transformation corrects for the viewpoint difference between the two cameras. We use a ground-plane assumption: most of the objects being tracked move on a ground plane. For points on this plane (we assume the lowest point of a tracked object lies on the ground), a homography describes the transformation between the image coordinates in the two images, with the slave in a "home" calibration position. The homography is calculated by running our tracking engine simultaneously on the two camera views while they overlap in their home positions. A RANSAC algorithm samples pairs of tracks to find the best-fit homography, the one that most accurately maps simultaneous tracks from one view to the other.
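A minimal sketch of a RANSAC homography fit, written in pure Python so it runs without extra libraries. It simplifies the procedure described above: it samples four simultaneous point correspondences (for example, the lowest points of objects tracked in both views) rather than pairs of tracks, and fits each candidate with a normalized direct linear transform with h33 fixed to 1. The names `fit_homography` and `ransac_homography`, the 2-pixel inlier tolerance and the iteration count are hypothetical choices, not the system's actual implementation.

```python
import math
import random


def _solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def _normalizer(pts):
    """Scale and centroid that move the points to mean distance sqrt(2) from the origin."""
    cx = sum(p[0] for p in pts) / len(pts)
    cy = sum(p[1] for p in pts) / len(pts)
    d = sum(math.hypot(p[0] - cx, p[1] - cy) for p in pts) / len(pts)
    return (math.sqrt(2) / d if d > 0 else 1.0), cx, cy


def fit_homography(src, dst):
    """Normalized DLT with h33 fixed to 1; least squares when given more than 4 pairs."""
    ss, sx, sy = _normalizer(src)
    ds, dx, dy = _normalizer(dst)
    rows, rhs = [], []
    for (x, y), (u, v) in zip(src, dst):
        x, y = ss * (x - sx), ss * (y - sy)
        u, v = ds * (u - dx), ds * (v - dy)
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y]); rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y]); rhs.append(v)
    # Normal equations A^T A h = A^T b (exact solve when there are exactly 4 pairs).
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(8)]
    h = _solve(ata, atb) + [1.0]
    hn = [h[0:3], h[3:6], h[6:9]]
    # Undo the normalization: H = T_dst^-1 * Hn * T_src.
    t_src = [[ss, 0.0, -ss * sx], [0.0, ss, -ss * sy], [0.0, 0.0, 1.0]]
    t_dst_inv = [[1 / ds, 0.0, dx], [0.0, 1 / ds, dy], [0.0, 0.0, 1.0]]
    H = _matmul(t_dst_inv, _matmul(hn, t_src))
    return [[e / H[2][2] for e in row] for row in H]


def project(H, p):
    """Map a point through H; returns (inf, inf) if it lands on the line at infinity."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        return (math.inf, math.inf)
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def ransac_homography(src, dst, iters=300, tol=2.0, seed=0):
    """Fit H to random 4-point samples, keep the model with most inliers, then refit."""
    rng = random.Random(seed)
    idx = list(range(len(src)))
    best = []
    for _ in range(iters):
        sample = rng.sample(idx, 4)
        try:
            H = fit_homography([src[i] for i in sample], [dst[i] for i in sample])
        except (ValueError, ZeroDivisionError):
            continue  # degenerate sample, e.g. three collinear points
        inliers = [i for i in idx if math.dist(project(H, src[i]), dst[i]) < tol]
        if len(inliers) > len(best):
            best = inliers
    H = fit_homography([src[i] for i in best], [dst[i] for i in best])
    return H, best
```

The final refit over all inliers of the best candidate is the usual way to turn a minimal-sample RANSAC model into a least-squares estimate; the coordinate normalization keeps the normal equations well conditioned for pixel-scale inputs.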
On the same data, we estimate a linear transform that predicts the head position of a person in the slave view given their position and size in the master view. This allows us to predict the slave-view position of points that are not on the ground plane (in particular, heads), where the homography does not apply.
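The head predictor can be sketched as an ordinary least-squares fit. The feature choice is an assumption: the master-view foot point `(x, y)` and bounding-box height `h`, plus a constant term, mapped affinely to the slave-view head position `(u, v)`. The names `fit_head_predictor` and `predict_head` are hypothetical.

```python
def _solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_head_predictor(samples):
    """Least-squares fit of a 2x4 matrix W with (u, v) ~= W @ [x, y, h, 1].

    samples: list of ((x, y, h), (u, v)) where (x, y) is the master-view foot
    point, h the master-view box height and (u, v) the slave-view head position.
    """
    feats = [[x, y, h, 1.0] for (x, y, h), _ in samples]
    W = []
    for k in range(2):  # one independent least-squares problem per output coordinate
        target = [head[k] for _, head in samples]
        ata = [[sum(f[i] * f[j] for f in feats) for j in range(4)] for i in range(4)]
        atb = [sum(f[i] * t for f, t in zip(feats, target)) for i in range(4)]
        W.append(_solve(ata, atb))
    return W


def predict_head(W, x, y, h):
    """Predicted slave-view head position for a master-view observation."""
    f = (x, y, h, 1.0)
    return tuple(sum(w * v for w, v in zip(row, f)) for row in W)
```

Including the box height lets the fit account for people at different distances, whose heads sit at different offsets above their feet; whether a purely linear model is adequate depends on the scene geometry.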
Finally, the system automatically learns the response of the slave camera to pan/tilt/zoom commands by tracking points in the image while executing such commands. A linear map is trained by measuring the image displacements for a few hundred pan/tilt locations. With all three stages of the mapping constructed automatically, the system is ready for use.
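One way to turn the measured displacements into the third stage is sketched below, under assumptions of our own: zoom is held fixed (a separate map per zoom level would be needed), `(dx, dy)` is the image displacement of tracked scene points after moving from home to `(pan, tilt)`, and the map is fitted as an affine function from a home-view point to the pan/tilt that centres it. After a move with displacement d, the scene point now at the image centre c was at c - d in the home view, which gives one training pair per pan/tilt location. The names `fit_ptz_map` and `pan_tilt_for` are hypothetical.

```python
def _solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ptz_map(measurements, center):
    """Fit a 2x3 matrix A with (pan, tilt) ~= A @ [x, y, 1] for a home-view point (x, y).

    measurements: list of ((pan, tilt), (dx, dy)) gathered at fixed zoom.
    center: (cx, cy), the image centre in pixels.
    """
    cx, cy = center
    feats, pans, tilts = [], [], []
    for (pan, tilt), (dx, dy) in measurements:
        # The scene point now at the centre was at centre - displacement in the home view.
        feats.append([cx - dx, cy - dy, 1.0])
        pans.append(pan)
        tilts.append(tilt)
    A = []
    for target in (pans, tilts):
        ata = [[sum(f[i] * f[j] for f in feats) for j in range(3)] for i in range(3)]
        atb = [sum(f[i] * t for f, t in zip(feats, target)) for i in range(3)]
        A.append(_solve(ata, atb))
    return A


def pan_tilt_for(A, x, y):
    """Pan/tilt command that should bring home-view point (x, y) to the image centre."""
    return tuple(r[0] * x + r[1] * y + r[2] for r in A)
```

A linear map is only a small-angle approximation of the camera's true projective response, which is one reason to train it over a few hundred locations rather than a handful.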

Operation
In operation, the system runs our standard object tracker on the master camera input, tracking the locations of objects in its view. Whenever an object is observed, the pan/tilt coordinates for the slave are calculated and the slave is driven to the area of interest. The slave camera's video can be encoded for later viewing, or still close-up images can be captured and stored in the database along with the track information. We have created a prototype multiscale browser that allows a user to search the track database and replay all activity in the master camera along with the multiscale data captured for each track. Figure 4 shows the multiscale browser used to query the multiscale track database.
Fig. 4. The Multiscale browser, showing the video playback window, the horizontal activity bars (small and large scale) and the currently selected multiscale view.
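The operating loop that chains the three learned stages can be sketched as follows. This is a hypothetical composition, not the system's code: `steer_slave`, the `send_ptz` callback, the `(track_id, (x, y, h))` observation format and the choice of aiming at the predicted head by default are all assumptions. `H` is the ground-plane homography, `W` the head predictor and `A` the home-view-to-pan/tilt map.

```python
def project(H, p):
    """Apply a 3x3 homography to an image point."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def affine(M, v):
    """Apply an affine map stored as rows [a_1, ..., a_n, offset] to vector v."""
    ext = list(v) + [1.0]
    return tuple(sum(m * e for m, e in zip(row, ext)) for row in M)


def steer_slave(observations, H, W, A, send_ptz, aim="head"):
    """Drive the slave for each master-view observation (track_id, (x, y, h))."""
    for track_id, (x, y, h) in observations:
        if aim == "head":
            target = affine(W, (x, y, h))  # stage 2: head position in slave home view
        else:
            target = project(H, (x, y))    # stage 1: ground contact point in slave home view
        pan, tilt = affine(A, target)      # stage 3: home-view point -> pan/tilt command
        send_ptz(track_id, pan, tilt)
```

For example, `steer_slave(obs, H, W, A, lambda tid, p, t: print(tid, p, t))` prints one command per observation; a real deployment would also need to decide which of several simultaneous tracks the single slave camera should follow.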
A demo of multiscale tracking is shown in the following video (8.8 MB):
Bottom left: PTZ view 2,
Top right: Fixed camera view 1,
Bottom right: Fixed camera view 2.

