Multi-Scale Tracking and Index Browser

Many applications in computer vision, surveillance and other domains require high-resolution close-ups of details of a scene. Even high-resolution cameras often cannot provide enough resolution over a wide enough area to meet these demands. Consequently, we have designed an automatic foveation system that mimics the way eyes operate by steering a high-resolution sensor to an area of interest.

multiscaletrackoverviewpicture
Fig. 1. The system for automatically steering camera 2 to track objects seen in camera 1.

The system uses two or more cameras. Some are fixed and designated "masters", in which our tracking system tracks objects of interest; the others are designated "slaves" and can be steered to focus on areas of interest. To do this, the system must calculate the control parameters that direct the pan/tilt slave cameras at points of interest seen from a different viewpoint. It does so through a series of automatically learnt mappings, shown in Fig. 2, that go from image coordinates in the master image, to image coordinates in the slave, to predicted head location in the slave, and then to pan/tilt commands for the slave.

multiscaletransform
Fig. 2. Transforms from master image coordinates to slave steering parameters.
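
As a rough illustration of the chain in Fig. 2, the three stages can be composed as plain functions. This is only a sketch: the function names, the bounding-box layout, the inputs passed to each stage and the toy stand-in stages are all assumptions made for the example, not the system's actual interfaces.

```python
def steering_command(box, ground_map, head_model, pan_tilt_model):
    """Chain three learnt stages for one tracked object.

    box is (left, top, right, bottom) in master-image pixels, y growing downward.
    """
    left, top, right, bottom = box
    foot = ((left + right) / 2.0, bottom)        # lowest point, assumed on the ground
    height = bottom - top                        # apparent size in the master view
    slave_foot = ground_map(foot)                # stage 1: master image -> slave image
    slave_head = head_model(slave_foot, height)  # stage 2: predicted head location
    return pan_tilt_model(slave_head)            # stage 3: pan/tilt command


# Toy stand-ins for the learnt stages (illustrative values only).
def ground_map(p):
    return (p[0] + 20.0, p[1] - 10.0)


def head_model(foot, height):
    return (foot[0], foot[1] - 0.5 * height)


def pan_tilt_model(head):
    # Offset from an assumed 640x480 image centre, scaled to degrees.
    return ((head[0] - 320.0) * 0.1, (head[1] - 240.0) * 0.1)


pan, tilt = steering_command((100, 50, 140, 200),
                             ground_map, head_model, pan_tilt_model)
```

Keeping each stage as a separate callable mirrors the way the stages are learnt independently, so any one of them can be retrained without touching the others.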

Learning the mapping.

The first part of the transformation corrects for the viewpoint difference between the two cameras. We use a ground-plane assumption: most of the objects being tracked are assumed to move on a ground plane. For points on this plane (we assume the lowest point of a tracked object is on the ground), a homography describes the transformation between the image coordinates in the two images (with the slave in a "home" calibration position).
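
For illustration, mapping a tracked object's lowest point through a homography could look like the sketch below. The matrix values, the bounding-box layout and the function names are hypothetical and not taken from the system described here.

```python
def apply_homography(H, point):
    """Map an image point (x, y) through a 3x3 homography H (nested lists)."""
    x, y = point
    # Homogeneous coordinates: [u, v, w]^T = H [x, y, 1]^T
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity")
    return (u / w, v / w)


def foot_point(box):
    """Bottom-centre of a box (left, top, right, bottom); image y grows downward."""
    left, top, right, bottom = box
    return ((left + right) / 2.0, bottom)


# Hypothetical homography, for illustration only.
H = [[1.2, 0.0, 30.0],
     [0.0, 1.2, -10.0],
     [0.0, 0.0, 1.0]]

slave_foot = apply_homography(H, foot_point((100, 50, 140, 200)))
```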

The homography is calculated by running our tracking engine simultaneously on the two cameras' views while they overlap in their home positions. A RANSAC algorithm samples pairs of tracks to find the homography that most accurately maps simultaneous tracks from one view to the other.
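
The general idea of a RANSAC homography fit can be sketched as follows. This is a simplified stand-in, not the procedure used here: it samples individual point correspondences (for example, simultaneous foot positions) rather than pairs of tracks, fixes the bottom-right entry of the homography to 1, and skips the usual final refit on the inliers. All names, thresholds and the synthetic data are assumptions.

```python
import random


def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination; return None if singular."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-9:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def homography_from_4(pairs):
    """Exact homography (with H[2][2] = 1) through four correspondences."""
    A, b = [], []
    for (x, y), (u, v) in pairs:
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(A, b)
    if h is None:
        return None
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def project(H, p):
    """Apply H to point p; None if the point maps to infinity."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        return None
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


def ransac_homography(pairs, iterations=500, tol=2.0, seed=0):
    """Keep the sampled homography that agrees with the most correspondences."""
    rng = random.Random(seed)
    best_H, best_inliers = None, []
    for _ in range(iterations):
        H = homography_from_4(rng.sample(pairs, 4))
        if H is None:
            continue  # degenerate sample, e.g. three collinear points
        inliers = []
        for src, dst in pairs:
            q = project(H, src)
            if q and (q[0] - dst[0]) ** 2 + (q[1] - dst[1]) ** 2 <= tol ** 2:
                inliers.append((src, dst))
        if len(inliers) > len(best_inliers):
            best_H, best_inliers = H, inliers
    return best_H, best_inliers


# Synthetic check: 20 exact correspondences plus 6 random mismatches.
true_H = [[1.1, 0.05, 20.0], [-0.03, 0.95, 5.0], [1e-4, 2e-4, 1.0]]
grid = [(x, y) for x in range(50, 500, 100) for y in range(40, 400, 100)]
pairs = [(p, project(true_H, p)) for p in grid]
noise = random.Random(1)
pairs += [((noise.uniform(0, 500), noise.uniform(0, 400)),
           (noise.uniform(0, 500), noise.uniform(0, 400))) for _ in range(6)]
H_est, inliers = ransac_homography(pairs)
```

In practice the coordinates would usually be normalised before solving, and the winning model refitted to all of its inliers by least squares.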

On the same data we estimate a linear transform that predicts the head position of a person in the slave view given their position and size in the master view. This allows us to predict the position in the slave view of points (in particular heads) that lie off the ground plane, which is the only place the homography is applicable.
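
One plausible way to fit such a linear predictor is ordinary least squares on a feature vector of master-view position and height plus a constant term. The sketch below assumes that form; the feature choice, function names and synthetic training rows are illustrative only.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination; return None if singular."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-9:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_affine(features, targets):
    """Least-squares weights for target ~ w . [features..., 1] (normal equations)."""
    X = [list(f) + [1.0] for f in features]
    n = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(n)] for i in range(n)]
    Xty = [sum(r[i] * t for r, t in zip(X, targets)) for i in range(n)]
    w = solve_linear(XtX, Xty)
    if w is None:
        raise ValueError("training features are linearly dependent")
    return w


def predict(weights, feature):
    """Evaluate the fitted affine model on one feature tuple."""
    return sum(wi * fi for wi, fi in zip(weights, list(feature) + [1.0]))


# Hypothetical training rows: (foot x, foot y, height) in the master view.
master = [(100, 200, 80), (300, 220, 150), (150, 400, 60),
          (500, 380, 120), (250, 300, 200), (400, 250, 40)]
# Synthetic slave-view head positions generated from a made-up affine rule.
head_x = [1.1 * x + 0.02 * y + 15.0 for x, y, h in master]
head_y = [0.9 * y - 0.8 * h + 5.0 for x, y, h in master]

wx = fit_affine(master, head_x)
wy = fit_affine(master, head_y)
predicted_head = (predict(wx, (200, 350, 100)), predict(wy, (200, 350, 100)))
```

Real track data is noisy, so the fit would not be exact as it is for this synthetic rule; the same code simply returns the least-squares compromise.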

Finally, the system automatically learns the response of the slave camera to pan/tilt/zoom commands by tracking points in the image while executing such commands. A linear map is trained by measuring the image displacements for a few hundred P/T locations. With all three stages of the map constructed automatically, the system is ready for use.
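
A minimal sketch of training such a linear map is given below. It assumes each measurement pairs an image displacement with the pan/tilt offset that produced it, and it fits the inverse direction (displacement to command) directly with no offset term. The simulated camera response and all names are hypothetical.

```python
def fit_linear_2d(displacements, values):
    """Least-squares fit of value ~ a*dx + b*dy (no offset term)."""
    sxx = sum(dx * dx for dx, _ in displacements)
    sxy = sum(dx * dy for dx, dy in displacements)
    syy = sum(dy * dy for _, dy in displacements)
    sxv = sum(dx * v for (dx, _), v in zip(displacements, values))
    syv = sum(dy * v for (_, dy), v in zip(displacements, values))
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("displacements do not span the image plane")
    # Solve the 2x2 normal equations by Cramer's rule.
    a = (sxv * syy - sxy * syv) / det
    b = (sxx * syv - sxy * sxv) / det
    return a, b


def train_pan_tilt_map(samples):
    """samples: list of ((dx, dy), (pan, tilt)) measured while the camera moves."""
    disp = [d for d, _ in samples]
    pan_coef = fit_linear_2d(disp, [cmd[0] for _, cmd in samples])
    tilt_coef = fit_linear_2d(disp, [cmd[1] for _, cmd in samples])

    def to_pan_tilt(dx, dy):
        return (pan_coef[0] * dx + pan_coef[1] * dy,
                tilt_coef[0] * dx + tilt_coef[1] * dy)

    return to_pan_tilt


# Simulated measurements from a made-up camera response.
samples = []
for pan in range(-5, 6):
    for tilt in range(-3, 4):
        dx = 12.0 * pan + 0.5 * tilt
        dy = -0.3 * pan + 11.0 * tilt
        samples.append(((dx, dy), (pan, tilt)))

to_pan_tilt = train_pan_tilt_map(samples)
pan_cmd, tilt_cmd = to_pan_tilt(23.5, -11.6)
```

Given a desired shift of the image centre towards a predicted head location, the returned function yields the pan/tilt offset expected to produce that shift.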

multiscalemappingpicture

Fig. 3. The effect of the automatically learned homographies and height maps. Top: two background images superimposed with the predicted height of an object of constant apparent size at different locations in the other camera. Bottom: the image of the other camera warped with the automatically learned homography to match the view of the camera shown above.

Operation

In operation, the system runs our standard object tracker on the master camera input, tracking the locations of objects in its view. Whenever an object is observed, the pan/tilt coordinates for the slave are calculated and the slave is driven to the area of interest. The slave camera's video can be encoded for later viewing, or still close-up images can be captured and stored in the database along with track information. We have created a prototype multiscale browser that allows a user to search the track database and replay all activity in the master camera along with the multiscale data that was captured for each track. Figure 4 shows the multiscale browser being used to query the multiscale track database.

Multiscalebrowserpicture

Fig. 4. The Multiscale browser, showing the video playback window, the horizontal activity bars (small and large scale) and the currently selected multiscale view.


A demo of multiscale tracking is shown in the following video (8.8M).

Top left:  PTZ view 1, 
Bottom left: PTZ view 2, 
Top right: Fixed camera view 1, 
Bottom right: Fixed camera view 2.

multiscalepicture

 
