Touch interaction is becoming pervasive. Mobile phones, tablet PCs, Microsoft Surface tables, and large wall-sized displays are examples of the broad spectrum of devices that support touch-based interaction techniques. These examples show the great diversity among such devices, from their many sizes and form factors to their hardware and software capabilities, including, in some of them, support for multi-touch gestures.

However, one factor that still limits the penetration of this technology is its cost, most notably for larger devices. This is mostly due to the technological requirements of touch display technologies, which span resistive and capacitive panels, acoustic waves or pulse recognition, and, more recently, optical imaging. In optical imaging, two or more image sensors are placed around the edges (mostly the corners) of the screen, and infrared backlights are placed in the cameras’ field of view on the other sides of the screen. A touch shows up as a shadow, and the views from each pair of cameras can then be triangulated to locate the touch. This technology is growing in popularity due to its scalability, versatility, and affordability, especially for larger units.
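The triangulation step of optical imaging can be illustrated with a small sketch. It assumes an idealized layout that the text does not specify: two cameras at the top-left and top-right corners of a screen of known width, each reporting the angle between the top edge and its line of sight to the shadow. The function name and its parameters are hypothetical.

```python
import math

def triangulate(width, alpha, beta):
    """Locate a touch from the viewing angles of two corner cameras.

    Assumed geometry (illustrative only): the left camera sits at (0, 0),
    the right camera at (width, 0), and y grows downward into the screen.
    alpha and beta are the angles, in radians, between the top edge and
    the ray from the left and right camera to the touch, respectively.
    """
    if not (0 < alpha < math.pi / 2 and 0 < beta < math.pi / 2):
        raise ValueError("angles must lie strictly between 0 and pi/2")
    ta, tb = math.tan(alpha), math.tan(beta)
    # From tan(alpha) = y / x and tan(beta) = y / (width - x).
    x = width * tb / (ta + tb)
    y = x * ta
    return x, y

# Angles that a touch at (30, 40) would produce on a screen 100 units wide.
alpha = math.atan2(40, 30)
beta = math.atan2(40, 70)
print(triangulate(100, alpha, beta))
```

With more than two cameras, each pair yields its own estimate, and the estimates can be averaged or used to resolve ambiguities between simultaneous touches.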

This project aims to further increase the availability and versatility of touch-driven interaction surfaces through gesture-based interaction techniques, and even to extend the scope of interaction by dropping the surface requirement. To achieve this goal, we propose to track fingers (or any other object being grasped) in video captured by standard low-cost video cameras or webcams. Additionally, we aim to research new interaction techniques that explore the possibilities opened up by the proposed flexible interaction configuration.

To address both single- and multi-finger tracking, a two-step approach is required. The first step is region detection, or foreground detection. The second step is the tracking algorithm itself.

In our setting, the user wears finger thimbles, which are needed to meet real-time requirements. Under this condition, foreground detection may be accomplished by simply thresholding the first channel of the input image. Several regions are expected to result, but only one of them represents the object of interest. In the first scenario (single-finger tracking), foreground detection will return not only the finger (the target to be tracked) but also other residual information (i.e., other fingers or regions due to image noise). It is thus crucial to determine the true measurement among the several returns. We overcome this problem by using the Probabilistic Data Association Filter (PDAF). This method is rooted in control theory and was originally used to track point targets from radar measurements. It rests on the following assumption: "Among the possibly several validated measurements, one can be target originated. The remaining measurements (or image observations) are assumed due to false alarms or residual clutter". The success of the PDAF lies in the data interpretations (or combinations) formulated among the image observations (i.e., finger and clutter) obtained in the current image.
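The two steps can be sketched as follows. This is a minimal illustration, not the project's implementation: the image is a plain list of rows of channel tuples, regions are reduced to centroids, and the PDAF part shows only the association step in its standard textbook form (gating, association probabilities, combined innovation), without the Kalman prediction and update that a full filter adds. The isotropic innovation covariance `sigma2`, the detection and gate probabilities, and the clutter density are all assumed values chosen for illustration.

```python
import math

def detect_regions(image, threshold):
    """Threshold channel 0 and return the (x, y) centroid of each
    4-connected foreground region."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    centroids = []
    for y0 in range(h):
        for x0 in range(w):
            if seen[y0][x0] or image[y0][x0][0] < threshold:
                continue
            # Flood-fill one region, collecting its pixel coordinates.
            stack, xs, ys = [(x0, y0)], [], []
            seen[y0][x0] = True
            while stack:
                x, y = stack.pop()
                xs.append(x)
                ys.append(y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < w and 0 <= ny < h and not seen[ny][nx]
                            and image[ny][nx][0] >= threshold):
                        seen[ny][nx] = True
                        stack.append((nx, ny))
            centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return centroids

def pdaf_association(z_pred, measurements, sigma2, p_detect=0.9,
                     p_gate=0.99, clutter_density=1e-4, gate=9.21):
    """Return (beta0, betas, combined_innovation) for 2-D measurements.

    Assumes an innovation covariance of sigma2 * I (a simplification).
    gate=9.21 is the 99% chi-square threshold for 2 degrees of freedom,
    consistent with p_gate=0.99.
    """
    innovations, likelihoods = [], []
    for zx, zy in measurements:
        nu = (zx - z_pred[0], zy - z_pred[1])
        d2 = (nu[0] ** 2 + nu[1] ** 2) / sigma2  # squared Mahalanobis distance
        if d2 <= gate:  # keep only validated measurements
            gauss = math.exp(-0.5 * d2) / (2 * math.pi * sigma2)
            innovations.append(nu)
            likelihoods.append(gauss * p_detect / clutter_density)
    denom = 1.0 - p_detect * p_gate + sum(likelihoods)
    beta0 = (1.0 - p_detect * p_gate) / denom  # "no measurement is the target"
    betas = [lik / denom for lik in likelihoods]
    combined = (sum(b * nu[0] for b, nu in zip(betas, innovations)),
                sum(b * nu[1] for b, nu in zip(betas, innovations)))
    return beta0, betas, combined
```

In use, the centroids returned by `detect_regions` would serve as the `measurements` for `pdaf_association`, and the combined innovation would drive the state update in place of a single hard-assigned measurement.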

When the multi-finger scenario is considered, alternative approaches are available. An extension of the PDAF to multiple targets, termed herein Multiple-Model PDAF (MM-PDAF), is explored in this work. In addition to PDAF for multi-finger tracking, we also address the problem with tracking algorithms from the point correspondence literature, which allow a large number of targets to be tracked. This study aims to find the best algorithm in terms of both robustness and real-time performance in the context of multi-target tracking.
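To make the point correspondence idea concrete, the sketch below matches tracked points to new detections with a greedy nearest-neighbour rule. This is only a simple baseline chosen for illustration; it is not MM-PDAF and not necessarily one of the algorithms evaluated in the project. The function name and the `max_dist` parameter are hypothetical.

```python
import math

def match_points(tracks, detections, max_dist):
    """Greedy nearest-neighbour correspondence between two point sets.

    tracks and detections are lists of (x, y) positions. Returns a dict
    mapping track index to detection index; tracks with no detection
    within max_dist are left unmatched.
    """
    # Every candidate pairing, closest first.
    pairs = sorted(
        (math.dist(t, d), i, j)
        for i, t in enumerate(tracks)
        for j, d in enumerate(detections)
    )
    assignment, used = {}, set()
    for dist, i, j in pairs:
        if dist > max_dist:
            break  # all remaining pairs are farther away
        if i in assignment or j in used:
            continue  # each track and each detection is matched at most once
        assignment[i] = j
        used.add(j)
    return assignment

tracks = [(10, 10), (50, 50)]
detections = [(52, 49), (11, 12), (200, 200)]
print(match_points(tracks, detections, max_dist=20))
```

A greedy rule is fast but can make poor choices when fingers cross or clutter is dense, which is one reason to compare it against probabilistic approaches.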

The proposed standard configuration, comprising a set of video cameras, a laptop and a video projector, would open up the possibility of setting up a touch-based interaction facility virtually anywhere. Since the gesture recognition is vision based, several interaction scenarios can be envisioned: (1) standard scenario, where projection and interaction share the same surface; (2) distinct surfaces scenario, where the visual projection is done on one surface and the gestures are made on a different surface; (3) no interaction surface scenario, where visual output is projected on one surface, but users perform “touchless” gestures, i.e. without a supporting surface. It is worth mentioning that, in any of the scenarios, the projection surface need not be the output of a video projector; it can be, for instance, a simple monitor, or even a mobile phone display. This will allow us to explore, for instance, remote control of applications. Accordingly, it is also one of the project

Timeline

The project will start on 01-01-2010 and has a duration of 33 months.


The Vista project is funded by Fundação para a Ciência e a Tecnologia.