[161] define a system able to parse 3D and time-varying
gestures. Their system captures gesture features such as postures of the
hand (straight, relaxed, closed), its motion (moving, stopped), and its
orientation (up, down, left, right, forward, backward --- derived from
normal and longitudinal vectors of the palm). Over time, the stream of
gestures is then abstracted into more general gestlets (e.g.,
pointing attack, sweep, end reference). Similarly, low-level
eye-tracking input is classified into classes of events (fixations,
saccades, and blinks). They integrate these multimodal features in a
hybrid representation architecture.
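The cited system does not spell out how raw gaze samples are turned into fixation, saccade, and blink events; a common way to do this is a velocity-threshold (I-VT style) scheme. The sketch below is an illustrative assumption, not the authors' implementation: the sampling rate, the 30 deg/s threshold, and the treatment of missing samples as blinks are all choices made here for the example.

```python
import math

def classify_gaze_samples(samples, dt=1 / 60, vel_thresh=30.0):
    """Label each gaze sample as 'fixation', 'saccade', or 'blink'.

    samples: list of (x, y) gaze positions in visual degrees, or None
    when the tracker loses the pupil (treated here as a blink).
    A velocity-threshold scheme: inter-sample velocity above
    vel_thresh deg/s counts as a saccade, below as a fixation.
    """
    labels = []
    prev = None
    for s in samples:
        if s is None:            # pupil not detected: blink
            labels.append('blink')
            prev = None
            continue
        if prev is None:         # first valid sample after a gap
            labels.append('fixation')
        else:
            vel = math.dist(s, prev) / dt    # deg/s
            labels.append('saccade' if vel > vel_thresh else 'fixation')
        prev = s
    return labels
```

A usage example: a slow drift of 0.1 degrees per frame stays below threshold and is labeled a fixation, while a jump of several degrees in one frame is labeled a saccade.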
It should be noted that the term and concept of 'gesture' is also used in pen computing. Although the recorded action there takes place in the 2D plane, phenomena similar to those in 3D hand gesturing play a role, with much simpler signal processing involved.
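To make the contrast with 3D hand tracking concrete, a coarse 2D pen gesture can often be classified from the net displacement of the stroke alone. The sketch below is a hypothetical minimal example (the four-direction vocabulary and screen-coordinate convention are assumptions for illustration), not a method from the cited work.

```python
def stroke_direction(points):
    """Classify a 2D pen stroke into one of four coarse directions.

    points: list of (x, y) samples from pen-down to pen-up, in screen
    coordinates (y grows downward). Only the net displacement between
    the first and last sample is used, illustrating how little signal
    processing a simple 2D stroke gesture can require.
    """
    (x0, y0), (x1, y1) = points[0], points[-1]
    dx, dy = x1 - x0, y1 - y0
    if abs(dx) >= abs(dy):
        return 'right' if dx >= 0 else 'left'
    return 'down' if dy >= 0 else 'up'
```

A rightward sweep across the tablet, however wobbly in between, reduces to `'right'`; a stroke whose vertical displacement dominates reduces to `'up'` or `'down'`.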
Other interesting references are [233,287,364,341].