By Lee J. Nelson
Although FACS is well suited to classifying facial patterns, behavior analysis with it is performed by hand. That labor-intensive, time-consuming process presents a major obstacle to ongoing research. Currently, experts make frame-by-frame perceptual judgments from video imagery; it takes approximately 100 hours of instruction before a coder can produce reliable, consistent findings. Once trained, a specialist typically needs three hours to code one minute of recorded video. And although one can master accurate interpretation of facial morphology (knowing which muscles are active), it is far harder to construe expression dynamics (muscular activation and motion over time). There is good evidence that expression dynamics, not simply morphology, provide an important window into emotion. Spontaneous smiles, which correlate with positive self-reports, have fast, smooth onsets, with distinct facial actions that peak nearly simultaneously. Posed smiles, conversely, initiate slowly and disjointedly.
Given the state of the technology, machine vision systems can now code facial expressions automatically, and at the level of detail demanded by behavioral correlation studies. Researchers at the Machine Perception Laboratory, Institute for Neural Computation (University of California, San Diego) have been refining a computerized FACS. While the accuracy of individual facial action detectors falls below that of skilled human coders, the volume of video data that can be processed is enormous. Statistical analysis of such a huge dataset often reveals behavioral patterns that would otherwise have consumed hundreds of person-hours of manual evaluation. Furthermore, automation facilitates the investigation of expression dynamics, previously unachievable because of the need to appraise every diminutive gradation.
Presently, Dr. Marian Stewart Bartlett and her associates are testing fully automatic FACS recognition on a continuous video stream. Faces are detected and registered in two dimensions, based on eye location. For scoring, images are passed to Action Unit detectors that were trained on the DFAT-504 facial expression database (published in 2000 by Jeffrey Cohn, Takeo Kanade, and Yingli Tian, The Robotics Institute, Carnegie Mellon University, Pittsburgh, Pa.).
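The eye-based two-dimensional registration step can be sketched in a few lines. The following Python/NumPy fragment computes a similarity transform (rotation, scale, and translation) that maps detected eye coordinates onto fixed canonical positions, then applies placeholder linear Action Unit detectors to the aligned patch. The canonical coordinates, patch size, and linear-detector form are illustrative assumptions, not the lab's actual implementation.

```python
import numpy as np

# Canonical eye positions in a normalized 48 x 48 face patch (assumed values).
CANON_LEFT_EYE = np.array([14.0, 18.0])
CANON_RIGHT_EYE = np.array([34.0, 18.0])

def similarity_transform(left_eye, right_eye):
    """2-D rotation+scale+translation mapping detected eyes onto the canonical ones."""
    src_vec = right_eye - left_eye
    dst_vec = CANON_RIGHT_EYE - CANON_LEFT_EYE
    scale = np.linalg.norm(dst_vec) / np.linalg.norm(src_vec)
    angle = np.arctan2(dst_vec[1], dst_vec[0]) - np.arctan2(src_vec[1], src_vec[0])
    c, s = np.cos(angle) * scale, np.sin(angle) * scale
    A = np.array([[c, -s], [s, c]])          # rotation-and-scale matrix
    t = CANON_LEFT_EYE - A @ left_eye        # translation fixing the left eye
    return A, t

def score_action_units(patch, detectors):
    """Apply each linear AU detector (weights, bias) to the flattened patch."""
    x = patch.ravel()
    return {au: float(w @ x + b) for au, (w, b) in detectors.items()}
```

By construction, the transform sends both detected eye locations exactly onto the canonical positions; every other pixel follows rigidly, which is what makes downstream detectors insensitive to in-plane head rotation and distance from the camera.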
A precursor to that work was CERT, the Computer Expression Recognition Toolbox, also developed at the University of California, San Diego. Fifteen years in the making, CERT originated in a collaboration between Ekman and Dr. Terrence Sejnowski (Computational Neurobiology Laboratory, Salk Institute for Biological Studies, La Jolla, Calif.). The system now automatically locates frontal faces in a real-time video stream, then codes each from a library of 30 FACS Action Units plus 40 continuous dimensions, including anger, contempt, disgust, fear, joy, sadness, surprise, and head position (pitch, roll, and yaw).
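A coder of that kind emits one record per video frame: a set of Action Unit intensities, the continuous emotion dimensions, and head pose. The structure below is a hypothetical sketch of such a record in Python; the field names are assumptions for illustration, not CERT's actual API.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class FrameScore:
    """One frame of output from a CERT-style expression coder (field names assumed)."""
    frame_index: int
    action_units: Dict[str, float] = field(default_factory=dict)  # e.g. "AU12" -> intensity
    emotions: Dict[str, float] = field(default_factory=dict)      # anger, joy, sadness, ...
    head_pose: Dict[str, float] = field(default_factory=dict)     # pitch, roll, yaw
```

Keeping every channel as a continuous value per frame, rather than a single discrete label, is what allows the downstream dynamics analysis the article describes.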
Two neural routes mediate facial appearance, each originating in a separate area of the brain. Voluntary facial movements originate in the cortical motor strip, while spontaneous expressions stem from the subcortical region and follow a distinct innervation pattern. The former exerts stronger control over lower-face muscles; the latter more forcefully directs muscles in the upper face. The two also differ in dynamics: subcortically induced (spontaneous) expressions are consistent, reflex-like, smooth, and synchronized, whereas cortically instructed expressions (subject to volitional control) tend to be jerky and variable in quality. From those observations, one might expect to see physiological inconsistencies between genuine and fabricated expressions of, for example, pleasure, pain, or drowsiness.
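Once Action Unit intensities are available per frame, the dynamics cues described above (fast smooth onsets, near-simultaneous peaks in spontaneous expressions versus slow disjointed posed ones) can be quantified. The sketch below computes two crude, hypothetical features from per-frame AU traces: the fastest frame-to-frame rise and the spread of peak times across AUs. These particular measures are illustrative assumptions, not the published method.

```python
import numpy as np

def onset_metrics(au_series, fps=30.0):
    """Crude dynamics features from per-frame AU intensity traces.

    au_series: dict mapping AU name -> sequence of intensities, one per frame.
    Returns the fastest onset speed (intensity units per second) across AUs and
    the spread of AU peak times (seconds). Hypothetical measures for illustration.
    """
    speeds, peak_times = [], []
    for trace in au_series.values():
        trace = np.asarray(trace, dtype=float)
        speeds.append(np.max(np.diff(trace)) * fps)  # fastest frame-to-frame rise
        peak_times.append(np.argmax(trace) / fps)    # when this AU peaks
    return {
        "max_onset_speed": float(np.max(speeds)),
        "peak_spread_s": float(np.max(peak_times) - np.min(peak_times)),
    }
```

Under the article's account, a spontaneous smile would show a high onset speed and a small peak spread (actions peaking nearly simultaneously), while a posed smile would show the opposite pattern.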