A Brief History of Motion Capture for Computer Character Animation

David J. Sturman
104, av. du Président Kennedy
75016 Paris France

Reference: "Character Motion Systems", SIGGRAPH 94: Course 9


The use of motion capture for computer character animation is relatively new, having begun in the late 1970's, and only now beginning to become widespread.

Motion capture is the recording of human body movement (or other movement) for immediate or delayed analysis and playback. The information captured can be as general as the simple position of the body in space or as complex as the deformations of the face and muscle masses. Motion capture for computer character animation involves the mapping of human motion onto the motion of a computer character. The mapping can be direct, such as human arm motion controlling a character’s arm motion, or indirect, such as human hand and finger patterns controlling a character’s skin color or emotional state.

The idea of copying human motion for animated characters is, of course, not new. To get convincing motion for the human characters in Snow White, Disney studios traced animation over film footage of live actors playing out the scenes. This method, called rotoscoping, has been successfully used for human characters ever since. In the late 1970's, when it began to be feasible to animate characters by computer, animators adapted traditional techniques, including rotoscoping. At the New York Institute of Technology Computer Graphics Lab, Rebecca Allen used a half-silvered mirror to superimpose videotapes of real dancers onto the computer screen to pose a computer-generated dancer for Twyla Tharp's "The Catherine Wheel." The computer used these poses as keys for generating a smooth animation. Rotoscoping is by no means an automatic process, and the complexity of human motion required for "The Catherine Wheel" necessitated the setting of keys every few frames. As such, rotoscoping can be thought of as a primitive form of, or precursor to, motion capture, where the motion is "captured" painstakingly by hand.

1980-1983: Simon Fraser University — Goniometers

Around this same time, biomechanics labs were beginning to use computers to analyze human motion. Techniques and devices used in these studies began to make their way into the computer graphics community. In the early 1980's, Tom Calvert, a professor of kinesiology and computer science at Simon Fraser University, attached potentiometers to a body and used the output to drive computer-animated figures for choreographic studies and clinical assessment of movement abnormalities. To track knee flexion, for instance, the researchers strapped a sort of exoskeleton to each leg, positioning a potentiometer alongside each knee so as to bend in concert with the knee. The analog output was then converted to a digital form and fed to the computer animation system. Their animation system used the motion capture apparatus together with Labanotation and kinematic specifications to fully specify character motion.[1]
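The conversion from a digitized potentiometer reading to a joint angle can be sketched as follows. This is only an illustration, not the Simon Fraser implementation: the 10-bit converter range, the linear potentiometer response, the 0 to 150 degree flexion range, and the function name pot_to_angle are all assumptions made for the example.

```python
def pot_to_angle(adc_count, adc_max=1023, angle_min_deg=0.0, angle_max_deg=150.0):
    """Map a raw ADC count from a joint potentiometer to an angle in degrees.

    Assumes (hypothetically) a linear potentiometer and a converter that
    reports 0 at full extension and adc_max at full flexion.
    """
    if not 0 <= adc_count <= adc_max:
        raise ValueError("ADC count out of range")
    fraction = adc_count / adc_max  # 0.0 .. 1.0 along the potentiometer's travel
    return angle_min_deg + fraction * (angle_max_deg - angle_min_deg)


# One knee-flexion angle per digitized sample.
samples = [0, 256, 512, 1023]
angles = [pot_to_angle(s) for s in samples]
print(angles)
```

In practice each potentiometer would need its own calibration, since the mechanical mounting determines which readings correspond to which joint angles.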

1982-1983: MIT — Graphical Marionette

Soon after that, commercial optical tracking systems such as the Op-Eye and SelSpot systems began to be used by the computer graphics community. In the early 1980's, both the MIT Architecture Machine Group and the New York Institute of Technology Computer Graphics Lab experimented with optical tracking of the human body.

Optical trackers typically use small markers attached to the body (either flashing LEDs or small reflecting dots) and a series of two or more cameras focused on the performance space. A combination of special hardware and software picks out the markers in each camera's visual field and, by comparing the images, calculates the three-dimensional position of each marker through time.

The technology is limited by the speed at which the markers can be examined (which determines the number of positions per second that can be captured), by occlusion of the markers by the body, and by the resolution of the cameras, specifically their ability to differentiate markers that are close together. Early systems could track only a dozen or so markers at a time. More recent systems can track several dozen at once. Occlusion problems can be overcome by the use of more cameras, but even so, most current optical systems require manual post-processing to recover trajectories when a marker is lost from view. This will change as systems become more sophisticated. The problem of resolution involves a trade-off of many variables, including camera price, field of view, and space of movement. The more resolution you need, the more the camera costs. The same camera can give you greater movement resolution if focused on a smaller field of view, but this limits the size of motions that are possible. Because of these limitations, almost all the uses of optical tracking systems today rely on post-processing procedures to analyze, process, and clean up the data before they are applied to the computer character.
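One simple form that such clean-up could take is filling short gaps in a marker's trajectory by straight-line interpolation between the last position seen before the occlusion and the first position seen after it. The sketch below is an assumption about what a minimal gap-filler might look like, not a description of any particular system; the function name fill_gaps and the use of None for a lost sample are hypothetical.

```python
def fill_gaps(track):
    """Linearly interpolate missing samples (None) in a list of (x, y, z) tuples.

    Gaps at the very start or end are left as None, because there is no
    valid sample on one side to interpolate from.
    """
    filled = list(track)
    last_seen = None  # index of the most recent valid sample
    for i, sample in enumerate(track):
        if sample is None:
            continue
        if last_seen is not None and i - last_seen > 1:
            start, end = track[last_seen], sample
            span = i - last_seen
            for j in range(last_seen + 1, i):
                t = (j - last_seen) / span  # fraction of the way across the gap
                filled[j] = tuple(a + t * (b - a) for a, b in zip(start, end))
        last_seen = i
    return filled


# A marker that is occluded for two frames, then lost again at the end.
track = [(0.0, 0.0, 0.0), None, None, (3.0, 6.0, 9.0), None]
print(fill_gaps(track))
```

Straight-line filling is only plausible for brief occlusions; over longer gaps the real limb motion curves, which is one reason manual correction remains necessary.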

In 1983, Ginsberg and Maxwell at MIT presented the Graphical Marionette, a system for "scripting-by-enactment": one scripts an animation by enacting the motions. The system used an early optical motion capture system called Op-Eye that relied on sequenced LEDs. They wired a body suit with the LEDs on the joints and other anatomical landmarks. Two cameras with special photo detectors returned the 2-D position of each LED in their fields of view. The computer then used the position information from the two cameras to obtain a 3-D world coordinate for each LED. The system used this information to drive a stick figure for immediate feedback, and stored the sequence of points for later rendering of a more detailed character. The slow rate of rendering characters and the expense of the motion capture hardware were the largest roadblocks to the widespread use of this technology for animation production. Since that time, however, hardware rendering has sped up considerably, and the methods employed in the Graphical Marionette project are becoming more commonly used for computer character animation.[2]
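The step from two 2-D camera views to one 3-D coordinate can be illustrated with a generic two-ray triangulation. This is a sketch under stated assumptions rather than the method used in the Graphical Marionette: it assumes each camera's 2-D detection has already been converted, through calibration, into a viewing ray (a camera position plus a direction toward the LED), and it takes the midpoint of the shortest segment joining the two rays. The function name triangulate and the camera positions are hypothetical.

```python
def triangulate(o1, d1, o2, d2):
    """Estimate a 3-D point from two viewing rays.

    Each ray is an origin o (camera position) and a direction d toward the
    marker. Because of measurement noise the rays rarely intersect exactly,
    so the midpoint of their closest approach is returned.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    w = tuple(a - b for a, b in zip(o1, o2))
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; no unique closest point")
    s = (b * e - c * d) / denom  # distance parameter along ray 1
    t = (a * e - b * d) / denom  # distance parameter along ray 2
    p1 = tuple(o + s * k for o, k in zip(o1, d1))
    p2 = tuple(o + t * k for o, k in zip(o2, d2))
    return tuple((u + v) / 2 for u, v in zip(p1, p2))


# Two cameras two units apart, both sighting an LED five units in front of them.
print(triangulate((-1.0, 0.0, 0.0), (1.0, 0.0, 5.0),
                  (1.0, 0.0, 0.0), (-1.0, 0.0, 5.0)))
```

Repeating this calculation for every LED at every sample time gives the sequence of 3-D points that drives the figure.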

1988: deGraf/Wahrman — Mike the Talking Head

In 1988, deGraf/Wahrman developed "Mike the Talking Head" for Silicon Graphics to show off the real-time capabilities of their new 4D machines. Mike was driven by a specially built controller that allowed a single puppeteer to control many parameters of the character's face, including mouth, eyes, expression, and head position. The Silicon Graphics hardware provided real-time interpolation between facial expressions and head geometry as controlled by the performer. Mike was performed live in that year's SIGGRAPH film and video show. The live performance clearly demonstrated that the technology was ripe for exploitation in production environments.[3]
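Interpolation between facial expressions can be sketched as a weighted blend of stored face shapes. The details of how Mike was implemented are not given here, so everything in the example is an assumption: the flat list of vertex coordinates, the idea that each controller channel supplies one weight, and the function name blend_shapes.

```python
def blend_shapes(neutral, targets, weights):
    """Blend a neutral face toward several target expressions.

    neutral and each target are flat lists of vertex coordinates of equal
    length. Each weight (0.0 = none, 1.0 = full expression) scales the
    offset of its target from the neutral face.
    """
    if len(targets) != len(weights):
        raise ValueError("need exactly one weight per target shape")
    result = list(neutral)
    for target, weight in zip(targets, weights):
        if len(target) != len(neutral):
            raise ValueError("target shape does not match the neutral shape")
        for i, (n, v) in enumerate(zip(neutral, target)):
            result[i] += weight * (v - n)
    return result


# A toy "face" with three coordinates and two hypothetical expressions.
neutral = [0.0, 0.0, 0.0]
smile = [1.0, 0.0, 0.0]
jaw_open = [0.0, -2.0, 0.0]
print(blend_shapes(neutral, [smile, jaw_open], [0.5, 0.25]))
```

Because each frame needs only a few multiply-adds per coordinate, a blend of this kind can be recomputed as fast as the performer moves the controls.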

1988: Pacific Data Images — Waldo C. Graphic

As early as 1985, Jim Henson Productions had been trying to create computer graphics versions of their characters. They met with little success, mainly due to the limited capabilities of the technology at that time. Finally, in 1988, with the availability of the Silicon Graphics 4D series workstation, and with the expertise of Pacific Data Images, they found a viable solution. By hooking a custom eight-degree-of-freedom input device (a kind of mechanical arm with upper and lower jaw attachments) through the standard SGI dial box, they were able to control the position and mouth movements of a low resolution character in real-time. Thus was Waldo C. Graphic born. Waldo's strength as a computer generated puppet was that he could be controlled in real-time in concert with real puppets. The computer image was mixed with the video feed of the camera focused on the real puppets so that everyone could perform together. Afterwards, in post production, PDI re-rendered Waldo in full resolution, adding a few dynamic elements on top of the performed motion.[4]

Subsequently PDI developed a light-weight plastic upper-body "exoskeleton" to track the movements of the upper torso, head, and arms so that actors could control computer characters by miming their motions. Potentiometers on the plastic frame measure body motion which is picked up by the computer in real-time. They have used the suit in many projects, although they have not found it to be the ideal body tracking device due to the noise in the electronics and the encumbering nature of the exoskeleton.[5]

1989: Kleiser-Walczak — Dozo

In 1989, Kleiser-Walczak produced Dozo, a (non-real-time) computer animation of a woman dancing in front of a microphone while singing a song for a music video. To get realistic human motion, they decided to use motion capture techniques. Based on experiments in motion capture from Kleiser's work at Digital Productions and Omnibus (two now-defunct computer animation production houses), they chose an optically-based solution from Motion Analysis that used multiple cameras to triangulate the images of small pieces of reflective tape placed on the body. The resulting output is the 3-D trajectory of each reflector in the space. As was described above, one of the problems with this kind of system is tracking points as they are occluded from the cameras. For Dozo, this had to be done as a very time-consuming post-process. Luckily, some newer systems are beginning to do this in software, significantly speeding up the motion capture process.[6]

1991: Videosystem — Mat the Ghost

Having seen the possibility of animating characters by performance techniques in Waldo C. Graphic, Videosystem, a French video and computer graphics producer, turned the attentions of its newly formed computer animation division to the problem of computer puppets. The result was a real-time character animation system whose first success was the daily production of a character called Mat the Ghost. Mat was a friendly green ghost that interacted with live actors and puppets on a daily children's show called Canaille Peluche. Using DataGloves, joysticks, Polhemus trackers, and MIDI drum pedals, puppeteers interactively performed Mat, chroma-keyed with the previously-shot video of the live actors. Since there was no post-rendering, animation sequences were generated in the time it took the performers to achieve a good take. Seven minutes of animation (one week's worth) were normally completed in a day and a half of performance. Mat appeared on Canaille Peluche every day for over three and a half years.[7]

Videosystem, now known as Medialab, has continued to develop the performance system to the point where it is a reliable production tool, having produced several hours of production animation in total, for more than a dozen characters.

Typically, each character is controlled by several puppeteers or actors working in concert. Two puppeteers control the facial expressions, lipsynch, and special effects such as shape transformations for Mat the Ghost, or bubbles from the mouth of a fish, and an actor mimes the upper body motions while wearing a suit with electromagnetic trackers (Polhemus) on the torso, arms, and head. The finger motions, joystick movements, and so on, of the puppeteers are transformed into facial expressions and effects of the character, while the motion of the actor is directly mapped to the character's body.

1992: SimGraphics — Mario

SimGraphics has long been in the VR business, having built systems around some of the first VPL DataGloves in 1987. Around 1992 they developed a facial tracking system they called a "face waldo." Using mechanical sensors attached to the chin, lips, cheeks, and eyebrows, and electro-magnetic sensors on the supporting helmet structure, they could track the most important motions of the face and map them in real-time onto computer puppets. The importance of this system was that one actor could manipulate all the facial expressions of a character by just miming the facial expression himself—a perfectly natural interface.

One of the first big successes with the face waldo, and its concomitant VActor animation system, was the real-time performance of Mario from Nintendo's popular videogame for Nintendo product announcements and trade shows. Driven by an actor behind the scenes wearing the face waldo, Mario conversed and joked with audience members, responding to their questions and comments. Since then, SimGraphics has concentrated on live performance animation, developing characters for trade shows, television, and other live entertainment.

During the past few years, SimGraphics has been continually updating the technology of the face waldo, improving reliability and comfort.

1992: Brad deGraf — Alive!

After deGraf/Wahrman's Mike the Talking Head, Brad deGraf continued working on his own, developing a real-time animation system which is now called Alive! For one character performed with Alive!, deGraf developed a special hand device with five plungers actuated by the puppeteer’s fingers. The device was used to control the facial expressions of a computer-generated friendly talking spaceship, who, much like Mario, promoted its "parent" company at trade shows.[8]

DeGraf subsequently joined Colossal Pictures where he used Alive! to animate Moxy, a computer generated dog who hosts a show for the Cartoon Network. Moxy is performed in real-time for publicity, but post-rendered for the actual show. The actor's motions are captured by an electromagnetic tracking system with sensors on the hands, feet, torso, and head of the actor.

1993: Acclaim

At SIGGRAPH '93 Acclaim amazed audiences with a realistic and complex two-character animation done entirely with motion capture. For the previous several years, Acclaim had quietly developed a high-performance optical motion tracking system, much like the ones used for the Graphical Marionette and Dozo, but able to track up to 100 points simultaneously in real-time. Acclaim mainly uses the system to generate character motion sequences for video games. Their system is proprietary and they do not plan to market the technology except as a production house.

Today: Many players using commercial systems

In the past few years, Ascension, Polhemus, SuperFluo, and others have released commercial motion tracking systems for computer animation. In addition, animation software vendors, such as SoftImage, have integrated these systems into their products, creating "off-the-shelf" performance animation systems. Although there are many problems yet to be solved in the field of human motion capture, the practice is now well ensconced as a viable option for computer animation production. As the technology develops, there is no doubt that motion capture will become one of the basic tools of the animator's craft.


[1] T. W. Calvert, J. Chapman and A. Patla, "Aspects of the kinematic simulation of human movement," IEEE Computer Graphics and Applications, Vol. 2, No. 9, November 1982, pp. 41-50.

[2] Carol M. Ginsberg and Delle Maxwell, "Graphical marionette," Proc. ACM SIGGRAPH/SIGART Workshop on Motion, ACM Press, New York, April 1983, pp. 172-179.

[3] Barbara Robertson, "Mike, the talking head," Computer Graphics World, July 1988, pp. 15-17.

[4] Graham Walters, "The story of Waldo C. Graphic," Course Notes: 3D Character Animation by Computer, ACM SIGGRAPH '89, Boston, July 1989, pp. 65-79.

[5] Graham Walters, "Performance animation at PDI," Course Notes: Character Motion Systems, ACM SIGGRAPH 93, Anaheim, CA, August 1993, pp. 40-53.

[6] Jeff Kleiser, "Character motion systems," Course Notes: Character Motion Systems, ACM SIGGRAPH 93, Anaheim, CA, August 1993, pp. 33-36.

[7] Herve Tardif, "Character animation in real time," Panel: Applications of Virtual Reality I: Reports from the Field, ACM SIGGRAPH Panel Proceedings, 1991.

[8] Barbara Robertson, "Moving pictures," Computer Graphics World, Vol. 15, No. 10, October 1992, pp. 38-44.

Last changed March 13, 1999, G. Scott Owen, owen@siggraph.org