Introduction
Motion Capture (MoCap) is the process of capturing the movement of a real object and mapping it onto a computer-generated model. In character animation, it is typically used to create synthetic actors by capturing the motions of real humans. In this case, special markers are placed over the joints of the actors. Special hardware then samples the position and/or orientation of those markers over time, generating a set of motion data, also known as motion curves. This technique has been used by special-effects companies to produce realistic character animation; its main advantage over traditional techniques such as keyframing and simulation is the capability of real-time visualization and the high fidelity of the captured human motion.
Although it has been studied since the 1980s, the most common present-day use of MoCap is still quite simple: motions captured from live subjects are mapped directly onto a virtual actor, and the resulting animation is displayed. In spite of its value, this use is very limited and does not exploit the full potential of the captured data.
Our goal is to study techniques to improve the quality of motion-captured data and to support its reuse, within the context of an animation system based on MoCap. This software works with motion-captured data and provides tools for motion analysis, manipulation, and reuse.
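To make the idea of motion curves concrete, the minimal Python sketch below shows one way such data might be represented; the MotionCurve class, the 60 Hz sample rate, and the marker name are illustrative assumptions, not part of any particular capture system.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class MotionCurve:
    """Sampled trajectory of a single marker: one 3D position per frame."""
    rate_hz: float       # sampling rate of the capture hardware (assumed 60 Hz here)
    samples: List[Vec3]  # positions recorded at t = 0, 1/rate, 2/rate, ...

    def position_at(self, t: float) -> Vec3:
        """Linearly interpolate the marker position at time t (seconds)."""
        f = t * self.rate_hz
        i = min(int(f), len(self.samples) - 2)
        a, b = self.samples[i], self.samples[i + 1]
        w = f - i
        return tuple(a[k] * (1.0 - w) + b[k] * w for k in range(3))

# A captured clip is then simply one curve per marker, e.g.:
clip = {"left_knee": MotionCurve(60.0, [(0.0, 0.50, 0.0), (0.01, 0.51, 0.0), (0.02, 0.53, 0.01)])}
print(clip["left_knee"].position_at(1 / 60))
```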
History
In the late 1970s and early 1980s, biomechanics labs were beginning to use computers to analyze
human motion, and the techniques and devices used in these studies began to make their way into
the computer graphics community. In the early 1980s, Tom Calvert, a professor of kinesiology and
computer science at Simon Fraser University, attached potentiometers to a
body and used the output to drive computer animated figures for choreographic studies and
clinical assessment of movement abnormalities. To track knee flexion, for instance, they
strapped a sort of exoskeleton to each leg, positioning a potentiometer alongside each knee so
as to bend in concert with the knee. The analog output was then converted to a digital form
and fed to the computer animation system. Their animation system used the motion capture
apparatus together with Labanotation and kinematic specifications to fully specify character
motion.[1]
Soon after that, commercial optical tracking systems such as the Op-Eye and SelSpot systems
began to be used by the computer graphics community. In the early 1980's, both the MIT
Architecture Machine Group and the New York Institute of Technology Computer Graphics Lab experimented with optical tracking of the human body.
Optical trackers typically use small markers attached to the body—either flashing LEDs or
small reflecting dots—and a series of two or more cameras focused on the performance
space. A combination of special hardware and software pick out the markers in each camera's
visual field and, by comparing the images, calculate the three-dimensional position of each marker over time. Such systems are limited by their sampling speed (the number of positions per second that can be captured), by occlusion of the markers by the
body, and by the resolution of the cameras—specifically for their ability to differentiate
markers close together. Early systems could track only a dozen or so markers at a time. More
recent systems can track several dozen at once. Occlusion problems can be overcome by the
use of more cameras, but even so, most current optical systems require manual post-
processing to recover trajectories when a marker is lost from view. This will change as
systems become more sophisticated. The problem of resolution involves a trade-off of many
variables, including camera price, field of view, and space of movement. The more resolution
you need, the more the camera costs. The same camera can give you greater movement
resolution if focused on a smaller field of view, but this limits the size of motions that are
possible. Because of these limitations, almost all uses of optical tracking systems today rely on post-processing procedures to analyze, process, and clean up the data before they are used for animation.
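As a rough illustration of what this post-processing can involve, the hypothetical Python fragment below fills short occlusion gaps in a single marker's trajectory by linear interpolation; the NaN-for-occluded-frames convention and the function name are assumptions for this sketch, not any vendor's actual pipeline.

```python
import numpy as np

def fill_occlusion_gaps(trajectory: np.ndarray) -> np.ndarray:
    """trajectory: (n_frames, 3) marker positions, with NaN where the marker was occluded."""
    filled = trajectory.copy()
    frames = np.arange(len(filled))
    for axis in range(filled.shape[1]):
        col = filled[:, axis]
        visible = ~np.isnan(col)
        # Interpolate occluded frames from the surrounding visible samples.
        filled[:, axis] = np.interp(frames, frames[visible], col[visible])
    return filled

# Example: the marker is lost from view on frame 2 and recovered afterwards.
traj = np.array([[0.0, 0.0, 1.0], [0.1, 0.0, 1.0], [np.nan, np.nan, np.nan], [0.3, 0.0, 1.1]])
print(fill_occlusion_gaps(traj))
```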
In 1983, Ginsberg and Maxwell at MIT presented the Graphical Marionette, a system for motion-capture-driven character animation. They used an early optical motion capture system called Op-Eye that relied on sequenced LEDs.
They wired a body suit with the LEDs on the joints and other anatomical landmarks. Two
cameras with special photo detectors returned the 2-D position of each LED in their fields of
view. The computer then used the position information from the two cameras to obtain a 3-D
world coordinate for each LED. The system used this information to drive a stick figure for
immediate feedback, and stored the sequence of points for later rendering of a more detailed
character. The slow rate of rendering characters and the expense of the motion capture
hardware were the largest roadblocks to the widespread use of this technology for animation
production. Since that time, however, hardware rendering has sped up considerably, and the
methods employed in the Graphical Marionette project are becoming more commonly used.
In 1988, deGraf/Wahrman developed "Mike the Talking Head" for Silicon Graphics to show
off the real-time capabilities of their new 4D machines. Mike was driven by a specially built
controller that allowed a single puppeteer to control many parameters of the character's face,
including mouth, eyes, expression, and head position. The Silicon Graphics hardware
provided real-time interpolation between facial expressions and head geometry as controlled
by the performer. Mike was performed live in that year's SIGGRAPH film and video show.
The live performance clearly demonstrated that the technology was ripe for exploitation in
production environments.[3]
As early as 1985, Jim Henson Productions had been trying to create computer graphics
versions of their characters. They met with limited success, mainly due to the limited
capabilities of the technology at that time. Finally, in 1988, with availability of the Silicon
Graphics 4D series workstation, and with the expertise of Pacific Data Images, they found a
viable solution. By hooking a custom eight degree of freedom input device (a kind of
mechanical arm with upper and lower jaw attachments) through the standard SGI dial box,
they were able to control the position and mouth movements of a low resolution character in
real-time. Thus was Waldo C. Graphic born. Waldo's strength as a computer generated
puppet was that he could be controlled in real-time in concert with real puppets. The
computer image was mixed with the video feed of the camera focused on the real puppets so
that everyone could perform together. Afterwards, in post production, PDI re-rendered Waldo
in full resolution, adding a few dynamic elements on top of the performed motion.[4]
PDI also built a plastic, exoskeleton-like suit to track the movements of the upper torso, head, and arms so that actors could control computer
characters by miming their motions. Potentiometers on the plastic frame measure body
motion, which is picked up by the computer in real-time. They have used the suit in many
projects, although they have not found it to be the ideal body tracking device due to the noise in the measurements and the encumbrance of the suit itself.
In 1989, Kleiser-Walczak produced Dozo, a computer animation of a woman dancing in front of a microphone while singing a song for a music video. To get realistic
human motion, they decided to use motion capture techniques. Based on experiments in
motion capture from Kleiser's work at Digital Productions and Omnibus (two now-defunct
computer animation production houses), they chose an optically-based solution from Motion
Analysis that used multiple cameras to triangulate the images of small pieces of reflective
tape placed on the body. The resulting output is the 3-D trajectory of each reflector in the
space. As was described above, one of the problems with this kind of system is tracking
points as they are occluded from the cameras. For Dozo, this had to be done as a very time-
consuming post-process. Luckily, some newer systems are beginning to do this in software.
At about the same time as Waldo C. Graphic, Videosystem, a French video and computer graphics producer, turned the attention
of its newly formed computer animation division to the problem of computer puppets. The
result was a real-time character animation system whose first success was the daily
production of a character called Mat the Ghost. Mat was a friendly green ghost that interacted
with live actors and puppets on a daily children's show called Canaille Peluche. Using
DataGloves, joysticks, Polhemus trackers, and MIDI drum pedals, puppeteers interactively
performed Mat, chroma-keyed with the previously-shot video of the live actors. Since there
was no post-rendering, animation sequences were generated in the time it took the performers
to achieve a good take. Seven minutes of animation (one week's worth) were normally
completed in a day and a half of performance. Mat appeared on Canaille Peluche every day.
Videosystem, now known as Medialab, has continued to develop the performance system to
the point where it is a reliable production tool, having produced several hours of production animation.
Two puppeteers control the facial expressions, lipsynch, and special effects such as shape
transformations for Mat the Ghost, or bubbles from the mouth of a fish, and an actor mimes
the upper body motions while wearing a suit with electromagnetic trackers (Polhemus) on the
torso, arms, and head. The finger motions, joystick movements, and so on, of the puppeteers
are transformed into facial expressions and effects of the character, while the motion of the actor's suit drives the body of the character.
SimGraphics has long been in the VR business, having built systems around some of the first
VPL DataGloves in 1987. Around 1992 they developed a facial tracking system they called a
"face waldo." Using mechanical sensors attached to the chin, lips, cheeks, and eyebrows, and
electro-magnetic sensors on the supporting helmet structure, they could track the most
important motions of the face and map them in real-time onto computer puppets. The
importance of this system was that one actor could manipulate all the facial expressions of a
character by just miming the facial expression himself—a perfectly natural interface.
One of the first big successes with the face waldo, and its concomitant VActor animation
system, was the real-time performance of Mario, from Nintendo's popular videogame, at
Nintendo product announcements and trade shows. Driven by an actor behind the scenes
wearing the face waldo, Mario conversed and joked with audience members, responding to
their questions and comments. Since then, SimGraphics has concentrated on live performance
animation, developing characters for trade shows, television, and other live entertainment.
During the past few years, SimGraphics has been continually updating the face waldo technology.
After deGraf/Wahrman's Mike the Talking Head, Brad deGraf continued working on his
own, developing a real-time animation system which is now called Alive! For one character
performed with Alive!, deGraf developed a special hand device with five plungers actuated
by the puppeteer’s fingers. The device was used to control the facial expressions of a
computer-generated friendly talking spaceship, who, much like Mario, promoted its "parent" company at trade shows.
DeGraf subsequently joined Colossal Pictures where he used Alive! to animate Moxy, a
computer generated dog who hosts a show for the Cartoon Network. Moxy is performed in
real-time for publicity, but post-rendered for the actual show. The actor's motions are
captured by an electromagnetic tracking system with sensors on the hands, feet, torso, and head.
1993: Acclaim
At SIGGRAPH '93 Acclaim amazed audiences with a realistic and complex two-character
animation done entirely with motion capture. For the previous several years, Acclaim had
quietly developed a high-performance optical motion tracking system, much like the ones
used for the Graphical Marionette and Dozo, but able to track up to 100 points
simultaneously in real-time. Acclaim mainly uses the system to generate character motion
sequences for video games. Their system is proprietary, and they do not plan to market the technology.
In the past few years, Ascension, Polhemus, SuperFluo, and others have released commercial
motion tracking systems for computer animation. In addition, animation software vendors,
such as SoftImage, have integrated these systems into their products, creating "off-the-shelf"
performance animation systems. Although there are many problems yet to be solved in the
field of human motion capture, the practice is now well ensconced as a viable option for
computer animation production. As the technology develops, there is no doubt that motion
capture will become one of the basic tools of the animator's craft.
Types of motion capture
Motion capture is typically accomplished by one of the following methods. While each technology has its strengths and weaknesses, there is no single motion capture technology that is ideal for every application.
Magnetic
Magnetic motion capture systems use sensors placed on the body. These sensors are cabled to an electronic control unit that correlates their reported locations within a magnetic field. The control units are networked with a host computer, which uses a software driver to represent the sensor positions in 3D space. Each sensor reports both positional and rotational information.
Pros: real-time operation, no marker-correspondence problem, works in a smaller capture space, positions are absolute, and rotations are measured directly.
Cons: relatively heavy sensors that restrict movement, wires on the body, cost, limited range, magnetic distortion that increases with distance, and susceptibility to interference from magnetic fields and nearby metal (for example, rebar in concrete floors).
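The sketch below illustrates, under assumed conventions, the kind of sample a magnetic sensor delivers: an absolute position plus a rotation. The field names, units, and the (w, x, y, z) quaternion layout are illustrative choices, not a specific vendor's driver interface.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MagneticSample:
    """One reading from a single magnetic sensor: absolute position plus rotation."""
    sensor_id: int
    position: np.ndarray   # (3,) position within the transmitter's field, in metres (assumed)
    rotation: np.ndarray   # (4,) unit quaternion (w, x, y, z)

def rotate(q: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Rotate vector v by unit quaternion q = (w, x, y, z)."""
    w, xyz = q[0], q[1:]
    return v + 2.0 * np.cross(xyz, np.cross(xyz, v) + w * v)

# Place a forearm segment from an elbow sensor: the bone points along the sensor's local x-axis.
sample = MagneticSample(3, np.array([0.2, 1.1, 0.4]), np.array([0.9239, 0.0, 0.3827, 0.0]))
elbow = sample.position
wrist = elbow + rotate(sample.rotation, np.array([0.25, 0.0, 0.0]))   # 25 cm forearm
print(wrist)
```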
Mechanical
In this type of motion capture, the performer wears a human-shaped set of metal strips, like a very basic exoskeleton, hooked onto the performer's back. Each joint has a sensor that reports its rotation.
Other types of mechanical motion capture involve gloves, mechanical arms, or articulated models.
Cons: the motion is less realistic; the sensors are noisy; the technology has no awareness of the ground, so jumping cannot be captured and foot data tends to slide; the equipment must be calibrated often; unless some other type of sensor is in place, the system does not know which way the performer's body is pointing; and absolute positions are not measured directly but are calculated from the rotations, as sketched below.
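That last point, computing absolute positions from the measured rotations, is essentially forward kinematics. The following is a minimal 2D sketch under assumed conventions (a planar joint chain with angles relative to the parent segment); a real suit works in 3D with its own calibration.

```python
import math
from typing import List, Tuple

def forward_kinematics(segment_lengths: List[float], joint_angles: List[float],
                       root: Tuple[float, float] = (0.0, 0.0)) -> List[Tuple[float, float]]:
    """Compute absolute 2D joint positions of a chain from per-joint rotations.

    joint_angles are in radians, each measured relative to the previous segment,
    which is how a mechanical suit reports them."""
    positions = [root]
    x, y = root
    heading = 0.0
    for length, angle in zip(segment_lengths, joint_angles):
        heading += angle                       # accumulate relative rotations
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        positions.append((x, y))
    return positions

# Hip -> knee -> ankle of one leg (lengths in metres, angles in radians).
print(forward_kinematics([0.45, 0.42], [math.radians(-80), math.radians(30)]))
```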
Optical
Optical motion capture is the most commonly used technology. There are two variants: reflective (passive) markers and pulsed-LED (active) markers. Optical capture records motion digitally, turning real-life movement into digital form, and it is used extensively in fields such as animation, special effects, and gaming.
In short, the technique uses a number of special cameras viewing the performance from different angles; using two or more camera viewpoints makes it possible to reconstruct positions in a 3D world. Reflective markers are placed on the actor's body, and because the markers reflect light strongly, it is easy for the software to identify them in the camera images. When the same marker is tracked by more than one camera, its position can be recovered on all three axes, as illustrated in the sketch below. Marker-based capture generally gives high positional accuracy.
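To make the two-camera idea concrete, the sketch below triangulates a marker's 3D position from its 2D image coordinates in two calibrated cameras using a linear least-squares (DLT) solve. The toy projection matrices and coordinates are illustrative assumptions, not data from a real capture setup.

```python
import numpy as np

def triangulate(P1: np.ndarray, P2: np.ndarray, uv1, uv2) -> np.ndarray:
    """Recover a 3D marker position from its pixel coordinates in two calibrated cameras.

    P1, P2 are 3x4 camera projection matrices; uv1, uv2 are the (u, v) image
    coordinates of the same marker as seen by each camera (DLT triangulation)."""
    u1, v1 = uv1
    u2, v2 = uv2
    A = np.vstack([
        u1 * P1[2] - P1[0],
        v1 * P1[2] - P1[1],
        u2 * P2[2] - P2[0],
        v2 * P2[2] - P2[1],
    ])
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]          # convert from homogeneous coordinates

# Two toy cameras: one at the origin, one shifted 1 m along x, both looking down +z.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
point = np.array([0.3, 0.2, 2.0, 1.0])
uv1 = (P1 @ point)[:2] / (P1 @ point)[2]
uv2 = (P2 @ point)[:2] / (P2 @ point)[2]
print(triangulate(P1, P2, uv1, uv2))     # recovers approximately (0.3, 0.2, 2.0)
```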