
CENTER FOR MACHINE PERCEPTION

Articulated 3D human model and its animation for testing and learning algorithms of multi-camera systems
Ondřej Mazaný
{mazano1,svoboda}@cmp.felk.cvut.cz

CZECH TECHNICAL UNIVERSITY

CTU-CMP-2007-02, January 15, 2007

MASTER THESIS

Available at ftp://cmp.felk.cvut.cz/pub/cmp/articles/svoboda/Mazany-TR-2007-02.pdf
Thesis Advisor: Tomáš Svoboda
The work has been supported by the Czech Academy of Sciences under Project 1ET101210407. Tomáš Svoboda acknowledges support of the Czech Ministry of Education under Project 1M0567.
Research Reports of CMP, Czech Technical University in Prague, No. 2, 2007
Published by Center for Machine Perception, Department of Cybernetics, Faculty of Electrical Engineering, Czech Technical University, Technická 2, 166 27 Prague 6, Czech Republic
fax +420 2 2435 7385, phone +420 2 2435 7637, www: http://cmp.felk.cvut.cz

Prohlášení (Declaration)

I declare that I have written this diploma thesis independently and that I have used only the sources (literature, projects, software, etc.) listed in the attached list.

In Prague, on ....................... ................................ signature

Acknowledgments

I thank my advisers Tomáš Svoboda and Petr Doubek for their mentoring and the time spent helping me with my work. I thank my wife Eva and my parents for their support during my studies. Finally I give thanks to Jesus Christ, for the faith that gives me hope and purpose to study.

Abstract

This document describes a software package for creating realistic simple human animations. The package uses the open-source 3D modeling software Blender and the scripting language Python. The animations, generated for several cameras, are intended for testing and learning of tracking algorithms in a multi-camera system where ground-truth data are needed. We made a human skeletal model and designed a way to animate it by scripts using motion capture data. The texture of the 3D human model is obtained from captured images. Transformations between computer vision and computer graphics are discussed in detail. We designed our own algorithm for automatically rigging the mesh model with the bones of the skeletal model. The steps of the design are covered in this thesis together with the usage of the software package.

Abstrakt (Czech abstract, translated)

This document describes a software package for creating realistic human animations. The package is based on the open-source 3D modeling tool Blender and the Python programming language. The animations, generated from the viewpoints of several cameras, are intended for learning and testing computer vision algorithms in a multi-camera system. We created a skeleton model and designed a way to animate it using motion capture data. The textures of the three-dimensional human model are obtained from camera images. Transformations between computer vision and computer graphics are described in detail. We designed our own algorithm for automatically connecting the mesh model of a human with the skeleton model. The individual design steps are covered in this thesis together with the usage of the software package.

Contents
1 Introduction                                              5
2 Articulated graphic model                                 7
3 Blender and Python overview                              11
  3.1 Blender coordinate system                            12
  3.2 Blender scene and objects                            12
  3.3 Blender materials and textures                       13
  3.4 Blender camera                                       14
4 Skeleton model definition                                16
5 Rigging the mesh with bones                              19
6 Texturing                                                25
  6.1 Used approach                                        25
  6.2 The problems of our method                           27
  6.3 Determining the visibility of faces                  27
  6.4 The best visibility algorithm                        28
  6.5 Counting the visible pixels                          30
  6.6 Conclusions (for texturizing approach)               30
7 Animation                                                32
  7.1 Motion capture data                                  32
  7.2 Usage of motion captured data                        34
8 Description of the software package                      37
  8.1 Mesh data format                                     38
  8.2 Animation XML file                                   38
  8.3 Camera configuration files                           39
  8.4 Exported animation                                   40
  8.5 Run parameters                                       41
9 Used mathematics and transformations                     42
  9.1 Blender's math                                       42
  9.2 Blender Euler rotations                              43
  9.3 Blender camera model                                 45
  9.4 Using projection matrices with OpenGL                49
  9.5 Bone transformations                                 54
  9.6 Vertex deformations by bones                         55
  9.7 Configuring the bones                                57
  9.8 Idealizing the calibrated camera                     58
10 The package usage                                       61
  10.1 Importing mesh and attaching to the skeleton        62
  10.2 Fitting the model into views and texturing          63
  10.3 Loading animation, rendering and animation data export  64
11 Results and conclusions                                 66
  11.1 Conclusions                                         66

Chapter 1

Introduction
Tracking in computer vision is still a difficult problem and in general it remains largely unsolved. Monocular tracking using only one camera is possible with the knowledge of the tracked object model, as described in [14]. The 3D model of the tracked object can also be a side product of the tracking algorithms when learning the model that is going to be tracked. Having prior knowledge of the model can simplify the process of learning and make the tracking in computer vision more robust. The priors of a human model can sufficiently constrain the problem [13]. The 3D estimation of a human pose from monocular video is often poorly constrained and the prior knowledge can resolve the ambiguities. Dimitrijevic, Lepetit and Fua used generated human postures to obtain template human silhouettes for specific human motions [3]. They used these templates to detect silhouettes in images in both indoor and outdoor environments where background subtraction is impossible. This approach is robust with respect to the camera position and allows detecting a human body without any knowledge about camera parameters. Since the detected templates are projections of posed 3D models, it is easy to estimate the full 3D pose of the detected body from 2D data. The constraints for hierarchical joint structures tracked using existing motion tracking algorithms solve the problems with false positive classification of poses which are unreachable for the human body, as shown by Herda, Urtasun and Fua [6]. Also the dependencies between the joints help to determine valid poses. Due to the lack of data for joint ranges, the authors tracked all the possible configurations for hand joints and applied the acquired data to constrain the motion tracker. Another use is in testing computer vision algorithms. The noise in real data makes detecting silhouettes and events in the scene harder and therefore it is convenient to have ground truth data without the noise for testing the algorithms. It is easier to define one's own scene with one's own model, generate an animation and then verify the information obtained by the algorithms.


Developing a simple system for creating realistic animations may speed up the process of testing multi-camera systems. Modern computer animations allow us to generate realistic scenes, and therefore we can work with synthetic data as well as with real data. This saves the time needed for capturing new data from the real environment. For safety reasons it is preferable to simulate dangerous scene scenarios, as is often done in movies nowadays. But creating a realistic 3D animation is often hand work that starts with modeling a 3D mesh and finishes with its animation. This is the general process of creating animations used by computer artists, which we tried to simplify and automate. For testing and learning the algorithms we need several human models with the same animations and poses. It is important for us to be able to change the model easily. It is true that major changes can be done by changing the textures, but sometimes a change of the whole model may be needed. Our task was to design a realistic articulated human model and ways of animating it. This human model is viewed from different angles by different cameras. The idea was to use any of the existing software for generating photorealistic scenes, such as Blender and Povray. Blender is mainly a 3D modeling and animation tool for creating realistic scenes, but it has more features. Blender supports scripting in the Python language, which allows modifying camera settings or objects in the scene easily. We chose Blender [1] and Python [10] for their features and availability. Python is built inside of Blender for Blender's internal scripting. Blender supports usage of an external Python installation with external Python modules. We took advantage of Blender in skeletal animation. Blender's GUI was used during preparation of the scenes and the human model. We also used Matlab for some computations and testing.
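As an illustration of the kind of scripting meant here, the following is a minimal sketch that repositions a camera from a Python script using the Blender 2.4x API; the object name "Camera" and the numeric values are only illustrative assumptions and are not part of the package.

    # Minimal sketch (Blender 2.4x Python API): reposition a camera from a script.
    # The object name "Camera" and the numbers are illustrative assumptions.
    import Blender
    from Blender import Object

    cam = Object.Get("Camera")        # fetch the camera object by name
    cam.setLocation(4.0, -4.0, 2.5)   # world-space position
    cam.setEuler(1.1, 0.0, 0.8)       # Euler rotation, in radians
    Blender.Redraw()                  # refresh the 3D view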

Chapter 2

Articulated graphic model


Articulated graphic models are widely used in computer graphics and therefore many approaches have been developed in this branch. These models are often called avatars, virtual humans, humanoids, virtual actors, or animated characters. This sort of animation of these characters is often called character animation. Character animation is used in computer games, modeling software, VRML (virtual reality mark-up language) etc. Most of the 3D modeling and animation software tools are supplied with several models prepared for animation and also with character animation tools. Blender has less sophisticated character animation support than commercial software tools, but allows similar functions with good results. Blender's character animation is intended to be used in the game engine, so the focus is on speed instead of on visual quality. However, Blender's functions give us everything we need to build an animated human character with sufficient results. Building the model is similar to other commercial 3D modeling software tools. Articulated character models usually consist of:
1. Mesh (wire-frame, polygon-based) model.
2. Material, textures.
3. Virtual bones.
Some software tools also allow simulating muscles. In our work the Blender articulated model consists of these three parts only. We can imagine the mesh model as the skin of the model, see Figure 2.1. It describes the surface of the model in the model's rest position (see Figure 5.5), when there is no animation applied. The mesh model is a net of connected triangles or polygons and can contain separated parts. These polygons are also called faces of the mesh. Each face is defined by 3, 4 or more vertices. A vertex is a point in 3D space. Besides the coordinate parameters x, y, z, a vertex can also have its normal as a parameter. To be able to determine where a face is facing, each face has to have its normal.


Figure 2.1: Example of a mesh model of a head. The lines between vertices show the edges of the polygons.

The material specifies how the model will be visualized: how it will reflect lights during the rendering, how it will cast shadows, the color of the mesh polygons and the translucency. A texture can be used simultaneously with the material. The texture is often a bitmap which is mapped on the mesh model. Mapping the texture on the mesh can be done in several ways. The most commonly used way is UV mapping, which is also used in our work. UV stands for 2D coordinates in the texture space. Usually each face vertex has its UV coordinates. This defines the correspondence between the texture pixels and the face surface, see Figure 2.2.

Figure 2.2: Example of a textured cube mesh (on the left) and the texture (on the right) with unwrapped UV coordinates.
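To make the UV correspondence concrete, here is a small standalone sketch (plain Python, independent of Blender and of our scripts) that maps a point of a triangular face, given in barycentric coordinates, to a pixel of the texture bitmap; the UV coordinates are assumed to lie in the usual [0, 1] range.

    def uv_to_pixel(face_uv, bary, tex_width, tex_height):
        """Map a point on a triangular face to a texture pixel.

        face_uv : three (u, v) pairs, one per face vertex, each in [0, 1]
        bary    : barycentric coordinates (b0, b1, b2), b0 + b1 + b2 = 1
        """
        u = sum(b * uv[0] for b, uv in zip(bary, face_uv))
        v = sum(b * uv[1] for b, uv in zip(bary, face_uv))
        # v is flipped because image rows usually grow downwards
        return int(u * (tex_width - 1)), int((1.0 - v) * (tex_height - 1))

    # Example: the centre of a face mapped into a 512 x 512 texture
    print(uv_to_pixel([(0.0, 0.0), (0.5, 0.0), (0.25, 0.5)], (1.0/3, 1.0/3, 1.0/3), 512, 512))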

Virtual bones (further only bones) are used to deform the mesh, see Figure 2.3. It is possible to animate each vertex separately, but this is unusable in the case of thousands of vertices. Instead, you attach the vertices to the bones and thus any change of the bone position will affect the vertices. This is a common way of character animation and it imitates real body movement as well. Attaching the vertices to the bones is called rigging. The most difficult problem is to find the correspondences between vertices and bones, and this has not been fully solved yet. Artists create human models according to their own knowledge about the model and body deformations. One way, still often used especially when precise results are needed, is to do it manually. At the beginning, the animator uses an automatic tool and then corrects the mistakes. This is unusable if you need to change the mesh. Blender allows attaching vertices to the closest bone, but we found this function not working properly in our case. The other possibility is to use envelopes. The envelope of a bone defines the area affected by the bone. This may be useful in the case of very simple meshes like robot models. We wrote our own rigging function, which we describe later in Chapter 5. It finds the correspondences between the vertices and the bones, and creates vertex groups using these correspondences. Each vertex group belongs to a different bone. It is possible to attach one vertex to several bones. The final deformation is then constructed from the deformations of all these bones. We achieved better results especially if the vertices are close to two bones.

Figure 2.3: The bones in Blender and the effect on the mesh in a new pose.

To get an articulated model we need to define a mesh, textures and bones. Blender is not supplied with such models as other software packages are, but Blender offers tools for creating, animating and rendering a model. Graphic artists often create a realistic model by hand, which is very time consuming. The artists get good-looking models because they work on all details. For our purposes we need only an approximate model. We need to be able to change both the model and the animation easily and quickly. We automated the whole process using available software and methods.


The project MakeHuman [7] is an open source project which offers a tool for creating realistic humanoid models for computer graphics. In the current release, the project allows creating only a mesh model. It is possible to export the model into the Maya .obj format and reuse it in Blender. The maintainers of the MakeHuman project are currently working on a muscle engine, which will simulate the musculature. This project does not contain bones or textures but offers a wide variety in making humanoid mesh models. Modeling a human mesh model is much easier using this tool than Blender, so we used MakeHuman to create the mesh models. The advantages of using MakeHuman include a convenient pose for rigging the mesh and the possibility of defining a generic skeleton of virtual bones for the meshes generated by MakeHuman. The textures have the biggest influence on the appearance of the final rendered images. For example, computer games work only with simple models but the final result looks realistic. The key is the proper texture. In our work we need to be able to change the textures of the model. We set up a multi-camera system for capturing images of a real human in order to texture our model. More is described in Chapter 6. The used algorithm is based on determination of visible and hidden pixels in each camera and chooses the best view for texturing the particular mesh face. Creating a skeleton for a character usually depends on the levels of articulation and the purposes of animation. The skeleton defines virtual bone connections. There exists a standard, H-Anim [4], for humanoid animations in VRML, which defines joints and some feature points on a human body. The feature points are derived from the CAESAR project - the database of measured human data. We used the joints and the feature points from H-Anim as a template during the design of the mesh skeleton. Using MakeHuman allows generating a skeleton that matches most of the created meshes. The H-Anim joint nomenclature was used for naming the bones. The final skeleton was defined in order to match the MakeHuman-created mesh and to be easy to animate with BVH files (more in Chapter 7). The articulation is done through bones. It is possible to change the length of the bones or their position during animation, but this makes no sense in the case of a human body with constant bone lengths. However, this may be useful during fitting the model into captured images as described in Chapter 6. The final articulation should be rotation of the bones only. Blender has an inverse kinematics solver. The pose of the character may be changed quickly by grabbing the hands or feet instead of setting bone rotations.

Chapter 3

Blender and Python overview


Blender [1] is open source software for 3D modeling and animation. We used Blender version 2.42a and Python version 2.4. The choice of the Blender version is important, because Blender is under massive development, undergoing changes of the philosophy and data structures, unfortunately in the Blender Python API as well. For example, we could not use the bones setting from our previous work [8]. We used Blender as the rendering and animation engine. Blender supports inverse kinematics, tools for character animation and others. Blender uses Python for both internal and external scripting. It is possible to write a Python expression in the Blender text box or run an external Python script directly from Blender. A Python installation is not needed, because Blender has Python built in. Blender can use an external Python installation to extend functions and libraries. We do not describe all the Blender features, we offer only an overview. More detailed information may be found in the Blender Manual [1] and in the Blender Python API [1]. Blender's user interface and rendering is based on OpenGL [11]. The Blender Python API provides a wrapper of the OpenGL functions and allows using OpenGL for creating user-defined interfaces in Python scripts. We used OpenGL in our work for rendering in the texturing algorithm, as will be described in Chapter 6. We advise reading the Blender manual [2] to learn how to use Blender. The BlenderQuickStart.pdf file, which is shipped with Blender and can be found in the Blender installation folder, contains a brief tour of Blender usage. Some functions are accessible only through hot-keys or mouse clicking and may be found neither in menus nor in the Blender Python API. Blender supports external raytracers as well as the internal renderer with support for radiosity or raytracing. Blender has a panorama rendering that can simulate fish-eye cameras. Blender also has an anti-aliasing feature called oversampling, motion blur, and a Gaussian filter. The whole animation composed of several frames can be rendered into one file (the avi or quicktime format) or into single images.

3.1

Blender coordinate system

Blender uses a right-handed coordinate system as shown in Figure 3.1. Rotations used in Blender's GUI are entered in degrees. For the sake of completeness, we should mention that Blender internally calculates in radians. Blender expresses rotations in Euler angles, quaternions, and matrices. Quaternions are, for example, used for posing the bones. Blender uses Euler rotations for general objects in the Python API. More about rotations and transformations in Blender will be described in Chapter 9. The units in Blender are abstract. The coordinates can be absolute - in the world space, or relative to parent objects or object origins. In our work we perceive Blender units as metric units for easier expression in the real world.

Figure 3.1: Right-handed coordinate system.

3.2

Blender scene and objects

The scene in Blender consists of objects. Each object contains data for its own specification. The data associated with an object can be either a Mesh, Lamp, Camera, or Armature. The Mesh is a wire-frame model built of polygons (Blender calls them faces). The basic parameters of the objects include location and rotation (defined by Euler angles). These parameters can be absolute (in the world space) or relative. Objects can be parents or children of other objects. For example, our mesh model is composed of two objects: an object with Mesh data and an object with Armature data. Armature objects hold bones in Blender (they are equivalent to the skeleton). The Armature object is the parent of the Mesh object and controls the deformation of mesh vertices. An example of scene contents is shown in Figure 3.2, where the eye symbol stands for a camera, the bone symbol for bones, axes for general objects, the symbol of a man for an armature object and spheres stand for materials. The vertex groups shown as small balls will be described in Chapter 5. They split mesh vertices into several groups that are used later for rigging with the bones. They are given the same names as the bones.

Figure 3.2: The example of a scene structure in Blender.

The bones in Armature objects define the rest pose of the skeleton. In fact, three parameters define a bone: the head of the bone, the tail of the bone and the roll parameter which rotates the bone around its y axis. The bones can be children of other bones and may inherit the transformations of their parent. If a bone is connected to its parent, the bone head automatically copies the location of the parent's bone tail, as shown in Figure 3.3.

3.3

Blender materials and textures

Blender materials have many features but we use only a few of them in our work. The materials are used for binding the textures. The textures must be used together with materials in Blender. The materials can also be used for simulating human skin, which does not behave as a general surface. The MakeHuman project has a plugin for Blender for simulating the skin. We do not use it in our work. Materials in our work are set to be shadeless. This means that the material is insensitive to lights and shadows and the texture is mapped without any intensity transformation. The final textured object will appear the same from different views even if the illumination differs. This makes recognizing learned patterns of an object easier. The materials in Blender also specify the mapping of the texture onto an object. The method which we use is UV mapping. This material option must be set explicitly, because this is not the default texture mapping in Blender. We set this option automatically in our scripts. No other changes are needed in the default materials. The important thing for usage of the Blender Python API is the linking of the materials. The materials can be linked to both the general Blender Object and the Mesh object. We link materials to the Mesh object. Up to 16 materials can be linked to an object. This also limits the number of textures for a uniform-material object. The textures which we use are normal images loaded in Blender. It is recommended to align color intensities between images before mapping the textures. Obviously, different cameras have different sensitivity to colors. The light sources in the scene also have an effect on the illumination in the captured image. An object textured with images of varied illumination will appear inconsistent in the final rendering. The other texture usage is displacement maps.

Figure 3.3: Bones in Blender; a child bone is connected to its parent bone and copies the parent transformation.

3.4

Blender camera

The camera in Blender is similar to a pinhole camera, or perspective projection. The Blender camera can be switched to orthographic. The view is along the camera's negative z axis. The camera parameter lens sets the viewing angle; the value 16 corresponds to a 90 degree angle as shown in Figure 3.4. The drawback of the Blender camera is in simulating a real camera projection: the Blender camera always has an ideal perspective projection with the principal axis in the image center. The camera is configured like other objects in the Blender scene. The main parameters are location and rotation in the world space. The inverse of the camera world transformation matrix can be used to get the transformation from world space to the camera local space. This inverse matrix equals the [R | -RC] expression used in computer vision. This is described in more detail in Chapter 9. Other parameters which define the projection are independent of camera objects and can be set up in the Blender render parameters. These parameters are the image width, image height and aspect ratio. More about the camera model and related transformations will be described in Chapter 9.

Figure 3.4: Definition of the lens parameter of the Blender camera.
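The relation between the lens value and the viewing angle can be written down explicitly. The sketch below assumes the commonly used Blender convention that lens acts as a focal length for a 32-unit sensor width, which reproduces the 90 degree angle for lens = 16; treat the constant as an assumption rather than a documented guarantee.

    import math

    def blender_lens_to_fov(lens):
        """Viewing angle in degrees for a Blender 'lens' value.

        Assumes the usual convention of a 32-unit sensor width,
        i.e. angle = 2 * atan(16 / lens)."""
        return math.degrees(2.0 * math.atan(16.0 / lens))

    print(blender_lens_to_fov(16.0))   # 90.0
    print(blender_lens_to_fov(35.0))   # about 49.1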

Chapter 4

Skeleton model definition


We use the MakeHuman project [7] for obtaining the 3D human mesh model. The advantage is an easy change of the model. The pose and the proportions of most models created in MakeHuman are approximately constant. This allows defining a general skeleton for most of the models. Using the MakeHuman project we can generate a mesh using several targets for the legs, hands, face features etc. These models are intended to be used primarily in computer graphics and may lack anatomical correctness. Models can be exported into the .obj file format which Blender can import. These models only describe the surface of the model. They are without bones or materials. These models can be imported into Blender, where we attach bones for model articulation. The skeleton definition is ambiguous. The H-Anim standard [4] defines levels of articulation, bones (joints), dimensions, and feature points of a human body for virtual humans (humanoids). But this standard is hardly applicable to other data structures, where the philosophy of character animation is different from VRML (Virtual Reality Mark-up Language). However, the basic ideas and definitions of this standard can be converted for usage in Blender. We follow the H-Anim recommendations and use them as a template for the skeleton model. We adjusted the H-Anim definitions for easier use with the BVH motion capture format. The names of our bones correspond with joint points of the H-Anim standard. We defined the general skeleton, which we generate by script, so it fits most of the MakeHuman meshes. The advantage is having the accurate lengths and positions of the bones. This can be used, for example, if the computation of the final hand position in a new pose is needed. The locations are defined in Table 4.1 and the skeleton visualization is in Figure 4.1. The bones are defined by pairs of bone head and bone tail, see Figure 3.3. Bones also have a roll parameter, which can be specified to rotate the bone space around the bone y axis. The bone head is the joint location where the transformation applies. It is the origin of the bone space. The bone head together with the tail defines the bone's y axis.

Bone name                    Head x   Head y   Head z     Tail x   Tail y   Tail z    Parent bone
HumanoidRoot                  0.000   0.824    0.0277     0.000    0.921   -0.080     -
sacroiliac                    0.000   0.921   -0.080      0.000    1.057   -0.034     HumanoidRoot
vl5                           0.000   1.057   -0.034      0.000    1.4583  -0.057     sacroiliac
vt3                           0.000   1.4583  -0.057      0.000    1.7504   0.000     vl5
HumanoidRoot to l hip         0.000   0.921   -0.080      0.096    0.843   -0.029     HumanoidRoot
l hip                         0.096   0.843   -0.029      0.065    0.493   -0.011     HumanoidRoot to l hip
l knee                        0.065   0.493   -0.011      0.069    0.091   -0.054     l hip
l ankle                       0.069   0.091   -0.054      0.042    0.012    0.180     l knee
HumanoidRoot to r hip         0.000   0.921   -0.080     -0.096    0.843   -0.029     HumanoidRoot
r hip                        -0.096   0.843   -0.029     -0.065    0.493   -0.011     HumanoidRoot to r hip
r knee                       -0.065   0.493   -0.011     -0.069    0.091   -0.054     r hip
r ankle                      -0.069   0.091   -0.054     -0.042    0.012    0.180     r knee
vl5 to l sternoclavicular     0.000   1.4583  -0.057      0.082    1.4488  -0.0353    vl5
l sternoclavicular            0.0820  1.4488  -0.0353     0.194    1.434   -0.032     vl5 to l sternoclavicular
l shoulder                    0.194   1.434   -0.032      0.410    1.379   -0.062     vl5 to l sternoclavicular
l elbow                       0.410   1.379   -0.062      0.659    1.393   -0.052     l shoulder
l wrist                       0.659   1.393   -0.052      0.840    1.391   -0.042     l elbow
vl5 to r sternoclavicular     0.000   1.4583  -0.057     -0.0694   1.460   -0.033     vl5
r sternoclavicular           -0.0694  1.4600  -0.0330    -0.194    1.434   -0.032     vl5 to r sternoclavicular
r shoulder                   -0.194   1.434   -0.032     -0.410    1.379   -0.062     vl5 to r sternoclavicular
r elbow                      -0.410   1.379   -0.062     -0.659    1.393   -0.052     r shoulder
r wrist                      -0.659   1.393   -0.052     -0.840    1.391   -0.042     r elbow

Table 4.1: The bone locations in a rest pose. The units are abstract, but can be perceived as meter units.

The bones can be connected together (see the parent bone column). The joint locations are dimensionless but can be perceived as meter units. The imported mesh is sized and rotated so it fits the generated skeleton. Blender has an importing script for obj files, but it allows rotating the mesh before importing. We advise using the wrapper function written in our scripts to avoid improper functionality. Our function uses the same script supplied with Blender. The final non-textured articulated model is finished by attaching (parenting) the bones to the mesh.
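Because the skeleton is generated by a script, it can be stored simply as a table of head and tail positions plus parent names. The following standalone sketch (plain Python, not the actual generator shipped with the package) illustrates the idea for a few bones taken from Table 4.1.

    # A few rows of Table 4.1 stored as bone_name: (head, tail, parent).
    # Illustrative sketch only; the real scripts build Blender armature bones
    # from such a description.
    SKELETON = {
        "HumanoidRoot": ((0.000, 0.824, 0.0277), (0.000, 0.921, -0.080), None),
        "sacroiliac":   ((0.000, 0.921, -0.080), (0.000, 1.057, -0.034), "HumanoidRoot"),
        "vl5":          ((0.000, 1.057, -0.034), (0.000, 1.4583, -0.057), "sacroiliac"),
        "l_hip":        ((0.096, 0.843, -0.029), (0.065, 0.493, -0.011), "HumanoidRoot_to_l_hip"),
        "l_knee":       ((0.065, 0.493, -0.011), (0.069, 0.091, -0.054), "l_hip"),
    }

    def bone_length(name):
        head, tail, _parent = SKELETON[name]
        return sum((t - h) ** 2 for h, t in zip(head, tail)) ** 0.5

    print(bone_length("l_knee"))   # length of the lower-leg bone in scene units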


Figure 4.1: The generic skeleton model as shown in Blender's top view.

Chapter 5

Rigging the mesh with bones


In this chapter we describe the problems of rigging the Mesh with the Bones, and also our own algorithm which we found suitable for rigging MakeHuman meshes with our general skeleton. The need of finding a new rigging algorithm arose when the Blender tools did not work well. The Blender tools may attach mesh vertices to an improper bone (when using the envelopes option) or they may not attach all vertices (when using the closest bones). The rigging process (skinning the skeleton) is attaching the vertices to the bones. The vertices are attached in the armature (Blender's skeleton) rest pose. The rest pose is a default pose without rotations or other bone transformations. The rest pose is defined only by bone locations, lengths and the roll parameter. No transformations are applied on the vertices in the rest pose. The rigging is done by parenting an armature object (Blender's skeleton object) to a mesh object. Usually the armature object consists of several bones. With the new Blender version it is possible to use the armature object as a modifier instead of as a mesh parent object. Modifiers are applied on a mesh in order to modify it in several ways. We did not use this option. The armature deform options are: using the envelopes and using the vertex groups. These options can be mixed. One vertex can be deformed by several bones. The envelope defines an area around the bone. All the vertices in this area are affected by the bone transformations. The vertex groups directly define the correspondences between the vertices and the bones. A vertex group must have the same name as the corresponding bone. The advantage of envelopes is quick and simple usage. You can place the bone into the desired area and start using it. The problem is in the envelope shape. Often the vertices which you do not want to attach are assigned to a bone. This is shown in Figure 5.1. The better way is using the vertex groups. You can add or remove the vertices from the vertex group and directly control which vertex will be affected by the bone. The problem is that the vertices must be added to the vertex groups explicitly. Blender has a possibility of automatically grouping the vertices according to the closest bones. This function can miss some vertices, as shown in Figure 5.1, so that they do not correspond to any bone and stay unaffected. Therefore we found an algorithm which is suitable for our general skeleton. In order to make a correct new mesh pose, the proper correspondences of vertices with bones must be found. The vertices can have a weight parameter which defines the influence of the bone transformation. This is useful in case of a vertex transformation by two bones. Vertices at the border of two bones are more naturally deformed if they are attached to both bones with different weights, see Figure 5.2 for comparison.

Figure 5.1: Bad deformation in the new pose. Using the envelopes option on the left, and the vertex groups created by the closest-bones function on the right.

In our algorithm we use the vertex groups but select the group vertices using our own classification function. The vertices with a higher angle to the bone's y axis are less affected, by setting a scaled weight. We also take advantage of the mesh symmetry in our algorithm. The algorithm in pseudo-code looks as follows:
    for bone in armature.bones:
        create_vertex_group( bone.name )

    for vertex in mesh.vertices:
        J_max = 0.0
        vertex_group = bone.name
        weight = 1.0
        for bone in armature.bones:
            v1 = bone.head - vertex.location
            v2 = bone.tail - vertex.location
            vb = bone.tail - bone.head
            J = AngleBetweenVectors( v1, v2 )
            if bone not at the same side as the vertex then
                J = 0.0
            if J_max < J then
                J_max = J
                vertex_group = bone.name
                weight = AngleBetweenVectors( v1, vb ) / 60.0
        assign_vertex_to_group( vertex, vertex_group, weight )
        if has_parent( bone ) and ( weight < 1.0 ) then
            assign_vertex_to_group( vertex, parent( bone ).name, 1.0 - weight )

Figure 5.2: The vertex deformations with each vertex attached to one bone on the left. On the right, vertices at the bones' border are attached to both bones with different weights.

See Figure 5.3 for better understanding. We use the Blender Python API function AngleBetweenVectors. The function returns the absolute angle in degrees. The angle can also be computed as the arc cosine of the normalized dot product of the two vectors, arccos((a·b)/(|a||b|)). The main idea of the vertex classification is finding the biggest angle between the bone's head and tail as seen from the vertex, as shown in Figure 5.3. If this angle is bigger for the parent bone than for the child bone, then the vertex belongs to the parent bone, and vice versa. A simulation of the algorithm is shown in Figure 5.4. We test each vertex whether it lies on the right (positive x) or the left (negative x) side of the y axis. We do not allow the vertices on the left side to be attached to bones with joints on the right side. The weight factor is decreased for vertices with a larger angle between the vb vector and the -v1 vector. If the weight is less than 1.0 and the bone has a parent bone, then the vertex is also added to the parent bone group with the remaining weight (the weight is clamped in the range <0, 1>). How the vertices are deformed is described in Chapter 9.
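The classification itself needs nothing Blender-specific. The following standalone sketch restates the angle test for a single vertex with plain Python vectors and hypothetical bone data; it is only an illustration of the rule above, not the code used in the package.

    import math

    def angle(u, v):
        """Absolute angle between two 3D vectors, in degrees."""
        dot = sum(a * b for a, b in zip(u, v))
        norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(a * a for a in v))
        return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def classify_vertex(vertex, bones):
        """Pick the bone with the largest head-vertex-tail angle.

        bones maps a name to (head, tail); returns (bone_name, weight), where the
        weight is scaled by the angle to the bone axis and clamped to [0, 1]."""
        best_name, best_angle, weight = None, 0.0, 1.0
        for name, (head, tail) in bones.items():
            v1, v2 = sub(head, vertex), sub(tail, vertex)
            j = angle(v1, v2)
            if j > best_angle:
                best_angle, best_name = j, name
                weight = min(1.0, angle(v1, sub(tail, head)) / 60.0)
        return best_name, weight

    bones = {"l_hip":  ((0.096, 0.843, -0.029), (0.065, 0.493, -0.011)),
             "l_knee": ((0.065, 0.493, -0.011), (0.069, 0.091, -0.054))}
    print(classify_vertex((0.10, 0.60, 0.0), bones))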


Figure 5.3: The rigging algorithm is based on finding the maximum angle between the bone's head, the tail and the vertex.

This algorithm was tested with MakeHuman meshes and our skeleton model. Classifying only by angle gives good results with our bone configuration. The final rigging results are shown in Figure 5.5, with rest poses and new poses for MakeHuman-generated meshes.

(Figure 5.4 panels: the red curve shows the border for attaching the vertices to the bones; legend: Parent Bone, Child Bone.)

Figure 5.4: The simulation of the vertex classification for two different bone configurations.


Figure 5.5: The rest pose and the new pose for rigged meshes with the skeleton.

Chapter 6

Texturing
Following the chapters above, we can create an articulated mesh model. This model looks like a figurine because it does not have any texture or special material attached. The texture differentiates the model from others and provides more detailed information. In order to have a realistic human model, we chose digitally captured images of people for textures. We use a multi-camera system in order to capture human images synchronously from different angles. The mesh model must be posed and fitted into the captured images before texturing. We use manual posing and fitting using the Blender interface, see Figure 6.1. This task is hard to solve automatically. The cameras in the system are calibrated and synchronized, but they do not have ideal parameters. Real cameras suffer from radial distortion and skew. We cannot simulate a real camera in Blender, so before fitting the model into the images we must adapt the images and adjust the camera parameters. First the camera parameters must be computed and converted to Blender camera parameters. Then the radial distortion and skew must be removed from the images. Because we fit the object in the view of Blender cameras, we also must shift the image origin to match the Blender camera projection. The transformations are discussed in Chapter 9.

6.1

Used approach

Figure 6.1: The mesh model manually positioned and posed into the images using Blender's user interface. Four cameras were used in this example.

Mapping a texture onto a model is a well-studied problem in computer graphics and most of the literature about 3D graphics investigates it. We use the UV mapping method for mapping images onto mesh faces. We use several images from different angles to cover the whole mesh. UV means that each vertex of a face has its UV coordinates in the texture space. The texture is interpolated during the mapping. Faces have an associated texture. The texture mapping is an affine transformation defined by the vertex UV coordinates. Our problem is different illumination in the images and the image noise. Another problem is that visibility of the entire body surface is not possible (for example, in the standing pose the bottom of the feet is hidden). The distortion at boundaries arises due to limited accuracy and the problems mentioned above. The next problem is occlusion of mesh faces from a camera view. An algorithm with good results covering all these problems was developed by Niem and Broszio [9]. They use a sequence of images from a stationary calibrated camera. They apply a maximal texture resolution approach for the best face camera view classification, but they extend the method by regrouping the faces in order to avoid boundary distortion. They synthesize a new texture for hidden parts. Another algorithm was presented by Starck and Hilton [12]. In their model-based reconstruction of people, they compose the captured images into a single texture. We build on the same foundations as used in these two works. There are two possibilities of using the captured images as textures. The first is using one bitmap as the texture for the whole mesh. This imposes static unwrapped vertex UV coordinates (but if usage of a new model is desired, the old UV coordinates are invalid). All images are then composed into a single bitmap. This allows easier manipulation with the texture and easier filtering in texture space. This is not a problem in the case of MakeHuman meshes because they are built of a constant number of vertices, with constant ordering. The drawback of a single bitmap texture is a more difficult implementation. The second option, which we use, is using several textures for one mesh.


This method is also often used for manual texturing. It allows specifying different materials for skin, clothes, hair etc. We use this approach, but we do not use different materials. We use each captured image as a texture, so the final model contains several textures. This makes coding easier, without any manipulations with images, but it has many drawbacks. First, the problem with different illumination in the images causes steps and distortion at boundaries. From one camera the same object may look darker or brighter than from another. This can be partly solved by color intensity alignment. Objects viewed from different angles often have occluded parts. The camera calibration is never perfect, so the object projection from one camera will not match the projection from another camera exactly. We did not solve these problems; they need to be addressed in future development. It must be considered which philosophy of the texture model would be better: the model with one material and one texture, or the model with several materials and textures.

6.2

The problems of our method

The biggest challenges in texturing the model are estimating the best visibility of each face in the cameras and a method for texturing the hidden parts. The simple method for testing the visibility is: render the object and read the pixels in the result. The rendered pixels must contain information about the faces. This has some drawbacks. Small polygons can be occluded even if they are visible in a camera. The hidden parts can be as large as limbs or as small as fingers. We can only guess the texture for these parts. For most real human bodies we can expect small changes in texture within a small area. The texture for hidden parts could be synthesized from the closest visible parts, but for easier coding we pretend that the polygons are visible from the same view as their neighboring polygons. We expect that occlusions are caused by the same or similar body parts. This imposes poses where the hands are raised and do not occlude the body, at least from one view. The back side of a limb is then occluded by the front side. The probability of the same surface for the back side as for the front side is high for human subjects. Of course there are exceptions, such as the head or colorful miscellaneous clothes. An appropriate camera arrangement is important.

6.3

Determining the visibility of faces

The problem of visibility is well known in computer graphics. The rendering process must properly solve this task. Many algorithms were developed for the visibility test. Some use a z-buffer, some test and cull the viewing frustum, some test polygon normals. These algorithms are now often implemented directly in hardware graphic accelerators in order to speed up the rendering process. Instead of writing our own functions, we use OpenGL rendering capabilities which take advantage of hardware accelerators. Python script interpretation is too slow for a rendering algorithm. Our approach is similar to Starck and Hilton [12]. We use the OpenGL [11] features, the hardware accelerated rendering, to determine the visibility of faces. OpenGL is switched to the simplest rendering mode with z-buffer testing and without shading. The face index is encoded into the RGB polygon color value. All the OpenGL features which change the face color during rendering (like shading) must be set off. The final rendered bitmap contains the face indices of visible faces coded as colors. This has some drawbacks. We have only pixel accuracy for testing and small polygons can overlap each other. Polygon edges can be overlapped by neighboring polygons. We compute the number of visible pixels in order to measure the visibility, and for determining the amount of occlusion we count the hidden (occluded) pixels. Some small faces may not be classified as visible due to limited pixel accuracy. Other faces can be hidden or occluded. It is better to have these faces textured for a better visual quality. The texture can be synthesized for occluded faces. The implementation is complicated because we use several bitmaps instead of a single bitmap. We pretend that the faces are visible. In our algorithm we simply search the neighboring faces for a camera with the best visibility for most of the faces. The problem is searching for neighboring faces, because the data accessed through the Blender Python API have no such relation as a neighboring face. We must test the face vertices for the same location.
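The face-index-as-color trick is simple to write down: a 24-bit RGB triple can hold any face index below 2^24, so the index can be recovered exactly from a rendered pixel. A minimal standalone sketch of the encoding (plain Python; our implementation does the equivalent inside the OpenGL drawing code):

    def face_index_to_rgb(index):
        """Encode a face index (0 .. 2**24 - 2) into an RGB byte triple."""
        i = index + 1                  # reserve (0, 0, 0) for the background
        return (i & 0xFF, (i >> 8) & 0xFF, (i >> 16) & 0xFF)

    def rgb_to_face_index(rgb):
        """Decode a rendered pixel back to a face index; -1 means background."""
        r, g, b = rgb
        return (r | (g << 8) | (b << 16)) - 1

    assert rgb_to_face_index(face_index_to_rgb(123456)) == 123456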

6.4

The best visibility algorithm

The algorithm for estimating the best face visibility in the camera views is as follows (simplified for readability, in pseudo-code):
    % CLASSIFICATION OF THE FACES USING VISIBILITY
    % DEFINED BY VISIBLE PIXELS
    visible_faces = []
    hidden_faces = []
    for face in mesh.faces:
        best_visibility = visibility = 0
        selected_image = None          % reset for each face
        for image in images:
            [visible_pixs, hidden_pixs] = image.pixels_of( face )
            if hidden_pixs > 0 then
                visibility = visible_pixs / hidden_pixs
            else
                visibility = visible_pixs
            if visibility > best_visibility and (visibility > MIN_VIS) then
                best_visibility = visibility
                selected_image = image
        if selected_image then
            set_Texture( selected_image )
            UV = selected_image.get_UV_coordinates( face )
            set_Face_UV_coordinates( face, UV )
            visible_faces.append( face )
        else
            hidden_faces.append( face )

    % FINDING THE BEST VIEW FOR UNCLASSIFIED FACES
    loop_count = MAX_OF_LOOPS
    while hidden_faces and loop_count:
        new_visible_faces = []
        for face in hidden_faces:
            neighbors = get_Neighbors( visible_faces, face )
            if neighbors then
                selected_image = get_Most_Used( neighbors )
                set_Texture( selected_image )
                UV = selected_image.get_UV_coordinates( face )
                set_Face_UV_coordinates( face, UV )
                new_visible_faces.append( face )
                hidden_faces.remove( face )
                new_visible_faces.append( neighbors )
        loop_count = loop_count - 1
        visible_faces = new_visible_faces

The part of the algorithm that finds a proper texture for unclassified faces must be constrained to avoid infinite loops, because some faces may not have neighboring faces (the mesh can contain separated parts). Another reason is the slow speed of the algorithm. In our results we use the value 10 for the constant MAX_OF_LOOPS. We also specify the constant MIN_VIS to avoid primary texturing from views with occluded faces. The value of MIN_VIS was set to 0.5 in our work. The slowest part is searching for the neighboring faces. This could be done once and more effectively than we do. Because we do not have any relation between mesh vertices and mesh faces, we need to test the vertex locations of each face against the vertices of the rest of the faces. We search only in the set of newly retrieved visible faces in order to eliminate visible faces which do not have hidden neighboring faces; thus the searching set is reduced. This algorithm is neither sophisticated nor quick, but it shows a possible approach to the problem. Figure 6.2 shows the images used for textures and the final textured model. The texture distortion at boundaries and the distortion caused by different illumination are present in the final result. This is caused by capturing the source images from angles with different illumination and by different camera types. An additional error is that the manually fitted model does not match the real object in the scene exactly, as shown in Figure 6.1.
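The neighbor search mentioned above could indeed be precomputed once. A standalone sketch of such a precomputation (plain Python; vertex coordinates are rounded so that vertices at the same location compare equal) builds a map from vertex locations to the faces using them, after which neighbor queries are cheap:

    from collections import defaultdict

    def build_neighbor_map(faces, precision=5):
        """faces: list of faces, each a list of (x, y, z) vertex coordinates.
        Returns, for each face index, the set of face indices that share a
        vertex location with it (its neighbors)."""
        by_location = defaultdict(set)
        for fi, face in enumerate(faces):
            for vertex in face:
                by_location[tuple(round(c, precision) for c in vertex)].add(fi)

        neighbors = defaultdict(set)
        for fi, face in enumerate(faces):
            for vertex in face:
                key = tuple(round(c, precision) for c in vertex)
                neighbors[fi] |= by_location[key] - {fi}
        return neighbors

    # Two triangles sharing an edge are reported as neighbors of each other.
    tris = [[(0, 0, 0), (1, 0, 0), (0, 1, 0)], [(1, 0, 0), (0, 1, 0), (1, 1, 0)]]
    print(build_neighbor_map(tris))   # face 0 and face 1 are neighbors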


6.5

Counting the visible pixels

Using the calibrated cameras we can render the mesh model into the images and estimate the vertex UV coordinates. As mentioned above, we use OpenGL to speed up the process. The use of the camera projection matrix for controlling the OpenGL view-port transformations will be described in detail in Chapter 9. The visible pixels are obtained by the following process:

1. Render the model in OpenGL render mode with depth (z-buffer) testing and without shading. Set the projection transformations using the camera's projection matrix. Render each mesh face with a different color (the face index is encoded in the color RGB values).

2. Using the OpenGL feedback mode, process the model and obtain the image coordinates for each face. The OpenGL feedback mode returns the data of the OpenGL pipeline right before rasterizing. This is suitable for obtaining correct face window coordinates which correspond to the previously rendered data.

3. Process every face of the model in render mode with depth testing turned off. Turning the depth testing off causes the faces to be fully rendered as they fit into the view. Compare the newly rendered area with the data from the first step. Count the equal color values as visible pixels and the others as hidden (a small sketch of this comparison is given after this list).
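The comparison in step 3 reduces to a per-pixel equality test between the depth-tested render from step 1 and the pixels one face covers when depth testing is off. A standalone sketch with plain Python containers standing in for the glReadPixels output (the data layout is an assumption, chosen for illustration only):

    def count_visibility(depth_tested_image, face_pixels, face_color):
        """depth_tested_image: dict (x, y) -> RGB color from the z-buffered render.
        face_pixels: pixel coordinates covered by one face rendered without depth test.
        face_color: the RGB color encoding this face's index.
        Returns (visible_pixels, hidden_pixels)."""
        visible = hidden = 0
        for xy in face_pixels:
            if depth_tested_image.get(xy) == face_color:
                visible += 1
            else:
                hidden += 1
        return visible, hidden

    image = {(0, 0): (1, 0, 0), (1, 0): (1, 0, 0), (2, 0): (2, 0, 0)}   # toy render
    print(count_visibility(image, [(0, 0), (1, 0), (2, 0)], (1, 0, 0)))  # (2, 1)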

6.6

Conclusions (for texturizing approach)

This part of our work was the most complicated to code effectively in Python. For speed reasons we used the built-in support for OpenGL and took advantage of its fast rendering. This rapidly sped up the whole process. The drawbacks are in the pixel accuracy and rasterizing. Very small faces rendered as a single pixel can be occluded by their neighborhoods. The polygon edges are discretized and therefore can overlap each other. This can cause a face that is fully visible in a view to be counted as partly visible or hidden. We did not investigate the problems with texture distortion at edges or distortion caused by different illumination. We expect a well posed object for selecting the source texture for unclassified faces. The camera locations and orientations are also important. Our approach can be extended in future development. The implementation in Blender shows the possible usage. In the future the algorithm presented by Niem and Broszio [9] can be fully implemented. This would require synthesizing the texture for hidden parts and would allow the usage of a sequence of images from a single camera. The model fitting can be done semi-automatically as presented by Starck and Hilton [12].


Figure 6.2: The four source images for texturing and the final rendered model in a new pose.

Chapter 7

Animation
The animation itself could be studied separately due to its complexity. Even Blender offers many more animation tools than we used (for example non-linear animation and actions). We will focus only on the issues which we applied in our work. As mentioned in previous chapters, we use a skeleton for animation. Besides that, it is possible to animate vertices separately and directly by changing their position. Also the textures can be animated, which can be useful for example for animation of face gestures. We prefer to use motion capture data for a realistic representation of human movements. The proper interpretation of the data is not easy. We use the BVH format because Blender has an import script for BVH files. It must be noted that this script does not work correctly in our version of Blender with our BVH data, even though they are correct. The script may omit some characters in joint names during import. It is always better to check the joint names after import. The object which holds the bones in Blender is called an armature. An armature contains several bones used for animation. Bones have pose channels which define the changes against the rest pose. The pose channel is an interpolated curve which allows smooth animation. The curve is defined at desired frames and the rest of the curve is interpolated. The channels can be for a change in location, size or rotation. The bone rotation is expressed in quaternions. For bone poses, there is a channel for each quaternion element. The channel interpolation curves can be viewed and edited in Blender's IPO curve editor window (where IPO stands for interpolated curve). The number of frames and the frame rate can be set in Blender's Buttons window.

7.1

Motion capture data

For motion capture data we use the BVH format. This has several advantages. This format is widely used by animation software and can be imported into Blender. The BVH files can be obtained from the Internet without the need to capture one's own data.


The BVH file format was originally developed by Biovision, a motion capture services company. The name BVH stands for Biovision hierarchical data. Its disadvantage is the lack of a full definition of the rest pose (the format has only translational offsets of child segments from their parent; no rotational offset is defined). The BVH format is built from two parts, the header section with joint definitions and the captured data section. See the example:
    HIERARCHY
    ROOT Hips
    {
        OFFSET 0.00 0.00 0.00
        CHANNELS 6 Xposition Yposition Zposition Zrotation Xrotation Yrotation
        JOINT Chest
        {
            OFFSET 0.00 5.21 0.00
            CHANNELS 3 Zrotation Xrotation Yrotation
            JOINT Neck
            {
                OFFSET 0.00 18.65 0.00
                CHANNELS 3 Zrotation Xrotation Yrotation
                JOINT Head
                {
                    OFFSET 0.00 5.45 0.00
                    CHANNELS 3 Zrotation Xrotation Yrotation
                    End Site
                    {
                        OFFSET 0.00 3.87 0.00
                    }
                }
            }
        }
    }
    MOTION
    Frames: 2
    Frame Time: 0.033333
    8.03 35.01 88.36 ...
    7.81 35.10 86.47 ...

The header section begins with the keyword HIERARCHY. The following line starts with the keyword ROOT followed by the name of the root segment. The hierarchy is defined by curly braces. The offset is specified by the keyword OFFSET followed by the X, Y and Z offset of the segment from its parent. Note that the order of the rotation channels appears a bit odd: it goes Z rotation, followed by the X rotation and finally the Y rotation. The BVH format uses this rotation data order. The world space is defined as a right-handed coordinate system with the Y axis as the world up vector. Thus the BVH skeletal segments are obviously aligned along the Y axis (this is the same as in our skeleton model). The motion section begins with the keyword MOTION followed by a line indicating the number of frames (the Frames: keyword) and the frame rate. The rest of the file contains the motion data. Each line contains one frame, and the tabulated data contains values for the channels defined in the header section.
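Reading the MOTION part back is straightforward; the following standalone sketch (plain Python, not the Blender import script) returns the frame time and the per-frame channel values from the lines of a BVH file:

    def read_bvh_motion(lines):
        """Return (frame_time, frames) from the MOTION section of a BVH file.
        Each frame is a flat list of floats, ordered as the CHANNELS declare."""
        it = iter(lines)
        for line in it:
            if line.strip() == "MOTION":
                break
        n_frames = int(next(it).split(":")[1])
        frame_time = float(next(it).split(":")[1])
        frames = [[float(v) for v in next(it).split()] for _ in range(n_frames)]
        return frame_time, frames

    sample = ["MOTION", "Frames: 2", "Frame Time: 0.033333",
              "8.03 35.01 88.36", "7.81 35.10 86.47"]
    print(read_bvh_motion(sample))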

7.2

Usage of motion captured data

We import BVH data using Blender's import script for BVH files. It creates empty objects in the Blender scene. The empty objects copy the hierarchy of the BVH file (but, as noted above, some characters from the joint names may be missing after import). The animation data are imported and animation channels for the objects are created. As we know, there is no standard for joint names or the hierarchy of the BVH format, but the data that we use have the same hierarchy and joint names. Our data correspond with our skeleton model, but the joint names differ. We use a dictionary for the correspondences between our bones and the captured data joints. Because we set only rotations for the bone poses (this makes sense for a human skeleton), it is not a problem if the imported data are in a different scale. But the different scale between the data and our skeleton model causes a problem when we move the whole skeleton object. Therefore we compute a scale factor from the known height of our skeleton and the expected height of the captured object. We measure the expected height of the subject used for capture from the ankles and the head's end-site joint. Then the change in location is scaled by this factor. After import, we go through the dictionary of bones and joints and configure our bones to be parallel with the corresponding joint links in all frames. We set the bones so they have the same direction as the axis connecting the corresponding parent joint with the child joint. For the joints whose rest pose in the motion capture hierarchy differs from our rest pose (in our case the ankles), we set only the difference rotation from the rest pose. For some bones of our skeleton we do not have any corresponding joints, and we leave them unaffected. The computation of the needed rotation of the bones is described in Chapter 9. We use the following dictionary (listed as Python code); the # in the code denotes a comment:
skelDict={
    "sacroiliac":["Hips","Chest"],
    "vl5":["Chest","Neck"],
    "vl5_to_l_sternoclavicular":["Neck","LeftCollar"],
    "l_sternoclavicular":["LeftCollar","LeftShoulder"],
    "l_shoulder":["LeftShoulder","LeftElbow"],
    "l_elbow":["LeftElbow","LeftWrist"],
    "l_wrist":["LeftWrist","LeftWrist_end"],
    "vl5_to_r_sternoclavicular":["Neck","RightCollar"],
    "r_sternoclavicular":["RightCollar","RightShoulder"],


"r_shoulder":["RightShoulder","RightElbow"], "r_elbow":["RightElbow","RightWrist"], "r_wrist":["RightWrist","RightWrist_end"], "HumanoidRoot_to_l_hip":["Hips","LeftHip"], "l_hip":["LeftHip","LeftKnee"], "l_knee":["LeftKnee","LeftAnkle"], "l_ankle":["LeftAnkle","LeftAnkle_end"], "HumanoidRoot_to_r_hip":["Hips","RightHip"], "r_hip":["RightHip","RightKnee"], "r_knee":["RightKnee","RightAnkle"], "r_ankle":["RightAnkle","RightAnkle_end"], "vt3":["Head","Head_end"] } # the bones that will change relativelly onlyDif=[#"sacroiliac","vl5","vt3", "l_ankle","r_ankle", #"l_knee","r_knee", #"l_hip","r_hip", "HumanoidRoot_to_l_hip","HumanoidRoot_to_r_hip"]

Our interpretation of the motion data may be imprecise: we do not know where the measured joints are exactly located on the human body, so we can make only an approximate reconstruction of the movement. Despite this, we are able to create a realistic animation of human movement. Errors such as limb self-occlusion may occur during the movement; this is caused by the rest-pose joint locations in the data differing from those in our model. Nevertheless, we are able to create a short animation quickly from captured data without manual positioning. You can see the results in Figure 7.1.


Figure 7.1: Images from an animation sequence generated with BVH motion capture data. The model is shown without a texture.

Chapter 8

Description of the software package


We focused on using open source software during the development. We also used Matlab to speed up the process: we used Matlab instead of a more difficult Python implementation of some math functions, such as matrix decompositions, and instead of relying on additional Python packages. The whole package structure is shown in Figure 8.1. In our work we used free third-party data for the mesh and the motion. We designed our own formats for storing our own data, such as the camera configurations. We use Blender scene files as templates for Blender. Motion data are stored in the BVH format as mentioned before, and the mesh is exported from the MakeHuman application into the simple Maya .obj text format.

Figure 8.1: The structure of the software package.


8.1 Mesh data format

The data exported from MakeHuman into an *.obj file are simple and easily readable. You can see the example below (shortened printout):
# MakeHuman OBJ File
# http://www.dedalo-3d.com/
o human_mesh
v -8.162358 0.737658 4.963252
v -8.179651 0.704912 4.979013
v -8.178391 0.704847 4.960827
v -8.180817 0.704855 4.918237
v -8.162949 0.671900 4.963268
v -8.161950 0.737917 4.945174
...
f 5740// 5749// 5761// 5758//
f 5199// 5132// 5749// 5740//
f 5206// 5748// 5749// 5132//
f 5761// 5749// 5748// 5751//
...

It starts with the object's name, human_mesh, and continues with the vertex definitions (lines starting with the v character). The last section contains the face definitions (lines starting with the f character). The faces are defined by indices of vertices and can be triangles or quads (three- or four-vertex polygons).
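
As an illustration of how little parsing this format needs, the following is a small sketch (our own, not part of the MakeHuman or Blender code) that loads the vertices and faces from such a file:

def load_obj(path):
    # Minimal Wavefront OBJ reader for the MakeHuman export: keeps only
    # vertex positions and face index lists (indices are 1-based in the file).
    vertices, faces = [], []
    with open(path) as f:
        for line in f:
            parts = line.split()
            if not parts or parts[0].startswith("#"):
                continue
            if parts[0] == "v":
                vertices.append(tuple(float(x) for x in parts[1:4]))
            elif parts[0] == "f":
                # entries look like "5740//"; take the index before the slash
                faces.append([int(p.split("/")[0]) - 1 for p in parts[1:]])
    return vertices, faces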

8.2 Animation XML file

We used an XML file in our previous work for animation definition. As Blender changed, some features are no longer available and this file format is deprecated; instead, BVH files with motion data must be used for bone-driven animation. We show the format here only for completeness. This approach can still be used, even though it is not expected. The file contains the animation description and is parsed by the Python scripts, see Figure 8.2 for an example. The XML tags must correspond to Blender's data structure. The root tag must be animation. The recognized tags are:

startframe: First frame to be rendered. Parameters: n (frame number)
endframe: Last frame to be rendered. Parameters: n (frame number)
frame: Tells which frame is going to be configured; all settings inside this tag are applied in that frame. Blender automatically interpolates object parameters linearly between frames. Tag parameters:

n (frame number)


object: Which object will be set up. The size of an object depends on its default size: setting sx=2.0 will produce an object twice as big along the x axis as the default, so an object that is 2 units wide along x by default will become 4 units wide. Depending on the type of the object the following parameters can be passed; the name parameter is mandatory.
name: the object's name in Blender's data structure
px: the object's position on the x axis
py: the object's position on the y axis
pz: the object's position on the z axis
rx: the object's rotation around the x axis in degrees
ry: the object's rotation around the y axis in degrees
rz: the object's rotation around the z axis in degrees
sx: the object's size along the x axis
sy: the object's size along the y axis
sz: the object's size along the z axis

8.3 Camera configuration files

The camera configuration files (with the .cam extension) are used for storing the configuration of cameras in Blender. Note that a camera in Blender captures along its negative z axis. A camera configuration file is a plain text file with one parameter per line:
C=3.0,0.0,0.0 is the camera centre on the x, y, z axes
f=1 is the focal plane distance
R=80.0,0.0,80.0 is the camera rotation around its x, y, z axes
k=4:2 is the aspect ratio u:v
size=600x800 is the output image resolution in pixels
format=png is the image output format. Possible formats are:
aviraw Uncompressed AVI files. AVI is a commonly used format on Windows platforms
avijpeg AVI movie with JPEG images
avicodec AVI using a win32 codec
quicktime QuickTime movie (if enabled)
tga Targa files
rawtga Raw Targa files
png PNG files
bmp Bitmap files
jpg JPEG files
hamx Hamx files
iris Iris files
iriz Iris + z-buffer files
ftype Ftype file
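
A .cam file of this form can be read with a few lines of Python; the following sketch is only an illustration of the format (the helper name and the returned dictionary are ours, not the package API):

def read_cam(path):
    # Parse a .cam file into a dictionary, splitting comma-separated values
    # into float lists and keeping the remaining entries as strings.
    cam = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or "=" not in line:
                continue
            key, value = line.split("=", 1)
            if "," in value:
                cam[key] = [float(x) for x in value.split(",")]
            else:
                cam[key] = value
    return cam

# Example: read_cam("camera0.cam")["C"] -> [3.0, 0.0, 0.0]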


<?xml version="1.0"?>
<animation>
  We start with first frame.
  <startframe n="1"/>
  And the 30th frame will be last.
  <endframe n="30"/>
  Now we define the first frame.
  <frame n="1">
    The order of setting in frame tag doesnt matter.
    Here we say, that we want object mySkeleton to be a 1.8 unit height.
    The mySkeleton object is the name of armature object in Blender data.
    <object name="mySkeleton" sx="1.8" sy="1.8" sz="1.8" />
    Here we set the position at (0,0,0)
    <object name="mySkeleton" px="0.0" py="0.0" pz="0.0">
    </object>
  </frame>
  <frame n="15">
    <object name="mySkeleton" rz="0.0">
    </object>
  </frame>
  <frame n="30">
    The final position of object should be at (0,-1,0)
    and rotated around z axis by 45 degrees.
    <object name="mySkeleton" rz="45" py="-1.0">
    </object>
  </frame>
</animation>

Figure 8.2: Example XML animation file. (Deprecated)

8.4 Exported animation

We also export the vertex locations and the bone poses for each frame for testing purposes. We use a simple text file format. The files with vertex coordinates have the extension .verts and contain the coordinates in x, y, z order, one vertex per line:
-0.2762540280819, 1.4156879186630, -0.5783573985100;
-0.2774430513382, 1.4150956869125, -0.5788146853447;
-0.2793728411198, 1.4186201095581, -0.5797730088234;
-0.2778972089291, 1.4197340011597, -0.5795810222626;
-0.2761918604374, 1.4200913906097, -0.5792297720909;
...

These data can be used for validating human body detection. For the bones, we export the head and tail locations in the same x, y, z order:
l_knee.head=[0.0485088936985, 1.4538201093674, -0.0491226166487]
l_knee.tail=[0.0510297790170, 1.0506756305695, -0.0797354504466]
l_elbow.head=[0.1631074249744, 2.1514077186584, -0.1573766618967]
l_elbow.tail=[0.1602035462856, 1.9171544313431, -0.0712721124291]
vl5.head=[-0.0243827812374, 2.0036427974701, -0.2223045825958]
vl5.tail=[-0.0497045181692, 2.4030768871307, -0.2594771981239]
r_ankle.head=[-0.1345274895430, 1.1150679588318, 0.0820896327496]
r_ankle.tail=[-0.1498874127865, 1.0803805589676, 0.3276234567165]
l_wrist.head=[0.1602035462856, 1.9171544313431, -0.0712721124291]
l_wrist.tail=[0.1632092595100, 1.7440707683563, -0.0174388140440]
...

The bone pose data can be used for testing a detected pose. The locations are absolute (in world space) for both bones and vertices. The data are exported for each frame to a separate file.
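
For completeness, a small sketch of loading these ground-truth files on the consumer side (the function names are illustrative only, not part of the package):

def load_verts(path):
    # Load a .verts file into a list of (x, y, z) tuples.
    points = []
    with open(path) as f:
        for line in f:
            line = line.strip().rstrip(";")
            if line:
                points.append(tuple(float(x) for x in line.split(",")))
    return points

def load_bones(path):
    # Load a bone pose file into a dictionary keyed by "bone.head" / "bone.tail".
    bones = {}
    with open(path) as f:
        for line in f:
            if "=" not in line:
                continue
            key, value = line.strip().split("=", 1)
            bones[key] = [float(x) for x in value.strip("[]").split(",")]
    return bones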

8.5 Run parameters

The syntax to run Blender with the Python script is:


% Windows
SET bpypath=drive:\path_to_python_scripts
blender_dir\blender.exe template.blend -P %bpypath%\rahat\run.py

% Linux
bpypath=scripts
blender_dir/blender template.blend -P $bpypath/rahat/run.py

This starts Blender in interactive mode with the template.blend file opened and runs the main script with a simple GUI. A few unexpected events may occur, e.g. the script window may be hidden or the default window arrangement may change; this is caused by the settings stored in the template.blend file. It is easy to switch to the script window using the window-type icons. When you run another Blender script while the main script is running, you may need to switch back to the main script by choosing the active script for the script window; this can be done by the scroll button on the script window panel. We use Blender's GUI because some changes must be done manually and the whole package is still under testing. The functions behind the GUI can easily be called from another script to automate the whole process of creating animations.
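
If the GUI steps are scripted in this way, Blender can also be started without the interactive interface. The sketch below only illustrates the idea: the script name batch_anim.py and its contents are hypothetical, while -b (background mode) and -P (run a Python script) are standard Blender command line options.

% Linux, render the whole animation without opening the GUI
blender_dir/blender -b template.blend -P scripts/rahat/batch_anim.py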

Chapter 9

Used mathematics and transformations


For row vectors we use a bold math font (v) and for matrices a true type math font (M). The functions of the Blender Python API (PoseBone.poseMatrix) are written with a true type font. Definitions:
O(bone) is the bone's 4 × 4 pose matrix. It is a transformation from the bone space to the armature space. It is obtained by accessing PoseBone.poseMatrix.
M(object) is the object's 4 × 4 transformation matrix to the world space. It is obtained by calling the Python function Object.getMatrix().
B(bone) is the bone's 3 × 3 transformation matrix. It describes the orientation of the bone in the rest pose. It is obtained by accessing the attribute Bone.matrix[BONESPACE].
A(bone) is the bone's 4 × 4 transformation matrix to the armature space in the rest pose. It is obtained by accessing the attribute Bone.matrix[ARMATURESPACE].
P is the 3 × 4 projection matrix.
P̃ is the projection matrix extended by the row vector [0, 0, 0, 1] to a 4 × 4 shape.

9.1 Blender's math

Blender is based on OpenGL and thus also accepts OpenGL's data format and transformations. The matrices are stored in the OpenGL column-major order, which differs from the standard C interpretation of arrays. As a consequence the matrices appear transposed and the multiplication order is reversed; the vectors are transposed as well (column vectors become row vectors). The matrix parameter array [a1, a2, . . . , a16] represents the following matrix:

M = \begin{pmatrix} a_1 & a_5 & a_9 & a_{13} \\ a_2 & a_6 & a_{10} & a_{14} \\ a_3 & a_7 & a_{11} & a_{15} \\ a_4 & a_8 & a_{12} & a_{16} \end{pmatrix}   (9.1)

As a consequence, a translation matrix in Blender has the following shape:

T = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ t_x & t_y & t_z & 1 \end{pmatrix}   (9.2)

To transform a vector by a Blender matrix, the following order must be used:

v' = v\, M .   (9.3)

This could be transposed, but Blender's Python API uses this matrix array representation and vectors in Blender are row vectors by default (the Blender math functions work improperly with column vectors). Therefore we follow this convention in the text whenever Blender related transformations appear; elsewhere we use the common notation of computer vision books such as [5]. The coordinates in Blender can be relative to parent objects or absolute. The absolute coordinates are in the world space. The local coordinates are relative to the object's origin or to its parent; this space is called the local space. The space type must be specified, for example, when the Blender Python API functions Object.getLocation(space) or Object.getEuler(space) are used. Besides that, Blender also recognizes the armature space, the bone space and the pose space. The coordinates in the armature space are relative to the armature object's origin. The coordinates in the bone space are relative to the bone heads; Blender denotes the bone joints as the bone head and the bone tail, and the bone space is defined by the bone configuration. The armature space is used to define the rest bone positions. The pose space is used for vertex transformations from the armature space to the pose space: while the armature space vertex locations define the rest pose, the pose space vertex locations define the new pose.
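
The practical consequence of this convention is that a matrix returned by the Blender Python API is the transpose of the matrix used in the usual computer vision notation. A small sketch of the conversion we have in mind (plain Python with nested lists, not the Blender Mathutils types):

def transpose(M):
    return [list(row) for row in zip(*M)]

def apply_row_vector(v, M):          # Blender style: v' = v * M
    return [sum(v[k] * M[k][j] for k in range(len(v))) for j in range(len(M[0]))]

def apply_column_vector(A, x):       # computer vision style: x' = A * x
    return [sum(A[i][k] * x[k] for k in range(len(x))) for i in range(len(A))]

# apply_row_vector(v, M) gives the same result as apply_column_vector(transpose(M), v)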

9.2 Blender Euler rotations

Blender uses Euler rotations to express the object rotation in the world space. The Blender Euler angles are in degrees (this corresponds to the Blender Python Euler object), but the rotation angles used in the object parameters are in radians (this corresponds to the Blender Python Object.Rot* properties). We describe here how to transform Blender Euler angles to a rotation matrix. The Euler angles suffer from drawbacks such as gimbal lock. The computation of the rotation matrix is important mainly for computing the camera projection matrix. Assume that we have already converted the angles to radians. The angles \alpha, \beta, \gamma are the rotations around the x, y, z axes. The final rotation matrix can be composed as

R(\alpha, \beta, \gamma) = R_z(\gamma)\, R_y(\beta)\, R_x(\alpha) .   (9.4)

The rotation matrices around the individual axes are simple:

R_x = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{pmatrix} ,   (9.5)

R_y = \begin{pmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{pmatrix} ,   (9.6)

R_z = \begin{pmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{pmatrix} .   (9.7)

The final matrix is

R(\alpha, \beta, \gamma) = \begin{pmatrix} \cos\gamma\cos\beta & \cos\gamma\sin\beta\sin\alpha - \sin\gamma\cos\alpha & \cos\gamma\sin\beta\cos\alpha + \sin\gamma\sin\alpha \\ \sin\gamma\cos\beta & \sin\gamma\sin\beta\sin\alpha + \cos\gamma\cos\alpha & \sin\gamma\sin\beta\cos\alpha - \cos\gamma\sin\alpha \\ -\sin\beta & \cos\beta\sin\alpha & \cos\beta\cos\alpha \end{pmatrix} .   (9.8)

The backward decomposition is more complicated and is not unique. Let R be a general rotation matrix with the following parameters:

R(\alpha, \beta, \gamma) = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} .   (9.9)

Using the result from (9.8) we can write

\sin\beta = -g , \quad \cos\beta = \sqrt{1 - g^2} .   (9.10)

The sign of \cos\beta could also be chosen negative, but we search for only one solution. Expressing \sin\alpha, \cos\alpha we get

\sin\alpha = \frac{h}{\sqrt{1 - g^2}} , \quad \cos\alpha = \frac{i}{\sqrt{1 - g^2}} .   (9.11)


Substituting \sin\alpha, \cos\alpha, \sin\gamma, \cos\gamma back into the rotation matrix gives the identity

\begin{pmatrix} \cos\gamma\,\sqrt{1-g^2} & \cdots & \cdots \\ \sin\gamma\,\sqrt{1-g^2} & \cdots & \cdots \\ g & h & i \end{pmatrix} = \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix} ,   (9.12)

so \sin\gamma, \cos\gamma can be expressed as

\sin\gamma = \frac{d}{\sqrt{1 - g^2}} , \quad \cos\gamma = \frac{a}{\sqrt{1 - g^2}} .   (9.13)

This is correct if |g| \neq 1; otherwise the rotation matrix is degenerate (gimbal lock). For g = -1 it has the form

R(\alpha, \beta, \gamma) = \begin{pmatrix} 0 & x & y \\ 0 & y & -x \\ -1 & 0 & 0 \end{pmatrix} = \begin{pmatrix} 0 & \cos\gamma\sin\alpha - \sin\gamma\cos\alpha & \cos\gamma\cos\alpha + \sin\gamma\sin\alpha \\ 0 & \cos\gamma\cos\alpha + \sin\gamma\sin\alpha & \sin\gamma\cos\alpha - \cos\gamma\sin\alpha \\ -1 & 0 & 0 \end{pmatrix}   (9.14)

and we get only two equations for four unknowns. We can choose, for example, \alpha:

\sin\alpha = 0 , \quad \cos\alpha = 1   (9.15)

and express \gamma as

\sin\gamma = -b , \quad \cos\gamma = e .   (9.16)

The lines above describe how we compose and decompose the rotation matrix given by the Blender Euler angles. This composition corresponds to the Blender source code for the rotation matrix. Note that Blender uses transposed matrices, so a matrix obtained from the Blender Python functions is the transpose of the matrix (9.8).
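
As a sanity check of the formulas above, here is a small NumPy sketch (ours, independent of Blender) that composes R(α, β, γ) = R_z(γ)R_y(β)R_x(α) and recovers the angles in the non-degenerate case:

import numpy as np

def euler_to_matrix(a, b, c):
    """Compose R = Rz(c) @ Ry(b) @ Rx(a) from rotations about x, y, z."""
    Rx = np.array([[1, 0, 0], [0, np.cos(a), -np.sin(a)], [0, np.sin(a), np.cos(a)]])
    Ry = np.array([[np.cos(b), 0, np.sin(b)], [0, 1, 0], [-np.sin(b), 0, np.cos(b)]])
    Rz = np.array([[np.cos(c), -np.sin(c), 0], [np.sin(c), np.cos(c), 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def matrix_to_euler(R):
    """Recover (alpha, beta, gamma); valid away from the gimbal-lock case."""
    g = R[2, 0]                                      # g = -sin(beta), (9.10)
    beta = np.arcsin(-g)
    cb = np.sqrt(1.0 - g * g)
    alpha = np.arctan2(R[2, 1] / cb, R[2, 2] / cb)   # from h and i, (9.11)
    gamma = np.arctan2(R[1, 0] / cb, R[0, 0] / cb)   # from d and a, (9.13)
    return alpha, beta, gamma

# round-trip check
angles = (0.3, -0.7, 1.1)
print(np.allclose(angles, matrix_to_euler(euler_to_matrix(*angles))))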

9.3 Blender camera model

The camera in Blender can be orthographic or perspective. We need only the perspective camera for our purposes, because it is the closest approximation to a real pinhole camera. Projections in Blender are based on OpenGL, so all the transformations are similar to the OpenGL viewing transformations. To be able to describe the camera and to compute the projection matrix, we must understand the differences between the OpenGL and the computer vision camera models. More about camera projections can be found in the book [5] by Hartley and Zisserman.

The basic camera parameters in Blender are location and rotation. All these parameters are related to the world space (as for all objects in a Blender scene). These parameters do not transform world coordinates to camera coordinates but the other way round; the inverse transformation must be used to transform world coordinates to the camera space. The biggest difference between Blender and computer vision is that the Blender camera views along its negative z axis (as is common for OpenGL projections), see Figure 9.1, where C is the camera centre. Cameras in Blender also have clipping planes specified for the closest and the farthest visible objects. This has no influence on the camera model, except that objects outside the clipping range are not rendered.

Figure 9.1: A common OpenGL camera model used for the perspective camera in Blender.

To avoid possible confusion, we define the notation and conventions used. First, we assume a pinhole camera model as shown in Figure 9.2. Note that the camera axes differ from the OpenGL model shown in Figure 9.1. The objects captured by this camera are projected into the image plane (x, y axes), see Figure 9.3. The image is rasterized to pixels (u, v axes). Here we use the coordinates as for indexing the matrices of stored images: the row is the first and the column the second index, and the origin is at the top left corner. The origin of the image plane is shifted by the offset u0, v0 (the principal point) from the origin of the image. The camera optical axis passes through the centre of the image plane. The f parameter in Figure 9.2 is the camera focal length; it corresponds to the lens parameter of the Blender camera in Figure 3.4. We can write the transformation between the parameters as

f = lens / 16.0 .   (9.17)


Figure 9.2: A pinhole camera model.

Figure 9.3: Image plane (x, y axes) and the indexing of the image pixels (u, v).

We need to estimate the projection matrix that is commonly used in computer vision; it describes the projection from the world space to the image pixels. The first step is transforming the world coordinates into the camera space. Because we know the camera location C and the camera rotation in the world space, we can transform any point into the camera space. We know the Blender camera object's Euler angles \alpha, \beta, \gamma, so we can compute the rotation matrix. The rotation to the camera space is the inverse one. There is also another rotation of the axes between the Blender camera model and the pinhole model, see Figure 9.1 and Figure 9.2; this is caused by the different axes orientation. This rotation can be written as

T = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & -1 \end{pmatrix} .   (9.18)

The rotation from the world space to the camera space is then

R = T\, R(\alpha, \beta, \gamma)^{\top} .   (9.19)

C is the vector of the camera location in the world space. The whole transformation of a homogeneous vector x_w = [x_w, y_w, z_w, 1]^{\top} to the camera space can be written as

x_c = R\,[\, I \mid -C \,]\, x_w ,   (9.20)

where x_c = [x_c, y_c, z_c]^{\top}. This vector is projected onto the image plane as

\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} \simeq \begin{pmatrix} f & 0 & 0 \\ 0 & f & 0 \\ 0 & 0 & 1 \end{pmatrix} x_c ,   (9.21)

where f is the focal length. The transformation from the normalized coordinates to pixels is the following:

\begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \begin{pmatrix} m_u & 0 & u_0 \\ 0 & m_v & v_0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} ,   (9.22)

and so we can write the Blender camera projection matrix as

P = K\, R\, [\, I \mid -C \,] ,   (9.23)

where K is known as the calibration matrix and can be written as

K = \begin{pmatrix} m_u f & 0 & u_0 \\ 0 & m_v f & v_0 \\ 0 & 0 & 1 \end{pmatrix} .   (9.24)

The parameters of the K matrix are computed as follows:

u_0 = \frac{height}{2} , \quad v_0 = \frac{width}{2} .   (9.25)

Blender chooses the axis with the maximum resolution and the other axis is scaled if the aspect ratio is not 1:1. The coefficients m_u, m_v can be estimated as

m = \max(width \cdot k_v,\; height \cdot k_u) , \quad m_u = \frac{m}{2 k_u} , \quad m_v = \frac{m}{2 k_v} .   (9.26)

We use this code in our function:

if width * kv > height * ku:
    mv = width / 2
    mu = mv * kv / ku
else:
    mu = height / 2
    mv = mu * ku / kv

where ku : kv is the aspect ratio height : width defined in the Blender render parameters. The projection matrices are then exported into simple text files and can be used by computer vision algorithms.
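
Putting the pieces of this section together, the projection matrix of a Blender camera can be assembled as in the following NumPy sketch. It is our own illustration: the argument names mirror the .cam parameters and equations (9.17)-(9.26), and the axis-swap matrix follows our reconstruction of (9.18).

import numpy as np

def blender_projection_matrix(C, euler_deg, lens, width, height, ku, kv):
    """Assemble P = K R [I | -C] for a Blender perspective camera (sketch)."""
    a, b, c = np.radians(euler_deg)
    Rx = np.array([[1, 0, 0], [0, np.cos(a), -np.sin(a)], [0, np.sin(a), np.cos(a)]])
    Ry = np.array([[np.cos(b), 0, np.sin(b)], [0, 1, 0], [-np.sin(b), 0, np.cos(b)]])
    Rz = np.array([[np.cos(c), -np.sin(c), 0], [np.sin(c), np.cos(c), 0], [0, 0, 1]])
    T = np.array([[0, 1, 0], [1, 0, 0], [0, 0, -1]])       # axis swap, (9.18)
    R = T @ (Rz @ Ry @ Rx).T                               # world -> camera, (9.19)

    f = lens / 16.0                                        # (9.17)
    m = max(width * kv, height * ku)                       # (9.26)
    mu, mv = m / (2 * ku), m / (2 * kv)
    K = np.array([[mu * f, 0, height / 2],                 # (9.24) with (9.25)
                  [0, mv * f, width / 2],
                  [0, 0, 1]])

    C = np.asarray(C, dtype=float)
    return K @ np.hstack([R, (-R @ C).reshape(3, 1)])      # K R [I | -C], (9.23)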

9.4 Using projection matrices with OpenGL

In the previous section we derived the projection matrix of the Blender perspective camera. We also need to use the projection matrix with OpenGL for rendering into images captured by real cameras. Instead of using the OpenGL utility functions, we must set our own OpenGL viewing transformations in order to reproduce the real camera projection. Before describing the transformations, we give a quick overview of OpenGL. OpenGL works almost always with homogeneous vector coordinates and with 4 × 4 matrices; we use the OpenGL terminology here. The object coordinates of any point are transformed by:

1. the MODELVIEW matrix to eye coordinates, which in the computer vision analogy we can understand as coordinates in the camera space. This corresponds to the Blender transformations from the object's local space to the world space and then to the camera's local space.
2. the PROJECTION matrix, which transforms eye coordinates to clip coordinates. These coordinates are clipped against the viewing frustum.
3. the perspective division, which transforms (divides) the clip coordinates to normalized device coordinates. These coordinates are clamped to the interval [-1, 1].
4. finally, the VIEWPORT transformation, which maps the normalized device coordinates to window (pixel) coordinates.

All the matrices can be set explicitly, but it is recommended to use the OpenGL utility functions. We will write Mv for the MODELVIEW matrix and Pj for the PROJECTION matrix. The object coordinates [X_o, Y_o, Z_o] of any point are transformed to eye coordinates as

\begin{pmatrix} X_e \\ Y_e \\ Z_e \\ W_e \end{pmatrix} = M_v \begin{pmatrix} X_o \\ Y_o \\ Z_o \\ 1 \end{pmatrix} ,   (9.27)

then to clip coordinates

\begin{pmatrix} X_c \\ Y_c \\ Z_c \\ W_c \end{pmatrix} = P_j \begin{pmatrix} X_e \\ Y_e \\ Z_e \\ W_e \end{pmatrix} ,   (9.28)

and finally to the normalized device coordinates

\begin{pmatrix} X_n \\ Y_n \\ Z_n \end{pmatrix} = \begin{pmatrix} X_c / W_c \\ Y_c / W_c \\ Z_c / W_c \end{pmatrix} .   (9.29)

Note that these coordinates are in the interval [-1, 1]. We set the default VIEWPORT transformation by the OpenGL function glViewport(0,0,width,height). We then get the window coordinates as follows:

\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} (width/2)\, X_n + width/2 \\ (height/2)\, Y_n + height/2 \\ Z_n/2 + 1/2 \end{pmatrix} .   (9.30)

The OpenGL window (image) coordinates x, y have the origin in the bottom left corner (this differs from our indexing of the image as shown in Figure 9.3). During rasterization all three coordinates x, y, z are used; the last coordinate z is stored in the z-buffer for the visibility test. Figure 9.4 shows how the normalized device coordinates are displayed in the final window (we can understand the OpenGL window as the image, because we usually render into images). Figure 9.5 shows the image coordinates used by our pinhole camera model (u, v) and by OpenGL (x, y). We want to set the OpenGL transformations so that the projected coordinates x, y in the window (image) correspond to the image pixel coordinates u, v; we want to define the same projection as the real camera has, using the real camera's projection matrix. A point with world coordinates [X_w, Y_w, Z_w] must have the same location in the OpenGL window (image) as in the captured image (see Figure 9.5). The required relation is

\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} v \\ height - u \end{pmatrix} .   (9.31)


Figure 9.4: The OpenGL normalized device coordinates and their rasterization in the window (image).

To be able to use a 3 × 4 projection matrix P with OpenGL, we can extend it by the row vector [0, 0, 0, 1] to a 4 × 4 shape,

\tilde{P} = \begin{pmatrix} P \\ 0\;\;0\;\;0\;\;1 \end{pmatrix} ,   (9.32)

to get the transformation

\begin{pmatrix} u \\ v \\ w \\ 1 \end{pmatrix} = \tilde{P} \begin{pmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{pmatrix} .   (9.33)

We need to find a matrix T which we use to multiply the \tilde{P} matrix in order to get the eye coordinates. The eye coordinates must finally project onto the same location in the image as if the point was captured by the real camera:

\begin{pmatrix} X_e \\ Y_e \\ Z_e \\ 1 \end{pmatrix} = T \begin{pmatrix} u \\ v \\ w \\ 1 \end{pmatrix} = \begin{pmatrix} a_x & b_x & c_x & d_x \\ a_y & b_y & c_y & d_y \\ a_z & b_z & c_z & d_z \\ 0 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} u \\ v \\ w \\ 1 \end{pmatrix} ,   (9.34)


Figure 9.5: Pixel coordinates of a pinhole camera model are on the left image, pixel coordinates of the OpenGL window (image) are on the right.

which gives a set of equations

X_e = a_x u + b_x v + c_x w + d_x ,
Y_e = a_y u + b_y v + c_y w + d_y ,   (9.35)
Z_e = a_z u + b_z v + c_z w + d_z .

Before we continue, we define the PROJECTION and VIEWPORT transformations. For the VIEWPORT transformation we will use the transformation defined in (9.30); it is a common way of setting the OpenGL viewport. We will set the PROJECTION transformation matrix Pj to the shape

P_j = \begin{pmatrix} \frac{2}{width} & 0 & 0 & 0 \\ 0 & \frac{2}{height} & 0 & 0 \\ 0 & 0 & \frac{2}{f-n} & -\frac{f+n}{f-n} \\ 0 & 0 & 1 & 0 \end{pmatrix} ,   (9.36)

where n and f are respectively the nearest and the farthest value of the viewing distance. Note that the eye coordinates [X_e, Y_e, Z_e, 1] are mapped to the clip coordinates

\left( \frac{2 X_e}{width},\; \frac{2 Y_e}{height},\; \frac{2 Z_e - f - n}{f - n},\; Z_e \right) .   (9.37)

After the perspective division we get the normalized coordinates

\left( \frac{2 X_e}{Z_e\, width},\; \frac{2 Y_e}{Z_e\, height},\; \frac{2 Z_e - f - n}{Z_e (f - n)} \right) .   (9.38)


For Z_e = n the window coordinate z takes its minimum value and for Z_e = f its maximum value; z is used only for the z-buffer. In the x, y window coordinates we get the eye coordinates divided by the distance Z_e and shifted, so the origin is in the middle of the window (image). We get these equations:

x = \frac{X_e}{Z_e} + \frac{width}{2} = \frac{v}{w} , \qquad y = \frac{Y_e}{Z_e} + \frac{height}{2} = height - \frac{u}{w} .   (9.39)

Further, we can form a set of equations using (9.35):

\frac{a_x u + b_x v + c_x w + d_x}{a_z u + b_z v + c_z w + d_z} = \frac{v}{w} - \frac{width}{2} , \qquad \frac{a_y u + b_y v + c_y w + d_y}{a_z u + b_z v + c_z w + d_z} = \frac{height}{2} - \frac{u}{w} .   (9.40)

One solution is

a_x = 0,\; b_x = 1,\; c_x = -\frac{width}{2},\; d_x = 0 ,
a_y = -1,\; b_y = 0,\; c_y = \frac{height}{2},\; d_y = 0 ,   (9.41)
a_z = 0,\; b_z = 0,\; c_z = 1,\; d_z = 0 ,

and we can write the T matrix as

T = \begin{pmatrix} 0 & 1 & -width/2 & 0 \\ -1 & 0 & height/2 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix} .   (9.42)

This applies if we use the transformations (9.36) and (9.30). The sequence of OpenGL commands setting the described transformations looks as follows:

glViewport(0, 0, width, height)
glMatrixMode(GL_PROJECTION)
glLoadIdentity()
glMultMatrixd(Pj)
glMatrixMode(GL_MODELVIEW)
glLoadIdentity()
glMultMatrixd(T)
glMultMatrixd(P̃)

Note that matrices in OpenGL are stored in a different (column-major) format than some programming languages use. We use the P̃ matrix multiplied by our T matrix (9.42) as the OpenGL MODELVIEW matrix and our Pj matrix (9.36) to define the OpenGL PROJECTION matrix.
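
A hedged PyOpenGL sketch of the same setup follows. It assumes the matrices P (3 × 4), T and Pj are available as NumPy arrays and that a GL context is current; since glMultMatrixd expects column-major data, the row-major NumPy matrices are transposed before being passed.

import numpy as np
from OpenGL.GL import (glViewport, glMatrixMode, glLoadIdentity,
                       glMultMatrixd, GL_PROJECTION, GL_MODELVIEW)

def set_camera(P, T, Pj, width, height):
    """Configure OpenGL so that it reproduces the real camera projection P."""
    P_tilde = np.vstack([P, [0, 0, 0, 1]])    # extend the 3x4 matrix to 4x4, (9.32)
    glViewport(0, 0, width, height)
    glMatrixMode(GL_PROJECTION)
    glLoadIdentity()
    glMultMatrixd(np.ascontiguousarray(Pj.T))        # transpose for column-major order
    glMatrixMode(GL_MODELVIEW)
    glLoadIdentity()
    glMultMatrixd(np.ascontiguousarray(T.T))
    glMultMatrixd(np.ascontiguousarray(P_tilde.T))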


9.5 Bone transformations

Many developers working with Blender were disappointed by the changes in the Blender 2.4 armature source code, and therefore the Blender developers published a schema explaining how the armature deformations work in Blender; you can see it in Figure 9.6. It describes the code and data structures, but it does not say how this relates to the Blender Python API and how to compose the matrices. We therefore describe how we compute the matrices shown in the schema in Figure 9.6. The vectors bone.head and bone.tail used in this chapter are row vectors, the locations [x, y, z] of the bone head and tail. We follow the Blender Python notation here, so the equations correspond to the Python code. First we need to describe the computation of the quaternions. We can compute the normal vector n of two normalized vectors u, v as

n = u \times v   (9.43)

and the angle between the vectors as

\theta = \arccos(u \cdot v) .   (9.44)

Then the quaternion of the rotation from u to v is

q = \left( \cos\frac{\theta}{2},\; n \sin\frac{\theta}{2} \right) .   (9.45)

We then use the Blender Python functions to convert the quaternions to rotation matrices. Here we will write

R_q(quaternion) ,   (9.46)

by which we mean a rotation matrix composed from a quaternion. The bone matrix in the bone space is 3 × 3. It describes the orientation of the bone. It is defined by the location of the bone's head and tail and by the roll parameter, which rotates the bone around its local y axis. We compute the bone matrix in the bone space as

B(bone) = R(0, bone.roll, 0)\; R_q(q) ,   (9.47)

where R is the matrix from equation (9.8) and q is the quaternion composed as

v = \frac{bone.tail - bone.head}{\lVert bone.tail - bone.head \rVert} , \quad u = [0, 1, 0] , \quad n = u \times v , \quad \theta = \arccos(u \cdot v) , \quad q = \left( \cos\frac{\theta}{2},\; n \sin\frac{\theta}{2} \right) .   (9.48)


We write R(0, bone.roll, 0) because the Blender Python API works with transposed matrices and we use the Blender functions for the matrix operations. In order to get the bone matrix in the armature space, we must extend the bone space matrix to 4 × 4 and include the parent bone transformations. If the bone does not have any parent bone, we can express the bone matrix in the armature space as

A(bone) = \begin{pmatrix} B(bone) & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} I & 0 \\ bone.head & 1 \end{pmatrix} ,   (9.49)

where I is the 3 × 3 identity matrix and 0 is the zero vector. If the bone has a parent, the bone matrix in the armature space is computed differently:

A(bone) = \begin{pmatrix} B(bone) & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} I & 0 \\ bone.head & 1 \end{pmatrix} \begin{pmatrix} I & 0 \\ [0,\, parent.length,\, 0] & 1 \end{pmatrix} A(parent) .   (9.50)

Here parent.length is the length of the parent bone and can be computed as \lVert parent.tail - parent.head \rVert. Finally, the bone matrix in the pose space differs from the matrix in the armature space by the quaternion pose.rot and by the translation row vector pose.loc, which set the bone to the new pose; the quaternions are used to set the bone rotations. If the bone does not have a parent, we can express its pose matrix as

O(bone) = \begin{pmatrix} R_q(pose.rot) & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} I & 0 \\ pose.loc & 1 \end{pmatrix} A(bone) .   (9.51)

If the bone has a parent, then the evaluation of the matrix is

O(bone) = \begin{pmatrix} R_q(pose.rot) & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} I & 0 \\ pose.loc & 1 \end{pmatrix} \begin{pmatrix} B(bone) & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} I & 0 \\ bone.head & 1 \end{pmatrix} \begin{pmatrix} I & 0 \\ [0,\, parent.length,\, 0] & 1 \end{pmatrix} O(parent) .   (9.52)

The matrices described above are used to compute the bone location and pose. We only revised the Blender documentation, which is less specific about how the matrices are computed. The description given here refers to Figure 9.6, but covers only the basic deformations that we used in our work; this is enough for our purposes.
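
The following NumPy sketch is our own reading of equations (9.45)-(9.50), using the row-vector convention of this chapter (translation in the last matrix row). It additionally normalizes the rotation axis so that the quaternion is a unit quaternion; the plain-array bone representation is illustrative, not a Blender type.

import numpy as np

def quat_between(u, v):
    """Quaternion (w, x, y, z) rotating unit vector u onto unit vector v, (9.43)-(9.45)."""
    n = np.cross(u, v)
    s = np.linalg.norm(n)
    theta = np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))
    axis = n / s if s > 1e-12 else np.array([1.0, 0.0, 0.0])   # parallel vectors: any axis
    return np.concatenate([[np.cos(theta / 2)], axis * np.sin(theta / 2)])

def quat_to_matrix(q):
    """3x3 rotation matrix of a unit quaternion (w, x, y, z)."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)]])

def translation(t):
    """4x4 translation matrix for the row-vector convention (translation in the last row)."""
    T = np.eye(4)
    T[3, :3] = t
    return T

def rest_matrix(head, tail, roll, parent=None):
    """A(bone) of (9.49)/(9.50); parent is a tuple (A(parent), parent_length) or None."""
    v = (tail - head) / np.linalg.norm(tail - head)
    roll_m = np.array([[np.cos(roll), 0, np.sin(roll)],
                       [0, 1, 0],
                       [-np.sin(roll), 0, np.cos(roll)]])      # R(0, roll, 0)
    B = np.eye(4)
    # roll about the local y axis combined with the rotation aligning [0,1,0]
    # with the bone axis, following the order written in (9.47)
    B[:3, :3] = roll_m @ quat_to_matrix(quat_between(np.array([0.0, 1.0, 0.0]), v))
    A = B @ translation(head)
    if parent is not None:
        A_parent, parent_length = parent
        A = A @ translation([0.0, parent_length, 0.0]) @ A_parent
    return A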

9.6 Vertex deformations by bones

We describe here the vertex transformations of a mesh object by the armature bones. We describe only the deformation caused by bones using vertex groups and weights for vertices; the deformation by envelopes, or a combination of vertex groups with envelopes, is more complicated. The armature object is a parent of the mesh or, in the newer Blender philosophy, it is a mesh modifier. In our work each vertex has its own weight that defines the influence of a bone. A vertex can be a member of several vertex groups (we set a limit of 2 groups). A vertex group only defines the correspondence with a bone: each group is transformed only by this bone (note that the vertex groups in Blender must have the same names as the bones to which they belong), but the vertex can be a member of several groups. The final local space vertex location can also be obtained using the NMesh.GetRawFromObject(name) Blender Python function. Assume we have a vertex local space location as the vector

v_l = [x, y, z, 1] .   (9.53)

The world space vertex location is

v_w = v_l\, M(object) ,   (9.54)

where M(object) is the object transformation matrix and the vertex is a member of this mesh object. The vertex location in the parent's armature space is

v_a = v_w\, M^{-1}(armature) .   (9.55)

Now, for each vertex group of which the vertex is a member, we compute the weighted vector of the relative change with respect to the vertex rest position. The vertex location in the bone space is

v_{b_i} = v_a\, A^{-1}(bone_i) .   (9.56)

The vertex location in the armature space after applying the pose transformation is

v_{p_i} = v_{b_i}\, O(bone_i) .   (9.57)

The weighted difference of the new location is (using only the x, y, z components)

d_i^{1..3} = (v_{p_i}^{1..3} - v_a^{1..3})\, weight_i .   (9.58)
Note that weight is in interval < 0, 1 >. The nal vertex location in the armature space is di va = va + . (9.59) weighti The vertex location for a new pose is vl = va M(armature) M1 (object) , (9.60)

this is the nal vertex local space location after deformations. We found the transformations by studying the Blender source code. This is not complete description of Blender bone transformations for vertices but it completely describes the transformations which we use.

9.7. CONFIGURING THE BONES

57

9.7

Conguring the bones

When we animate our model, we do not change the bone joint locations or the bone sizes. We only rotate the bones to get the desired pose. As described in the Chapter 7, we use the correspondences between the empty objects (created by importing the BVH animation data) and our bones. We must estimate the rotation (expressed as quaternion) which rotates the bone from its rest pose to the new pose. The bone y axis must be parallel in the new pose to the axis connecting the corresponding empty objects. We can compute the bones head and tail locations in the rest pose as [bone.head, 1] = [0, 0, 0, 1] Orest (bone) [bone.tail, 1] = [0, bone.length, 0, 1] Orest (bone) . (9.61)

We know the locations of the corresponding empty objects. We will denote them empty.head, empty.tail. We need such transformation Q (rotation), which will transform the vector of head, tail to be parallel to vector of empty objects: [bone.headq , 1] = [0, 0, 0, 1] Q Orest (bone) [bone.tailq , 1] = [0, bone.length, 0, 1] Q Orest (bone) . The following equality must be valid: bone.tailq bone.headq empty.tail empty.head = bone.tailq bone.headq empty.tail empty.head . (9.63) (9.62)

Because the sought transformation, is only a rotation, we can omit the size of the vectors and write: [0, bone.length, 0, 1] Q = (empty.tailempty.head)O1 (bone) . (9.64) rest We compute the vector x x = (empty.tail empty.head) O1 (bone) rest and compute the quaternion q which denes the transformation Q: v= x x (9.65)

n = [0, 1, 0] v = arccos(u v) q = cos , n sin . 2 2

(9.66)

58 CHAPTER 9. USED MATHEMATICS AND TRANSFORMATIONS We use this quaternion in bone objects parameter to dene the rotation. This quaternion object can be used directly with bones without any transformation to rotation matrix. If the rotation should be relative to the empty object locations, we then use: empty.tail empty.head empty.tail empty.head empty.tailrest empty.headrest u= empty.tailrest empty.headrest n=uv , v=

(9.67)

where the empty.tailrest , empty.headrest are locations of the empty objects in the animations rst frame. The locations in the rst frame we refer as the default locations.

9.8

Idealizing the calibrated camera

Because we did not write own visualization for tting the model into the camera views, we must use the Blender camera views. The problem is that the Blender camera intrinsic parameters are ideal and the real camera can not be simulated in Blender. We need also to separate the extrinsic parameters of the calibrated camera to be able to congure the Blender camera to the same view. We can decompose the rst three columns of projection camera matrix using the RQ decomposition as: P1..3 = K R (9.68)

and get the translation vector from multiplying the inverse calibration matrix by last row of projection matrix: t = K1 P4 . The camera center in the world coordinates is: C=R t . (9.70) (9.69)

The camera center and the rotation matrix is enough to dene Blender camera location and orientation but the intrinsic parameters of the camera can not be obtained so easily. The general calibration camera matrix is: K= 0 0 0

(9.71)

This is dierent if compared with the matrix from equation (9.24). We omit the skew in calibration matrix and estimate the parameters of the Blender

9.8. IDEALIZING THE CALIBRATED CAMERA

59

camera so the nal transformation ts the real camera parameters. We can compute the aspect ratio K(1, 1) ku = (9.72) kv K(2, 2) and the focal length f= K(1, 1) . 2 max(width, height) (9.73)

We can not set the image pixel coordinates origin oset in the calibration matrix of Blender camera. But we can shift the image origin to match the Blender camera calibration matrix. We compute the needed oset as: shift = width height , 2 2 K3 1..2 (9.74)

The computed shift was up to 50 pixels in our data. It is apparent that without idealizing and shifting the images it is impossible to t the object into views using the Blender camera views.

60 CHAPTER 9. USED MATHEMATICS AND TRANSFORMATIONS

Figure 9.6: The schema of Blender bones structure and transformations, describing how the armatures in Blender work. This Figure was obtained from Blender web site [1].

Chapter 10

The package usage


In this chapter we show how to use the software package. We show the possibilities in creating an animation. We use the Blender in interactive mode with own user interface due to need for debugging and testing the Python code. This also helps to understand the procedure of creating an animation. Unfortunately this requires at least a minimum knowledge of Blender usage. The objects in the Blender scene must be selected by right mouse button click before an operation is chosen. The operation with scene objects can be attaching the mesh to skeleton or copying BVH animation. We describe here a quick tour of creating an animation. We expect these inputs to be already obtained: Exported MakeHuman mesh into *.obj le. Captured images from calibrated video together with decomposed and idealized camera congurations in *.cam les. For this see the Section 9.8 in Chapter 9. For the *.cam le format see the section 8.3 in Chapter 8. The BVH motion captured data. Note that some BVH les may not be compatible with our scripts.

If you run the Blender with parameters given in section 8.5 in Chapter 8, you will get the screen shown in Figure 10.1. It is possible to access the functions from our scripts through the buttons in the right panel. It is easy to write own Python scripts to automate the whole process instead of using user interface and clicking on buttons. Using the buttons is more educative for learning and understanding the usage and functions of our package. 61

62

CHAPTER 10. THE PACKAGE USAGE

Figure 10.1: The Blender window after start of run.py script. Buttons with our functions are on the right ( import and export, conguration and export of cameras, mesh import and animation etc.).

10.1

Importing mesh and attaching to the skeleton

It is easy to import any mesh. You just choose the Import MH Mesh button and select the proper le to import. The window may need to be redrawn after import. You can resize the window or zoom it in order to redraw. The imported mesh should appear. The mesh should be also selected (highlighted). If not, it must be selected by clicking with right mouse button. At the moment after import, the mesh is bigger and in dierent orientation than we generate the skeleton. The button Fit to Skeleton must be used in order to align mesh with the skeleton. The window may need to be redrawn again. The skeleton can be now generated using the Generate Skeleton button. Now mesh can be attached to the skeleton. Both the mesh and the skeleton object have to be selected. This can be done by holding the Shift key and clicking with right mouse button on the objects. The subwindow should look like in Figure 10.2. Then the Attach Skeleton to Mesh button can be used. Now the mesh was attached to skeleton. To test the articulation, press Ctrl+Tab and switch to the pose mode. Select a bone with the right mouse button and press the R key. The bone will start rotating. Switching back to the object mode is done again by Ctrl+Tab. After that, the mesh model is prepared for articulation and can be tted into camera

10.2. FITTING THE MODEL INTO VIEWS AND TEXTURING

63

Figure 10.2: The Blender window with skeleton and mesh object selected.

views in order to cover the model with textures.

10.2

Fitting the model into views and texturing

We must admit it is practically impossible to get precise results by manual model tting into the specic pose in camera views. Nevertheless we did not write any function for automatic tting. We can t roughly the model into the desired object in camera views using the Blender interface. We can move, scale, rotate our model and bones as well, in order to match the images. The tting process is the following. 1. Load the images into the Blender. 2. Switch the 3D View window to the UV/Image editor window by selecting an icon from the menu. (The rst icon in the list). 3. Through menu choose Image and Open to load the image into Blender. After loading all the images we can set the cameras in the scene. 1. We switch the window back to 3D View. 2. From the script window choose the button Load and Setup Camera and select the proper *.cam le. 3. Then from the 3D View window panel choose the View and Camera to get the view from the current camera.

64

CHAPTER 10. THE PACKAGE USAGE 4. Enable the background image in the camera view from the panel by choosing the View and Background image. Enable it and select the corresponding image with the current camera. 5. Split the window by clicking the middle mouse button on the window border and select the Split area. 6. In the one of the new windows, unlock the camera view by clicking on the icon with lock (the icon must be unlocked) and in the second window lock the view by the same icon. 7. Load the next camera, select the camera by Get next camera button. Continue from the beginning to set the view to camera view and to set the background image.

All the cameras can be processed by the process listed above. After conguring the cameras in the Blender, we can t our model into views using rotation (the R key), moving (the G key) and scaling (the S key) in both object and pose mode. You can see the results in Figure 6.1 on page 26. Before attaching images as textures, we need to export camera projection matrices by Export Camera Projection Matrix. These matrices are used to dene our Blender camera projections to images. We then select an image in the UV/Image editor window and assign the projection matrix by Assign P matrix to Image button. After all, we attach the images as textures by Attach Images As Texture button. The model is then texturized from Blender camera views by loaded images.

10.3

Loading animation, rendering and animation data export

We load the BVH les using the Blenders import script for BVH motion captured data. It can be accessed through the Blender menu by selecting File, Import, Motion Capture (.bvh) (we recommend to set the script scale parameter to 0.1 for better visualization). This script imports BVH data and creates empty objects connected in the same hierarchy as the BVH data joints. This script sometimes badly imports joint names. We advice to check the empty object names and correct them if it is needed (this can be done by pressing the N key and by rewriting the name). The running of the other script swaps the active script in the script window and our script can disappear. It can be recovered by choosing our script from the scripts list on the script windows panel. If the armature object is selected, the Copy BVH movement button can be pressed to copy imported animation. Important is setting of the start and the end frame in the Buttons window in the anim scene panel.

10.3. LOADING ANIMATION, RENDERING AND ANIMATION DATA EXPORT65 To render the animation for each camera, the Render for each camera button can be used. The button Export animation can be used for exporting vertex locations through the frames and for exporting the bone poses as well. All the other Blender features like lights, diuse rendering, shadow casting etc. can be also used to get better results. These features are well documented in Blender Manual [2].

Chapter 11

Results and conclusions


We extended Blender functionality for the needs of computer vision. We can generate a wide variety of real human animations from motion captured data. We can use outputs of multi-camera system to adapt our models to real observations. We created a general skeleton model which ts mesh models created by MakeHuman software. We followed the H-Anim standard and we adapted the recommendations for use with BVH motion captured data. We dened the joint locations and the level of articulation. We use 22 virtual bones, but only 19 of them are intended for articulation. The skeleton model is shown in Figure 4.1 (page 18). Together with MakeHuman mesh model and our general skeleton model we created the articulated model. We attached automatically mesh vertices to the bones using our own algorithm. Articulated models in new poses are shown in Figure 5.5 (on page 24). We setup calibrated multi-camera system for acquiring the images for textures. We manually tted the articulated model using Blender interface into camera views. The acquired images were idealized to match the Blender camera model which have principal axis in the center of the image. We then mapped the textures onto the mesh faces taking into account faces visibility in camera views. The textured articulated model is shown in Figure 6.2 (on page 31). For animation we used motion captured data stored in BVH format. We imported data into Blender and copied the animation using dictionary of correspondences between our skeleton model bones and the motion data joints. The animation example is shown in Figure 7.1 (page 36).

11.1

Conclusions

We developed a software package that can be used for quickly creating animation. Most of the process tasks can be done automatically or can be 66

11.1. CONCLUSIONS

67

fully automated by scripting. We used the open source software which is free to use. Using the modeling software Blender, we can extend the animated scene for objects. We developed a platform which can be extended and improved by more advanced algorithms for texturing or model creation. The main goal was to create a tool by which we can extend the robustness of our tracking algorithms. Outputs can be used for learning human silhouettes from dierent views, tting the articulated model into observations or just testing the algorithms on ground-truth data. Our models can not compete with virtual actors used in movies, but they are easy to create and animate. The steps that we used during creating an animation of the articulated model are: Obtaining model representation (mesh model) Dening the virtual skeleton Setting the textures Animation using motion captured data The reuse of the motion captured data with our model is not easy. We did not use any constraints for bone articulation which could help to bound the bones in valid poses. The model limbs can penetrate each other during the animation. This error is caused by improper interpretation of motion data. However, the motion data allow us to create animations with realistic human motions. We can recommend the following for the further development : The mesh model should be adapted to the real object captured by cameras. So the shape of the model will match the objects shape. The texturing of the model can be done with less distortion caused by over or under tting the model into the object. The H-Anim standard corresponds with the human anatomy. The HAnim dened human body feature points would be used for automatic model tting into views. It would be better to use own captured motion data with exactly dened joint locations on the human body. Constrains for joint rotations should be also used. We covered the steps of creating the articulated model animations. We described the needed transformations between computer vision and computer graphics. The developed software can be used anywhere the simple realistic human animations are needed.

Bibliography
[1] Blender. Open Source 3D Graphics modelling, animation, creation software. http://www.blender.org,. [2] Blender documentation. Blender online wikipedia documentation. http://mediawiki.blender.org. [3] M. Dimitrijevic, V. Lepetit, and P. Fua. Human body pose recognition using spatio-temporal templates. In ICCV workshop on Modeling People and Human Interaction, Beijing, China, October 2005. [4] Humanoid animation (h-anim) standard. ISO/IEC FCD 19774:200x. http://www.h-anim.org. [5] Richard Hartley and Andrew Zisserman. Multiple view geometry in computer vision. Cambridge University, Cambridge, 2nd edition, 2003. [6] L. Herda, R. Urtasun, and P. Fua. Hierarchical implicit surface joint limits for human body tracking. Computer Vision and Image Understanding, 99(2):189209, 2005. [7] Makehuman (c). The human model parametric modeling application. http://www.makehuman.org. [8] Ondej Mazan and Tom Svoboda. Design of realistic articulated r y as human model and its animation. Research Report K33321/06, CTU CMP200602, Department of Cybernetics, Faculty of Electrical Engineering Czech Technical University, Prague, Czech Republic, January 2006. [9] W. Niem and H. Broszio. Mapping texture from multiple camera views onto 3D-Object models for computer animation. In Proceedings of the International Workshop on Stereoscopic and Three Dimensional Imaging, 1995. [10] Python. Interpreted, objective-oriented programming language. http://www.python.org. 68

BIBLIOGRAPHY

69

[11] Mark Segal, Kurt Akeley, and Jon Leach (ed). The OpenGL Graphics System: A Specication. SGI, 2004. [12] Jonathan Starck and Adrian Hilton. Model-based multiple view reconstruction of people. In International Conference on Computer Vision, ICCV, volume 02, pages 915922, Los Alamitos, CA, USA, 2003. IEEE Computer Society. [13] R. Urtasun, D. J. Fleet, A. Hertzmann, and P. Fua. Priors for people tracking from small training sets. In International Conference in Computer Vision, pages 403410, October 2005. [14] Karel Zimmermann, Tom Svoboda, and Ji Matas. Multiview 3D as r tracking with an incrementally constructed 3D model. In Third International Symposium on 3D Data Processing, Visualization and Transmission (3DPVT), Piscataway, USA, June 2006. University of North Carolina, IEEE Computer Society.
