
Simulation Photoréaliste de l'Éclairage en Synthèse d'Images

Nicolas Holzschuch

To cite this version:

Nicolas Holzschuch. Simulation Photoréaliste de l'Éclairage en Synthèse d'Images. Computer Science [cs]. Université Joseph-Fourier - Grenoble I, 2007. tel-00379199.

HAL Id: tel-00379199
https://tel.archives-ouvertes.fr/tel-00379199
Submitted on 27 Apr 2009

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Dissertation presented for the Habilitation à Diriger des Recherches of Université Joseph Fourier — Grenoble I, specialty Mathematics and Computer Science.

Jury:
Sumanta Pattanaik, Reviewer
Bernard Péroche, Reviewer
Hans-Peter Seidel, Reviewer
George-Pierre Bonneau, President
George Drettakis, Examiner
Claude Puech, Examiner
François Sillion, Examiner

This habilitation was prepared in the ARTIS team of the LJK laboratory. The LJK is UMR 5224, a laboratory operated jointly by the CNRS, the INPG, the Université Joseph Fourier and the Université Pierre Mendès-France. ARTIS is a team of the LJK and a project of INRIA Rhône-Alpes.

To Myriam,
To Henry, Erik and Lena,
my family, who put up with me... and who support me.
Table of Contents

1 Introduction
1.1 Structure of the dissertation

2 Multi-scale modeling of lighting
2.1 The hierarchical radiosity method
2.2 Analysis of the hierarchical radiosity algorithm
2.3 Efficiency of the hierarchy
2.4 Higher-order wavelets
2.4.1 Improvements to the algorithm
2.4.2 Discontinuity meshing and higher-order wavelets
2.4.3 Space-time radiosity
2.5 Scene structure
2.6 Discussion
2.7 Articles
2.7.1 List of articles
2.7.2 An efficient progressive refinement strategy for hierarchical radiosity (EGWR '94)
2.7.3 Wavelet Radiosity on Arbitrary Planar Surfaces (EGWR 2000)
2.7.4 A novel approach makes higher order wavelets really efficient for radiosity (EG 2000)
2.7.5 Combining higher-order wavelets and discontinuity meshing: a compact representation for radiosity (EGSR 2004)
2.7.6 Space-time hierarchical radiosity with clustering and higher-order wavelets (CGF 2004)
2.7.7 Accurate detection of symmetries in 3D shapes (TOG 2006)

3 Properties of the lighting function
3.1 Derivatives of the lighting function and applications
3.1.1 Derivatives of the radiosity function
3.1.2 Error control during the simulation
3.2 Frequency analysis of the lighting function
3.3 Discussion
3.4 Articles
3.4.1 List of articles
3.4.2 Accurate Computation of the Radiosity Gradient for Constant and Linear Emitters (EGWR '95)
3.4.3 An exhaustive error-bounding algorithm for hierarchical radiosity (CGF '98)
3.4.4 A Frequency Analysis of Light Transport (SIGGRAPH 2005)

4 Using programmable graphics cards
4.1 Introduction
4.2 Computing soft shadows
4.3 Precomputing ambient occlusion
4.4 Computing specular reflections
4.5 Indirect lighting
4.6 Discussion
4.7 Articles
4.7.1 List of articles
4.7.2 A survey of real-time soft shadows algorithms (CGF 2003)
4.7.3 Soft shadow maps: efficient sampling of light source visibility (CGF 2006)
4.7.4 Fast Precomputed Ambient Occlusion for Proximity Shadows (JGT 2006)
4.7.5 Accurate specular reflections in real-time (EG 2006)
4.7.6 Wavelet radiance transport for interactive indirect lighting (EGSR 2006)

5 Conclusion and perspectives
5.1 Perspectives

List of publications


List of Figures

2.1 Images of architectural models computed with the radiosity method
2.2 Images of the Soda Hall model computed with the radiosity method
2.3 Gathering and shooting
2.4 Complex planar surfaces, triangulated
2.5 Radiosity computation on arbitrary planar surfaces
2.6 The radiosity function is defined over an extended planar domain
2.7 Convergence rate as a function of computation time
2.8 Memory cost for the different basis functions
2.9 Simulation error as a function of computation time (in s)
2.10 Comparison between Haar, M2 and M3 wavelets
2.11 A posteriori smoothing of Haar wavelets
2.12 Adapting the discontinuity mesh to higher-order wavelets
2.13 Radiosity with M3 wavelets and discontinuity meshing

3.1 Reflections are more or less sharp depending on the BRDF
3.2 Shadows cast by point and area light sources
3.3 Indirect lighting is generally blurrier than direct lighting
3.4 Application of our frequency analysis

4.1 Soft shadow computation by discretizing occluders
4.2 Precomputed ambient occlusion
4.3 Specular reflections computed with our algorithm
4.4 Interactive computation of global illumination
1. Introduction

Rendering techniques in computer graphics produce images of a virtual scene, starting from the definition of that scene: the objects that make it up, their positions, their materials, and also the position of the observer.

Among rendering techniques, photorealistic rendering seeks to produce an image as close as possible to reality, by simulating the light exchanges inside the scene. The result is an image of the scene with the lighting effects, both direct and indirect: reflections, shadows...

Research in lighting simulation has made enormous progress in recent years, to the point that producing photorealistic images is now a goal within reach of the general public. Several industrial applications take advantage of photorealistic image generation: virtual tours of buildings, video games, virtual prototyping, special effects, architectural design...

These industrial applications have a knock-on effect on research: users (and companies) keep asking for ever more realistic effects, and researchers are called upon to deliver them. The gap between the publication date of a new algorithm and its use in an industrial product has shrunk considerably, from more than 10 years in the 1990s to only a few years in 2006. This dynamism not only increases the potential industrial applications of our research, but also opens new research directions, to meet users' growing demands for interactivity and realism.

In this dissertation, we focus on these problems of photorealistic lighting simulation. In particular, we present: lighting simulation by multi-scale finite-element methods (wavelet radiosity), the determination of the characteristics of the lighting function (derivatives, frequency content), and the real-time or interactive simulation of several lighting effects (shadows, specular reflections, indirect lighting). These three areas cover all of our work during this period.
– In the area of finite-element lighting simulation, we worked first on the hierarchical radiosity method, then on the wavelet radiosity method. We demonstrated the efficiency of the hierarchical representation, but also its limits: the hierarchy is constrained by the initial modeling of the scene. We showed how to overcome this limitation. We also showed that extending the radiosity method to higher-order wavelets required a radical modification of the algorithm, and then how to combine higher-order wavelets efficiently with a discontinuity mesh.
– A problem common to several lighting-simulation methods is that the sampling used must be adapted to the characteristics of the lighting function. But these characteristics are, by definition, unknown at the start of the simulation. We showed how to compute certain characteristics of the lighting function locally: on the one hand its derivatives, on the other hand the frequency content of its variations.
– Finally, we observe that a very large share of the computation in lighting simulation is devoted to the visual aspect of the result, rather than to the physical accuracy of the computation. It is possible to compute a physically acceptable image of a scene in a very short time, but computing a visually realistic image of the same scene multiplies the computation time by roughly 10. This rule of thumb holds for several simulation algorithms, such as radiosity or photon mapping. But visual realism is, by definition, only needed for the visible part of the scene. We developed several algorithms that compute in real time certain effects that are essential for visual realism (shadows, specular reflections). By combining this real-time computation with a separate computation of the indirect lighting effects, our goal is to obtain a real-time simulation of global illumination in a dynamic scene.

1.1 Structure of the dissertation


We have written three main chapters, covering the work on each theme. After each chapter, the articles themselves are reproduced, giving the details of the work. We then present our conclusions and some perspectives for the future.

All the work described in this dissertation was carried out in collaboration with colleagues, or within the Master's internships and PhD theses that I co-supervised. The names of my collaborators are cited in the appropriate places.
2. Multi-scale modeling of lighting

Hierarchical and wavelet radiosity techniques use a multi-scale representation of lighting. This hierarchical representation speeds up the computation. However, the hierarchical representation is based on the geometric model of the scene; the influence of this geometric model can slow the simulation down and reduce its quality. We present two methods for getting rid of the original geometric model. We also present an analysis of the wavelet radiosity algorithm. Using hierarchical basis functions of order 2 or 3, as opposed to piecewise-constant basis functions, requires adapting the algorithm to account for costs that grow as a power of the order of the basis functions. We show that with a careful adaptation of every step of the algorithm, these higher-order wavelets reduce the complexity and speed up the computation.

2.1 The hierarchical radiosity method


In lighting-simulation techniques, we seek to compute the appearance of a given scene for a virtual observer. The scene being defined by its geometry, its materials and the characteristics (position, emission) of the light sources, computing the lighting amounts to solving the rendering equation1:

L(x, θ0, φ0) = Le(x, θ0, φ0) + ∫ ρbd(x, θ0, φ0, θ, φ) Li(x, θ, φ) cos θ dω    (2.1)

The radiance at a point x in the direction (θ0, φ0) is simply the sum of the radiance emitted at that point (Le) and the radiance reflected there. Several methods exist for solving this equation (ray tracing, radiosity, Monte Carlo, photon mapping); among them, the radiosity method2 solves equation 2.1 by simplifying the problem (assuming all surfaces are diffuse), then discretizing the resulting equation. This leads to a matrix equation:

B = E + MB    (2.2)

whose complexity is O(n²) with respect to the discretization of the scene. Since the surfaces of the scene are diffuse, the result is independent of the viewpoint, which then makes it possible to move through the scene in real time (see figures 2.1 and 2.2).

1. James T. K. « The Rendering Equation ». Computer Graphics (ACM SIGGRAPH ’86 Proceedings), 20(4):143–
150, août 1986.
2. Cindy M. G, Kenneth E. T, Donald P. G et Bennett B. « Modelling the Interaction of
Light Between Diffuse Surfaces ». Computer Graphics (ACM SIGGRAPH ’84 Proceedings), 18(3):212–222, juillet
1984.

11
12 CHAPITRE 2. MODÉLISATION MULTI-ÉCHELLES DE L’ÉCLAIRAGE

(a) Tholos, Delphi (b) Place Stanislas, Nancy

Figure 2.1 – Images of architectural models computed with the radiosity method. The models were provided by the École d'Architecture de Nancy.

(a) (b)

Figure 2.2 – Images of the Soda Hall model computed with the radiosity method. The Soda Hall model was provided by Carlo Sequin.

The matrix M contains the light-energy transfer coefficients between the facets of the scene discretization. Equation 2.2 is solved iteratively:

B = (I − M)^(-1) E = Σ_{k=0}^∞ M^k E    (2.3)

The size of the matrix M is n², where n is the number of facets produced by the discretization of the scene. This size makes the matrix difficult to store in memory for complex scenes. One therefore uses a partial solution, which at each step relies on computing either one row or one column of the matrix (see figure 2.3):
– the row-based computation amounts to updating the radiosity of one facet as a function of the radiosity of all the other facets of the scene (gathering);
– the column-based computation amounts to updating the radiosity of all the facets of the scene as a function of the radiosity of a single facet (shooting).

[Figure 2.3: matrix diagrams. (a) Shooting (b) Gathering]

Figure 2.3 – Gathering and shooting: using a row or a column of the transport matrix M.

In classical radiosity, shooting produces usable images faster: the direct lighting is obtained in the very first steps, and the most significant part of the indirect lighting follows. On the other hand, the time needed to obtain the converged solution is identical for both methods.
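The row/column distinction can be made concrete on a toy transport matrix. The sketch below is our illustration, not the thesis implementation: the 3-facet matrix values are invented, and both strategies solve the same system B = E + MB.

```python
import numpy as np

# Toy scene: 3 facets. M[i, j] is the fraction of facet j's radiosity
# that reaches facet i (made-up values, spectral radius well below 1).
M = np.array([[0.0, 0.3, 0.2],
              [0.3, 0.0, 0.1],
              [0.2, 0.1, 0.0]])
E = np.array([1.0, 0.0, 0.0])    # only facet 0 emits light

def gather(M, E, iterations=60):
    """Gathering: each sweep updates every facet from all the others,
    i.e. one row of M per facet (Jacobi iteration on B = E + MB)."""
    B = E.copy()
    for _ in range(iterations):
        B = E + M @ B
    return B

def shoot(M, E, iterations=200):
    """Shooting: repeatedly pick the facet with the most unshot energy
    and distribute it along the corresponding column of M."""
    B = E.copy()
    unshot = E.copy()
    for _ in range(iterations):
        j = int(np.argmax(np.abs(unshot)))
        delta = M[:, j] * unshot[j]   # column j: facet j lights everyone
        B += delta
        unshot += delta
        unshot[j] = 0.0               # facet j's energy has been shot
    return B

B_exact = np.linalg.solve(np.eye(3) - M, E)   # B = (I - M)^(-1) E
assert np.allclose(gather(M, E), B_exact)
assert np.allclose(shoot(M, E), B_exact)
```

Both strategies converge to the Neumann-series solution of equation 2.3; shooting simply reorders the work so that the brightest contributions are propagated first.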

2.2 Analysis of the hierarchical radiosity algorithm


The hierarchical radiosity algorithm3, 4 uses a multi-scale representation of the radiosity on each surface of the scene, which reduces the complexity of the algorithm to O(n log n). On each object in the scene, a hierarchy of facets is built, representing the radiosity at different levels of precision. Interactions, or energy transfers, between objects can then be established between different levels of the hierarchies; thus, for two objects far from each other (and therefore exchanging little energy), an interaction is placed between the top-level representations of the two hierarchies. Conversely, for two nearby objects exchanging a lot of energy, interactions are established between precise representations of the lighting on each object.

One can say that the method amounts to storing the transport operator M in block form.
The hierarchical radiosity algorithm computes the light-energy transfer iteratively, in several steps:
– refine the representation of the transport operator M, to account for the radiosity currently in the scene. This step uses a refinement oracle to identify the interactions that are not modeled with sufficient precision;
– use the operator M to transfer energy between the objects of the scene;

3. Pat Hanrahan, David Salzman and Larry Aupperle. « A Rapid Hierarchical Radiosity Algorithm ». Computer Graphics (ACM SIGGRAPH '91 Proceedings), 25(4):197–206, July 1991.
4. Pat Hanrahan and David Salzman. « A Rapid Hierarchical Radiosity Algorithm for Unoccluded Environments ». In Photorealism in Computer Graphics (Eurographics Workshop on Photosimulation, Realism and Physics in Computer Graphics), p. 151–171, June 1992.

– propagate the energy received at the various hierarchical levels of each object through the whole hierarchy (push-pull).
The hierarchical radiosity method changes the radiosity algorithm in depth: at every step we have a complete representation of the transport operator, so a radiosity transfer through the whole scene can be computed in a single step.
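The push-pull step can be sketched on a minimal toy hierarchy. This is our own simplified illustration (one patch split into two halves, constant basis, area-weighted averaging); the actual method operates on quadtrees of 2D facets:

```python
class Node:
    """One node of the multi-scale representation of radiosity on a patch."""
    def __init__(self, area, children=()):
        self.area = area
        self.children = list(children)
        self.gathered = 0.0   # irradiance received through links at this level
        self.B = 0.0          # radiosity, made consistent by push-pull

def push_pull(node, down=0.0):
    """Push energy gathered at coarse levels down to the leaves, then pull
    the leaf values back up as area-weighted averages."""
    total = down + node.gathered
    if not node.children:
        node.B = total
    else:
        node.B = sum(push_pull(child, total) * child.area
                     for child in node.children) / node.area
    node.gathered = 0.0       # consumed: ready for the next solver iteration
    return node.B

# A patch of area 1 split into two halves; links deposited energy at two
# different levels of the hierarchy during the transfer step.
left, right = Node(0.5), Node(0.5)
root = Node(1.0, [left, right])
root.gathered = 0.2           # coarse link: lights the whole patch
left.gathered = 0.4           # fine link: lights only the left half

push_pull(root)
# After push-pull: left.B == 0.6, right.B == 0.2, root.B == 0.4
```

The push makes energy received by an ancestor visible to its descendants; the pull keeps every coarse level equal to the average of its children, so all levels describe the same radiosity function.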
In a propagation step, gathering is easier to use than shooting: before a surface can send its energy into the scene, a push-pull step is required. Shooting thus imposes too many push-pull steps, which makes it less efficient than gathering. Experimentally, we also found that shooting is more sensitive to imprecision in the computation of the transfer coefficients, and diverged with the imprecise methods used at the time to compute those coefficients.

We carried out an in-depth study of the hierarchical radiosity algorithm [16] (see p. 26). This study showed two important things:
– most of the computation time is spent in visibility tests (more than 80% in a complex scene); the other steps of the algorithm have a relatively modest influence;
– the method builds a hierarchy on each of the top-level objects of the scene. A link is first established between each pair of these hierarchies (initial linking), and these links are then refined. The method thus has reduced complexity with respect to the number of facets generated during refinement, but remains quadratic in the number of top-level objects in the scene.
Building on this study, we proposed a lazy linking method, which halves the computation time and produces the first images faster. We also proposed a new refinement oracle, which avoids useless refinements and gains another factor of 2 in computation time.

2.3 Efficiency of the hierarchy


The lazy linking method we proposed only reduces the influence of the step that creates links between top-level surfaces; it does not remove this step entirely, and its complexity remains quadratic. Several methods have been proposed to reduce the complexity further, such as clustering5 or face clustering6.

One difficulty comes from the push-pull step, which depends on the relative shapes of parent and child nodes in the hierarchy. The push-pull computations are simpler if the hierarchy is regular, that is, if at every level of the hierarchy the shapes and positions of the child nodes relative to their ancestors are identical. A regular hierarchy requires the top-level surfaces to be either triangles or parallelograms.

However, in a large fraction of the scenes used for lighting simulation, the initial surfaces do not have a regular shape (see figure 2.4). Before starting the radiosity computation, they must therefore be triangulated, but this triangulation artificially increases the number of top-level surfaces in the scene. Moreover, the triangles resulting from the triangulation harm the computation: they often have very elongated shapes, which reduces the quality of the simulation (see figure 2.5) and the benefit of the hierarchy. Finally, discontinuities can appear between the triangles.

5. Francois S. « Clustering and Volume Scattering for Hierarchical Radiosity Calculations ». Dans Rendering
Techniques ’95 (Eurographics Workshop on Rendering), p. 105–117, juin 1994.
6. A. W, P. H et M. G. « Face Cluster Radiosity ». Dans Rendering Techniques ’99 (Eurogra-
phics Workshop on Rendering), p. 293–304, 1999.

(a) Detail of figure 2.1(b) (b) Triangulated: 32 triangles (c) Untriangulated: a single surface

Figure 2.4 – The models contain complex planar surfaces, which get triangulated excessively.

(a) Triangulated (b) Untriangulated (our algorithm)

Figure 2.5 – Radiosity computation on arbitrary planar surfaces.



Figure 2.6 – The radiosity function is defined over an extended planar domain, simpler than the original surface.

In collaboration with the PhD student François Cuny (co-supervised by Jean-Claude Paul and Laurent Alonso), we proposed a method that allows hierarchical radiosity computations on planar surfaces of arbitrary shape, including concave surfaces and surfaces with holes [14] (see p. 42). The method relies on extending the original planar surface to a simpler surface that contains it, on which a regular hierarchy can be built (see figure 2.6). The radiosity must be defined continuously over the extended surface, while being equal to the radiosity on the original surface.
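The regular hierarchy over an enclosing domain can be sketched with a quadtree on the bounding square of an L-shaped surface. This is an illustrative toy of ours (the shape, the crude overlap test and the fixed depth are arbitrary choices); how the radiosity function itself is extended continuously is described in the paper:

```python
def in_L_shape(x, y):
    """An L-shaped planar surface inside the unit square (illustrative)."""
    return 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 and not (x > 0.5 and y > 0.5)

def cell_overlaps(cell):
    """Crude overlap test: sample the cell's corners and center."""
    x, y, size = cell
    points = [(x, y), (x + size, y), (x, y + size),
              (x + size, y + size), (x + size / 2, y + size / 2)]
    return any(in_L_shape(px, py) for px, py in points)

def subdivide(cell, depth):
    """Regular quadtree on the enclosing square: every child has the same
    shape and relative position, whatever the original surface looks like."""
    if depth == 0 or not cell_overlaps(cell):
        return [cell]              # stop refining away from the surface
    x, y, size = cell
    half = size / 2.0
    leaves = []
    for dx in (0.0, half):
        for dy in (0.0, half):
            leaves.extend(subdivide((x + dx, y + dy, half), depth - 1))
    return leaves

# The hierarchy lives on the simple enclosing square, not on a
# triangulation of the L-shape; cells entirely outside it stay coarse.
leaves = subdivide((0.0, 0.0, 1.0), depth=3)
```

Because all children have identical shapes, the push-pull coefficients are the same everywhere in the hierarchy, which is exactly what an arbitrary triangulation destroys.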
Our algorithm naturally reduces the number of initial surfaces in the scene, and thereby the memory cost of the algorithm and the cost of the initial linking step; our experiments confirmed both points. More surprisingly, we found that our algorithm also speeds up the convergence of the hierarchical radiosity algorithm (see figure 2.7): the percentage of energy left to propagate in the scene decreases faster. This result shows that the regular hierarchy we use approximates the radiosity on the surface more effectively than the hierarchy based on the triangulated surfaces.

These results were later extended to parametric curved surfaces7, then to parameterized polygonal meshes8.

2.4 Higher-order wavelets


The hierarchical radiosity method relies on a multi-scale representation of lighting. This representation was later formalized and extended to arbitrary wavelet bases9. In theory, this research opened the way to basis functions of higher order, for example piecewise-linear or piecewise-quadratic instead of piecewise-constant.

In practice, using higher-order wavelets changes the balance of the algorithm.

7. Laurent A, François C, Sylvain P, Jean-Claude P, Sylvain L et Eric W. « The Virtual
Mesh: A Geometric Abstraction for Efficiently Computing Radiosity ». ACM Transactions on Graphics, 20(3):169–
201, juillet 2001.
8. Gregory L, Bruno L, Laurent A et Jean-Claude P. « Master-Element Vector Irradiance for Large
Tesselated Models ». Dans Third International Conference on Computer Graphics and Interactive Techniques in
Australasia and South East Asia (GRAPHITE ’05), p. 315–322, 463, novembre 2005.
9. Steven J. G, Peter S, Michael F. C et Pat H. « Wavelet Radiosity ». Dans ACM SIG-
GRAPH ’93, p. 221–230, 1993.

[Figure 2.7: convergence plots comparing the "Original" and "Tesselated" versions of each scene. (a) Place Stanislas (figure 2.1(b)) (b) Temple (figure 2.1(a)) (c) Soda Hall (figure 2.2)]

Figure 2.7 – Convergence rate (energy left to propagate over initial energy) as a function of computation time (in seconds).

The storage cost of the coefficients attached to wavelets of order n is (n + 1)^k, where k is the dimension of the sampled function. Thus, the storage cost of each facet of the hierarchy grows as (n + 1)², and the storage cost of each interaction grows as (n + 1)⁴. Using wavelets of order 2 (piecewise quadratic) therefore increases the storage cost of each interaction by two orders of magnitude.
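These costs follow directly from the (n + 1)^k rule; the snippet below simply tabulates the figures quoted in the text (nothing more than an arithmetic check):

```python
def coeffs(order, dimension):
    """Coefficients stored for wavelets of order n over a k-dimensional
    domain: (n + 1) ** k, the rule quoted in the text."""
    return (order + 1) ** dimension

for name, order in [("Haar (piecewise constant)", 0),
                    ("order 1 (piecewise linear)", 1),
                    ("order 2 (piecewise quadratic)", 2)]:
    print(name + ":",
          coeffs(order, 2), "coefficients per facet,",   # radiosity on a 2D patch
          coeffs(order, 4), "per interaction")           # a link couples two patches
# Haar: 1 and 1; order 1: 4 and 16; order 2: 9 and 81.
# The factor 81 is the "two orders of magnitude" for quadratic interactions.
```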
For this reason, an experimental study10 concluded that higher-order wavelets brought no significant improvement over Haar wavelets (piecewise constant). That study was based on a naive implementation of the wavelet radiosity algorithm, simply swapping the basis function. With the PhD student François Cuny (co-supervised by Jean-Claude Paul and Laurent Alonso), we showed that each step of the algorithm had to be adapted to the specific properties introduced by higher-order basis functions [8] (p. 56). It was also necessary to think through the consequences of each adaptation on the other steps of the algorithm; for instance, the decision not to store the links influenced the choice of the solution algorithm.

10. Andrew Willmott and Paul Heckbert. « An Empirical Comparison of Progressive and Wavelet Radiosity ». In Rendering Techniques '97 (Eurographics Workshop on Rendering), p. 175–186, 1997.

2.4.1 Improvements to the algorithm


We proposed the following modifications to the wavelet radiosity algorithm:
– Since the cost of storing the interactions is prohibitive (16 or 81 times higher than with piecewise-constant functions), we abandoned the principle of storing links. It had become cheaper to recompute the links at each iteration than to keep them in memory.
– A consequence of this choice is that shooting became attractive again for propagating the energy. With shooting, the distribution of the energy propagated through the scene changes at every step, which means the optimal distribution of links changes at every iteration. Recomputing them is therefore not a penalty; quite the opposite.
– We increased the precision of the transfer-coefficient computation at each iteration. In particular, we multiplied the visibility tests for interactions between partially visible facets. We found that the extra computation time for these additional tests was more than compensated by the increase in the quality of the result.
– We also used a refinement oracle11 that takes the linear or quadratic nature of the lighting function being computed into account to guide refinement. An interaction is refined only if the lighting function on the receiver deviates significantly from the representation provided by the basis functions.
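An oracle of this kind can be sketched in one dimension: fit the received lighting with a polynomial of the basis order, and refine only when the residual is significant. This is a simplified illustration with names and thresholds of our own choosing, not the oracle of Bekaert and Willems:

```python
import numpy as np

def needs_refinement(sample, degree=2, eps=5e-3, n=16):
    """Refine a link only if the lighting received on the patch deviates
    from what degree-`degree` basis functions can represent (1D sketch)."""
    x = np.linspace(0.0, 1.0, n)
    values = np.array([sample(t) for t in x])
    fit = np.polynomial.Polynomial.fit(x, values, degree)   # least-squares fit
    return bool(np.max(np.abs(values - fit(x))) > eps)

# Smooth quadratic lighting: representable by quadratic bases, no refinement.
assert not needs_refinement(lambda t: 0.3 + 0.5 * t - 0.2 * t * t)
# A shadow boundary (discontinuity): the oracle asks for refinement.
assert needs_refinement(lambda t: 1.0 if t < 0.4 else 0.0)
```

With higher-order bases, smooth regions are captured at a coarse level, so the oracle concentrates subdivision where the lighting genuinely varies, such as shadow boundaries.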
With these modifications, we showed that higher-order wavelets yield a better-quality result, faster, than Haar wavelets. Each node of the hierarchy has a higher storage cost (4 or 9 times higher, depending on the degree of the wavelets used), but the reduction in the number of nodes in the hierarchy makes up for this overhead.
Figure 2.8 shows the memory cost of the lighting representation, as a function of the simulation error, for two test scenes. For a low-quality simulation (error ≈ 0.1), Haar wavelets provide a more compact approximation in memory, but as the quality of the simulation increases, higher-order wavelets gain the advantage. For each wavelet basis studied, there is a quality level at which it beats the other bases.

The quality level corresponding to a visually acceptable simulation corresponds (experimentally) to an error ≈ 5 × 10⁻³, that is, to the intersection of the M2 and M3 curves.
We also plotted the simulation error as a function of computation time, for the three wavelet bases (see figure 2.9). The error decreases with computation time for all three bases. For each wavelet basis, there is a quality level at which it delivers a result faster than the other bases.
Analysing these results, we see that when little refinement is performed relative to the
initial model, constant wavelets have the advantage, which is expected: their per-object cost
is lower than that of the other bases. Conversely, the more we refine, the more the
higher-order basis functions allow a more precise and more compact representation of
radiosity. On this point, our work naturally complements our work on scene simplification
(see section 2.3 and [14]), since refinement is less effective on a model whose surfaces are
triangulated.
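This trade-off can be reproduced with a small 1D experiment (an illustration under stated assumptions, not the thesis code: the illumination profile and the coefficient budget are invented). With the same total number of scalar coefficients, piecewise-quadratic elements (3 coefficients per 1D cell, M3-like) approximate a smooth profile far better than piecewise-constant ones (1 coefficient per cell, Haar), even though they use fewer, larger cells:

```python
import numpy as np

def l2_error(f, n_cells, degree, samples=200):
    """RMS error of a per-cell least-squares polynomial fit of f on [0, 1]."""
    err2 = 0.0
    for i in range(n_cells):
        x = np.linspace(i / n_cells, (i + 1) / n_cells, samples)
        residual = np.polyval(np.polyfit(x, f(x), degree), x) - f(x)
        err2 += np.mean(residual ** 2) / n_cells
    return float(np.sqrt(err2))

f = lambda x: 1.0 / (1.0 + 4.0 * (x - 0.3) ** 2)  # smooth, shadow-free profile

# Same budget of 36 scalar coefficients for both representations:
haar_err = l2_error(f, n_cells=36, degree=0)      # 36 cells x 1 coefficient
quad_err = l2_error(f, n_cells=12, degree=2)      # 12 cells x 3 coefficients
print(f"Haar: {haar_err:.1e}   quadratic: {quad_err:.1e}")
```

Near a hard shadow edge the comparison flips, which is exactly why the discontinuity handling of the next section is needed.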
As an unexpected but interesting consequence of our work, we found that our implementation
directly yields a continuous radiosity function (see figure 2.10), without any need for
post-processing (unlike all previous implementations).

11. Philippe Bekaert and Yves Willems. « Error Control for Radiosity ». In Rendering Techniques ’96 (Eurographics
Workshop on Rendering), p. 153–164, 1996.

[Figure: memory cost (KB) vs. global error for the Haar, M2 and M3 bases; (a) Dining room, (b) Classroom]

Figure 2.8 – Memory cost (in KB) for the different basis functions, as a function of the simulation error.

[Figure: simulation error vs. CPU time (s) for the Haar, M2 and M3 bases; (a) Dining room, (b) Classroom]

Figure 2.9 – Simulation error as a function of computation time (in s).

(a) Haar (b) M2 (c) M3

Figure 2.10 – Comparison between Haar (constant), M2 (linear) and M3 (quadratic) wavelets, for the same computation time.

Figure 2.11 – A posteriori smoothing of Haar wavelets. The quality can be compared with
figure 2.10.

The quality of the function we obtain with linear or quadratic wavelets, without
post-processing, is higher than that obtained with piecewise-constant wavelets followed by
post-processing (see figure 2.11).

2.4.2 Discontinuity meshing and higher-order wavelets


In lighting simulation, discontinuities of the illumination function or of its derivative
occur frequently. Discontinuities of the function itself arise, for instance, at contacts
between objects, or at the hard shadows cast by point light sources. Discontinuities of the
derivative arise at the umbra and penumbra boundaries cast by area light sources.
A regular mesh, such as the one produced by wavelet radiosity, obviously cannot model these
discontinuities. At each discontinuity of the radiosity, the refinement oracle is driven to
subdivide indefinitely, until a threshold on the facet size is reached. With M2 and M3
wavelets, each mesh element costs more than with constant wavelets, so the refinement spent
on discontinuities cancels out the gains obtained elsewhere.
Several works have shown that, for classical radiosity, a mesh adapted to the discontinuities
provides the best results in terms of quality12, 13, 14. These works were extended to
hierarchical radiosity15, 16, 17. All these approaches start by computing the full set of
discontinuities with geometric methods, then triangulate this set, and finally build the
hierarchical mesh on top of the triangulation.
This method has several drawbacks: the number of derivative discontinuities is large, so the
generated mesh is very complex, and this complexity slows down the simulation.

12. Daniel Lischinski, Filippo Tampieri and Donald P. Greenberg. « Discontinuity Meshing for Accurate Radiosity ».
IEEE Computer Graphics and Applications, 12(6):25–39, November 1992.
13. Paul Heckbert. « Discontinuity Meshing for Radiosity ». In Eurographics Workshop on Rendering, p. 203–226,
May 1992.
14. George Drettakis and Eugene Fiume. « A Fast Shadow Algorithm for Area Light Sources Using Backprojection ».
In ACM SIGGRAPH ’94, p. 223–230, 1994.
15. Daniel Lischinski, Filippo Tampieri and Donald P. Greenberg. « Combining Hierarchical Radiosity and Discontinuity
Meshing ». In ACM SIGGRAPH ’93, p. 199–208, 1993.
16. George Drettakis and François Sillion. « Accurate Visibility and Meshing Calculations for Hierarchical Radiosity ».
In Rendering Techniques ’96 (Eurographics Workshop on Rendering), p. 269–278, 1996.
17. Frédo Durand, George Drettakis and Claude Puech. « Fast and Accurate Hierarchical Radiosity Using Global
Visibility ». ACM Transactions on Graphics, 18(2):128–170, 1999.

Moreover, some of the computed discontinuities have no visible effect on the simulation14.
Finally, the generated mesh is incompatible with the wavelet approach, which requires a
regular hierarchy.

(a) (b) (c)

Figure 2.12 – A mesh element cut by a discontinuity is (a) split into two elements. For each
element, we identify its bounding parallelogram (b). On each parallelogram, we run a classical
wavelet radiosity algorithm (c).

We developed an algorithm that combines higher-order wavelets with discontinuity
meshing [13] (see p. 68). Our algorithm uses higher-order wavelets wherever possible, and only
introduces discontinuities into the mesh when the refinement oracle establishes that they will
reduce its complexity. We found that only a small number of discontinuities need to be
inserted into the mesh (see figure 2.13). The remaining discontinuities are correctly
approximated by the linear or quadratic basis functions.
Our work builds on previous work on radiosity over arbitrary planar surfaces [14], with
several changes to the approach:
– Both sides of the discontinuity play a role in the lighting simulation. Two special mesh
elements must therefore be created for each discontinuity introduced (see figure 2.12).
– On the discontinuity, specific push-pull coefficients must be computed, which account for
the effective fraction of the surface lying on each side of the discontinuity. On the rest
of the hierarchy, above and below the discontinuity, regular subdivision is used, with all
its advantages.
– Several discontinuities may intersect (shadow boundaries cast by several light sources, or
umbra and penumbra boundaries that merge where the blocker touches the receiver). The
discontinuities must then be processed one after the other, and some special cases must be
handled.
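The modified pull step can be sketched as follows (a minimal illustration: the radiosity values and the 70/30 split are invented, only the weighting principle comes from the text). On a regular node the children are averaged with equal area weights; on a node cut by a discontinuity the weights become the effective fraction of the parent's area on each side of the cut, while the push step is unchanged:

```python
def pull(child_radiosities, area_fractions):
    """Pull step: the parent's radiosity is the area-weighted average of
    its children; the fractions describe how the parent's area is split."""
    assert abs(sum(area_fractions) - 1.0) < 1e-9
    return sum(b * w for b, w in zip(child_radiosities, area_fractions))

def push(parent_coeff, child_radiosities):
    """Push step: a parent coefficient is added to every child as-is,
    whether or not the node is cut by a discontinuity."""
    return [b + parent_coeff for b in child_radiosities]

# Regular quadtree node: four equal quadrants, two lit and two in shadow.
b_regular = pull([1.0, 1.0, 0.0, 0.0], [0.25, 0.25, 0.25, 0.25])  # -> 0.5

# Node cut by a shadow discontinuity: 70% of its area lit, 30% in shadow.
b_cut = pull([1.0, 0.0], [0.7, 0.3])                              # -> 0.7
print(b_regular, b_cut)
```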
Our approach yields a very compact representation of radiosity, even in the presence of
discontinuities (see [13] and figure 2.13). In lit areas, M3 wavelets allow very large mesh
elements while providing a continuous representation, whereas on shadow boundaries, inserting
the discontinuities stops refinement quickly. In penumbra areas, we showed that it is not
always necessary to insert the discontinuities: if the transition is smooth enough, M3
wavelets can model it correctly.

2.4.3 Space-time radiosity


With PhD student Cyrille Damez (advised by François Sillion), we also investigated computing
radiosity in a 4-dimensional space (3 spatial dimensions and one temporal dimension). Given a
known animation, we want to compute the lighting for the whole animation, exploiting not only
the spatial coherence but also the

(a) (b)

Figure 2.13 – Radiosity with quadratic wavelets (M3) and discontinuity meshing, in the
presence of an area light source. Part of the armchair's penumbra is modelled with regular
mesh elements, without any need to insert discontinuities. Note also that very large elements
suffice for the lit area at the foot of the armchair.

temporal coherence of the lighting, through a hierarchical decomposition that also extends
over the temporal dimension.
The preliminary work18 suffered from severe temporal discontinuities. Such discontinuities
also exist in the spatial dimensions in classical hierarchical radiosity, where they are
partially corrected by post-processing. Temporal discontinuities are both much more
disturbing, because they affect the whole scene, and harder to treat.
Our work on higher-order wavelets had shown that these wavelets directly generate a continuous
solution in the spatial domain. We showed that by introducing higher-order wavelets in the
temporal domain [6] (p. 82), the temporal discontinuities could also be eliminated.

2.5 Scene structure


We have had repeated experience working with partners outside the research community: the
Nancy School of Architecture, the OPTIS company, the start-ups Visual Information Systems
(Cape Town) and VSP-Technologies (Nancy), the Fraunhofer-Institut für Graphische
Datenverarbeitung (Darmstadt), and the partners of the European projects ARCADE and SIMULGEN.
These partners provided us with scenes on which we could test our algorithms (see, for
example, figure 2.1). Each time, the scenes were supplied as an unordered set of polygons,
without connectivity information or internal structure.
Discussions with our partners taught us that the structure of the scene is usually present
during the modelling process, but is lost in the many file transfers. This loss can occur even
between different departments of the same company.

18. Cyrille Damez and François Sillion. « Space-Time Hierarchical Radiosity ». In Rendering Techniques ’99 (Eurographics
Workshop on Rendering), p. 235–246, 1999.

On the other hand, our own work and that of other researchers showed that knowledge of the
underlying scene structure enables better-quality lighting simulation, for instance by
replacing a set of disconnected triangles with a single planar polygon [14], by replacing a
set of polygons with a parametric surface7, or by exploiting the scene structure to build an
efficient hierarchy for ray tracing, clustering or instantiation.
We launched the SHOW project, in collaboration with François Sillion and Cyril Soler, and with
three other INRIA projects: ALICE, REVES and IPARLA. The goal of the SHOW project is to
reconstruct the structure of a large set of unordered data. In particular, PhD student
Aurélien Martinet, funded by the SHOW project and whom I co-advise with Cyril Soler and
François Sillion, works on extracting a scene structure from a set of unordered polygons.
Our first results [4] (see p. 96) extracted sets of edge-connected triangles, which form basic
building blocks. We also managed to extract the list of symmetries of each building block, and
to identify identical blocks, even when their tessellations differ. Later work19 enables fast
identification of identical objects in the scene, where objects are formed by assembling
building blocks.

2.6 Discussion
In this chapter, we presented our work on wavelet radiosity. Our contributions are
improvements to the algorithm: extending the hierarchy over the whole scene, adapting the
method to higher-order wavelets, and combining higher-order wavelets with discontinuity
meshing.
These improvements are important, indeed essential, for any practical application of the
radiosity method. Combined, they allow an optimal, compact representation of diffuse lighting
in a scene. But this work has its own limitations:
– only diffuse lighting is computed, whereas complex reflectance functions play an important
role in the realism of the scene;
– we lack a priori information about the behaviour of the illumination function, which
prevents adapting the sampling to the function's variations;
– a very large share of the computation time is devoted to computing the umbra and penumbra
boundaries of direct lighting, although these boundaries play only a minor role in the
computation of indirect lighting.
These points are partially related. Introducing reflectance functions that are neither diffuse
nor specular requires sampling both the spatial and the angular domains, which raises the
dimensionality of the problem and the memory cost of the algorithm. A priori knowledge of the
variations of the illumination function would make it possible to adapt the (spatial and
angular) sampling to these variations, and thus to control the memory cost of the algorithm.
The next chapter is devoted to extracting the properties of the illumination function.
The third point is shared with other lighting simulation algorithms: a large share of the
computation time goes to effects that are important for the visual appearance of the computed
image (shadow boundaries, specular reflections) but irrelevant to

7. Laurent Alonso, François Cuny, Sylvain Petitjean, Jean-Claude Paul, Sylvain Lazard and Eric Wies. « The Virtual
Mesh: A Geometric Abstraction for Efficiently Computing Radiosity ». ACM Transactions on Graphics, 20(3):169–201,
July 2001.
19. Aurélien Martinet. « Structuration automatique de scènes 3D ». PhD thesis, Université Joseph Fourier, 2006.

indirect lighting computations. This disproportion forces us to think about what we simulate,
and to study ways of computing certain effects separately. This is the subject of chapter 4.

2.7 Articles
2.7.1 List of articles
– An efficient progressive refinement strategy for hierarchical radiosity (EGWR ’94)
– Wavelet Radiosity on Arbitrary Planar Surfaces (EGWR 2000)
– A novel approach makes higher order wavelets really efficient for radiosity (EG 2000)
– Combining higher-order wavelets and discontinuity meshing: a compact representation for
radiosity (EGSR 2004)
– Space-time hierarchical radiosity with clustering and higher-order wavelets (CGF 2004)
– Accurate detection of symmetries in 3D shapes (TOG 2006)

2.7.2 An efficient progressive refinement strategy for hierarchical radiosity (EGWR ’94)
Authors: Nicolas Holzschuch, François Sillion and George Drettakis
Conference: 5th Eurographics Workshop on Rendering, Darmstadt, Germany.
Date: June 1994
An Efficient Progressive Refinement Strategy for
Hierarchical Radiosity
Nicolas Holzschuch, François Sillion, George Drettakis

iMAGIS / IMAG *

A detailed study of the performance of hierarchical radiosity is presented,
which confirms that visibility computation is the most expensive operation.
Based on the analysis of the algorithm's behavior, two improvements are sug-
gested. Lazy evaluation of the top-level links suppresses most of the initial linking
cost, and is consistent with a progressive refinement strategy. In addition, the
reduction of the number of links for mutually visible areas is made possible by
the use of an improved subdivision criterion. Results show that initial linking
can be avoided and the number of links significantly reduced without noticeable
image degradation, making useful images available more quickly.

1 Introduction
The radiosity method for the simulation of energy exchanges has been used to
produce some of the most realistic synthetic images to date. In particular, its
ability to render global illumination effects makes it the technique of choice for
simulating the illumination of indoor spaces. Since it is based on the subdivision
of surfaces using a mesh and on the calculation of the energy transfers between
mesh element pairs, the basic radiosity method is inherently a costly algorithm,
requiring a quadratic number of form factors to be computed.
Recent research has focused on reducing the complexity of the radiosity simu-
lation process. Progressive refinement has been proposed as a possible avenue [1],
whereby form factors are only computed when needed to evaluate the energy
transfers from a given surface, and surfaces are processed in order of importance
with respect to the overall balance of energy. The most significant advance in
recent years was probably the introduction of hierarchical algorithms, which
attempt to establish energy transfers between mesh elements of varying size,
thus reducing the subdivision of surfaces and the total number of form factors
computed [4, 5].
Since hierarchical algorithms proceed in a top-down manner, by limiting the
subdivision of input surfaces to what is necessary, they first have to establish a
number of top-level links between input surfaces in an "initial linking" stage. This
* iMAGIS is a joint research project of CNRS/INRIA/UJF/INPG. Postal address:
B.P. 53, F-38041 Grenoble Cedex 9, France. E-mail: [email protected].


results in a quadratic cost with respect to the number of input surfaces, which
seriously impairs the ability of hierarchical radiosity systems to deal with envi-
ronments of even moderate complexity. Thus a reformulation of the algorithm
is necessary in order to be able to simulate meaningful architectural spaces of
medium complexity (several thousands of input surfaces). To this end the ques-
tions that must be addressed are: What energy transfers are significant? When
must they be computed? How can their accuracy be controlled?
The goal of the research presented here is to extend the hierarchical algorithm
into a more progressive algorithm, by identifying the calculation components that
can be delayed or removed altogether, and establishing improved refinement
criteria to avoid unnecessary subdivision. Careful analysis of the performance
of the hierarchical algorithm on a variety of scenes shows that the visibility
calculations dominate the overall compute time.
Two main avenues are explored to reduce the cost of visibility calculations:
First, the cost of initial linking is reduced by delaying the creation of the links
between top-level surfaces until they are potentially significant. In a BF refine-
ment scheme this means for instance that no link is established between dark
surfaces. In addition, a form factor between surfaces can be so small that it is
not worth performing the visibility calculation.
Second, experimental studies show that subdivision is often too high. This is
a consequence of the assumption that the error on the form factor is of magni-
tude comparable to the form factor itself. In situations of full visibility between
surfaces, relatively large form factors can be computed with good accuracy.

2 Motivation
To study the behaviour of the hierarchical algorithm, we ran the original hierar-
chical program [5] on a set of five different interior environments, varying from
scenes with simple to moderate complexity (from 140 to 2355 input polygons).
The scenes we used were built in different research efforts and have markedly
different overall geometric properties. By using these different scenes, we hope
to identify general properties of interior environments. We thus hope to avoid,
or at least moderate, the pitfall of unjustified generalisation that often results
from the use of a single scene or a class of scenes with similar properties to char-
acterise algorithm behaviour. The scenes are: "Full office", which is the original
scene used in [5], "Dining room", which is "Scene 7" of the standard set of scenes
distributed for this workshop, "East room" and "West room", which are scenes
containing moderately complex desk and chair models, and finally "Hevea", a
model of a hevea tree in a room. Table 1 gives a short description and the num-
ber of polygons n for each scene. Please refer to colour section (Figs. 1, 3, 5 and
9-12) for a computed view of the test scenes.
2.1 Visibility

Table 1. Description of the five test scenes.

Name         n     Description
Full Office  170   The original office scene
Dining room  402   A table and four chairs
East room    1006  Two desks, six chairs
West room    1647  Four desks, ten chairs
Hevea        2355  An hevea tree with three light sources

The first important observation we make from running the algorithm on these
test scenes is the quantification of the cost of visibility calculations in the hier-
archical algorithm. As postulated in previous work [9, 6], visibility computation


represents a significant proportion of the overall computation time. In the graph
shown in Fig. 1, the percentages of the computation time spent in each of the
five main components of the hierarchical algorithm are presented. "Push-pull"
signifies the time spent traversing the quadtree structure associated with each
polygon, "Visibility" is the time spent performing visibility calculations, both
for the initial linking step and subsequent refinement, "Form Factors" is the
time spent performing the actual unoccluded form factor approximation calcu-
lation, "Refine" is the time spent updating the quadtree for refinement, and
finally "Gather" shows the cost of transferring energy across the links created
between quadtree nodes (interior or leaves) [5]. The graph in Fig. 1 shows that

[Figure: for each test scene, percentage of time spent in Push-Pull, Visibility, Form-Factors, Refine and Gather]

Fig. 1. Relative time spent in each procedure.

visibility calculations dominate the computation in the hierarchical algorithm2.

2. In its current version, the program uses a fixed number of rays to determine the
mutual visibility between two polygons. The cost of visibility computation is thus
roughly proportional to the number of rays used. In the statistics shown here, 16 rays
were used, a relatively small number. Using more rays would increase the percentage
of time devoted to visibility tests.

Of course this is relative to the algorithm used. A better approach, e.g. with
a pre-processing step, as in Teller et al. [9] could probably reduce the relative
importance of visibility.

2.2 Initial Linking

The second important observation concerns the actual cost of the initial linking
step. As mentioned in the introduction, this cost is at least quadratic in the
number of polygons, since each pair of input polygons has to be tested to de-
termine if a link should be established. Since this step is performed before any
transfer has commenced, it is a purely geometric visibility test, in this instance
implemented by ray-casting. The cost of this test for each polygon pair can vary
significantly, depending on the nature of the scene and the type of ray-casting
acceleration structure used. In all the examples described below, a BSP tree is
used to accelerate the ray-casting process.

Table 2. Total computation time and cost of initial linking (in seconds).

Name         n     Total Time  Initial Linking
Full office  170   301         5.13
Dining room  402   4824        436
East room    1006  587         194
West room    1647  1017        476
Hevea        2355  4253        1597

Table 2 presents timing results for all test scenes. The total computation
time is given for ten steps of the multigridding method described by Hanrahan
et al. [5]3.
These statistics show that the cost of initial linking grows significantly with
the number of polygons in the scene. The dependence on scene structure is also
evident, since the growth in computation time between East room and West room
is actually sublinear, while on the other hand the growth of the computation
time between West room and Hevea displays greater than cubic growth in the
number of input polygons. For all tests of more than a thousand polygons, it is
clear that the cost of initial linking becomes overwhelming. Invoking this cost at
the beginning of the illumination computation is particularly undesirable, since
a useful image cannot be displayed before its completion. Finally, we note that
recent improvements of the hierarchical radiosity method by Smits et al. [8] and
Lischinski et al. [6] have allowed significant savings in refinement time, but still

3. The k'th step of the multigridding method is typically implemented as the k'th
"bounce" of light: the first step performs all direct illumination, the second step all
secondary illumination, the third all tertiary illumination etc.

rely on the original initial linking stage. Thus initial linking tends to become the
most expensive step of the algorithm4.
Another interesting observation can be made concerning the number of top-
level links (links between input polygons) for which the product BF never be-
comes greater than the refinement threshold ε_refine over the course of the ten
refinement steps5. Figure 2 shows the percentage of such links during the first
ten iterations. A remarkably high percentage of these links never becomes a can-
didate for refinement: after 10 steps, between 65% and 95% of the links have
not been refined. A significant number of those links probably have very little
impact on the radiosity solution.
[Figure: percentage of unrefined links over iterations 1-11, for Hevea, West Room, East Room, Dining Room and Full Office]

Fig. 2. Percentage of links for which BF does not exceed ε_refine.

What can be concluded from the above discussion? First, if the initial linking
step can be eliminated at the beginning of the computation, a useful solution be-
comes available much more quickly, enhancing the utility of the hierarchical
method. Second, if the top-level links are only computed when they contribute
significantly to the solution, there is the potential for large computational savings
from eliminating a large number of visibility tests.

2.3 Unnecessary Refinement
The third important observation made when using the hierarchical algorithm
is that unnecessary subdivision is incurred, especially for areas which do not
include shadow boundaries. This observation is more difficult to quantify than
the previous two. To demonstrate the problem we present an image of the Dining
room scene, and the corresponding mesh (see colour section, Fig. 1 and 2). The
simulation parameters were ε_refine = 0.5 and MinArea = 0.001.
As can be seen in Fig. 2 in the colour section, the subdivision obtained
with these parameters is such that acceptable representation of the shadows

4. For example Lischinski et al. report a refinement time of 16 minutes for an initial
linking time of 2 hours and 16 minutes.
5. This is the ε used in the original formulation.


is achieved in the penumbral areas caused by the table and chairs. However, the
subdivision on the walls is far higher than necessary: the illumination over the
wall varies very smoothly and could thus be represented with a much coarser
mesh. In previous work it was already noted that radiance functions in regions
of full illumination can be accurately represented using a simple mesh based on
the structure of illumination [2].
If this unnecessary subdivision is avoided, significant gains can be achieved
since the total number of links will be reduced, saving memory, and since an
attendant reduction of visibility tests will result, saving computation time.

3 Lazy Evaluation of the Top-level Interactions

In this section a modification of the hierarchical algorithm is proposed, which
defers the creation of links between top-level surfaces until such a link is deemed
necessary. The basic idea is to avoid performing any computation that does not
have a sizable impact on the final solution, in order to concentrate on the most
important energy transfers. Thus it is similar to the rationale behind progressive
refinement algorithms. At the same time it remains consistent with the hierarchi-
cal refinement paradigm, whereby computation is only performed to the extent
required by the desired accuracy.
To accomplish this, a criterion must be defined to decide whether a pair of
surfaces should be linked. In our implementation we use a specific threshold
ε_link on the product BF. Top-level links are then created lazily, only once the
linking criterion is met during the course of the simulation.

3.1 Description of the Algorithm

In the original hierarchical radiosity algorithm, two polygons are either mutually
invisible, and thus do not interact, or at least partially visible from each other
and thus exchange energy. We introduce a second qualification, whereby a pair
of polygons is either classified or unclassified. A pair will be marked classified
when some information is available regarding its interaction. Initially, all pairs
of polygons are marked as unclassified.
At each iteration, all unclassified pairs of polygons are considered: First their
radiosity is compared to ε_link. If they are bright enough, we check (in constant
time) if they are at least partially facing each other. If not, the pair is marked
as classified and no link is created. If they are facing, we compute an approxi-
mation of their form factor, without a visibility test. If the product of the form
factors and the radiosity is still larger than ε_link, we mark the pair of polygons as
classified, and compute the visibility of the polygons. If they are visible, a link
is created using the form factors and visibility already computed. Thus a pair
of polygons can become classified either when a link is created, or when the two
polygons are determined to be invisible. Figure 3 shows a pseudo-code listing of
both the Initial Linking phase and the Main Loop in the original algorithm [5]
and Fig. 4 gives the equivalent listing in our algorithm.


Initial Linking
    for each pair of polygons (p, q)
        if p and q are facing each other
           and p and q are at least partially visible from each other
            link p and q

Main Loop
    for each polygon p
        for each link l leaving p
            if B·F > ε_refine
                refine l
        for each link l leaving p
            gather l

Fig. 3. The Original Algorithm

Initial Linking
    for each pair of polygons (p, q)
        record it as unclassified

Main Loop
    for each unclassified pair of polygons (p, q)
        if p and q are facing each other
            if Bp > ε_link or Bq > ε_link
                compute the unoccluded FF
                if B·F > ε_link
                    link p and q
                    record (p, q) as classified
        else record (p, q) as classified
    for each polygon p
        for each link l leaving p
            if B·F > ε_refine
                refine l
        for each link l leaving p
            gather l

Fig. 4. Pseudo-code listing for our algorithm
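The control flow of Fig. 4 can be transcribed almost line for line into runnable form. In this sketch the geometry is stubbed out (the facing, form-factor and visibility callbacks are placeholders, a single radiosity value per polygon stands in for B, and taking the larger of the two radiosities in the B·F test is our reading of the pseudo-code, not something the paper states):

```python
def lazy_linking_pass(state, B, facing, unoccluded_ff, visible, eps_link, links):
    """One iteration over the unclassified pairs, following Fig. 4."""
    for (p, q), status in list(state.items()):
        if status != "unclassified":
            continue
        if facing(p, q):
            if B[p] > eps_link or B[q] > eps_link:
                F = unoccluded_ff(p, q)          # no visibility test yet
                if max(B[p], B[q]) * F > eps_link:
                    state[(p, q)] = "classified"
                    if visible(p, q):            # visibility computed at last
                        links.append((p, q, F))
            # dim pairs stay unclassified: they may brighten in a later pass
        else:
            state[(p, q)] = "classified"         # facing is static: never a link

# Toy scene: a bright source, a floor it faces, and a wall facing away from it.
B = {"source": 10.0, "floor": 0.0, "wall": 0.0}
state = {("source", "floor"): "unclassified",
         ("source", "wall"): "unclassified",
         ("floor", "wall"): "unclassified"}
links = []
lazy_linking_pass(state, B,
                  facing=lambda p, q: (p, q) != ("source", "wall"),
                  unoccluded_ff=lambda p, q: 0.2,
                  visible=lambda p, q: True,
                  eps_link=0.5, links=links)
print(links)   # only the source-floor link is ever created
```

After one pass the source/floor pair is linked, the source/wall pair is classified without a link, and the dark floor/wall pair remains unclassified until a later iteration raises its radiosity.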

The threshold ε_link used to establish top-level interactions is not the same as
the threshold used for BF refinement, ε_refine. The main potential source of error
in our algorithm is an incomplete balance of energy. Since energy is transferred
across links, any polygon for which some top-level links have not been created
is retaining some energy, which is not propagated to the environment.
When recursive refinement is terminated because the product BF becomes
smaller than ε_refine, a link is always established, which carries some fraction of
this energy (the form factor estimate used in the comparison against ε_refine is an
upper bound of the form factor). On the other hand, when two top-level surfaces
are not linked because the product BF is smaller than ε_link, all the corresponding
energy is "lost". It is thus desirable to select a threshold such that ε_link < ε_refine.

Nicolas Holzschuch, François Sillion, George Drettakis

In the examples shown below we used ε_link = ε_refine/5.
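As an illustration (not the paper's implementation), the lazy linking pass can be sketched as follows. The function name, its callbacks and the absolute threshold values are placeholders; the facing-each-other test and the refine/gather steps are omitted:

```python
EPS_REFINE = 0.05          # hypothetical value; the paper fixes only the ratio
EPS_LINK = EPS_REFINE / 5  # eps_link = eps_refine / 5, as in the examples

def lazy_link_pass(pairs, unclassified, radiosity_of, unoccluded_ff, visible):
    """One pass over the still-unclassified pairs.  A top-level link is
    created only when the estimated transfer B*F exceeds eps_link; a pair
    becomes classified when it is linked or found mutually invisible, while
    a dim pair stays unclassified so it can be linked in a later iteration."""
    links = []
    for (p, q) in pairs:
        if (p, q) not in unclassified:
            continue
        b = max(radiosity_of(p), radiosity_of(q))
        if b > EPS_LINK:
            f = unoccluded_ff(p, q)
            if b * f > EPS_LINK:
                if visible(p, q):
                    links.append((p, q, f))
                unclassified.discard((p, q))  # linked or mutually invisible
    return links
```

A pair whose radiosities are still below ε_link is simply skipped, so the test is repeated at the next iteration once the environment has brightened.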


The classified/unclassified status of all pairs of input surfaces requires the storage of n(n−1)/2 bits of information. We are currently working on compression techniques to further reduce this cost⁶.
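As a sketch of this storage scheme (the triangular pair indexing is our illustration, not the paper's data structure), one classified bit per unordered pair can be packed into a flat byte array:

```python
def pair_index(p, q):
    """Triangular index of the unordered pair (p, q), p != q: each of the
    n(n-1)/2 pairs gets a distinct slot in a flat bit array."""
    if p < q:
        p, q = q, p
    return p * (p - 1) // 2 + q

def make_pair_bits(n):
    """Allocate n(n-1)/2 bits, rounded up to whole bytes."""
    n_bits = n * (n - 1) // 2
    return bytearray((n_bits + 7) // 8)

def set_classified(bits, p, q):
    i = pair_index(p, q)
    bits[i >> 3] |= 1 << (i & 7)

def is_classified(bits, p, q):
    i = pair_index(p, q)
    return bool(bits[i >> 3] & (1 << (i & 7)))
```

For a thousand surfaces this allocates about 62 kB, matching the footnote's figure.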

3.2 Energy Balance

Since radiosity is mainly used for its ability to model light interreflection, it is important to maintain energy consistency when modifying the algorithm. An issue raised by the lazy linking strategy is that "missing" links, those that have not been created because they were deemed insignificant, do not participate in energy transfers. Thus each gather step only propagates some of the energy radiated by surfaces.
If the corresponding energy is simply ignored, the main result is that the overall level of illumination is reduced. However a more disturbing effect can result for surfaces that have very few (or none) of their links actually established: these surfaces will appear very dark because they will receive energy only from the few surfaces that are linked with them.
The solution we advocate in closed scenes is the use of an ambient term similar to the one proposed for progressive refinement radiosity [1]. However the distribution of this ambient term to surfaces must be based on the estimated fraction of their interaction with the world that is missing from the current set of links. The sum of the form factors associated with all links leaving a surface gives an estimate of the fraction of this surface's interactions that is actually represented. Thus, in a closed scene, its complement to one represents the missing links. Using this estimate to weight the distribution of the ambient energy, the underestimation of radiosities can be partially corrected: surfaces that have no links will use the entire ambient term, whereas surfaces with many links will be only marginally affected.
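The per-surface correction can be sketched as follows; the function name is illustrative, and the reflectance factor follows the classic ambient-term formulation of [1]:

```python
def with_ambient(radiosity, ff_sum, reflectance, ambient):
    """Display radiosity of one surface corrected by the ambient term,
    weighted by the estimated missing fraction of its interactions.
    ff_sum is the sum of the form factors on all links leaving the
    surface; in a closed scene it would be 1 for a fully linked surface,
    so (1 - ff_sum) estimates the fraction carried by missing links."""
    missing = max(0.0, 1.0 - ff_sum)
    return radiosity + reflectance * missing * ambient
```

A surface with no links at all (ff_sum = 0) receives the full ambient contribution; a fully linked surface is left untouched.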
However, since form factors are based on approximate formulas, the sum of all form factors can differ from one, even for a normally linked surface. This comes from our B·F refinement strategy: we accept that the form factor on a link between two dark surfaces be over-estimated or under-estimated. This may result in energy loss, or creation. If the error we introduce by not linking some surfaces is of the same order (or smaller) as the one due to our lack of precision on the form factor estimation, using the ambient term will not suffice to correct the energy imbalance.
To quantify the influence of those errors on the overall balance of energy, we compute the following estimate of the incorrect energy:

E_ET = Σ_p |1 − F_p| B_p A_p        (1)
⁶ The storage cost for the classified bits represents 62 kB for a thousand surfaces, 25 MB for twenty thousand surfaces.

where A_p is the area of polygon p, B_p its radiosity and F_p the sum of the form factors on all links leaving p. This can be compared to the total energy present in the scene:

E_T = Σ_p B_p A_p        (2)
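Equations (1) and (2) translate directly into a small estimator; the function name and the (area, radiosity, form-factor sum) tuple layout are assumptions for the sketch:

```python
def incorrect_energy_ratio(patches):
    """E_ET / E_T from equations (1) and (2).  `patches` is a list of
    (A_p, B_p, F_p) tuples, where F_p is the sum of the form factors on
    all links leaving patch p."""
    e_et = sum(abs(1.0 - f) * b * a for (a, b, f) in patches)  # eq. (1)
    e_t = sum(b * a for (a, b, f) in patches)                  # eq. (2)
    return e_et / e_t if e_t > 0.0 else 0.0
```

A patch whose links exactly account for all its interactions (F_p = 1) contributes nothing to E_ET.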

Fig. 5. Incorrect energy E_ET/E_T per iteration, for the Full Office and Dining Room scenes, with the original and lazy algorithms.

Figure 5 shows a plot of the ratio E_ET/E_T for the Dining Room scene and the Full Office, for both the original algorithm and our algorithm. Note that the error can be significant, but is mainly due to the original algorithm.

4 Reducing the Number of Links


The refinement of a link is based on the estimation of an associated error bound. Various criteria have been used that correspond to different error metrics, including the error in the form factor [4], the error in the radiosity transfer [5], and the impact of the error in the radiosity transfer on the final image [8].
All these criteria implicitly assume that the error in the form factor estimate is equivalent to the magnitude of the form factor itself. While this is true for infinitesimal quantities, in many instances it is possible to compute a reasonable estimate of a relatively large form factor. In particular this is true in situations of full visibility between a pair of surfaces.
Consider two patches p and q. A bi-directional link between them carries two form factor estimates F_{p,q} and F_{q,p}. If we refine the link by dividing p into smaller patches p_i of area A_i (e.g. in a quadtree), the definition of the form factor

F_{u,v} = (1/A_u) ∫_{A_u} ∫_{A_v} G(dA_u, dA_v) dA_u dA_v        (3)

where G is a geometric function, implies that the new form factors verify:

F_{p,q} = (1/A_p) Σ_i A_i F_{p_i,q}        (4)


F_{q,p} = Σ_i F_{q,p_i}        (5)
These relations only concern the exact values of the form factors. However they can be used to compare the new form factor estimates with the old ones, and determine a posteriori whether refinement was actually required. If the sum of the F_{q,p_i} is close to the old F_{q,p}, and they are not very different from one another, little precision was gained by refining p. Moreover, if F_{p,q} is close to the average of the F_{p_i,q}, and the F_{p_i,q} are not too different from one another, then the refinement process did not introduce any additional information. In this case we force p and q to interact at the current level, since the current estimates of the form factors are accurate enough.
In our implementation we only allow reduction of links in situations of full visibility between surfaces. We compute the relative variation of the children form factors, which we test against a new threshold ε_reduce. We also check that the relative difference between the old form factor F_{p,q} and the area-weighted average of the F_{p_i,q}, and the relative difference between F_{q,p} and the sum of the F_{q,p_i}, are both smaller than ε_reduce.

If we note F_{u,v} our current estimate of the form factor between two patches u and v, and assuming we want to refine a patch p into sub-patches p_i, we note:

F^min_{p,q} = min_i(F_{p_i,q})        F^max_{p,q} = max_i(F_{p_i,q})
F^min_{q,p} = min_i(F_{q,p_i})        F^max_{q,p} = max_i(F_{q,p_i})
F′_{p,q} = (1/A_p) Σ_i A_i F_{p_i,q}        F′_{q,p} = Σ_i F_{q,p_i}

and we refine p if any of the following is true:

(F^max_{p,q} − F^min_{p,q}) / F^max_{p,q} > ε_reduce
(F^max_{q,p} − F^min_{q,p}) / F^max_{q,p} > ε_reduce
|F_{p,q} − F′_{p,q}| / F′_{p,q} > ε_reduce
|F_{q,p} − F′_{q,p}| / F′_{q,p} > ε_reduce
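This a posteriori reduction test can be sketched as follows; the function name, argument layout and the default value of ε_reduce are illustrative:

```python
def should_refine(F_pq, F_qp, child_F_piq, child_F_qpi, child_areas, A_p,
                  eps_reduce=0.1):
    """A posteriori test for link reduction: refine patch p only if the
    child form factors disagree with each other or with the parent
    estimates by more than eps_reduce (relative).  Otherwise the link can
    be kept at the current level and marked un-refinable."""
    fmax_pq, fmin_pq = max(child_F_piq), min(child_F_piq)
    fmax_qp, fmin_qp = max(child_F_qpi), min(child_F_qpi)
    # area-weighted average of the children (eq. 4) and their sum (eq. 5)
    F_pq_new = sum(a * f for a, f in zip(child_areas, child_F_piq)) / A_p
    F_qp_new = sum(child_F_qpi)
    return ((fmax_pq - fmin_pq) / fmax_pq > eps_reduce
            or (fmax_qp - fmin_qp) / fmax_qp > eps_reduce
            or abs(F_pq - F_pq_new) / F_pq_new > eps_reduce
            or abs(F_qp - F_qp_new) / F_qp_new > eps_reduce)
```

When the test returns false, the cost of computing the child form factors is wasted for that level, but every further level of subdivision it avoids saves both links and time.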

The decision to cancel the subdivision of a link is based purely on geometrical properties, therefore it is permanent. The link is marked as "un-refinable" for the entire simulation.
Checking whether a link is worth refining involves the computation of form factor estimates to and from all children of patch p. Thus the associated cost in time is similar to that of actually performing the subdivision. If a single level of refinement is avoided by this procedure, there will be little gain in computation time, but the reduction in the number of links will yield memory savings. But if link reduction happens "early enough", several levels of refinement can be avoided. In our test scenes, an implementation of this algorithm significantly reduced the number of quadtree nodes and links (see Fig. 6), with a slightly smaller reduction in computation time because of the cost of the extra form factor estimates (see Fig. 7).

5 Results
5.1 Lazy Linking
Figure 3 in the colour section shows the same scene as in Fig. 1, computed using the lazy linking strategy of Sect. 3. Note that it is visually indistinguishable from its original counterpart. Figure 4 plots the absolute value of the difference between these two images.
5.2 Reduction of the Number of Links
To measure the performance of the reduction criterion, we computed the ratio of
the number of quadtree nodes (surface elements) obtained with this criterion, to
the number of nodes obtained with the original algorithm. The graph in Fig. 6a
plots this ratio against the number of iterations. Note that an overall reduction
by nearly a factor of two is achieved for all scenes. Figure 6b shows a similar
ratio for the number of links. This global reduction of the number of objects
involved leads to a similar reduction of the memory needed by the algorithm,
thus making it more practical for scenes with more polygons.
Fig. 6. Percentage of (a) nodes and (b) links left after reduction, per iteration, for the East Room, West Room, Dining Room, Hevea and Full Office scenes.

Figure 7 shows the ratio of the computation times using the improved criterion and the original algorithm. The reduction of the number of links has a dramatic impact on running times, with speedups of more than 50%.
Figures 5 and 6 in the colour section show the image obtained after link reduction. Note the variation in the mesh on the walls, and the similarity of the shaded image with the ones in Figs. 1 and 3. Figure 7 plots the absolute value of the difference between the image produced by the original algorithm and the image obtained after link reduction. Note that part of the differences are due to the lazy linking strategy of Sect. 3. Figure 8 therefore shows the difference between lazy linking and reduction of the number of links.
5.3 Overall Performance Gains
Timing results are presented in Table 3. As expected, a significant speedup is
achieved, particularly for complex scenes. For all scenes, ten iterations with lazy

Fig. 7. Percentage of computation time using link reduction, per iteration, for the Dining Room, West Room, East Room, Hevea and Full Office scenes.

linking took less time to compute than the first iteration alone with the original
algorithm. Finally, using lazy linking and reduction produces a useful image in
a matter of minutes even for the most complex scenes in our set.

Table 3. Time needed for ten iterations (and time for producing the first image).

Name          n     Original Algorithm   with Lazy Linking   ...and Reduction
Full office   170   301 s (242 s)        287 s (234 s)       43 s (30 s)
Dining room   402   4824 s (4191 s)      4051 s (3911 s)     657 s (552 s)
East room     1006  587 s (378 s)        377 s (191 s)       193 s (59 s)
West room     1647  1017 s (752 s)       514 s (277 s)       270 s (101 s)
Hevea         2355  4253 s (2331 s)      1526 s (847 s)      543 s (122 s)

6 Conclusions and Discussion

We have presented the results of an experimental study conducted on a variety of scenes, showing that visibility calculations represent the most expensive portion of the computation. Two improvements of the hierarchical algorithm were proposed. The first modification creates top-level links lazily, only when it is established that the proposed link will have a definite impact on the simulation. With this approach the hierarchical algorithm still remains quadratic in the number of input surfaces, but no work and very little storage is devoted to the initial linking phase. The resulting algorithm is more progressive in that it produces useful images very quickly. Note that the quadratic cost in the number of input surfaces can only be removed by clustering methods [7].

An improved subdivision criterion was introduced for situations of full visibility between surfaces, which allows a significant reduction of the number of links.
Future work will include the simplification of the hierarchical structure due to multiple sources and subsequent iterations. A surface that has been greatly refined because it receives a shadow from a given light source can be fully illuminated by a second source, so that the shadow becomes washed in light.
Better error bounds, both on form factor magnitude and global energy transfers, should allow an even greater reduction of the number of links. Accurate visibility algorithms can be used to this end, by providing exact visibility information between pairs of surfaces.

7 Acknowledgments
George Drettakis is a post-doc hosted by INRIA and supported by an ERCIM
fellowship. The hierarchical radiosity software was built on top of the original
program kindly provided by Pat Hanrahan.

References
1. Cohen, M. F., Chen, S. E., Wallace, J. R., Greenberg, D. P.: A Progressive Refinement Approach to Fast Radiosity Image Generation. SIGGRAPH (1988) 75–84
2. Drettakis, G., Fiume, E.: Accurate and Consistent Reconstruction of Illumination Functions Using Structured Sampling. Computer Graphics Forum (Eurographics 1993 Conf. Issue) 273–284
3. Goral, C. M., Torrance, K. E., Greenberg, D. P., Battaile, B.: Modeling the Interaction of Light Between Diffuse Surfaces. SIGGRAPH (1984) 213–222
4. Hanrahan, P. M., Salzman, D.: A Rapid Hierarchical Radiosity Algorithm for Unoccluded Environments. Eurographics Workshop on Photosimulation, Realism and Physics in Computer Graphics, June 1990
5. Hanrahan, P. M., Salzman, D., Aupperle, L.: A Rapid Hierarchical Radiosity Algorithm. SIGGRAPH (1991) 197–206
6. Lischinski, D., Tampieri, F., Greenberg, D. P.: Combining Hierarchical Radiosity and Discontinuity Meshing. SIGGRAPH (1993)
7. Sillion, F.: Clustering and Volume Scattering for Hierarchical Radiosity Calculations. Fifth Eurographics Workshop on Rendering, Darmstadt, June 1994 (in these proceedings)
8. Smits, B. E., Arvo, J. R., Salesin, D. H.: An Importance-Driven Radiosity Algorithm. SIGGRAPH (1992) 273–282
9. Teller, S. J., Hanrahan, P. M.: Global Visibility Algorithms for Illumination Computations. SIGGRAPH (1993) 239–246


Colour section:
1. The original algorithm        2. The grid produced
3. With lazy linking             4. Difference between 1 and 3 (8)
5. With link reduction           6. The grid produced
7. Difference between 1 and 5 (8)    8. Difference between 3 and 5 (8)
9. Full Office                   10. East Room
11. West Room                    12. Hevea

2.7.3 Wavelet Radiosity on Arbitrary Planar Surfaces (EGWR 2000)

Authors: Nicolas Holzschuch, François Cuny and Laurent Alonso
Conference: 11th Eurographics Workshop on Rendering, Brno, Czech Republic
Date: June 2000
Wavelet Radiosity on Arbitrary Planar Surfaces

Nicolas Holzschuch¹, François Cuny² and Laurent Alonso¹
ISA research team, LORIA³
Campus Scientifique, BP 239
54506 Vandœuvre-lès-Nancy CEDEX, France

Abstract. Wavelet radiosity is, by its nature, restricted to parallelograms or triangles. This paper presents an innovative technique enabling wavelet radiosity computations on planar surfaces of arbitrary shape, including concave contours or contours with holes. This technique replaces the need for triangulating such complicated shapes, greatly reducing the complexity of the wavelet radiosity algorithm and the computation time. It also gives a better approximation of the radiosity function, resulting in better visual results. Our technique works by separating the radiosity function from the surface geometry, extending the radiosity function defined on the original shape onto a simpler domain – a parallelogram – better behaved for hierarchical refinement and wavelet computations.

1 Introduction
Wavelet radiosity [12] is one of the most interesting techniques for global illumination simulation. Recent research [7] has shown that higher order multi-wavelets (M2 and M3) provide a very powerful tool for radiosity computations. Multi-wavelets can approximate the radiosity function efficiently with a small number of coefficients. As a consequence, they give a solution of better quality in a shorter time.
Multi-wavelets are defined only on parallelograms and triangles. This causes problems for radiosity computations on scenes coming from real world applications, such as architectural scenes or CAD scenes. In such scenes, planar surfaces can have a fairly complicated shape (see figures 1 and 12(a)). To do wavelet radiosity computations on such scenes, we have to tessellate these planar shapes into triangles and parallelograms, which results in a large number of input primitives (see figure 1(b)). Furthermore, this decomposition is purely geometrical and is not based on the illumination, yet it will influence our approximation of the radiosity function. In some cases, this geometric decomposition results in a poor illumination solution (see figures 2(a) and 11(a)).
In the present paper, we separate the radiosity function from the surface geometry.
This enables us to exploit the strong approximating power of multi-wavelets for radios-
ity computations, with planar surfaces of arbitrary shape – including concave contours,
contours with holes or disjoint contours. Our algorithm results in a better approximation of the radiosity function (see figures 2(b) and 11(b)), with a smaller number of input primitives, faster convergence and lower memory costs.
¹ INRIA Lorraine.
² Institut National Polytechnique de Lorraine.
³ UMR no. 7503 LORIA, a joint research laboratory between CNRS, Institut National Polytechnique de Lorraine, INRIA, Université Henri Poincaré and Université Nancy 2.

Fig. 1. Planar surfaces can have a fairly complicated shape: (a) detail of figure 12(a); (b) tessellated: 32 triangles; (c) our algorithm: 1 surface

Fig. 2. Wavelet radiosity on arbitrary planar surfaces: (a) tessellated; (b) our algorithm (see also figure 11)

Our algorithm extends the radiosity function defined on the original shape onto a
simpler domain, better behaved for hierarchical refinement and wavelet computations.
This extension of the radiosity function is defined to be easily and efficiently approxi-
mated by multi-wavelets. The wavelet radiosity algorithm is modified to work with this
abstract representation of the radiosity function.
Our paper is organised as follows: in section 2, we will review previous work on
radiosity with planar surfaces of complicated shape. Section 3 is a detailed explanation
of our algorithm and of the modifications we brought to the wavelet radiosity algorithm.
Section 4 presents the experiments we have conducted with our algorithm on different
test scenes. Finally, section 5 presents our conclusions.

2 Previous work
The wavelet radiosity method was introduced by [12]. It is an extension of the radiosity method [11] and especially of the hierarchical radiosity method [13]. It allows the use of higher order basis functions in hierarchical radiosity.
In theory, higher order wavelets are a very powerful tool to approximate rapidly varying functions with few coefficients. In practice, they have several drawbacks, especially in terms of memory costs. In the early implementations of wavelet bases in the radiosity algorithm, these negative points outweighed the theoretical advantages [19]. Recent research [7] has shown that using new implementation methods [2, 3, 7, 18, 21] we can actually exploit the power of higher order wavelets, and that their positive points now largely outweigh the practical problems. They provide a better approximation of the radiosity function, with a small number of coefficients, resulting in faster convergence and smaller memory costs.
On the other hand, higher order wavelets, and especially multi-wavelets (M2 and M3), are defined as tensor products of one-dimensional wavelets. As a consequence, they are defined over a square. The definition can easily be extended to parallelograms or triangles, but higher order wavelets are not designed to describe the radiosity function over complex surfaces.
Such complex surfaces can occur in the scenes on which we do global illumination simulations. In particular, scenes constructed using CAD tools such as CSG geometry or extrusion frequently contain complex planar surfaces, with curved boundaries or holes in them.
The simplest solution to do radiosity computations on such surfaces is to tessellate them into triangles, and to do radiosity computations on the result of the tessellation. This method has several negative consequences on the radiosity algorithm:
- It increases the number of input surfaces, and the algorithmic complexity of the radiosity algorithm is linked to the square of the number of input surfaces.
- The tessellation is made before the radiosity computations and it influences these computations. It can prevent us from reaching a good illumination solution.
- The tessellation does not allow a hierarchical treatment over the original surface, only over each triangle created by the tessellation. We cannot fully exploit the capabilities of hierarchical radiosity, and especially of wavelet radiosity.
- By artificially subdividing an input surface into several smaller surfaces, we are creating discontinuities. These discontinuities will have to be treated at some point in the algorithm.
- Tessellation can create poorly shaped triangles (see figure 1(b)), or slivers. These slivers can cause Z-buffer artifacts when we visualise the radiosity solution, and are harder to detect in visibility tests (e.g. ray-casting).
Some of these problems can be removed by using clustering [10, 16, 17]. In clus-
tering, neighbouring patches are grouped together, into a cluster. The cluster receives
radiosity and distributes it to the patches that it contains. On the other hand, current
clustering strategies behave poorly in scenes with many small patches located close to each other [14]. It would probably be more efficient to apply clustering to the original planar surfaces instead of applying it to the result of the tessellation.
A better grouping strategy is face-clustering [20]. In face-clustering, neighbouring
patches are grouped together according to their coplanarity. Yet even face-clustering
depends on the geometry created by the tessellation. Furthermore, it would not allow
us to exploit the strong approximating power of multi-wavelets.

- if the original planar shape is polygonal:
  – compute its convex hull (in linear time) using the chain of points [9].
  – compute the minimal enclosing parallelograms of the convex hull (in linear time) using Schwarz et al. [15].
  – if the previous algorithm gives several enclosing parallelograms, select the one that has angles closest to π/2.
- if the original shape is a curve, or contains curves:
  – approximate the curve by a polygon.
  – compute the enclosing parallelogram of the polygon.
  – compute the extrema of the curve in the directions of the parallelogram.
  – if needed, extend the parallelogram to include these extrema.

Fig. 3. Our algorithm for finding an enclosing parallelogram.
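As an illustration of the geometric step, here is a sketch computing the convex hull (Andrew's monotone chain rather than the chain-of-points method of [9]) and a minimum-area enclosing rectangle. The paper's implementation uses Schwarz et al.'s minimal enclosing parallelogram [15]; the rectangle variant below is a simplification that keeps the same structure (one side flush with a hull edge):

```python
import math

def convex_hull(pts):
    """Andrew's monotone chain; returns the hull in counter-clockwise order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def min_enclosing_rectangle(pts):
    """Smallest-area enclosing rectangle: one side is flush with a hull
    edge, so it suffices to try each hull edge direction.  Returns
    (area, (ux, uy)) where (ux, uy) is the direction of the flush side."""
    hull = convex_hull(pts)
    best = (math.inf, None)
    m = len(hull)
    for i in range(m):
        x0, y0 = hull[i]
        x1, y1 = hull[(i + 1) % m]
        ex, ey = x1 - x0, y1 - y0
        n = math.hypot(ex, ey)
        if n == 0:
            continue
        ux, uy = ex / n, ey / n                   # edge direction
        us = [ux * x + uy * y for x, y in hull]   # extent along the edge
        vs = [-uy * x + ux * y for x, y in hull]  # extent perpendicular to it
        area = (max(us) - min(us)) * (max(vs) - min(vs))
        if area < best[0]:
            best = (area, (ux, uy))
    return best
```

Both steps are linear in the hull size once the points are sorted, in the spirit of the linear-time bounds quoted above.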

Bouatouch et al. [5] designed a method for discontinuity meshing and multi-wavelets.
In effect, they are doing multi-wavelet computations over a non-square domain. However, their algorithm requires several expensive computations of push-pull coefficients.
Our algorithm avoids these computations.
Baum et al. [1] designed a method for radiosity computations with arbitrary planar
polygons, including polygons with holes. Their method ensures that the triangles pro-
duced are well-shaped, and suited for radiosity computations. Since it is designed for
non-hierarchical radiosity, it is done in a preliminary step, before all radiosity compu-
tations. Our method, designed for wavelet radiosity, acts during the global illumination
simulation, and adapts the refinement to the radiosity.

3 The Extended Domain Algorithm


In this section, we present our algorithm for wavelet radiosity computations on planar
surfaces of arbitrary shape. Our algorithm separates the radiosity function from the
surface geometry; we introduce a simple domain that will be used for radiosity compu-
tations. The radiosity function on the original surface is inferred from the radiosity on
the simple domain.
Section 3.1 explains how we select an extended domain for our computations. In
section 3.2, we describe how we extend the definition of the radiosity function over
this domain. The extended domain is then used in a wavelet radiosity algorithm like
an ordinary patch, with some specific adjustments. These adjustments are described in
section 3.3.

3.1 Selection of an extended domain


The first step of our algorithm is the choice of an extended domain, which we will use
for wavelet radiosity computations. This extended domain must obey two rules:

Fig. 4. Extending the radiosity function over the extended domain: the radiosity function over the original polygon (left) is extended over the extended domain (right).

- it must enclose the original shape,
- it must be well suited for wavelet radiosity computations.
Since multi-wavelets (M2, M3...) are defined as tensor products of one-dimensional wavelets, the second rule implies that the extended domain must be a parallelogram. Moreover, if this parallelogram is closer to a rectangle, there will be fewer distortions in the wavelet bases, resulting in a better approximation of the radiosity function. So we want the angles of the parallelogram to be close to π/2.
Since only radiosity computations made on the original shape are of interest, we also want the enclosing parallelogram to be as close as possible to the original shape.
Basically, any parallelogram satisfying these criteria could be used with our algorithm. The algorithm used in our implementation is described in figure 3. The key point is that this algorithm runs in linear time with respect to the number of vertices: the convex hull of a chain of points in 2D can be computed in linear time [9], and Schwarz's algorithm for the enclosing parallelogram is also linear [15].

3.2 Extending the radiosity function over this domain


Once we have an extended domain, we need to define the radiosity function over this
domain. This extension of the radiosity function must obey two rules (see figure 4):
- it must be equal to the original radiosity function over the original domain,
- it must be as simple and as smooth as possible, to be efficiently approximated by multi-wavelets.
The second point is crucial: we have to compute the radiosity function over the
entire domain. Because of the hierarchical nature of wavelets, during the push-pull step
radiosity values computed at one point of the domain can influence other points of the
domain. So our extension of the radiosity function must be computed with the same
precision regardless of whether we are on the original surface or not.
Since the discontinuities of the radiosity function and its derivatives only come from visibility discontinuities, we do not want to introduce more visibility discontinuities in our extension. We define an extended visibility function V′: the visibility between a point Q in space and a point P on the extended domain is defined as the visibility between Q and P′, where P′ is the closest point to P on the original planar surface:

V′(Q, P) = V(Q, P′)

Of course, if P is already on the original planar surface, P′ is equal to P. In that case, the extended visibility function is equal to the standard visibility function.
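The projection P → P′ can be sketched in the 2D coordinates of the supporting plane; the even-odd inside test and the brute-force loop over contour segments are simplifications for illustration:

```python
def closest_point_on_segment(p, a, b):
    """Closest point to p on segment [a, b], by clamped projection."""
    ax, ay = a; bx, by = b; px, py = p
    dx, dy = bx - ax, by - ay
    denom = dx * dx + dy * dy
    t = 0.0 if denom == 0 else max(0.0, min(1.0, ((px-ax)*dx + (py-ay)*dy) / denom))
    return (ax + t * dx, ay + t * dy)

def point_in_polygon(p, poly):
    """Even-odd rule: count contour edges crossed by a ray towards +x."""
    x, y = p
    inside = False
    n = len(poly)
    for i in range(n):
        x0, y0 = poly[i]; x1, y1 = poly[(i + 1) % n]
        if (y0 > y) != (y1 > y):
            xi = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if xi > x:
                inside = not inside
    return inside

def project_to_surface(p, contour):
    """P' of the extended visibility function: the closest point to P on
    the original planar surface.  Inside the contour P' = P; outside,
    P' lies on the nearest contour segment."""
    if point_in_polygon(p, contour):
        return p
    best, best_d2 = None, float("inf")
    n = len(contour)
    for i in range(n):
        q = closest_point_on_segment(p, contour[i], contour[(i + 1) % n])
        d2 = (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2
        if d2 < best_d2:
            best, best_d2 = q, d2
    return best
```

Points of the extended domain that lie on the original surface are thus their own projection, which makes V′ coincide with V there.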

Fig. 5. Trapezoidal map of an arrangement of line segments: (a) trapezoidal map for the surface in figure 1(c); (b) using the trapezoidal map for visibility queries

The radiosity function on the extended domain is then defined as the radiosity func-
tion, as computed by the wavelet radiosity algorithm, using this extended visibility
function in the radiosity kernel.

3.3 Using the extended domain in the wavelet radiosity algorithm


In this section, we describe our adaptation of the wavelet radiosity algorithm to work
with our extended domains. We use a standard wavelet radiosity algorithm [7, 21]. The
core of the algorithm is left unchanged (refinement oracle, link storage). We will review
here the points that require some special attention:
- reception and push-pull
- visibility
- emission
- refinement

Reception and Push-Pull. The wavelet radiosity algorithm is a hierarchical algorithm. During the push-pull step, radiosity values computed at one point of the patch
can influence the representation of radiosity for the entire patch. Hence, we want the
same precision for all radiosity computations over the entire patch.
For reception, the extended domain is therefore treated just like an ordinary patch.
All parts of the extended domain are receiving radiosity, with the same precision, re-
gardless of whether or not they belong to the original planar surface.
Similarly, we are doing the push-pull step over the entire extended domain, without
any reference to the original surface. Since our extended domain is by design a par-
allelogram, instead of an ordinary polygon, we do not have to compute any expensive
push-pull coefficients.
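The push-pull step can be sketched for the piecewise-constant (Haar-like) case; the multi-wavelet version additionally applies basis-change coefficients, and the `Node` class with equal-area children is a simplification, not the paper's implementation:

```python
class Node:
    """One node of the hierarchy: radiosity gathered at this level, plus
    children (a quadtree would have four equal-area children)."""
    def __init__(self, gathered=0.0, children=None):
        self.gathered = gathered
        self.children = children or []
        self.total = 0.0   # final radiosity after push-pull

def push_pull(node, pushed=0.0):
    """Push the radiosity gathered at coarser levels down to the leaves,
    then pull back up the area-weighted average of the children.  With
    equal-area children the average is unweighted."""
    down = pushed + node.gathered
    if not node.children:
        node.total = down
        return node.total
    node.total = sum(push_pull(c, down) for c in node.children) / len(node.children)
    return node.total
```

Because the pull averages over all children, a value gathered anywhere on the extended domain influences the whole patch, which is why the extension must be computed with uniform precision.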

Visibility. For all the visibility computations, only the original planar surface can act
as an occluder. The extended domain is never used in visibility computations.

Fig. 6. The weights of the quadrature points can be seen as the area of a zone of influence: (a) on the unit segment; (b) on the unit square; (c) on the extended domain, where α is the fraction of a zone of influence covered by the original surface.

To detect whether the original surface actually occludes an interaction, we compute the trapezoidal map of an arrangement of line segments [4, 8] over the segments of the contour of the original surface (see figure 5). For each trapezoid, we store its status – whether it is inside or outside of the original surface.
Using randomised algorithms, trapezoidal maps can be constructed in time O(n log n), where n is the number of vertices. Once constructed, they can be queried in O(log n) time. Since the construction algorithm is randomised, we shuffle the segments of the original surface before building the trapezoidal map.
Visibility queries in our radiosity algorithm are visibility queries between two points,
either two quadrature points [7] or the closest point on the original surface from a
quadrature point (see section 3.2). We compute the intersection between the ray joining
these quadrature points and the supporting plane of the original surface, check whether
the intersection point is inside the bounding box of the extended domain, then check
whether it is inside the extended domain itself, then query the trapezoidal map to check
if it is inside or outside the original surface.
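The query chain described above can be sketched as follows; `occluded_by_surface` and its callback are placeholders, and the paper's implementation answers the final inside/outside question with the trapezoidal map rather than an arbitrary callback:

```python
def occluded_by_surface(q0, q1, plane_point, plane_normal, inside_original):
    """Visibility query between two 3D points against one occluder:
    intersect the segment q0-q1 with the occluder's supporting plane,
    then ask whether the hit point lies inside the original contour
    (the paper uses an O(log n) trapezoidal-map query for that test)."""
    d = tuple(b - a for a, b in zip(q0, q1))
    denom = sum(di * ni for di, ni in zip(d, plane_normal))
    if abs(denom) < 1e-12:
        return False                 # segment parallel to the plane
    t = sum((pi - ai) * ni
            for pi, ai, ni in zip(plane_point, q0, plane_normal)) / denom
    if not (1e-9 < t < 1.0 - 1e-9):
        return False                 # plane not crossed between the points
    hit = tuple(a + t * di for a, di in zip(q0, d))
    return inside_original(hit)
```

The bounding-box and extended-domain rejection tests mentioned above would be cheap early-outs inserted before the final inside/outside query.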

Emission. During the reception, the entire extended domain has received illumination.
The radiosity received over parts of the extended domain that are not included in the
original surface does not exist in reality, and it should not be sent back into the scene.
Otherwise, there would be an artificial creation of energy, violating the principle of
conservation of energy.
Because of the hierarchical nature of the wavelet radiosity algorithm, it would be
difficult to compute the exact part of this radiosity function that really exists. Instead,
we act on the weights of the quadrature points.
In the wavelet radiosity algorithm, all the transfer coefficients between an emitter
and a receiver are computed using quadratures. Quadratures allow the evaluation of
a complex integral by sampling the function being integrated at the quadrature points,
and multiplying the values by quadrature weights. Most implementations use Legendre-
Gauss quadratures.
Since the weights are positive and their sum is equal to 1, you can visualise them
as being the length of a zone of influence for the corresponding quadrature point (see
figure 6(a) for the one dimension case). The same applies in two dimensions: the
weights of the quadrature points can be seen as the area of a zone of influence (see

for each interaction s → r:
    for each quadrature point q_i on the emitter s
        A_i = area of influence of q_i
        α_i = percentage of A_i that is inside the original emitter
        q′_i = nearest point from q_i on the emitter
        for each quadrature point p_j on the receiver r
            p′_j = nearest point on the receiver
            V(q′_i, p′_j) = visibility between q′_i and p′_j
            G(q_i, p_j) = radiosity kernel between q_i and p_j
            B_r += α_i w_i w_j B_s(q_i) V(q′_i, p′_j) G(q_i, p_j)
        end for
    end for

Fig. 7. Pseudo-code for wavelet radiosity emission using the extended domain.

full mesh empty mesh extended mesh

Fig. 8. Refinement of the extended domain

figure 6(b)); the weight of quadrature point pi;j is wi wj . Please note that these zones
of influence are not equal to the Voronoı̈ diagram of the quadrature points.
We suggest an extension to the Gaussian quadrature to take into account the fact
that the extended domain is not entirely covered by the actual emitter: the weight of a
quadrature point is multiplied by the proportion of its area of influence that is actually
covered by the emitter. For example, on figure 6(c), the weight of the quadrature point
in the hashed area should be w1 w2 . Since the fraction of its area of influence covered
by the emitter is , the weight used in the computation will be w1 w2 .
Our method allows for a quick treatment of low precision interactions, and for high
precision interactions, it tends toward the exact value. The more we refine an interac-
tion, the more precision we get on the radiosity on the emitter. We also get the exact
value if the zone of influence is entirely full or entirely empty.
In some cases, it can happen that the quadrature point falls outside the original
emitter. We use these quadrature points anyway.
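The coverage-scaled weights can be illustrated with a one-dimensional numerical sketch (not our implementation). The assumptions here are for illustration only: the extended domain is taken to be [0, 1], the emitter is assumed to cover only [0, 0.6], and the zones of influence are laid out as consecutive sub-intervals whose lengths are the normalised Gauss-Legendre weights.

```python
# Worked 1D example of coverage-scaled quadrature weights, using numpy's
# Gauss-Legendre rule. Illustrative assumptions: extended domain [0, 1],
# emitter covering only [0, 0.6], zones of influence laid out as
# consecutive sub-intervals of length w_i.

import numpy as np

def coverage_scaled_weights(n, emitter=(0.0, 0.6)):
    x, w = np.polynomial.legendre.leggauss(n)   # rule on [-1, 1]
    x = (x + 1.0) / 2.0                          # map points to [0, 1]
    w = w / 2.0                                  # weights now sum to 1
    # Consecutive zones of influence: boundaries at cumulative weight sums.
    bounds = np.concatenate(([0.0], np.cumsum(w)))
    a, b = emitter
    alpha = np.empty(n)
    for i in range(n):
        lo, hi = bounds[i], bounds[i + 1]
        overlap = max(0.0, min(hi, b) - max(lo, a))
        alpha[i] = overlap / w[i]                # fraction of the zone covered
    return x, alpha * w                          # scaled weights alpha_i * w_i

x, w_scaled = coverage_scaled_weights(3)
```

Integrating the constant function 1 with the scaled weights yields the sum of the scaled weights, which equals the emitter's true extent 0.6: as the text states, the modified quadrature is exact when each zone of influence is entirely full or entirely empty, and a constant integrand makes it exact even for partial zones.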
Figure 7 shows the pseudo-code for radiosity emission using the extended domain.

Refinement. As with the original wavelet radiosity algorithm, the extended domain
can be subdivided if the interaction needs to be subdivided. The refinement oracle
deals with the extended domain as it would deal with any other patch. Because of the
hierarchical representation of the radiosity function in the wavelet radiosity algorithm,
we must have the same precision on the radiosity function over the entire domain. The
push-pull step can make parts of the domain that are not inside the original surface
influence our representation of the radiosity function over the entire domain.

Name        # initial surfaces    # after tessellation    Ratio
Opera             17272                  32429            1.88
Temple             7778                  11087            1.43
Soda Hall        145454                 201098            1.38

Table 1. Description of our test scenes

Fig. 9. Number of patches in our test scenes, for our algorithm and the tessellated scene: (a) before refinement, (b) after refinement.

If the extended domain is refined, we deal with each part of the subdivided extended domain as we would deal with the original extended domain. Two special cases can appear (see figure 8):
– If the result of the subdivision does not intersect the original planar surface at all, it is empty. It therefore cannot play a role in the emission of radiosity, but we keep computing the radiosity function over this patch.
– If the result of the subdivision is totally included inside the original planar surface, we are back to the standard wavelet radiosity algorithm on parallelograms.
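The two special cases, plus the general partial case, can be sketched with a simple classification routine. For illustration only, both the subdivided domain and the original surface are assumed here to be axis-aligned rectangles; the real algorithm handles arbitrary planar surfaces, and all names below are hypothetical.

```python
# Sketch of the subdivision bookkeeping, under a simplifying assumption:
# both the subdivided extended domain and the original surface are
# axis-aligned rectangles (xmin, ymin, xmax, ymax).

EMPTY, FULL, PARTIAL = "empty", "full", "partial"

def classify_subpatch(sub, original):
    # Intersection of the two rectangles.
    ix0, iy0 = max(sub[0], original[0]), max(sub[1], original[1])
    ix1, iy1 = min(sub[2], original[2]), min(sub[3], original[3])
    if ix0 >= ix1 or iy0 >= iy1:
        return EMPTY    # no role in emission, but radiosity is still computed
    if (original[0] <= sub[0] and sub[2] <= original[2]
            and original[1] <= sub[1] and sub[3] <= original[3]):
        return FULL     # back to standard wavelet radiosity on parallelograms
    return PARTIAL      # keep using the coverage-scaled quadrature weights
```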

4 Experiments
We have tested our algorithm for wavelet radiosity on arbitrary planar surfaces on vari-
ous test scenes (see figure 12 for images of our test scenes, and table 1 and figure 9(a)
for their description). We were interested in a comparison between our algorithm and
the standard wavelet radiosity algorithm, acting on parallelograms and triangles. All
the computations were conducted on the same computer, a SGI Origin 2000, using a
parallel version [6] of our wavelet radiosity algorithm [7, 21].
In all these test scenes, the number of surfaces after tessellation is less than twice the number of surfaces in the original scene, much less than what could be expected from figure 1: most of the initial surfaces in the scenes are parallelograms or triangles, and don't require tessellation.
The first result is that our algorithm gives better visual quality than doing wavelet
radiosity computations on a tessellated surface (see figure 2 and 11). Our separation of
the radiosity function from the surface geometry results in a better approximation of the
radiosity function.

Fig. 10. Convergence rate (un-shot energy over initial energy) as a function of computation time (in seconds), for the original and the tessellated scene: (a) Opera, (b) Temple, (c) Soda Hall.

Beyond this important result, we were interested in a comparison of computation time and memory costs for both algorithms.
Obviously, our algorithm reduces the number of patches, and therefore the memory
cost of the initial scene (see table 1 and figure 9(a)). According to our computations, it
also reduces the number of patches in the final scene, although not in the same propor-
tions (see figure 9(b)).
The latter result was to be expected: the wavelet radiosity algorithm will refine the original scene a lot, resulting in numerous sub-patches. The number of patches in the scene after the radiosity computations is mainly linked to the complexity of the radiosity function itself, and not to the complexity of the scene. However, it appears that our algorithm results in more efficient refinement, since we reach convergence with a smaller number of patches. In some scenes, we reach convergence with 30% fewer patches.
The fact that our algorithm allows for more efficient refinement also appears in the computation times (see figure 10). In our experiments, we measure the energy initially present in the scene and the energy that hasn't yet been propagated in the scene. The ratio of these two measures tells us how far we are from complete convergence. Figure 10 displays this ratio as a function of the computation time, both for our algorithm and for the wavelet radiosity algorithm operating on a tessellated version of the scene. Our algorithm ensures faster convergence on all our test scenes. The speedup is about 30%, which shows that acting on the original planar surface instead of the tessellated surface gives more efficient refinement.

5 Conclusion
In conclusion, we have presented a method to separate the radiosity function from the
surface geometry. This method removes the need to tessellate complex planar surfaces,
resulting in a more efficient global illumination simulation, with better visual quality.
Our method results in faster convergence, with smaller memory costs.
In our future work, we want to extend this algorithm to discontinuity meshing. Dis-
continuity meshing introduces a geometric model of the discontinuities of the radiosity
function and its derivatives, the discontinuity mesh. The discontinuity mesh provides
optimal meshing for radiosity computations near the discontinuities. The discontinuity
mesh is a complicated structure, and it can influence radiosity computations away from
the discontinuities, for example because of triangulation. We want to use our algorithm
to smoothly integrate the discontinuity mesh in the natural subdivision for multi-wavelet
radiosity, removing the need to tessellate the discontinuity mesh.
We also want to explore a combination of our algorithm with clustering techniques.
First, our algorithm could be used to group together neighbouring coplanar patches in
a natural way. This would help the clustering strategy [14] and give a more accurate
result. Second, we would like to integrate our algorithm with face-clustering, bringing
multi-wavelets into face-clusters.
Finally, our separation of the radiosity function from the surface geometry could
also be used to compute radiosity using multi-wavelets on curved surfaces. There are
several parametric surfaces for which the limits of the parametric space are not square. We suggest using our algorithm to enclose these limits in a square domain, making it easier to apply multi-wavelets.

6 Acknowledgements
Permission to use the Soda Hall model4 was kindly given by Prof. Carlo Sequin.
Jean-Claude Paul has started and motivated all this research. The authors would like
to thank him for his kind direction, support and encouragements.

References
1. D. R. Baum, S. Mann, K. P. Smith, and J. M. Winget. Making Radiosity Usable: Automatic
Preprocessing and Meshing Techniques for the Generation of Accurate Radiosity Solutions.
Computer Graphics (ACM SIGGRAPH ’91 Proceedings), 25(4):51–60, July 1991.
2. P. Bekaert and Y. Willems. Error Control for Radiosity. In Rendering Techniques ’96 (Pro-
ceedings of the Seventh Eurographics Workshop on Rendering), pages 153–164, New York,
NY, 1996. Springer-Verlag/Wien.
4 The Soda Hall model is available on the web, at http://www.cs.berkeley.edu/~kofler.

3. P. Bekaert and Y. D. Willems. Hirad: A Hierarchical Higher Order Radiosity Implementa-
tion. In Proceedings of the Twelfth Spring Conference on Computer Graphics (SCCG ’96),
Bratislava, Slovakia, June 1996. Comenius University Press.
4. J.-D. Boissonnat and M. Yvinec. Algorithmic Geometry. Cambridge University Press, 1998.
5. K. Bouatouch and S. N. Pattanaik. Discontinuity Meshing and Hierarchical Multiwavelet
Radiosity. In W. A. Davis and P. Prusinkiewicz, editors, Proceedings of Graphics Interface
’95, pages 109–115, San Francisco, CA, May 1995. Morgan Kaufmann.
6. X. Cavin, L. Alonso, and J.-C. Paul. Parallel Wavelet Radiosity. In Second Eurographics
Workshop on Parallel Graphics and Visualisation, pages 61–75, Rennes, France, Sept. 1998.
7. F. Cuny, L. Alonso, and N. Holzschuch. A novel approach makes higher order wavelets really efficient for radiosity. Computer Graphics Forum (Eurographics 2000 Proceedings), 19(3), Sept. 2000. To appear. Available from
https://www.loria.fr/~holzschu/Publications/paper20.pdf.
8. O. Devillers, M. Teillaud, and M. Yvinec. Dynamic location in an arrangement of line
segments in the plane. Algorithms Review, 2(3):89–103, 1992.
9. H. Edelsbrunner. Algorithms in Combinatorial Geometry, volume 10 of EATCS Monographs
on Theoretical Computer Science. Springer-Verlag, Nov. 1987.
10. S. Gibson and R. J. Hubbold. Efficient hierarchical refinement and clustering for radiosity in
complex environments. Computer Graphics Forum, 15(5):297–310, Dec. 1996.
11. C. M. Goral, K. E. Torrance, D. P. Greenberg, and B. Battaile. Modelling the Interaction of
Light Between Diffuse Surfaces. Computer Graphics (ACM SIGGRAPH ’84 Proceedings),
18(3):212–222, July 1984.
12. S. J. Gortler, P. Schroder, M. F. Cohen, and P. Hanrahan. Wavelet Radiosity. In Computer
Graphics Proceedings, Annual Conference Series, 1993 (ACM SIGGRAPH ’93 Proceed-
ings), pages 221–230, 1993.
13. P. Hanrahan, D. Salzman, and L. Aupperle. A Rapid Hierarchical Radiosity Algorithm.
Computer Graphics (ACM SIGGRAPH ’91 Proceedings), 25(4):197–206, July 1991.
14. J. M. Hasenfratz, C. Damez, F. Sillion, and G. Drettakis. A practical analysis of clustering
strategies for hierarchical radiosity. Computer Graphics Forum (Eurographics ’99 Proceed-
ings), 18(3):C–221–C–232, Sept. 1999.
15. C. Schwarz, J. Teich, A. Vainshtein, E. Welzl, and B. L. Evans. Minimal enclosing parallelo-
gram with application. In Proc. 11th Annu. ACM Sympos. Comput. Geom., pages C34–C35,
1995.
16. F. Sillion. A Unified Hierarchical Algorithm for Global Illumination with Scattering Volumes
and Object Clusters. IEEE Transactions on Visualization and Computer Graphics, 1(3), Sept.
1995.
17. B. Smits, J. Arvo, and D. Greenberg. A Clustering Algorithm for Radiosity in Complex
Environments. In Computer Graphics Proceedings, Annual Conference Series, 1994 (ACM
SIGGRAPH ’94 Proceedings), pages 435–442, 1994.
18. M. Stamminger, H. Schirmacher, P. Slusallek, and H.-P. Seidel. Getting rid of links in hierar-
chical radiosity. Computer Graphics Journal (Proc. Eurographics ’98), 17(3):C165–C174,
Sept. 1998.
19. A. Willmott and P. Heckbert. An empirical comparison of progressive and wavelet ra-
diosity. In J. Dorsey and P. Slusallek, editors, Rendering Techniques ’97 (Proceedings of
the Eighth Eurographics Workshop on Rendering), pages 175–186, New York, NY, 1997.
Springer Wien. ISBN 3-211-83001-4.
20. A. Willmott, P. Heckbert, and M. Garland. Face cluster radiosity. In Rendering Techniques
’99, pages 293–304, New York, NY, 1999. Springer Wien.
21. C. Winkler. Expérimentation d’algorithmes de calcul de radiosité à base d’ondelettes. Thèse
d’université, Institut National Polytechnique de Lorraine, 1998.

(a) Tessellated (b) Our Algorithm

Fig. 11. Using our algorithm for wavelet radiosity on arbitrary planar surfaces (see also figure 2)

(a) Opera

(b) Temple (c) Soda Hall

Fig. 12. Our test scenes

56 CHAPITRE 2. MODÉLISATION MULTI-ÉCHELLES DE L’ÉCLAIRAGE

2.7.4 A novel approach makes higher order wavelets really efficient for radiosity (EG
2000)
Auteurs : François Cuny, Laurent Alonso et Nicolas Holzschuch
Conférence : Eurographics 2000, Interlaken, Suisse. Cet article a également été publié dans
Computer Graphics Forum, vol. 19, no 3.
Date : septembre 2000
EUROGRAPHICS 2000 / M. Gross and F.R.A. Hopgood (Guest Editors), Volume 19 (2000), Number 3

A novel approach makes higher order wavelets really efficient for radiosity

François Cuny†, Laurent Alonso‡ and Nicolas Holzschuch‡

ISA research team, LORIA§
Campus Scientifique, BP 239
54506 Vandœuvre-les-Nancy CEDEX, France

Abstract
Since wavelets were introduced in the radiosity algorithm5, surprisingly little research has been devoted to higher order wavelets and their use in radiosity algorithms. A previous study13 has shown that wavelet radiosity, and especially higher order wavelet radiosity, was not bringing significant improvements over hierarchical radiosity and was having a very important extra memory cost, thus prohibiting any effective computation. In this paper, we present a new implementation of wavelets in the radiosity algorithm, that is substantially different from previous implementations in several key areas (refinement oracle, link storage, resolution algorithm). We show that, with this implementation, higher order wavelets are actually bringing an improvement over standard hierarchical radiosity and lower order wavelets.

1. Introduction

Global illumination simulation is essential for realistic rendering of virtual scenes. In global illumination, we take the geometric definition of a virtual scene and we simulate the propagation of light throughout the scene, modelling its visual and physical effects, such as shadows and reflections. Global illumination simulation has applications in all the areas where a realistic rendering is interesting, such as architecture, archeology, urban planning and computer-aided design.

The radiosity method is one of the methods used in global illumination simulation. In the radiosity method, we model the exchanges of energy between the objects of the scene in order to compute the radiant energy per unit area (or radiosity) on all the surfaces of all the objects in the scene. The radiosity can be used directly to display the objects of the scene, and the quality of the simulation is directly linked to the precision we have on the radiosity function.

The radiosity function is usually computed using finite element methods. The most efficient of these methods are hierarchical and use a multi-scale representation of the radiosity function6 to reduce the algorithmic complexity of the computations. Hierarchical methods have been extended with wavelets5. The simplest wavelet base is piecewise constant (Haar wavelets), but many other wavelet bases can be used in radiosity computations.

In theory, higher order wavelets provide a more compact representation of complex functions. Hence they use less memory and give a smoother representation of the function, which looks better on display. Higher order wavelets should be the ideal choice for radiosity computations.

In practice, the memory required to store the interactions between objects grows with the fourth power of the order of the wavelet base, prohibiting any real computation with complex wavelets. Furthermore, Haar wavelets allow many simplifications and optimisations that exploit their great simplicity. If these optimisations are kept with higher order wavelets, they can inhibit some of their properties. In one

† Institut National Polytechnique de Lorraine.
‡ INRIA Lorraine.
§ UMR no 7503 LORIA, a joint research laboratory between CNRS, Institut National Polytechnique de Lorraine, INRIA, Université Henri Poincaré and Université Nancy 2.

© The Eurographics Association and Blackwell Publishers 2000. Published by Blackwell Publishers, 108 Cowley Road, Oxford OX4 1JF, UK and 350 Main Street, Malden, MA 02148, USA.

F. Cuny, L. Alonso and N. Holzschuch / A novel approach makes higher order wavelets really efficient for radiosity

experimental study13 the practical problems of higher order wavelets were largely overcoming their theoretical benefits.

However, these practical problems are not inherent to higher order wavelets themselves, only to their implementation in the radiosity method. In this paper, we present a new approach to higher order wavelets, that is substantially different from previous implementations in several key areas, such as refinement oracle, link storage and resolution algorithm. Our approach has been developed by taking a complete look at higher order wavelets and at the way they should integrate with the radiosity method. With this implementation, we show that the theoretical advantages of higher order wavelets overcome the practical problems that have been encountered before. Higher order wavelets now provide a better approximation of the radiosity function, with faster convergence to the solution. They also require less memory for storage.

Our paper is organised as follows: in section 2, we review the previous research on wavelet radiosity and higher order wavelets. Then in section 3, we present our implementation, concentrating on the areas where it is substantially different from previous implementations: the refinement oracle, not storing the interactions, and the consequences this has on the resolution algorithm.

The main result that we present in this paper is the experimental study we have conducted on higher order wavelets with our implementation. Section 4 is devoted to this experimentation and its results, namely that higher order wavelets provide faster convergence and a solution of better quality, and require less memory for their computations. Finally, section 5 presents our conclusions and future areas of research.

2. Previous work

In this section we review the basis of the wavelet radiosity algorithm (section 2.1), then we present the implementation details of previous implementations for key areas of the algorithm (section 2.2): the refinement oracle, the visibility estimation and the memory problem. This review will help for the presentation of our own implementation of these areas, in section 3.

2.1. The wavelet radiosity algorithm

In the radiosity method, we try to solve the global illumination equation, restricted to diffuse surfaces with no participating media:

    B(x) = E(x) + ρ(x) ∫_S B(y) K(x, y) dy    (1)

Eq. 1 expresses the fact that the radiosity at a given point x in the scene, B(x), is equal to the radiosity emitted by x alone, E(x), plus the radiosity reflected by x, coming from all the other objects in the scene. K(x, y) is the kernel of the equation, and expresses the part of the radiosity emitted by point y that reaches x.

To compute the radiosity function, we use finite element methods. The function we want to compute, B(x), is first projected onto a finite set of basis functions φi:

    B̃(x) = ∑_i αi φi(x)    (2)

Our goal is to compute the best approximation of the radiosity function, given the set of basis functions φi. We must also find the optimal set of basis functions. A possibility is to use wavelets. Wavelets are mathematical functions that provide a multi-resolution analysis. They allow a multi-scale representation of the radiosity function on every object. This multi-scale representation can be used in the resolution algorithm6,5, allowing us to switch between different representations of the radiosity function, depending on the degree of precision required. This multi-scale resolution results in a great reduction of the complexity of the algorithm6.

There are two broad classes of resolution algorithms: gathering and shooting. In gathering, each patch updates its own radiosity function using the energy sent by all the other patches, whereas in shooting each patch sends energy into the scene, and all the other patches update their own radiosity. In both cases, the energy is carried along links, which are established by the wavelet radiosity algorithm and used to store the information related to the interaction. A key element of the wavelet radiosity algorithm is the refinement oracle, which tells which levels of the different multi-scale representations of radiosity should interact.

Finally, before each energy propagation, we must update the multi-scale representation of radiosity, so that each level contains a representation of all the energy that has been received by the object at all the other levels. This is done during the push-pull phase.

2.2. Details of previous implementations

2.2.1. Refinement oracles

The refinement oracle is one of the most important parts in hierarchical radiosity algorithms. Since it tells at which level the interaction should be established, it has a strong influence on both the quality of the radiosity solution and the time spent doing the computations. A poor refinement oracle will give poor results, or will spend a lot of time doing unnecessary computations.

In theory, the decision whether or not to refine a given interaction could only be taken with the full knowledge of the complete solution. However, the refinement oracle must take the decision using only the information that is locally available: the energy to be sent, and the geometric configuration of the sender and the receiver.
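The push-pull phase mentioned in section 2.1 can be sketched on a two-level hierarchy with piecewise-constant (Haar) radiosity. This is a minimal illustration under simplifying assumptions, not the paper's implementation: push adds the radiosity inherited from ancestors down to the leaves, and pull replaces each parent's value by the area-weighted average of its children, so every level ends up representing all the energy received at all levels.

```python
# Minimal sketch of the push-pull phase on a hierarchy of patches with
# piecewise-constant radiosity. Each node stores the radiosity gathered at
# its own level; after push_pull, node.B is consistent across levels.

class Node:
    def __init__(self, area, gathered, children=()):
        self.area = area
        self.gathered = gathered  # radiosity received at this level
        self.children = list(children)
        self.B = 0.0              # consistent radiosity after push-pull

def push_pull(node, inherited=0.0):
    down = inherited + node.gathered
    if not node.children:
        node.B = down                              # push: leaves accumulate
    else:
        total = sum(push_pull(c, down) * c.area for c in node.children)
        node.B = total / node.area                 # pull: area-weighted average
    return node.B
```

For example, two leaves of areas 1 and 3 that gathered 4 and 0, under a parent that gathered 2, end up with radiosities 6 and 2, and the parent holds their area-weighted average, 3.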


Given two patches in the scene, let us consider their interaction: patch s, with its current approximation of the radiosity function B̃s(y), is sending light toward patch r. Using a combination of eq. 1 and eq. 2, we can express the contribution of patch s to the radiosity of patch r:

    Bs→r(x) = ρ ∑_i αi ∫_s φi(y) K(x, y) dy    (3)

For the interaction between the two patches we will use the relationship coefficients, Cij:

    Bs→r(x) = ∑_j βj φj(x)
    βj = ∫_r Bs→r(x) φj(x) dx
    βj = ρ ∑_i αi ∫_r ∫_s φi(y) φj(x) K(x, y) dy dx
    βj = ρ ∑_i αi Cij

These Cij coefficients express the relationship between the basis functions φj(x) and φi(y). Computing the Cij requires the computation of a complex integral, which cannot be computed analytically and must be approximated, usually using quadratures.

In most current implementations, refinement oracles estimate the error on this approximation of the Cij. This error is then multiplied by the energy of the sender, to avoid refining interactions that are not carrying significant energy. There are several ways to estimate the error on the Cij coefficients: pure heuristics6, sampling the Cij at several sample points5, and a conservative method giving an upper bound on the propagation of the energy10,8.

A recurrent problem with current refinement oracles is that they concentrate on the Cij coefficients. This provides a conservative analysis, but it can be too cautious, especially with higher order basis functions. The Cij coefficients are usually bounded with constant functions and hence so is the radiosity function. Such a bound does not take into account the capacity of higher order wavelets to model rapidly varying functions in a compact way. To take this into account, we need to move the radiosity function inside the refinement oracle. In section 3.1, we present a refinement oracle that addresses this problem.

2.2.2. Visibility estimations

Discontinuities of the radiosity function and its derivatives are only caused by changes in the visibility between objects7. Therefore, great care must be taken when adding visibility information to the radiosity algorithm.

As we have seen, we use a quadrature to compute the Cij coefficients. This quadrature requires several estimates of the kernel function K(x, y) and therefore of the visibility between points x and y. Computing a visibility sample is much costlier than computing a kernel sample without visibility. As a consequence, estimating the visibility between two patches is the most costly operation in wavelet radiosity9. Several methods have been developed in order to provide a quick estimate of visibility, sometimes at the expense of reliability.

The easiest method6,5 assumes a constant visibility between the patches. The constant is equal to 1 for fully visible patches, 0 for fully invisible patches, and is in ]0, 1[ for partially visible patches. It is estimated by computing several jittered visibility samples between the patches and averaging the results.

Another method computes exact visibility between the corners of the patches, and interpolates between these values for points located between the corners, using barycentric coordinates.

Shadow masks16,11 have also been used in wavelet radiosity computations. In theory, shadow masks allow the decoupling of visibility from radiosity transport, and therefore a better compression of the radiosity transport operator, thus reducing the memory cost.

All these methods attempt to approximate visibility by computing fewer visibility samples than kernel samples, in order to reduce the cost of visibility in wavelet radiosity. According to an experimental study of wavelet radiosity conducted by Willmott13,14, the result is a poor approximation of the radiosity function, especially near shadow boundaries.

Another method is to compute exactly one visibility sample for each kernel sample. It has been used at least by Gershbein4, although it is not explicitly stated in his paper. According to our own experience, as well as Willmott's extended study14, this method gives better visual results. Furthermore, it gives more numerical precision. On the other hand, it can introduce some artefacts, because the visibility samples are forced to be in a regular pattern.

In our implementation, we used one visibility sample for each kernel sample, because we were looking for numerical accuracy, and because the artefacts are removed by our refinement oracle.

2.2.3. Memory usage

Since the computation of the Cij coefficients can be rather long, they are usually stored once they have been computed, so that they can be reused. The storage is done on the link between s and r.

An important problem with previous wavelet radiosity implementations is the memory required for this storage. If we use wavelet bases of order m, then we have m one-dimensional functions in the wavelet base. For two dimensions, such as the surfaces of objects in our virtual scene, we have m² functions in the base. As a consequence, storing the interaction between two patches requires computing and storing m⁴ Cij coefficients.
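The m⁴ growth can be made concrete with a two-line computation (illustrative only):

```python
# A wavelet base of order m has m basis functions in 1D and m**2 on a
# surface; a link couples every emitter basis function with every receiver
# basis function, hence m**2 * m**2 = m**4 coefficients per link.

def coefficients_per_link(m):
    return (m ** 2) * (m ** 2)
```

Haar (m = 1) stores a single coefficient per link; order-3 multi-wavelets store 81, consistent with the "almost two orders of magnitude" figure quoted below.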


Hence, the memory usage of wavelet radiosity grows with the fourth power of the order of the wavelet base used. Wavelets of order 3 will have a memory usage almost two orders of magnitude higher than wavelets with 1 vanishing moment. In an experimental study of wavelet radiosity, Willmott13 showed that this memory usage was effectively prohibiting any serious computation with higher order wavelets.

In 1998, Stamminger12 showed that it was possible to completely eliminate the storage of the interactions in hierarchical radiosity. His study was only made for hierarchical radiosity, but it could be extended to wavelet radiosity, and it would remove the worst problem of radiosity with higher order wavelets. In section 3.2, we review the consequences of not storing links on the wavelet radiosity algorithm.

for each interaction s → r:
    compute the radiosity function on the receiver: Bs→r(x)
    for each control point Pi
        compute the radiosity at this control point directly: Bs→Pi
        compare with the interpolated value, store the difference:
            δi = |Bs→r(Pi) − Bs→Pi|
    end for
    take the Ln norm of the differences: δB = ‖δi‖n
    compare with the refinement threshold
end for

Figure 1: Our refinement oracle

3. A novel approach to higher order wavelets in the radiosity algorithm

Since experimental studies conducted with previous implementations of wavelet radiosity have shown that higher order wavelets behave more poorly than Haar wavelets, we need to review the key points of our implementation that differ from previous implementations: the refinement oracle and getting rid of interaction storage, along with the consequences this has on the algorithm.

All the elements described in this section have been implemented and tested thanks to our radiosity testbed software, Candela15.

3.1. The refinement oracle

Instead of estimating the errors on the propagation coefficients, we estimate the error on the propagated energy directly. Our refinement oracle is quite similar to that of Bekaert2,3.

To estimate the errors on the radiosity function, we use control points on the receiver. These control points are located so that they provide meaningful information: they are different from quadrature points, and their number depends on the size of the receiver. Some of the control points are located on the boundary of the receiver, in order to ensure continuity with neighbouring patches.

Our refinement oracle is summed up in fig. 1. The radiosity values at the control points Bs→Pi are computed by direct integration of eq. 3 at point x = Pi, using a quadrature. To take the norm of the errors at the control points, we can use any norm, such as the L1 norm, the L2 norm, or the L∞ norm. We have found that all these norms give similar results for refinement.

3.2. Not using links and the consequences

In order to reduce the memory footprint of the radiosity algorithm, we have chosen not to store links, as in Stamminger12. Not storing links is a kind of trade-off between memory and time: by not storing links, we save memory. However, the information that has not been stored will probably have to be recomputed at some stage in the algorithm, which will cost time.

Not storing links also has consequences on the structure of the algorithm itself. The main consequence is on the choice between gathering and shooting.

– Gathering sends energy from all the patches to all the other patches at each iteration. All the links are used during a given iteration.
– Shooting sends the unshot energy from one patch to all the other patches. At a given point in time, only the links from the shooting patch to all the patches are being used. The shooting patch is then sent to the bottom of the shooting queue, and the links will not be re-used until it gets back to the top of the shooting queue.

Therefore, if we choose not to store links, it makes more sense to use shooting than to use gathering. But the reverse is also true: if you use shooting instead of gathering, it also makes more sense not to store links.

– In gathering, the optimal link distribution for one iteration can be computed by refining the link distribution from the previous iteration, because the radiosity gathered at one point can only grow with subsequent iterations.
– In shooting, the energy carried along the links is only the unshot energy at the shooting patch. Its distribution changes completely for each use of the patch. As a consequence, the optimal link distribution has no relation with the links computed for previous iterations.

4. Comparison of several wavelet bases

In this section, we present our experimental comparison of different wavelet bases. We start with a description of the experimentation protocol in section 4.1. We then present the results of our experiments in section 4.2. Discussion of these results and comparison with previous studies follows in section 4.3.

c The Eurographics Association and Blackwell Publishers 2000.

60
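The control-point test at the heart of this oracle can be sketched as follows. This is a minimal illustration, not Candela's actual code: the function name and the list-based representation of control-point values are ours.

```python
def needs_refinement(predicted, reference, threshold, norm="Linf"):
    """Decide whether an interaction must be refined, from radiosity
    values at the control points of the receiver.

    predicted -- values reconstructed from the current wavelet coefficients
    reference -- values B_{s->Pi} obtained by direct quadrature of eq. 3
    """
    errors = [abs(p - r) for p, r in zip(predicted, reference)]
    if norm == "L1":
        err = sum(errors) / len(errors)
    elif norm == "L2":
        err = (sum(e * e for e in errors) / len(errors)) ** 0.5
    else:  # L-infinity norm: worst error over the control points
        err = max(errors)
    return err > threshold
```

As noted in the text, the choice of norm makes little difference in practice; the threshold plays the role of the local error bound on each interaction.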
F. Cuny, L. Alonso and N. Holzschuch / A novel approach makes higher order wavelets really efficient for radiosity

4.1. The experimentation protocol

4.1.1. The wavelet bases

We wanted to use our implementation of wavelet radiosity for a comparison of several wavelet bases. We have used the first three multi-wavelet bases: M1 (Haar), M2 and M3.

We use the Mn multi-wavelets as they were previously defined1,5: the smoothing functions for Mn are defined by tensor products of the first n Legendre polynomials.

We have not used flatlet bases (Fn) because, although they have n vanishing moments, they are only piecewise constant, and therefore do not provide a better approximation than Haar wavelets with further refinement.

4.1.2. The test scenes

Our tests have been conducted on several test scenes, ranging from simple scenes, such as the blocker (see fig. 2(a)), to moderately complex scenes, such as the classroom (see fig. 2(d)). All our test scenes are depicted in fig. 2, with their number of input polygons.

4.1.3. Displaying the results

All the figures in this paper depict the exact results of the computations, without post-processing of any kind: the radiosity function is displayed exactly as it has been computed. Specifically, there has been no attempt to ensure continuity of the radiosity function, except in the refinement oracle. Similarly, we haven't balanced or anchored the computed mesh. So, for example, in fig. 4(c), the continuity of the radiosity function is due only to the refinement oracle described in section 3.1.

M3 wavelets can result in quadratically varying functions, which cannot be displayed on our graphics engines. To display these functions, we subdivide each patch into four sub-patches, on which we compute four linearly varying functions approximating the quadratically varying radiosity function.

4.1.4. Computing the error

In order to compute the computational error, we have computed a reference solution, using M2 wavelets, with a very small refinement threshold. Furthermore, the minimal patch area in the reference solution was 16 times smaller than the minimal patch area in the computed solutions. We also checked that, with all the wavelet bases, the computed solutions did converge to the reference solution.

We have measured the energetic difference between this reference solution and the computed solutions. In order to have comparable results on all our test scenes, this difference is divided by the total energy of the scene. It is this ratio of the energetic difference over the total energy that we call global error. Thus, a global error of 10−1 means there is an energetic difference of 10 % between the energetic distributions of the computed solution and the reference solution.

According to our experiments, this measure of global error is consistent, and gives comparable visual results on all the test scenes. For example, a global error of 10−1 will always give a poor result (see fig. 3(a)), a global error of 10−2 will give a better result, but still with visible artefacts at shadow boundaries (see fig. 3(b)), and a global error of 10−3 will always give a correct result (see fig. 3(c)). In our experience (see fig. 3), the global error must be lower than 5 × 10−3 in order to get visually acceptable results.

As has been pointed out12, we have also found that this global error is closely correlated to the refinement threshold on each interaction (the local error).

4.1.5. Experimentation details

In all our experiments, we have used the same computer, an SGI Octane working at 225 MHz, with 256 Mb of RAM.

4.2. Results

4.2.1. Visual comparison of our three wavelet bases

The first test to conduct is whether higher order wavelets give a better visual impression. In previous tests13, higher order wavelets were unable to provide a correct approximation of the radiosity function, especially near shadow boundaries. Shadow boundaries are very important because they have a large impact on the visual perception of the scene.

Our first experiment focuses solely on this problem. We have computed direct illumination from an area light source to a planar receiver, with an occluder partially blocking the exchange of light. All wavelet bases were used with the same computation time (66 s).

Fig. 4 shows the radiosity function computed for each wavelet base, along with the mesh used for the computation. Two elements appear clearly: higher order wavelets provide a much more compact representation of the radiosity function, even near shadow boundaries, and the radiosity function computed with M2 and M3 wavelets is smoother than the function computed with Haar wavelets.

Haar wavelets are usually not displayed as such, but with some sort of post-processing, such as Gouraud shading. Fig. 5 shows the result of applying Gouraud shading to fig. 4(a). As you can see, although it can hide some of the discontinuities, Gouraud shading can also introduce new artefacts.

Judging from fig. 4, higher order wavelets are better for radiosity computations than lower order wavelets. This is only a qualitative result and must be confirmed by quantitative studies; that is the object of sections 4.2.2 and 4.2.3.
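The global error of section 4.1.4 can be written down directly. This is a sketch: representing a solution as a flat list of per-patch energies is our simplification of the actual data structures.

```python
def global_error(computed, reference):
    """Global error: energetic difference between the computed and the
    reference solutions, divided by the total energy of the scene."""
    difference = sum(abs(c - r) for c, r in zip(computed, reference))
    total_energy = sum(reference)
    return difference / total_energy
```

A value of 10−1 then corresponds to a 10 % energetic difference between the two distributions, as stated in the text.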
Figure 2: Our test scenes, with their number of input polygons: (a) Blocker (3), (b) Tube (5), (c) Dining room (402), (d) Classroom (3153)

Figure 3: Visual comparison of results for different values of global error: (a) 10−1, (b) 10−2, (c) 10−3

Figure 4: Visual comparison of results for our three wavelet bases: (a) Haar, (b) M2, (c) M3
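As a reminder of how the Mn bases of section 4.1.1 are built, the following sketch evaluates their n² smoothing functions as tensor products of the first n Legendre polynomials. Normalisation constants are omitted, and the function names are ours.

```python
def legendre(n, x):
    """Legendre polynomial P_n on [-1, 1], via Bonnet's recurrence:
    (k+1) P_{k+1}(x) = (2k+1) x P_k(x) - k P_{k-1}(x)."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def mn_basis(n, u, v):
    """The n^2 smoothing functions of the M_n multi-wavelet basis at
    parametric point (u, v) in [0,1]^2: tensor products of the first n
    Legendre polynomials (normalisation omitted)."""
    x, y = 2 * u - 1, 2 * v - 1  # map the patch parameters to [-1, 1]
    return [legendre(i, x) * legendre(j, y)
            for i in range(n) for j in range(n)]
```

For n = 1 this degenerates to the constant Haar smoothing function; M2 and M3 add the linearly and quadratically varying products.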
Figure 5: Applying Gouraud shading to Haar wavelets

Figure 6: Global error with respect to computation time (in s), for the Haar, M2 and M3 bases: (a) Blocker, (b) Tube, (c) Dining room, (d) Classroom
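Section 4.2.2 summarises the curves of fig. 6 as "areas of competence". A hypothetical helper picking a basis from a target global error, using the thresholds reported there, could look like this:

```python
def best_basis(target_global_error):
    """Choose the wavelet basis whose 'area of competence' contains the
    requested global error level (thresholds from the experiments of
    section 4.2.2)."""
    if target_global_error >= 1e-1:
        return "Haar"  # crude simulations, many artefacts still visible
    if target_global_error >= 5e-3:
        return "M2"    # moderate precision
    return "M3"        # high precision, visually acceptable results
```

The exact crossover points depend on the scene; the values above are only the rough boundaries observed in these experiments.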
4.2.2. Computation time

Fig. 6 shows the relationship between global error and computation time for our four test scenes and our three wavelet bases.

The most important point that can be extracted from these experimental data is that, with our implementation, higher order wavelets perform better than lower order wavelets. They obtain results of higher quality, and they are faster: to get a visually acceptable result on the classroom scene (global error below 5 × 10−3), M3 wavelets use 104 s (see fig. 6(d)). In the same computation time, Haar wavelets only reach a global error level of 10−2. This test scene is our hardest, with lots of shadow boundaries. It is on such test scenes that higher order wavelets behaved poorly in previous experimentations13.

The advantage of higher order wavelets is more significant on high precision computations and on complex scenes. The more precision you need in your computations, the faster they are, compared to lower order wavelets.

On the contrary, for quick approximations, M2 wavelets perform better than M3 wavelets. The same applies to Haar wavelets compared to M2 wavelets, for very quick and crude approximations.

Each wavelet base has an area of competence, where it outperforms all the other wavelet bases: Haar wavelets are the most efficient base for global error above 10−1, which corresponds to a simulation with many artefacts still visible (see fig. 3(a)). M2 wavelets are better than all the other bases for global error between 10−1 and (roughly) 5 × 10−3, and M3 wavelets are the best for global error below 5 × 10−3.

4.2.3. Memory use

The key problem with higher order wavelets in previous studies13 was their high memory use, which effectively prohibited any real computation. We have computed the memory footprint of our implementation of wavelets for our four test scenes and our three wavelet bases. Fig. 7 shows the memory used by the algorithm as a function of the global error.

As you can see, for high precision computations (global error below 5 × 10−3), higher order wavelets actually have a lower memory use than low order wavelets. The effect is even more obvious on our more complex scenes (see fig. 7(c) and 7(d)).

On the other hand, for low precision computations, this hierarchy is reversed, and Haar and M2 wavelets have a lower memory use. Once again, each wavelet base has an area of competence, where it outperforms all the other wavelet bases. For very crude approximations, Haar wavelets are the most efficient with respect to memory use; then, for moderately good approximations, M2 wavelets are the most efficient, until M3 takes over for really good approximations.

A very impressive result is the way the memory cost of a given wavelet base degrades quickly if we try to bring the global error level below a certain threshold. This effect appears very clearly in fig. 7(c) and 7(d). There seems to be a maximum degree of precision for each wavelet base, and the wavelet base can only conduct global illumination simulations below this degree. Be aware, however, that the degradation is made more impressive in fig. 7 by the fact that we are using a logarithmic scale for global error and a non-logarithmic scale for memory use. Furthermore, the degradation is quite small when compared to the total memory used: between 10 % and 20 %. Since the effect appears in a similar way for all the wavelet bases used in the test, we think it could be a general effect, applying to all wavelet bases.

Please note that the fact that higher order wavelets have a lower memory use than lower order wavelets is actually quite logical. Higher order wavelets provide a more powerful tool for approximating complex functions, with a higher dimensional space for the approximation. Furthermore, they have more vanishing moments, so their representation of a given complex function is more compact and requires fewer coefficients. Our experiments therefore bring practical results in connection with theoretical expectations.

The fact that lower order wavelets are more compact for low precision computations was also to be expected from theory. Low precision computations are, by nature, not taking into account all the complexity of the radiosity function. As a consequence, they provide a very simple function, which is also easy to approximate, especially for simple wavelet bases.

4.3. Discussion and comparison with previous studies

Despite the fact that we reach opposite conclusions, we would like to point out that our study is actually consistent with the previous study by Willmott13,14.

In Willmott's study, higher order wavelets were carrying a strong memory cost, due to link storage. As a consequence, radiosity computations with higher order wavelets were restricted to low precision computations. According to our experiments, for low precision computations, lower order wavelets are indeed providing a faster approximation, with a lower memory use.

Our study can therefore be seen as an extension of Willmott's study to high precision computations. Such high precision computations were made possible only by getting rid of links12. Once link storage is eliminated, the memory cost of the radiosity algorithm is almost reduced to the cost of mesh storage. The refinement oracle (see section 3.1) ensures that the mesh produced is close to optimal with respect to the radiosity on the surfaces.

Also, by concentrating the oracle on the mesh instead of the interactions, we are able to exploit the power of wavelet basis functions to efficiently approximate functions. This results in a coarser mesh, both at places where the radiosity
function has slow variations, such as an evenly lit wall, and at places with rapid variations, such as shadow boundaries.

Figure 7: Memory requirements (in kB) with respect to global error, for the Haar, M2 and M3 bases: (a) Blocker, (b) Tube, (c) Dining room, (d) Classroom

5. Conclusion and future work

We have presented an implementation of wavelet bases in the radiosity algorithm. With this implementation, we have conducted experiments on several wavelet bases. Our experiments show that for high precision computations, higher order wavelets provide a better approximation of the radiosity function, faster, and with a lower cost in memory. Please note that our implementation does not put any disadvantage on lower order wavelets; for Haar wavelets, our refinement oracle only uses a few tests and the visibility estimation only requires one visibility test. Similarly, the benefit of not storing links is independent of the wavelet base.

Although in this paper we have only conducted tests on relatively small test scenes (up to 3000 input polygons), our implementation (Candela15) enables us to use higher order wavelets on arbitrarily large scenes. Fig. 8 shows a radiosity computation with M2 wavelets made with our implementation on a scene with 144255 input polygons. The computations took 3 hours, and required approximately 2 Gb of memory on 32 processors of an SGI Origin 2000. The complete solution had approximately 1.5 million patches.

The optimal choice for radiosity computations depends on the degree of precision required. Lower order wavelets are better for low precision computations, and higher order wavelets are better for high precision computations. Each wavelet base corresponds to a certain degree of precision, where it outperforms all the other wavelet bases, both for the computation time and the memory footprint. Although
our computations have been limited to Haar, M2 and M3 wavelets, we think that this effect applies to all the other wavelet bases, such as M4, M5..., and that for even more precise computations, M4 would outperform M3, and so on.

However, for moderately precise computations, M2 wavelets are quite sufficient. The precision level that corresponds, in our experience, to visually acceptable results is at the boundary between the areas of competence of M2 and M3, so M2 wavelets can be used. M2 wavelets also have a distinct advantage over all the other wavelet bases: they result in linearly varying functions that can be displayed directly on current graphics hardware (using Gouraud shading), as opposed to constant, quadratic or cubic functions.

In our future work, we want to explore the possibility of using several different wavelet bases in the resolution process. In this approach, it would be possible to use Haar wavelets for interactions that do not require a lot of precision, such as interactions that do not carry a lot of energy, and M2, and perhaps M3, M4..., wavelets for interactions that require a high precision representation. We think that this approach could be especially interesting with shooting, since the first interactions will carry a lot of energy, while later interactions will only carry a small quantity of energy.

We also want to explore the possibility of using higher order wavelets on non-planar objects. Since they have a better ability to model rapidly varying radiosity functions, they seem to be the ideal choice for curved surfaces, such as spheres or cylinders.

6. Acknowledgements

The authors would like to give very special thanks to Jean-Claude Paul. It was his insight that started this work on higher order wavelets, and it was his advice and support that ensured its success.

References

1. B. Alpert, G. Beylkin, R. Coifman, and V. Rokhlin. Wavelet-like bases for the fast solution of second-kind integral equations. SIAM Journal on Scientific Computing, 14(1):159–184, January 1993.

2. Philippe Bekaert and Yves Willems. Error Control for Radiosity. In Rendering Techniques '96 (Proceedings of the Seventh Eurographics Workshop on Rendering), pages 153–164, New York, NY, 1996. Springer-Verlag/Wien.

3. Philippe Bekaert and Yves D. Willems. Hirad: A Hierarchical Higher Order Radiosity Implementation. In Proceedings of the Twelfth Spring Conference on Computer Graphics (SCCG '96), Bratislava, Slovakia, June 1996. Comenius University Press.

4. Reid Gershbein. Integration Methods for Galerkin Radiosity Couplings. In P. M. Hanrahan and W. Purgathofer, editors, Rendering Techniques '95 (Proceedings of the Sixth Eurographics Workshop on Rendering, Dublin, Ireland, June 12–14, 1995), pages 264–273, New York, NY, 1995. Springer-Verlag.

5. Steven J. Gortler, Peter Schröder, Michael F. Cohen, and Pat Hanrahan. Wavelet Radiosity. In Computer Graphics Proceedings, Annual Conference Series, 1993 (ACM SIGGRAPH '93 Proceedings), pages 221–230, 1993.

6. Pat Hanrahan, David Salzman, and Larry Aupperle. A Rapid Hierarchical Radiosity Algorithm. In Computer Graphics (ACM SIGGRAPH '91 Proceedings), volume 25, pages 197–206, July 1991.

7. Paul Heckbert. Discontinuity Meshing for Radiosity. In Third Eurographics Workshop on Rendering, pages 203–226, Bristol, UK, May 1992.

8. Nicolas Holzschuch and François X. Sillion. An exhaustive error-bounding algorithm for hierarchical radiosity. Computer Graphics Forum, 17(4):197–218, December 1998.

9. Nicolas Holzschuch, François Sillion, and George Drettakis. An Efficient Progressive Refinement Strategy for Hierarchical Radiosity. In Fifth Eurographics Workshop on Rendering, pages 343–357, Darmstadt, Germany, June 1994.

10. Dani Lischinski, Brian Smits, and Donald P. Greenberg. Bounds and Error Estimates for Radiosity. In Computer Graphics Proceedings, Annual Conference Series, 1994 (ACM SIGGRAPH '94 Proceedings), pages 67–74, 1994.

11. Philipp Slusallek, Michael Schröder, Marc Stamminger, and Hans-Peter Seidel. Smart Links and Efficient Reconstruction for Wavelet Radiosity. In P. M. Hanrahan and W. Purgathofer, editors, Rendering Techniques '95 (Proceedings of the Sixth Eurographics Workshop on Rendering), pages 240–251, New York, NY, 1995. Springer-Verlag.

12. M. Stamminger, H. Schirmacher, P. Slusallek, and H.-P. Seidel. Getting rid of links in hierarchical radiosity. Computer Graphics Journal (Proc. Eurographics '98), 17(3):C165–C174, September 1998.

13. Andrew Willmott and Paul Heckbert. An empirical comparison of progressive and wavelet radiosity. In Julie Dorsey and Philipp Slusallek, editors, Rendering Techniques '97 (Proceedings of the Eighth Eurographics Workshop on Rendering), pages 175–186, New York, NY, 1997. Springer Wien. ISBN 3-211-83001-4.

14. Andrew J. Willmott and Paul S. Heckbert. An empirical comparison of radiosity algorithms. Technical Report CMU-CS-97-115, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, April 1997. Available from http://www.cs.cmu.edu/~radiosity/emprad-tr.html.

15. Christophe Winkler. Expérimentation d'algorithmes de calcul de radiosité à base d'ondelettes. Thèse d'université, Institut National Polytechnique de Lorraine, 1998.

16. Harold R. Zatz. Galerkin Radiosity: A Higher Order Solution Method for Global Illumination. In Computer Graphics Proceedings, Annual Conference Series, 1993 (ACM SIGGRAPH '93 Proceedings), pages 213–220, 1993.
Figure 8: Radiosity computation on a large scene (with M2 wavelets)
CHAPITRE 2. MODÉLISATION MULTI-ÉCHELLES DE L'ÉCLAIRAGE

2.7.5 Combining higher-order wavelets and discontinuity meshing: a compact representation for radiosity (EGSR 2004)

Authors: Nicolas Holzschuch and Laurent Alonso
Conference: Eurographics Symposium on Rendering, Linköping, Sweden
Date: June 2004
Eurographics Symposium on Rendering (2004)
H. W. Jensen, A. Keller (Editors)

Combining Higher-Order Wavelets and Discontinuity Meshing: a Compact Representation for Radiosity

N. Holzschuch1 and L. Alonso2

1 ARTIS† GRAVIR-IMAG - INRIA    2 ISA/LORIA‡ - INRIA

Abstract
The radiosity method is used for global illumination simulation in diffuse scenes, or as an intermediate step in other methods. Radiosity computations using higher-order wavelets achieve a compact representation of the illumination on many parts of the scene, but are more expensive near discontinuities, such as shadow boundaries. Other methods use a mesh based on the set of discontinuities of the illumination function. The complexity of this set of discontinuities has so far proven prohibitive for large scenes, mostly because of the difficulty of robustly managing a geometrically complex set of triangles. In this paper, we present a method for computing radiosity that uses higher-order wavelet functions as a basis, and introduces discontinuities only when they simplify the resulting mesh. The result is displayed directly, without post-processing.

Categories and Subject Descriptors (according to ACM CCS): I.3.7 [Three-Dimensional Graphics and Realism]; I.3.5 [Computational Geometry and Object Modeling]
1. Introduction

The radiosity method is a finite element method used for simulating light exchanges between diffuse surfaces. As such, it is used either for computing global illumination in diffuse scenes or as an intermediate step in other global illumination methods. Although other rendering methods, such as Bi-Directional Path Tracing or Photon Mapping, are highly popular because they account for light exchanges between specular surfaces, many people still use radiosity because it offers the possibility of moving the viewpoint in real-time after the illumination computations.

However, radiosity methods are difficult to manage. The quality of the output is not always visually correct, and the memory cost of the algorithm can be quite high, since it needs to store a complete representation of the illumination on all objects in the scene. Hierarchical methods are used nowadays to reduce storage costs and computation time, and among them wavelet methods have proven interesting. Using higher-order wavelets as the basis functions, it is possible to approximate smoothly varying illumination with a small number of patches, reducing the memory cost.

A constant problem with basic radiosity methods is their misbehaviour near shadow boundaries. As most of these methods use an axis-aligned hierarchical grid for their finite-element computations, they miss discontinuities that are not aligned with the grid. Solving this problem requires either using a finer grid size near shadow boundaries or using finite elements aligned with the shadow boundaries, called discontinuity meshing. The former method increases the memory cost, while the latter provides a good quality reconstruction of illumination; but the discontinuities are complex, and managing them in a robust and efficient way is still a research problem.

Since most higher-order wavelets are defined as tensor products of 1D basis functions, they are only properly defined over parallelogram patches. As a consequence, they are in theory incompatible with discontinuity meshing, which produces complex polygons.

In this paper, we present an algorithm that combines higher-order wavelets with discontinuity meshing. We use

† ARTIS is a research project in the GRAVIR/IMAG laboratory, a joint unit of CNRS, INPG, INRIA and UJF.
‡ LORIA is a joint unit of CNRS, INPL, INRIA, UHP and Université Nancy 2.

© The Eurographics Association 2004.
N. Holzschuch and L. Alonso / Combining Higher-Order Wavelets and Discontinuity Meshing for Radiosity

wavelets, defined on a regular subdivision in places where they provide a good approximation, and we introduce discontinuities only in places where they reduce the complexity of the mesh. This selection of effective discontinuities is done during the refinement process, by the refinement oracle. The mesh produced is still a regular grid, but some of its patches are cut by discontinuities.

We use a fragment program to display quadric wavelets directly. We display the results of our illumination computations immediately, without post-processing or final gather. We exploit the fact that higher-order wavelets with the proper refinement oracle result in apparently continuous functions after reconstruction, even in the absence of a specific step to enforce this continuity.

This paper is organised as follows: in the following section, we review previous work on hierarchical (or wavelet) radiosity and discontinuity meshing, as well as on integrating them. Then, in section 3 we present our algorithm, and in section 4 we present results and pictures from our experimentations. Finally, we conclude and expose future research directions.

2. Previous Work

The radiosity method was first introduced for global illumination simulations by Goral et al. in 1984 [GTGB84]. It uses a finite element formulation of the rendering equation [KH84] for diffuse scenes, and gets a complete representation of global illumination. The radiosity method was later extended using a hierarchical formulation of the finite element method [HS92, HSA91]. The hierarchical representation limits the complexity of the radiosity algorithm to O(n) instead of O(n²). This hierarchical formulation was later extended using a wavelet framework [GSCH93, SGCH93].

It is possible to use wavelets of different orders (piecewise-constant basis, piecewise-linear basis, piecewise-polynomial basis). Early implementations of higher-order wavelets proved inefficient [WH97], until a complete analysis of the wavelet radiosity algorithm [CAH00] showed that with the right implementation, a good refinement oracle [BW96] and efficient memory management [SSSS98], they were actually more interesting than hierarchical piecewise-constant basis functions for global illumination simulations, with smaller memory costs and shorter computation times.

Piecewise polynomial wavelets are more costly for each patch of the finite element formulation, requiring (k + 1)² coefficients for a wavelet basis made of polynomials of degree k. But they provide a better approximation of the illumination function, resulting in a smaller number of patches. The study by Cuny et al. [CAH00] showed that most of the time, the reduction in the number of elements more than compensates for the extra cost of each element, allowing a faster computation of radiosity and a smaller memory cost.

However, many scenes on which we wish to compute global illumination exhibit sharp discontinuities of the illumination function, for example shadows caused by point light sources or small area light sources, or shadows caused by occluders that are close to the receiver. Regular hierarchical bases of continuous polynomials are unable to model such discontinuities. In the presence of these discontinuities, most radiosity algorithms refine the hierarchy a lot, using very small patches to approximate the radiosity function. The result is that the number of patches used near the discontinuity is roughly independent of the order of the basis function. Since each patch stores (k + 1)² coefficients, wavelet bases of higher-order polynomials end up being more costly at these discontinuities.

Discontinuities of the radiosity function can be computed using geometrical methods [LTG92, Hec92, DF94, SG94, GS96, DDP02]. An adaptive mesh based on these discontinuities provides a better approximation of the radiosity function [LTG92, Hec92]. Radiosity methods based on the discontinuity mesh have been proposed, either with classical radiosity [LTG92, Hec92, Stu94, DF94] or with hierarchical radiosity [LTG93, DS96, DDP99]. All these methods start with the complete set of discontinuities, triangulating it and refining it as necessary. The entire set of discontinuities is quite large, giving a very complex mesh as a starting point. Managing this mesh proves complicated, and the associated memory cost is not negligible. [DS96] used a regular mesh for visible areas, but kept the triangulated set of discontinuities for penumbra regions.

Several of the discontinuities in the discontinuity mesh are not visible in the radiosity function. Simplifications of the discontinuity mesh have been suggested [DF94, HWP97, Hed98]. But as they compute discontinuities before the illumination computations, this selection uses only geometrical tools and has no access to illumination information.

In our algorithm, however, we use a regular subdivision as often as possible, and we only introduce discontinuities if they result in a simpler mesh. Significant discontinuities are thus naturally selected during the hierarchical refinement process.

A single paper has used discontinuity meshing together with wavelet radiosity with higher-order basis functions [PB95]. Their study is quite complete, but they used only very simple scenes for their tests: a single patch with a single discontinuity. As a consequence, they could not identify several problems that only occur in larger scenes, such as intersecting discontinuities or the cost of computing push-pull coefficients: their method would not scale to much bigger scenes. They tried to merge wavelets with discontinuities by finding a wavelet-compatible parametrization of the patch that followed the discontinuity. This causes a complex computation of push-pull coefficients for each hierarchical level. Also, building such a parametrization is not always possible, in the case of intersecting discontinuities. Finally, their approach does not address the problem of managing the set of discontinuities. Our algorithm, by contrast, keeps the same parametrization for all patches in the hierarchy, making it easy to compute push-pull coefficients. We can deal with multiple discontinuities and intersecting discontinuities. Each discontinuity inserted in the hierarchy is treated only at its hierarchical level.
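The parallelogram requirement mentioned in the introduction comes from the tensor-product construction: basis functions live in the (u, v) parameter space of the patch, which an affine map carries onto a parallelogram. A minimal sketch, where simple monomials stand in for the actual Legendre products and all names are ours:

```python
def parallelogram_point(origin, e1, e2, u, v):
    """Map parametric (u, v) in [0,1]^2 to the point origin + u*e1 + v*e2
    of a parallelogram patch. Tensor-product bases are evaluated in
    (u, v), which is why they need a parallelogram support."""
    return tuple(o + u * a + v * b for o, a, b in zip(origin, e1, e2))

def tensor_basis(u, v, degree=1):
    """Tensor products of 1D monomials up to `degree` (a stand-in for
    the Legendre products used by actual multi-wavelet bases)."""
    return [(u ** i) * (v ** j)
            for i in range(degree + 1) for j in range(degree + 1)]
```

A general polygon produced by discontinuity meshing admits no such single affine parametrization, which is the incompatibility the algorithm of section 3 works around.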
3. Algorithm

In this section, we present our algorithm for merging radiosity using higher-order wavelets with discontinuity meshing. We start with a short summary of the hierarchical radiosity algorithm, and how it has been adapted to higher-order wavelets (section 3.1). Then we present our algorithm for merging the wavelet bases with discontinuities (section 3.2). Some finer points of the implementation are explained in section 3.3.

3.1. Wavelet Radiosity

3.1.1. The Hierarchical Radiosity Algorithm

In wavelet radiosity, each surface of the scene carries a hierarchical representation of illumination, using the wavelet basis. This representation is computed iteratively, through

for display. It is done during the push-pull step. The push-pull step is a recursive procedure, where parent nodes add their energy to their children, and the children's energy is collected in each parent and averaged.

3.1.2. Using Higher-Order Wavelets

Using higher-order wavelets, such as multi-wavelets (M2 and M3) [Alp93], does not change the algorithm, except in these details:

• each patch carries a wavelet representation of the radiosity function. The Mn basis is made of polynomials of degree n − 1, so each patch has n² basis functions and stores n² coefficients;
• the interaction between two patches implies computing the influence that each wavelet coefficient on the shooting patch has on every wavelet coefficient on the receiving patch. Each of these influence coefficients is expressed as an integral, which is approximated using quadratures. As there are n² coefficients on each patch, we must evaluate n⁴ integrals;
• the push-pull step implies computing the influence that each wavelet coefficient on the parent patch has on every wavelet coefficient on the children patches, and reciprocally. These influences are also expressed as integrals. These integrals only depend on the respective geometry of the parent and children patches in the hierarchy. For
three essential steps: a regular subdivision, the push-pull coefficients are there-
• refinement of interactions, fore constant on the hierarchy, and are pre-computed. For
• propagation of energy, irregular subdivision, the push-pull coefficients must be
• push-pull. recomputed at each level, a potentially costly step.
• As 2D wavelets are usually defined as tensor-products
At the beginning of the algorithm, we select the surface of 1D wavelets, they are only defined over a parallelo-
with the largest unshot energy, and indentify all surfaces that gram. Researchs have shown how to extend this definition
are potentially visible from it. We establish interactions be- for complex planar surfaces [HCA00] and for parametric
tween the shooting surface and all these receiving surfaces. curved surfaces [ACP∗ 01].
• Given the large number of coefficients for each interaction
We then refine these interactions, in a hierarchical manner.
(n4 , as much as 81 coefficients for polynomials of degree
At each point in time, we consider the current multi-scale
2), it is important to avoid storing them. Once we have
representation of the interaction, and check whether it is ac-
treated an interaction, we delete all its coefficients. This
curate enough, according to the refinement oracle. If not, we
strategy can result in computing the same interaction co-
refine the interaction, by subdividing either the shooting sur-
efficients twice, but the gain in memory largely offsets the
face or the receiver.
potential loss in time [SSSS98, CAH00].
Once we are satisfied with the level of precision on the
interaction, we propagate the energy by sending the unshot
3.2. Combining Wavelets and Discontinuity Meshing
energy of the shooting surface to the receivers, updating the
wavelet coefficients on the receivers. 3.2.1. The algorithm
After these steps, the unshot energy of the shooting sur- Our algorithm works as follows:
face is set to zero, and we pick the surface with the largest
• For each shooting surface, for each receiving surface, we
unshot energy as the next shooting surface.
compute the set of discontinuities on the receiving sur-
After the propagation, the different levels of the hierarchy face.
on each surface have received energy, but there isn’t a con- • We proceed with the usual refinement of interaction, using
sistent representation of the energy received at all hierarchi- the oracle and a regular subdivision.
cal levels. This representation must be reconstructed before • When the refinement oracle identifies that the interaction
we can use the hierarchical representation for shooting or should be subdivided only because of a discontinuity, it

c The Eurographics Association 2004.

71
N. Holzschuch and L. Alonso / Combining Higher-Order Wavelets and Discontinuity Meshing for Radiosity

Figure 1: A patch cut by a discontinuity (a) results in two child patches. For each child patch, we identify the enclosing parallelogram (b). We conduct standard wavelet radiosity on each parallelogram (c).

introduces a discontinuity-based subdivision instead of a regular subdivision.
• Discontinuity-based subdivision works by:
– computing the intersection of the current patch with the discontinuity;
– for each part of the subdivided patch, identifying the smallest parallelogram that encloses it;
– applying our radiosity algorithm using a regular subdivision over each parallelogram (see Figure 1).
• Once we are satisfied with the level of refinement for this interaction, we propagate the energy, then erase the discontinuities and the interaction coefficients. Discontinuities that have not been used for subdivision are forgotten.

We want to use the regular subdivision as much as possible, for its robustness and simplicity. Our algorithm only introduces discontinuities if they are considered important by the refinement oracle. Smooth transitions that can be properly approximated by the wavelet basis will not be introduced in the hierarchy.

In the following paragraphs, we review each step of this algorithm in detail: the refinement oracle, discontinuity-based subdivisions, push-pull over a discontinuity, and the intersection of discontinuities.

3.2.2. Refinement oracle and selection of discontinuities

We use the refinement oracle described in previous publications [BW96, CAH00]: for each patch, we select testing points, where we compute radiosity directly. The values computed are compared with the values obtained using the wavelet basis. If the norm of the differences is above the refinement threshold, the oracle concludes that we should refine.

This oracle works well, especially if the testing points are chosen with a good heuristic. By putting some of the testing points on the boundaries of the patches, we have found that we obtain a representation of radiosity that looks continuous without having to ensure this continuity in post-processing (see [CAH00] and Figure 2 for an example using M3 wavelets).

In our algorithm, we do two computations of the refinement oracle: one with standard visibility computations, and one assuming full visibility. If their results differ, visibility is the only reason for subdivision and we introduce a discontinuity-based subdivision.

Subdivisions are thus only introduced in the hierarchy if they actually cancel further refinements on at least one side, resulting in a more compact hierarchy. For point light sources, introducing a subdivision generates a coarse mesh on both sides of the subdivision (see Figure 2(a)). For area light sources, introducing subdivisions creates a coarse mesh in the fully lit areas and in the umbra, while the penumbra is more refined (see Figure 2(b)).

For stability and robustness, a discontinuity is introduced only if the intersection between the discontinuity and the current patch is simple enough. Thus our algorithm only has to manage simple patches and surfaces. For complex occluders casting a combination of simple and complex discontinuities, only the simple discontinuities are introduced in the mesh (see Figure 11).

In our implementation, we have used the following criteria for selecting simple discontinuities: at least one of the patches resulting from the discontinuity-based refinement must be convex, and the number of vertices in each polygon must remain below a certain threshold.

3.2.3. Discontinuity-based subdivisions

Once we have selected a patch for discontinuity-based subdivision, we compute the intersection between the patch and the discontinuities, resulting in two separate patches. Most of the time, these sub-patches are neither parallelograms nor triangles. For each of the sub-patches, we build the smallest enclosing parallelogram (see Figure 1). We then use these enclosing parallelograms instead of the patches in the radiosity algorithm, as we would use standard patches:
• For radiosity reception, the enclosing parallelogram is


Figure 2: M3 (quadratic) wavelets with discontinuity meshing on simple scenes. (a) Point light source. (b) Area light source.

treated as a standard receiver. It is subdivided normally, using regular subdivision.
• For radiosity emission, only the actual sub-patch is allowed to emit radiosity; other parts of the enclosing parallelogram are not allowed to emit. Following previous research [HCA00], we do this through the quadrature weights, during the computation of the Gaussian quadratures. We see each quadrature weight as the representative of an area of influence for the quadrature point (see Figure 3). We modulate the quadrature weight by the percentage of this area of influence that is inside the actual sub-patch.
• For push-pull, we use the standard push-pull coefficients, since we have a standard subdivision.

3.2.4. Push-Pull Coefficients over a discontinuity

On most steps of the radiosity algorithm, our method uses classical methods. The main difference lies in the push-pull step over the discontinuity.

The enclosing parallelograms of the child patches are overlapping, and we need the push-pull step to compensate for this. Let us assume a patch p has been subdivided into two child patches p1 and p2. The child patches p_i are enclosed in parallelograms e_i. Each of the patches has its own set of wavelet basis functions: φ_j on p, φ^i_j on e_i. The radiosity function is expressed as:

$$B_p(x) = \sum_j \alpha_j \phi_j(x) \qquad B_{e_i}(x) = \sum_j \alpha^i_j \phi^i_j(x)$$

3.2.4.1. Push Coefficients: For the push step, we need to project B_p on the basis functions of the children e_i. The wavelet coefficients of the projection will be added to the wavelet coefficients on each child e_i. Since, on each patch, the wavelet functions form an orthonormal basis, wavelet coefficients are expressed as the scalar product of the radiosity function with the basis functions:

$$\alpha^i_j = \langle B_{e_i} \,|\, \phi^i_j \rangle_i$$

where the subscript i on the dot product expresses the fact that the integration takes place on e_i. We are looking for the contribution of B_p to the α^i_j, push^i_j:

$$\mathrm{push}^i_j = \langle B_p \,|\, \phi^i_j \rangle_i = \Big\langle \sum_k \alpha_k \phi_k \,\Big|\, \phi^i_j \Big\rangle_i = \sum_k \alpha_k \langle \phi_k \,|\, \phi^i_j \rangle_i = \sum_k \alpha_k C^{ij}_k$$

The push coefficients, C^{ij}_k, only depend on the basis functions and on the relative geometry of p and e_i. We have an integral expression for the push coefficients:

$$C^{ij}_k = \langle \phi_k \,|\, \phi^i_j \rangle_i = \int_{e_i} \phi_k(x)\,\phi^i_j(x)\,dx$$

3.2.4.2. Pull coefficients: For the pull step, we need to combine together the radiosity functions on the patches p_i, and express this radiosity in the wavelet basis of patch p. As the e_i patches are overlapping, we restrict the definition of B_{e_i} to its support. We use the characteristic function of e_i, δ_{e_i}, defined as being equal to 1 on e_i and 0 everywhere else.

Combining together the radiosity functions computed on the children gives us:

$$B_{e_1}(x)\delta_{e_1}(x) + B_{e_2}(x)\delta_{e_2}(x) = \sum_i \sum_j \alpha^i_j \phi^i_j(x)\,\delta_{e_i}(x)$$

The pull step projects this combined function on the wavelet basis for p:

$$\mathrm{pull}_k = \sum_i \sum_j \alpha^i_j \langle \phi^i_j \delta_{e_i} \,|\, \phi_k \rangle = \sum_i \sum_j \alpha^i_j D^i_{jk}$$

The pull coefficients, D^i_{jk}, depend on the geometry of the subdivision:

$$D^i_{jk} = \langle \phi^i_j \delta_{e_i} \,|\, \phi_k \rangle = \int_p \phi^i_j(x)\,\delta_{e_i}(x)\,\phi_k(x)\,dx$$
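The push and pull coefficients above are plain integrals of basis-function products, so they are naturally evaluated with Gaussian quadrature. The sketch below is a minimal 1D illustration, our own code rather than the paper's implementation, assuming the orthonormal Legendre scaling functions that underlie the M2 basis: it computes the push coefficients between a parent interval and a child interval, then checks that pushing a linear radiosity function down to the child reproduces it exactly, as expected for a polynomial basis.

```python
import numpy as np

# Orthonormal M2 scaling functions (constant + linear Legendre) on [a, b].
def basis(a, b):
    h = b - a
    return [lambda x, a=a, h=h: np.ones_like(x) / np.sqrt(h),
            lambda x, a=a, h=h: np.sqrt(3.0 / h) * (2.0 * (x - a) / h - 1.0)]

# C[k, j] = <phi_k^parent | phi_j^child>, integrated over the child support
# with Gauss-Legendre quadrature (exact here: the integrand is polynomial).
def push_coefficients(parent, child, n_quad=8):
    a, b = child
    x, w = np.polynomial.legendre.leggauss(n_quad)
    x = 0.5 * (b - a) * (x + 1.0) + a     # nodes mapped from [-1, 1] to [a, b]
    w = 0.5 * (b - a) * w
    P, Q = basis(*parent), basis(*child)
    return np.array([[np.sum(w * p(x) * q(x)) for q in Q] for p in P])

parent, child = (0.0, 1.0), (0.0, 0.5)
C = push_coefficients(parent, child)

# f(x) = x expressed on the parent basis; push its coefficients to the child.
alpha = np.array([0.5, 1.0 / (2.0 * np.sqrt(3.0))])
beta = C.T @ alpha
xs = np.linspace(0.05, 0.45, 5)
f_child = sum(b * q(xs) for b, q in zip(beta, basis(*child)))
assert np.allclose(f_child, xs)       # the child expansion reproduces f
```

The check is exact because quadrature of polynomials is exact; in the paper's setting the same machinery runs on 2D tensor-product bases, with n⁴ coefficients per interaction.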


Figure 3: The weights of the quadrature points can be seen as the area of a zone of influence. (a) On the unit segment. (b) On the unit square. (c) On the extended domain.
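The weight-modulation idea of Figure 3 can be sketched in 1D. This is our own hedged illustration, not the paper's code: the influence zone of each Gauss point is approximated here by the midpoints between consecutive nodes, and its weight is scaled by the fraction of that zone covered by the actual sub-patch.

```python
import numpy as np

# Scale each quadrature weight by the fraction of its influence zone that
# lies inside the actual sub-patch [c0, c1] of the unit enclosing domain.
# Influence zones are approximated by midpoints between Gauss nodes.
def modulated_weights(n_quad, covered):
    c0, c1 = covered
    x, w = np.polynomial.legendre.leggauss(n_quad)
    x = 0.5 * (x + 1.0)                  # nodes on [0, 1]
    w = 0.5 * w                          # weights sum to 1 on [0, 1]
    edges = np.concatenate(([0.0], 0.5 * (x[:-1] + x[1:]), [1.0]))
    zone = edges[1:] - edges[:-1]
    overlap = np.clip(np.minimum(edges[1:], c1)
                      - np.maximum(edges[:-1], c0), 0.0, None)
    return x, w * (overlap / zone)

# With full coverage the weights are untouched; with partial coverage the
# modulated weights integrate (approximately) over the covered part only.
x_full, w_full = modulated_weights(16, (0.0, 1.0))
x_part, w_part = modulated_weights(16, (0.0, 0.3))
```

In the paper, this modulation restricts emission to the actual sub-patch while the enclosing parallelogram is what the wavelet machinery sees.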

Figure 4: A non-convex patch cut along a discontinuity can result in two children whose enclosing parallelograms do not cover the enclosing parallelogram of the parent.

3.2.4.3. Computation of push-pull coefficients: For the push-pull step over the discontinuity, we have an integral expression, which we approximate using Gaussian quadratures. As we are integrating a discontinuous function (δ_{e_k}), we might have accuracy problems in the computation. We compensate by using a large number of sampling points. Also, once we have computed the coefficients for one patch, we check that they are consistent with each other, and that there is no creation or destruction of energy during the push-pull step. Should we detect an inconsistency, we recompute them with more precision.

Push-pull coefficients are then stored in the hierarchy. Because these special push-pull coefficients only occur once for each discontinuity-based subdivision, we can afford to spend some time computing them.

3.2.5. Intersection of several discontinuities

The patches resulting from a discontinuity-based subdivision are not necessarily convex. If a non-convex patch is cut by another subdivision, the enclosing parallelograms of the children do not cover the enclosing parallelogram of the parent patch (see Figure 4).

This configuration appears when discontinuities from two light sources intersect each other, or when the umbra and penumbra boundaries touch each other, for an occluder that is in contact with the receiver.

When it appears, it causes the push-pull coefficients to be incomplete in their definition. To account for this, we extend the enclosing parallelograms of the children so that their union covers the enclosing parallelogram of the parent patch. Except for this small point, there is no special case in our algorithm for dealing with several light sources and intersecting discontinuities (see Figure 5).

3.3. Implementation details

In this section, we review implementation details of our algorithm. The points described here are not essential to our algorithm; others could use different approaches, e.g. for computing discontinuities, or for handling visibility queries in radiosity computations. However, the approach we used to solve these problems can be interesting to other researchers. We have used non-conventional solutions for computing discontinuities, in the refinement oracle, for visibility queries, and for displaying results:

Finding Discontinuities: We only need the set of discontinuities for the interaction currently being refined. We compute extremal discontinuities (umbra and penumbra boundaries), using a method based on the GLU Tessellator [SWND03]. Our method identifies EV, VE and EEE events, converts these events into 2D polygons corresponding to their intersection with the plane of the receiver, then uses the GLU Tessellator to compute the union and the intersection of these 2D polygons. Our algorithm for finding discontinuities is not complete (it can miss some discontinuities) but it is robust and it finds the most important discontinuities.
Umbra and penumbra boundaries are not necessarily linear: on EEE events, parts of them can be conic curves. Our algorithm deals with such conics in a straightforward manner.
Once we have computed the umbra and penumbra contours, we have to answer position queries: "is this point inside the umbra or not?". We store the contours in an arrangement of line segments, using trapezoidal maps (see,


Figure 5: Combining together several discontinuities (both scenes have three light sources, red, green and blue, located in a triangle above the cube). (a) Point light sources (M3 wavelets). (b) Area light sources (M3 wavelets).

e.g. [BDS∗92, CGA]). This randomized data structure answers our position queries in average time O(log n), with creation time O(n log n) and memory cost O(n).
Handling Discontinuities in the Refinement Oracle: The refinement oracle takes sampling points on the receiving patch. Some sampling points can lie on a discontinuity, which makes their exact value unknown. To avoid unnecessary refinement, points lying on a discontinuity can take a different value in the oracle depending on the patch being considered.
Handling Visibility Queries: In our radiosity computations, we need the percentage of the light source that is visible from the receiving points. Previous implementations used a geometric data structure, the Backprojection [DF94], to compute an exact value of this percentage. We compute it instead using an OpenGL extension, OcclusionQuery [ARB], which gives us the percentage of the pixels of the light source that are visible from the receiving point. In our experiments, occlusion queries are more robust than the geometric data structure while having the same speed, and they are much faster than casting rays, while giving more precise results.
Displaying Results: M3 wavelets give quadratically varying functions; they are displayed using a small fragment program (10 lines of code). Linear interpolation from the graphics hardware (Gouraud shading) is not perfect for M2 wavelets, which are bilinear functions. It is possible to replace this linear interpolation by a small fragment program.

4. Experiments and Results

4.1. Experimental protocol

Test scenes: We have used two different test scenes: the Cabin, from the Radiance set of test scenes, and Room 523 from the Soda Hall model. For each scene, we used either point light sources or area light sources, giving a total of four test scenes. On all test scenes, we computed direct and indirect illumination. Pictures of the test scenes are available in Figures 7 and 13 (see color plates).
Wavelet Bases: We have tested our algorithm with the first three multi-wavelet bases: M1 (Haar), M2 (piecewise-linear) and M3 (piecewise-quadratic). In the pictures, Haar wavelets are displayed after a post-processing step to ensure continuity, M2 wavelets are displayed using standard linear interpolation from the graphics hardware, and M3 wavelets use a fragment program for the quadratically varying part.
Material: All computations were done on the same computer: a 2.4 GHz Pentium IV, with 1 GB of memory and an NVIDIA GeForce FX 5600.

4.2. Visual comparison for point light sources

The first reason to use discontinuity meshing is the quality of the illumination computed. Adapting the mesh to the discontinuities produces a radiosity function that looks pleasing to the eye.

The leftmost columns of Figures 6 and 12 show a side-by-side comparison of the different wavelet bases on a specific detail of the Cabin test scene, with a point light source. All pictures were generated with the same computation time (25 s) to give a fair comparison of the different wavelet bases. Without discontinuity meshing, the most satisfying representation is obtained with M2 wavelets, but artefacts are clearly visible along the discontinuity line; Haar and M3 wavelets are visually not acceptable within the prescribed time frame; they would eventually achieve a satisfying result, but for a longer computation time.

With discontinuity meshing, all wavelet bases achieve a visually pleasing result. Our algorithm for merging discontinuities with wavelets thus achieves a visually better result in the same computation time.

We did a similar comparison for Room 523 of the Soda Hall. Figure 8 shows the pictures obtained with the different wavelet bases on a detail of the room. All pictures were


Figure 6: Wireframe version of simulation for the Cabin test scene (rows: Haar, M2, M3; columns: point LS without DM, point LS with DM, area LS without DM, area LS with DM). See also the color plates.

Figure 7: Wireframe version of our test scenes after simulation with M3 wavelets (Cabin, 3 point light sources; Cabin, 3 area light sources; Room 523, 3 point light sources; Room 523, 1 area light source). See also the color plates.
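For context on how such M3 images are displayed: an M3 solution stores 9 coefficients per patch, and the display fragment program evaluates the tensor-product quadratic expansion at each pixel's patch coordinates. Below is a small Python stand-in for that per-fragment evaluation; it is our own sketch, assuming orthonormal Legendre polynomials on [0, 1], and not the actual shader from the paper.

```python
import numpy as np

# Orthonormal Legendre polynomials on [0, 1]: the 1D factors of the M3 basis.
def legendre01(u):
    return np.array([np.ones_like(u),
                     np.sqrt(3.0) * (2.0 * u - 1.0),
                     np.sqrt(5.0) * (6.0 * u * u - 6.0 * u + 1.0)])

# Per-fragment work: radiosity(u, v) = sum_ij c[i, j] * L_i(u) * L_j(v).
def eval_m3(coeffs, u, v):
    return legendre01(u) @ coeffs @ legendre01(v)

# Coefficients of the bilinear function 4uv in this basis; evaluation
# reproduces it exactly, since it lies in the space spanned by the basis.
c = np.zeros((3, 3))
s = 1.0 / np.sqrt(3.0)
c[0, 0], c[0, 1], c[1, 0], c[1, 1] = 1.0, s, s, s * s
print(round(float(eval_m3(c, 0.5, 0.5)), 6))   # prints 1.0
```

Gouraud interpolation cannot reproduce even this bilinear part exactly, which is why the paper resorts to a fragment program for M3 (and suggests one for M2).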


Figure 8: Visual comparison of the different wavelet bases for a point light source: (a) Haar+DM, (b) M2+DM, (c) M3+DM, (d) Haar, (e) M2, (f) M3. All pictures used roughly 190 s of computation time.

generated with approximately the same computation time (190 s).

From a distant point of view, all the pictures are of comparable quality. On a large scene like this, with a high number of discontinuities, the time spent estimating the discontinuities and dealing with them becomes equivalent to the time it takes to do regular subdivisions.

However, when we look closely at the shadow boundaries, ringing artefacts and staircase effects become clearly visible (see the full-resolution inserts).

We also compared the memory costs of both versions of the program (with discontinuity-based subdivision and without). Figure 9 shows the memory costs for all three wavelet bases, for the pictures in Figures 6, 12 and 8. On a scene with a large number of discontinuities, such as Room 523, our algorithm results in an important gain in memory costs. Each discontinuity introduced replaces a large number of regular patches, resulting in a net gain. On the Cabin test scene, with the prescribed time limit, the refinement was not pushed to the same levels. As a consequence, the memory gain is not as strong.

Figure 9: Memory costs (in KB) for simulations on our test scenes, with point light sources. (a) Cabin. (b) Room 523.

In short, our algorithm for merging discontinuities with higher-order wavelet bases always gives better results than existing algorithms, with a smaller memory cost.

4.3. Visual Comparison for Area Light Sources

The rightmost columns of Figures 6 and 12 show the same comparison of the different wavelet bases for the same detail of the Cabin test scene, this time using an area light source. All pictures were generated with approximately the same computation time (240 s). This time, the benefits of using the discontinuity-based approach appear very clearly. All three wavelet bases greatly outperform the non-discontinuity-based versions. This is because computing the discontinuities speeds up the visibility computations, the most expensive step in hierarchical radiosity. Within discontinuity-based wavelet methods, M2 and M3 wavelets produce the nicest result.

For comparison, Figure 10 shows the memory costs for all wavelet bases on this test scene. Notice that this time, the memory cost is bigger with discontinuity meshing than without.

Figure 10: Memory costs (in KB) for simulations on the Cabin test scene, with an area light source.

This is a side effect of our comparison method: the time given for the simulation was much too short for the system without discontinuities. It has just computed a crude version of the illumination. If we give it more time for the simulation, it eventually computes a nice version of the illumination, at a higher memory cost. Note that the memory cost without discontinuities increases with the order of the wavelet base. This is consistent with previous research [HCA00]: for crude estimates of the illumination, the memory cost increases with the order of the wavelet base, while for high-quality estimates, the memory cost decreases with the order of the wavelet base.

4.4. Selective choice of discontinuities

In places where the radiosity is smoothly varying, our algorithm can choose not to introduce discontinuities, keeping the regular subdivision. This effect appears clearly in the wireframe representations of our test scenes (see the rightmost column of Figures 6 and 12, and Figure 7).

Figure 11 also shows a detail of the Room 523 test scene (with an area light source) where complex discontinuities exist, but were not introduced in the mesh. This behaviour is more likely to occur with area light sources, which produce smoothly varying illumination, than with point light sources, where there is always a C0 discontinuity at a shadow boundary.

4.5. Influence of the minimal area parameter

An important issue in the practical use of radiosity algorithms is the choice of parameters. A bad choice of parameters results in a long computation time or a simulation that is not visually pleasing – sometimes both.

The minimal area for patches is one of these parameters. Without discontinuity-based refinement, this minimal area is reached for many patches on all sharp discontinuities (see the wireframe representations of the test scenes, in Figure 6 and the additional materials). A variation of this parameter has important consequences on the computation time, on the memory cost and on the quality of the result. Dividing this minimal area by 2 results in twice as many patches being used to represent each sharp discontinuity, potentially doubling the computation times and memory cost.

With discontinuity-based refinement, the minimal area is not reached, except in places where discontinuities are too complex. Hence, a variation of this parameter has little consequence on the computation times and memory costs. Our algorithm has almost cancelled the influence of the minimal area parameter. The user of our radiosity system has one main parameter, the maximum value of the error on each interaction. The minimal area still has an effect on the quality of the simulation, the computation time and the memory cost, but it is a minor effect.

5. Conclusion and Future Directions

In this paper, we have presented a new algorithm for radiosity computations that combines higher-order wavelets with discontinuity meshing. Our algorithm uses regular subdivision for wavelets where it is practical, and switches to discontinuity-based subdivision where discontinuities exist. Only discontinuities that are important in the computation of the illumination solution are actually introduced in the mesh. This results in a compact representation of radiosity, with a good compromise between quality and cost.

This representation can be displayed directly on the screen, or it can be used as a starting point for more complete illumination computations, such as Monte-Carlo illumination.

Our algorithm is robust enough to handle complex discontinuities. It automatically discards discontinuities which are not important enough for the radiosity computations, providing a good way to manage the complicated set of discontinuities.

In future work, we want to combine our algorithm with a robust computation of visibility discontinuities (e.g. [DD02]). We also want to combine our work with separate works allowing higher-order wavelet radiosity computations on curved surfaces [ACP∗01] and on triangular meshes.

In a separate direction of research, although our algorithm


Figure 11: Our algorithm only inserts discontinuities that are perceived as useful by the refinement oracle (M3 wavelets).

only inserts discontinuities as they are needed in the refinement process, it starts by computing all the potential discontinuities for the current interaction, a costly preliminary step. We will explore the possibility of suppressing this step: using standard refinement, but detecting inside the refinement oracle that a subdivision is probably caused by a discontinuity, then only computing and inserting this discontinuity in the mesh. This would reduce the computation cost of our algorithm. In our experiments, almost all the discontinuities caused by indirect lighting are not important enough to justify their insertion in the hierarchy. It is therefore not practical to compute them in advance.

6. Acknowledgements

This work was partly funded by the "Région Rhône-Alpes" under the DEREVE project.

The Soda Hall model was created by the original Berkeley Walkthru team under the direction of Prof. Carlo H. Sequin. For more information on the Soda Hall, see http://www.cs.berkeley.edu/~sequin/soda/soda.html

References

[ACP∗01] Alonso L., Cuny F., Petitjean S., Paul J.-C., Lazard S., Wies E.: The virtual mesh: A geometric abstraction for efficiently computing radiosity. ACM Transactions on Graphics 20, 3 (July 2001), 169–201.

[Alp93] Alpert B. K.: A class of bases in L² for the sparse representation of integral operators. SIAM Journal on Mathematical Analysis 24, 1 (1993), 246–262.

[ARB] ARB_occlusion_query. http://oss.sgi.com/projects/ogl-sample/registry/ARB/occlusion_query.txt.

[BDS∗92] Boissonnat J.-D., Devillers O., Schott R., Teillaud M., Yvinec M.: Applications of random sampling to on-line algorithms in computational geometry. Discrete and Computational Geometry 8, 1 (1992), 51–71.

[BW96] Bekaert P., Willems Y.: Error Control for Radiosity. In Rendering Techniques '96 (7th Eurographics Workshop on Rendering) (1996), pp. 153–164.

[CAH00] Cuny F., Alonso L., Holzschuch N.: A novel approach makes higher order wavelets really efficient for radiosity. Computer Graphics Forum (Eurographics 2000) 19, 3 (2000), C-99–C-108.

[CGA] CGAL, Computational Geometry Algorithms Library. http://www.cgal.org.

[DD02] Duguet F., Drettakis G.: Robust epsilon visibility. ACM Transactions on Graphics (SIGGRAPH 2002) 21, 3 (2002).

[DDP99] Durand F., Drettakis G., Puech C.: Fast and accurate hierarchical radiosity using global visibility. ACM Transactions on Graphics 18, 2 (1999), 128–170.

[DDP02] Durand F., Drettakis G., Puech C.: The 3D visibility complex. ACM Transactions on Graphics 21, 2 (Apr. 2002).


[DF94] Drettakis G., Fiume E.: A Fast Shadow Algorithm for Area Light Sources Using Backprojection. In Computer Graphics Proceedings, Annual Conference Series (ACM SIGGRAPH '94) (1994), pp. 223–230.

[DS96] Drettakis G., Sillion F.: Accurate Visibility and Meshing Calculations for Hierarchical Radiosity. In Rendering Techniques '96 (7th Eurographics Workshop on Rendering) (1996), pp. 269–278.

[GS96] Ghali S., Stewart A. J.: A Complete Treatment of D1 Discontinuities in a Discontinuity Mesh. In Graphics Interface '96 (May 1996), pp. 122–131.

[GSCH93] Gortler S. J., Schroder P., Cohen M. F., Hanrahan P.: Wavelet Radiosity. In Computer Graphics Proceedings, Annual Conference Series (ACM SIGGRAPH '93) (1993), pp. 221–230.

[GTGB84] Goral C. M., Torrance K. E., Greenberg D. P., Battaile B.: Modelling the Interaction of Light Between Diffuse Surfaces. Computer Graphics (ACM SIGGRAPH '84) 18, 3 (July 1984), 212–222.

[HCA00] Holzschuch N., Cuny F., Alonso L.: Wavelet radiosity on arbitrary planar surfaces. In Rendering Techniques 2000 (11th Eurographics Workshop on Rendering) (2000), pp. 161–172.

[Hec92] Heckbert P.: Discontinuity Meshing for Radiosity. In Third Eurographics Workshop on Rendering (May 1992), pp. 203–226.

[Hed98] Hedley D.: Discontinuity Meshing for Complex Environments. PhD thesis, Department of Computer Science, University of Bristol, Bristol, UK, Aug. 1998.

[HS92] Hanrahan P., Salzman D.: A Rapid

[KH84] Kajiya J. T., Herzen B. P. V.: Ray Tracing Volume Densities. Computer Graphics (ACM SIGGRAPH '84) 18, 3 (July 1984), 165–174.

[LTG92] Lischinski D., Tampieri F., Greenberg D. P.: Discontinuity Meshing for Accurate Radiosity. IEEE Computer Graphics and Applications 12, 6 (November 1992), 25–39.

[LTG93] Lischinski D., Tampieri F., Greenberg D. P.: Combining Hierarchical Radiosity and Discontinuity Meshing. In Computer Graphics Proceedings, Annual Conference Series (ACM SIGGRAPH '93) (1993), pp. 199–208.

[PB95] Pattanaik S. N., Bouatouch K.: Discontinuity Meshing and Hierarchical Multiwavelet Radiosity. In Graphics Interface '95 (May 1995), pp. 109–115.

[SG94] Stewart A. J., Ghali S.: Fast Computation of Shadow Boundaries Using Spatial Coherence and Backprojection. In Computer Graphics Proceedings, Annual Conference Series (ACM SIGGRAPH '94) (1994), pp. 231–238.

[SGCH93] Schroder P., Gortler S. J., Cohen M. F., Hanrahan P.: Wavelet Projections for Radiosity. In 4th Eurographics Workshop on Rendering (June 1993), pp. 105–114.

[SSSS98] Stamminger M., Schirmacher H., Slusallek P., Seidel H.-P.: Getting rid of links in hierarchical radiosity. Computer Graphics Forum (Eurographics '98) 17, 3 (September 1998), C165–C174.

[Stu94] Sturzlinger W.: Adaptive Mesh Refinement with Discontinuities for the Radiosity Method. In Photorealistic Rendering Techniques (5th Eurographics Workshop on Rendering) (June 1994), pp. 239–248.

[SWND03] Shreiner D., Woo M., Neider J., Davis
Hierarchical Radiosity Algorithm for Unoc- T.: OpenGL Programming Guide: The Official
cluded Environments. In Photorealism in Guide to Learning OpenGL, Version 1.4. Addi-
Computer Graphics (Proceedings Eurograph- son Wesley Professional, 2003, ch. 11. Tessel-
ics Workshop on Photosimulation, Realism and lators and Quadrics.
Physics in Computer Graphics, 1990) (1992),
[WH97] W ILLMOTT A., H ECKBERT P.: An empiri-
pp. 151–171.
cal comparison of progressive and wavelet ra-
[HSA91] H ANRAHAN P., S ALZMAN D., AUPPERLE diosity. In Rendering Techniques ’97 (8th Eu-
L.: A Rapid Hierarchical Radiosity Algorithm. rographics Workshop on Rendering) (1997),
Computer Graphics (ACM SIGGRAPH ’91) 25, pp. 175–186.
4 (July 1991), 197–206.
[HWP97] H EDLEY D., W ORRALL A., PADDON D.: Se-
lective culling of discontinuity lines. In Ren-
dering Techniques ’97 (8th Eurographics Work-
shop on Rendering) (1997), pp. 69–80.

c The Eurographics Association 2004.

80
N. Holzschuch and L. Alonso / Combining Higher-Order Wavelets and Discontinuity Meshing for Radiosity

Figure 12: Visual comparison of results for the Cabin test scene (rows: Haar, M2 and M3 wavelets; columns: point light sources without and with discontinuity meshing, then area light sources without and with discontinuity meshing). See also figure 6 for wireframe representation.

Figure 13: Our test scenes (all figures with M3 wavelets): Cabin with 3 point light sources, Cabin with 3 area light sources, Room 523 with 3 point light sources, and Room 523 with 1 area light source. See also figure 7 for wireframe representation.

© The Eurographics Association 2004.

82 CHAPITRE 2. MODÉLISATION MULTI-ÉCHELLES DE L’ÉCLAIRAGE

2.7.6 Space-time hierarchical radiosity with clustering and higher-order wavelets (CGF 2004)

Authors: Cyrille Damez, Nicolas Holzschuch, François Sillion
Journal: Computer Graphics Forum, vol. 23, no. 2
Date: April 2004
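Before the reproduced article itself, a rough illustration of its central idea may be useful: radiosity is represented hierarchically along time as well as space, so the time axis is only subdivided where the illumination actually varies. The sketch below is hypothetical illustrative code (names and structure are ours, not the paper's implementation): it adaptively splits a time interval until a linear interpolation of a signal meets a tolerance, yielding a compact, non-uniform temporal mesh.

```python
def refine(f, t0, t1, tol, depth=0, max_depth=8):
    """Adaptively subdivide [t0, t1]: keep the interval when linearly
    interpolating f between its endpoints matches f at the midpoint to
    within tol, otherwise split it in two (hierarchical refinement in time)."""
    tm = 0.5 * (t0 + t1)
    linear_mid = 0.5 * (f(t0) + f(t1))
    if depth >= max_depth or abs(f(tm) - linear_mid) <= tol:
        return [(t0, t1)]
    return (refine(f, t0, tm, tol, depth + 1, max_depth) +
            refine(f, tm, t1, tol, depth + 1, max_depth))

# A signal that is constant, then varies smoothly after t = 0.7:
sig = lambda t: 0.0 if t < 0.7 else (t - 0.7) ** 2
intervals = refine(sig, 0.0, 1.0, 1e-4)
# Few, large intervals where the signal is constant; small ones after t = 0.7.
print(len(intervals), intervals[0])
```

The space-time algorithm of the paper applies the same principle to a four-dimensional mesh, with a refinement oracle in place of this midpoint test.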
Volume 23 (2004), number 2 pp. 129–141 COMPUTER GRAPHICS forum

Space-Time Hierarchical Radiosity with Clustering and Higher-Order Wavelets

Cyrille Damez1 , Nicolas Holzschuch2 and François X. Sillion2

1 Max-Planck-Institut für Informatik, Saarbrücken, Germany
2 ARTIS, GRAVIR/IMAG-INRIA, Grenoble, France

Abstract
We address in this paper the issue of computing diffuse global illumination solutions for animation sequences. The
principal difficulties lie in the computational complexity of global illumination, emphasized by the movement of
objects and the large number of frames to compute, as well as the potential for creating temporal discontinuities
in the illumination, a particularly noticeable artifact. We demonstrate how space-time hierarchical radiosity, i.e.
the application to the time dimension of a hierarchical decomposition algorithm, can be effectively used to obtain
smooth animations: first by proposing the integration of spatial clustering in a space-time hierarchy; second, by
using a higher-order wavelet basis adapted for the temporal dimension. The resulting algorithm is capable of
creating time-dependent radiosity solutions efficiently.

Keywords: global illumination, animation, hierarchical radiosity, clustering, wavelets.


ACM CCS: I.3.7 [Computer Graphics]: Three-Dimensional Graphics and Realism

1. Introduction

Global illumination techniques have reached the stage where they allow the calculation of high-quality images of three-dimensional scenes, complete with subtle lighting and inter-reflection effects. It is therefore natural to try to use them for the production of animation films, or more generally in all lighting jobs related to special effects, such as combining synthesized elements with live action film footage. Unfortunately, global illumination techniques remain typically expensive to use, even more so in the case of frame-by-frame lighting calculations.

In this paper, we present a fully developed version of space-time hierarchical radiosity, an algorithm aimed at computing view-independent global illumination simulations for animated scenes. It works with scenes containing moving solid objects, whose trajectories are known beforehand, and computes a hierarchical radiosity solution for the entire animation, instead of frame by frame.

The hierarchical formulation for radiosity is extended by introducing a fourth dimension, time, along with the three spatial dimensions. All four dimensions are treated in the same way, meaning that we can refine an interaction either in time or in space. This results in few computations being done in areas where there is little temporal variation of the illumination, while areas with rapid variation of the illumination will be computed to full precision. The hierarchical formulation guarantees a compact representation of the temporal variations of the radiosity function: as a result, the entire animation is computed much faster than by performing a complete radiosity solution for each frame.

In a way similar to the original Hierarchical Radiosity algorithm [1], the efficiency of the space-time hierarchical radiosity algorithm depends on the depth of the space-time hierarchy built during computations: the deeper the hierarchy, the more efficient the algorithm. This suggests that it should prove especially beneficial for complex scenes.

We have previously presented a preliminary version of the space-time hierarchical radiosity algorithm [2]. We present here a fully developed algorithm. In particular, two major issues are addressed: first, we have modified the algorithm so that it uses linear wavelets for the time dimension, to improve
© The Eurographics Association and Blackwell Publishing Ltd 2004. Published by Blackwell Publishing, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA.
Submitted September 2003. Revised January 2004. Accepted February 2004.
130 C. Damez et al. / Space-Time Hierarchical Radiosity

the temporal continuity of the animations produced. Second, we have combined space-time radiosity with clustering, thus enabling the algorithm to work more efficiently and on larger scenes. This allows us to use this algorithm in a range of complexity where its benefits can be fully realized.

The paper is organized as follows. In the next section, we briefly discuss previous work in time-dependent illumination of animated scenes, and review the shortcomings of our preliminary approach. Then, in Section 3, our fully developed algorithm is given in detail. In Section 4, we provide an analysis of the performance of space-time hierarchical radiosity, compared to our previous approach as well as frame-by-frame computations. Finally, in Section 5, we draw our conclusions and trace directions for future work.

2. Background and Motivations

2.1. Global illumination algorithms for animations

Several algorithms to compute global illumination images have been proposed since the pioneering work of Goral et al. [3]. As the performance of the said algorithms and the computing power of graphics workstations improved, several propositions have been made to extend these algorithms and reduce the overwhelming cost of computing globally illuminated animations. Two classes of applications can be distinguished:

- Interactive methods, which render new solutions quickly, usually by reusing previous computation results as much as possible. They aim at offering as fast a feedback as possible in response to changes made by the user. Interactive methods have been developed for Hierarchical Radiosity [4,5], Path Tracing [6,7] and Particle Tracing [8,9,10,11].

- Offline methods, where the objects' movements are supposed continuous and known a priori. They aim at rendering high-quality animations, and therefore should ensure a constant quality. Global Monte Carlo methods [12] and Particle Tracing [13] algorithms have been proposed to compute high-quality global illumination animations.

A study of the current state of the art for both types of animated global illumination algorithms can be found in Damez et al. [14]. In this section, we briefly discuss only the methods allowing the computation of higher-quality animations.

Surprisingly, the case of high-quality animations has received little attention when compared to the amount of work devoted to interactive algorithms in the literature. Indeed, most interactive algorithms could be used to compute a movie sequence. However, the quality of the resulting animation may not always be satisfying, as these methods were designed to satisfy real-time constraints instead of animation quality criteria. In particular, the accumulated errors due to the incremental nature of most interactive algorithms may cause distracting artifacts. The quality of the resulting frames may seem acceptable when each is considered separately; nevertheless, discontinuities in the shading of surfaces may appear between two consecutive frames. The interactive global illumination algorithm proposed by Wald et al. [10], though it recomputes a complete global illumination solution for each frame independently, is fast enough to converge to a good-quality view-dependent solution within a couple of seconds. However, in order to avoid light flickering due to its stochastic nature, it requires using the same random seeds from one frame to the next, which only ensures temporal continuity of lighting for light paths that do not intersect moving objects.

It also seems natural to try to capitalize on the knowledge of the objects' movement to enhance the quality of the rendered animation. Therefore, it makes sense to consider high-quality animation rendering as a separate problem, and to develop algorithms specifically designed to solve it. Myszkowski et al. [13] extended the density estimation photon-tracing algorithm to the case of animated scenes, allowing the use of photons for several consecutive frames. The decision to extend or contract the segment of time during which a given sample is valid is based on a perception-based Animation Quality Metric. It is used to measure the perceived difference between consecutive frames, and therefore reduce the flickering which results from the stochastic noise. However, to this date, this method is based on a fixed mesh and lacks an adaptive refinement scheme. Therefore, the spatial resolution of the solutions computed is limited.

Martin et al. [15] proposed a two-pass algorithm based on hierarchical radiosity. During the first pass, a coarse hierarchical solution for the complete animation is computed incrementally. Then, during the second pass, the resulting mesh and link structure is used to efficiently perform final gathering, assigning to each space-time mesh element a high-resolution texture movie representing the radiosity of this patch during the corresponding interval of time. Since this algorithm efficiently solves the problem of high-quality final gathering for animated scenes, which our approach does not address, both methods can be seen as complementary. In particular, the algorithm of Martin et al. does not make use of a cluster hierarchy during the first pass, which limits its application to very simple scenes. However, we show in Section 3.3 how to solve this particular issue. As a consequence, coupling both approaches seems promising.

2.2. Previous work on the space-time hierarchical radiosity algorithm

In order to reduce the cost of diffuse global illumination computations for animations, we introduced in a previous


publication [2] the space-time hierarchical radiosity algorithm. Our preliminary algorithm lacked several key features that would enable its use on scenes with complex geometry or lighting conditions. In particular, it did not feature a way to extend the object hierarchy above the surface level (an approach commonly referred to as clustering [16,17]). Moreover, distracting “jumps” in the illumination could appear in scenes where important changes in indirect lighting occur along time. Similar discontinuities can be observed in the spatial dimension for the classical static Hierarchical Radiosity algorithm (cf. Figure 1), when the refinement oracle used is based only on a global evaluation of the error. Oracles designed to take into account the distribution of the error on the receiving elements can remove such discontinuities [18,19].

Figure 1: A radiosity discontinuity is clearly visible between the upper and lower half of the walls.

Additionally, the piecewise constant function basis we used did not allow a proper distribution of the approximation errors on the mesh elements. As a consequence, low-intensity light exchanges, such as indirect bounces, were either insufficiently refined, or overly refined when the refinement threshold was reduced.

We demonstrate such discontinuities using a scene for which our preliminary algorithm performed in an obviously unsatisfying manner. This scene is composed of four boxes in a closed room with colored walls (red, blue, green and gray), illuminated by rotating spotlights. As a consequence, the lighting in the scene is mostly indirect, and varies in large proportions. The resulting animation presents important lighting discontinuities in time, in particular at the main subdivisions of the animation time interval, as illustrated in Figure 2. Such discontinuities must obviously be reduced in order to make our algorithm of practical use.

Figure 2: Example of temporal discontinuities. Visualizations of the scene at t = 1/2 − ε, at t = 1/2 + ε, and the difference image. The illumination of the entire scene has been modified in a single frame interval.

3. The Space-Time Hierarchical Radiosity Algorithm

3.1. The space-time radiosity equation

We want to compute the radiosity function B(p, t) for each point p at each time t, defined over (S × T), where S is the set of all points on all surfaces of the scene and T is the time interval over which we want to compute our animation. We define the following functions:

r : (S × S × T) → IR, the distance from point p to point q at time t;
θ : (S × S × T) → [0, π], the angle between the outgoing normal at p and the direction from p to q at time t;
v : (S × S × T) → {0, 1}, the visibility function between p and q at t;
ρ : S → [0, 1], the diffuse reflectance at p;
E : (S × T) → IR, the self-emitted radiosity at p at time t.

Using these definitions, for every X = (p, t) ∈ (S × T), the radiosity function satisfies the following equation:

B(X) = E(X) + \int_{(S \times T)} B(Y) \, K(X, Y) \, dY,    (1)

where K is defined on (S × T)^2 as

K((p, t), (q, t')) = \rho(p) \, k(p, q, t) \, \delta(t, t'),    (2)

k is the function defined on (S × S × T) by

k(p, q, t) = v(p, q, t) \, \frac{\cos\theta(p, q, t) \, \cos\theta(q, p, t)}{\pi \, r(p, q, t)^2},    (3)

and δ is the Dirac distribution, equal to 0 when t ≠ t'.

Note that, though equation (1) seems to describe intertemporal light exchanges, such nonphysical transfers are avoided thanks to the Dirac function in equation (2). Note also that

equation (1) is formally equivalent to the classical radiosity equation in the static case. Therefore, any algorithm capable of solving the latter can probably be extended in a straightforward manner to solve the former. In particular, we shall see that we can derive a finite element formulation similar to that of standard radiosity [17,20].

3.2. Discretization

Equation (1) is a Fredholm equation of the second kind, and can be discretized by the Galerkin method. We want to compute an approximation B̃ of B in a finite-dimensional function space spanned by an orthogonal basis of functions (u_i)_{1 ≤ i ≤ N}. Therefore, we can express B̃ as a linear combination of the u_i:

\tilde{B} = \sum_{j=1}^{N} B_j u_j.

The Galerkin condition [17] defines the approximation B̃ so that the residual function

r(X) = \tilde{B}(X) - E(X) - \int_{Y \in (S \times T)} \tilde{B}(Y) \, K(X, Y) \, dY

is orthogonal to all the u_i. In such a case, the coefficients B_j that define B̃ are solutions of the following linear system:

(I - M) B = E,    (4)

where I is the identity matrix, the vector E is defined by

\forall i \in [1, N], \quad E_i = \frac{\langle E, u_i \rangle}{\| u_i \|^2},

and the matrix coefficients are defined by:

\forall (i, j) \in [1, N]^2, \quad M_{i,j} = \frac{\left\langle \int_Y K(\cdot, Y) \, u_j(Y) \, dY, \; u_i \right\rangle}{\| u_i \|^2}    (5)

The simplest possible choice of a function basis is piecewise constant functions. As discussed in Section 2.2, this choice proves unsatisfying in certain cases, where it causes noticeable temporal discontinuities in indirect lighting. As a consequence, we propose instead functions that are piecewise constant in space, and piecewise polynomial in time. To a given element k of our mesh, defined as the cross product of polygon P_k and time interval T_k, correspond L basis functions u_{Lk+i}(p, t) with 0 ≤ i < L, equal to 0 when (p, t) is outside (P_k × T_k), and to \Phi^i_k(t) otherwise. Since the u_i have to form an orthogonal basis, the \Phi^i_k are the restriction of the first L Legendre polynomials to the time interval T_k = [\alpha_k, \beta_k], i.e.

\Phi^0_k(t) = 1
\Phi^1_k(t) = \sqrt{3} \left( 2 \, \frac{t - \alpha_k}{\beta_k - \alpha_k} - 1 \right)
...

Therefore, the variations of radiosity of each element k in the mesh will be described by L unknown coefficients B_{Lk}, ..., B_{Lk+L-1}. Furthermore, from equation (5), it can be derived that each pair (k, l) of elements in our mesh corresponds to an L × L block \rho_k I_{k,l} in the matrix M, where I_{k,l} is the following interaction matrix:

I_{k,l} = \frac{1}{\|P_k\| (\beta_k - \alpha_k)} \int_{T_k \cap T_l} G_{k,l}(t) \int_{P_k} \int_{P_l} k(p, q, t) \, dq \, dp \, dt    (6)

and the matrix G_{k,l}(t) is defined as:

G_{k,l}(t) = \begin{pmatrix} \Phi^0_k(t)\Phi^0_l(t) & \cdots & \Phi^0_k(t)\Phi^{L-1}_l(t) \\ \vdots & \ddots & \vdots \\ \Phi^{L-1}_k(t)\Phi^0_l(t) & \cdots & \Phi^{L-1}_k(t)\Phi^{L-1}_l(t) \end{pmatrix}

The interaction matrix extends the traditional notion of form factor used in the classical static radiosity algorithms.

3.3. Hierarchical solution of the discrete equation

We are using piecewise polynomial functions to describe the variations of radiosity in time. Therefore, the resulting algorithm is an extension of the Wavelet Radiosity algorithm [21,22], using the Haar basis over the spatial dimension and Alpert's M_L basis [23] over the time dimension.

Since our mesh elements are defined both by their geometry and their time interval, they can be subdivided either in


space (partitioning their geometry, e.g. using a quadtree subdivision scheme) or in time (subdividing the time range and leaving the geometry unchanged). As illustrated by Figure 3, repeated applications of either subdivision scheme build a data structure that offers a multi-resolution representation of the radiosity function over space and time. As in the original Hierarchical Radiosity algorithm, links joining two hierarchical elements are used to specify at which level of precision the light exchanges should actually be computed.

Figure 3: A simple example of space-time hierarchy: here the root hierarchical element (represented as a space × time 3D volume) has been first subdivided in time. One of the resulting siblings has been subdivided in space.

Consequently, the space-time hierarchical radiosity algorithm, similarly to the original hierarchical algorithm of Hanrahan et al., is an iteration composed of the following three steps:

(1) Recursive evaluation of the precision of all links, to place them at the appropriate level in the hierarchy. The evaluation of the links is performed by a function named the refinement oracle. As a side effect, this recursive process adaptively builds the hierarchical mesh, refining the original mesh elements where more precision is needed. Our oracle is discussed in Section 3.4.

(2) Gathering the light through the links in the hierarchy. Light exchanges computations are detailed in Section 3.5.

(3) Ensuring the coherence of radiosities between all hierarchical levels. This bidirectional traversal of the hierarchy is referred to as Push–Pull and is discussed in Section 3.6.

To fully benefit from the strength of the hierarchical formulation, it is necessary to extend the hierarchy of elements above the initial surface level, by hierarchically grouping together surfaces, and eventually groups of surfaces (called clusters). At the top of our hierarchy will be one root cluster, which will represent all surfaces, during the whole animation. The starting point of the algorithm will be the root link joining this cluster to itself, thereby representing all possible interactions between all surfaces in the scene, during the whole animation [16,17].

Hierarchical Radiosity algorithms with clustering for static scenes perform the construction of the cluster hierarchy as a preprocessing step. Such an approach cannot be used in our case, since the resulting spatial hierarchy would preexist the recursive link refinement procedure, preventing any temporal refinement until we reach the surface level. We propose instead a new approach, which we name lazy clustering. At the beginning of our algorithm, we only build the root cluster, plus one cluster for each different rigid motion in the animation. The rest of the cluster hierarchy is built as a by-product of the link refinement procedure. Therefore, as for surface elements in the space-time mesh, we can split clusters that have not been previously refined either in time or in space (see Figure 4):

- Time-refinement of a cluster can be performed by creating two clusters as children of the original one, each one defined over one half of the original time interval. Each surface inside the original cluster must be duplicated, and one copy is assigned to each of the two children clusters.

- Space-refinement of a cluster is the act of grouping together some of the surfaces contained in the said cluster, forming new children clusters (which may be space- or time-split later on during the refinement process), possibly leaving some surfaces as direct children. This can be achieved by applying only one step of any classical top-down recursive clustering method (without the recursion). We chose to adapt Christensen's clustering method [24], which is straightforward to implement and produces a rather well-formed spatial hierarchy [25].

3.4. Space-time refinement oracle

The refinement oracle is the function in charge of evaluating the precision of a given link and deciding if it is placed at the appropriate level in the hierarchy. In the case of the space-time hierarchical radiosity algorithm, this function has two goals:

- To decide whether a given link is a precise enough representation of the corresponding light exchange (as in classical Hierarchical Radiosity).

- If it is not precise enough, to determine whether it should be split in space or in time.

A simple oracle, based on a comparison of the estimated time-variance and space-variance of the irradiance gathered across the link, always produces the same mesh, regardless of the function basis used. This is obviously not satisfying, since piecewise polynomial functions should offer a better approximation of the radiosity than piecewise constant functions at the same subdivision depth.

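The recursive link-refinement step (1) described above can be skeletonized in a few lines. This is a hedged, hypothetical sketch with toy classes (`Element`, `Link`) and a trivial oracle; the real algorithm uses the oracle of Section 3.4 and the interaction matrices of Section 3.5 instead.

```python
class Element:
    """Toy space-time element: a label plus a time interval [t0, t1]."""
    def __init__(self, name, t0, t1):
        self.name, self.t0, self.t1 = name, t0, t1
        self.irradiance = 0.0

    def subdivide(self, axis):
        if axis == "time":                       # split the time interval
            mid = 0.5 * (self.t0 + self.t1)
            return [Element(self.name, self.t0, mid),
                    Element(self.name, mid, self.t1)]
        return [Element(self.name + "/a", self.t0, self.t1),  # split geometry
                Element(self.name + "/b", self.t0, self.t1)]

class Link:
    """A link carrying light from an emitter element to a receiver element."""
    def __init__(self, emitter, receiver):
        self.emitter, self.receiver = emitter, receiver

def refine_links(link, oracle):
    """Step (1): recursively place each link at the right hierarchical level.
    The oracle answers "ok" (keep the link), "time" or "space" (subdivide)."""
    verdict = oracle(link)
    if verdict == "ok":
        return [link]
    return [l for child in link.receiver.subdivide(verdict)
            for l in refine_links(Link(link.emitter, child), oracle)]

# Oracle: refine once in time, then accept everything.
oracle = lambda link: "time" if link.receiver.t1 - link.receiver.t0 > 0.5 else "ok"
src, rcv = Element("source", 0.0, 1.0), Element("wall", 0.0, 1.0)
links = refine_links(Link(src, rcv), oracle)
for l in links:                      # step (2): gather across each link
    l.receiver.irradiance += 1.0     # stand-in for the interaction-matrix product
print([(l.receiver.t0, l.receiver.t1) for l in links])  # [(0.0, 0.5), (0.5, 1.0)]
```

A real implementation would follow the gather with the push-pull traversal of Section 3.6.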

Figure 4: The cluster in the left-hand image (represented in 2D for simplicity) can be subdivided either spatially (center),
or temporally (on the right). Spatial subdivision builds one new hierarchical level of clusters around the surfaces. Temporal
subdivision duplicates the surfaces and subdivides the corresponding time interval.
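The two operations illustrated in Figure 4 can be sketched as a minimal data structure (hypothetical names, not the authors' code): time-refinement duplicates the surface list over two half-intervals, while space-refinement applies a single grouping step and keeps the time interval.

```python
class Cluster:
    """A cluster: a group of surfaces sharing a time interval [t0, t1]."""
    def __init__(self, surfaces, t0, t1):
        self.surfaces, self.t0, self.t1 = surfaces, t0, t1
        self.children = []

    def refine_in_time(self):
        """Duplicate the surfaces; each child covers half the time interval."""
        mid = 0.5 * (self.t0 + self.t1)
        self.children = [Cluster(list(self.surfaces), self.t0, mid),
                         Cluster(list(self.surfaces), mid, self.t1)]
        return self.children

    def refine_in_space(self, grouping):
        """Apply one step of a clustering method: 'grouping' partitions the
        surfaces; each group becomes a child over the same time interval."""
        self.children = [Cluster(group, self.t0, self.t1)
                         for group in grouping(self.surfaces)]
        return self.children

# One time split, then one space split of the first half (cf. Figure 4):
root = Cluster(["wall", "floor", "box1", "box2"], 0.0, 1.0)
first, second = root.refine_in_time()
halves = lambda s: [s[:len(s) // 2], s[len(s) // 2:]]
a, b = first.refine_in_space(halves)
print(a.surfaces, (a.t0, a.t1))  # ['wall', 'floor'] (0.0, 0.5)
```

Here `halves` stands in for one step of Christensen's clustering method mentioned in Section 3.3.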

We propose to extend for space-time radiosity an oracle designed for Wavelet Radiosity in the static case [18,19,26], based on estimates of the error on the propagated energy rather than on estimates of the variation of this energy. We use a grid of control points located on the receiving element, and a set of control points in time. On these control points, at the control times, we estimate the radiosity value using two methods:

(1) by multiplying the emitter's radiosity vector by the interaction matrix corresponding to the link, and then interpolating the radiosity values at the control times;

(2) by direct integration of the radiosity on the emitter, using a quadrature.

The difference between these two values is an indication of the error made when evaluating the interaction at this point for this level of precision. The norm of these differences is used as the error on the current interaction. Refinement will occur if this norm is above the refinement threshold set by the user.

The control points and times must be carefully chosen so that they provide meaningful information. They must be different from the quadrature points and times used for the form factor computations. The number of control points and times must be higher for large receivers, so that we do not miss important features. Also, placing control times at the beginning and at the end of the time interval greatly enhances temporal continuity.

Once we have made the decision to refine an interaction, we must choose between refinement in space or in time. We compute two variance estimates for the set of estimated error values on our grid of control points and times:

- An average spatial variance: for each fixed control time, we compute the variance of the error values over the control points, and then take the temporal average.

- An average temporal variance: for each fixed control point, we compute the variance of the error values over the control times, and then take the spatial average.

We refine the interaction in time if the average temporal variance is above the average spatial variance, and in space otherwise.

3.5. Light exchanges computations

Computing the light exchanged between two linked surfaces is straightforward. The product of the link's interaction matrix by the radiosity of the emitter is added to the radiosity of the receiver. The interaction matrix has generally been computed previously, during the refinement procedure, using simple Gaussian quadratures.

However, interactions involving one or more clusters require a special approach, based on the one described by Sillion [27]. Roughly, anisotropic emission from a cluster is approximated by going down to the surface level to estimate the directional radiant intensity exiting the cluster (Delayed Pull), and the irradiance gathered by a cluster from a given hierarchical element is distributed to all the surfaces inside the cluster immediately at gathering time, according to their orientation (Immediate Push). The specificities of the space-time hierarchical radiosity method come from the fact that the position, orientation and radiosity of the objects can change with time.

3.5.1. Emission from a cluster: Delayed pull

In the classical hierarchical radiosity algorithm, the computation of the light emitted from an object involves the computation of the form factor between the sender l and the receiver k. It is very difficult to define what the form factor should be if the sender is an anisotropic cluster. Therefore, we directly compute the irradiance emitted by the cluster to the receiver, by summing the contributions of the N surfaces contained in


the cluster l. At a given time t, point p receives from the N Since both the cluster and the sender may be moving, the
elements i in l the total irradiance: receiver factor is time-dependent. We need to project it on
N 
 our wavelet basis. Let us assume a cluster k has received
Ir eceived ( p, t) = Bi (q, t)g( p, q, t)v( p, q, t)dq, an irradiance I received . This irradiance is distributed to each
Qi
i=1 surface i in the cluster k according to its orientation:
where the geometric configuration function is defined by:

    g(x, y, t) = R(t) cos θ cos θ′ / (π r²)

and the R function is the receiver factor defined in [27] as cos θ if the receiver is a surface and 1 if the receiver is a cluster (the surfaces' orientation in the receiving cluster will be taken into account by the Immediate Push mechanism described in Section 3.5.2).

We approximate the received irradiance by projecting it on our function basis: the resulting approximation is a linear combination of our L basis functions:

    Ĩ = Σ_{j=0}^{L−1} λ_j u_{Lk+j}.

Since the u_i are orthogonal we have:

    λ_j = (1 / ‖u_{Lk+j}‖²) ⟨I_received, u_{Lk+j}⟩
        = Σ_i ∫_{T_k ∩ T_i} [ ∫_{P_k} ∫_{Q_i} g(p, q, t) v(p, q, t) dq dp ] B_i(t) u_{Lk+j}(t) dt / (A_k (β_k − α_k)).

Computing the above integral is costly as it involves a number of visibility estimations proportional to the number of surfaces in the cluster. Therefore we approximate it by factoring out the visibility, and average it over the sending cluster:

    λ_j = Σ_i ∫_{T_k ∩ T_i} [ ∫_{P_k} ∫_{Q_i} g(p, q, t) dq dp ] B_i(t) u_{Lk+j}(t) Ṽ(t) dt / (A_k (β_k − α_k)),

where

    Ṽ(t) = (1 / (A_k A_l)) ∫_{P_k} ∫_{P_l} v(x, y, t) dx dy.

We compute the λ_j and Ṽ(t) using a Gaussian quadrature. Since the cost of evaluating the approximate visibility must not depend on the number N of surfaces inside the cluster l, we place the quadrature points independently of the surfaces' positions inside the cluster's bounding box.

3.5.2. Reception inside a cluster: Immediate push

The reception inside a cluster obeys the immediate push principle: the irradiance received at the cluster level is immediately dispatched to all surfaces inside the cluster, where it is multiplied by the cosine of the angle between the normal of the surface and the direction of the incoming radiance. The origin of the incoming radiance is assumed to be the center of the emitter, whether a cluster or a surface.

    I_i = I_received(t) cos θ_i(t)

is then reprojected on the wavelet basis for the time interval T_i over which the hierarchical element i is defined. The resulting approximate irradiance is then:

    Ĩ_i = Σ_{j=0}^{L−1} γ_j u_i^j

and the γ coefficients are:

    γ_j = (1 / ‖u_i^j‖²) ⟨I_i, u_i^j⟩
        = (1 / (β_i − α_i)) ∫_{T_i ∩ T_l} I_received(t) cos θ_i(t) u_i^j(t) dt.

These integrals are once again approximated using a Gaussian quadrature.

Our method contains two successive approximations: we have separately computed the irradiance received at the cluster level, which was time-dependent, projected it onto the function basis, then dispatched it to the surfaces, taking into account the surface movement, and reprojected it on the function basis for the receiving surface. This double approximation is consistent with the clustering approach. If the refinement oracle decides that we can compute an interaction at the cluster level, then this approximation should be sufficient. Spending more computation time to find a better approximation would impair the hierarchical nature of the algorithm and would reduce its performance.

3.6. Push–pull traversal

After the irradiances have been gathered across all links in the scene, a traversal of the complete hierarchy is necessary to maintain coherence between the different hierarchical levels. First, irradiance contributions computed at various levels of the hierarchy have to be pushed down to the lowest level of the structure and summed along the way. Here the radiosities of each leaf are computed, and these radiosities are then progressively pulled up the hierarchy and averaged to compute the correct radiosity representation corresponding to each hierarchical level.

In the case of Wavelet Radiosity [21], this process is slightly more complicated than it is for static Hierarchical Radiosity, since we need to define how to combine the coefficients describing the radiosity variations, to convert them from one hierarchical level to the other. Remember from Section 3.2 that our multi-resolution basis functions are cross products of the scale functions of the Haar basis over space, and scale functions of the M_L basis over time.
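To make the projection step of Section 3.5 concrete, here is an illustrative Python sketch (not the authors' implementation; `m2_basis` and `project_m2` are hypothetical helpers) computing coefficients of the form γ_j via Gauss–Legendre quadrature for the linear M2 basis:

```python
import numpy as np

def m2_basis(t, a, b):
    """Orthonormal linear (M2) scaling functions on [a, b], w.r.t. the
    normalized inner product (1/(b-a)) * integral over [a, b]."""
    s = (t - a) / (b - a)                     # map to [0, 1]
    return np.array([np.ones_like(s), np.sqrt(3.0) * (2.0 * s - 1.0)])

def project_m2(f, a, b, order=4):
    """Coefficients gamma_j = <f, u_j> / ||u_j||^2 via Gauss-Legendre."""
    x, w = np.polynomial.legendre.leggauss(order)   # nodes/weights on [-1, 1]
    t = 0.5 * (b - a) * x + 0.5 * (a + b)           # map nodes to [a, b]
    u = m2_basis(t, a, b)                           # shape (2, order)
    # normalized inner product: (1/(b-a)) * ((b-a)/2) * sum(w * f * u)
    return 0.5 * (u * f(t) * w).sum(axis=1)

# A linearly varying irradiance is represented exactly by the basis:
# gamma_0 is the time average, gamma_1 the (scaled) slope.
gammas = project_m2(lambda t: 2.0 + 4.0 * t, 0.0, 1.0)
```

Because the basis is orthonormal under the normalized inner product, reconstructing γ_0·u_0 + γ_1·u_1 recovers the linear input exactly; a low-order quadrature suffices here because the integrand is polynomial.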

© The Eurographics Association and Blackwell Publishing Ltd 2004
136 C. Damez et al. / Space-Time Hierarchical Radiosity

Since we use a very simple midpoint subdivision scheme when subdividing elements in time, the coefficients that have to be pushed down or pulled up during this traversal can be computed using simple linear transformations, which are independent of the element or the hierarchical level. Both linear transforms are referred to as the two-scale relationship [22], and are determined by two L × L matrices P and Q. When L = 2 (linear wavelets), those matrices are:

    P = [ 1      0   ]        Q = [ 1      0   ]
        [ −√3/2  1/2 ]            [ √3/2   1/2 ]

When pushing down the total irradiance I (remember that this is an L-dimensional vector) from a given element split in time to each of its two children, the corresponding irradiances I′ and I′′ to be transmitted to its first and second children are given by the following linear transform:

    I′ = ᵗP I    and    I′′ = ᵗQ I.

Respectively, when pulling up, the average radiosity B of an element can be computed from the radiosities of its two children B′ and B′′:

    B = (1/2) (P B′ + Q B′′).

3.7. Practical issues and choices

3.7.1. Choice of a space-time function basis

Higher-order wavelets have been previously used as function basis for the representation of radiosity [21,22]. However, the algorithms resulting from their straightforward use within the classical Hierarchical Radiosity framework proved impractical, slower than when using the classical piecewise constant function basis and giving poorer results [28]. Further research [26] has later shown that, by making use of several recent advances in the field [18,19,26,29], higher-order wavelets provide a better approximation of the illumination function, requiring less memory and computation time. In particular, the radiosity function produced looks continuous without postprocessing, thanks to an adequate refinement oracle.

Unfortunately, to this date, higher-order wavelets in the spatial dimension cannot be easily applied to cluster objects. This is due to the fact that it is difficult to provide a function mapping the surfaces contained in a cluster onto the square domain where the wavelet basis is defined. Therefore, in order to be able to use higher-order wavelets for surfaces and clusters for inexpensive approximations, a mechanism to use different approximation orders for different hierarchical elements should be defined. Our refinement oracle would automatically adapt to the new function basis. The only point needing change would be the Push–Pull matrices we gave in Section 3.6.

However, this problem does not arise in the temporal dimension. Moreover, the lower dimensionality makes the added cost of the use of wavelets lower in the temporal dimension than it is in the spatial dimension. Therefore, we decided to limit our use of wavelets to the description of the temporal variations of radiosity. In our implementation, our function basis was composed of linearly varying functions (the M2 basis). This choice proved sufficient in practice to significantly reduce the temporal discontinuities (see Section 4).

In order to provide a smooth appearance for patches in our example animations, we applied a simple linear interpolation over the polygons as a postprocess, when traversing the space-time mesh to generate the images. Though it noticeably increases the visual appeal of the results, this postprocess doesn't improve the precision of the solution. Much better reconstruction methods have been proposed for static scenes, such as final gathering [30,31], and can be applied here on an image-per-image basis. Moreover, Martin et al. have recently proposed a final gathering acceleration method for animated scenes [15], whose coupling with our approach seems promising.

3.7.2. Memory management issues and refinement ordering

As our experiments will show in Section 4, the space-time hierarchical radiosity algorithm is quite memory intensive. This is due to the fact that we keep in memory a complete view-independent solution describing the variations of radiosity for all surfaces during the whole animation time interval. Though the amount of memory available on graphics workstations is rapidly increasing, it may prove necessary to reduce our algorithm's requirements when running on less powerful machines or when computing long animation sequences.

One way of reducing the memory cost is to use hard-disk space to cache parts of the hierarchy that are temporarily not required for computations. For the algorithm to remain efficient when such a caching scheme is used, accesses to the disk should be limited to a small number of large files. The order according to which we traverse the space-time hierarchy during the refinement should be chosen accordingly.

Such a desirable traversal order can be derived if we recall from equation (6) that elements whose time intervals do not overlap cannot exchange energy. Since we always subdivide time intervals in two parts of equal length, for every element P that can interact with a given element Q, we know that either T_P ⊂ T_Q or T_Q ⊂ T_P. Therefore, we can easily determine which elements may be needed to compute or refine a certain interaction, by bucketing hierarchical elements and links according to their time interval.
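The push/pull steps of Section 3.6 can be sketched in a few lines (illustrative Python, not the authors' code; the matrices assume the L = 2 two-scale relationship for the M2 basis, with the convention that the first child covers the earlier half-interval — the sign placement follows that assumption):

```python
import numpy as np

S3 = np.sqrt(3.0) / 2.0
P = np.array([[1.0, 0.0], [-S3, 0.5]])   # first (earlier) child
Q = np.array([[1.0, 0.0], [ S3, 0.5]])   # second (later) child

def push(parent_coeffs):
    """Split a parent coefficient vector into its two time children:
    I' = tP I  and  I'' = tQ I."""
    i = np.asarray(parent_coeffs, dtype=float)
    return P.T @ i, Q.T @ i

def pull(child1, child2):
    """Average the two children back into the parent representation:
    B = (P B' + Q B'') / 2."""
    return 0.5 * (P @ np.asarray(child1, float) + Q @ np.asarray(child2, float))

# Round trip: pushing then pulling must recover the parent coefficients,
# since (P tP + Q tQ) / 2 is the identity for these matrices.
parent = np.array([4.0, 1.5])
c1, c2 = push(parent)
back = pull(c1, c2)
```

The round-trip identity is what makes the push-down/pull-up traversal consistent across hierarchical levels.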


Figure 5: Variation of the radiosity function at the center of the highlighted element during the animation.

The refinement can be performed as a traversal of the hierarchy that would correspond to a depth-first-order traversal of the time intervals binary tree. Only the elements whose time interval contains the one currently visited should be kept in memory. Disk access would only take place when moving from one time interval to the other.

We ran an experiment to estimate the corresponding gain in memory that could be expected from such a traversal. For the SPOT scene (see Sections 2.2 and 4.1), we used 2⁵ − 1 = 31 time interval buckets to sort our elements (one for each time interval corresponding to the first five subdivision levels). The maximum total cost of the portion of the hierarchy that needs to be kept in memory is 40 MB, whereas more than 450 MB are required when we are keeping everything in RAM (cf. Table 1). In such a case, file accesses should not reduce excessively the performance of our algorithm: the added cost of reading and writing 31 files, each about 15 MB, should be reasonable since an iteration on this scene already requires 10–20 minutes.

4. Experimental Results

In this section, we discuss the performance of our algorithm on several test scenes. In Section 4.1, we demonstrate the improvement in temporal continuity that our use of piecewise linearly varying functions offers when compared to simple "box" functions. In Section 4.2, we demonstrate our use of clustering and we offer comparisons with frame-by-frame Hierarchical Radiosity calculations to show that we obtain good acceleration factors.

4.1. Improvement of temporal continuity

To illustrate the improvement in temporal continuity obtained using the M2 function basis in the time dimension, we use the test scene from Section 2.2. In this scene, the sweeping movement of spotlights over walls painted in different colors causes important changes in the indirect illumination of the scene. In particular, strong color bleeding effects can be observed moving on the ceiling and the floor of the scene.

As explained in Section 2.2, this scene was purposely designed as a "worst-case scenario" for the space-time hierarchical radiosity algorithm in order to exhibit strong temporal discontinuities. When using a piecewise constant function basis to describe the variation of radiosity in time, the indirect lighting effects in this scene are extremely discontinuous. For example, the color bleeding patches seem to be updated only every second or so. The amplitude of these discontinuities is shown in the radiosity variation plot of Figure 5. It can be clearly seen that the greatest discontinuity is located at the middle of the animation, then at the first and third quarter. The magnitude of the largest discontinuity is about 40% of the time-average radiosity of this patch, which makes it quite noticeable. Smaller discontinuities can be observed at other even subdivisions of the time interval.

However, the same animation, computed using hierarchical elements linearly varying in time, exhibits a much more coherent indirect illumination. Without paying careful attention, it is difficult to perceive any discontinuity. We can see on Figure 5 that the indirect lighting is obviously more "continuous" than when using a piecewise constant function basis. The largest discontinuity (at t = 1/4) has a magnitude of about 7% of the average radiosity of the patch. Figure 6 shows that this strong reduction of discontinuities can be observed on all surfaces of the scene. Table 1 allows the comparison of computation time and memory cost when computing this animation with a frame-by-frame Hierarchical Radiosity algorithm, with our algorithm using the Haar basis, and with our algorithm using the M2 basis, respectively. All timings have been measured on a single 300 MHz MIPS R12000 processor of an SGI Onyx2 computer.
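The qualitative effect reported here — large jumps with a piecewise-constant temporal basis, much smaller ones with a linear basis — can be reproduced on a toy signal (a hypothetical smooth radiosity curve, not the paper's scene; `boundary_jumps` is an illustrative helper):

```python
import numpy as np

def boundary_jumps(f, n_intervals, linear):
    """Max jump between adjacent per-interval approximations of f on [0, 1]."""
    edges = np.linspace(0.0, 1.0, n_intervals + 1)
    start_vals, end_vals = [], []
    for a, b in zip(edges[:-1], edges[1:]):
        t = np.linspace(a, b, 201)
        y = f(t)
        if linear:                      # least-squares line on the interval
            c1, c0 = np.polyfit(t, y, 1)
            lo, hi = c0 + c1 * a, c0 + c1 * b
        else:                           # piecewise-constant (mean value)
            lo = hi = y.mean()
        start_vals.append(lo)
        end_vals.append(hi)
    # jump at each interior boundary: end of interval k vs start of k + 1
    return max(abs(end_vals[k] - start_vals[k + 1])
               for k in range(n_intervals - 1))

f = lambda t: np.sin(2.0 * np.pi * t) + 2.0
jump_const = boundary_jumps(f, 4, linear=False)   # Haar-like approximation
jump_lin = boundary_jumps(f, 4, linear=True)      # linear (M2-like) approximation
```

On this signal the constant approximation jumps by more than half the signal's amplitude at t = 1/2, while the per-interval linear fit reduces the worst jump by a factor of several.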


Figure 6: Comparison of temporal discontinuities at t = 1/2. Darker colors indicate a higher discontinuity (arbitrary units). Left: hierarchical radiosity, right: M2 wavelets.
Table 1: Performance comparison on the SPOTS scene, between frame-by-frame Hierarchical Radiosity, our algorithm using the Haar basis, and our algorithm using the M2 basis

                      Computation Time
                      Direct Lighting       Indirect Lighting        Total per Image    Memory Used (MB)
  Frame-by-Frame HR   1 s × 600 = 600 s     15 s × 600 = 9,000 s     16.0 s             5
  Haar                254 s                 1,492 s                  2.9 s              587
  M2                  271 s                 1,172 s                  2.4 s              464

The following comments can be made about these results:

• Though this scene is geometrically quite simple, the speedup factor obtained, when compared to a frame-by-frame computation, is about 6. (Note that this acceleration factor is only about 2 if we only take into account the time needed to compute the direct illumination.) Our algorithm's performance on such scenes, where the indirect lighting is dominant and dramatically changing over time, is therefore satisfying.

• The memory consumption when using the M2 basis is 15% lower than when using the Haar basis, in spite of the added storage cost of the second radiosity coefficient and the interaction matrices. The animation has also been computed slightly faster. This is due to the fact that fewer subdivisions in time are needed to obtain a precise enough representation of the variations of radiosity in time, resulting in a faster refinement and a lighter mesh.

4.2. Validation of the Clustering Approach

We have tested our algorithm on scenes composed of several thousands of input polygons (see Figure 7). For such scenes, Hierarchical Radiosity computations without the use of clustering would have been extremely long because of the quadratic cost of the initial linking stage.

The first of our three test animations takes place in a small room with some furniture (a couple of desks, chairs, pens, etc.). It is lit by four area light sources. The bookshelf, against the wall, falls to the floor. The animation is 4 seconds long, and is composed of 100 frames. The input geometry is composed of 7,200 polygons. The second animation takes place in a large library hall with several desks separated by rows of bookshelves. This scene is lit by numerous area light sources. A character is moving through the hall. The animation is 20 seconds long and is composed of 500 frames. There are about 35,000 input surfaces. The third animation is somewhat similar to the test scene we use in Section 4.1. We replaced the boxes by more complex objects. The resulting scene is composed of approximately 30,000 polygons and is 24 seconds long.

Table 2 summarizes our experimental results for our three test scenes. In this table, we compare the resources necessary to compute the animations when using the space-time hierarchical radiosity algorithm and when performing hierarchical radiosity with Clustering frame-by-frame, providing the same image quality. All timings have been observed on a 300 MHz MIPS R12000.

The more elements the mesh is composed of, the more advantageous the hierarchical approach is. Since it makes it possible to compute more complicated animations, clustering really allows us to benefit fully from the hierarchical nature of


Figure 7: Sample frames from our test animations.

Table 2: Comparative results for the use of clustering: we compare computation time and memory use of our algorithm to the time and memory needed to compute the same animation frame-by-frame with classical Hierarchical Radiosity with Clustering

            Computation Time                            Memory Used
            STHR        Static HRC                      STHR (MB)    Static HRC (MB)
  SHELF     3,335 s     184 s × 100 = 18,400 s          100          16
  HALL      33,333 s    1,185 s × 500 = 592,500 s       842          120
  ROBOTS    4,591 s     109 s × 600 = 65,400 s          475          17
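As a quick arithmetic check on Table 2 (all figures taken directly from the table), the speedup factors quoted in the text can be recomputed:

```python
# Timings (seconds) from Table 2: space-time hierarchical radiosity (STHR)
# vs frame-by-frame Hierarchical Radiosity with Clustering (static HRC).
timings = {
    "SHELF":  (3_335.0, 184.0 * 100),    # 18,400 s frame-by-frame
    "HALL":   (33_333.0, 1_185.0 * 500),  # 592,500 s
    "ROBOTS": (4_591.0, 109.0 * 600),    # 65,400 s
}

speedups = {name: static / sthr for name, (sthr, static) in timings.items()}
# SHELF ~5.5x, HALL ~17.8x, ROBOTS ~14.2x: roughly the "6 to 18" range
# quoted in the surrounding text.
```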

our algorithm. The typical speedup ranges from 6 to 18.

The memory consumption of our algorithm is quite high, since we keep in memory at the same time a complete view-independent global illumination solution for all frames of the animation (we have discussed a possible way to avoid this in Section 3.7.2). However, we can note that the memory cost of our algorithm depends more on the complexity of the illumination than on the number of input polygons. The more complex the input mesh is, the smaller the polygons are on average. Therefore, they are less likely to be subdivided later, and the resulting hierarchy will not be much bigger than if it consisted initially of large unsubdivided surfaces.

5. Conclusion

In this paper, we proposed a new algorithm to compute global illumination in diffuse animated environments. This algorithm is based on the adaptive refinement of a hierarchical mesh defined both over time and space. It can therefore benefit from the a priori knowledge of objects' movements to factor out a large part of redundant computations.

This technique allows computation of animations with a quality similar to frame-by-frame computation, in a shorter time. Geometrically complex scenes can be dealt with thanks to the definition of a clustering approach extending the space-time mesh. The continuity of indirect lighting is improved by


the simultaneous use of a piecewise-linear wavelet basis in the time dimension and of an adequate space-time refinement oracle.

Promising directions for future research include:

• The derivation of a space-time final gathering approach, adapting the one proposed by Martin, Pueyo and Tost [15].
• The implementation and extensive testing of disk caching schemes such as the one suggested in Section 3.7.2.
• The parallelization of this algorithm. This should be straightforward on a shared memory architecture [32] but will certainly prove more difficult on a cluster of PCs.
• Experiments with alternative wavelet bases in the time dimension, for example using higher-order polynomials.
• The extension of our algorithm to nondiffuse scenes, using a unified mesh-based particle shooting approach [9].
• The construction of a refinement criterion using human perception-based animation quality metrics [13].

References

1. P. Hanrahan, D. Salzman and L. Aupperle. A rapid hierarchical radiosity algorithm. In Computer Graphics (ACM SIGGRAPH '91 Proceedings), vol. 25, pp. 197–206. 1991.
2. C. Damez and F. Sillion. Space-time hierarchical radiosity. In Proceedings of the 10th Eurographics Workshop on Rendering, pp. 235–246. 1999.
3. C. M. Goral, K. E. Torrance, D. P. Greenberg and B. Battaile. Modelling the interaction of light between diffuse surfaces. In Computer Graphics (ACM SIGGRAPH '84 Proceedings), vol. 18, pp. 212–222. 1984.
4. E. Shaw. Hierarchical radiosity for dynamic environments. Computer Graphics Forum, 16(2):107–118, 1997.
5. G. Drettakis and F. Sillion. Interactive update of global illumination using a line-space hierarchy. In Computer Graphics (ACM SIGGRAPH '97 Proceedings), pp. 57–64. 1997.
6. B. Walter, G. Drettakis and S. Parker. Interactive rendering using the render cache. In Proceedings of the 10th Eurographics Workshop on Rendering, pp. 235–246. 1999.
7. P. Tole, F. Pellaccini, B. Walter and D. P. Greenberg. Interactive global illumination in dynamic scenes. ACM Transactions on Graphics (SIGGRAPH '02 Proceedings), 21(3):537–546, 2002.
8. A. Keller. Instant radiosity. In Computer Graphics (ACM SIGGRAPH '97 Proceedings), pp. 49–56. 1997.
9. X. Granier and G. Drettakis. Incremental updates for rapid glossy global illumination. In Computer Graphics Forum (Proceedings of Eurographics 2001), vol. 20, pp. 268–277. 2001.
10. I. Wald, T. Kollig, C. Benthin, A. Keller and P. Slusallek. Interactive global illumination. In Proceedings of the 13th Eurographics Workshop on Rendering. 2002.
11. K. Dmitriev, S. Brabec, K. Myszkowski and H.-P. Seidel. Interactive global illumination using selective photon tracing. In Proceedings of the 13th Eurographics Workshop on Rendering. 2002.
12. G. Besuievsky and X. Pueyo. Animating radiosity environments through the multi-frame lighting method. Journal of Visualization and Computer Animation, 12:93–106, 2001.
13. K. Myszkowski, T. Tawara, H. Akamine and H.-P. Seidel. Perception-guided global illumination solution for animation rendering. In Computer Graphics (ACM SIGGRAPH '01 Proceedings), pp. 221–230. 2001.
14. C. Damez, K. Dmitriev and K. Myszkowski. State of the art in global illumination for interactive applications and high-quality animations. Computer Graphics Forum, 22(1):55–77, 2003.
15. I. Martín, X. Pueyo and D. Tost. Frame-to-frame coherent animation with two-pass radiosity. IEEE Transactions on Visualization and Computer Graphics, 9(1):70–84, 2003.
16. B. Smits, J. Arvo and D. Greenberg. A clustering algorithm for radiosity in complex environments. In Computer Graphics (ACM SIGGRAPH '94 Proceedings), pp. 435–442. 1994.
17. F. Sillion. Clustering and volume scattering for hierarchical radiosity calculations. In Proceedings of the 5th Eurographics Workshop on Rendering, pp. 105–117. 1994.
18. P. Bekaert and Y. D. Willems. Error control for radiosity. In Proceedings of the 7th Eurographics Workshop on Rendering, pp. 153–164. 1996.
19. P. Bekaert and Y. D. Willems. Hirad: A hierarchical higher order radiosity implementation. In Proceedings of the Twelfth Spring Conference on Computer Graphics (SCCG '96), Bratislava, Slovakia, Comenius University Press, June 1996.
20. M. F. Cohen and J. R. Wallace. Radiosity and Realistic Image Synthesis. Academic Press Professional, Boston, MA, 1993.


21. S. J. Gortler, P. Schroder, M. F. Cohen and P. Hanrahan. Wavelet radiosity. In Computer Graphics (ACM SIGGRAPH '93 Proceedings), pp. 221–230. 1993.
22. P. Schroder, S. J. Gortler, M. F. Cohen and P. Hanrahan. Wavelet projections for radiosity. In Fourth Eurographics Workshop on Rendering, pp. 105–114. 1993.
23. B. K. Alpert. A class of bases in L² for the sparse representation of integral operators. SIAM Journal on Mathematical Analysis, 24(1):246–262, 1993.
24. P. H. Christensen, D. Lischinski, E. J. Stollnitz and D. H. Salesin. Clustering for glossy global illumination. ACM Transactions on Graphics, 16(1):3–33, 1997.
25. J.-M. Hasenfratz, C. Damez, F. Sillion and G. Drettakis. A practical analysis of clustering strategies for hierarchical radiosity. In Computer Graphics Forum (Proc. Eurographics '99), vol. 18, pp. C221–C232. Sept. 1999.
26. F. Cuny, L. Alonso and N. Holzschuch. A novel approach makes higher order wavelets really efficient for radiosity. In Computer Graphics Forum (Proc. Eurographics 2000), vol. 19, pp. C99–C108. 2000.
27. F. Sillion. A unified hierarchical algorithm for global illumination with scattering volumes and object clusters. IEEE Transactions on Visualization and Computer Graphics, 1(3):240–254, 1995.
28. A. Willmott and P. Heckbert. An empirical comparison of progressive and wavelet radiosity. In J. Dorsey and P. Slusallek (eds), Rendering Techniques '97 (Proceedings of the Eighth Eurographics Workshop on Rendering), New York, NY, pp. 175–186, Springer, Wien. 1997. ISBN 3-211-83001-4.
29. M. Stamminger, H. Schirmacher, P. Slusallek and H.-P. Seidel. Getting rid of links in hierarchical radiosity. Computer Graphics Journal (Proc. Eurographics '98), 17(3):C165–C174, 1998.
30. M. Reichert. A Two-Pass Radiosity Method to Transmitting and Specularly Reflecting Surfaces. M.Sc. thesis, Cornell University, 1992.
31. A. Scheel, M. Stamminger and H.-P. Seidel. Grid based final gather for radiosity on complex clustered scenes. Computer Graphics Forum, 21(3):547–556, 2002.
32. F. Sillion and J.-M. Hasenfratz. Efficient parallel refinement for hierarchical radiosity on a DSM computer. In Proceedings of the Third Eurographics Workshop on Parallel Graphics and Visualisation, pp. 61–74. 2000. http://www-imagis.imag.fr/Membres/Jean-M.Hasenfratz/PUBLI/EGWPGV00.html.

96 CHAPITRE 2. MODÉLISATION MULTI-ÉCHELLES DE L’ÉCLAIRAGE

2.7.7 Accurate detection of symmetries in 3D shapes (TOG 2006)


Authors: Aurélien Martinet, Cyril Soler, Nicolas Holzschuch and François Sillion
Journal: ACM Transactions on Graphics, vol. 25, no. 2
Date: April 2006
Accurate Detection of Symmetries in 3D Shapes
AURÉLIEN MARTINET, CYRIL SOLER, NICOLAS HOLZSCHUCH and FRANÇOIS X. SILLION
ARTIS, INRIA Rhône-Alpes

We propose an automatic method for finding symmetries of 3D shapes, that is, isometric transforms which leave a shape globally
unchanged. These symmetries are deterministically found through the use of an intermediate quantity: the generalized moments.
By examining the extrema and spherical harmonic coefficients of these moments, we recover the parameters of the symmetries
of the shape. The computation for large composite models is made efficient by using this information in an incremental algorithm
capable of recovering the symmetries of a whole shape using the symmetries of its subparts. Applications of this work range from
coherent remeshing of geometry with respect to the symmetries of a shape to geometric compression, intelligent mesh editing,
and automatic instantiation.
Categories and Subject Descriptors: I.3.5 [Computer Graphics]: Computational Geometry and Object Modeling—Curve, surface,
solid and object representations
General Terms: Algorithms

1. INTRODUCTION
Many shapes and geometrical models exhibit symmetries: isometric transforms that leave the shape
globally unchanged. Using symmetries, one can manipulate models more efficiently through coherent
remeshing or intelligent mesh editing programs. Other potential applications include model compres-
sion, consistent texture-mapping, model completion, and automatic instantiation.
The symmetries of a model are sometimes made available by the creator of the model and represented
explicitly in the file format the model is expressed in. Usually, however, this is not the case, and auto-
matic translations between file formats commonly result in the loss of this information. For scanned
models, symmetry information is also missing by nature.
In this article, we present an algorithm that automatically retrieves symmetries in a geometrical
model. Our algorithm is independent of the tesselation of the model; in particular, it does not assume
that the model has been tesselated in a manner consistent with the symmetries we attempt to identify,
and it works well on noisy objects such as scanned models. Our algorithm uses a new tool, the generalized
moment functions. Rather than computing these functions explicitly, we directly compute their spherical
harmonic coefficients, using a fast and accurate technique. The extrema of these functions and their
spherical harmonic coefficients enable us to deterministically recover the symmetries of a shape.
For composite shapes, that is, shapes built by assembling simpler structures, we optimize the compu-
tation by applying the first algorithm to the subparts, then iteratively building the set of symmetries of

Authors’ addresses: A. Martinet, C. Soler, N. Holzschuch, F. X. Sillion, ARTIS, INRIA Rhône-Alpes, Saint Ismier, France; email:
Aurelien. [email protected].
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided
that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first
page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM
must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists,
or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested
from Publications Dept., ACM, Inc., 1515 Broadway, New York, NY 10036 USA, fax: +1 (212) 869-0481, or [email protected].
© 2006 ACM 0730-0301/06/0400-0439 $5.00
ACM Transactions on Graphics, Vol. 25, No. 2, April 2006, Pages 439–464.

440 • A. Martinet et al.

the composite shape, taking into account both the relative positions of the subparts and their relative
orientations.
We envision many applications for our work, including geometric compression, consistent mesh edit-
ing, and automatic instantiation.
This article is organized as follows. In the following section, we review previous work on identifying
geometric symmetries on 2D and 3D shapes. Then in Section 3, we present an overview of the symmetry-
detection problem and the quantities used in our algorithms. In Section 4, we introduce the generalized
moments and our method to compute them efficiently; in Section 5, we present our algorithm for
identifying symmetries of a shape. The extension of this algorithm to composite shapes is then presented
in Section 6. Finally, in Section 7, we show various applications of our algorithm.

2. RELATED WORK
Early approaches to symmetry detection focused on the 2D problem. Atallah [1985], Wolter et al. [1985] and Highnam [1985] present methods to reduce the 2D-symmetry detection problem to a 1D pattern matching problem for which efficient solutions are known [Knuth et al. 1977]. Their algorithms efficiently detect all possible symmetries in a point set but are highly sensitive to noise.
Identifying symmetries for 3D models is much more complex, and little research on this subject has
been published. Jiang and Bunke [1991] present a symmetry-detection method, restricted to rotational
symmetry, based on a scheme called generate and test, first finding hypothetical symmetry axes, then
verifying these assumptions. This method is based on a graph representation of a solid model and uses
graph theory. The dependency between this graph representation and the mapping between points
makes their method highly dependent on the topology of the mesh and sensitive to small modifications
of the object geometry. Brass and Knauer [2004] provide a model for general 3D objects and give an
algorithm to test congruence or symmetry for these objects. Their approach is capable of retrieving
symmetry groups of an arbitrary shape but is also topology-dependent since it relies on a mapping
between points of the model. Starting from an octree representation, Minovic et al. [1993] describe an
algorithm based on octree traversal to identify symmetries of a 3D object. Their algorithm relies on
PCA to find the candidate axis; PCA, however, fails to identify axes for a large class of objects, including
highly symmetric objects such as regular solids.
All these methods try to find strict symmetries for 3D models. As a consequence, they are sensitive
to noise and data imperfections. Zabrodsky et al. [1995] define a measure of symmetry for nonperfect
models, defined as the minimum amount of work required to transform a shape into a symmetric shape.
This method relies on the ability to first establish correspondence between points, a very restrictive
precondition.
Sun and Sherrah [1997] use the Extended Gaussian Image to identify symmetries by looking at
correlations in the Gaussian image. As in Minovic et al. [1993], they rely on PCA to identify potential
axes of symmetry, thus possibly failing on highly symmetric objects. More recently, Kazhdan et al. [2004]
introduced the symmetry descriptors, a collection of spherical functions that describe the measure of a
model’s rotational and reflective symmetry with respect to every axis passing through the center of mass.
Their method provides good results in the shape identification but involves a surface integration for each
sampled direction; this surface integration is carried on a voxel grid. Using the symmetry descriptors
to identify symmetries requires an accurate sampling in all directions, making their algorithm very
costly for an accurate set of results. In contrast, our algorithm only computes a deterministic small
number of surface integrals, which are performed on the shape itself, and still provides very accurate
results. Effective complexity comparisons will be given in Section 8.
ACM Transactions on Graphics, Vol. 25, No. 2, April 2006.
Accurate Detection of Symmetries in 3D Shapes • 441
Fig. 1. Mirror symmetries and rotational symmetries found by our algorithm for a cube (for clarity, not all elements are repre-
sented).

3. OVERVIEW
Considering a surface $S$, the symmetries of $S$ are the isometric transforms which map $S$ onto itself, in any coordinate system centered on its center of gravity. The symmetries of a shape form a group under function composition, with the identity as its neutral element. For a given shape, the study of such a group relates to the domain of mathematical crystallography [Prince 2004].
The group of the cube, for instance, contains 48 elements (see Figure 1): the identity, eight 3-fold rotations around 4 possible axes, nine 4-fold rotations around 3 possible axes, six 2-fold rotations around 6 possible axes, nine mirror symmetries, and fifteen other elements obtained by composing rotations and mirror symmetries.
Studying the group of isometries in $\mathbb{R}^3$ shows that, for a given isometry $I$, there always exists an orthonormal basis $(X, Y, Z)$ in which the matrix of $I$ takes the following form:
$$I(\lambda, \alpha) = \begin{pmatrix} \lambda & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{pmatrix} \quad\text{with}\quad \alpha \in [0, 2\pi[,\ \lambda = \pm 1.$$
As suggested by the example of the cube, this corresponds to 3 different classes of isometries: rotations, mirror symmetries, and their composition, depending on whether $\lambda$ is positive and/or $\alpha = 0 \pmod{\pi}$. Finding a symmetry of a shape thus reduces to finding a vector $X$ (which we call the axis of the isometry) and an angle $\alpha$ (which we call the angle of the isometry) such that $I(\lambda, \alpha)$ maps this shape onto itself.
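The normal form above also suggests a direct numerical decomposition, useful when checking candidate transforms. The following sketch is our illustration, not part of the paper's algorithm: $\lambda$ is the determinant, the basis-invariant trace equals $\lambda + 2\cos\alpha$, and the axis $X$ is an eigenvector for the eigenvalue $\lambda$.

```python
import numpy as np

def classify_isometry(M):
    """Decompose an orthogonal 3x3 matrix into the (lambda, alpha, axis)
    normal form I(lambda, alpha) described above."""
    lam = int(round(float(np.linalg.det(M))))       # +1: rotation, -1: mirror part
    # In the adapted basis, trace(I) = lambda + 2 cos(alpha).
    cos_a = np.clip((np.trace(M) - lam) / 2.0, -1.0, 1.0)
    alpha = float(np.arccos(cos_a))
    # The axis X is the eigenvector associated with the eigenvalue lambda.
    w, v = np.linalg.eig(M)
    i = int(np.argmin(np.abs(w - lam)))
    axis = np.real(v[:, i])
    return lam, alpha, axis / np.linalg.norm(axis)
```

For instance, a pure mirror symmetry comes out as $\lambda = -1$, $\alpha = 0$, with the axis normal to the mirror plane.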
However, finding all symmetries of a shape is much more difficult than simply checking whether
a given transform actually is a symmetry. In particular, the naive approach that would consist of
checking as many sampled values of (X, λ, α) as possible to find a symmetry is far too costly. We thus
need a deterministic method for finding good candidates.
Our approach to finding symmetries is to use intermediate functions whose set of symmetries is a superset of the set of symmetries of the shape itself, but for which computing the symmetries is much easier. By examining these functions, we will derive, in Section 5, a deterministic algorithm which finds a finite number of possible candidates for $X$, $\lambda$, and $\alpha$. Because some unwanted triplets of values will appear during the process, these candidates are then checked back on the original shape. Choosing a family of functions which fulfills these requirements is easy. More difficult is the
442 • A. Martinet et al.
task of finding such functions for which computing the symmetries can be done both accurately and
efficiently.
Inspired by the work on principal component analysis [Minovic et al. 1993], we introduce the generalized moment functions of the shape for this purpose. These functions are the topic of Section 4. They have, indeed, the same symmetries as the shape itself, plus a small number of extra candidates. Furthermore, we propose an elegant framework based on spherical harmonics to find their symmetries accurately and efficiently.
A second contribution of this article is to extend the proposed algorithm into a constructive algorithm which separately computes the symmetries of subcomponents of an object (using the first method), and then combines this information to compute the symmetries of the whole composite shape. This constructive algorithm proves to be more accurate in some situations, and more efficient when it is possible to decompose an object according to its symmetries. It is presented in Section 6.

4. GENERALIZED MOMENTS
In this section, we introduce a new class of functions: the generalized moments of a shape. We then
show that these functions have at least the same symmetries as the shape itself and that their own
symmetries can be computed in a very efficient way.

4.1 Definition
For a surface $S$ in a 3-dimensional domain, we define its generalized moment of order $2p$ in direction $\omega$ by
$$M^{2p}(\omega) = \int_{s \in S} \| s \times \omega \|^{2p}\, ds. \tag{1}$$
In this definition, $s$ is a vector which links the center of gravity of the shape (placed at the origin) to a point on the surface, and $ds$ is thus an infinitesimal surface element; $M^{2p}$ itself is a directional function. It should be noted that, considering $S$ to have some thickness $dt$, the expression $M^2(\omega)\,dt$ (i.e., the generalized moment of order 2) corresponds to the moment of inertia of the thin shell $S$ along $\omega$, hence the name of these functions. Furthermore, the choice of an even exponent and a cross product leads to very interesting properties.
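As a sanity check of the definition, $M^{2p}(\omega)$ can be approximated by brute force on a point-sampled surface. This is our illustration, not the method used in the paper (which integrates analytically via spherical harmonics, see Section 4.3). For a sphere, whose symmetry group contains every rotation, the moments must be independent of the direction $\omega$.

```python
import numpy as np

def moment(points, areas, omega, p):
    """Brute-force estimate of M^{2p}(omega) = integral over S of
    ||s x omega||^{2p} ds, from surface sample points (relative to the
    center of gravity) and their area elements."""
    omega = np.asarray(omega, float)
    omega = omega / np.linalg.norm(omega)
    cross = np.cross(points, omega)
    return float(np.sum(np.linalg.norm(cross, axis=1) ** (2 * p) * areas))

# Sanity check on the unit sphere: every direction is a symmetry axis,
# so the estimated moments must be (almost) direction-independent.
rng = np.random.default_rng(0)
s = rng.normal(size=(20000, 3))
s /= np.linalg.norm(s, axis=1, keepdims=True)       # uniform points on the sphere
areas = np.full(len(s), 4 * np.pi / len(s))         # equal area elements
vals = [moment(s, areas, rng.normal(size=3), p=2) for _ in range(5)]
```

For the sphere, the exact value of $M^4$ in any direction is $4\pi \cdot 8/15 \approx 6.70$; the estimates agree up to Monte-Carlo noise.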

4.2 Shape Symmetries and Moments


Symmetry properties of a shape translate into symmetry properties of its moment functions. We now introduce a theorem that we will rely on (see proof in the Appendix):
THEOREM 1. Any symmetry $I$ of a shape $S$ is also a symmetry of all its $M^{2p}$ moment functions:
$$I(S) = S \Rightarrow \forall\omega\ M^{2p}(I(\omega)) = M^{2p}(\omega).$$
Furthermore, if $M^{2p}$ has a symmetry $I$ with axis $\omega$, then the gradient of $M^{2p}$ is null at $\omega$:
$$\forall\omega\ M^{2p}(I(\omega)) = M^{2p}(\omega) \Rightarrow (\nabla M^{2p})(\omega) = 0.$$
This theorem implies that the axes of the symmetries of a shape are to be found in the intersection of the sets of directions which zero the gradients of each of its moment functions. The properties are not reciprocal, however. Once the directions of the zeros of the gradients of the moment functions have been found, they must be checked on the shape itself to eliminate false positives.
4.3 Efficient Computation
At first sight, looking for the zeros of the gradient of the moment functions requires a precise and dense sampling of these functions, which would be very costly using the integral form of Equation (1). We thus present an efficient method to compute the generalized even moment functions of a shape, using spherical harmonics. In particular, we can accurately compute the spherical harmonic coefficients of the moment functions without sampling these functions. The search for zeros of the gradient is then performed efficiently on the spherical harmonic decomposition itself.
Spherical Harmonics. We use real-valued spherical harmonics [Hobson 1931] to represent directional functions. Real spherical harmonics are defined, for integers $l \geq 0$ and $-l \leq m \leq l$, by:
$$Y_l^m(\theta, \varphi) = \begin{cases} \sqrt{2}\, N_l^m P_l^m(\cos\theta) \cos(m\varphi) & \text{for } 0 < m \leq l \\ N_l^0 P_l^0(\cos\theta) & \text{for } m = 0 \\ \sqrt{2}\, N_l^{|m|} P_l^{|m|}(\cos\theta) \sin(|m|\varphi) & \text{for } -l \leq m < 0 \end{cases}$$
where $P_l^m$ are the associated Legendre polynomials; the normalization constants $N_l^m$ are such that the spherical harmonics form an orthonormal set of functions for the scalar product:
$$\langle f, g \rangle = \int_{\|\omega\|=1} f(\omega)\, g(\omega)\, d\omega.$$
This corresponds to choosing:
$$N_l^m = \sqrt{\frac{2l+1}{4\pi}\, \frac{(l-|m|)!}{(l+|m|)!}}.$$
We will use the following very powerful property of spherical harmonics. Any spherical harmonic of degree $l$ can be expressed in a rotated coordinate system using harmonics of the same degree, with coefficients depending on the rotation $R$:
$$Y_l^m \circ R = \sum_{-l \leq m' \leq l} D_l^{m,m'}(R)\, Y_l^{m'}. \tag{2}$$
Any combination of spherical harmonics of degree less than $l$ can therefore be expressed in a rotated coordinate system, using spherical harmonics of degree less than $l$, without loss of information. The coefficients $D_l^{m,m'}(R)$ can be obtained efficiently using recurrence formulas [Ivanic and Ruedenberg 1996] or computed directly [Ramamoorthi and Hanrahan 2004].
Computation of Moment Functions. As defined by Equation (1), the $2p$-moment function of a shape $S$ is expressed as:
$$M^{2p}(\omega) = \int_{s \in S} \|s \times \omega\|^{2p}\, ds = \int_{s \in S} \|s\|^{2p} \sin^{2p}\beta\, ds.$$
In this expression, $\beta$ is the angle between $s$ and $\omega$.
The function $\beta \mapsto \sin^k\beta$ has an angular dependence on $\beta$ only and therefore decomposes into zonal harmonics (i.e., harmonics $Y_l^m$ for which $m = 0$). Performing the calculation shows that, when $k$ is even, the decomposition is finite. Setting $k = 2p$, we obtain:
$$\sin^{2p}\beta = \sum_{l=0}^{p} S_l^p\, Y_{2l}^0(\beta, \cdot)$$
with:
$$S_l^p = \frac{\sqrt{(4l+1)\pi}\; 2^{2p+1}}{2^{2l}} \sum_{k=l}^{2l} (-1)^k \frac{p!\,(2k)!\,(p+k-l)!}{(2(p+k-l)+1)!\,(k-l)!\,k!\,(2l-k)!}. \tag{3}$$
For the sake of completeness, we provide the corresponding derivation and the proof of the finite decomposition in the appendix section of this article.
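The finiteness of this decomposition can be checked independently: since the zonal harmonic $Y_{2l}^0(\beta, \cdot)$ is proportional to $P_{2l}(\cos\beta)$, writing $\sin^{2p}\beta = (1 - x^2)^p$ with $x = \cos\beta$ reduces the claim to the Legendre expansion of an even polynomial of degree $2p$. A small numpy verification (ours, not the paper's code):

```python
import numpy as np
from numpy.polynomial import polynomial as Poly
from numpy.polynomial import legendre as Leg

def sin_power_legendre(p):
    """Legendre coefficients of sin^{2p}(beta) = (1 - x^2)^p, x = cos(beta).
    Y_{2l}^0(beta) being proportional to P_{2l}(x), a finite, even-degree
    expansion here is exactly the finite zonal decomposition above."""
    poly = np.array([1.0])
    for _ in range(p):
        poly = Poly.polymul(poly, [1.0, 0.0, -1.0])  # multiply by (1 - x^2)
    return Leg.poly2leg(poly)

coeffs = sin_power_legendre(3)                        # sin^6(beta)
```

The expansion stops at degree $2p = 6$ and every odd-degree coefficient vanishes, as claimed.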
Let $R_s$ be a rotation which maps $z$, the unit vector along the $z$-axis, to $s$. Using Equation (2) for rotating the $Y_{2l}^0$ zonal harmonics, we have:
$$\sin^{2p}\beta = \sum_{l=0}^{p} S_l^p \sum_{m=-2l}^{2l} D_{2l}^{0,m}(R_s)\, Y_{2l}^m(\omega).$$
And finally:
$$M^{2p}(\omega) = \sum_{l=0}^{p} \sum_{m=-2l}^{2l} C_{2l,m}^{2p}\, Y_{2l}^m(\omega), \tag{4}$$
using
$$C_{2l,m}^{2p} = S_l^p \int_{s \in S} \|s\|^{2p}\, D_{2l}^{0,m}(R_s)\, ds. \tag{5}$$

Equation (4) says that $M^{2p}$ decomposes into a finite number of spherical harmonics, and Equation (5) allows us to compute the coefficients directly. The cost of computing $M^{2p}$ is therefore $(p+1)(2p+1)$ surface integrals (one integral per even-order harmonic, up to order $2p$). This is much cheaper than the alternative method of computing the scalar product of $M^{2p}$, as defined by Equation (1), with each spherical harmonic basis function: this would indeed require many evaluations of $M^{2p}$, which is itself defined as a surface integral. Furthermore, numerical accuracy is only a concern when computing the $C_{2l,m}^{2p}$ coefficients, and we can then compute both $M^{2p}$ and its gradient analytically from Equation (4).

5. FINDING SYMMETRIES OF A SINGLE SHAPE
In this section, we present our algorithm for identifying the symmetries of a shape seen as a single entity, as opposed to the algorithm presented in the next section, where the shape is considered as an aggregation of multiple subparts. For a given shape, we want to determine the axis $X$ and the $(\lambda, \alpha)$ parameters of the potential isometries, using the generalized moment functions, and check the isometries found against the actual shape.
Central symmetries ($\lambda = -1$ and $\alpha = \pi$) form a specific case since, by construction, $M^{2p}$ always has a central symmetry. Because central symmetries do not require an axis either, we treat this case directly while checking the other candidate symmetries on the shape itself, in Section 5.3.

5.1 Determination of the Axis
As we saw in Section 4.2, the axes of the isometries which leave a shape globally unchanged also zero the gradient of the generalized even moments of this shape. We thus obtain a superset of them by solving for:
$$\nabla(M^{2p})(\omega) = 0.$$
In a first step, we estimate a number of vectors which are close to the actual solutions by refining the sphere of directions, starting from an icosahedron. In each face, the value of $\|\nabla(M^{2p})(\omega)\|^2$ is examined
in several directions, and faces are sorted by order of the minimal value found. Only faces with small
minimum values are refined recursively. The number of points to look at in each face, as well as the
number of faces to keep at each depth level, are constant parameters of the algorithm.
In a second step, we perform a steepest-descent minimization on $\|\nabla(M^{2p})(\omega)\|^2$, starting from each of the candidates found during the first step. For this we need to evaluate the derivatives of $\nabla(M^{2p})$, which we do using analytically computed second-order derivatives of the spherical harmonics along with Equation (4). The minimization converges in a few steps because the starting positions are by nature very close to actual minima. This method has the double advantage that (1) the derivatives are computed very efficiently and (2) no approximation enters the calculation of the direction of the axis beyond the precision of the calculation of the $C_{2l,m}^{2p}$ coefficients.
During this process, multiple instances of the same direction can be found. We filter them out by estimating their relative distance. While nothing in theory prevents the first step from missing the area of attraction of a minimum, it works very well in the present context. Indeed, moment functions are very smooth, and shapes having two isometries with very close (yet different) axes are not common. Finally, because all moment functions, whatever their order, must have an extremum in the direction of the axes of the symmetries of the shape, we compute such sets of directions for multiple moment functions (e.g., $M^4$, $M^6$ and $M^8$) but keep only those which simultaneously zero the gradient of all these functions, which in practice leaves none or very few false positives to check for.
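The two-step search can be sketched as a multistart minimization: one local descent per icosahedron vertex, keeping only near-zero minima and merging duplicate directions. The sketch below is our simplification (Nelder-Mead on spherical angles instead of the paper's face refinement plus steepest descent, and a stand-in objective in place of $\|\nabla M^{2p}\|^2$), meant only to illustrate the structure.

```python
import numpy as np
from scipy.optimize import minimize

PHI = (1 + np.sqrt(5)) / 2

def icosahedron_vertices():
    v = []
    for s1 in (-1, 1):
        for s2 in (-1, 1):
            v += [(0, s1, s2 * PHI), (s1, s2 * PHI, 0), (s2 * PHI, 0, s1)]
    v = np.array(v, float)
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def to_dir(t):
    th, ph = t
    return np.array([np.sin(th) * np.cos(ph), np.sin(th) * np.sin(ph), np.cos(th)])

def axis_candidates(objective):
    """One local minimization per icosahedron vertex; keep directions where
    the objective (a stand-in for ||grad M^{2p}||^2) vanishes, merging
    duplicates up to sign."""
    found = []
    for v in icosahedron_vertices():
        t0 = [np.arccos(np.clip(v[2], -1.0, 1.0)), np.arctan2(v[1], v[0])]
        res = minimize(lambda t: objective(to_dir(t)), t0, method='Nelder-Mead',
                       options=dict(xatol=1e-9, fatol=1e-12, maxiter=5000))
        if res.fun < 1e-10:
            d = to_dir(res.x)
            d = d * np.sign(d[np.argmax(np.abs(d))])   # canonical sign
            if all(np.linalg.norm(d - f) > 1e-3 for f in found):
                found.append(d)
    return found

# Stand-in objective: vanishes exactly along the z and x axes.
axes = axis_candidates(lambda w: (1 - w[2] ** 2) * (1 - w[0] ** 2))
```

Both candidate axes are recovered; on real data, the objective would of course be built from the spherical harmonic decomposition of the moments.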
5.2 Determination of Rotation Parameters
After finding the zero directions for the gradient of the moment functions, we still need to find the
parameters of the corresponding isometric transforms. This is done deterministically by studying the
spherical harmonic coefficients of the moment functions themselves. We use the following properties.
PROPERTY 1. A function has a mirror symmetry $S_z$ about the $z = 0$ plane if and only if all its spherical harmonic coefficients for which $l + m$ is odd are zero (i.e., it decomposes onto $z$-symmetric harmonics only). In the specific case of the moment functions:
$$\forall\omega\ M^{2p}(\omega) = M^{2p}(S_z\omega) \iff \left( m \equiv 1 \!\!\pmod 2 \Rightarrow C_{2l,m}^{2p} = 0 \right).$$
PROPERTY 2. A function has a symmetry of revolution around the $z$ axis if and only if it decomposes onto zonal harmonics only, that is,
$$\forall l\ \forall m \quad m \neq 0 \Rightarrow C_l^m = 0.$$
PROPERTY 3. A function is self-similar through a rotation $R_\alpha$ of angle $\alpha$ around $z$ if and only if all its spherical harmonic coefficients $C_l^m$ verify:
$$\forall l\ \forall m \quad C_l^m = \cos(m\alpha)\, C_l^m - \sin(m\alpha)\, C_l^{-m}. \tag{6}$$
Property 3 can be adapted to check whether the function is self-similar through the composition of a rotation and a symmetry with the same axis (i.e., the case $\lambda = -1$ as defined in Section 3). In this case, the equation to check is:
$$\forall l\ \forall m \quad (-1)^{l+m}\, C_l^m = \cos(m\alpha)\, C_l^m - \sin(m\alpha)\, C_l^{-m}. \tag{7}$$
These properties are easily derived from the very expression of the spherical harmonic functions
[Hobson 1931].
Before using these properties, the moment function must be expressed in a coordinate system where the $z$ axis coincides with the previously found candidate axis. This is performed using the rotation formula of Equation (2). Checking Properties 1 and 2 is then trivial, provided some tolerance is accepted on the equalities. Using Property 3 is more subtle; the coefficients of the function are first examined in order of decreasing $m$. For $\lambda = 1$, for instance, when the first nonzero value of $C_l^m$ is found, Equation (6) is solved by:
$$\tan\frac{m\alpha}{2} = \frac{C_l^{-m}}{C_l^m}, \quad\text{that is,}\quad \alpha = \frac{2}{m}\arctan\frac{C_l^{-m}}{C_l^m} + \frac{2k\pi}{m},$$
then all the remaining coefficients are checked against the obtained values of $\alpha$. If the test passes, then $\alpha$ is the angle of an existing rotational symmetry of the moment function. A very similar process is used to search for $\alpha$ when $\lambda = -1$.
The error tolerance used when checking Properties 1, 2, and 3 can be seen as a way of detecting approximate symmetries of objects. We will show in the results section that symmetries can indeed be detected on noisy data such as scanned models.
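The checks of Equations (6) and (7) reduce to simple tests on a coefficient table. A minimal sketch (our code; the coefficient layout `C[(l, m)]` is hypothetical):

```python
import numpy as np

def is_self_similar(C, alpha, lam=1, tol=1e-9):
    """Check Equations (6)/(7) on real spherical harmonic coefficients
    C[(l, m)]: self-similarity through I(lam, alpha) about the z axis.
    lam = 1 is a pure rotation; lam = -1 composes it with the mirror."""
    for (l, m), c in C.items():
        lhs = c if lam == 1 else (-1.0) ** (l + m) * c
        rhs = np.cos(m * alpha) * c - np.sin(m * alpha) * C.get((l, -m), 0.0)
        if abs(lhs - rhs) > tol:
            return False
    return True

# A function built only on m = 0 and m = +/-3 harmonics: it has a 3-fold
# rotational symmetry about z, but no 2-fold one.
C = {(3, 0): 0.7, (3, 3): 0.4, (3, -3): 0.1, (4, 0): -0.2}
```

With these coefficients, the test passes for $\alpha = 2\pi/3$ and fails for $\alpha = \pi$.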

5.3 Filtering Results
The condition extracted from Theorem 1 is a necessary condition only. To avoid false positives, the directions and rotation angles obtained from the moment functions must therefore be verified on the shape itself. We do this using a symmetry measure inspired by the work of Zabrodsky et al. [1995]. Let $S$ and $R$ be two tessellated shapes, and let $V_S$ and $V_R$ be the mesh vertices of $S$ and $R$. We define the measure $d_M$ between $S$ and $R$ by:
$$d_M(S, R) = \max_{p \in V_S} \left( \min_{q \in R} \|p - q\| \right). \tag{8}$$
The symmetry measure $d_A(S)$ of a shape $S$ with respect to a symmetry $A$ is then defined by:
$$d_A(S) = \max(d_M(S, AS),\ d_M(AS, S)).$$
It should be noted that this definition differs from that of the Hausdorff distance since, in Equation (8), not all points of $S$ are considered but only the mesh vertices, whereas all points of $R$ are used. However, because $S$ is polyhedral, $d_A(S) = 0$ still implies that $AS = S$.
Computing d A is costly, but fortunately we only compute it for a few choices of A which are the
candidates we found at the previous step of the algorithm. This computation is much cheaper than
computing a full symmetry descriptor [Kazhdan et al. 2004] for a sufficient number of directions to
reach the precision of our symmetry detection algorithm.
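The measure $d_A$ can be sketched with a k-d tree, approximating the target shape $R$ by its vertex set (the paper uses all points of $R$; using vertices only is our simplification and slightly overestimates distances):

```python
import numpy as np
from scipy.spatial import cKDTree

def d_M(P, Q):
    """One-sided measure of Equation (8): max over p in P of the distance
    to the nearest point of Q (Q approximated by its vertices here)."""
    return float(cKDTree(Q).query(P)[0].max())

def d_A(V, A):
    """Symmetry measure of vertex set V for candidate isometry A (3x3),
    applied about the center of gravity."""
    g = V.mean(axis=0)
    AV = (V - g) @ A.T + g
    return max(d_M(V, AV), d_M(AV, V))

# A unit square: exactly symmetric under the mirror x -> -x,
# but not under a rotation of pi/4 about z.
V = np.array([[1, 1, 0], [-1, 1, 0], [-1, -1, 0], [1, -1, 0]], float)
mirror_x = np.diag([-1.0, 1.0, 1.0])
c = np.sqrt(2) / 2
rot_45 = np.array([[c, -c, 0], [c, c, 0], [0, 0, 1]])
```

The mirror yields a measure of exactly zero, while the 45-degree rotation yields a clearly positive one, which is how false candidates are rejected.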

5.4 Results
Complete Example. The whole process is illustrated in Figure 2. Starting from the original object (a), the moment functions of orders 4, 6, and 8 are computed (see, e.g., $M^8$ in (b)). The gradients of these moments are then computed analytically (c) and used for finding the directions of the minima. The unfiltered set of directions contains 7 directions, among which only 3 are common extrema of $M^4$, $M^6$, and $M^8$. This set of 3 directions ($D_1$, $D_2$, and $D_3$) must contain the axes of the symmetries of the shape. After checking the symmetry axes and parameters on the actual shape, $D_1$ is revealed as the axis of a 2-fold symmetry, which is the composition of the two remaining mirror symmetries of axes $D_2$ and $D_3$.
The example of the cube, shown in Figure 1, illustrates the extraction of rotations and mirror symmetries. Experiments have shown that our method finds all 48 symmetries, whatever the coordinate system the cube is originally expressed in.
Fig. 2. Extraction of symmetries for a single shape. Starting from the original shape (a), generalized moments (b) and their
gradients (c) are computed. The set of their common extrema directions contains the axes of the symmetries of the shape, depicted
at right. Here, both mirror symmetries have been found as well as the 2-fold rotational symmetry. Note that the original shape
is neither convex nor star-shaped and that the mesh is not consistent with the symmetries of the geometry.

Fig. 3. View of the three 3D models used in the robustness tests presented in Figure 4 shown with their symmetries. For the
sake of clarity, we chose models with only one symmetry each.

Robustness Tests. We now study the sensitivity of our method to small perturbations of the 3D model
in two different ways.
(1) Noise. We randomly perturb each vertex of each polygon independently in the original model by a
fraction of the longest length of the model’s bounding box.
(2) Delete. We randomly delete a small number of polygons in the model.
We use a set of three models to test the robustness of our method. These models, as well as their symmetries, are shown in Figure 3. For the sake of clarity, we use objects with only one symmetry.
In order to test the robustness of the method, we progressively increase the magnitude of the noise
and let the algorithm automatically detect the symmetry. In our robustness tests, we consider shapes as
single entities and use the first algorithm presented in Section 5 to detect these symmetries. To evaluate
Fig. 4. We test the sensitivity of the method to noise by progressively increasing noise magnitude and letting the algorithm
detect the symmetry for each of our three test models. We evaluate the accuracy of the results by computing the angular de-
viation between the axis found and the axis of the symmetry of the original model. Top row: We perturb each vertex of each
polygon independently by a fraction of the longest length of the bounding box on each of the three test models. The left fig-
ure shows a noisy pick-up model with a noise magnitude of 1% and the right figure shows angular deviation evolution for
the three models for a magnitude ranging from 0% to 1%. Bottom row: We randomly delete polygons of the models. The left
figure shows a noisy pick-up obtained by deleting 5% of the polygons and the right figure shows angular deviation evolution
by deleting 0% to 5% of the polygons of the three models. As can be seen from the curves, for small variations of the models, our method has an approximately linear dependency on noise and delivers high-quality results even for nonperfect symmetries.

the reliability of the results, we compute the angular deviation between the axis of symmetry found and the real one, that is, the one computed with no noise. In our experiments, noise magnitude varies from 0 to 1%
of the longest length of the model’s bounding box, and the number of deleted polygons ranges from 0 to
5% of the total number of polygons in the model (see Figure 4).
The results of these experiments show that, for small variations, our method has an approximately linear dependency on noise and delivers high-quality results even for nonperfect symmetries. These statistical results can also be used to derive an upper bound on the mean angular error as a function of the noise in the model.

5.4.1 Application to Scanned Models. We present in Figure 5 examples of applying the single-shape algorithm to scanned models, retrieved from a Web database and used as is (see http://shapes.aim-at-shape.net). Our algorithm perfectly detects all the parameters of candidate symmetries for all these
Fig. 5. Our algorithm perfectly detects approximate symmetries of scanned models. Detecting these symmetries requires relaxing the constraints when checking candidate symmetries on the model. Please note that these scanned models are by nature neither axis-aligned nor tessellated according to their symmetries. This illustrates the fact that our algorithm depends neither on the coordinate system nor on the mesh of the objects.

Table I. Computation times (in seconds) for the four scanned models presented in Figure 5

Model               | Teeth   | Vase   | Pelvis | Angkor statue
# polygons          | 233,204 | 76,334 | 50,000 | 163,054
Computing moments*  | 33.7    | 11.8   | 7.26   | 23.26
Finding parameters  | 0.4     | 0.6    | 0.4    | 0.7
Checking candidates | 9.4     | 11.1   | 5      | 12.2
Total               | 43.5    | 23.5   | 12.66  | 36.16

* Global computation times for moments of order 2 to 8.

shapes. When testing these symmetries, one should allow a large enough symmetry distance error (as
defined in Section 5.3) because these models are by nature not perfectly symmetric.
5.5 Discussion
Because the $M^{2p}$ functions are trigonometric polynomials on the sphere, they have a maximum number of strict extrema depending on $p$: the larger $p$ is, the more $M^{2p}$ is able to capture the information of a symmetry, that is, to have an extremum in the direction of its axis. But because all moment functions must have a null gradient in this direction (according to Theorem 1), these extrema are bound to become nonstrict extrema for small values of $p$, and $M^{2p}$ is then forced to be constant on a subdomain of nonnull dimension. Take the cube as an example (in which case $M^2$ is a constant function): a trigonometric polynomial of order 2 simply cannot have enough strict extrema to represent all 12 distinct directions of the symmetries of the cube.
In all the tests we conducted, however, using moments up to order 10 never missed any symmetry on any model. It would still be interesting, though, to know the exact maximum number of directions permitted by moments of a given order.

6. FINDING SYMMETRIES OF GROUPS OF OBJECTS


In Section 5, we presented an algorithm for finding the symmetries of single shapes. In this section, we present a constructive algorithm which recovers the symmetries of a group of objects (which we call tiles, to indicate that together they form a larger object) from the symmetries and positions of each separate tile.
Fig. 6. This figure illustrates the reliability of our congruency descriptor (as defined by Equation (9)). Two identical objects meshed differently and expressed in two different coordinate systems (A and B) have extremely close descriptor vectors, but a slightly different object (C) has a different descriptor. The graph on the right shows each component of the three descriptors.

The constructive algorithm first computes (if necessary) the symmetries of all separate tiles, using the single-shape algorithm. It then detects which tiles are similar up to an isometric transform and finds the transformations between similar tiles. Finally, it explores all one-to-one mappings between tiles, discarding mappings which do not correspond to a symmetry of the group of tiles as a whole.
Section 6.2 explains how we detect similar tiles and Section 6.3 details the algorithm which both
explores tile-to-tile mappings and finds the associated symmetry for the whole set of tiles.
Because it is always possible to apply the algorithm presented in Section 5 to the group of tiles,
considering it as a single complex shape, questioning the usefulness of the constructive method is
legitimate. For this reason, we will explain in Section 6.5 in which situations the constructive method
is preferable to the algorithm for single shapes; but let us first explain the method itself.

6.1 Computing the Symmetries of Each Tile


If not available, the symmetries of each tile are computed using the algorithm presented in Section 5. When assembling known objects together, most of this computation can, of course, be avoided by computing the symmetries of a single instance of each class of identical tiles.

6.2 Detecting Tiles Congruency


In this subsection, we introduce a shape descriptor suitable for detecting whether two shapes are identical up to an (unknown) isometry. We will use this tool for classifying tiles before trying to find a mapping of a composite object onto itself.
Let $S$ be a shape and $C_{2l,m}^{2p}$ the spherical harmonic coefficients of its generalized even moment functions $M^{2p}$ up to an order $p$. Our shape descriptor is defined as the $p(p+1)/2$-vector obtained by packing together the frequency energies of the spherical harmonic decomposition of all moments of $S$ up to order $p$:
$$D^{2p} = \left( d_0^0,\ d_0^2,\ d_2^2,\ \ldots,\ d_0^{2p},\ d_2^{2p},\ \ldots,\ d_{2p}^{2p} \right) \tag{9}$$
with
$$d_{2l}^{2k} = \sum_{-2l \leq m \leq 2l} \left( C_{2l,m}^{2k} \right)^2. \tag{10}$$

(see Figure 6). It has been shown by Kazhdan et al. [2003] that $d_l^k$, as defined in Equation (10), does not depend on the coordinate system in which the spherical harmonic decomposition is expressed. This means that each $d_{2l}^{2p}$, and therefore $D^{2p}$ itself, is not modified by isometric transforms of the shape. Mirror
Table II. Percentage of tiles matched by our shape descriptor that are effectively identical, for our test scenes

Max order | 39,557 polygons, 851 tiles | 182,224 polygons, 480 tiles | 515,977 polygons, 5,700 tiles
2         | 92.1%                      | 43.9%                       | 92.3%
4         | 100%                       | 78.0%                       | 100%
6         | 100%                       | 92.2%                       | 100%
8         | 100%                       | 100%                        | 100%

Fig. 7. Scenes used for testing the object congruency descriptor. In each scene, the descriptor has been used to detect objects
with similar geometry (but possibly different meshes) up to a rigid transform. Objects found to be congruent are displayed with
the same color.

symmetries do not affect $d_{2l}^{2p}$ either, since they only change the sign of the coefficients of some harmonics in a coordinate system aligned with the axis.
Two tiles $A$ and $B$ are considered to be similar up to an isometric transform, at a precision $\varepsilon$, when:
$$\| D^{2p}(A) - D^{2p}(B) \| < \varepsilon.$$
Theoretically, this shape descriptor can produce false positives, that is, tiles that are not congruent but have the same descriptor; it cannot, however, produce false negatives, because of its deterministic nature. Our experiments have shown that using moments up to order 6 produces a sufficiently discriminating shape descriptor on all test scenes. This is illustrated in Table II, where we present the average precision value, that is, the percentage of matched tiles that are actually identical up to an isometric transform, for a set of architectural scenes (Figure 7).
By definition, congruent tiles should have the same set of symmetries, possibly expressed in different
coordinate systems. Since we know the symmetries of each of the tiles, we introduce this constraint,
thereby increasing the discriminating power of our shape descriptor as shown in Table III.
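The descriptor and its rotation invariance can be illustrated on a coefficient table (our sketch; the layout `C[(k, l, m)]`, with `k` indexing the moment $M^{2k}$, is hypothetical). A rotation of angle $\alpha$ about $z$ mixes each pair $(C^m, C^{-m})$ by a 2D rotation of angle $m\alpha$, leaving every band energy $d_{2l}^{2k}$ unchanged:

```python
import numpy as np

def descriptor(C, p):
    """Descriptor of Equations (9)-(10): band energies d_{2l}^{2k} of all
    moments up to order 2p.  C[(k, l, m)] holds C_{2l,m}^{2k}."""
    return np.array([sum(C.get((k, l, m), 0.0) ** 2
                         for m in range(-2 * l, 2 * l + 1))
                     for k in range(p + 1) for l in range(k + 1)])

def rotate_z(C, alpha):
    """Rotate the underlying function by alpha about z: each pair
    (C^m, C^-m) undergoes a 2D rotation of angle m*alpha."""
    return {(k, l, m): np.cos(m * alpha) * c - np.sin(m * alpha) * C.get((k, l, -m), 0.0)
            for (k, l, m), c in C.items()}

C = {(1, 0, 0): 1.0, (1, 1, -2): 0.3, (1, 1, 0): -0.5, (1, 1, 2): 0.8}
```

Rotating the coefficients leaves the descriptor unchanged, whereas altering a single coefficient magnitude changes it, which is what makes the descriptor usable for congruency detection.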

6.3 Algorithm for Assembled Objects


6.3.1 Overview. Once we have determined all classes of congruent tiles, the algorithm examines
all the one-to-one mappings of the set of all tiles onto itself which map each tile onto a similar tile.
For each one-to-one mapping found, it determines the isometric transforms which are simultaneously
compatible with each tile and its symmetries.
The algorithm works recursively: at the beginning of each recursion step, we have extracted two
subsets of tiles, H1 and H2 , of the composite shape S, and we have computed the set of all possible
isometric transforms that globally transform H1 into H2 . Then, taking two new similar tiles, S1 ∈ S \ H1
Table III. Percentage of tiles matched by our shape descriptor that are effectively identical, using the added constraint that identical tiles must have the same set of symmetries up to a rigid transform

Max order | 39,557 polygons, 851 tiles | 182,224 polygons, 480 tiles | 515,977 polygons, 5,700 tiles
2         | 95.6%                      | 73.4%                       | 97%
4         | 100%                       | 96.0%                       | 100%
6         | 100%                       | 100%                        | 100%
8         | 100%                       | 100%                        | 100%

and S2 ∈ S \ H2 , we restrict the set of isometric transforms to the isometric transforms that also map
S1 onto S2 (but not necessarily S2 onto S1 ). Because these tiles have symmetries, this usually leaves
multiple possibilities.
Note that the global symmetries found must always be applied with respect to the center of mass g
of S, according to the definition of a symmetry of S.
At the end of the recursion step, we have the set of isometric transforms that map H1 ∪ {S1 } onto
H2 ∪ {S2 }.
Each recursion step narrows the choice of symmetries for $S$. The recursion stops either when this set is reduced to the identity transform or when we have used all the component tiles of the model. In the latter case, the isometric transforms found are the symmetries of the composite shape. The recursion is initiated by taking for $H_1$ and $H_2$ two similar tiles, that is, two tiles of the same class.
In the following paragraphs, we review the individual steps of the algorithm: finding all the isometric
transforms which map tile S1 onto similar tile S2 and reducing the set of compatible symmetries of S.
We then illustrate the algorithm in a step-by-step example.
6.3.2 Finding All the Isometries Which Transform a Tile onto a Similar Tile. At each step of our
algorithm, we examine pairs of similar tiles, S1 and S2 , and we have to find all the isometries which
map S1 onto S2 .
If gi is the center of mass of tile Si and g is the center of mass of the composite shape S, this condition
implies that the isometries we are looking for transform vector g1 − g into g2 − g. In order to generate
the set of all isometric transforms that map S1 onto S2 , we use the following property.
PROPERTY 4. If $J$ is an isometry that maps $S_1$ onto a similar tile $S_2$, then all the isometries $K$ which map $S_1$ onto $S_2$ are of the following form:
$$K = J\, T^{-1} A\, T \quad\text{with}\quad A \in G_{S_1} \text{ such that } A(g_1 - g) = g_1 - g, \tag{11}$$
where $G_{S_1}$ is the group of symmetries of $S_1$, and $T$ is the translation of vector $g - g_1$ (refer to the Appendix for the proof of this property).
This property states that, once we know a single seed isometric transform which maps $S_1$ onto $S_2$, we can generate all such transforms by using the elements of $G_{S_1}$ in Equation (11).
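Property 4 translates directly into code on 4x4 homogeneous matrices (our sketch; the toy tile data and names are hypothetical): given one seed $J$, the elements of $G_{S_1}$ that fix the vector $g_1 - g$ generate every isometry mapping $S_1$ onto $S_2$.

```python
import numpy as np

def translation(v):
    T = np.eye(4)
    T[:3, 3] = v
    return T

def linear(M):
    H = np.eye(4)
    H[:3, :3] = M
    return H

def isometries_mapping(J, G_S1, g1, g):
    """Property 4 / Equation (11): all isometries K = J T^{-1} A T mapping
    S1 onto S2, where T translates g1 to g and A runs over the symmetries
    of S1 fixing the vector g1 - g."""
    T, Tinv = translation(g - g1), translation(g1 - g)
    return [J @ Tinv @ linear(A) @ T
            for A in G_S1 if np.allclose(A @ (g1 - g), g1 - g)]

# Toy scene (hypothetical): S1 is a square centered at g1 = (2,0,0) in the
# yz-plane; its symmetry group contains the 4 rotations about the x axis.
def rot_x(k):
    a = k * np.pi / 2
    return np.array([[1, 0, 0],
                     [0, np.cos(a), -np.sin(a)],
                     [0, np.sin(a), np.cos(a)]])

g, g1 = np.zeros(3), np.array([2.0, 0.0, 0.0])
J = linear(np.diag([-1.0, -1.0, 1.0]))    # seed: rotation of pi about z
Ks = isometries_mapping(J, [rot_x(k) for k in range(4)], g1, g)
```

Each of the four generated transforms maps the vertex set of $S_1$ onto the same image tile $S_2$, as the property predicts.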
6.3.3 Finding a Seed Transform. We need to find a seed transform $J$ that maps $S_1$ onto $S_2$. For each tile, we extract a minimal set of independent vectors that correspond to extrema of their generalized even moment functions. The number of vectors needed depends on the symmetries of the tile. $J$ is then defined as any isometric transform that maps the first set of vectors onto the second, as well as vector $g_1 - g$ onto $g_2 - g$. Most of the time, at most a single isometric transform is possible. When multiple choices exist, the candidate transforms are checked on the shapes using the distance presented in Section 5.3. This ensures that we find at least one seed transform.
ACM Transactions on Graphics, Vol. 25, No. 2, April 2006.

Accurate Detection of Symmetries in 3D Shapes • 453

Fig. 8. Three spheres uniformly distributed on a circle in the z-plane. All one-to-one mappings of the set of tiles onto itself
which map each tile onto a similar tile are used to detect all the symmetries of the shape. Note that the 3-fold symmetry H is
detected and is associated with a circular permutation mapping.

6.3.4 Ensuring Compatibility with Previous Isometries. During the recursion, we need to store the
current set of compatible isometries we have found. We do this by storing a minimal set of linearly
independent vectors along with their expected images under these isometries. For example, to store
a symmetry of revolution, we store only one vector, the axis of the symmetry, and its image (itself).
For mirror symmetries, rotations, and central symmetries, we store three independent vectors, along
with their images under the isometric transform. For instance, in the case of a rotation of angle π around
axis X, we have:

$$X \to X, \quad Y \to -Y, \quad Z \to -Z. \qquad (12)$$
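The vector/image bookkeeping just described can be expressed as a list of constraint pairs; a candidate transform is compatible if it sends every stored vector onto its expected image. A minimal sketch (the 3×3-matrix representation and the function names are illustrative, not the paper's code):

```python
# Compatibility bookkeeping: the current set of isometries is represented
# by stored vectors and their expected images (e.g., Equation (12):
# X -> X, Y -> -Y, Z -> -Z for a rotation of angle pi around X).
# A candidate linear isometry, given as a 3x3 matrix, is compatible if it
# maps every stored vector onto its expected image.

def apply3(m, v):
    """Apply a 3x3 matrix to a 3-vector."""
    return tuple(sum(m[i][k] * v[k] for k in range(3)) for i in range(3))

def is_compatible(candidate, constraints, eps=1e-9):
    """constraints: list of (vector, expected_image) pairs."""
    for v, img in constraints:
        w = apply3(candidate, v)
        if any(abs(w[i] - img[i]) >= eps for i in range(3)):
            return False
    return True

# Constraints of Equation (12): rotation of angle pi around axis X.
rot_pi_x_constraints = [((1, 0, 0), (1, 0, 0)),
                        ((0, 1, 0), (0, -1, 0)),
                        ((0, 0, 1), (0, 0, -1))]
```

The rotation of angle π around X itself satisfies these constraints, while the identity fails on the Y axis, so intersecting sets of isometries reduces to filtering candidates through such tests.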

By examining all the one-to-one mappings of the set of all tiles onto itself which map each tile onto a
similar tile, we are able to detect all the symmetries of the set of tiles (see Figure 8). Note in this example
that the 3-fold symmetry H is detected and is associated with a circular permutation mapping.
6.4 Step-By-Step Example
Figure 9 presents a very simple example of a shape (a pair of pliers) composed of 3 tiles: S1 and S2 (the
handles), and R (the head). Two of the tiles, S1 and S2, are similar up to an isometric transform. Figure 9
also displays the centers of mass g1 and g2 of tiles S1 and S2 (which are not in the plane z = 0), and
the center of mass g of the whole shape. In the coordinate systems centered on their respective centers
of mass, S1 and S2 have a mirror symmetry of axis Z, and R has a rotational symmetry of angle π
around axis X.
Our constructive algorithm starts by selecting tile R and a similar tile (here, the only possible choice
is R).

Step 1. The algorithm explores the possible ways to map R onto itself. Two possibilities exist: (a) the identity
transform, and (b) the rotation around X of angle π, deduced from (a) by Property 4.
At this point, the algorithm branches, and either tries to map S1 onto itself (branch 1) or onto S2 (branch 2).
Branch 1, Step 1. The algorithm tries to match S1 to itself. The only compatible transform is the identity
transform.
Fig. 9. Illustration of the constructive algorithm on a very simple example: from the symmetries of each of the 3 parts of the
object, the symmetries of the whole object are recovered. Please note that no symmetry was omitted in this figure; in particular,
tile R has only a rotational symmetry and no mirror symmetry. See the text of Section 6.4 for a detailed explanation.

Fig. 10. A complex model which has the same group of symmetries as the icosahedron. The constructive algorithm successfully
retrieves all 15 planes of mirror symmetry (center) and all 31 distinct axes of rotational symmetry (right) using the rotational
and mirror symmetry of each tile (left). The presence of 3-fold and 5-fold symmetries proves that our algorithm also detects
symmetries which map a set of similar tiles onto itself through a complex permutation.

Branch 1, Step 2. The algorithm then tries to map S2 to itself. Once again, the only possible transform is the
identity transform, and the recursion stops because all the tiles in the model have been used.
Branch 2, Step 1. The algorithm tries to match S1 to S2 . The only compatible transform is the rotation around X
of angle π.
Branch 2, Step 2. The algorithm then tries to match S2 to S1 . Once again, the only compatible transform is the
rotation around X of angle π, and the recursion stops because all the tiles in the model have been used.

Two symmetries have been found that map the shape onto itself: the identity transform and the
rotation around X of angle π. Note that, although our algorithm can potentially create a lot of branching,
we prune branches that result in empty sets of transforms, and, in practice, we only explore a small
number of branches.
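The recursion just illustrated can be summarized schematically as follows. Tiles and transforms are reduced to hashable labels, `iso_between` stands for the set of isometries produced by Property 4 for a pair of similar tiles, and `current` starts as the universe of candidate transforms; this is a sketch under those assumptions, not the paper's code:

```python
# Schematic constructive recursion: pick an unused tile, try to map it
# onto every similar still-available target, intersect the resulting
# isometries with the current set, and prune branches whose set of
# compatible transforms becomes empty.

def explore(sources, targets, current, iso_between, similar):
    """Symmetries extending 'current' by a bijection of sources onto targets."""
    if not sources:
        return set(current)                 # all tiles mapped: keep these transforms
    tile = sorted(sources)[0]               # deterministic choice of next tile
    found = set()
    for target in sorted(targets):
        if not similar(tile, target):
            continue                        # only tiles of the same class can map
        candidates = current & iso_between(tile, target)
        if candidates:                      # prune branches with no compatible transform
            found |= explore(sources - {tile}, targets - {target},
                             candidates, iso_between, similar)
    return found
```

Run on the pliers example, with iso_between(R, R) = {I, rotX}, iso_between(S1, S1) = iso_between(S2, S2) = {I} and iso_between(S1, S2) = iso_between(S2, S1) = {rotX}, it returns exactly the two symmetries found above, {I, rotX}.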

6.5 Application Scenarios


In order to illustrate the efficiency of the constructive algorithm, we show in this section various
situations where this method is a valuable alternative to the single-shape algorithm.
6.5.1 Application to an Aggregation of Many Objects. Figure 10 presents a complex model which has
the same group of symmetries as the icosahedron. The constructive algorithm retrieves all 31 distinct
axes of rotational symmetry (Figure 10, right), as well as the 15 planes of mirror symmetry (Figure 10,
middle) of the shape, using the symmetries of each tile (Figure 10, left), which are one symmetry of
revolution and one mirror symmetry.

Table IV. Comparison of the cost of the single-shape algorithm presented in Section 5 to the cost of
the constructive algorithm for finding all 46 symmetries of the icosahedron shape displayed in
Figure 10, at equivalent precision. Because the object is close to a sphere and has many symmetries,
the constructive algorithm performs much better.

Method                  Single shape (order 10)   Constructive (order 4)
Moments calculation     500 sec                   30 × 0.5 sec
Symmetry verification   46 × 55 sec               30 × 2 × 1.5 sec
Tile congruency         N/A                       2 sec
Tile mappings           N/A                       10 sec
Total                   50 mn 30 sec              1 mn 57 sec
Conversely, directly applying the first algorithm to such a shape shows that M^2 to M^8 are extremely
close to constant functions, making the extraction of directions an inaccurate process. The single-shape
algorithm still correctly finds all the axes when using moments up to order 10, but this has some impact on
computation times. Furthermore, the single-shape algorithm requires checking all of the symmetries
found against the model, which is a significant part of its computation time. This is not the case for the
constructive algorithm, because it relies only on its knowledge of the symmetries of the tiles. Because
many symmetries exist for this model, the total computation time of the single-shape algorithm is
therefore much higher. This is summarized in Table IV, where we compare the computation times for
both methods at equivalent precision (i.e., 10^{-4} radians).

6.5.2 Finding Symmetries Inside Noncoherent Geometry. There exist common situations where 3D
scenes do not come as a set of closed separate objects but as an incoherent list of polygons. This
happens, for instance, when retrieving geometric data from a Web site, mostly because a list of polygons
constitutes a practical common denominator for all possible formats.
In such a case, applying the single-shape algorithm would certainly give the symmetries of the whole
scene; but if we are able to partition the set of polygons into adequate groups (that is, tiles) to which we
apply the constructive algorithm, we may be able to extract symmetric objects from the scene, as well
as the set of symmetries for the whole scene, more rapidly, as illustrated in Figure 10.
The gain in using the constructive algorithm to recover symmetries in the scene resides in the fact
that, once tile symmetries have been computed, grouping them together and testing for symmetries in
composed objects only adds a negligible cost which is not the case when we try to apply the single-shape
algorithm to many possible groups of polygons or even to the entire scene itself.
The various issues in the decomposition of a raw list of polygons into intelligent tiles are beyond the
scope of this article. In our case, tiles only need to be consistent with the symmetries. We propose the
following heuristic to achieve this correctly for most scenes:
We define tiles as maximal sets of edge-connected polygons. To obtain them, we insert all vertices of
the model into a KDTree and use this KDTree to efficiently recover which polygons share vertices (up
to a given precision) and hence share an edge. By propagating connectivity information between neighboring
polygons, we then build classes of edge-connected polygons, which we define to be our tiles. Figure 11
gives examples of such tiles for objects collected from the Web as raw lists of polygons.
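A minimal sketch of this heuristic, with a grid quantization of vertices standing in for the KDTree proximity query and a union-find structure merging polygons that share an edge (function and parameter names are illustrative):

```python
# Tile-construction heuristic: snap vertices to a precision grid, merge
# polygons that share an (undirected) edge with union-find, and return
# each class of edge-connected polygons as one tile.

def build_tiles(polygons, precision=1e-6):
    """polygons: list of vertex tuples, e.g. [((x,y,z), (x,y,z), ...), ...]."""
    def key(v):                             # snap a vertex to the precision grid
        return tuple(round(c / precision) for c in v)

    parent = list(range(len(polygons)))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    edge_owner = {}
    for pid, poly in enumerate(polygons):
        ks = [key(v) for v in poly]
        for a, b in zip(ks, ks[1:] + ks[:1]):
            edge = tuple(sorted((a, b)))    # undirected edge, order-independent
            if edge in edge_owner:          # shared edge: merge the two polygons
                parent[find(pid)] = find(edge_owner[edge])
            else:
                edge_owner[edge] = pid

    tiles = {}
    for pid in range(len(polygons)):
        tiles.setdefault(find(pid), []).append(pid)
    return list(tiles.values())
```

Two triangles sharing an edge end up in the same tile, while an isolated triangle forms a tile of its own.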
Our simple heuristic approach to making tiles produced very good results on all the scenes we tested and
suffices for a proof of concept of the constructive algorithm. This is illustrated in Figure 11, where a
lamp object and a chess game are shown along with their global symmetries. These symmetries were
computed from the symmetries of each of the subparts, which, in turn, were separately computed using
the algorithm presented in Section 5.

Fig. 11. Two models taken from the Web. From the raw list of polygons (left), our heuristic for scene partitioning extracts tiles,
before the single-shape algorithm computes the symmetries of each of them (center). Using this information, the constructive
algorithm computes the symmetries of the whole model (right). Top row: a lamp object which has seven mirror symmetries and a
7-fold rotational symmetry. Bottom row: a chess board which is composed of pieces with very different symmetries but turns out
to have only a single 2-fold symmetry around a vertical axis. (Note: in this last model, once tiles had been identified, the chess
pieces were moved so as to obtain a model with at least one global symmetry.)
Obviously, this application requires that the constructed tiles be consistent with symmetries, that is, that
it is possible to partition the scene into tiles which map onto each other through the symmetries
of the scene. This may not be easy with scanned models, for instance, nor with perturbed data. In such
a case, our simple heuristic should be modified so as to base polygon neighborhood relationships on
proximity distances between polygons rather than on vertex positions only. Doing so, cutting one tile into
two parts and remeshing them independently would have a high probability of producing the same
original tile after reconstruction. If not, then the existence of a symmetry inside the model may become
questionable. Suppose, for instance, that the pliers in the step-by-step example (Section 6.4) get
split into tiles that are not exact symmetrical copies of one another, and that these two tiles are too far
apart to be merged into a single tile. Then the model is by nature not symmetric anymore, and this will
also be the output of the constructive algorithm.
Table V. Computation times (in seconds) for the different steps of our algorithm, for the models
shown in this article.

Model                  Plier    Lamp     Chessboard
# polygons             1,940    39,550   24,942
# tiles                3        22       8
Computing moments*     0.9      18.2     15
Finding parameters     0.4      1.2      2.0
Checking candidates    2.3      7.4      7.9
Constructive algo.     0.001    0.05     0.01
Total                  3.601    26.85    24.91

* Global computation time for moments of orders 2 to 8.

6.6 Computation Cost


Computation times (in seconds) for the models shown in this article are given in Table V, together with
the complexity of the models. They were measured on a machine equipped with a 2.4 GHz processor
and 512 MB of memory. As expected, the cost of computing the moment functions and the cost of
verifying the candidates, both required by the first algorithm, account for the largest part of the total
cost and depend on the model complexity. Conversely, finding the parameters of the symmetries
(Section 5.2) as well as applying the constructive algorithm depends only on the number of these
symmetries.

Regarding accuracy, both algorithms computed the axes of the symmetries with a maximum error of
10^{-4} radians in our tests, independently of shape complexity.

7. APPLICATIONS
7.1 Geometry Compression and Instantiation
Our framework can be used for model compression at two different levels: (1) if a model exhibits
symmetries, it can be compressed by storing only the significant part of the model and using the
symmetries to recreate the full model; (2) if a model contains multiple instances of the same part,
these parts can be instantiated (see Figure 12).

Although complex models often do not present symmetries, symmetry-based compression can usually
be used on some subparts of the model. The ability to express a model by explicitly storing only the
significant parts while instancing the rest of the scene is provided by some recent 3D file formats
such as X3D (see Table VI). We thus measure our compression ratios as the ratio of the sizes of the
X3D files before and after our two compression operations, which we detail now.
The scene is first loaded as a raw collection of polygons, before being decomposed into tiles using the
heuristic presented in Section 6.5.2. We then compute symmetries and congruency descriptors for each
tile. Computation times shown in Table VI give the average time needed to compute the symmetries
and congruency descriptor of a single tile. As computing the properties of a tile does not depend on the
other tiles, this process is easily parallelizable. The scene is then compressed, first by instancing the tiles;
second, when storing each tile, we store only the minimum significant part of its geometry according
to its symmetries. This part is extracted using the same algorithm we will present for remeshing a tile
according to its symmetries in the next section. Note that the compression rates shown in this table are
computed using geometry information only; that is, neither texturing nor material information is
taken into account. Compression times shown in Table VI are the times needed to detect all classes of
tile congruency.
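The instancing step can be sketched as follows. The `congruency_key` below is a deliberately simplified placeholder (the actual descriptor in the paper is built from the spherical harmonic coefficients of the moment functions); tiles falling into the same class need to be stored only once, with one transform per instance:

```python
# Instancing sketch: group tiles by an isometry-invariant congruency key,
# so that only one representative mesh per class is stored.  The key used
# here (vertex count + rounded mean squared distance to the centroid) is
# a toy placeholder for the paper's congruency descriptor.

def congruency_key(tile_vertices, digits=6):
    """Placeholder isometry-invariant descriptor of a tile."""
    n = len(tile_vertices)
    cx = [sum(v[i] for v in tile_vertices) / n for i in range(3)]
    msd = sum(sum((v[i] - cx[i]) ** 2 for i in range(3))
              for v in tile_vertices) / n
    return (n, round(msd, digits))

def instance_tiles(tiles):
    """Group tiles into congruency classes: {key: [tile indices]}."""
    classes = {}
    for idx, tile in enumerate(tiles):
        classes.setdefault(congruency_key(tile), []).append(idx)
    return classes
```

Two congruent triangles (one a translated copy of the other) fall into the same class, while a scaled triangle forms a class of its own.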
Fig. 12. Detecting symmetries and similarities between tiles created from a raw list of polygons allows us to compress geometric
models in two ways: (1) by instancing similar tiles, and (2) inside each symmetric tile, by instancing the part of the geometry
that permits reconstruction of the whole tile. Even in a model as large as the power plant (13 million triangles), we achieve a
compression ratio (ratio of geometry file sizes in X3D format) of 1:4.5. We show in this figure two subparts of the complete model;
for each, we show the tiles computed by our heuristic (see Section 6.5) as well as the obtained compression ratio. The PowerPlant
model is courtesy of The Walkthru Project.

Table VI. Examples of compression rates obtained using our symmetry detection method coupled
with the congruency descriptor (see text in Section 7.1 for a detailed explanation).

Model                                  Room     Plane     Studio    Powerplant
# polygons                             39,557   182,224   515,977   12,748,510
# tiles                                851      480       5,700     525,154
av. computing tile properties (secs)   1.45     1.3       1.9       1.1
Compression time (secs)                7.2      9         14.6      311
Compression rate                       1:2.7    1:8.3     1:3.5     1:4.5

7.2 Mesh Editing


When an object contains symmetries, it may be interesting to remesh the object with respect to these
symmetries. To do so, we first extract the minimum part of the shape that can be reconstructed through
each symmetry independently; we then apply the corresponding symmetry to each of these parts,
obtaining as many meshes of the shape, each consistent with one symmetry. The final step is to compute
the union of all these meshes, merging identical vertices and adding new vertices at edge crossings.
While not necessarily optimal, the resulting mesh is consistent with all the symmetries of the shape.
Since a coherent remeshing allows the establishment of a correspondence between model vertices,
we have developed a proof-of-concept mesh editing system which allows the user to modify a 3D object
under the constraints given by the symmetries of the original object. It appears that, under the
constraint of too many symmetries, no vertex can be moved independently of the others, and the
geometry is sometimes restricted to scaling about its center of gravity. Images collected from this
program are displayed in Figure 13.

8. DISCUSSION
We discuss here a number of features of our technique as well as differences with existing approaches.
Fig. 13. Starting from an object in arbitrary orientation, we detect the symmetries of the shape (here, a planar symmetry)
and use them to remesh the object with respect to these symmetries. A user can then easily edit the mesh and modify it while
keeping the symmetries of the initial shape.

Using Spherical Harmonics


Generalized moments are a central component of our system. As stated before, we do not compute
these functions explicitly, but rather compute their coefficients in a spherical harmonics basis. As
for the decomposition itself, any basis could be used; in particular, a well-chosen basis of 3D monomials
restricted to the unit sphere would also lead to a finite decomposition. Still, using spherical harmonics
has many advantages, in particular because we use the same coefficients, computed once, for different
tasks throughout this article. (1) The expression of a moment function as a sum of spherical harmonics
provides an accurate detection of the potential axes of symmetry. This detection is made deterministic
by finding the zero directions of the gradient of the moment functions. Such a computation is performed
analytically from the second-order derivatives of the spherical harmonics, and thus does not introduce
further approximation. (2) Computing symmetry parameters for the moment functions is made very
easy by working on the spherical harmonic coefficients themselves. Since spherical harmonics are
orthogonal and easily rotated, finding symmetries of the moment functions translates into simple
relationships between the coefficients. (3) The spherical harmonic coefficients provide an effective shape
congruency descriptor, which we use to detect which tiles are identical up to an unknown isometric
transform.
In summary, the use of spherical harmonics provides us with a consistent framework throughout the
whole process of our symmetry-finding algorithm.

Non Star-Shaped Objects


A legitimate question is whether the direct algorithm presented in Section 5 works for non-star-shaped
objects. Our approach never relies on a spherical projection. Indeed, the moment functions, as expressed
in Equations (1) and (5), are computed through an integration over the surface itself, possibly
covering the same directions multiple times but with different values. Parts of a shape which correspond
to the same direction during integration will not contribute identically to the various moment functions,
because of the varying exponent. By using various orders of moment functions in our symmetry
detection process and in the computation of our shape congruency descriptor, we thus capture the
geometry of non-star-shaped objects as well. Some previous approaches [Kazhdan et al. 2004] achieved
this by decomposing the shape into concentric spherical regions before doing a spherical integration,
which can be seen as convolving the shape with 0-degree functions with concentric spherical support;
our technique is similar, but uses another kind of function, expressed in the form of the even moments.
In summary, detecting symmetries on non-star-shaped objects has no particular reason to fail, as
illustrated by the result in Figure 2.
The second algorithm (for assembled objects) naturally works just as well for non-star-shaped objects,
as illustrated by the examples in Figure 11.

Avoiding Dense Sampling


Previous methods that defined a continuous measure of symmetry ([Zabrodsky et al. 1995; Kazhdan
et al. 2004]) can theoretically compute both perfect and approximate symmetries. However, detecting
symmetries with such methods involves sampling directions on the sphere, with a density adapted
to the desired angular precision for the axis of the symmetry.
The work of Kazhdan et al. [2004] leads to impressive results concerning the improvement of the shape
matching process. However, relying on this technique to obtain accurate symmetries with high angular
precision requires a time-consuming step for the construction of the symmetry descriptors. According
to the presented results, the time needed to compute reflective, 2-fold, 3-fold, 4-fold, 5-fold, and axial
symmetry information for a spherical function of bandwidth b = 16 is 0.59 seconds. As stated in the
article [Kazhdan et al. 2004], the number of samples taken on the sphere is O(b^2) (i.e., approximately
10^3 sample directions), which is insufficient to reach an angular precision equivalent to the one
obtained with our method: reaching a precision of 10^{-4} radians would require approximately 10^9
sample directions. This would theoretically increase the computation time to approximately
0.59 × 10^9/10^3 = 5.9 × 10^5 seconds, making the method inefficient for this task.
In contrast, our method does not rely on a dense sampling of directions to find symmetries, but on
the computation of a fixed number of surface integrals which, thanks to the Gauss integration used,
provide an extremely accurate approximation of the spherical harmonic coefficients of the moment
functions. From there on, no further approximation is introduced in the computation of the directions
of the candidate symmetries, which lets us achieve an excellent angular precision at a much lower
cost.
Furthermore, the cost of our algorithm does not depend on assumptions about the expected results.
The method of Kazhdan et al. [2004] computes symmetry descriptors for each kind of symmetry
searched for. Our method, in turn, computes all directions of possible symmetries and then checks the
obtained candidates back against the shape.

9. CONCLUSIONS
We have presented an algorithm to automatically retrieve symmetries for geometric shapes and models.
Our algorithm efficiently and accurately retrieves all symmetries of a given model, independently
of its tessellation.
We use a new tool, the generalized moment functions, to identify candidates for symmetries. The
validity of each candidate is checked against the original shape using a geometric measure. Generalized
moments are not computed directly: instead, we compute their spherical harmonic coefficients using an
integral expression. Having an analytical expression for the generalized moment functions and their
gradients, our algorithm finds potential symmetry axes quickly and with good accuracy.
For composite shapes assembled from simpler elements, we have presented an extension of this
algorithm that works by first identifying the symmetries of each element, then sets of congruent
elements. We then use this information to iteratively build the symmetries of the composite shape. This
extension is able to handle complex shapes with better accuracy, since it pushes the accuracy issues
down to the scale of the tiles.

Future Work
The constructive algorithm presented in Section 6 automatically detects instantiation relationships
between the tiles of a composite shape.
We are currently developing a constructive instantiation algorithm which iteratively collates similar
tiles into instances, checking at each step that the relative orientation of each tile with respect to each
already constructed instance is preserved.
This algorithm requires knowing the symmetries of the tiles and maintaining the symmetries of the
instances found so far. For this, we use our shape congruency metric, our algorithm for finding the
symmetries of single shapes, and our algorithm for finding the symmetries of composite shapes.

APPENDIX (PROOFS)
PROOF OF THEOREM 1. Let A be an isometric transform which leaves a shape S globally unchanged.
We have:

$$\begin{aligned}
\forall \omega, \quad M^{2p}(A\omega) &= \int_{s \in S} \|s \times A\omega\|^{2p}\, ds \\
&= \int_{t \in A^{-1}S} \|At \times A\omega\|^{2p}\, |\det A|\, dt \\
&= \int_{t \in A^{-1}S} \|t \times \omega\|^{2p}\, dt \\
&= M^{2p}(\omega)
\end{aligned}$$

On the second line, we change variables and integrate over the surface transformed by $A^{-1}$. On the
third line, we use the fact that an isometric transform is a unitary transform: its determinant is ±1,
so $|\det A| = 1$, and the norm of the cross product is left unchanged when an isometric transform is
applied to each of its terms. On the last line, because AS = S, we also have $A^{-1}S = S$. The isometric
transform A is thus also a symmetry of the moment functions $M^{2p}$.

Conversely, let A be an isometric transform with axis v, and suppose that A is a symmetry of $M^{2p}$.
Let $d_v$ be the direction of steepest descent of the function $M^{2p}$ around direction v. Because A is a
symmetry of $M^{2p}$, we have:

$$d_{Av} = A\, d_v = d_v. \qquad (13)$$

If A is a rotation, this is impossible, because $d_v \perp v$. Moreover, for all directions ω, we have
$M^{2p}(-\omega) = M^{2p}(\omega)$, and thus:

$$d_{-v} = -d_v. \qquad (14)$$

If A is a plane symmetry, we have $Av = -v$; from Equations (13) and (14), we then get $d_v = -d_v$,
which is impossible.

In both cases, $M^{2p}$ cannot have a direction of steepest descent at v. Because $M^{2p}$ is infinitely
differentiable, this implies that $\nabla M^{2p}(v) = 0$.
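Theorem 1 can be illustrated numerically on a point-sampled shape: approximating M^{2p}(ω) by a sum of ‖s × ω‖^{2p} over samples, the tangential derivative of M^{2p} on the unit sphere must vanish along a symmetry axis. The sketch below is our own illustration, not the paper's code; it checks this for the vertices of a cube, which have a 4-fold symmetry around Z:

```python
# Discrete analogue of the generalized even moment and a finite-difference
# check that its tangential derivative vanishes along a symmetry axis.

from math import sqrt

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def norm(a):
    return sqrt(sum(c * c for c in a))

def M(points, w, p=2):
    """Discrete analogue of M^2p: sum of |s x w|^(2p) over sample points."""
    return sum(norm(cross(s, w)) ** (2 * p) for s in points)

def tangential_derivative(points, w, t, eps=1e-5, p=2):
    """Finite-difference derivative of M^2p on the unit sphere at w,
    in tangent direction t (both unit vectors, t orthogonal to w)."""
    def unit(v):
        n = norm(v)
        return tuple(c / n for c in v)
    wp = unit(tuple(w[i] + eps * t[i] for i in range(3)))
    wm = unit(tuple(w[i] - eps * t[i] for i in range(3)))
    return (M(points, wp, p) - M(points, wm, p)) / (2 * eps)

# Vertices of a cube: 4-fold symmetry around Z, so Z is a critical direction.
cube = [(x, y, z) for x in (-1, 1) for y in (-1, 1) for z in (-1, 1)]
```

Along Z, each vertex contributes ‖s × Z‖² = x² + y² = 2, so M = 8 · 2² = 32 for p = 2, and the tangential derivative is zero up to floating-point noise, as the theorem predicts.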
PROOF OF PROPERTY 4. Let S and R be two shapes, identical up to an isometric transform, and let J
be an isometric transform such that JS = R. Let T be the translation of vector $-u_S$, with $u_S = g_S - g$,
where $g_S$ is the center of mass of S and g is the origin of the coordinate system in which J is applied.

— Let $A \in G_S$ be a symmetry of S such that $A u_S = u_S$. We have ATS = TS (the symmetry A
operates in the coordinate system centered on $g_S$). Let $K = J T^{-1} A T$. Then:

$$KS = J T^{-1} A T S = J T^{-1} T S = J S = R$$

and

$$K0 = J T^{-1} A T 0 = J T^{-1} A(-u_S) = J T^{-1}(-u_S) = J 0 = 0.$$

By construction, K is a rigid transform and conserves distances; it maps the origin onto itself. K is thus
an isometric transform. Furthermore, K maps S onto R.

— Conversely, let K be an isometric transform such that KS = R, and choose $A = T J^{-1} K T^{-1}$.
This choice leads to $K = J T^{-1} A T$. Moreover:

$$ATS = T J^{-1} K T^{-1} T S = T J^{-1} K S = T J^{-1} R = TS$$

and

$$A u_S = T J^{-1} K T^{-1} u_S = T J^{-1} K (2u_S) = T (2u_S) = u_S$$

and

$$A0 = T J^{-1} K T^{-1} 0 = T J^{-1} K u_S = T J^{-1} (g_R - g) = T u_S = 0.$$

By construction, A is affine and conserves distances; it maps 0 onto 0. A is thus an isometric transform.
A is also a symmetry of S which verifies $A u_S = u_S$.

— The set of isometries which map S onto R is therefore the set of transforms K of the form
$K = J T^{-1} A T$, where $A \in G_S$ is a symmetry of S such that $A(g_S - g) = g_S - g$.

PROOF OF EQUATION 3. We compute the decomposition of the function $\theta \mapsto \sin^{2p}\theta$ into zonal spherical
harmonics. We prove that this decomposition is finite, and give the values of the coefficients.

By definition [Hobson 1931], we have:

$$Y_L^0(\theta, \varphi) = \sqrt{\frac{2L+1}{4\pi}}\, P_L(\cos\theta) = \sqrt{\frac{2L+1}{4\pi}}\, \frac{(-1)^L}{2^L\, L!} \left[\frac{d^L}{dx^L}(1-x^2)^L\right](\cos\theta)$$

where $P_k$ is the Legendre polynomial of order k. Because the set of Legendre polynomials $P_0, P_1, \ldots, P_n$
is a basis for the polynomials of degree not greater than n, the function $\theta \mapsto \sin^{2p}\theta = (1-\cos^2\theta)^p$ can be
uniquely expressed in terms of the $P_L(\cos\theta)$. The decomposition of $\theta \mapsto \sin^{2p}\theta$ is thus finite and has
terms up to $Y_{2p}^0$ at most.

Let us compute the coefficients explicitly. Expanding $(1-x^2)^L$ with the binomial theorem and differentiating
term by term:

$$\frac{d^L}{dx^L}(1-x^2)^L = \frac{d^L}{dx^L} \sum_{k=0}^{L} (-1)^k\, C_L^k\, x^{2k} = \sum_{L \le 2k \le 2L} (-1)^k\, C_L^k\, 2k(2k-1)\cdots(2k-L+1)\, x^{2k-L} = \sum_{L \le 2k \le 2L} (-1)^k\, C_L^k\, \frac{(2k)!}{(2k-L)!}\, x^{2k-L}$$

So:

$$Y_L^0(\theta, \varphi) = \sqrt{\frac{2L+1}{4\pi}} \sum_{L \le 2k \le 2L} \frac{(-1)^{L+k}}{2^L\, L!}\, C_L^k\, \frac{(2k)!}{(2k-L)!}\, \cos^{2k-L}\theta$$

The coefficients of the decomposition we are interested in are thus:

$$\int_{\theta=0}^{\pi}\!\int_{\varphi=0}^{2\pi} Y_L^0(\theta,\varphi)\, \sin^{2p}\theta\, \sin\theta\, d\theta\, d\varphi = 2\pi \sqrt{\frac{2L+1}{4\pi}} \sum_{L \le 2k \le 2L} \frac{(-1)^{L+k}}{2^L\, L!}\, C_L^k\, \frac{(2k)!}{(2k-L)!}\, I^p_{2k-L} \qquad (15)$$

where the integrals $I^p_m$ are defined by:

$$I^p_m = \int_0^{\pi} \sin^{2p+1}\theta\, \cos^m\theta\, d\theta$$

First, $I^p_m = 0$ for all odd m, because the integrand is then antisymmetric around $\theta = \pi/2$. Then, if m is
even, integration by parts gives:

$$I^p_m = \underbrace{\left[\frac{1}{2p+2}\, \sin^{2p+2}\theta\, \cos^{m-1}\theta\right]_0^{\pi}}_{0} + \frac{m-1}{2p+2} \int_0^{\pi} \sin^{2p+3}\theta\, \cos^{m-2}\theta\, d\theta = \frac{m-1}{2p+2}\, I^{p+1}_{m-2}$$

and, by iterating:

$$I^p_m = \frac{(m-1)(m-3)\cdots 1}{(2p+2)(2p+4)\cdots(2p+m)} \int_0^{\pi} \sin^{2p+m+1}\theta\, d\theta$$

Let $J_q$ be the integral defined by

$$J_q = \int_0^{\pi} \sin^{2q+1}\theta\, d\theta.$$

We have

$$J_q = \underbrace{\left[-\cos\theta\, \sin^{2q}\theta\right]_0^{\pi}}_{0} + 2q \int_0^{\pi} \cos^2\theta\, \sin^{2q-1}\theta\, d\theta = 2q\, J_{q-1} - 2q\, J_q$$

Therefore, since $J_0 = 2$:

$$J_q = \frac{2q}{2q+1}\, J_{q-1} = \frac{2q(2q-2)\cdots 2}{(2q+1)(2q-1)\cdots 3}\, J_0 = \frac{2^{2q+1}\, (q!)^2}{(2q+1)!}$$

For m even, we can take $m = 2r$ and $q = p + r$; we get:

$$I^p_{2r} = \frac{(2r)!}{2^r\, r!} \cdot \frac{p!}{2^r\, (p+r)!} \cdot \frac{2^{2p+2r+1}\, ((p+r)!)^2}{(2p+2r+1)!} = \frac{(2r)!\, p!\, 2^{2p+1}\, (p+r)!}{r!\, (2p+2r+1)!} \qquad (16)$$

From Equation (15), we deduce that, for L odd,

$$\int\!\!\int Y_L^0(\theta,\varphi)\, \sin^{2p}\theta\, \sin\theta\, d\theta\, d\varphi = 0.$$

For L even, we set $L = 2l$. Using $r = k - l$ to substitute Equation (16) into Equation (15), we get:

$$S^l_p = \int\!\!\int Y_{2l}^0(\theta,\varphi)\, \sin^{2p}\theta\, \sin\theta\, d\theta\, d\varphi = 2\pi \sqrt{\frac{4l+1}{4\pi}} \sum_{l \le k \le 2l} \frac{(-1)^k}{2^{2l}\, (2l)!}\, C_{2l}^k\, \frac{(2k)!}{(2k-2l)!} \cdot \frac{(2k-2l)!\, p!\, 2^{2p+1}\, (p+k-l)!}{(k-l)!\, (2p+2k-2l+1)!}$$

$$= \frac{\sqrt{(4l+1)\pi}}{2^{2l}\, (2l)!} \sum_{l \le k \le 2l} (-1)^k\, C_{2l}^k\, \frac{(2k)!\, p!\, 2^{2p+1}\, (p+k-l)!}{(k-l)!\, (2p+2k-2l+1)!}$$

$$= \frac{\sqrt{(4l+1)\pi}}{2^{2l}} \sum_{l \le k \le 2l} (-1)^k\, \frac{(2k)!\, p!\, 2^{2p+1}\, (p+k-l)!}{k!\, (2l-k)!\, (k-l)!\, (2p+2k-2l+1)!}$$
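The closed form for J_q obtained in the proof above can be checked mechanically against the recurrence J_q = 2q/(2q+1) · J_{q−1} with J_0 = 2, using exact rational arithmetic so that no floating-point doubt remains:

```python
# Check that the recurrence J_q = 2q/(2q+1) J_{q-1}, J_0 = 2, agrees with
# the closed form J_q = 2^(2q+1) (q!)^2 / (2q+1)!.

from fractions import Fraction
from math import factorial

def J_recurrence(q):
    val = Fraction(2)                       # J_0 = integral of sin(theta) over [0, pi]
    for k in range(1, q + 1):
        val *= Fraction(2 * k, 2 * k + 1)
    return val

def J_closed_form(q):
    return Fraction(2 ** (2 * q + 1) * factorial(q) ** 2, factorial(2 * q + 1))
```

For instance, J_1 = 4/3 in both forms, and the two expressions agree exactly for every q.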

REFERENCES
ATALLAH, M. J. 1985. On symmetry detection. IEEE Trans. Comput. 34, 663–666.
BRASS, P. AND KNAUER, C. 2004. Testing congruence and symmetry for general 3-dimensional objects. Comput. Geom. Theory
Appl. 27, 1, 3–11.
HIGHNAM, P. T. 1985. Optimal algorithms for finding the symmetries of a planar point set. Tech. Rep. CMU-RI-TR-85-13 (Aug).
Robotics Institute, Carnegie Mellon University, Pittsburgh, PA.
HOBSON, E. W. 1931. The Theory of Spherical and Ellipsoidal Harmonics. Cambridge University Press, Cambridge, UK.
IVANIC, J. AND RUEDENBERG, K. 1996. Rotation matrices for real spherical harmonics, direct determination by recursion. J. Phys.
Chem. A. 100, 6342–6347. (See also Additions and corrections in vol. 102, No. 45, 9099-9100).
JIANG, X.-Y. AND BUNKE, H. 1991. Determination of the symmetries of polyhedra and an application to object recognition.
In Proceedings of the International Workshop on Computational Geometry—Methods, Algorithms and Applications (CG ’91).
Lecture Notes in Computer Science, vol. 553. Springer, London, UK, 113–121.
KAZHDAN, M. M., FUNKHOUSER, T. A., AND RUSINKIEWICZ, S. 2003. Rotation invariant spherical harmonic representation of 3D
shape descriptors. In Proceedings of the 2003 Eurographics/ACM Siggraph Symposium on Geometry Processing (SGP ’03).
Eurographics Association, Aire-la-Ville, Switzerland, 167–175.
KAZHDAN, M. M., FUNKHOUSER, T. A., AND RUSINKIEWICZ, S. 2004. Symmetry descriptors and 3D shape matching. In Proceedings of
the 2004 Eurographics/ACM Siggraph Symposium on Geometry Processing (SGP ’04). Eurographics Association, Aire-la-Ville,
Switzerland.
KNUTH, D. E., MORRIS, JR., J. H., AND PRATT, V. R. 1977. Fast pattern matching in strings. SIAM J. Comput. 6, 2, 323–350.
MINOVIC, P., ISHIKAWA, S., AND KATO, K. 1993. Symmetry identification of a 3-D object represented by octree. IEEE Trans. Patt.
Analy. Mach. Intell. 15, 5, 507–514.
PRINCE, E. 2004. Mathematical Techniques in Crystallography and Materials Science, 3rd Ed. Springer, Berlin, Germany.
RAMAMOORTHI, R. AND HANRAHAN, P. 2004. A signal-processing framework for reflection. ACM Trans. Graph. 23, 4, 1004–1042.
SUN, C. AND SHERRAH, J. 1997. 3D symmetry detection using extended Gaussian image. IEEE Trans. Patt. Analy. Mach.
Intell. 19, 2 (Feb.), 164–168.
WOLTER, J. D., WOO, T. C., AND VOLZ, R. A. 1985. Optimal algorithms for symmetry detection in two and three dimensions.
Visual Comput. 1, 37–48.
ZABRODSKY, H., PELEG, S., AND AVNIR, D. 1995. Symmetry as a continuous feature. IEEE Trans. Patt. Analy. Mach. Intell. 17, 12,
1154–1166.

Received August 2005; revised December 2005; accepted January 2006

ACM Transactions on Graphics, Vol. 25, No. 2, April 2006.

3.
Properties of the illumination function

In illumination simulation methods, we seek to reconstruct a function (the illumination). For this task, we only have partial information, computed at sampling points. The efficiency and the accuracy of the computations depend on the placement of the samples, but determining the optimal placement would require a complete knowledge of the illumination function.

It is however possible to determine certain characteristics of the illumination function at the sampling points, such as its derivatives or the frequency of its variations. The sampling can then be adapted according to these characteristics, which improves the quality of the simulation while reducing computation time.

In this chapter, we present our work on the properties of the illumination function: on the one hand, the computation of the first two derivatives of the radiosity in the presence of obstacles (section 3.1); on the other hand, the determination of the frequencies of the illumination function (section 3.2). In both cases, we seek to determine local properties, based on the analysis of a local light field, taking obstacles into account.

This information (derivatives, frequencies) then makes it possible to guide the illumination simulation efficiently, whether with a refinement oracle that controls the error committed in the simulation, or by adapting the sampling in photon tracing algorithms.

3.1 Study of the derivatives of the illumination function, and applications
3.1.1 Derivatives of the radiosity function
The radiosity at a point x under the influence of a given source has an integral expression, taking into account the illumination function of the source:

\[ B(x) = \frac{\rho}{\pi} \int_{E(x)} B(y)\,\frac{\cos\theta_1\,\cos\theta_2}{r^2}\, dy \qquad (3.1) \]

where E(x) denotes the part of the emitter E that is visible from point x, and thus includes the influence of the obstacles. The function B(x) is differentiable, and it is possible to compute its derivatives [15, 11, 30]. We have provided the expression of the first derivative (the Jacobian, or gradient) and of the second derivative (the Hessian) of the radiosity at point x.

These derivatives can be computed explicitly [15]; this computation reuses several quantities that are also needed for the radiosity computation. It is therefore possible to compute the radiosity and its derivatives simultaneously, for a modest overhead (about 30%) compared to computing the radiosity alone.
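The computation of (3.1) and of its differentiability can be illustrated numerically. The sketch below (illustrative values only: a unit square emitter at height 1 facing a receiver at the origin, no occluders, ρ and the emitter radiosity both set to 1) evaluates the radiosity by midpoint quadrature and estimates its first derivative by central finite differences:

```python
import numpy as np

rho, B_e, m = 1.0, 1.0, 256   # reflectivity, emitter radiosity, grid resolution

def radiosity(x):
    """Midpoint-rule quadrature of (3.1): unit square emitter at z = 1,
    facing down; receiver normal is +z; no occluders."""
    s = (np.arange(m) + 0.5) / m - 0.5
    u, v = np.meshgrid(s, s)
    y = np.stack([u.ravel(), v.ravel(), np.ones(u.size)], axis=1)
    r = y - x
    d2 = (r * r).sum(axis=1)
    cos1 = r[:, 2] / np.sqrt(d2)       # angle with the receiver normal (0, 0, 1)
    cos2 = r[:, 2] / np.sqrt(d2)       # angle with the emitter normal (0, 0, -1)
    return rho / np.pi * B_e * np.sum(cos1 * cos2 / d2) / m**2

x0 = np.zeros(3)
grad = np.array([(radiosity(x0 + h) - radiosity(x0 - h)) / 2e-3
                 for h in 1e-3 * np.eye(3)])   # central finite differences
```

At the point directly below the emitter's center, both tangential derivatives vanish by symmetry, and the vertical derivative is positive: moving the receiver closer to the emitter increases its radiosity.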


3.1.2 Error control during the simulation

Derivatives, however, only give pointwise information on the radiosity function, which can at best be extrapolated through a Taylor expansion to obtain local information. For practical applications (such as an illumination simulation inside a virtual building model), it is useful to have global information, such as the error committed over the whole simulation. This global information can be deduced from information on the error committed on each interaction between facets¹,².

We have shown how to use the knowledge of the radiosity derivatives to deduce strict bounds on the error committed on each interaction, and from there an exhaustive refinement oracle [11, 30] (see p. 142).

These bounds rely on two conjectures, the unimodality conjecture and the concavity conjecture. These two conjectures extend and formalize a long-known result, namely that the radiosity due to a convex source admits only one maximum³.

3.2 Frequency analysis of the illumination function

Figure 3.1 – The reflection is more or less sharp depending on the BRDF. Here, the BRDF varies from perfectly specular on the left to diffuse on the right.

La simulation de l’éclairage présente des phénomènes qui sont plus ou moins flous, en fonc-
tion des objets et des sources lumineuses. Par exemple, la réflexion sur un objet spéculaire est
parfaitement nette, et contient tous les détails de la scène réfléchie, tandis que la réflexion sur
un objet diffus est très floue (voir figure 3.1). De la même manière, l’ombre causée par une
source ponctuelle est nette, tandis que l’ombre causée par une source surfacique est floue (voir fi-
gure 3.2). Enfin, l’éclairage indirect dans une scène est en général plus flou que l’éclairage direct
(voir figure 3.3).
Ce côté net ou flou peut se traduire en terme de contenu fréquentiel de la fonction d’éclairage :
les effets nets (ombres dures, réflexion spéculaires) correspondent à des hautes fréquences, tandis
que les effets flous (ombres douces, réflexion diffuse, éclairage indirect) correspondent à des
basses fréquences.
Avec François Sillion et Cyril Soler, dans le cadre d’une collaboration avec Frédo Durand et
Eric Chan de l’équipe CSAIL du MIT, collaboration financée par une Équipe Associée INRIA,
nous avons montré qu’il est possible de prédire ce contenu fréquentiel de l’éclairage en fonction

1. Daniel L, Brian S et Donald P. G. « Bounds and Error Estimates for Radiosity ». Dans ACM
SIGGRAPH ’94, p. 67–74, 1994.
2. James A, Kenneth T et Brian S. « A Framework for the Analysis of Error in Global Illumination
Algorithms ». Dans ACM SIGGRAPH ’94, p. 75–84, 1994.
3. G. D et E. F. « Accurate and Consistent Reconstruction of Illumination Functions Using Structured
Sampling ». Computer Graphics Forum (Eurographics ’93), 12(3):C273–C284, septembre 1993.
3.2. ÉTUDE FRÉQUENTIELLE DE LA FONCTION D’ÉCLAIRAGE 125

(a) Source ponctuelle (b) Source surfacique

Figure 3.2 – L’ombre causée par une source ponctuelle est nette, tandis que l’ombre causée par
une source surfacique est plus floue (images gracieusement fournies par Ulf Assarsson).

(a) Éclairage total (b) Éclairage direct (c) Éclairage indirect

Figure 3.3 – L’éclairage indirect est en général plus flou que l’éclairage direct (images gracieu-
sement fournies par Cyril Soler).

de la scène (sources lumineuses, obstacles, matériaux) [5] (voir p. 164). Nous nous intéressons à
la fois au spectre spatial et au spectre angulaire :
– We consider the illumination as a local light field, parameterized by a distance and an angle with respect to a reference ray.
– At the light source, the (spatial and angular) spectrum of this local light field is known.
– Each step between the light source and the receiver is seen as a filter acting on the frequency content:
  – Transport through free space acts as a shear of the spectrum parallel to the angular-frequency axis. This effect converts spatial frequencies into angular frequencies.
  – In the presence of an obstacle, the spectrum of the obstacle is convolved with that of the local light field, introducing new spatial frequencies.
  – Reflection on a receiver can be decomposed into several phases, with a particular filter associated with each of them. The overall effect is that of a low-pass filter on the angular frequencies. The cutoff frequency of this filter is related to how specular the BRDF is: a diffuse BRDF cuts the angular frequencies entirely, whereas a specular BRDF preserves them entirely.
– The combined effect of these filters makes it possible to predict the extent of the (spatial and angular) spectrum at any given point of the scene. This knowledge can then be exploited to guide the illumination simulation computations, by adapting the sampling to the frequencies.

One of the most interesting observations coming out of this work is that spatial and angular frequencies are coupled by the free-space transport step. When a non-specular BRDF removes some angular frequencies, this also has the consequence of removing spatial frequencies. The longer the transport, the more the spatial frequencies are coupled to high angular frequencies, and thus the more the cutoff effect of the BRDF on the angular frequencies translates into low spatial frequencies.

This effect, confirmed by experimental studies, opens many possibilities in illumination simulation. The ability to predict the maximum frequencies at each point of the scene makes it possible to guide the sampling during the simulation process, whatever the method used for the computations (photon mapping, radiosity, PRT...). This work should be the basis of many future studies and practical applications.
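The transport step can be illustrated with a toy numerical experiment (a flatland local light field with made-up frequencies, not the setup of the article): a purely spatial sinusoid, once transported over a distance d through the reparameterization x ← x − d·v, acquires an angular frequency equal to d times its spatial frequency.

```python
import numpy as np

n, f_x, dist = 64, 4.0, 0.5   # samples per axis, spatial frequency, transport distance
x = np.linspace(-0.5, 0.5, n, endpoint=False)
v = np.linspace(-0.5, 0.5, n, endpoint=False)
X, V = np.meshgrid(x, v, indexing="ij")

before = np.cos(2 * np.pi * f_x * X)               # purely spatial content
after = np.cos(2 * np.pi * f_x * (X - dist * V))   # after free-space transport

def dominant_frequencies(field):
    """(spatial, angular) frequency of the strongest spectrum peak."""
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(field)))
    i, j = np.unravel_index(np.argmax(spectrum), spectrum.shape)
    freqs = np.fft.fftshift(np.fft.fftfreq(n, d=1.0 / n))
    return freqs[i], freqs[j]
```

With f_x = 4 and a transport distance of 0.5, the dominant spectral peak moves from (±4, 0) to (±4, ∓2): the spectrum has been sheared, and spatial content now also shows up as angular frequencies.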

(a) Low-frequency obstacles: low frequencies on the receiver. (b) Higher frequencies in the obstacles: higher frequencies on the receiver. (c) High-frequency obstacles: low frequencies on the receiver.

Figure 3.4 – Application of our frequency analysis of the illumination function. The spatial frequencies on the diffuse receiver vary non-monotonically with the frequencies of the obstacles.

As an application of our analysis, consider the spectrum of the illumination function on a diffuse reflector in the presence of obstacles (see figure 3.4). This spectrum varies non-monotonically with the spectrum of the obstacles: at first, increasing the frequency of the obstacles increases the spatial frequencies on the receiver (see figure 3.4(b)). Past a certain threshold, however, increasing the frequency of the obstacles instead decreases the spatial frequencies on the receiver (see figure 3.4(c)).

This effect, which had been studied before⁴, is fully explained by our analysis: the obstacle introduces spatial frequencies. Transport after the obstacle pushes these spatial frequencies into the angular frequencies. Reflection on a diffuse surface cuts off the angular frequencies, and therefore the spatial frequencies coupled to them.

4. François Sillion and George Drettakis. "Feature-Based Control of Visibility Error: A Multiresolution Clustering Algorithm for Global Illumination". In ACM SIGGRAPH '95, p. 145–152, 1995.

3.3 Discussion
In this chapter, we have presented our work on the properties of illumination functions. We have shown that local properties of the illumination can be deduced from the respective positions of the objects. These properties can be used to guide the resolution methods, thereby increasing their efficiency.

Our work on the frequency content of the illumination function has not yet delivered its full potential; we intend to pursue it with further research.

Each reflection on a non-specular surface after transport through free space lowers the frequency content, both in space and in angle. As a consequence, high-frequency effects occur either in specular reflections, or in direct illumination, or when free-space transport has little effect, that is, when two objects are close to each other.

Given the progress of programmable graphics cards, it is possible to compute some of these high-frequency effects separately and interactively. The low-frequency effects could then be computed separately, with a coarser sampling. This real-time computation of high-frequency illumination effects is the subject of the next chapter.

3.4 Articles
3.4.1 List of articles
– Accurate Computation of the Radiosity Gradient for Constant and Linear Emitters (EGWR '95)
– An exhaustive error-bounding algorithm for hierarchical radiosity (CGF '98)
– A Frequency Analysis of Light Transport (Siggraph 2005)

3.4.2 Accurate Computation of the Radiosity Gradient for Constant and Linear Emitters
(EGWR '95)
Authors: Nicolas Holzschuch and François Sillion
Conference: 6th Eurographics Workshop on Rendering, Dublin, Ireland.
Date: June 1995
Accurate Computation of the Radiosity Gradient for
Constant and Linear Emitters
Nicolas Holzschuch, François Sillion
iMAGIS/IMAG⋆

Abstract: Controlling the error incurred in a radiosity calculation is one of the


most challenging issues remaining in global illumination research. In this paper
we propose a new method to compute the value and the gradient of the radiosity
function at any point of a receiver, with arbitrary precision. The knowledge of
the gradient provides fundamental information on the radiosity function and its
behaviour. It can especially be used to control the consistency of the discretisation
assumptions.

1 Introduction
Computing the effect of a given patch on the radiosity of another patch is easily done
assuming the radiosity on both patches is constant. In that case, we can express
the influence of the emitter on the receiver with a single number, the form-factor.
However, assuming the radiosity on both patches is constant is a strong assumption, and
it introduces a specific source of error in the resolution algorithm.
In 1994, Arvo et al. [2] recorded all possible sources of error in global illumination
algorithms, and introduced a framework for the analysis of error. Errors can occur at
several levels in the resolution process:
– During modeling: our geometry is not exactly that of the scene we want to compute,
and the BRDFs are not exact either.
– During discretisation: our set of basis functions is not able to represent the real
solution, but only an approximated one.
– During computation: we do not compute transfer elements exactly, but only within
finite precision.
Lischinski et al. [9] presented an error-driven refinement strategy for hierarchical
radiosity. They were able to maintain upper and lower bounds on computed radiosity,
and to concentrate their work in places where the difference was too large.
However, practical tools are still lacking to measure discretisation error. The problem
is to efficiently reconstruct the radiosity function, with only a small number of samples.
The best position for sampling points can only be found with total knowledge of the
radiosity function.
In practice, at each step, we have to intuit the behaviour of the function from our
current set of samples, in order to guess whether or not we should introduce new sampling
points, and where.

iMAGIS is a joint research project of CNRS/INRIA/INPG/UJF. Postal address: B.P. 53, F-38041
Grenoble Cedex 9, France. E-mail: [email protected].

Knowing the radiosity derivatives allows better sampling, and thus reduction of
discretisation error. Heckbert [6] and Lischinski et al. [7] predicted an efficient surface
mesh using derivative discontinuities. Drettakis and Fiume [4, 5] used information
on the structure of the function to accurately reconstruct the illumination. Vedel and
Puech [11] presented a refinement criterion based on gradient values at the nodes.
However, these authors usually resorted to approximate values of the partial deriva-
tives, using several computations of radiosity and finite differences. Computing accurate
values for the gradient allows arbitrary precision on our refinement criterion.
Arvo [1] presented a method to compute the irradiance Jacobian in case of partially
occluded light sources. His method is presented with constant emitters. This paper
introduces a new formulation of the radiosity gradient, valid for arbitrary radiosity
functions on the emitter. The derivation is presented in the case of total visibility, i.e.
without occluders. However, we shall see that extending the algorithm to the case of
partial visibility is easy using Arvo’s technique, since the two algorithms are largely
independent.

2 Reformulating the Radiosity Equation


We will consider only diffuse surfaces, characterised by their radiosity function B(x),
without any assumption about B.
We want to know the value of B at a point x on a given patch A1 , due to the emission
of light from another polygon A2 . We will assume a reflectivity of ρ at point x.

[Figure: point y on emitter A2 with normal \vec n_2, point x on receiver A1 with normal \vec n_1, joined by the vector \vec r_{12}, with angles θ1 and θ2]

Fig. 1. Geometry of the problem

Our knowledge of radiosity at the receiving point derives from the integral equation:
\[ B(x) = \frac{\rho}{\pi} \int_{A_2} \frac{B(y)\,\cos\theta_1\,\cos\theta_2}{\|\vec r_{12}\|^2}\, dA_2 \qquad (1) \]

where ~r12 is the vector joining point x on the receiver and point y on the emitter. θ1
is the angle between ~r12 and the normal on the receiver, θ2 the angle between ~r12 and
the normal on the emitter, and dA2 the area element on the emitter around point y (see
Fig. 1).
Should any occluders be present between point x and emitter A2 , the integral would
only be over the part of A2 visible from x.

We can reformulate Equation 1 as the expression of the flux of a vector field through
surface A2:
\[ B(x) = \int_{A_2} \vec F\cdot d\vec A_2 \qquad (2) \]

where \vec F is:
\[ \vec F = -\frac{\rho\,B(y)\,(\vec r_{12}\cdot\vec n_1)\,\vec r_{12}}{\pi\,\|\vec r_{12}\|^4} \]
A classic way to deal with flux integrals such as Equation 2 is to transform them into a line
integral using Stokes' theorem²:
\[ \int_A (\nabla\times\vec V)\cdot d\vec A = \oint_{\partial A} \vec V\cdot d\vec x \qquad (3) \]

These line integrals can be easier to compute, and are also easier to estimate if there
is no closed form. However, to use Stokes' theorem (3), we need to express the vector
field \vec F as the curl of another vector field, \vec V.
A classic property is that this is equivalent to \vec F having a null divergence (∇ · \vec F = 0).
Basically, the divergence of a vector field is a quantity that expresses, at each point, how
much the field “radiates away” from this point, while the curl of a vector field expresses how much
it “turns around” each point. The divergence of a curl is always null (∇ · (∇ × \vec V) = 0), and
if a field has a null divergence, it can be expressed as a curl.
An easy computation shows that the divergence of \vec F with respect to point y on
surface A2 is³:
\[ \nabla\cdot\vec F = -\frac{\rho}{\pi}\,\frac{\vec r_{12}\cdot\vec n_1}{\|\vec r_{12}\|^2}\,\big(\nabla(B)\cdot\vec r_{12}\big) \qquad (4) \]
and hence is null if the gradient of B on the emitting surface is null, that is to say, if
the radiosity of the emitter is constant.
We can always separate \vec F into two parts:
\[ \vec F = \nabla\times\vec V + \vec G \]
Namely:
\[ \vec V = \frac{\rho\,B(y)}{2\pi}\,\frac{\vec r_{12}\times\vec n_1}{\|\vec r_{12}\|^2} \qquad\qquad \vec G = -\frac{\rho\,\nabla(B)\times(\vec r_{12}\times\vec n_1)}{2\pi\,\|\vec r_{12}\|^2} \]
and thus split Equation 2 into two integrals:
\[ B(x) = \oint_{\partial A_2} \vec V\cdot d\vec x_2 + \int_{A_2} \vec G\cdot d\vec A_2 \qquad (5) \]

Using the properties of cross products and dot products, we can rewrite Equation 5 as:
\[ \frac{2\pi}{\rho}\,B(x) = -\vec n_1\cdot\oint_{\partial A_2} B(y)\,\frac{\vec r_{12}\times d\vec x_2}{\|\vec r_{12}\|^2} + \int_{A_2} \frac{\vec r_{12}}{\|\vec r_{12}\|^2}\cdot\big(\vec n_1\times(\nabla(B)\times\vec n_2)\big)\, dA_2 \qquad (6) \]

² \oint_{\partial A} stands for the integral over the contour of A, and expresses that this contour is closed.
³ In this section, all derivative signs (∇, ∇·, ∇×) are relative to point y on surface A2.

Note that this rewriting process does not make any assumption whatsoever on B(y).
Hence it can be used in any case. An interesting case is when B(y) is constant: then
\vec G = \vec 0, and the second term is null. Another interesting case is B(y) being linear: then
its gradient is constant and can be carried out of the second integral, leaving only a pure
geometric factor to compute. Appendix A presents a detailed study of these two cases.
This rewriting process separates the radiosity into two terms: a contour integral that we
can generally compute, provided that we know the radiosity on the emitter, and a surface
integral, generally harder to compute in closed form. But, as shown later, having an
integral form of this term, we can compute its value with arbitrary precision.
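This separation can be checked numerically. The sketch below (made-up geometry: a 2×2 square emitter at height 1 above a receiving point, ρ = 1, and a hypothetical linear radiosity B(y) = 1 + 0.5 y_x on the emitter) compares the direct area quadrature of Equation 1 with the contour integral of V plus the surface integral of G from Equation 5:

```python
import numpy as np

rho = 1.0
x = np.array([0.3, 0.1, 0.0])                     # receiving point
n1, n2 = np.array([0., 0., 1.]), np.array([0., 0., -1.])
# 2x2 square emitter at z = 1; vertices ordered right-handed about n2
verts = np.array([[1., 1., 1.], [1., -1., 1.], [-1., -1., 1.], [-1., 1., 1.]])
b0, g = 1.0, np.array([0.5, 0., 0.])              # linear radiosity B(y) = b0 + g.y

def B_emit(y):
    return b0 + y @ g

def emitter_samples(m):
    s = (np.arange(m) + 0.5) / m * 2.0 - 1.0      # midpoint rule on [-1, 1]^2
    u, v = np.meshgrid(s, s)
    y = np.stack([u.ravel(), v.ravel(), np.ones(u.size)], axis=1)
    return y, (2.0 / m) ** 2

def B_area(m=400):
    """Direct quadrature of Equation 1."""
    y, w = emitter_samples(m)
    r = y - x
    d2 = (r * r).sum(axis=1)
    return rho / np.pi * np.sum(B_emit(y) * (r @ n1) * (-(r @ n2)) / d2**2) * w

def B_split(m=400):
    """Contour integral of V plus surface integral of G (Equation 5)."""
    t = (np.arange(m) + 0.5) / m
    total = 0.0
    for i in range(len(verts)):                   # contour: sum over the edges
        a, b = verts[i], verts[(i + 1) % len(verts)]
        y = a + np.outer(t, b - a)
        r = y - x
        d2 = (r * r).sum(axis=1)
        V = rho / (2 * np.pi) * B_emit(y)[:, None] * np.cross(r, n1) / d2[:, None]
        total += (V @ (b - a)).mean()
    y, w = emitter_samples(m)                     # surface term with grad B = g
    r = y - x
    d2 = (r * r).sum(axis=1)
    G = -rho / (2 * np.pi) * np.cross(g, np.cross(r, n1)) / d2[:, None]
    total += (G @ n2).sum() * w
    return total
```

Both evaluations agree to quadrature precision, which is exactly the point of the decomposition: the contour part can be computed in closed form, while the remaining surface term can be estimated to any desired accuracy.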

3 The Radiosity Gradient


An interesting quantity to describe scalar fields such as B(x) is their gradient. The gradient
is the extension of derivation to functions of several variables. Basically, ∇(B)(x) · ~v
gives the derivative of the function B at point x in the direction of ~v.
3.1 Computing the Gradient
The radiosity gradient can be computed from an equation such as Equation 1 or 6:
\[ \nabla(B)(x) = \nabla\left(\int_{A_2} \vec F\cdot d\vec A_2\right) \qquad (7) \]

In case the emitter A2 does not depend on the position of the point x – that is to say, in
case there are no occluders between point x and the emitting surface A2 – this equation
is equivalent to:
\[ \nabla(B)(x) = \int_{A_2} \nabla\big(\vec F\cdot d\vec A_2\big) \]
Or, if we use Equation 5:
\[ \nabla(B)(x) = \oint_{\partial A_2} \nabla\big(\vec V\cdot d\vec x_2\big) + \int_{A_2} \nabla\big(\vec G\cdot d\vec A_2\big) \qquad (8) \]

If the emitter depends on the position of point x – that is, if there are occluders –
the expression of ∇(B)(x) is the sum of two terms; the first one takes into account
the variation of F~ , and is exactly the term we are discussing, and the second one takes
into account the variation of the emitter. Thus, it is easy to merge a method to compute
the gradient with occluders and a constant emitter, as in Arvo [1], and our method to
compute the gradient with an arbitrary emitter, but without occluders.
Note that in this section, we are taking a derivative with respect to point x on
the receiving surface, not with respect to point y on the emitting surface. So for our
differentiation operator, the radiosity at the emitting point, B(y), can be regarded as constant,
as well as its gradient, ∇(B)(y).
Using the properties of the gradient of a scalar product, starting from Equation 8,
we can express the gradient of radiosity at the receiving point:
\[ -\frac{2\pi}{\rho}\,\nabla(B)(x) = \vec n_1\times\oint_{\partial A_2} B(y)\,\frac{d\vec x_2}{\|\vec r_{12}\|^2} + 2\oint_{\partial A_2} B(y)\,\frac{(\vec r_{12}\times d\vec x_2)\cdot\vec n_1}{\|\vec r_{12}\|^4}\,\vec r_{12} \]
\[ \qquad + \int_{A_2} \big(\vec n_1\times(\nabla(B)(y)\times\vec n_2)\big)\,\frac{dA_2}{\|\vec r_{12}\|^2} - 2\int_{A_2} \frac{\big(\vec n_1\times(\nabla(B)(y)\times\vec n_2)\big)\cdot\vec r_{12}}{\|\vec r_{12}\|^4}\,\vec r_{12}\, dA_2 \qquad (9) \]

This equation, like the radiosity equation (6), is divided into two parts: a contour integral
which usually has a closed form, and a surface integral that we can estimate to any
arbitrary precision.
As before, two interesting cases occur: if the gradient on the emitter is null, that is
if we assume a constant radiosity on the emitter, all surface integrals vanish. And if the
gradient on the emitter is constant, that is if we assume a linear radiosity on the emitter,
it can be carried out of the surface integrals, leaving us with purely geometrical factors
or vectors to compute. Please refer to Appendix A for a detailed study of these cases.
3.2 Using the gradient
Knowing the gradient at a point gives very valuable information on the function we are
studying. As previous authors pointed out, the gradient may be used either to reconstruct
the illumination function before display, or to check the consistency of our discretisation
hypothesis.
Reconstructing the illumination function If we know the radiosity values and the
gradient at our sample points, we can then reconstruct the radiosity function as, e.g. a
bicubic spline.
Salesin et al. [10] and Bastos et al. [3] proposed such methods for the reconstruction
of radiosity using estimates of the gradient. Ward and Heckbert [12] computed irradiance
gradients to interpolate irradiance on receiving surfaces.
Refinement criterion Many radiosity algorithms assume a constant radiosity over
patches. It may seem strange to compute the gradient of radiosity in that case, but
in fact the information given by the gradient can also be used there.
Using the derivatives makes it possible to check whether our discretisation hypotheses
were correct or not, and if they were not, it also gives a hint about where it would be best
to refine in order to minimize the discretisation error.
[Figure: (a) constant radiosity assumption, showing the approximate error on a patch of given width; (b) linear radiosity assumption, showing B(y0), B(y1), the linear interpolation, the polynomial approximation, the approximate error on the edge and the best probable cutting point]

Fig. 2. Using the gradient to measure discretisation error


If we assume constant radiosity on our patches, the gradient gives a first estimate
of how much the function varies over the patch: ∇(B)(x) · ~v is approximately the
difference between radiosity at point x + ~v and radiosity at point x. The norm of the
gradient times the width of the patch gives an approximation of how much the radiosity
gives the best probable direction of refinement.
If we assume linear radiosity on our patches, we can compute a cubic interpolant over
the patch using the radiosity and gradient values at each vertex, and then test how much

this cubic interpolant differs from our linear assumption (see Fig. 2b for an example in
2D). We can even compute the difference between the linear and the cubic interpolant
without explicitly computing the interpolants. This criterion also gives the best next sampling
point: the position of the maximum difference between the two interpolants.
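This criterion can be sketched as follows (a hypothetical helper, not the paper's implementation; for simplicity the gap is located by dense sampling rather than in closed form):

```python
import numpy as np

def refinement_error(b0, g0, b1, g1, width, samples=257):
    """Gap between the cubic Hermite interpolant, built from endpoint
    radiosities (b0, b1) and directional derivatives (g0, g1) over a patch
    of the given width, and the linear interpolant.
    Returns the error estimate and the best split point in [0, 1]."""
    t = np.linspace(0.0, 1.0, samples)
    h00 = 2*t**3 - 3*t**2 + 1          # Hermite basis functions
    h10 = t**3 - 2*t**2 + t
    h01 = -2*t**3 + 3*t**2
    h11 = t**3 - t**2
    cubic = b0*h00 + g0*width*h10 + b1*h01 + g1*width*h11
    linear = b0*(1.0 - t) + b1*t
    gap = np.abs(cubic - linear)
    return gap.max(), t[gap.argmax()]
```

For data sampled from B(t) = t² over a unit-width patch (b0 = 0, g0 = 0, b1 = 1, g1 = 2), the Hermite cubic reproduces t² exactly, and the criterion reports a maximum gap of 0.25 at t = 0.5, the midpoint being the best next sampling point; for data that is genuinely linear, the reported error is zero.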

[Figure: (a) constant radiosity assumption, where the approximate error is far below the real error; (b) linear radiosity assumption, where ∇(B) = 0 at both samples B(y0) and B(y1)]

Fig. 3. Sample cases where the proposed refinement criteria fail


Although none of these refinement criteria is foolproof (see Fig. 3 for an example
where these two criteria fail to detect an important discretisation error), they provide
a way to measure and quantify the discretisation error.
Also, the points where these refinement criteria are more likely to be fooled are
basically the extrema of the radiosity function. We know that a single convex emitter
induces only one maximum on the receiver (see, for example, Drettakis [4]).
So, to study the interaction between two patches so as to minimize discretisation
error, we would first find the theoretical position of the maximum of radiosity, then
sample it, then refine the receiving patch using our gradient-based criterion.

4 Implementation and First Results


We have implemented the gradient and radiosity formulas described in Appendix A, for
both constant and linear emitters⁴. Using a C++ class for vectors, with definitions of
cross and dot products, makes the implementation very straightforward, a mere
transcription of the formulas. The only special attention it needs is to avoid recomputing
quantities already computed at previous steps. Most of the quantities needed to express
the gradient were also used for the radiosity.
In the color plates, Fig. A shows the radiosity values on a plane, due to a triangular
emitter parallel to that plane (see Fig. E for the geometry of the scene). Fig. B shows
the norm of the gradient of this radiosity.
Fig. C and D show the same quantities if we assume a linear emitter.

5 Conclusion and Future Work


We have provided a way to compute the gradient of radiosity at the receiving point with
any distribution of illumination on the emitter. The gradient can be used in several ways,
and especially to compute the discretisation error. It can also be used to find the best next
refinement point.
Future work will include a complete gradient computation, using the method de-
scribed by Arvo in [1] to take the possible occluders into account.
⁴ Source code and documentation for this implementation is available at
ftp://safran.imag.fr/pub/holzschu/gradient.tar.gz.

The ability to compute radiosity gradients for linear emitters is especially interesting
when using linear basis functions or linear wavelets. In that case, the discretisation error
can be precisely isolated.
Our next step will be a complete implementation of the refinement criterion described
in section 3.2, to effectively reduce the discretisation error, within a hierarchical radiosity
framework with linear radiosity.
We will then have the possible background for a complete radiosity algorithm
with all possible sources of error (visibility, discretisation, computational) recorded and
monitored, thus allowing to focus the computing resources at the points where this error
is large.

6 Acknowledgements

Color pictures were computed by Myriam Houssay-Holzschuch using the GMT package,
developed by Wessel and Smith [13].
The authors would like to thank the anonymous reviewers for useful insights and
positive criticism.

References

1. Arvo, J.: The Irradiance Jacobian for Partially Occluded Polyhedral Sources. SIGGRAPH
(1994) 343–350
2. Arvo, J., Torrance, K., Smits, B.: A Framework for the Analysis of Error in Global Illumi-
nation Algorithms. SIGGRAPH (1994) 75–84
3. Bastos, R. M., de Sousa, A. A., Ferreira, F. N., Reconstruction of Illumination Functions
using Bicubic Hermite Interpolation. Fourth Eurographics Workshop on Rendering (June
1993) 317–326
4. Drettakis, G., Fiume, E.: Accurate and Consistent Reconstruction of Illumination Functions
Using Structured Sampling. Computer Graphics Forum (Eurographics 1993 Conf. Issue) 273–284
5. Drettakis, G., Fiume, E.: Concrete Computation of Global Illumination Using Structured
Sampling. Third Eurographics Workshop on Rendering (May 1992) 189–201
6. Heckbert, P. S.: Simulating Global Illumination Using Adaptive Meshing. PhD Thesis,
University of California, Berkeley, June 1991.
7. Lischinski, D., Tampieri, F., Greenberg, D. P.: Discontinuity Meshing for Accurate Radios-
ity. IEEE Computer Graphics and Applications 12,6 (November 1992) 25–39
8. Lischinski, D., Tampieri, F., Greenberg, D. P.: Combining Hierarchical Radiosity and Dis-
continuity Meshing. SIGGRAPH (1993)
9. Lischinski, D., Smits, B., Greenberg, D. P.: Bounds and Error Estimates for Radiosity.
SIGGRAPH (1994) 67–74
10. Salesin, D., Lischinski, D., DeRose, T.: Reconstructing Illumination Functions with Selected
Discontinuities. Third Eurographics Workshop on Rendering (May 1992) 99–112
11. Vedel, C., Puech, C.: Improved Storage and Reconstruction of Light Intensities on Surfaces.
Third Eurographics Workshop on Rendering (May 1992) 113–121
12. Ward, G. J., Heckbert, P. S.: Irradiance Gradients. Third Eurographics Workshop on Ren-
dering (May 1992) 85–98
13. Wessel, P. and Smith, W. H. F.: Free Software helps Map and Display Data. EOS Trans.
Amer. Geophys. U., vol. 72, 441–446, 1991

A Application to Constant and Linear Emitters
A.1 Case of a constant emitter
In the case of a constant emitter, Equations 6 and 9 reduce to:
\[ \frac{2\pi}{\rho}\,B(x) = -\vec n_1\cdot\oint_{\partial A_2} B(y)\,\frac{\vec r_{12}\times d\vec x_2}{\|\vec r_{12}\|^2} \qquad (10) \]
\[ -\frac{2\pi}{\rho}\,\nabla(B)(x) = \vec n_1\times\oint_{\partial A_2} B(y)\,\frac{d\vec x_2}{\|\vec r_{12}\|^2} + 2\oint_{\partial A_2} B(y)\,\frac{(\vec r_{12}\times d\vec x_2)\cdot\vec n_1}{\|\vec r_{12}\|^4}\,\vec r_{12} \qquad (11) \]
If A2 is a polygon, these integrals have a closed form, and yield:
\[ \frac{2\pi}{\rho}\,B(x) = -B_2\,\vec n_1\cdot\sum_i I_1(i)\,(\vec r_i\times\vec e_i) \]
\[ -\frac{2\pi}{\rho}\,\nabla(B)(x) = B_2\sum_i I_1(i)\,(\vec n_1\times\vec e_i) + 2 B_2\sum_i \big((\vec r_i\times\vec e_i)\cdot\vec n_1\big)\,\big(I_2(i)\,\vec r_i + J_2(i)\,\vec e_i\big) \]

where the sum extends over all the edges of the polygon, and B2 is the radiosity of the
emitter. \vec r_i, \vec e_i, I_1(i), I_2(i) and J_2(i) stand for (see also Fig. 4):
\[ \vec r_i = \overrightarrow{xE_i} \qquad \vec e_i = \overrightarrow{E_i E_{i+1}} \qquad I_1(i) = \frac{\gamma_i}{\|\vec r_i\times\vec e_i\|} \]
\[ I_2(i) = \frac{1}{2\,\|\vec r_i\times\vec e_i\|^2}\left(\frac{\vec r_{i+1}\cdot\vec e_i}{\|\vec r_{i+1}\|^2} - \frac{\vec r_i\cdot\vec e_i}{\|\vec r_i\|^2} + \|\vec e_i\|^2\,I_1(i)\right) \]
\[ J_2(i) = \frac{1}{2\,\|\vec e_i\|^2}\left(\frac{1}{\|\vec r_i\|^2} - \frac{1}{\|\vec r_{i+1}\|^2} - 2\,I_2(i)\,\vec r_i\cdot\vec e_i\right) \]
and \gamma_i is the angle subtended by edge \vec e_i from point x.
[Figure: polygon A2 with vertices Ei and Ei+1, edge vector \vec e_i, vector \vec r_i from point x on A1 to Ei, and the subtended angle γi]

Fig. 4. Geometric Notations Used

Computing B at point x requires roughly 63 multiplications, 6 divisions, 54 additions
or subtractions, 6 square roots and 3 arc cosines. This equals approximately 300
additions on an SGI Indy computer, with no optimisations and the standard compiler.

Computing ∇(B)(x) requires roughly 87 more multiplications, 57 more additions
and 3 more divisions, which, on the same hardware, equals approximately 150 additions.
Although this computational cost may depend on implementation details as well
as on the computer used (some compilers have very fast implementations of arc cosine
and square root), computing the gradient along with the radiosity does not unduly increase
computation time.
A.2 Case of a linear emitter
If the emitter is not constant, the gradient of radiosity on the emitter is not null, and
must be used in our computations. However, if we assume the radiosity of the emitter
is linear, then its gradient is constant and can be carried out of the integrals. Moreover,
this gradient is orthogonal to ~n2 , and can be expressed as:

\nabla(B)(y) = \vec{n}_2 \times \vec{k}

with \vec{k} orthogonal to \vec{n}_2:

\vec{k} = \frac{1}{2 A_2} \left( (B_2 - B_0)\, \vec{e}_0 + (B_1 - B_0)\, \vec{e}_2 \right)

Using the properties of \vec{k}, we can express Equation 6 as:

\frac{2\pi}{\rho} B(x) = -\vec{n}_1 \cdot \oint_{\partial A_2} B(y)\, \frac{\vec{r}_{12} \times d\vec{x}_2}{\|\vec{r}_{12}\|^2} + (\vec{n}_1 \cdot \vec{n}_2) \left( \vec{k} \cdot (\vec{m} \times \vec{n}_2) \right) + (\vec{m} \cdot \vec{n}_2) \left( \vec{n}_2 \cdot (\vec{n}_1 \times \vec{k}) \right)

with:

\vec{m} = \int_{A_2} \frac{\vec{r}_{12}}{\|\vec{r}_{12}\|^2}\, dA_2 = \int_{A_2} \nabla(\ln(r_{12}))\, dA_2

Computing the contour integrals does not present any particular difficulty. However, computing \vec{m} is harder. We can make use of Ostrogradsky's theorem, similar to Stokes':

\int_A \nabla(V) \times d\vec{A} = -\oint_{\partial A} V\, d\vec{x}

to express \vec{m} \times \vec{n}_2.

\vec{m} \cdot \vec{n}_2 is null if point x lies in the plane of polygon A_2. If point x is not in this plane, it can be estimated with arbitrary precision.
The formula for B(x) is then:

\frac{2\pi}{\rho} B(x) = -\vec{n}_1 \cdot \oint_{\partial A_2} B(y)\, \frac{\vec{r}_{12} \times d\vec{x}_2}{\|\vec{r}_{12}\|^2} - (\vec{n}_1 \cdot \vec{n}_2)\, \vec{k} \cdot \oint_{\partial A_2} \ln(r_{12})\, d\vec{x}_2 + (\vec{m} \cdot \vec{n}_2) \left( \vec{n}_2 \cdot (\vec{n}_1 \times \vec{k}) \right)

If we differentiate this formula rather than using Equation 9, we find:

-\frac{2\pi}{\rho} \nabla(B)(x) = \vec{n}_1 \times \oint_{\partial A_2} B(y)\, \frac{d\vec{x}_2}{\|\vec{r}_{12}\|^2} + 2 \oint_{\partial A_2} B(y)\, \frac{\vec{n}_1 \cdot \vec{r}_{12}}{\|\vec{r}_{12}\|^4}\, (\vec{r}_{12} \times d\vec{x}_2) - (\vec{n}_1 \cdot \vec{n}_2) \oint_{\partial A_2} \frac{\vec{r}_{12}}{\|\vec{r}_{12}\|^2}\, (\vec{k} \cdot d\vec{x}_2) + \left( \vec{n}_2 \cdot (\vec{n}_1 \times \vec{k}) \right) \left( \vec{n}_2\, X_1 - 2 (\vec{n}_2 \cdot \vec{r}_0)\, \vec{p} \right)

with:

\vec{p} = \int_{A_2} \frac{\vec{r}_{12}}{\|\vec{r}_{12}\|^4}\, dA_2
X_1 = \int_{A_2} \frac{dA_2}{\|\vec{r}_{12}\|^2}

Computing \vec{p} is exactly like computing \vec{m}: we can compute \vec{p} \times \vec{n}_2, and we can estimate \vec{p} \cdot \vec{n}_2 with arbitrary precision. Then we use:

\vec{p} = \vec{n}_2 \times (\vec{p} \times \vec{n}_2) + (\vec{p} \cdot \vec{n}_2)\, \vec{n}_2
Hence:

\frac{2\pi}{\rho} B(x) = -\vec{n}_1 \cdot \sum_i \left( B_i I_1(i) + \delta B_i J_1(i) \right) (\vec{r}_i \times \vec{e}_i) - (\vec{n}_1 \cdot \vec{n}_2) \sum_i (\vec{k} \cdot \vec{e}_i)\, K_1(i) + (\vec{r}_0 \cdot \vec{n}_2) \left( \vec{n}_2 \cdot (\vec{n}_1 \times \vec{k}) \right) X_1

-\frac{2\pi}{\rho} \nabla(B)(x) = \sum_i \left( B_i I_1(i) + \delta B_i J_1(i) \right) (\vec{n}_1 \times \vec{e}_i)
    + 2 \sum_i \vec{n}_1 \cdot (\vec{r}_i \times \vec{e}_i)\, B_i \left( I_2(i)\, \vec{r}_i + J_2(i)\, \vec{e}_i \right)
    + 2 \sum_i \vec{n}_1 \cdot (\vec{r}_i \times \vec{e}_i)\, \delta B_i \left( J_2(i)\, \vec{r}_i + K_2(i)\, \vec{e}_i \right)
    - (\vec{n}_1 \cdot \vec{n}_2) \sum_i (\vec{k} \cdot \vec{e}_i) \left( I_1(i)\, \vec{r}_i + J_1(i)\, \vec{e}_i \right)
    - \left( \vec{n}_2 \cdot (\vec{n}_1 \times \vec{k}) \right) (\vec{n}_2 \cdot \vec{r}_0)\, \vec{n}_2 \times \sum_i I_2(i)\, \vec{e}_i
    + \left( \vec{n}_2 \cdot (\vec{n}_1 \times \vec{k}) \right) \left( X_1 - 2 (\vec{r}_0 \cdot \vec{n}_2)^2 X_2 \right) \vec{n}_2




With:

\delta B_i = B_{i+1} - B_i
J_1(i) = \frac{1}{\|\vec{e}_i\|^2} \left( \ln \frac{\|\vec{r}_{i+1}\|}{\|\vec{r}_i\|} - (\vec{r}_i \cdot \vec{e}_i)\, I_1(i) \right)
K_1(i) = \frac{1}{2 \|\vec{e}_i\|^2} \left( \vec{r}_{i+1} \cdot \vec{e}_i \ln(\|\vec{r}_{i+1}\|^2) - \vec{r}_i \cdot \vec{e}_i \ln(\|\vec{r}_i\|^2) + 2 \|\vec{r}_i \times \vec{e}_i\|\, \gamma_i \right)
K_2(i) = \frac{1}{\|\vec{e}_i\|^2} \left( I_1(i) - \|\vec{r}_i\|^2 I_2(i) - 2 (\vec{r}_i \cdot \vec{e}_i)\, J_2(i) \right)
X_2 = \int_{A_2} \frac{dA_2}{\|\vec{r}_{12}\|^4}

If the distance between point x and the emitter surface is null, \vec{m} \cdot \vec{n}_2 and \vec{p} \cdot \vec{n}_2 are both null. If it is not, we prefer to estimate X_1 and X_2. As we know bounds on the values of the function and its derivatives, we make use of a Gaussian quadrature.

[Figure panels:
A. Radiosity on the receiving plane, due to a constant emitter.
B. Norm of the radiosity gradient, due to a constant emitter.
C. Radiosity on the receiving plane, due to a linear emitter.
D. Norm of the radiosity gradient, due to a linear emitter.
E. Geometry of our test scene (emitter above a receiver; scale label: 0.1).]
CHAPTER 3. PROPERTIES OF THE ILLUMINATION FUNCTION

3.4.3 An exhaustive error-bounding algorithm for hierarchical radiosity (CGF '98)

Authors: Nicolas Holzschuch and François Sillion
Journal: Computer Graphics Forum, vol. 17, no. 4
Date: December 1998
Volume 17 (1998), number 4, pp. 197–218

An exhaustive error-bounding algorithm


for hierarchical radiosity

Nicolas Holzschuch†
François X. Sillion

iMAGIS‡
GRAVIR/IMAG - INRIA

Abstract
This paper presents a complete algorithm for the evaluation and control of error in radiosity calculations. Pro-
viding such control is both extremely important for industrial applications and one of the most challenging issues
remaining in global illumination research.
In order to control the error, we need to estimate the accuracy of the calculation while computing the energy exchanged between two objects. Having this information for each radiosity interaction makes it possible to allocate more resources to refine interactions with greater potential error, and to avoid spending more time on interactions already represented with sufficient accuracy.
Until now, the accuracy of the computed energy exchange could only be approximated using heuristic algorithms.
This paper presents the first exhaustive algorithm to compute fully reliable upper and lower bounds on the energy
being exchanged in each interaction. This is accomplished by computing first and second derivatives of the ra-
diosity function where appropriate, and making use of two concavity conjectures. These bounds are then used in a
refinement criterion for hierarchical radiosity, resulting in a global illumination algorithm with complete control
of the error incurred.
Results are presented, demonstrating the possibility to create radiosity solutions with guaranteed precision. We
then extend our algorithm to consider linear bounding functions instead of constant functions, thus creating sim-
pler meshes in regions where the function is concave, without loss of precision.
Our experiments show that the computation of radiosity derivatives along with the radiosity values only requires
a modest extra cost, with the advantage of a much greater precision.

1. Introduction

Global illumination algorithms now have many applications. One of the most promising fields is urban and architectural planning, where the use of a global illumination algorithm makes it possible to visualize a future building, and thus to check for misconceptions. For example, it becomes possible to check the ergonomics of the workplace — is there enough light, or too much? — or to ensure that the items in a museum are properly lit.

In such applications, it is vital to be able to quantify the light arriving at each point of the scene, in order to give the user a precise range in which the illumination is guaranteed to fall.

Global illumination algorithms generally have at least one parameter that the user can manipulate, choosing either fast computations or precise results. For Monte-Carlo ray tracing algorithms, this parameter can be the number of rays. For hierarchical radiosity algorithms, it can be the refinement threshold, used to decide whether or not to refine a given

† Current position: Invited Researcher, Department of Computer Science, University of Cape Town, South Africa.
‡ iMAGIS is a joint research project between CNRS, INRIA, INPG and Université Joseph Fourier — Grenoble I. Postal address: B.P. 53, F-38041 Grenoble Cedex 9, France. E-mail: [email protected].

© The Eurographics Association 1998. Published by Blackwell Publishers, 108 Cowley Road, Oxford OX4 1JF, UK and 238 Main Street, Cambridge, MA 02142, USA.


interaction. Until recently, however, we had little knowledge of the total precision of the result computed, or of the relation between the parameters and this precision. Even if it was clear that spending more time on the simulation would produce more precise results, we could not quantify this increase precisely.

In 1994, Lischinski1 proposed a refinement criterion for hierarchical radiosity such that the error on the energy at each point of the scene could be controlled by the refinement threshold. Their algorithm used upper and lower bounds on the point-to-area form factor for each interaction in order to compute upper and lower bounds for the radiosity at each point in the scene. However, they had no way to compute reliable upper and lower bounds for the point-to-area form-factor on a given interaction, and still resorted to sampling — computing a set of values for the form-factor, and taking the minimum and maximum of these values.

Although Lischinski's method is easy to implement, it is not totally reliable. In this paper, we present a method allowing us to compute fully reliable upper and lower bounds for the point-to-area form-factor on any interaction. To achieve this goal, we use our knowledge of the point-to-area form-factor derivatives together with its concavity properties.

These concavity properties of the point-to-area form-factor are described in section 3. They extend the unimodality conjecture proposed by Drettakis2, 3. Like the unimodality conjecture, they are only conjectures, and despite their apparent simplicity, we have been unable to find a complete demonstration for them. However, we have also been unable to exhibit a counter-example.

As is explained in appendix B, we can compute exact values for the derivatives of the point-to-area form-factor; either for the first derivative, the gradient vector, or for the second derivative, the Hessian matrix. As we shall also see in appendix B, it is in fact faster to compute an exact value for the form-factor derivative than to compute approximate values using several samples. Using our knowledge of the derivatives along with the concavity properties of the point-to-area form-factor, we show in section 4 how to derive bounds for the point-to-area form-factor in any unoccluded interaction. We also show an implementation of the refinement criterion using these bounds.

When dealing with partially occluded interactions we cannot use the previous bounds, as the concavity conjectures do not hold in this case. But we can exhibit two emitters that are convex and bound the actual emitter, which we call the minimal and the maximal emitter. Using the previously defined algorithm, we find an upper bound for the maximal emitter, and a lower bound for the minimal emitter. The algorithm for finding these convex emitters is detailed in section 5.

[Figure 1: Geometric notations for the radiosity equation: emitter A2 (normal n2, point y), receiver A1 (normal n1, point x), the vector r12 joining x to y, and the angles θ1 and θ2 it makes with the normals.]

2. Background

The radiosity method was introduced in the field of light transfer in 1984 by Goral4. This method uses a simplification in order to solve the global illumination problem: it assumes that all the objects in the scene are ideal diffuse surfaces: their bidirectional reflectance is uniform, and thus does not depend on the outgoing direction.

In this case, the radiosity emitted at a given point x can be expressed as an integral equation:

B(x) = E(x) + \rho_d(x) \int_{y \in S} B(y)\, \frac{\cos\theta_1 \cos\theta_2}{\pi r^2}\, V(x, y)\, dy    (1)

In this equation, S is the set of all points y. r is the distance between point x and point y, θ1 and θ2 are the angles between the vector \overrightarrow{xy} and the normals to the surfaces at points x and y respectively (refer to figure 1 for the geometric notations). ρd(x) is the diffuse reflectance at point x, and V(x, y) expresses whether point x is visible from point y or not.

In order to solve equation 1, Goral4 suggested discretizing the scene into a set of patches [Pi], over which a constant radiosity Bi is assumed. In this case, the radiosity at point x becomes:

B(x) = E(x) + \rho_d(x) \sum_i B_i \int_{y \in P_i} \frac{\cos\theta \cos\theta'}{\pi r^2}\, V(x, y)\, dy    (2)

The purely geometric quantity

F_i(x) = \int_{y \in P_i} \frac{\cos\theta \cos\theta'}{\pi r^2}\, V(x, y)\, dy

is called the point-to-area form-factor at point x from patch i. It only depends on the respective positions of point x and patch i.

Since we assume a constant radiosity value within the patch, we can compute this value as the average of all the point values. This leads to a matrix equation:

B_j = E_j + \rho_j \sum_i F_{ji} B_i    (3)
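Equation 3 can be solved by relaxation; a minimal Gauss-Seidel sketch (illustrative names of our own; the form-factor matrix is assumed precomputed):

```python
def solve_radiosity(E, rho, F, sweeps=50):
    """Gauss-Seidel solution of B_j = E_j + rho_j * sum_i F_ji * B_i
    (equation 3). Updated values are reused within a sweep."""
    n = len(E)
    B = list(E)                      # start from the emission term
    for _ in range(sweeps):
        for j in range(n):
            B[j] = E[j] + rho[j] * sum(F[j][i] * B[i] for i in range(n))
    return B
```

Each sweep transports one more bounce of light; for physically plausible scenes (rho < 1, row sums of F at most 1) the iteration is a contraction and converges geometrically.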


where the geometric quantity

F_{ij} = \frac{1}{A_j} \int_{x \in P_j} \int_{y \in P_i} \frac{\cos\theta \cos\theta'}{\pi r^2}\, V(x, y)\, dx\, dy

is called the form-factor. Schröder5 showed that there is a closed form expression for the form-factor in the case of two fully visible polygonal patches. In the general case, we do not have access to the exact value of the form-factor, but only to approximate values.

Equation 3 can be solved in an iterative manner, using the Jacobi or Gauss-Seidel iterative methods (see Cohen6). The problem is that in order to compute one full bounce of light across the surfaces in the scene, we have to compute the entire form-factor matrix, which is quadratic with respect to the number of patches.

A significant improvement over the classical radiosity method is hierarchical radiosity. In "standard" radiosity, the discretisation of one object into patches does not depend on the objects with which it interacts. In order to model the interaction between objects that are very close, and exchange lots of energy, we need to subdivide them into many patches, so as to get a precise modelling of the radiosity. On the other hand, an interaction between two objects that are far away could be modelled with fewer patches.

In hierarchical radiosity, introduced in 1990 by Hanrahan7, each object is subdivided into a hierarchy of patches, with each node in the hierarchy carrying the average of the radiosity of its children. Interactions between objects far away from each other are modelled as interactions between nodes at a high level in each hierarchy. On the other hand, interactions between objects close to each other are modelled as interactions between nodes at a lower level in the hierarchy, thereby allowing more precision in the modelling of radiosity. Each interaction between two nodes is modelled by a link, a data structure carrying the identity of the sender and the receiver, as well as the form-factor, and possibly other information on the respective visibility of both patches. This hierarchical radiosity algorithm has later been extended using wavelets (see Gortler8).

The most important step in the hierarchical radiosity method is the decision whether or not to refine a given interaction. This decision is deferred to a refinement criterion. Early implementations of the hierarchical radiosity method used crude approximations of the form-factor between two patches. It was known that these form-factor estimates were most imprecise when the result of the approximation was large. Hence, interactions were refined as long as the form-factor estimate was above a certain threshold (Hanrahan7).

This refinement criterion does not give the user full control of the precision of the modelling of the radiosity function. In particular, it does not give any guarantee that it will refine all problematic interactions, and it can also refine excessively in places where the solution has already attained a correct level of precision (Holzschuch9).

Part of these problems can be addressed by using discontinuity meshing, where the patches are first subdivided along the discontinuity lines of the radiosity function and its derivatives (see Heckbert10, Lischinski11, 12 and Drettakis13). These discontinuity lines can be computed using geometric algorithms. However, as pointed out by Drettakis, these discontinuity lines are not of equal importance. Some of them do not have a noticeable effect on the final radiosity solution. Hence it is not necessary to compute all the discontinuity lines. Deciding which discontinuity lines are relevant is done by a refinement oracle, using heuristic methods like the one described above.

Many of the latest research results have dealt with giving the user better control of the level of precision in the modelling of radiosity in the hierarchical radiosity method.

In the most promising paper on the subject, Lischinski1 suggested computing, for each interaction, an upper and lower bound for the point-to-patch form-factor between the points of the receiving patch and the emitting patch, namely Fmax and Fmin, as well as an upper and lower bound for the radiosity of the emitting patch, using information already available in the hierarchy. We then know that the radiosity on the receiving patch is between Fmin Bmin and Fmax Bmax.

Hence, the uncertainty on the radiosity of the receiving patch, due to this particular interaction, is:

\delta B_{receiver} = F_{max} B_{max} - F_{min} B_{min}

The inaccuracy on the energy of the receiving patch, due to this particular interaction, is:

\delta E_{receiver} = A_{receiver} (F_{max} B_{max} - F_{min} B_{min})

We can then decide to refine all interactions where this imprecision on the transported energy is above a given threshold. The most difficult part of this algorithm is finding reliable values for the bounds on the form-factor. Lischinski1 suggested computing exact values for the point-to-area form factor at different sampling points on the receiver, and using the maximum and minimum values at these sampling points as the upper and lower bounds. Although this algorithm does not give totally reliable bounds, it does provide a close approximation, and is quite easy to implement on top of an existing hierarchical radiosity implementation.

In the following sections we show that it is possible to compute reliable upper and lower bounds for the point-to-area form factor. These bounds can then be used in the preceding algorithm, allowing the refinement of all interactions where the inaccuracy on the transported energy is above the threshold.


[Figure 2: Concavity for univariate functions. A concave function (f''(x) < 0) lies below its tangent; a convex function (f''(x) > 0) lies above its tangent; at an inflection point (f''(x) = 0) the function crosses its tangent.]

[Figure 3: A function that remains concave across an interval [Xmin, Xmax] lies above its secant, and below all its tangents on this interval.]

[Figure 4: A point where the function is concave: the function lies below the tangent plane.]

3. The Concavity Conjectures

3.1. Definition of Concavity

Univariate functions are said to be concave at a point when they lie entirely below their tangent at that point; conversely, they are said to be convex when they lie above their tangent. When the function crosses its tangent, the point is said to be an inflection point (see figure 2). Classically, the concavity of the function is linked to the sign of its second derivative: if the second derivative is positive, then the function is convex. If it is negative, then the function is concave. It is only when the second derivative changes sign that we have an inflection point.

Concavity is often used to find upper and lower bounds for functions; if a function is concave on an interval, then it is below all its tangents on this interval, and above all its secants (see figure 3). Since concavity allows bounding by affine functions (like tangents) instead of constants, it generally provides bounds that are closer to each other, and hence a "better" range.

This notion of concavity extends naturally to bivariate functions, such as radiosity defined over a surface. A bivariate function is said to be concave at a point when it lies below its tangent plane (see figure 4), convex when it lies above its tangent plane, and indefinite when the function crosses the tangent plane (see figure 5). As with univariate functions, concavity can be used to find upper and lower bounds: if a function is concave over a triangular area, then on this area it lies below all its tangent planes, and above the secant plane defined by the three corners of the triangle.

[Figure 5: A point where the concavity is indefinite: the function crosses its tangent plane.]

A univariate function usually crosses its tangent at an isolated point, the inflection point. By contrast, the set of points where a bivariate function crosses its tangent plane is a whole region.

The second derivative of a bivariate function is a 2 × 2 matrix, called the Hessian matrix. As with univariate functions, the concavity of the function is linked to its second derivative: if the Hessian matrix is definite positive, then the function is convex; if the Hessian matrix is definite negative, then the function is concave; if the Hessian matrix is indefinite, then the function is indefinite. The Hessian can be expressed with respect to the partial derivatives of the function:

H = \begin{pmatrix} \frac{\partial^2 f}{\partial u^2} & \frac{\partial^2 f}{\partial u \partial v} \\ \frac{\partial^2 f}{\partial u \partial v} & \frac{\partial^2 f}{\partial v^2} \end{pmatrix} = \frac{1}{2} \begin{pmatrix} r & s \\ s & t \end{pmatrix}    (4)

The Hessian is definite if rt − s² is positive. It is definite-positive if rt − s² is positive and r is positive, definite-negative if rt − s² is positive and r is negative. If rt − s² is negative or null, the Hessian is indefinite, and the function crosses its tangent plane.

It must be noted that a function is necessarily concave where it has a local maximum, and convex wherever it has a local minimum. This property is true both for uni- and bivariate functions.

3.2. Concavity of the Point-To-Area Form Factor

3.2.1. Background

Let us single out an interaction between an emitting patch and a receiving patch. We seek an upper and a lower bound for the point-to-area form-factor across the receiver. These upper and lower bounds can then be used by a refinement oracle, as introduced by Lischinski1.

Using the algorithm described in appendix B, we have access to the form-factor and to its derivatives at any point of the receiver. However, these values are only valid at this specific point. Since we seek a result valid across the whole receiver, we must exhibit a property of the point-to-area form-factor that is valid across the receiver.

A similar approach was used by Drettakis2, 3. In the case of a finite convex emitter, with constant radiosity, and of an infinite receiver, Drettakis made the following two conjectures:

Conjecture U1 Radiosity on the receiver has only one maximum.

Conjecture U2 Radiosity on any line on the receiver has only one maximum.

These two conjectures are referred to below as the unimodality conjectures.

3.2.2. Concavity Conjectures

Like Drettakis, we consider a finite convex emitter, with constant radiosity, and we assume the receiver is an infinite plane. We state the following two conjectures on the concavity of the radiosity on the receiver:

Conjecture C1 The Hessian matrix of the radiosity function is indefinite everywhere, except over a bounded area. On this area, the radiosity function is concave. Furthermore, the area is convex.

Conjecture C2 On any line drawn on the receiver, radiosity is concave over a bounded interval, and convex everywhere else.

[Figure 6: The C1 conjecture: the radiosity function has indefinite concavity everywhere, except over a convex area (hatched), where the radiosity function is concave.]

Figure 6 illustrates the C1 conjecture: the radiosity function is indefinite everywhere — and crosses its tangent plane — except over a convex region (hatched).

Figure 7 illustrates the C2 conjecture: the radiosity function defined over a given line is convex across [−∞, a] and across [b, +∞], and concave across [a, b].

Despite their apparent simplicity, these conjectures have so far escaped demonstration. It is obvious that they are true in the simplest case of a point light source sending light in all directions. However, even for the case of a differential emitter area instead of a point light source, it has not been possible so far to prove the concavity conjectures. Appendix A is a detailed study of the differential emitter area.

3.2.3. Relationship between the conjectures

Our concavity conjectures are actually an extension of the unimodality conjectures: that is, C1 implies U1, and C2 implies U2. Note that we also know that U2 implies U1:

U2 ⟹ U1, C2 ⟹ U2, C1 ⟹ U1

U2 ⟹ U1:

Proof Assume U1 is false. Then there exist at least two maxima for the radiosity function, M1 and M2. On the line joining M1 and M2 there are two maxima, which is in contradiction with U2.
[Figure 7: The C2 conjecture: the radiosity function on a line is concave only over a finite interval, [AB]. Top: the radiosity function on the line; bottom: its second derivative, negative only between A and B.]

C2 ⟹ U2:

Proof The function is concave in the neighbourhood of each local maximum. If there were two local maxima on a line, there would be a local minimum between them. In the neighbourhood of this local minimum, the function would have to be convex, which is impossible because of C2.

C1 ⟹ U1:

Proof Assume U1 is false. Then there exist at least two local maxima for the radiosity function. In the neighbourhood of each maximum, the radiosity function is concave. But between the two maxima, there must be a pass-like point, where the concavity is indefinite. This is in contradiction with C1.

No relationship between C1 and C2: An important point is the independence of our two concavity conjectures. C1 does not imply C2, and C2 does not imply C1.

4. Error Control for Unoccluded Interactions

In this section, we describe our algorithm for finding upper and lower bounds for the point-to-area form-factor across the receiver. These values are then used by a refinement oracle like the one introduced by Lischinski1.

4.1. Computing Radiosity Derivatives

Let us call A2 the emitting patch, A1 the receiver and x a point on the receiver (see figure 1). In this case, there is an exact formula for the point-to-area form factor (Siegel and Howell14):

F(x) = -\frac{1}{2\pi}\, \vec{n}_1 \cdot \oint_{\partial A_2} \frac{\vec{r}_{12}}{\|\vec{r}_{12}\|^2} \times d\vec{\ell}_2    (5)

where the integral is over ∂A2, the contour of A2, and d\vec{\ell}_2 is the differential element of this contour.

Using this expression of the point-to-area form-factor, it is possible to compute exact formulae for both its first and second derivatives. These formulae for the derivatives are easily implemented, giving access to exact values for the function and its derivatives (see appendix B, and also Arvo15 or Holzschuch16, 17).

If we compute the point-to-area form-factor and its derivatives simultaneously, we can save computation time by reusing some geometric quantities that appear in several formulae. In this case, the overall cost of computing the derivatives is reasonable: there is an increase of 40% for computing the gradient along with the form-factor, and an increase of 100% for computing both the gradient and the Hessian matrix (see appendix B and Holzschuch16, 17). This cost must be balanced against what it would require to compute approximate values for the derivatives using several form-factor computations: in this case, the cost increase for the gradient would be 100%, and that of the Hessian 600%.

In our refinement phase, we compute the values of the point-to-area form-factor and its derivatives at the vertices of the receiving patch. These values can be reused in the radiosity propagation phase to obtain the radiosity values at the vertices.
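Once F and its gradient are known at the vertices, a concave form factor admits a cheap upper bound from its tangent planes (this is the "concavity algorithm" exploited in section 4.2.5). A sketch for a triangular receiver, with our own naming and NumPy assumed:

```python
import numpy as np

def tangent_plane_upper_bound(verts, F, grads):
    """For a form factor concave over a triangle, each tangent plane
    z = F[i] + grads[i].(p - verts[i]) lies above the function; the apex
    where the three planes meet is an upper bound on the maximum."""
    v = np.asarray(verts, float)      # 3 x 2 vertex positions, in the plane
    g = np.asarray(grads, float)      # 3 x 2 gradients at the vertices
    f = np.asarray(F, float)
    c = f - np.einsum('ij,ij->i', g, v)        # plane i: z = g[i].p + c[i]
    A = np.array([g[0] - g[1], g[0] - g[2]])   # equate plane 0 with planes 1, 2
    b = np.array([c[1] - c[0], c[2] - c[0]])
    p = np.linalg.solve(A, b)                  # apex location
    return float(g[0] @ p + c[0])              # apex height = upper bound
```

For the concave paraboloid f(p) = 1 − x² − y² sampled at (−1, 0), (1, 0), (0, 1), the three tangent planes meet at height 2, safely above the true maximum of 1.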

4.2. Computing Bounds for the Point-to-Area Form-Factor

[Figure 8: Knowledge of the gradient helps find the position of the maximum: all form-factor values in the half-plane opposite ∇F(x) are smaller than F(x); the maximum lies in the half-plane into which ∇F(x) points.]

We show here how our knowledge of the point-to-area form-factor and its derivatives at the vertices of the receiving patch, used jointly with our conjectures, gives us access:

• first, to the location of the maximum and the minimum of the point-to-area form-factor,
• second, to an exact value for the minimum,
• third, to an upper bound for the maximum.

4.2.1. The Minimum is at one of the Vertices

An immediate consequence of the unimodality conjectures (U1 and U2) is that the minimum for the point-to-area form-factor is necessarily at one of the vertices of the receiver:

• If the minimum were inside the receiving patch, A1, then there would exist several local maxima for the point-to-area form-factor on the plane supporting A1 — this is in contradiction with U1. Hence, the minimum across A1 must be on the contour of A1.
• The contour of A1 is made of polygonal edges. If on one of these edges the minimum were inside the edge, then on the line supporting the edge the form-factor would have to have two maxima — this is in contradiction with U2.
• Hence, the minimum can only be at one of the vertices of A1.

4.2.2. An exact value for the minimum

Since we chose to compute the point-to-area form-factor at the vertices of the receiving patch, A1, we do have access to the exact value of the minimum across A1: it is the minimum of our computed values for the point-to-area form-factor at the vertices of A1.

4.2.3. Finding the Position of the Maximum

A consequence of U2 is that given a point x, the point-to-area form-factor F(x) and its gradient ∇F(x) at point x, for all points p such that \vec{xp} · ∇F(x) < 0, we have F(p) < F(x). Otherwise, there would be one local minimum between p and x on the line passing through p and x, and hence two local maxima, which is in contradiction with U2.

Hence the maximum of the point-to-area form-factor can only be in the half-plane defined by \vec{xp} · ∇F(x) ≥ 0 (see figure 8).

This property gives us an algorithm to determine whether the maximum for the point-to-area form-factor across the receiving patch A1 can lie inside the patch, or if it must be at one of the vertices (see figure 9):

• For each vertex, there is a half-plane (defined by the form-factor gradient at this vertex) where the form-factor value can be greater than the value at the vertex.
• The intersection of these half-planes is an area where the point-to-area form-factor value can be greater than the value at all the vertices. The intersection of this area with the receiving patch is either empty or not empty.
• If this intersection with the patch is not empty, then there exists an area inside the patch where the maximum can be.
• If this intersection is empty, then the maximum for the form-factor across the patch must be at one of the vertices.

4.2.4. If the Maximum is at one of the Vertices

If the above algorithm tells us that the maximum can only be at one vertex of the receiving patch, then we know the exact value of the maximum: it is the value of the point-to-area form-factor at that vertex.

4.2.5. If the Maximum is Inside the Receiving Patch

If the above algorithm tells us there exists an area inside the receiving patch A1 where the maximum can be, then we do not have access to the exact value of the maximum of the point-to-area form-factor across A1. The only thing we know at this stage is that the value of the maximum must be greater than the values computed at the vertices of A1.

There are three kinds of algorithms for finding an upper bound for the point-to-area form-factor across A1:


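The vertex test of section 4.2.3 amounts to clipping the patch against one half-plane per vertex; a minimal 2D sketch (our own names, working in the receiver's supporting plane):

```python
def clip(poly, v, g):
    """Sutherland-Hodgman step: keep the part of poly with (p - v).g >= 0."""
    out = []
    n = len(poly)
    for i in range(n):
        a, b = poly[i], poly[(i + 1) % n]
        da = (a[0] - v[0]) * g[0] + (a[1] - v[1]) * g[1]
        db = (b[0] - v[0]) * g[0] + (b[1] - v[1]) * g[1]
        if da >= 0:
            out.append(a)
        if (da < 0) != (db < 0):                 # edge crosses the boundary
            t = da / (da - db)
            out.append((a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])))
    return out

def max_can_be_inside(patch, grads):
    """patch: 2D vertices; grads: form-factor gradient at each vertex.
    Returns True if the intersection of the per-vertex half-planes (where
    F may exceed the vertex values) meets the patch, i.e. the maximum may
    lie inside; False means the maximum is at one of the vertices."""
    region = list(patch)
    for v, g in zip(patch, grads):
        region = clip(region, v, g)
        if not region:
            return False
    return True
```

With all gradients pointing into the patch the candidate region survives every clip; with all gradients pointing outward it empties, so the maximum must sit at a vertex.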
[Figure 9: Using the gradient to locate the maximum inside or outside the receiving patch. Left (vertices x1, x2, x3, x4): the maximum can be inside the polygon. Right: the maximum is at one of the vertices.]

Heuristic Algorithms: Compute another sample value for the
point-to-area form-factor inside patch A1. The position of the sampling
point can be arbitrary or can make use of the information given by the
form-factor gradient.

Concavity Algorithms: If the point-to-area form-factor function on the
receiving patch is concave, we use the tangent planes to find an upper
bound.

Geometric Algorithms: Using geometric tools, build an emitter that
encloses the actual emitter for all the points of the receiving patch,
and for which we can find the value of the maximum. This value is an
upper bound.

Heuristic algorithms include gradient-descent algorithms, as described
by Arvo15 and Drettakis2, 3. Gradient-descent algorithms make use of
the information provided by the gradient to subdivide the receiving
patch until convergence. The gradient can either be approximated
(Drettakis2, 3) or exact (Arvo15).

In our implementation, we use concavity algorithms wherever possible,
and resort to geometric algorithms if the point-to-area form-factor
function is not concave.

4.2.5.1. Concavity Algorithms

According to C1, the zone where the point-to-area form-factor function
is concave is a convex one. As a consequence, if the form-factor
Hessian is definite negative at the vertices of the receiving patch,
then it stays definite negative across the receiving patch.

In this case, the form-factor function lies below all its tangent
planes at the vertices across the receiving patch. We know these
tangent planes since we know the form-factor gradient at the vertices.
Finding an upper bound for the point-to-area form-factor is then
equivalent to computing the intersection of the tangent planes.

This is mainly a linear programming problem (see, for example,
Preparata18); the computational complexity of the problem depends on
the dimension of the problem — which here is always two, since we are
dealing with bivariate functions — and on the number of vertices in the
receiving patch. Usually, in hierarchical radiosity algorithms, we
restrict ourselves to triangular or quadrangular patches. If this is
the case, we can assume the complexity of computing the intersection of
the tangent planes is constant.

4.2.5.2. Geometric Algorithms

If the form-factor Hessian is not definite negative at all the vertices
of the receiving patch, then the point-to-area form-factor function is
not concave across the entire receiving patch. It is therefore not
possible to use concavity algorithms. In this case, we resort to
geometric algorithms: in a plane parallel to the plane of the receiver,
we construct an emitter with the following two properties:

• From all the points of the receiver, it is seen as including the
original emitter.
• It has two axes of symmetry, so that we can find the maximum
form-factor due to the emitter.

The reason for the second item lies in the symmetry principle: if the
emitter and the receiver are left unchanged by a planar symmetry, then
so is the point-to-area form-factor function on the receiver; thus its
maximum can only lie on the intersection of the plane of symmetry and
of the plane of the receiver. If there are two planes that leave the
emitter and the receiver unchanged, then the maximum can only be at
their intersection (see figure 24, in the color section).

To build this emitter:

• select a plane P parallel to the plane of the receiving patch;
• for each vertex Vi of the receiving patch, build the projection pi of
the original emitter according to this vertex on P (see figure 10);
• this projection is totally equivalent to the original emitter for
this particular vertex;


• any convex region enclosing all the pi projections is seen from all
the points of the receiver as enclosing the original emitter; as a
consequence, the point-to-area form-factor due to this convex region is
greater than the point-to-area form-factor due to the actual emitter;
• building a convex region enclosing the pi is a standard geometry
problem (see, for example, Foley et al.19, Kay20 or Toth21).
Constraining this convex region to have two axes of symmetry can either
be a consequence of the bounding object used, like ellipses and
rectangles, or be a property we add afterward. Since our problem is a
two-dimensional geometry problem — although we have a set of
three-dimensional data points — we start by projecting our pi onto one
of the coordinate planes (x, y), (z, x) or (y, z). Once we have built
the result in this coordinate plane, we project it back onto the
emitter plane.
• Several algorithms can be used, either giving a faster result but a
greater enclosing emitter, and hence a greater upper bound, or
requiring more time but giving an enclosing emitter that is closer to
the pi, and hence a smaller upper bound:
– Build the convex hull of the pi, then build a region with two axes of
symmetry enclosing the convex hull. This gives the smallest enclosing
emitter, but requires more computation time.
– Build a bounding rectangle enclosing the pi inside the emitter plane,
as in Toth21. This is one of the fastest possible algorithms.
Furthermore, it naturally gives an enclosing emitter with two axes of
symmetry, so there is no construction time involved for building the
symmetries.
– Build a bounding ellipse enclosing the pi inside the emitter plane.
This algorithm is slower, but it also gives an enclosing emitter with
two axes of symmetry, so there is no construction time involved for
building the symmetries.
– A bounding rectangle using the (x, y, z) axes can give an enclosing
emitter much bigger than the pi, thus inducing a greater upper bound. A
simple improvement is to use slabs, as suggested in Kay20. In this
case, in order to build an object with two axes of symmetry, we have to
restrict ourselves to two sets of orthogonal slabs. This algorithm
requires more computational time than the previous algorithm, but can
give a significantly smaller enclosing emitter.
– If ne is the number of vertices of the emitter, and nr the number of
vertices of the receiver, the total number of vertices for all the pi
is ne nr. In this case, the complexity of the convex hull algorithm is
O(ne nr log ne nr), and the complexity of the other three algorithms is
O(ne nr).

Figure 11 gives an example of the construction of an enclosing emitter.

Figure 10: The projection of the emitter on the plane P from a given
vertex.

Figure 11: Building an enclosing emitter in order to find an upper
bound (view from above).

4.3. Implementation and Testing

4.3.1. Refinement Criterion

Once we have access, for each iteration, to the minimum and maximum
form-factor, it is possible to implement a refinement criterion based
on their difference. Following the algorithm suggested by Lischinski1,
we refine every interaction such that:

Areceiver (Bmax Fmax − Bmin Fmin) > ε

This means that we refine an interaction whenever the uncertainty on
the incoming energy of the receiving patch is above the threshold ε.

4.3.2. Resulting Mesh Simplification

In regions where the Hessian matrix of the form-factor is definite
negative, we know that the form-factor can be bounded between the
tangent planes and the secant planes. We can use these bounding planes
to find tighter upper and lower bounds for the form-factor.

The form-factor for all the points on the receiving patch


lies below all the tangent planes for the point-to-area form-factor,
and above the secant plane. Therefore, we can say that our uncertainty
on the point-to-area form-factor on the receiver is equal to the
maximum of the distance between the secant plane and these tangent
planes.

Computing this distance is again a linear programming problem (see, for
example, Preparata18). The complexity depends on the number of vertices
of the receiver, nr, which is usually three or four. Let us denote by
EFF this uncertainty on the form-factor. EFF can be used in our
expressions as a replacement for Fmax − Fmin. Using the fact that:

(Bmax Fmax − Bmin Fmin) = Bmax (Fmax − Fmin) + Fmin (Bmax − Bmin)

we decide to refine a given interaction if

Areceiver (Bmax EFF + Fmin (Bmax − Bmin)) > ε

It must be noted that this new bounding of the form-factor does not
introduce any uncertainty. We are still bounding the form-factor by
fully reliable functions. However, since these functions are affine
instead of constant, they provide much tighter bounds, and we can
expect a simpler mesh in the areas where the point-to-area form-factor
is concave.

Figure 25 (in the color section) shows the result of our refinement
criterion on a simple box, with only direct illumination. Notice that
the mesh produced is coarser in some areas with respect to the
immediately neighbouring areas (the disc-shaped area on the floor, and
the drop-shaped areas on the walls). These are the places where the
Hessian is definite negative.

This refinement criterion extends, in some ways, the mesh
simplification found in previous work (Holzschuch9). The shape of the
mesh produced is quite similar between our new algorithm and the
algorithm in Holzschuch9. However, our new refinement criterion, while
keeping low memory costs, also gives fully reliable upper and lower
bounds on the radiosity of each patch.

4.3.3. Dealing with Singularities

4.3.4. Relative Complexity of the Algorithm

Our algorithm requires the computation of the first two derivatives of
the point-to-area form-factor at the vertices of the receiver. This
implies a 100% increase in the computation time for each vertex (see
appendix B). That is to say, computing the point-to-area form-factor
and its derivatives costs twice what it would cost to compute the
point-to-area form-factor alone.

Since vertices are shared by several patches, this overhead cost is
shared by several interactions. On average, we are only computing one
point-to-area form-factor and its derivatives for each patch. Thus, the
cost of our algorithm is approximately the cost of computing two
point-to-area form-factors for each patch, plus the time needed for the
exploitation of the derivatives for computing upper and lower bounds.

Existing heuristic refinement algorithms (see Lischinski1) compute one
form-factor sample for each of the receiver vertices, plus one sample
at the center of the receiving patch. If we assume that the form-factor
values at the vertices are shared with the neighbouring patches, we are
computing an average of two point-to-area form-factors for each
receiver.

Thus, the cost of the heuristic algorithm and the cost of our algorithm
are roughly similar. The main overhead of our algorithm when compared
with the heuristic algorithm is the time needed for the actual
computations for finding the position of the maximum and for finding an
upper bound for the maximum, when necessary.

Hence, the relative costs of our refinement criterion are in fact quite
small and can generally be regarded as acceptable, especially with
respect to the complete control it gives on the error carried by each
interaction.

Also, our algorithm allows for a significant mesh simplification (see
figure 25, in the color section) which may, depending on the scene
considered, induce a smaller computation time for the exhaustive
refinement criterion when compared to a heuristic refinement criterion.

5. Error Control for Partially Occluded Interactions

The above algorithm for finding upper and lower bounds only works in
the case of unoccluded interactions, and with a convex emitter. This
algorithm relies on the concavity and unimodality conjectures, which do
not hold if there are occluders between the emitter and the receiver.

However, it is possible to construct, using geometrical tools, a
minimal and a maximal emitter with the following qualities:

• both are convex;
• any point of the minimal emitter is fully visible from the receiver;
• the maximal emitter contains all the points of the emitter that are
visible from at least one point of the receiver.

Then, at any given point on the receiver,

• the form-factor due to the minimal emitter is less than or equal to
the actual form-factor,
• and the form-factor due to the maximal emitter is greater than or
equal to the actual form-factor.

We apply our previous algorithm to these emitters, and find a lower
bound using the minimal emitter, and an upper bound using the maximal
emitter.

Figure 26 (in the color section) shows an example of minimal and
maximal emitters for a simple configuration with only one occluder: the
small red square on the ground is the


Figure 12: A single interaction with occluders.

Figure 13: Computing the “umbra” and “penumbra” volumes using the
receiver as a light source.

Figure 14: The maximal emitter can be any convex region including the
complement of the “umbra” region.

Figure 15: Several possible candidates for the minimal emitter.
receiver; the black square with a white border is the occluder, and the
bright red area is the minimal emitter — the part of the emitter that
is visible from all the points of the receiver. The dark red area is
the maximal emitter. The blue line is the contour of the emitter as it
is seen from one of the points of the receiver.

5.1. Computing the minimal and maximal emitter

Our definition of minimal and maximal emitter bears a strong
resemblance to the definition of umbra and penumbra, except that the
roles of the emitter and the receiver are reversed.

A similar algorithm has been used by Teller to compute the antipenumbra
of an area light source22, and to solve the visibility problem in a
hierarchical radiosity algorithm23, and by Drettakis13. Drettakis13
used a specific data structure, the backprojection, which gives to the
program the structure of the projection of the occluders on the emitter
plane, from any point on the receiver.

Algorithms used for computing umbra and penumbra can be quickly adapted
in order to compute the minimal and maximal emitter for each receiver.
Let us consider a single interaction, with one emitter, one receiver,
and occluders (see figure 12). We compute the umbra and the penumbra
volumes using the receiver as a light source (see figure 13). The
intersection of these volumes with the emitter plane is a close
indication of where the minimal and maximal emitter are.

5.1.1. Computing the maximal emitter using the “umbra” volume

The intersection of the emitter with the “umbra” volume is the set of
points on the emitter that are totally invisible from the receiver.

The complement of this intersection is the set of points on the emitter
that are visible from at least one point on the receiver.

Since our criterion only works for convex emitters, we have to build a
convex emitter that includes this complement. Our basic rule is that we
must not under-estimate the point-to-area form-factor, only possibly
over-estimate it. Hence, the maximal emitter must be any convex region
including the previously computed complement — for example the convex
hull of the complement, or the bounding box of the complement (see
figure 14).

5.1.2. Computing the minimal emitter using the “penumbra” volume

Similarly, the intersection of the emitter with the “penumbra” volume
is the set of points on the emitter that are at least partially hidden
from the receiver.

The complement of this intersection is the set of points on the emitter
that are visible from all the points of the receiver.

Any convex region that is included in this complement is a suitable
candidate for the minimal emitter (see figure 15). Depending on the
position of the occluders, it is possible


to have several candidates for the minimal emitter. Ideally, we would
like to pick the candidate that gives the largest estimate for the
minimum, since this would give tighter bounds, and hence reduce the
number of unnecessary refinements. However, it is impossible to find
this without computing the point-to-area form-factor for all the
candidates, which would prove very time-consuming. In our
implementation, we choose the candidate with the largest area, since it
is likely to induce a larger form-factor.

5.2. Implementation and testing

We have implemented our algorithm for finding upper and lower bounds
for the point-to-area form-factor using the maximal and minimal
emitter.

Figure 27 (in the color section) shows the result of our refinement
criterion on a simple scene, with a single occluder. Notice that the
algorithm detects the shadow boundaries and refines properly in order
to model them. Outside of the shadow, the mesh produced is identical to
the mesh produced without occluders.

5.3. Complexity of the Algorithm and Possible Improvements

Our algorithm relies on the computation of the umbra and penumbra
volumes for all the interactions. This computation can be quite costly
if it is implemented in a naive way.

Previous work by Chin24 has shown that the use of a BSP-tree can
greatly improve the computation of umbra and penumbra volumes. Teller22
showed that by extending the data structure used to store the
interaction between patches to also store the possible occluders for
this interaction, the complexity of visibility computations could be
greatly reduced. Both these improvements work with our algorithm.

Our algorithm can also be used in combination with standard
discontinuity meshing, as described in Lischinski11. A preliminary
light-source discontinuity meshing will reduce the complexity of the
minimal and maximal emitter computations by providing occlusion
information and reducing the number of patches where we have to
compute these emitters.

The backprojection algorithm described by Drettakis13, 3 gives, for
each patch created during the discontinuity meshing step, the geometric
structure of the emitter as seen from this patch. Implementing our
algorithm on top of a backprojection algorithm should be a
straightforward post-processing step.

It has been shown (Lischinski11 and Drettakis13, 3) that the boundary
of the umbra volume can include a quadric surface, and hence can be
quite complex to model. However, our algorithm does not require a
complete computation of the umbra and penumbra volumes for each
interaction, but only the computation of a surface included in the
umbra volume, and of a surface enclosing the penumbra volume. Two such
surfaces can be computed in a straightforward way:

• For each occluder:
– For each receiver vertex, compute the projection of the occluder onto
the emitter supporting plane;
– The intersection of these projections is the umbra volume for this
particular receiver;
– The convex hull of these projections is the penumbra volume for this
receiver.
• The union of the penumbra volumes for all occluders is the penumbra
volume for the entire interaction.
• The union of the umbra volumes for all occluders is not equal to the
umbra volume for the entire interaction. However, it is included in the
actual umbra volume (see Lischinski11). Hence, we can use it for
building the maximal emitter.

The computation of the projection of the occluders onto the emitter
supporting plane, and the computation of the union of these
projections, can be reused for computing the exact value of the
point-to-area form-factor in the radiosity propagation phase.

The only extra cost of our refinement criterion is then the computation
of the minimal and maximal emitter knowing the projection of all the
occluders on the emitter plane. This is a two-dimensional problem:
computing a convex region that contains the complement of the umbra
volume, and another convex region that is included in the complement of
the penumbra volume. Note that we do not have to explicitly construct
the umbra and the penumbra volumes, only the two convex regions. We can
use several methods for computing these convex regions, as described in
section 4.2.5.2. The cost of our algorithm is the cost of finding two
convex regions enclosing nr n polygons, where n is the number of
occluders, and nr is the number of vertices of the receiver.

The heuristic algorithm described by Lischinski1 uses the same
computation of the exact values of the point-to-area form-factor at the
vertices of the receiver, which will be reused in the radiosity
propagation phase, plus the computation of the point-to-area
form-factor at the center of the receiving patch, which implies the
projection of the occluders on the emitter supporting plane and the
computation of the union of these projections. Hence, the cost of the
heuristic algorithm is n projections and the union of n two-dimensional
polygons.

6. Conclusions and Future Directions

We have introduced a new and reliable way of computing the maximum and
the minimum of the point form-factor on any interaction. These bounds
on the form-factor allow a control of the precision of the hierarchical
radiosity algorithm,


precision that can be required for certain applications of the
algorithm, such as architectural planning.

These bounds have been integrated in a new refinement criterion for
hierarchical radiosity. We have also presented another refinement
criterion that, while maintaining control on the upper and lower bounds
of the energy transported, allows a coarser mesh to be constructed in
some places, thus reducing memory and computation costs.

This algorithm is a significant step in error control for global
illumination methods. Although it has been devised and implemented in a
hierarchical radiosity framework, nothing in the algorithm prevents the
refinement criterion from being implemented with progressive refinement
radiosity, as described by Cohen25.

Knowledge of the error produced in all the parts of the algorithm
allows global illumination programs to concentrate their work on parts
of the scene where the error is still large, and to skip parts where it
can be neglected. Thus, our algorithm can be expected to accelerate
global illumination computations by reducing the amount of unnecessary
refinement.

Our algorithm relies on several conjectures: the unimodality
conjectures (U1 and U2) and the concavity conjecture (C1), as well as
on a knowledge of the radiosity derivatives. Table 1 recalls, for each
part of the algorithm, which conjectures and which derivatives are
being used.

The concavity and unimodality conjectures assume that radiosity on the
emitter is constant, that the receiver is diffuse and that there is
full visibility. An extension of our error-control algorithm to cases
where radiosity on the emitter is not constant, or to reflectance
functions that are not constant, would first require a careful study of
the extent to which our concavity and unimodality conjectures still
hold. For example, it is clear that they cannot hold for an arbitrary
distribution of radiosity on the emitter, but only for specific cases.
These specific cases, once identified, can be used as a functional
basis for radiosity.

We have dealt with the partial visibility problem by computing maximal
and minimal emitters, thereby reducing the problem to two
full-visibility problems. However, it is known that it is possible to
compute the radiosity gradient in the presence of occluders (see
Arvo15), and it seems possible to compute the radiosity Hessian in the
presence of occluders as well (see Holzschuch17). In this case, it
would be possible to extend our refinement criterion to some partially
visible interactions without having to compute the maximal and minimal
emitters. Once again, this can be done only in specific configurations
where the concavity or unimodality conjectures still hold. This is not
the case for generic occluders (see figure 28, in the color section),
but only for certain specific, simple occluders (see figure 29, in the
color section).

Although the algorithm described in this paper makes use of the U1, U2
and C1 conjectures, and of the form-factor gradient and Hessian,
table 1 shows that it is possible to build a simpler algorithm to find
upper and lower bounds by using only U1, U2 and the form-factor
gradient.

This algorithm would be very similar to the gradient-descent algorithms
described by Arvo15 and Drettakis2, 13. The main difference would be
the use of geometric tools, as described in section 4.2.5.2, to find an
upper bound. These geometric tools provide a fully reliable upper bound
on the receiving patch.

This simpler algorithm would not allow mesh simplification as described
in section 4.3.2; also, since this simpler algorithm would only use
geometric methods to find upper bounds, it can be expected to give
greater upper bounds, and hence induce more refinement than our current
algorithm. On the other hand, this algorithm would not require the
computation of the form-factor Hessian, thus saving computation time,
and would probably be easier to extend to partial visibility cases,
where C1 may not hold.

Future work will include an implementation of this simpler algorithm,
and timing and memory cost comparisons between our full algorithm, the
simpler algorithm and the heuristic algorithm, as well as error
measurements.

7. Acknowledgements

The first author has been funded by an AMN grant from Université Joseph
Fourier from 1994 to 1996.

References

1. D. Lischinski, B. Smits, and D. P. Greenberg, “Bounds and Error
Estimates for Radiosity”, in Computer Graphics Proceedings, Annual
Conference Series, 1994 (ACM SIGGRAPH ’94 Proceedings), pp. 67–74,
(1994).

2. G. Drettakis and E. Fiume, “Accurate and Consistent Reconstruction
of Illumination Functions Using Structured Sampling”, in Computer
Graphics Forum (Eurographics ’93), vol. 12, (Barcelona, Spain),
pp. C273–C284, (September 1993).

3. G. Drettakis, “Structured Sampling and Reconstruction of
Illumination for Image Synthesis”, CSRI Technical Report 293,
Department of Computer Science, University of Toronto, Toronto,
Ontario, (January 1994).

4. C. M. Goral, K. E. Torrance, D. P. Greenberg, and B. Battaile,
“Modelling the Interaction of Light Between Diffuse Surfaces”, in
Computer Graphics (ACM SIGGRAPH ’84 Proceedings), vol. 18,
pp. 212–222, (July 1984).

5. P. Schröder and P. Hanrahan, “On the Form Factor Between Two
Polygons”, in Computer Graphics Proceedings, Annual Conference Series,
1993 (ACM SIGGRAPH ’93 Proceedings), pp. 163–164, (1993).

Parts of the algorithm                              Conjectures required    Derivatives required
Position and value of the minimum                   U1 and U2               None
Position of the maximum                             U2                      Gradient
Using tangents to find an upper bound               C1                      Hessian
Using geometric algorithms to find an upper bound   None                    None
Simplification of the mesh                          C1                      Hessian

Table 1: Dependencies for the different parts of the algorithm
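The dependencies of Table 1 can be read as a dispatch rule between the two upper-bound strategies of section 4.2.5: tangent planes apply only where C1 holds, which requires the Hessian to be definite negative at every receiver vertex; otherwise the geometric (enclosing-emitter) method is used. A minimal sketch, with function names of our own choosing (the 2×2 negative-definiteness test uses Sylvester's criterion, which the paper does not spell out):

```python
def is_negative_definite(h):
    """A 2x2 symmetric Hessian [[h11, h12], [h12, h22]] is negative
    definite iff h11 < 0 and det(H) > 0 (Sylvester's criterion)."""
    h11, h12 = h[0]
    _, h22 = h[1]
    return h11 < 0 and h11 * h22 - h12 * h12 > 0

def upper_bound_strategy(vertex_hessians):
    """Choose the upper-bounding method of section 4.2.5 for a patch,
    given the form-factor Hessian at each receiver vertex."""
    if all(is_negative_definite(h) for h in vertex_hessians):
        # C1 holds: tangent-plane bound, also enabling mesh simplification.
        return "concavity"
    # No conjecture needed: enclosing-emitter (geometric) bound.
    return "geometric"

# Concave configuration at every vertex of a quadrangular patch.
concave = [[[-2.0, 0.0], [0.0, -1.0]]] * 4
# One indefinite Hessian forces the geometric fallback.
mixed = concave[:3] + [[[1.0, 0.0], [0.0, -1.0]]]
```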

6. M. Cohen and D. P. Greenberg, “The Hemi-Cube: A Radiosity Solution
for Complex Environments”, in Computer Graphics (ACM SIGGRAPH ’85
Proceedings), vol. 19, pp. 31–40, (August 1985).

7. P. Hanrahan, D. Salzman, and L. Aupperle, “A Rapid Hierarchical
Radiosity Algorithm”, in Computer Graphics (ACM SIGGRAPH ’91
Proceedings), vol. 25, pp. 197–206, (July 1991).

8. S. J. Gortler, P. Schröder, M. F. Cohen, and P. Hanrahan, “Wavelet
Radiosity”, in Computer Graphics Proceedings, Annual Conference Series,
1993 (ACM SIGGRAPH ’93 Proceedings), pp. 221–230, (1993).

9. N. Holzschuch, F. Sillion, and G. Drettakis, “An Efficient
Progressive Refinement Strategy for Hierarchical Radiosity”, in Fifth
Eurographics Workshop on Rendering, (Darmstadt, Germany), pp. 343–357,
(June 1994).

10. P. Heckbert, “Discontinuity Meshing for Radiosity”, in Third
Eurographics Workshop on Rendering, (Bristol, UK), pp. 203–226, (May
1992).

11. D. Lischinski, F. Tampieri, and D. P. Greenberg, “Discontinuity
Meshing for Accurate Radiosity”, IEEE Computer Graphics and
Applications, 12(6), pp. 25–39 (1992).

12. D. Lischinski, F. Tampieri, and D. P. Greenberg, “Combining
Hierarchical Radiosity and Discontinuity Meshing”, in Computer Graphics
Proceedings, Annual Conference Series, 1993 (ACM SIGGRAPH ’93
Proceedings), pp. 199–208, (1993).

13. G. Drettakis and E. Fiume, “A Fast Shadow Algorithm for Area Light
Sources Using Backprojection”, in Computer Graphics Proceedings, Annual
Conference Series, 1994 (ACM SIGGRAPH ’94 Proceedings), pp. 223–230,
(1994).

14. R. Siegel and J. R. Howell, Thermal Radiation Heat Transfer, 3rd
Edition. New York, NY: Hemisphere Publishing Corporation, (1992).

15. J. Arvo, “The Irradiance Jacobian for Partially Occluded Polyhedral
Sources”, in Computer Graphics Proceedings, Annual Conference Series,
1994 (ACM SIGGRAPH ’94 Proceedings), pp. 343–350, (1994).

16. N. Holzschuch and F. Sillion, “Accurate Computation of the
Radiosity Gradient for Constant and Linear Emitters”, in Rendering
Techniques ’95 (Proceedings of the Sixth Eurographics Workshop on
Rendering) (P. M. Hanrahan and W. Purgathofer, eds.), (New York, NY),
pp. 186–195, Springer-Verlag, (1995).

17. N. Holzschuch, Le Contrôle de l’Erreur dans la Méthode de Radiosité
Hiérarchique (Error Control in Hierarchical Radiosity). Ph.D. thesis,
Équipe iMAGIS/IMAG, Université Joseph Fourier, Grenoble, France, (March
5th, 1996).

18. F. P. Preparata and M. I. Shamos, Computational Geometry – An
Introduction. New York: Springer Verlag, (1985).

19. J. D. Foley, A. van Dam, S. K. Feiner, and J. F. Hughes, Computer
Graphics, Principles and Practice, Second Edition. Reading,
Massachusetts: Addison-Wesley, (1990).

20. T. L. Kay and J. T. Kajiya, “Ray tracing complex scenes”, Computer
Graphics, 20(4), pp. 269–276 (1986). Proceedings of SIGGRAPH ’86 in
Dallas (USA).

21. D. L. Toth, “On ray-tracing parametric surfaces”, Computer
Graphics, 19(3), pp. 171–179 (1985). Proceedings of SIGGRAPH ’85 in San
Francisco (USA).

22. S. J. Teller, “Computing the antipenumbra of an area light source”,
in Computer Graphics (ACM SIGGRAPH ’92 Proceedings), vol. 26,
pp. 139–148, (July 1992).

23. S. Teller and P. Hanrahan, “Global Visibility Algorithms for
Illumination Computations”, in Computer Graphics Proceedings, Annual
Conference Series,


Figure 16: A differential area emitter and an infinite receiving plane.

Figure 17: An example of the point-to-area form-factor function
(θ = π/6).

Figure 18: The areas where the point-to-area form-factor function is
concave, for θ = 0, 0.2, π/6, π/4, π/3 and π/2.

1993 (ACM SIGGRAPH ’93 Proceedings), pp. 239–246, (1993).

24. N. Chin and S. Feiner, “Fast object-precision shadow generation for
areal light sources using BSP trees”, Computer Graphics (1992 Symposium
on Interactive 3D Graphics), 25(2), pp. 21–30 (1992).

25. M. Cohen, S. E. Chen, J. R. Wallace, and D. P. Greenberg, “A
Progressive Refinement Approach to Fast Radiosity Image Generation”, in
Computer Graphics (ACM SIGGRAPH ’88 Proceedings), vol. 22, pp. 75–84,
(August 1988).

Appendix A: Concavity conjectures: case study of a differential area
emitter

Let us consider the case of an infinite receiving plane and a single
differential area for the emitter. In this case, due to the symmetries
shared by the emitter and the receiver, there is only one parameter:
the angle, called θ, between the normal of the emitter and a line
parallel to the receiver (see figure 16).

To express the position of a point on the receiver, we choose a set of
axes related to the emitter: the first axis shares the direction of the
projection of the normal of the emitter on the receiver, and the second
axis is orthogonal to the first. The origin of our coordinate system is
the projection of the emitting point. Using this set of coordinates, we
have a simple expression for the point-to-area form-factor at any point
M(u, v) on the receiver (see figure 17 for the aspect of the surface):

F(u, v) = (dA/π) · (u cos(θ) + sin(θ)) / (u² + v² + 1)²

This value is only for u cos(θ) + sin(θ) > 0. If u cos(θ) +

the expression S(u, v, θ) is positive, where S(u, v, θ) is:

S = −3u⁴ − 8u³ tan θ − 5u² tan²θ − 4u²v² + 3u² + 4u tan θ
    − 8uv² tan θ − 5v² tan²θ − v² − v⁴ + tan²θ

Although it is impossible to find an explicit solution of the equation
S(u, v, θ) = 0, it is possible to plot these solutions for different
values of θ. Figure 18 shows the contour of the area where S(u, v, θ)
is positive for different values of θ. Outside these areas, S(u, v, θ)
is negative, and hence the Hessian matrix is indefinite. Inside these
areas, S is positive, and the form-factor is concave.

An interesting point is the shape of the zones where the point-to-area
form-factor is concave. When θ = π/2, it is of course a disc, due to
the symmetries in the scene. When θ = 0, it is a drop-like shape that
tapers to a point at (0, 0). For intermediate values of θ, the zone has
an intermediate shape
sin(θ) ≤ 0, then of course F (u, v) = 0. between the drop and the disc, but this shape always appears
to be convex.

The C1 concavity conjecture


The C2 concavity conjecture
In this simple case, it is possible to explicitly compute the
derivatives of the point-to-area form-factor. An explicit com- If we now focus on the radiosity on a specific line v = au +
putation of the Hessian shows that it is definite if and only if b on the receiving plane, we have, for the form-factor as a

c The Eurographics Association 1998

157
16 N. Holzschuch and F. X. Sillion / An exhaustive error-bounding algorithm for hierarchical radiosity

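The sign discussion above can be probed numerically. The Python sketch below is illustrative (helper names are ours, not the paper's code); the constant dA/π is dropped since it does not change any sign. It evaluates F and S, estimates the Hessian of F by central differences, and checks that S > 0 coincides with a concave (negative definite) Hessian at a point inside the drop-shaped zone, while S < 0 coincides with an indefinite Hessian outside it.

```python
import math

def F(u, v, theta):
    """Point-to-area form-factor of the differential emitter (dA/pi dropped)."""
    num = u * math.cos(theta) + math.sin(theta)
    return max(num, 0.0) / (u * u + v * v + 1.0) ** 2

def S(u, v, theta):
    """Sign test of the C1 conjecture: S > 0 where the Hessian is definite."""
    t = math.tan(theta)
    return (-3 * u**4 - 8 * t * u**3 - 5 * t * t * u * u - 4 * u * u * v * v
            + 3 * u * u + 4 * u * t - 8 * t * u * v * v
            - 5 * t * t * v * v - v * v - v**4 + t * t)

def hessian(u, v, theta, h=1e-4):
    """Central-difference estimate (Fuu, Fuv, Fvv) of the Hessian of F."""
    f0 = F(u, v, theta)
    fuu = (F(u + h, v, theta) - 2 * f0 + F(u - h, v, theta)) / (h * h)
    fvv = (F(u, v + h, theta) - 2 * f0 + F(u, v - h, theta)) / (h * h)
    fuv = (F(u + h, v + h, theta) - F(u + h, v - h, theta)
           - F(u - h, v + h, theta) + F(u - h, v - h, theta)) / (4 * h * h)
    return fuu, fuv, fvv

theta = math.pi / 6
# At the origin (inside the concave zone): S > 0, Hessian negative definite.
fuu, fuv, fvv = hessian(0.0, 0.0, theta)
assert S(0.0, 0.0, theta) > 0 and fuu < 0 and fuu * fvv - fuv * fuv > 0
# Far along the u axis (outside the zone): S < 0, Hessian indefinite.
fuu, fuv, fvv = hessian(3.0, 0.0, theta)
assert S(3.0, 0.0, theta) < 0 and fuu * fvv - fuv * fuv < 0
```

At the origin, S(0, 0, θ) = tan²θ > 0 for any θ in (0, π/2), consistent with the tip of the drop lying at (0, 0).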
The C2 concavity conjecture

If we now focus on the radiosity on a specific line v = au + b on the receiving plane, we have, for the form-factor as a function of u:

    f(u) = (dA/π) · (u cos θ + sin θ) / (u² + (au + b)² + 1)²

The form-factor is equal to f(u) if u cos θ + sin θ > 0. If u cos θ + sin θ ≤ 0, then the form-factor is null.

It must be noted that f(u) goes to zero when u goes to ±∞, and that f(u) is equal to zero only for u = u0 = −tan θ.

It is possible to compute the first and the second derivative of f(u). The first derivative, f′(u), is of the sign of a second-degree polynomial in u, and the second derivative, f′′(u), is of the sign of a third-degree polynomial in u. As a consequence, f′(u) can change sign at most twice, and f′′(u) at most three times.

Since the function f(u) goes to zero when u goes to ±∞, it must have one maximum between u0 and +∞, and one minimum between u0 and −∞. As a consequence, f′(u) must change sign exactly twice. Let us call u1 and u2 the points where the first derivative changes sign (u1 < u0 < u2).

f′(u) also goes to zero when u goes to ±∞. As a consequence, it must have one minimum between u2 and +∞, and another between −∞ and u1, and it must have one maximum between u1 and u2. So the second derivative changes sign exactly three times. One of the points where the second derivative changes sign is smaller than u1, which is smaller than u0, and one of them is greater than u2, which is greater than u0.

Then the second derivative changes sign at least once and at most twice on [u0, +∞). When u goes to +∞, f is convex, and f′′ is positive. So we just proved that f′′ can be negative only over a unique bounded segment on [u0, +∞). The form-factor on the line is equal to f(u) for u > u0, and null everywhere else. So the form-factor on a line is concave only over a unique bounded segment. This proves the C2 conjecture for a differential area emitter.

Figure 19 shows an example of such an f(u) function, along with its first and second derivatives. It can be noted that this function is concave over a single segment, and convex everywhere else.

Appendix B: Effective computation of the form-factor derivatives

In this section, we show how it is possible to compute the derivatives of the point-to-area form-factor with little additional computation expense.

In particular, it is shown that the computation of the exact value of the form-factor derivatives is always cheaper than the computation of an approximate value using several form-factor samples. For example, the cost of computing the form-factor gradient is 30 %, while computing an approximate value of the gradient would require two form-factor samples, thus increasing computation time by 100 %.

The Point-to-Area Form-Factor

Let us recall that the point-to-area form-factor from a point x on a patch A1 to a patch A2 (see figure 1) can be expressed as a contour integral:

    F(x) = −n1 · (1/2π) ∮∂A2 (r12 × dℓ2) / ‖r12‖²

For the explicit derivation of this contour integral from equation 2, see Siegel and Howell14.

In the case where the emitter is a polygon, this expression simplifies to a finite sum:

    F(x) = (1/2π) n1 · Σi γi    (6)

where γi is the vector of norm γi, and of direction the cross-product ri × ri+1 (see figure 20).

An example pseudo-code for computing the form-factor using equation 6 can be found in figure 21. This pseudo-code makes use of the standard 3D operations like addition, cross-product and dot product.

Figure 20: Notation when the emitter is a polygon: x is the receiving point on A1 with normal n1; Ei are the vertices of the emitter A2; ri = Ei − x, ei = Ei+1 − Ei, and γi is the angle between ri and ri+1.

Figure 21: Pseudo-code for computing the form-factor.

    F = 0
    foreach edge [Ei, Ei+1]
        ri        = Ei − x
        ri+1      = Ei+1 − x
        crossprod = ri × ri+1
        gamma     = arccos( (ri · ri+1) / (‖ri‖ ‖ri+1‖) )
        I1        = gamma / ‖crossprod‖
        mixt      = n1 · crossprod
        F        −= I1 · mixt
    F ∗= 1/(2π)
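Equation 6 and the pseudo-code of figure 21 translate almost line for line into a high-level language. The Python sketch below is an illustrative transcription (the helper names are ours, not the original implementation); note that the winding of the polygon determines the sign of the result, so vertices must be ordered consistently with respect to the receiver normal n1.

```python
import math

def sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def form_factor(x, n1, poly):
    """Point-to-polygon form-factor, following the pseudo-code of figure 21."""
    F = 0.0
    n = len(poly)
    for i in range(n):
        ri = sub(poly[i], x)
        ri1 = sub(poly[(i + 1) % n], x)
        crossprod = cross(ri, ri1)
        cos_g = dot(ri, ri1) / math.sqrt(dot(ri, ri) * dot(ri1, ri1))
        gamma = math.acos(max(-1.0, min(1.0, cos_g)))
        I1 = gamma / math.sqrt(dot(crossprod, crossprod))
        mixt = dot(n1, crossprod)
        F -= I1 * mixt
    return F / (2.0 * math.pi)

# A 2x2 square at height 1, directly above the receiving point: the classical
# analytic point-to-rectangle formula gives F ~ 0.5541 for this configuration.
square = [(-1, -1, 1), (-1, 1, 1), (1, 1, 1), (1, -1, 1)]
print(form_factor((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), square))  # ~0.5541
```

The `arccos` argument is clamped to [−1, 1] to guard against round-off for nearly collinear vectors.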
Figure 19: The radiosity on any line on the receiving plane is concave only over a segment. (Four panels: a line cutting through the radiosity function; the radiosity function f(x) on the line; its first derivative d(x); its second derivative s(x), which is negative only over a segment.)

Form-Factor Gradient

The point-to-area form-factor gradient can be easily computed by derivation of the previous formula (see Arvo15, or Holzschuch17):

    ∇F(x) = −(1/2π) Σi [ (n1 × ei) I1 + 2 (n1 · (ri × ri+1)) (ri I2 + ei J2) ]

With:

    I1 = γi / ‖ei × ri‖
    I2 = (1/(2‖ei × ri‖²)) ( (ei · ri+1)/ri+1² − (ei · ri)/ri² + ei² I1 )
    J2 = (1/(2ei²)) (1/ri² − 1/ri+1²) − ((ei · ri)/ei²) I2

The code in figure 21 for computing the form-factor can be extended for computing the gradient. Figure 22 shows the extension of the pseudo-code needed for computing simultaneously the point-to-area form-factor and its gradient (we did not include the part of the code that is exactly identical). As can be seen, most of the costly computations like inverse trigonometric functions have been done for the form-factor, and do not need to be redone for the gradient.

The exact extra cost of computing the gradient depends on the computer and on the compiler used. On an R4000 SGI with the standard cc compiler, it is 30 % (see Holzschuch16, 17).

What is fundamental is that it actually costs much less to compute the exact value for the gradient than it would cost to compute two radiosity values, and then to approximate the gradient using these values.

Figure 22: Pseudo-code for computing the gradient of the form-factor.

    F = 0
    G = 0
    foreach edge [Ei, Ei+1]
        ...                        (as in figure 21)
        F  −= I1 · mixt
        ei  = Ei+1 − Ei
        I2  = (ei · ri+1)/ri+1² − (ei · ri)/ri² + ei² I1
        I2 /= 2 ‖crossprod‖²
        J2  = 0.5 (1/ri² − 1/ri+1²) − (ei · ri) I2
        J2 /= ei²
        G  += (n1 × ei) I1 + 2 mixt (ri I2 + ei J2)
    F ∗= 1/(2π)
    G ∗= −1/(2π)
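The gradient pseudo-code of figure 22 can be transcribed the same way and validated against central finite differences of the form-factor. The Python sketch below is illustrative (the function and helper names are ours, not the original code); the analytic gradient agrees with the finite differences up to the truncation error of the differencing.

```python
import math

def sub(a, b): return tuple(p - q for p, q in zip(a, b))
def dot(a, b): return sum(p * q for p, q in zip(a, b))
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ff_and_gradient(x, n1, poly):
    """Form-factor and its gradient in a single pass (figures 21 and 22)."""
    F, G = 0.0, [0.0, 0.0, 0.0]
    n = len(poly)
    for i in range(n):
        ri, ri1 = sub(poly[i], x), sub(poly[(i + 1) % n], x)
        r2, r12 = dot(ri, ri), dot(ri1, ri1)
        crossprod = cross(ri, ri1)
        cp2 = dot(crossprod, crossprod)
        gamma = math.acos(max(-1.0, min(1.0, dot(ri, ri1) / math.sqrt(r2 * r12))))
        I1 = gamma / math.sqrt(cp2)
        mixt = dot(n1, crossprod)
        F -= I1 * mixt
        ei = sub(poly[(i + 1) % n], poly[i])
        e2 = dot(ei, ei)
        I2 = (dot(ei, ri1) / r12 - dot(ei, ri) / r2 + e2 * I1) / (2.0 * cp2)
        J2 = (0.5 * (1.0 / r2 - 1.0 / r12) - dot(ei, ri) * I2) / e2
        u = cross(n1, ei)
        for k in range(3):
            G[k] += u[k] * I1 + 2.0 * mixt * (ri[k] * I2 + ei[k] * J2)
    c = 1.0 / (2.0 * math.pi)
    return F * c, tuple(-c * g for g in G)

# Validation against central finite differences of the form-factor.
square = [(-1, -1, 1), (-1, 1, 1), (1, 1, 1), (1, -1, 1)]
x0, n1 = (0.2, -0.1, 0.0), (0.0, 0.0, 1.0)
F0, G0 = ff_and_gradient(x0, n1, square)
h = 1e-5
for k in range(3):
    xp = list(x0); xp[k] += h
    xm = list(x0); xm[k] -= h
    fd = (ff_and_gradient(tuple(xp), n1, square)[0]
          - ff_and_gradient(tuple(xm), n1, square)[0]) / (2.0 * h)
    assert abs(fd - G0[k]) < 1e-6
```

The per-edge quantities I1 and crossprod are reused from the form-factor loop, which is exactly where the claimed 30 % extra cost (rather than 100 %) comes from.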
Hessian matrix for the point-to-area form-factor

The point-to-area form-factor Hessian matrix can also be computed by derivation of equation 6 (see Holzschuch17):

    H(x) = −(1/π) Σi [ Q(n1 × ei, ri I2 + ei J2)
                       − (n1 · (ri × ei)) I2 Id
                       + 2 (n1 · (ri × ei)) ( Q(ri, ri) I3 + Q(ei, ei) K3 + 2 J3 Q(ri, ei) ) ]

We use the following notation (Id is the identity matrix):

    Q(a, b) = a bᵗ + b aᵗ
    I3 = (1/(4‖ei × ri‖²)) ( (ei · ri+1)/ri+1⁴ − (ei · ri)/ri⁴ + 3 ei² I2 )
    J3 = (1/(4ei²)) (1/ri⁴ − 1/ri+1⁴) − ((ri · ei)/ei²) I3
    K3 = (1/ei²) ( I2 − ri² I3 − 2 (ri · ei) J3 )

The code for computing the form-factor and the gradient can be extended to compute the second derivative as well. Figure 23 shows the extension of the pseudo-code needed for computing simultaneously the point-to-area form-factor and its first two derivatives (we did not include the part of the code that is exactly identical). Once again, recycling geometric computations previously done reduces the cost of computing the Hessian matrix, even if the cost is still high since matrix operations are quite expensive: a single matrix addition has the same cost as 9 standard additions.

The exact extra cost of computing the Hessian matrix depends on the computer and on the compiler. On a R4000 SGI, with the standard cc compiler, it is 80 % of the cost of the form-factor alone (see Holzschuch17), meaning that the overall cost of computing the point-to-area form-factor and its first two derivatives is 2.1 times the cost of computing the form-factor alone. Notice it is much faster to compute the exact value than it would be to compute an approximate Hessian matrix, which would require seven separate form-factor computations.

Figure 23: Pseudo-code for computing the first two derivatives of the form-factor.

    F = 0
    G = 0
    H = 0
    foreach edge [Ei, Ei+1]
        ...                        (as in figure 22)
        G  += (n1 × ei) I1 + 2 mixt (ri I2 + ei J2)
        I3  = (ei · ri+1)/ri+1⁴ − (ei · ri)/ri⁴ + 3 ei² I2
        I3 /= 4 ‖crossprod‖²
        J3  = 0.25 (1/ri⁴ − 1/ri+1⁴) − (ei · ri) I3
        J3 /= ei²
        K3  = I2 − ri² I3 − 2 (ei · ri) J3
        K3 /= ei²
        H  += −mixt I2 Id + Q(ri I2 + ei J2, n1 × ei)
            + 2 mixt ( Q(ri, ri) I3 + Q(ei, ei) K3 + 2 J3 Q(ei, ri) )
    F ∗= 1/(2π)
    G ∗= −1/(2π)
    H ∗= −1/π
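The same transcription exercise extends to the Hessian (figure 23). The Python sketch below is illustrative (function and helper names are ours, not the original code): every expensive quantity is reused from the form-factor and gradient computations, so the Hessian only adds a few scalar recurrences and symmetric outer products per edge. It can be validated against finite differences of the gradient.

```python
import math

def sub(a, b): return tuple(p - q for p, q in zip(a, b))
def dot(a, b): return sum(p * q for p, q in zip(a, b))
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def q_sym(a, b):
    """Q(a, b) = a b^t + b a^t, as a 3x3 nested list."""
    return [[a[j] * b[k] + b[j] * a[k] for k in range(3)] for j in range(3)]

def ff_derivatives(x, n1, poly):
    """Form-factor, gradient and Hessian in a single pass (figure 23)."""
    F, G = 0.0, [0.0] * 3
    H = [[0.0] * 3 for _ in range(3)]
    n = len(poly)
    for i in range(n):
        ri, ri1 = sub(poly[i], x), sub(poly[(i + 1) % n], x)
        r2, r12 = dot(ri, ri), dot(ri1, ri1)
        cp = cross(ri, ri1)
        cp2 = dot(cp, cp)
        gamma = math.acos(max(-1.0, min(1.0, dot(ri, ri1) / math.sqrt(r2 * r12))))
        I1 = gamma / math.sqrt(cp2)
        mixt = dot(n1, cp)
        F -= I1 * mixt
        ei = sub(poly[(i + 1) % n], poly[i])
        e2 = dot(ei, ei)
        # Scalar recurrences I2, J2 (gradient) and I3, J3, K3 (Hessian).
        I2 = (dot(ei, ri1) / r12 - dot(ei, ri) / r2 + e2 * I1) / (2.0 * cp2)
        J2 = (0.5 * (1.0 / r2 - 1.0 / r12) - dot(ei, ri) * I2) / e2
        I3 = (dot(ei, ri1) / r12**2 - dot(ei, ri) / r2**2 + 3.0 * e2 * I2) / (4.0 * cp2)
        J3 = (0.25 * (1.0 / r2**2 - 1.0 / r12**2) - dot(ei, ri) * I3) / e2
        K3 = (I2 - r2 * I3 - 2.0 * dot(ei, ri) * J3) / e2
        u = cross(n1, ei)
        v2 = tuple(ri[k] * I2 + ei[k] * J2 for k in range(3))
        for k in range(3):
            G[k] += u[k] * I1 + 2.0 * mixt * v2[k]
        qa, qb, qc, qd = q_sym(v2, u), q_sym(ri, ri), q_sym(ei, ei), q_sym(ei, ri)
        for j in range(3):
            for k in range(3):
                H[j][k] += (qa[j][k] - (I2 * mixt if j == k else 0.0)
                            + 2.0 * mixt * (qb[j][k] * I3 + qc[j][k] * K3
                                            + 2.0 * J3 * qd[j][k]))
    c = 1.0 / (2.0 * math.pi)
    return (F * c,
            tuple(-c * g for g in G),
            [[-H[j][k] / math.pi for k in range(3)] for j in range(3)])

square = [(-1, -1, 1), (-1, 1, 1), (1, 1, 1), (1, -1, 1)]
F0, G0, H0 = ff_derivatives((0.2, -0.1, 0.0), (0.0, 0.0, 1.0), square)
# The Hessian is symmetric by construction.
assert all(abs(H0[j][k] - H0[k][j]) < 1e-12 for j in range(3) for k in range(3))
```

Each column of the returned Hessian should match a central finite difference of the analytic gradient, which is a convenient sanity check when porting this kind of code.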
Figure 24: The symmetries of the scene can help find the location of the maximum. (Figure labels: common symmetry plane; the maximum lies on this line; location of the maximum.)

Figure 25: Direct illumination with our refinement criterion, unoccluded scene.
Figure 26: Minimal and maximal emitter for a simple configuration.

Figure 27: Direct illumination with our refinement criterion, with one occluder.
Figure 28: With generic occluders, the unimodality conjectures do not hold.

Figure 29: With certain occluders, the unimodality conjectures still hold.
164 CHAPITRE 3. PROPRIÉTÉS DE LA FONCTION D'ÉCLAIRAGE

3.4.4 A Frequency Analysis of Light Transport (SIGGRAPH 2005)

Authors: Frédo Durand, Nicolas Holzschuch, Cyril Soler, Eric Chan and François X. Sillion
Conference: SIGGRAPH 2005. This article was also published in ACM Transactions on Graphics, vol. 24, no. 3, pp. 1115–1126.
Date: August 2005
A Frequency Analysis of Light Transport

Frédo Durand (MIT-CSAIL), Nicolas Holzschuch (ARTIS∗ GRAVIR/IMAG-INRIA), Cyril Soler (ARTIS∗ GRAVIR/IMAG-INRIA), Eric Chan (MIT-CSAIL), François X. Sillion (ARTIS∗ GRAVIR/IMAG-INRIA)

Abstract

We present a signal-processing framework for light transport. We study the frequency content of radiance and how it is altered by phenomena such as shading, occlusion, and transport. This extends previous work that considered either spatial or angular dimensions, and it offers a comprehensive treatment of both space and angle.

We show that occlusion, a multiplication in the primal, amounts in the Fourier domain to a convolution by the spectrum of the blocker. Propagation corresponds to a shear in the space-angle frequency domain, while reflection on curved objects performs a different shear along the angular frequency axis. As shown by previous work, reflection is a convolution in the primal and therefore a multiplication in the Fourier domain. Our work shows how the spatial components of lighting are affected by this angular convolution.

Our framework predicts the characteristics of interactions such as caustics and the disappearance of the shadows of small features. Predictions on the frequency content can then be used to control sampling rates for rendering. Other potential applications include precomputed radiance transfer and inverse rendering.

Keywords: Light transport, Fourier analysis, signal processing

Figure 1: Space-angle frequency spectra of the radiance function measured in a 3D scene (panels: 1. spectrum of the source; 2. spectrum after first blocker; 3. spectrum after 2nd blocker; 4. incoming spectrum at receiver; 5. outgoing spectrum after receiver; axes: spatial frequencies vs. angular frequencies). We focus on the neighborhood of a ray path and measure the spectrum of a 4D light field at different steps, which we summarize as 2D plots that include only the radial components of the spatial and angular dimensions. Notice how the blockers result in higher spatial frequency and how transport in free space transfers these spatial frequencies to the angular domain. Aliasing is present in the visualized spectra due to the resolution challenge of manipulating 4D light fields.

1 Introduction

Light in a scene is transported, occluded, and filtered by its complex interaction with objects. By the time it reaches our eyes, radiance is an intricate function, and simulating or analyzing it is challenging.

Frequency analysis of the radiance function is particularly interesting for many applications, including forward and inverse rendering. The effect of local interactions on the frequency content of radiance has previously been described in a limited context. For instance, it is well-known that diffuse reflection creates smooth (low-frequency) light distributions, while occlusion and hard shadows create discontinuities and high frequencies. However, a full characterization of global light transport in terms of signal processing and frequency analysis presents two major challenges: the domain of light rays is intricate (three dimensions for position and two for direction), and light paths can exhibit an infinite number of bounces (i.e. in terms of signal processing, the system has dense feedback).

To address the above challenges, we focus on the neighborhood of light paths [Shinya et al. 1987]. This restriction to local properties is both a price to pay and a fundamental difficulty with the problem we study: characteristics such as reflectance or presence and size of blockers are non-stationary, they vary across the scene.

This paper presents a theoretical framework for characterizing light transport in terms of frequency content. We seek a deep understanding of the frequency content of the radiance function in a scene and how it is affected by phenomena such as occlusion, reflection, and propagation in space (Fig. 1). We first present the two-dimensional case for simplicity of exposition. Then we show that it extends well to 3D because we only consider local neighborhoods of rays, thereby avoiding singularities on the sphere of directions.

Although we perform our derivations in an abstract setting, we keep practical questions in mind. In particular, we strongly believe that a good understanding of frequency creation and attenuation allows for more efficient sampling strategies for stochastic approaches such as Monte-Carlo global illumination. Furthermore, it leads to better sampling rates for light-field rendering, precomputed radiance transfer, and related applications. Finally, our frequency analysis can shed key practical insights on inverse problems and on the field of statistics of natural images by predicting which phenomena can cause certain local-frequency effects.

1.1 Contributions

This paper makes the following contributions:

Frequency study in space and angle. Our framework encompasses both spatial and directional variations of radiance, while most previous work studied only one of these two components.

Local surface interactions. We describe the frequency effects of local shading, object curvature, and spatially-varying BRDF.

Global light-transport. We provide expressions for the frequency modification due to light transport in free space and occlusion.

Most of the derivations in this paper are carried out in 2D for clarity, but we show that the main characterizations extend to 3D.

∗ARTIS is a team of the GRAVIR lab (UMR 5527), a joint unit of CNRS, INPG, INRIA and UJF.

Copyright © 2005 by the Association for Computing Machinery, Inc.
© 2005 ACM 0730-0301/05/0700-1115 $5.00
1.2 Related work

Radiance exhibits both spatial and angular variations. A wealth of previous work has studied the frequency content along one of these components, but rarely have both space and angle been addressed. We do not discuss all applications of Fourier analysis, but rather focus on studies of frequency modification in light transport.

Filtering and sampling. Heckbert's seminal work on texture antialiasing [1989] derives local bandwidth for texture pre-filtering based on a first-order Taylor expansion of the perspective transform. The effect of perspective is also studied in the contexts of holography and light field sampling [Halle 1994; Isaksen et al. 2000; Chai et al. 2000; Stewart et al. 2003], mostly ignoring visibility and specular effects.

Local illumination as a convolution. Recently, local illumination has been characterized in terms of convolution and it was shown that the outgoing radiance is band-limited by the BRDF [Ramamoorthi and Hanrahan 2001b; Ramamoorthi and Hanrahan 2004; Basri and Jacobs 2003]. However the lighting is assumed to come from infinity and occlusion is ignored. Frolova et al. [2004] explored spatial lighting variations, but only for convex diffuse objects. We build on these approaches and extend them by adding spatial dimensions as well as other phenomena such as occlusion and transport, at the expense of first-order approximations and a local treatment. Ramamoorthi et al. [2004] have also studied local occlusion in a textured object made of pits such as a sponge. Our treatment of occlusion considers complex blockers at an arbitrary distance of the blocker and receiver.

Wavelets and frequency bases. Wavelets and spherical harmonics have been used extensively as basis functions for lighting simulation [Gortler et al. 1993; Keller 2001] or pre-computed radiance transfer [Sloan et al. 2002; Ramamoorthi and Hanrahan 2002]. They are typically used in a data-driven manner and in the context of projection methods, where an oracle helps in the selection of the relevant components based on the local frequency characteristics of radiance. Refinement criteria for multiresolution calculations often implicitly rely on frequency decomposition [Sillion and Drettakis 1995]. In our framework we study the frequency effect of the equations of light transport in the spirit of linear systems, and obtain a more explicit characterization of frequency effects. Our results on the required sampling rate can therefore be used with stochastic methods or to analyze the well-posedness of inverse problems.

Ray footprint. A number of techniques use notions related to bandwidth in a ray's neighborhood and propagate a footprint for adaptive refinement [Shinya et al. 1987] and texture filtering [Igehy 1999]. Chen and Arvo use perturbation theory to exploit ray coherence [2000]. Authors have also exploited on-the-fly the frequency content of the image to make better use of rays [Bolin and Meyer 1998; Myszkowski 1998; Keller 2001]. Our work is complementary and provides a framework for frequency-content prediction.

Illumination differentials have been used to derive error bounds on radiance variations (e.g. gradients [Ward and Heckbert 1992; Annen et al. 2004], Jacobians [Arvo 1994], and Hessians [Holzschuch and Sillion 1998]), but only provide local information, which cannot easily be used for sampling control.

Fourier analysis has also been extensively used in optics [Goodman 1996], but in the context of wave optics where phase and interferences are crucial. In contrast, we consider geometric optics and characterize frequency content in the visible spatial frequencies. The varying contrast sensitivity of humans to these spatial frequencies can be exploited for efficient rendering, e.g. [Bolin and Meyer 1995; Ferwerda et al. 1997; Bolin and Meyer 1998; Myszkowski 1998]. Finally we note that the Fourier basis can separate different phenomena and thus facilitate inverse lighting [Ramamoorthi and Hanrahan 2001b; Basri and Jacobs 2003], depth from focus [Pentland 1987] and shape from texture [Malik and Rosenholtz 1997].

Figure 2: Notations.
    ℓR       local light field (2D) around ray R
    x        spatial dimension (distance to central ray)
    v        directional dimension in two-plane parameterization
    θ        directional dimension in plane-sphere parameterization
    f̂        Fourier transform of function f
    ΩX       frequency along dimension X
    i        √−1
    f ⊗ g    convolution of f by g
    d        transport distance
    V(x, v)  visibility function of the blockers
    cos⁺(θ)  clamped cosine term: max(cos θ, 0)
    dE       differential irradiance (after cosine term)
    ρ        BRDF

Figure 3: (a-b) The two light field parameterizations used in this article (a virtual plane at unit distance for the two-plane parameterization; the angle θ to the central ray for the plane-sphere parameterization). Locally, they are mostly equivalent: we linearize v = tan θ. (c) Transport in free space: the angular dimension v is not affected but the spatial dimension is reparameterized depending on v: x′ = x − vd.

2 Preliminaries

We want to analyze the radiance function in the neighborhood of a ray along all steps of light propagation. For this, we need a number of definitions and notations, summarized in Fig. 2. Most of the derivations in this paper are carried out in 2D for clarity, but we shall see that our main observations extend naturally to 3D.

2.1 Local light field and frequency content

We consider the 4D (resp. 2D) slice of radiance at a virtual plane orthogonal to a central ray. We focus on the neighborhood of the central ray, and we call radiance in such a 4D (resp. 2D) neighborhood slice a local light field (Fig. 3 left). Of the many parameterizations that have been proposed for light fields, we use two distinct ones in this paper, each allowing for a natural expression of some transport phenomena. Both use the same parameter for the spatial coordinates in the virtual plane, x, but they differ slightly in their treatment of directions. For our two-plane parameterization, we follow Chai et al. [2000] and use the intersection v with a parallel plane at unit distance, expressed in the local frame of x (Fig. 3-a). In the plane-sphere parameterization, we use the angle θ with the central direction (Fig. 3-b) [Camahort et al. 1998]. These two parameterizations are linked by v = tan θ and are equivalent around the origin thanks to a linearization of the tangent.

We study the Fourier spectrum of the radiance field ℓR, which we denote by ℓ̂R. For the two-plane parameterization, we use the following definition of the Fourier transform:

    ℓ̂R(Ωx, Ωv) = ∫ from x=−∞ to ∞ ∫ from v=−∞ to ∞ ℓR(x, v) e^(−2iπΩx x) e^(−2iπΩv v) dx dv    (1)

Examples are shown for two simple light sources in Fig. 4, with the spatial dimension along the horizontal axis and the direction along the vertical axis. We discuss the plane-sphere parameterization in Section 4.
Figure 4: (a) A point light source is a Dirac in space times a constant in angle. (b) Its Fourier transform is a constant in space times a Dirac in angle. (c) A spot light with a finite-size bulb has a smooth falloff in angle. (d) Its Fourier transform is a sinc times a bell curve.

Figure 5: Unit area in lightfield parameterisation. (Panels: (a) g(v); (b) ĝ(Ωv); (c) Ĝ(Ωu, Ωv).)

One of the motivations for using Fourier analysis is the convolution-multiplication theorem, which states that a convolution in the primary domain corresponds to a multiplication in the Fourier domain, and vice-versa. As we show in this paper, it affords a compact formulation of frequency modification.

2.2 Overview

When light flows in a scene, phenomena such as transport in free space, occlusion, and shading each modify the local light field in a characteristic fashion. These operations are described (in 2D) in Section 3 as filters operating on the frequency signal ℓ̂. In Section 4, we describe the general case of local shading and extend the presentation to 3D in Section 5. Section 6 compares our framework with previous work and shows a simple application.

3 Transport phenomena as linear filters

This section describes the effect on frequency content of successive stages of light transport. All phenomena are illustrated in Fig. 6 for a simple 2D scene, where light is emitted at the source, transported in free space, occluded by obstacles, transported again, and reflected by a surface. At each step, we show a space-direction plot of radiance, in primal and frequency space, as well as space-direction frequency-domain plots obtained in a similar 3D scene (Fig. 9-b). Note the excellent qualitative agreement between the 2D predictions and 3D observations.

3.1 Travel in free space

Travel in free space is a crucial operation because the directional variation also turns into spatial variation. Consider a slide projector: at the source, we have a Dirac in space and the image in the directional domain. At the receiver, the signal of the image is present in a combination of space and angle. When light travels in free space, the value of radiance is unchanged along a ray, and travel in free space is a reparameterization of the local light field (Fig. 3-c). The value of radiance at a point x after transport can be found using:

    ℓR(x, v) = ℓR′(x − vd, v),    (2)

where d is the travel distance. To compute the Fourier transform ℓ̂R, we insert the change of variable x′ = x − vd in the integral of Eq. 1:

    ℓ̂R(Ωx, Ωv) = ∫∫ ℓR′(x′, v) e^(−2iπΩx(x′+vd)) e^(−2iπΩv v) dx′ dv
               = ∫∫ ℓR′(x′, v) e^(−2iπΩx x′) e^(−2iπ(Ωv+dΩx)v) dx′ dv

This is a shear in the directional dimension (Fig. 6, steps 2 and 4):

    ℓ̂R(Ωx, Ωv) = ℓ̂R′(Ωx, Ωv + dΩx)    (3)

The longer the travel, the more pronounced the shear.

3.2 Visibility

Occlusion creates high frequencies and discontinuities in the radiance function. Radiance is multiplied by the binary occlusion function of the occluders:

    ℓR′(x, v) = ℓR(x, v) V(x, v)    (4)

where V(x, v) is equal to 1 when there is full visibility and to 0 when the occluders are blocking light transport. At the location of occlusion, V mostly depends on x (Fig. 6, step 3).

According to the multiplication theorem, such a multiplication amounts to a convolution in the frequency domain:

    ℓ̂R′(Ωx, Ωv) = ℓ̂R(Ωx, Ωv) ⊗ V̂(Ωx, Ωv)    (5)

If the occluders are lying inside a plane orthogonal to the ray, the occlusion function is a constant in angle, and its Fourier transform is a Dirac in the angular dimension. In the general case of non-planar occluders, their spectrum has frequency content in both dimensions, but the angular frequency content is restricted to a wedge with a span proportional to the depth extent of the blockers. Our formulation also handles semi-transparent blockers by using non-binary occlusion functions.

After occlusion, another transport step usually occurs, shearing the spectrum and propagating the occlusion from the spatial dimension to the angular dimension (Fig. 6, step 4).

3.3 Local diffuse shading

We first treat local shading by a planar Lambertian reflector. Curved and glossy reflectors will be treated in greater detail in Section 4. For a diffuse reflector with albedo ρ, the outgoing radiance ℓo has no directional variation; it is simply the integral over all directions of the incoming radiance per unit area on the receiver surface:

    ℓo(x) = ρ ∫Ω ℓi(x, v) dA⊥    (6)

dA⊥, the differential area on the receiver surface, is cos⁺θ times the Jacobian of the lightfield parameterization; with v = tan θ, we have:

    dA⊥ = dv / (1 + v²)^(3/2) = g(v) dv

We introduce dE(x, v) = ℓi(x, v) g(v), the differential irradiance at point x from the direction v. Since dE is a product of two functions, its spectrum is the convolution of their spectra:

    dÊ(Ωx, Ωv) = ℓ̂i(Ωx, Ωv) ⊗ ĝ(Ωv)

The reflected radiance ℓo is the integral of dE over all directions v; it is therefore the value of dÊ at Ωv = 0, that is, ℓ̂o(Ωx) = ρ dÊ(Ωx, 0). Putting everything together, we have:

    ℓ̂o(Ωx) = ρ [ ℓ̂i(Ωx, Ωv) ⊗ ĝ(Ωv) ] at Ωv = 0    (7)

g(v) is a bell-like curve (Fig. 5-a); its Fourier transform is:

    ĝ(Ωv) = 4π|Ωv| K1(2π|Ωv|)

where K1 is the first-order modified Bessel function of the second kind. ĝ is highly concentrated on low frequencies (Fig. 5-b); the effect of convolution by ĝ is a very small blur of the spectrum in the angular dimension (Fig. 6, step 5).

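The shear of equations 2 and 3 can be verified exactly in a discrete setting. The NumPy sketch below is illustrative: with an integer travel distance d on a periodic grid, transporting the sampled light field and taking its DFT yields precisely the DFT of the original field re-indexed by the shear Ωv → Ωv + dΩx.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 64, 3                           # grid resolution, integer travel distance

before = rng.standard_normal((N, N))   # local light field l[x, v] before travel
x = np.arange(N)[:, None]
v = np.arange(N)[None, :]
# Transport in free space, Eq. 2 on a periodic grid: l'(x, v) = l(x - v*d, v).
after = before[(x - v * d) % N, v]

# Discrete counterpart of the shear of Eq. 3: the DFT of the transported field
# equals the DFT of the original field sheared along the angular frequency axis.
hb = np.fft.fft2(before)
ha = np.fft.fft2(after)
kx = np.arange(N)[:, None]
kv = np.arange(N)[None, :]
assert np.allclose(ha, hb[kx, (kv + d * kx) % N])
```

With a non-integer distance the shear still holds in the continuous transform, but the discrete check then involves interpolation; the integer case keeps the demonstration exact.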
[Figure 6 layout: for each step of light transport (1. Emission, 2. Transport, 3. Visibility, 4. Transport, 5. cos+ θ, 6. Diffuse BRDF), three panels show the light field in (x, v), its Fourier transform in (Ωx, Ωv), and the 3D-version X–Θ spectrum; the scene contains an emitter, occluders and a receiver. Steps 3 and 5 are multiplications (×) in ray space and convolutions (⊗) in the Fourier domain. A last row shows the radiosity on the receiver, its Fourier spectrum, and the X–Y spectrum.]

Figure 6: Effects on the spectrum of the various steps of light transport with a diffuse reflector. 2D Fourier transforms for steps 1 to 4 are obtained analytically; step 5 (convolution) is performed numerically. 3D-version spectra are obtained numerically, via a photon-mapping algorithm and an FFT of the computed light field.
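The visibility step of Fig. 6 (step 3) is a multiplication in ray space and hence a convolution of spectra. A minimal numpy check, with our own discretization and a random binary occlusion pattern:

```python
import numpy as np

# Multiplying radiance by a binary occlusion function V in ray space
# convolves the two spectra (up to the DFT normalization factor 1/N^2).
N = 64
rng = np.random.default_rng(1)
l = rng.standard_normal((N, N))                  # radiance l(x, v)
V = (rng.random((N, N)) > 0.3).astype(float)     # binary occlusion function

lhs = np.fft.fft2(l * V)

# Circular 2D convolution of the two spectra, via the convolution theorem.
A, B = np.fft.fft2(l), np.fft.fft2(V)
conv = np.fft.ifft2(np.fft.fft2(A) * np.fft.fft2(B))
print(np.allclose(lhs, conv / N**2))             # True
```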

[Figure 7 plots: the (Ωx, Ωv) spectra reaching the receiver, for one and for two occluders.]

Figure 7: Scene configuration for the visibility experiment (emitter, first occluder, second occluder, receiver). Left: spectrum with only one occluder. Right: spectrum with two occluders, computed with full precision and phase.

3.4 Example and discussion

Fig. 6 illustrates the various steps of light transport for a simple scene such as Fig. 9b. The slopes of the transport shears correspond to the travel distance (steps 2 and 4). Visibility increases the spatial frequency content through the convolution by a horizontal kernel in frequency space (step 3). There are only a finite number of blockers in Fig. 6, which explains why their spectrum is not a Dirac comb times a sinc, but a blurry version. The blocker spectrum mostly contains a main central lobe corresponding to the average occlusion and two side lobes corresponding to the blockers' main frequency. This results in a replication of the sheared source spectrum on the two sides. The smaller the blocker pattern, the further away these replicas are in frequency space. The final diffuse integration (step 6) discards all directional frequencies.

The main differences between the 3D and 2D plots of the spectra in Fig. 6 come from aliasing problems that are harder to fix with the 4D light field. Furthermore, in the 3D scene, the position of the blockers is jittered (see Fig. 9), which results in a smoother spectrum.

Feature-based visibility
The spectra in Fig. 6 show that the second transport (step 4) pushes the "replicas" to the angular domain. This effect is more pronounced for high-frequency blockers, for which the replicas are farther from the vertical line. Since the final diffuse integration keeps only the spatial line of frequencies (step 5), the main high-frequency lobe of the blockers is eliminated by diffuse shading. This is related to the feature-based approach to visibility [Sillion and Drettakis 1995], where the effect of small occluders on soft shadows is approximated by an average occlusion. However, our finding goes one step further: where the feature-based technique ignores high frequencies, we show that, for small-enough blockers, most high frequencies are effectively removed by integration.

Combining several blockers
A difficult scene for visibility is the case of two occluders that individually block half of the light, and together block all the light (Fig. 7). In our framework, if one carries out the computations with full precision, taking phase into account, one gets the correct result: an empty spectrum (Fig. 7, right).

However, for practical applications, it is probably not necessary to compute the full spectrum. Instead, we consider elements of information about the maximal frequency caused by the scene configuration, as we show in Section 6.2. In that case, one can get an overestimation of the frequencies caused by a combination of blockers, but not an underestimation.

4 General case for surface interaction

So far, we have studied only diffuse shading for a central ray normal to a planar receiver (although rays in the neighborhood have a non-normal incidence angle). We now discuss the general case, taking into account the incidence angle, arbitrary BRDFs, receiver curvature as well as spatial albedo variation. Our framework builds upon Ramamoorthi and Hanrahan [2001b] and extends it in several ways, which we discuss in Section 6.1.

In a nutshell, local reflection simply corresponds to a multiplication by the cosine term and a convolution by the BRDF. However, a number of reparameterizations are necessary to take into account the incidence and outgoing angles, as well as the surface curvature. We first treat the special case of rotation-invariant BRDFs such as Phong before addressing more general forms as well as textures and spatially-varying BRDFs. Recall that we study frequency content in ray neighborhoods, which means that for local reflection, we consider an incoming neighborhood and an outgoing neighborhood.

Plane-sphere parameterization
Since local reflection mostly involves integrals over the directional dimension, it is more naturally expressed in a parameterization where angles are uniform. This is why we use here a plane-sphere parameterization where the directional component θ is the angle to the central ray (Fig. 3-b). The spatial dimension is unaffected.

In the plane-sphere parameterization, the domain of directions is the S1 circle, which means that frequency content along this dimension is now a Fourier series, not a transform. Fig. 8 shows the effect of reparameterizing angles on the frequency plane. The frequency distribution is very similar, although the spectrum is blurred by the non-linear reparameterization. For bandwidth analysis, this introduces no significant error. Note that for all local interactions with the surface (and thus in this entire section), there is no limitation to small values of θ; the linearization v = tan θ ≈ θ will only be used again after light leaves the surface, for subsequent transport.

[Figure 8 plots: the spectrum before (left) and after (right) the reparameterization.]

Figure 8: Spectrum arriving at the receiver (step 4 of Fig. 6), before and after the sphere-plane reparameterization. Left: (Ωx, Ωv) spectrum. Right: (Ωx, Ωθ) spectrum.

4.1 Rotation-invariant BRDFs on curved receivers

Local shading is described by the shading equation

    ℓo(xi, θo) = ∫θi ℓ(xi, θi) ρxi(θo′, θi′) cos+ θi′ dθi   (8)

where the primed angles are in the local frame of the normal while the unprimed angles are in the global frame (Fig. 10). For now, we assume that the BRDF ρ does not vary with xi. Local shading is mostly a directional phenomenon with no spatial interaction: the outgoing radiance at a point is only determined by the incoming radiance at that point. However, the normal varies per point.

As pointed out by Ramamoorthi and Hanrahan [2001b], local reflection combines quantities that are naturally expressed in a global frame (incoming and outgoing radiance) and quantities that live in the local frame defined by the normal at a point (cosine term and BRDF). For this, we need to rotate all quantities at each spatial location to align them with the normal. This means that we rotate (reparameterize) the incoming radiance, perform local shading in the local frame, and rotate (reparameterize) again to obtain the outgoing radiance in a global frame. All steps of the local shading process are illustrated in Fig. 10 and discussed below.

Step 1 & 7: Reparameterization into the tangent frame
We first take the central incidence angle θ0 into account, and reparameterize in the local tangent frame with respect to the central normal direction. This involves a shift by θ0 in angle and a scale in space
[Figure 9 plots: X–Θ spectra of the incident light on the receiver for the three scenes (a), (b) and (c).]
Figure 9: Complex frequency effects in light transport. The three scenes have the same area light and diffuse receiver and differ only by
the frequency content of the blockers. (a) Large blockers result in few high frequencies. (b) With smaller (higher frequency) blockers, high
frequencies increase on the receiver. (c) For very high-frequency blockers, high frequencies on the receiver nearly disappear.
by 1/cos θ0. We also flip the directions so that incident rays are pointing up and match the traditional local reflection configuration (Fig. 10, step 1). We omit the full derivation for brevity and provide directly the equations corresponding to steps 1 and 7 of Fig. 10:

    ℓ̂i(Ωx, Ωθ) = (e^(−iΩθ θ0) / |cos θ0|) ℓ̂(−Ωx cos θ0, Ωθ)   (9)
    ℓ̂″(Ωx, Ωθ) = e^(iΩθ θ1) |cos θ1| ℓ̂o(Ωx / cos θ1, Ωθ)   (10)

Step 2: Per-point rotation
The directional slice corresponding to each point must be shifted to rotate it in the local frame of the normal at that point (Fig. 10 step 2): θi′ = θi − α(xi).

For a smooth surface, we use a first-order Taylor expansion of the angle α of the normal at a point xi. Given the curvature k, we have α(xi) = k xi and the reparameterization is θi′ = θi − k xi. This is a shear, but now along the directional dimension, in contrast to the transport shear. Similarly, the Fourier transform is sheared along the spatial dimension (Fig. 10 step 2, last row):

    ℓ̂i′(Ωx′, Ωθ′) = ℓ̂i(Ωx′ + kΩθ′, Ωθ′)   (11)

After this reparameterization, our two-dimensional spatio-directional local light field is harder to interpret physically. For each column, it corresponds to the incoming radiance in the frame of the local normal: the frame varies for each point. In a sense, we have unrolled the local surface and warped the space of light rays in the process [Wood et al. 2000]. The direction of the shear depends on the sign of the curvature (concave vs. convex).

Step 3: Cosine term and differential irradiance
In the local frame of each point, we compute the differential irradiance by multiplying by the spatially-constant clamped cosine function cos+. This multiplication corresponds in frequency space to a convolution by a Dirac in space times a narrow function in angle:

    d̂E′(Ωx, Ωθ) = ℓ̂i′(Ωx, Ωθ) ⊗ [ĉos+(Ωθ) δΩx=0]   (12)

Over the full directional domain, the spectrum of cos+ is:

    ĉos+(Ωθ) = (2 / (1 − (2πΩθ)²)) cos(π² Ωθ)   (13)

Most of the energy is centered around zero (Fig. 12-a) and the 1/Ωθ² frequency falloff comes from the derivative discontinuity¹ at π/2. As with the two-plane reparameterization (Section 3.3), the cosine term has only a small vertical blurring effect.

¹ A function with a discontinuity in the nth derivative has a spectrum falling off as 1/Ω^(n+1). A Dirac has a constant spectrum.

Step 4: Mirror-direction reparameterization
Common BRDFs mostly depend on the difference between the mirror reflection and the outgoing direction. This is why we remap the local incoming light using a mirror reflection around the normal (Fig. 10 step 4): dEr′(θr′) = dE′(−θr′).

    d̂Er′(Ωx, Ωθ′) = d̂E′(Ωx, −Ωθ′)   (14)

This is equivalent to reparameterizations of surface light fields and BRDFs [Wood et al. 2000; Ramamoorthi and Hanrahan 2002].

Step 5: BRDF convolution
In the mirror parameterization, assuming that the BRDF depends only on the angle difference, the shading equation 8 becomes:

    ℓo′(xi, θo′) = ∫θr′ dEr′(xi, θr′) ρ′(θo′ − θr′) dθr′   (15)

which is a convolution of dEr′ by ρ′ for each xi: that is, we convolve the 2D function dEr′ by a spatial Dirac times the directional shift-invariant BRDF ρ′ (Fig. 10 step 5). In the Fourier domain, this is a multiplication by a spatial constant times the directional spectrum of the BRDF:

    ℓ̂o′(Ωx′, Ωθ′) = d̂Er′(Ωx′, Ωθ′) ρ̂′(Ωθ′)   (16)

Note, however, that our expression of the BRDF is not reciprocal. We address more general forms of BRDFs below.

Step 6: Per-point rotation back to the tangent frame
We now apply the inverse directional shear to go back to the global frame. Because we have applied a mirror transform in step 4, the shear and inverse shear double their effect rather than canceling each other. Since the shear comes from the object curvature, this models the effect of concave and convex mirrors and how they deform reflections. In particular, a mirror sphere maps the full 360-degree field to the 180-degree hemisphere, as exploited for light probes.

4.2 Discussion

The important effects due to curvature, the cosine term, and the BRDF are summarized in Fig. 10. Local shading is mostly a directional phenomenon, and the spatial component is a double shear due to curvature (steps 2 and 6). The cosine term results, in frequency space, in a convolution by a small directional kernel (step 3) while the BRDF band-limits the signal with a multiplication of the spectrum (step 5). Rougher materials operate a more aggressive low-pass, while in the special case of mirror BRDFs, the BRDF is a Dirac and the signal is unchanged.
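The closed form of Eq. 13 can be verified by direct quadrature. A brief sketch (our own midpoint-rule discretization; the integration step and test frequencies are chosen arbitrarily):

```python
import numpy as np

# Numerical check of Eq. 13: the spectrum of the clamped cosine cos+,
# integrated over the full directional domain [-pi, pi].
M = 400000
h = 2.0 * np.pi / M
theta = -np.pi + (np.arange(M) + 0.5) * h          # midpoint rule
cos_plus = np.clip(np.cos(theta), 0.0, None)       # cos+: zero outside [-pi/2, pi/2]

for W in (0.0, 0.3, 1.0, 2.5):                     # avoid the removable pole at W = 1/(2*pi)
    num = np.sum(cos_plus * np.exp(-2j * np.pi * W * theta)) * h
    ref = 2.0 * np.cos(np.pi**2 * W) / (1.0 - (2.0 * np.pi * W)**2)
    assert abs(num - ref) < 1e-6
print("Eq. 13 verified")
```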

[Figure 10 diagram: the seven steps of local shading, shown in the scene, in ray space, and in the Fourier domain: 1 reparameterization to the tangent frame, 2 per-point rotation to the local frame, 3 cosine term (differential irradiance), 4 mirror reparameterization, 5 shading (integral with the BRDF), 6 per-point rotation back to the tangent frame, 7 reparameterization to the outgoing light field.]

Figure 10: Local shading for a curved receiver with arbitrary BRDF.

[Figure 11 diagram: an incoming beam of light, a concave mirror and the focal point; in the Fourier domain: incoming spectrum, curvature shear, transport shear.]

Figure 11: Caustic due to negative curvature. A shear in angle and then in space are combined to result in a transfer from the directional frequencies to the spatial frequencies.

[Figure 12 plots: (a) ĉos+; (b) ĉos+ f for aluminium, silver, copper and plastic.]

Figure 12: Spectrum of the clamped cosine in the sphere-plane parameterization. Spectrum of cos+ f (cosine and Fresnel terms) for different materials.

Curvature has no effect on the directional bandwidth of the outgoing light field, which means that previous bounds derived in the special case of infinite lighting [Ramamoorthi and Hanrahan 2004; Basri and Jacobs 2003; Ramamoorthi and Hanrahan 2001a; Ramamoorthi and Hanrahan 2002] are valid for spatially-varying illumination. However, the spatial frequency content is strongly affected by curvature, which has important practical implications. The effect of the curvature shear is further increased by the spatial scaling back to the tangent frame in step 7, as described by Eq. 10. We stress that this explains the well-known difficulty in sampling specular lighting in situations such as backlighting on the silhouette of a curved object. This is modeled by the effect of the curvature shear, the BRDF bandwidth, and the angular scale due to rotation into the tangent frame.

A case study: simple caustics
Caustics are an example of the interaction between the spatial and angular aspects of light transport. We illustrate this effect with a simple case similar to a solar oven (Fig. 11). A parallel beam of light hits a surface of negative curvature with a mirror (Dirac) BRDF and converges toward a focal point. This is modeled in our framework by an incoming spectrum that has energy only in the angular domain. The shear due to curvature followed by the shear due to transport results in a signal where the energy is concentrated in space: it is a Dirac at the focal point.

4.3 Rotation-varying BRDFs

Not all BRDFs can be simplified into a term that depends only on the difference to the mirror direction. For example, the Fresnel term depends on the incoming angle. We now derive the effect of shading by a BRDF that is factored into separable terms that depend on the incoming angle θi′ and the difference between the outgoing angle θo′ and the mirror direction θr′ [Ramamoorthi and Hanrahan 2002], that is, ρ(θi′, θo′) = f(θi′) ρ′(θo′ − θr′).

Since the term f does not depend on the outgoing angle, it can be applied in the same way as the cos+ term, using a multiplication that corresponds to a convolution in frequency space; the rest of the shading remains the same, with a convolution by ρ′. Combining the multiplication by f with the mirror reparameterization of step 4 and the convolution by ρ′ of step 5, we obtain in frequency space a convolution followed by a multiplication:

    ℓ̂o′(Ωx′, Ωθ′) = [d̂Er′(Ωx′, Ωθ′) ⊗ f̂(−Ωθ′) δΩx=0] ρ̂′(Ωθ′)   (17)

Fig. 12-b shows the spectra of the cosine term cos+ multiplied by the Fresnel term for typical materials; they contain mostly low frequencies. Other approximations with separable functions depending on θo′ are equally easy, just reversing the order of the multiplication and the convolution. BRDFs are often approximated by sums of separable terms, which can be handled easily in our framework because the Fourier transform is linear.

4.4 Texture mapping

When the result of shading is modulated by a texture T(x), this multiplication corresponds to a convolution in the Fourier domain:

    ℓ̂T(Ωx, Ωθ) = T̂(Ωx) δΩθ=0 ⊗ ℓ̂o(Ωx, Ωθ)   (18)

Since the texture has only spatial components, its spectrum is restricted to the line of spatial frequencies. This means that texture mapping only affects frequencies along the spatial dimension.
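The texture modulation of Eq. 18 only convolves the spectrum along the spatial frequency axis. A short sketch with our own discrete conventions:

```python
import numpy as np

# A spatial texture T(x) multiplies the light field; its spectrum therefore
# convolves the light-field spectrum along the spatial axis only (Eq. 18).
N = 32
rng = np.random.default_rng(2)
l = rng.standard_normal((N, N))        # l[x, theta]
T = rng.standard_normal(N)             # texture T(x)

lhs = np.fft.fft2(T[:, None] * l)      # spectrum of the textured light field

That, L = np.fft.fft(T), np.fft.fft2(l)
conv = np.zeros_like(lhs)
for m in range(N):                     # circular convolution along Wx only
    conv += That[m] * np.roll(L, m, axis=0)
print(np.allclose(lhs, conv / N))      # True
```

The directional axis is untouched by the loop, which is exactly the restriction of the texture spectrum to the line of spatial frequencies.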

4.5 Spatially-varying BRDFs

We now extend our model to include spatially-varying BRDFs and revisit step 5 (shading). For each point, shading is still a convolution over the directional domain, but the kernel varies spatially.

To model this effect, we exploit the fact that a 2D Fourier transform can be decomposed into two separable 1D transforms, the first one vertically, then horizontally. We consider the intermediate semi-Fourier space ℓ̊(x, Ωθ) that represents for each location x the 1D Fourier transform of the directional variation of incoming light. The full Fourier space is then the 1D Fourier transform of the semi-Fourier transform along the x dimension. We have

    ℓ̊o′(x, Ωθ) = d̊Er′(x, Ωθ) ρ̊(x, Ωθ),

which is a multiplication in the semi-Fourier domain, and therefore a convolution along x only in full Fourier space:

    ℓ̂′(Ωx, Ωθ) = d̂Er′(Ωx, Ωθ) ⊗x ρ̂(Ωx, Ωθ)

This means that in order to characterize the effect of spatially-varying BRDFs, we consider the spectrum of ρ(x, θ). We then take the spectrum of the incoming illumination ℓ̂ and convolve it only horizontally along Ωx, not vertically. We call this a semi-convolution in Ωx, which we note ⊗x.

In the special case of non-varying BRDFs, the spectrum of ρ(x, θ) is a Dirac times the directional spectrum of the BRDF, and the horizontal convolution is a multiplication. If the spectrum of ρ is separable (texture mapping), then the spatially-varying BRDF case is a multiplication followed by a convolution. The special case of a spatially-varying combination of BRDFs [Lensch et al. 2001] can be handled more simply as the superposition of multiple BRDFs with weights encoded as textures.

5 Extension to 3D

We now show how our framework extends to 3D scenes.

5.1 Light-field parameterization

The derivations presented in Section 3 involve a two-plane light-field parameterization and extend directly to 3D. The only notable difference is the calculation of differential irradiance (Eq. 7), where the projected surface area in 3D becomes:

    dA⊥ = du dv / (1 + u² + v²)² = G(u, v) du dv

Fig. 5-c presents the spectrum of G(u, v).

5.2 Shading in plane-sphere parameterization

The sphere S² of directions is unfortunately hard to parameterize, which prompted many authors to use spherical harmonics as the equivalent of the Fourier basis on this domain. In contrast, we have chosen to represent directions using spherical coordinates and to use traditional Fourier analysis, which is permitted by our restriction to local neighborhoods of S². This solution enables a more direct extension of our 2D results, and in particular it expresses well the interaction between the spatial and angular components.

[Figure 13 diagrams: (a) spherical coordinates (θ, ϕ) and the equator; (b) the equator through the incident plane, with a rotation around the x axis by the angle α of the local normal; (c) the equator rotated to contain the mirror and central outgoing directions.]

Figure 13: 3D direction parameterizations.

Spherical coordinates
We use the spherical coordinates (θ, ϕ) where θ, in [−π, π], is the azimuth and ϕ, in [−π/2, π/2], the co-latitude. The distortion of this parameterization is cos ϕ, which means that one must remain around the equator to avoid distortion. In this neighborhood, the parameterization is essentially Euclidean, to a first-order approximation.

Local reflection is challenging because it involves four neighborhoods of directions: around the incoming direction, the normal, the mirror direction, and the outgoing direction; in general, we cannot choose a spherical parameterization where they all lie near the equator. Fortunately, we only need to consider two of these neighborhoods at a time (Fig. 4).

For this, we exploit the fact that a rotation around an axis on the equator can be approximated to first order by a Euclidean rotation of the (θ, ϕ) coordinates: (θ′, ϕ′) = Rα(θ, ϕ).

For brevity, we omit the comprehensive remapping formulas for 3D shading, but we describe the appropriate parameterization for each step as well as the major differences with the 2D case.

Tangent frame
We start with a parameterization where the equator is in the incident plane, as defined by the central ray of the incident light field and the central normal vector (Fig. 13-b). If the light field has been properly rotated, only the x spatial dimension undergoes the scaling by cos θ0 (Eq. 9).

Curvature
In 2D, we approximated the angle with the local normal linearly by α(x) = kx. For a surface, the corresponding linearization of the normal direction (θN, ϕN) involves a bilinear form [Do Carmo 1976]:

    (θN, ϕN) = M (x, y)   (19)

If x and y are aligned with the principal directions of the surface, the matrix M is an anisotropic scaling where the scale factors are the two principal curvatures. The corresponding remapping of (x, y, θ, ϕ) is a shear in 4D, with different amounts in the two principal directions. As in the 2D case, the frequency content is sheared along the spatial dimensions.

Differential irradiance and cosine
Step 3 is mostly unchanged. Since we placed the equator along the incident plane, the cosine term depends only on θ to a first approximation. The spectrum is convolved with a small 1D kernel in θ (Fig. 12-a).

Rotationally-symmetric BRDFs
The mirror reparameterization of step 4 is unchanged, and the angles remain near the equator since the equator also contains the normals. We express the convolution of the mirrored incoming light field by the BRDF in the neighborhood of the outgoing direction. For this, we rotate our spherical coordinates so that the new equator contains both the mirror direction and the direction of the central outgoing ray (Fig. 13-c). Because all the angles are near the equator, the difference angles between an outgoing ray and a mirrored ray can be approximated by θo′ − θr′ and ϕo′ − ϕr′, and Eq. 16 applies.

Recap of rotations
In summary, we first need to rotate the light-field parameterization so that the central incidence plane is along one of the axes before reparameterizing from two-plane to sphere-plane (Fig. 13-b). We then need to rotate between the mirror reparameterization and the BRDF convolution to place the central outgoing direction on the equator (Fig. 13-c). Finally, we rotate again to put the outgoing plane, defined by the normal and the central outgoing direction, in the equator (not shown).
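As a numerical cross-check of the semi-convolution introduced in Section 4.5, the following sketch (our own discretization, with an arbitrary spatially-varying directional kernel ρ) compares the semi-Fourier product with the semi-convolution of the full 2D spectra:

```python
import numpy as np

# Per-point directional shading: multiply directional spectra at each x
# (semi-Fourier space), then transform along x.  The result equals the
# semi-convolution of the two full 2D spectra along the spatial axis only.
N = 32
rng = np.random.default_rng(3)
dE  = rng.standard_normal((N, N))      # dE[x, theta]
rho = rng.standard_normal((N, N))      # spatially-varying kernel rho[x, theta]

semi_fourier = np.fft.fft(dE, axis=1) * np.fft.fft(rho, axis=1)
Lo = np.fft.fft(semi_fourier, axis=0)  # full spectrum of the outgoing field

DE, R = np.fft.fft2(dE), np.fft.fft2(rho)
semi_conv = np.zeros_like(Lo)
for m in range(N):                     # convolve along Wx, never along Wtheta
    semi_conv += DE[m][None, :] * np.roll(R, m, axis=0)
print(np.allclose(Lo, semi_conv / N))  # True
```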

5.3 Anisotropies in 3D

Our extension to 3D exploits the low distortion of spherical coordinates near the equator, at the cost of additional reparameterizations to align the equator with the relevant neighborhoods. Fortunately, these reparameterizations act locally like Euclidean rotations along axes that preserve the space-angle separation.

Compared to 2D, the 3D case involves anisotropies both in the directional and spatial components. The spatial scale to account for the incident and exitant angles affects only one of the spatial dimensions, along the corresponding plane normal to the tangent frame. Curvature is usually different along the two principal directions. The directional cosine term mostly depends on θ, while rotationally-symmetric BRDFs only depend on the spherical distance between mirror and outgoing directions and are more influenced by θ except around the specular peak. These additional anisotropies make the 3D situation more complex, but locally they correspond to linear transforms and preserve the distinction and interaction between spatial and directional effects derived in 2D.

Other local shading effects such as separable BRDFs, texture mapping, and spatially-varying BRDFs can be directly extended from Section 4. While the formulas are complex and are not derived in this paper, the qualitative effects and relevant parameters remain the same as in 2D.

6 Discussion

Table 1 summarizes the building blocks that compose our frequency analysis of light transport. This variety of phenomena can be characterized using simple mathematical operators: scale, shear, convolution and multiplication. Even spatially-varying BRDFs can be handled using a semi-convolution that occurs only along the spatial dimensions.

                         Ray space            Fourier               Spectrum formula
  Transport
    Travel               shear                shear                 ℓ̂(Ωx, Ωv + dΩx)
    Visibility           multiplication       convolution           ℓ̂ ⊗ V̂
  Local geometric configuration
    Light incidence      scale spatial        scale spatial         (e^(−iΩθ θ0)/|cos θ0|) ℓ̂(−Ωx cos θ0, Ωθ)
    Outgoing angle       scale spatial        scale spatial         e^(iΩθ θ1) |cos θ1| ℓ̂o(Ωx/cos θ1, Ωθ)
    Curvature            shear                shear                 ℓ̂i(Ωx′ − kΩθ′, Ωθ′)
  Local shading
    Cosine term          multiplication       convolution           ℓ̂i′ ⊗ ĉos+
    BRDF                 convolution          multiplication        ρ̂′ d̂Er′
    Texture mapping      multiplication       convolution           T̂ ⊗ ℓ̂
    Separable BRDF       multiplication       convolution           (d̂Er′ ⊗ f̂) ρ̂′
                         then convolution     then multiplication
    Space-varying BRDF   semi-convolution     semi-convolution      d̂E′ ⊗x ρ̂
                         (angles only)        (spatial only)

Table 1: Summary of all phenomena

Some operations such as occlusion are simpler in the original ray space, while others such as shading are more natural in frequency space. Our framework allows us to express them all in a unified way. As discussed above, the 3D case essentially follows the 2D derivation, with additional reparameterization steps and anisotropies.

In practice, the notion of locality is invoked for three different reasons, whose importance depends on the context and application:
– the use of first-order Taylor series, for example for the curvature or for the tan θ ≈ θ remapping,
– the principle of uncertainty, which states that low frequencies cannot be measured on small windows (in which case big neighborhoods are desired), or in other words, that localization is not possible in space and frequency at the same time,
– most real scenes are not stationary, that is, their properties such as the presence and size of blockers vary spatially. Smaller neighborhoods might mean more homogeneous properties and more locally-pertinent conclusions.

We now discuss how our work extends previous frequency characterizations, before illustrating how it can be applied, through a proof of concept in the context of ray tracing.

6.1 Relation to previous work

Light field sampling
Our formulation of transport in free space is similar to the derivations of light-field spectra [Isaksen et al. 2000; Chai et al. 2000], and the same relationship between slope and distance is found. Our expression as a transport operator makes it easier to extend these analyses to arbitrary input signals, and in particular to non-Lambertian objects and occlusion.

Ray footprint
Approaches based on ray differentials [Shinya et al. 1987; Igehy 1999; Chen and Arvo 2000] capture the shear transforms due to transport and curvature, and our first-order Taylor expansion for curvature corresponds to the same differentials. The approach by Igehy [1999] only uses 2D derivatives by considering only ray paths that converge to the viewpoint.

Signal processing framework for local reflection
Our framework extends Ramamoorthi and Hanrahan's signal processing framework for local reflection [2004] with the following key differences:
– we take into account spatial variation of the incoming light and the curvature of the receiver; however, we characterize reflection only for a ray neighborhood,
– they parameterize the outgoing radiance by α and θ′o, while we use a more natural outgoing parameterization in the global frame, at the cost of reparameterizations,
– as discussed above, our expression of the cosine term is a convolution in the frequency domain. This cleanly separates the computation of incoming irradiance and BRDF convolution, at the cost of additional steps. It also allows us to express the cosine term for BRDFs such as Phong.

On convolution
It might come as a surprise that two phenomena that have been expressed by previous work as convolutions in the primary space, soft shadows [Soler and Sillion 1998] and the cosine term [Ramamoorthi and Hanrahan 2004], correspond in our framework to convolutions in the frequency domain. We show here that our formulation in fact extends these previous works and that the primary-space convolution is a special case. The key is that they consider functions that do not vary in one of the domains (space resp. direction). The corresponding spectra are therefore restricted to a line, since the Fourier transform of a constant is a Dirac.

Consider the cosine term for infinitely-distant light sources. The lighting varies only in angle, and its spectrum is restricted to the vertical line of directions (Fig. 14(a)). After the curvature shear, it is a 1D function on the line kΩθ = Ωx (Fig. 14(b)), which we convolve with the vertical kernel ĉos+. However, for each spatial frequency Ωx, there is only one non-zero value of the sheared function. As a result, this convolution is a so-called outer product of the two one-dimensional functions, and the result is the pairwise product d̂E′(Ωx, Ωθ) = ℓ̂i(0, Ωx/k) ĉos+(Ωx/k − Ωθ) (Fig. 14(c)). The diffuse integration then restricts the function to the line Ωθ = 0, where d̂E′(Ωx, 0) = ℓ̂i′(0, Ωx/k) ĉos+(Ωx/k).

[Figure 14 diagrams: (a) incoming light spectrum on the line of directions, (b) curvature shear, (c) convolution by ĉos+, (d) diffuse integration.]

Figure 14: Special case of diffuse shading for an infinite environment map. In this case, the convolution and multiplication are equivalent to a multiplication.
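This collapse to a product of 1D functions can be reproduced discretely. In the sketch below (our own conventions: periodic grid, curvature k = 1, signs of the shear chosen for the discretization), a spectrum confined to the line of directions is sheared, convolved vertically, and restricted to the row Ωθ = 0:

```python
import numpy as np

N = 64
rng = np.random.default_rng(4)
a = rng.standard_normal(N)             # lighting spectrum, varies only in angle
c = rng.standard_normal(N)             # directional kernel (stand-in for cos+^)

L = np.zeros((N, N)); L[0, :] = a      # spectrum on the line of directions
Ls = np.zeros((N, N))
for wt in range(N):                    # curvature shear along the spatial axis
    Ls[:, wt] = np.roll(L[:, wt], -wt)

# Circular convolution of each row with the vertical (directional) kernel c.
dE = np.real(np.fft.ifft(np.fft.fft(Ls, axis=1) * np.fft.fft(c)[None, :], axis=1))

# After diffuse restriction to the row Wtheta = 0, the result is a pairwise
# product of the two 1D functions, as in the special case discussed above.
wx = np.arange(N)
print(np.allclose(dE[:, 0], a[(-wx) % N] * c))   # True
```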

Figure 15: Bandwidth derivation for our adaptive ray tracer: (a) incoming
spectrum, (b) curvature shear (curvature $k$), (c) mirror reparameterization,
(d) BRDF low pass (cutoff $\Omega_\rho$), (e) inverse shear, (f) scale by
$1/\cos\theta_1$ (exitant angle to eye), (g) transport shear (slope $d$).

Figure 16: Criteria and sampling pattern used to render Fig. 17. Top: the
glossy reflection criterion (BRDF, curvature, normal, distance); bottom: the
harmonic average of the blocker distance (unoccluded regions versus an
occluder at distance $d'$, contributing a slope $1/d'$ in the Fourier
spectra). The sampling adapts to curvature, the viewing angle, the BRDF as
well as the harmonic average of the distance to potential blockers.

$$B = d\,\frac{2k}{\mathbf{n}\cdot\mathbf{v}}\,\Omega_\rho \tag{20}$$
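Both bandwidth criteria, Eqs. (20) and (21), reduce to a few arithmetic operations per image region. A minimal Python sketch follows; all input values are illustrative, and the BRDF cutoff $\Omega_\rho$ is assumed given (in our experiments it was set manually from the BRDF exponent):

```python
def bandwidth_unoccluded(d, k, n_dot_v, omega_rho):
    """Eq. (20): B = d * (2k / (n.v)) * omega_rho.
    d: distance to the eye, k: curvature, n_dot_v: cosine of the
    viewing angle, omega_rho: angular cutoff of the glossy BRDF."""
    return d * (2.0 * k / n_dot_v) * omega_rho

def bandwidth_occluded(d, k, n_dot_v, omega_rho, blocker_dists):
    """Eq. (21): B' = d * (2k/(n.v) + 1/d') * omega_rho,
    where 1/d' is the harmonic average of the blocker distances
    (mean of the inverse distances)."""
    inv_d = sum(1.0 / dp for dp in blocker_dists) / len(blocker_dists)
    return d * (2.0 * k / n_dot_v + inv_d) * omega_rho

# Illustrative values: flat, frontal, far-from-blocker regions get a
# low predicted bandwidth, hence few shading samples.
B  = bandwidth_unoccluded(d=10.0, k=0.5, n_dot_v=1.0, omega_rho=2.0)
Bp = bandwidth_occluded(d=10.0, k=0.5, n_dot_v=1.0, omega_rho=2.0,
                        blocker_dists=[2.0, 4.0])
```

Note that nearby blockers (small $d'$) and grazing angles (small $\mathbf{n}\cdot\mathbf{v}$) both increase the predicted bandwidth, and hence the local sampling rate.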
restriction to the horizontal line turned into a simple product of 1D
functions, which corresponds to a convolution in the primary space.

The case of soft shadows studied by Soler and Sillion [1998] is
similar: the emitter is diffuse and has a spectrum restricted to
$\Omega_v = 0$, and the blockers are planar and have the same restrictions.
The transport from source to occluders results in slanted lines in
frequency space that are convolved together (Fig. 6, step 3). Our
framework extends these two cases to arbitrary cases where the
spectra are not restricted to lines.

6.2 Sampling rate for glossy rendering

We show how our framework can be used to drive image-space sampling
in ray tracing. In particular, we illustrate how our framework can be
used to derive sampling rates for algorithms that do not need to
perform computations in the Fourier domain. While we demonstrate a
working implementation, we emphasize that this application is meant
only as a proof of concept and that further development is necessary
to make it fully general, as we discuss below.

We are motivated by the rendering of glossy materials, which despite
effective recent developments [Lawrence et al. 2004] remains
computationally expensive. We observe, however, that glossy objects
appear blurry, so it should be possible to reduce the image sampling
rate. Our framework permits a quantitative expression of the required
sampling rate.

Unoccluded environment map.  We first consider the case of
environment-map rendering without occlusion. The incoming light field
has only directional content (Fig. 15), and the light incidence angle
(Table 1, row 1) has no effect. The shear of curvature (b) results in
a line of slope $k$ that gets convolved with the narrow cosine kernel
$\widehat{\cos^+}$, which we neglect. After mirror reparameterization (c), a
glossy BRDF band-limits this signal (d), which we approximate by a
cutoff at an angular frequency $\Omega_\rho$. This cutoff depends on the BRDF
of the object visible in a given region of the image. The upper
endpoint of the resulting segment is at coordinate $(-k\Omega_\rho, \Omega_\rho)$. We
apply the inverse shear (step e) and the scale by
$1/\cos\theta_1 = 1/(\mathbf{n}\cdot\mathbf{v})$, where $\mathbf{n}$ is the normal and $\mathbf{v}$ the unit
vector towards the viewpoint. We obtain a maximum frequency of
$\left(-\frac{2k}{\cos\theta_1}\Omega_\rho,\ \Omega_\rho\right)$ for the light leaving an object in the
direction of the viewpoint (Fig. 15, step f). A transport shear with
distance $d$ yields a bound of
$\left(-\frac{2k}{\cos\theta_1}\Omega_\rho,\ \Omega_\rho - d\,\frac{2k}{\cos\theta_1}\Omega_\rho\right)$.
A view computation corresponds to the restriction of the function to
the directional domain, and if we assume $d \gg 1$, we obtain the
approximate bound on the directional bandwidth for a region of the
image given in Eq. (20).

This corresponds to the difficulty in appropriately sampling curved
objects at grazing angles, as discussed in Section 4.2. In addition,
distant objects are minified and the apparent curvature is increased.
In 3D, the curvature and normal angle involve anisotropy, and in
practice, we use the absolute value of the larger principal curvature.
However, our implementation computes this criterion directly in screen
space with a finite-difference approximation to the curvature. As a
result, the effect of the normal angle, the distance, and the
anisotropies are included for free; see Fig. 16 for a visualization of
this criterion. The faceted look is due to the finite-difference
curvature and Phong normal interpolation. The BRDF bandwidth for the
two dinosaurs and environments was approximated manually based on the
BRDF exponent.

Occlusion.  The above derivation assumes full visibility. We integrate
the effect of the blockers using a worst-case assumption on the
blocker spectrum: we consider that it has content at all frequencies.
Based on the effect of the transport shear, we approximate the
spectrum due to a blocker at distance $d'$ by a line of slope $1/d'$.
Going through the same steps, we obtain an approximate bound of:

$$B' = d\left(\frac{2k}{\mathbf{n}\cdot\mathbf{v}} + \frac{1}{d'}\right)\Omega_\rho \tag{21}$$

To evaluate this criterion, we use the harmonic average of the
distance to occluders. This information is computed by sampling a
hundred rays for a small set of visible points in the image, in
practice 20,000. The criterion is reconstructed over the image using
the same reconstruction algorithm as for the final image, which we
describe shortly. The blocker criterion is shown in Fig. 16. It is
similar to Ward et al.'s criterion for irradiance caching [Ward et al.
1988], but expressing it in a unified frequency framework allows us to
combine it with other bandwidth considerations such as BRDF roughness.

Algorithm and image reconstruction.  Our proof of concept computes
visibility using four samples per pixel, but uses aggressively sparse
samples for shading: on average, 0.05 samples per pixel. We use an
edge-preserving reconstruction that exploits the higher-resolution
depth and normal to reconstruct the shading, in the spirit of McCool's
filtering of Monte-Carlo ray-tracing outputs [1999] but based on a
bilateral filter [Tomasi and Manduchi 1998]. As demonstrated in
Fig. 17, this results in a smooth reconstruction where needed and in
sharp silhouettes. The spatial width of the bilateral filter is scaled
according to the bandwidth prediction. Given a bandwidth map, we use
the blue-noise method by Ostromoukhov et al. [2004] to generate a set
of image samples (Fig. 17, right). In summary, our algorithm is as
follows:

Figure 17: Scene rendered without adaptive sampling (left, "Uniform
sampling") and with it (right, "Using our bandwidth prediction"), based on
our prediction of frequency content. Only 20,000 shading samples were used
to compute this 800 × 500 image. Note how our approach better captures the
sharp detail in the shiny dinosaur's head and feet. The criteria and
sampling are shown in Fig. 16. Images rendered using PBRT [Pharr and
Humphreys 2004].
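The edge-preserving reconstruction can be sketched in 1D as a depth-guided (cross) bilateral filter; the Gaussian widths below are illustrative, the spatial one standing for the bandwidth-scaled filter width:

```python
import math

def cross_bilateral_1d(sparse, depth, sigma_s, sigma_d):
    """Reconstruct a dense signal from sparse shading samples.
    sparse: list of (index, value) samples; depth: dense guide channel.
    Weights combine spatial distance and depth difference, so the
    filter smooths within a surface but stops at silhouettes."""
    out = []
    for i, zi in enumerate(depth):
        wsum = vsum = 0.0
        for j, v in sparse:
            w = (math.exp(-((i - j) ** 2) / (2 * sigma_s ** 2))
                 * math.exp(-((zi - depth[j]) ** 2) / (2 * sigma_d ** 2)))
            wsum += w
            vsum += w * v
        out.append(vsum / wsum if wsum > 0 else 0.0)
    return out

# Two surfaces separated by a depth edge: one shading sample per
# surface is enough, and values do not bleed across the silhouette.
depth = [1.0] * 5 + [5.0] * 5
shade = cross_bilateral_1d([(1, 0.2), (8, 0.9)], depth,
                           sigma_s=4.0, sigma_d=0.5)
```

The full reconstruction works the same way in 2D, with the normal buffer as an additional guide channel.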
Compute visibility at full resolution
Use finite differences for curvature criterion
Compute harmonic blocker distance for sparse samples
Perform bilateral reconstruction
Compute B' based on blocker and curvature
Generate blue-noise sampling based on B'
Compute shading for samples
Perform bilateral reconstruction

Observe how our sampling density is increased in areas of high
curvature, grazing angles, and near occluders. The environment map
casts particularly soft shadows, and note how the high-frequency
detail on the nose of the foreground dinosaur is well captured,
especially given that the shading sampling is equivalent to a
200 × 100 resolution image.

Although these results are encouraging, the approach needs improvement
in several areas. The visibility criterion in particular should take
into account the light source intensity in a directional neighborhood
to better weight the inverse distances. Even so, the simple method
outlined above illustrates how knowledge of the modifications of the
frequency content through light transport can be exploited to drive
rendering algorithms. In particular, similar derivations are promising
for precomputed radiance transfer [Sloan et al. 2002] in order to
relate spatial and angular sampling.

7 Conclusions and future work

We have presented a comprehensive framework for the description of
radiance in frequency space, through operations of light transport. By
studying the local light field around the direction of propagation, we
can characterize the effect of travel in free space, occlusion, and
reflection in terms of frequency content both in space and angle. In
addition to the theoretical insight offered by our analysis, we have
shown that practical conclusions can be drawn from a frequency
analysis, without explicitly computing any Fourier transforms, by
driving the sampling density of a ray tracer according to frequency
predictions.

Future work.  On the theory side, we are working on the analysis of
additional local shading effects such as refraction, bump mapping, and
local shadowing [Ramamoorthi et al. 2004]. We hope to study the
frequency cutoff for micro-, meso-, and macro-geometry effects [Becker
and Max 1993]. The study of participating media is promising given the
ability of Fourier analysis to model differential equations. The
spectral analysis of light interaction in a full scene is another
challenging topic. Finally, the addition of the time dimension is a
natural way to tackle effects such as motion blur.

We are excited by the wealth of potential applications encompassed by
our framework. In rendering, we believe that many traditional
algorithms can be cast in this framework in order to derive
theoretical justification, but also to allow extensions to more
general cases (such as from diffuse to glossy). Our preliminary study
of sampling rates in ray tracing is promising, and we want to develop
new algorithms and data structures to predict local bandwidth,
especially for occlusion effects. Precomputed radiance transfer is
another direct application of our work.

Our analysis extends previous work in inverse rendering [Ramamoorthi
and Hanrahan 2001b; Basri and Jacobs 2003] and we are working on
applications to inverse rendering with close-range sources, shape from
reflection, and depth from defocus.

Acknowledgments

We thank Jaakko Lehtinen, the reviewers of the Artis and MIT teams, as
well as the SIGGRAPH reviewers for insightful feedback. This work was
supported by an NSF CAREER award 0447561 "Transient Signal Processing
for Realistic Imagery," an NSF CISE Research Infrastructure Award
(EIA9802220), an ASEE National Defense Science and Engineering
Graduate fellowship, the European Union IST-2001-34744 "RealReflect"
project, an INRIA équipe associée, and the MIT-France program.

References

Annen, T., Kautz, J., Durand, F., and Seidel, H.-P. 2004. Spherical
harmonic gradients for mid-range illumination. In Rendering Techniques
2004 (Proc. EG Symposium on Rendering 2004).

Arvo, J. 1994. The irradiance Jacobian for partially occluded
polyhedral sources. In Computer Graphics Proceedings, Annual
Conference Series, ACM SIGGRAPH, 343–350.

Basri, R., and Jacobs, D. 2003. Lambertian reflectance and linear
subspaces. IEEE Trans. Pattern Anal. Mach. Intell. 25, 2.

Becker, B. G., and Max, N. L. 1993. Smooth transitions between bump
rendering algorithms. In Computer Graphics Proceedings, Annual
Conference Series, ACM SIGGRAPH, 183–190.

Bolin, M. R., and Meyer, G. W. 1995. A frequency based ray tracer. In
Computer Graphics Proceedings, Annual Conference Series, ACM
SIGGRAPH, 409–418.

Bolin, M. R., and Meyer, G. W. 1998. A perceptually based adaptive
sampling algorithm. In Computer Graphics Proceedings, Annual
Conference Series, ACM SIGGRAPH, 299–309.

Camahort, E., Lerios, A., and Fussell, D. 1998. Uniformly sampled
light fields. In Rendering Techniques '98 (Proc. of EG Workshop on
Rendering '98), Eurographics, 117–130.
C, J.-X., C, S.-C., S, H.-Y.,  T, X. 2000. Plenoptic O, V., D, C.,  J, P.-M. 2004. Fast
sampling. In Computer Graphics Proceedings, Annual Confer- hierarchical importance sampling with blue noise properties.
ence Series, ACM SIGGRAPH, 307–318. ACM Transactions on Graphics (Proc. SIGGRAPH 2004) 23, 3
(Aug.), 488–495.
C, M.,  A, J. 2000. Theory and application of specular
path perturbation. ACM Trans. Graph. 19, 4, 246–278. P, A. P. 1987. A new sense for depth of field. IEEE Trans-
actions on Pattern Analysis and Machine Intelligence 9, 4 (July).
D C, M. 1976. Differential Geometry of Curves and Sur-
faces. Prentice Hall. P, M.,  H, G. 2004. Physically Based Rendering:
From Theory to Implementation. Morgan Kaufmann.
F, J. A., S, P., P, S. N.,  G,
D. P. 1997. A model of visual masking for computer graphics. R, R.,  H, P. 2001. An efficient represen-
In Computer Graphics Proceedings, Annual Conference Series, tation for irradiance environment maps. In Computer Graphics
ACM SIGGRAPH, 143–152. Proceedings, Annual Conference Series, ACM SIGGRAPH.

F, D., S, D.,  B, R. 2004. Accuracy of spher- R, R.,  H, P. 2001. A signal-processing
ical harmonic approximations for images of Lambertian objects framework for inverse rendering. In Computer Graphics Pro-
under far and near lighting. In ECCV 2004, European Confer- ceedings, Annual Conference Series, ACM SIGGRAPH.
ence on Computer Vision, 574–587.
R, R.,  H, P. 2002. Frequency space envi-
G, J. W. 1996. Introduction To Fourier Optics. McGraw- ronment map rendering. ACM Transactions on Graphics (Proc.
Hill. SIGGRAPH 2002) 21, 3, 517–526.

G, S. J., S̈, P., C, M. F.,  H, P. 1993. R, R.,  H, P. 2004. A signal-processing
Wavelet radiosity. In Computer Graphics Proceedings, Annual framework for reflection. ACM Transactions on Graphics 23, 4.
Conference Series, ACM SIGGRAPH, 221–230. R, R., K, M.,  B, P. 2004. A
Fourier theory for cast shadows. In ECCV 2004, European Con-
H, M. 1994. Holographic stereograms as discrete imaging
ference on Computer Vision, 146–162.
systems. In SPIE Proc. Vol. 2176: Practical Holography VIII,
S. Benton, Ed., SPIE, 73–84. S, M., T, T.,  N, S. 1987. Principles and appli-
cations of pencil tracing. Computer Graphics (Proc. SIGGRAPH
H, P. 1989. Fundamentals of Texture Mapping and Image ’87) 21, 4, 45–54.
Warping. Master’s thesis, University of California at Berkeley,
Computer Science Division. S, F.,  D, G. 1995. Feature-based control of vis-
ibility error: A multi-resolution clustering algorithm for global
H, N.,  S, F. X. 1998. An exhaustive error- illumination. In Computer Graphics Proceedings, Annual Con-
bounding algorithm for hierarchical radiosity. Computer Graph- ference Series, ACM SIGGRAPH, 145–152.
ics Forum 17, 4.
S, P.-P., K, J.,  S, J. 2002. Precomputed radi-
I, H. 1999. Tracing ray differentials. In Computer Graphics ance transfer for real-time rendering in dynamic, low-frequency
Proceedings, Annual Conference Series, ACM SIGGRAPH. lighting environments. ACM Trans. on Graphics 21, 3, 527–536.
I, A., MM, L.,  G, S. J. 2000. Dynamically S, C.,  S, F. X. 1998. Fast calculation of soft shadow
reparameterized light fields. In Computer Graphics Proceedings, textures using convolution. In Computer Graphics Proceedings,
Annual Conference Series, ACM SIGGRAPH, 297–306. Annual Conference Series, ACM SIGGRAPH, 321–332.
K, A. 2001. Hierarchical monte carlo image synthesis. Math- S, J., Y, J., G, S. J.,  MM, L. 2003. A new
ematics and Computers in Simulation 55, 1–3 (Feb.), 79–92. reconstruction filter for undersampled light fields. In Proc. EG
Symposium on Rendering 2003, Eurographics, 150–156.
L, J., R, S.,  R, R. 2004. Ef-
ficient BRDF importance sampling using a factored representa- T, C.,  M, R. 1998. Bilateral filtering for gray
tion. ACM Transactions on Graphics (Proc. SIGGRAPH 2004) and color images. In Proc. IEEE International Conference on
23, 3 (Aug.), 496–505. Computer Vision, IEEE, 836–846.

L, H. P. A., K, J., G, M., H, W.,  S- W, G. J.,  H, P. 1992. Irradiance gradients. In Proc.
, H.-P. 2001. Image-based reconstruction of spatially varying of EG Workshop on Rendering ’92, Eurographics, 85–98.
materials. In Rendering Techniques ’01 (Proc. EG Workshop on
Rendering 2001), Eurographics, 104–115. W, G. J., R, F. M.,  C, R. D. 1988. A ray
tracing solution for diffuse interreflection. Computer Graphics
M, J.,  R, R. 1997. Computing local surface ori- (Proc. SIGGRAPH ’88) 22, 4 (Aug.), 85 – 92.
entation and shape from texture for curved surfaces. Interna-
tional Journal of Computer Vision 23, 2, 149–168. W, D. N., A, D. I., A, K., C, B., D, T.,
S, D. H.,  S, W. 2000. Surface light fields for
MC, M. D. 1999. Anisotropic diffusion for monte carlo noise 3D photography. In Computer Graphics Proceedings, Annual
reduction. ACM Transactions on Graphics 18, 2, 171–194. Conference Series, ACM SIGGRAPH, 287–296.

M, K. 1998. The visible differences predictor: applica-


tions to global illumination problems. In Rendering Techniques
’98 (Proc. EG Workshop on Rendering ’98), Eurographics.

1126
4. Using programmable graphics cards

We have observed that several lighting effects, such as shadows and specular
reflections, are both essential to the visual quality of the simulation and
expensive to compute within a global illumination simulation. On the other
hand, modern programmable graphics cards are capable of performing
substantial computations for every pixel of the displayed image. It thus
becomes possible to offload part of the computation from the CPU by
entrusting it to the graphics card.

In this chapter, we present our work on the use of graphics cards for
simulating lighting effects such as soft shadows and specular reflections.
This work increases the visual realism of the lighting simulation, while
freeing computation time for the simulation of other effects.

4.1 Introduction

Lighting simulation requires substantial computational resources, for both
the processor and the memory. The visual quality of the result depends
heavily on certain high-frequency effects, such as shadow boundaries and
specular reflections. Our experiments have shown that these same effects are
also the most expensive to compute. In some scenes, merely computing the
shadow boundaries due to direct lighting accounts for 90% of the total
computation time for global illumination (direct and indirect).

This precision is required mostly for the visual appearance of the scene,
not for the numerical accuracy of the computations. Indirect lighting
computations can use a simplified version of the direct lighting, with
coarsely modeled shadow boundaries, without any loss of accuracy. Several
lighting simulation techniques, such as hierarchical radiosity and Photon
Mapping, already exploit this property.

Returning to our study of the frequency properties of the lighting function
[5], we see that high-frequency effects occur mainly:
– in the presence of specular BRDFs,
– at shadow boundaries,
– when two objects are close to each other.

A characteristic common to all these cases is that they do not require
complete information about the lighting in the whole scene: each of them
involves only a small number of objects at a time. Thus, to determine a
shadow boundary, one only needs to know the positions of the light source,
the blockers and the receiver. Although these phenomena are not purely
local, they are not, strictly speaking, global phenomena either; one could
call them semi-local phenomena.


Conversely, the truly global phenomena, which involve the lighting over the
entire scene, imply a rather diffuse BRDF and are mostly low-frequency
phenomena, for which a coarse spatial sampling can be sufficient.

On one side, we have a set of phenomena that are important mostly for their
visual effect, that therefore only need to be computed for the displayed
image, and that involve only a small number of scene elements. On the other
side, we have graphics cards whose capabilities have expanded and which can
run powerful programs, in parallel, for every pixel of the screen. Their
main limitations (computations are restricted to the displayed image, and
each program only has a small amount of memory) actually match our needs
here.

It is therefore natural to let programmable graphics cards carry the
expensive but visually important part of the lighting simulation, while
keeping the central processor for the global computations, which require
access to information about the whole scene.

Note that graphics cards can also be used to accelerate global illumination
itself; in our own work, we did so to accelerate visibility queries using a
hemicube [9], to compute umbra and penumbra boundaries efficiently [13], to
quickly compute the visible percentage of an object with occlusion queries
[13], and to display non-linear lighting functions [13]. That work is not
described here.

4.2 Computing soft shadows

Shadows are an essential element in the perception of a virtual scene. They
provide information about the relative positions of objects, their geometry,
and their relief. One usually distinguishes hard shadows, cast by a point
light source, from soft shadows, cast by an extended source (see
Figure 3.2).

Computing a hard shadow amounts, for each pixel, to solving a simple
visibility problem between two points. Computing a soft shadow, in contrast,
requires knowing, for each pixel, the fraction of the light source that is
visible from that point; this is a point-to-surface visibility problem,
which is much more complex. Several simple methods exist for computing hard
shadows in real time [1, 2], whereas computing soft shadows is still an open
problem.

Within the CYBER project, led by Jean-Marc Hasenfratz, and with research
engineer Marc Lapierre, we carried out an exhaustive survey of real-time
soft shadow algorithms [7] (see p. 184). This survey highlighted several
limitations of existing algorithms:
– For all these algorithms, the computational cost is directly tied to the
size of the penumbra region. Yet the wider the penumbra, the more it
corresponds to low-frequency, and thus less visible, phenomena. We reach the
paradoxical result that the less visible a soft shadow is, the more
expensive it is to compute.
– To compute the penumbra, one needs to know which portion of the blockers
partially masks the light source. Since this information is as hard to
compute as the soft shadow itself, real-time methods rely on an
approximation, generally based on the blockers visible from a point at the
center of the source. Only the soft shadow cast by that part of the blockers
is then computed. This approximation limits the quality of the soft shadows,
particularly when the light source is very large relative to the blockers.
– Several algorithms are based on an analysis of the geometric model of the
objects. The computational cost is then proportional to the geometric
complexity of the blockers. The soft shadow is generally a low-frequency
phenomenon, whose visual complexity is much lower than the geometric
complexity of the blockers; this is wasted extra work, tied to the
complexity of the objects.

1. Lance Williams. "Casting Curved Shadows on Curved Surfaces". Computer
Graphics (Proc. of SIGGRAPH '78), 12(3):270–274, 1978.
2. Franklin C. Crow. "Shadow Algorithms for Computer Graphics". Computer
Graphics (Proc. of SIGGRAPH '77), 11(2):242–248, 1977.

Figure 4.1 – Our algorithm computes soft shadows in real time (left) by
replacing the blockers with a discretized version (right) computed from the
shadow map. These images are computed at 84 Hz.

Building on the knowledge accumulated in this survey, as well as on the
experience gained in the CYBER project, we developed, with M2R student
Lionel Atty (co-advised with Jean-Marc Hasenfratz), a new soft shadow
algorithm [2]. This algorithm discretizes the blockers into a depth map and
computes the soft shadow cast by the discretized blocker (see Figure 4.1).
Although it does not address every issue raised by our survey, this
algorithm has the advantage of being very fast and, above all, independent
of the geometric complexity of the blockers. Moreover, the wider the
penumbra, the fewer pixels are needed to discretize the blockers: we thus
manage to spend less computation time on the soft shadows cast by very large
sources.

The key point of our algorithm is the use of a discretized version of the
blockers to compute the soft shadow. This approach is likely to spread in
future work: by replacing a complex geometric model with a simpler,
automatically computed discrete version, it should be possible to accelerate
other algorithms as well.
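The idea of computing a soft shadow from a discretized blocker can be illustrated with a minimal CPU sketch: the area source is sampled regularly, and the discretized blocker is represented as a set of opaque axis-aligned cells. All geometry and sample counts below are illustrative; the published algorithm instead accumulates, on the GPU, the occlusion caused by each cell of the shadow map.

```python
def segment_hits_box(p0, p1, bmin, bmax):
    """Slab test: does the segment p0 -> p1 intersect the AABB [bmin, bmax]?"""
    t0, t1 = 0.0, 1.0
    for a in range(3):
        d = p1[a] - p0[a]
        if abs(d) < 1e-12:
            if p0[a] < bmin[a] or p0[a] > bmax[a]:
                return False
        else:
            ta = (bmin[a] - p0[a]) / d
            tb = (bmax[a] - p0[a]) / d
            if ta > tb:
                ta, tb = tb, ta
            t0, t1 = max(t0, ta), min(t1, tb)
            if t0 > t1:
                return False
    return True

def soft_shadow(receiver, source_pts, boxes):
    """Fraction of the area source visible from the receiver point."""
    visible = sum(1 for s in source_pts
                  if not any(segment_hits_box(receiver, s, bmin, bmax)
                             for bmin, bmax in boxes))
    return visible / len(source_pts)

# Square source at z = 2 (4 samples); one opaque cell blocks the x < 0 half.
src = [(x, y, 2.0) for x in (-0.5, 0.5) for y in (-0.5, 0.5)]
box = [((-1.0, -1.0, 0.9), (0.0, 1.0, 1.1))]
print(soft_shadow((0.0, 0.0, 0.0), src, box))  # → 0.5
```

The cost depends only on the number of cells in the discretization, not on the geometric complexity of the original blocker, which is exactly the property exploited by the algorithm.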

4.3 Precomputed ambient occlusion

Ambient occlusion is used to modulate the ambient lighting as a function of
nearby blockers. It is defined either as a constant (the percentage of
directions that are blocked by nearby obstacles), or as a more complex
function, for instance a conical lobe, defining both the occlusion
percentage and the average direction of the blockers (see Figure 4.2(a)).

(a) Ambient occlusion can be defined as a cone (d, α). (b) We place a 3D
grid around the object; at the center of each cell, we compute the ambient
occlusion. (c) At render time, these values are used to shade the
neighboring objects. This scene is displayed at 800 Hz.

Figure 4.2 – Precomputed ambient occlusion

For animated scenes, recent research [3, 4] has focused on storing an
ambient occlusion field, attached to a moving object, that influences the
neighboring objects. For compact storage, these works project the ambient
occlusion field onto a space of simple but well-suited functions (for
instance rational fractions of the distance to the center) and store the
coefficients in a 2D cube map, indexed by the direction relative to the
object's center. Storage is thus O(n²), at the cost of extra processing of
the data, both during precomputation and at render time.

In collaboration with Mattias Malmer and Fredrik Malmer (Syndicate Ent. AB)
and Ulf Assarsson (Chalmers University of Technology), we showed that it is
more effective to store these data in raw form in a 3D grid, without any
pre-processing [1] (see p. 224). At render time, the data are used directly.
In theory, the drawback of this method is its O(n³) storage. In practice,
since ambient occlusion is a very slowly varying phenomenon, small values of
n are used; we found that n = 32 was sufficient for all our scenes. For such
values of n, the cost of the raw O(n³) storage is comparable to that of the
O(n²) storage (about 100 KB per blocker), because of the larger number of
coefficients per cell in the latter.

Beyond its obvious practical interest, this result is also interesting
scientifically: for some phenomena, a raw representation can be more
attractive than an elaborate one. This mostly holds for low-frequency
phenomena, such as ambient occlusion. It will be interesting to study such
3D-grid storage for other lighting phenomena.
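The raw-grid storage can be sketched as follows: occlusion values precomputed at the nodes of an n³ grid around the object, fetched with trilinear interpolation at render time. The grid size n = 4 and the toy occlusion field below are illustrative (the actual method uses n = 32 and the GPU's hardware-filtered 3D texture lookups):

```python
def trilinear(grid, n, x, y, z):
    """Look up grid[i][j][k] (values at the nodes of a unit cube sampled
    n times per axis) with trilinear interpolation; x, y, z in [0, 1]."""
    def locate(u):
        f = u * (n - 1)
        i = min(int(f), n - 2)  # clamp so i+1 stays in range
        return i, f - i
    (i, fx), (j, fy), (k, fz) = locate(x), locate(y), locate(z)
    acc = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                w = ((fx if di else 1 - fx)
                     * (fy if dj else 1 - fy)
                     * (fz if dk else 1 - fz))
                acc += w * grid[i + di][j + dj][k + dk]
    return acc

# Toy field: occlusion increases linearly with height z.
n = 4
grid = [[[z / (n - 1) for z in range(n)] for _ in range(n)] for _ in range(n)]
val = trilinear(grid, n, 0.2, 0.7, 0.5)  # ~0.5: the linear field is reproduced
```

Storage is one value per cell, i.e. n³ floats (or bytes) per object, with no projection or reconstruction step on either side of the pipeline.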

4.4 Computing specular reflections

Specular reflection on an object is a high-frequency phenomenon that
contributes to the realistic appearance of the scene, while also conveying
information about the object's material and its proximity to the reflected
objects. These specular reflections are generally simulated using an
environment map of the scene, but this technique assumes that the distance
between the reflector and the reflected object is infinite.

3. Kun Zhou, Yaohua Hu, Steve Lin, Baining Guo and Heung-Yeung Shum.
"Precomputed Shadow Fields for Dynamic Scenes". ACM Transactions on Graphics
(Proc. of SIGGRAPH 2005), 24(3), 2005.
4. Janne Kontkanen and Samuli Laine. "Ambient Occlusion Fields". In
Symposium on Interactive 3D Graphics and Games, pp. 41–48, 2005.

(a) Our algorithm. (b) Reference (ray tracing). (c) Environment mapping.

Figure 4.3 – Specular reflections computed with our algorithm

With PhD student David Roger (co-advised with François Sillion), we
developed a method for computing specular reflections in real time that
works even when the reflector and the reflected object are in contact [3]
(see p. 238). This technique computes (on the graphics card) the reflection
of the scene's vertices, then interpolates between the reflected vertex
positions. The main advantage of the method is its robustness, including in
difficult configurations (see Figure 4.3).

Our method has linear complexity in the number of objects in the scene and
runs in real time for scenes of up to 20,000 polygons. Its main drawback
lies in the linear interpolation between the reflected vertex positions,
performed by the graphics card, which can produce visible errors in the
computed reflection.
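For the special case of a planar reflector, the per-vertex reflection step reduces to mirroring each scene vertex across the mirror plane, after which rasterization interpolates between the reflected positions. This sketch only illustrates that step; the published method handles curved reflectors, where finding the reflected position of a vertex is the difficult part:

```python
def reflect_vertex(p, q, n):
    """Mirror point p across the plane through point q with unit normal n."""
    d = sum((pi - qi) * ni for pi, qi, ni in zip(p, q, n))  # signed distance
    return tuple(pi - 2.0 * d * ni for pi, ni in zip(p, n))

# Mirror plane z = 0: a vertex at height 3 reflects to height -3.
print(reflect_vertex((1.0, 2.0, 3.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)))
# → (1.0, 2.0, -3.0)
```

Because each vertex is reflected independently, the cost is linear in scene size, and the accuracy of the interpolated reflection between vertices is what limits the image quality.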

4.5 Indirect lighting

With PhD students Emmanuel Turquin (advised by François Sillion) and Janne
Kontkanen (Helsinki University of Technology), we investigated the
simulation of indirect lighting in real time [12] (see Section 4.7.6). Our
method rests on three key elements:
– A global transport operator, which represents the indirect lighting over
the whole scene as a function of the direct lighting. This operator is
computed from the local transport operator, which represents one bounce of
light in the scene, by successive multiplications.
– To compute these operators efficiently, we expressed them in a dedicated
wavelet basis separating the angular and spatial dimensions.
– Finally, we developed a method to compute directly the projection of the
direct lighting onto our wavelet basis, so that it can be fed as input to
the global transport operator.

Most of the indirect lighting computation is handled by the CPU, while the
graphics card performs an accurate computation of the direct lighting. By
combining the two results, we obtain the global illumination of the scene at
interactive rates (see Figure 4.4).
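The relation between the local and global transport operators can be sketched with plain matrices: the global operator is obtained by successive applications of the local (one-bounce) operator, i.e. a truncated Neumann series. The 2×2 operator below is a toy example; in the actual method both operators live in the wavelet basis described above.

```python
def mat_vec(M, v):
    """Apply a dense matrix to a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def indirect_lighting(L, direct, bounces):
    """Accumulate L*direct + L^2*direct + ... : each multiplication by
    the local operator L adds one bounce of indirect light."""
    indirect = [0.0] * len(direct)
    cur = direct
    for _ in range(bounces):
        cur = mat_vec(L, cur)
        indirect = [a + b for a, b in zip(indirect, cur)]
    return indirect

# Two mutually facing patches, each reflecting 50% of what it receives:
L = [[0.0, 0.5],
     [0.5, 0.0]]
ind = indirect_lighting(L, direct=[1.0, 0.0], bounces=8)
glob = [d + i for d, i in zip([1.0, 0.0], ind)]  # direct + indirect
```

The truncated series converges to the fixed point (1/3, 2/3) of indirect light here; precomputing the summed operator once is what allows the indirect term to be re-evaluated interactively when the direct lighting changes.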

(a) Test scene. (b) Direct lighting. (c) Indirect lighting computed by our
algorithm. (d) Resulting global illumination.

Figure 4.4 – Our algorithm for the interactive computation of global
illumination. These images are computed at 15 Hz.

4.6 Discussion

In this chapter, we have presented our work on the simulation of lighting
effects using programmable graphics cards. We have seen that some effects
that are important for the realism of the lighting, such as soft shadows and
specular reflections, can be simulated at speeds compatible with
interactivity.

Programmable graphics cards open new directions thanks to their
computational power, but this power has its limits. Because of the card's
architecture, it is more practical to use it for localized computations that
do not require global access to the scene. Graphics cards are thus well
suited to computing the lighting at a point, to shadow computations (even
for soft shadows), and to specular reflections.

In the same way, we have seen that graphics cards can be used for lighting
effects that are attached to an object and only affect very close objects,
such as ambient occlusion.

Thus, the phenomena for which graphics cards are best suited are also mostly
high-frequency phenomena, according to the frequency analysis of the
previous chapter.

This work on the use of graphics cards for lighting simulation is also our
most promising in terms of industrial collaborations and international
cooperation.

4.7 Articles
4.7.1 List of articles
– A survey of real-time soft shadows algorithms (CGF 2003)
– Soft shadow maps: efficient sampling of light source visibility (CGF 2006)
– Fast Precomputed Ambient Occlusion for Proximity Shadows (JGT 2006)
– Accurate specular reflections in real-time (EG 2006)
– Wavelet radiance transport for interactive indirect lighting (EGSR 2006)

4.7.2 A survey of real-time soft shadows algorithms (CGF 2003)

Authors: Jean-Marc Hasenfratz, Marc Lapierre, Nicolas Holzschuch and François X. Sillion.
Journal: Computer Graphics Forum, vol. 22, no. 4 (a preliminary version was published as a State-of-the-Art Report at the Eurographics 2003 conference).
Date: December 2003.
Volume 22 (2003), Number 4 pp. 753–774

A Survey of Real-time Soft Shadows Algorithms

J.-M. Hasenfratz† , M. Lapierre‡ , N. Holzschuch§ and F. Sillion§

Artis GRAVIR/IMAG-INRIA∗∗

Abstract

Recent advances in GPU technology have produced a shift in focus for real-time rendering applications, whereby
improvements in image quality are sought in addition to raw polygon display performance. Rendering effects
such as antialiasing, motion blur and shadow casting are becoming commonplace and will likely be considered
indispensable in the near future. The last complete and famous survey on shadow algorithms — by Woo et al.52 in
1990 — has to be updated in particular in view of recent improvements in graphics hardware, which make new
algorithms possible. This paper covers all current methods for real-time shadow rendering, without venturing into
slower, high quality techniques based on ray casting or radiosity. Shadows are useful for a variety of reasons: first,
they help understand relative object placement in a 3D scene by providing visual cues. Second, they dramatically
improve image realism and allow the creation of complex lighting ambiances. Depending on the application, the
emphasis is placed on a guaranteed framerate, or on the visual quality of the shadows including penumbra effects
or “soft shadows”. Obviously no single method can render physically correct soft shadows in real time for any
dynamic scene! However our survey aims at providing an exhaustive study allowing a programmer to choose the
best compromise for his/her needs. In particular we discuss the advantages, limitations, rendering quality and
cost of each algorithm. Recommendations are included based on simple characteristics of the application such
as static/moving lights, single or multiple light sources, static/dynamic geometry, geometric complexity, directed
or omnidirectional lights, etc. Finally we indicate which methods can efficiently exploit the most recent graphics
hardware facilities.
Categories and Subject Descriptors (according to ACM CCS): I.3.7 [Computer Graphics]: Three-Dimensional
Graphics and Realism – Color, shading, shadowing, and texture, I.3.1 [Computer Graphics]: Hardware Archi-
tecture – Graphics processors, I.3.3 [Computer Graphics]: Picture/Image Generation – Bitmap and framebuffer
operations
Keywords: shadow algorithms, soft shadows, real-time, shadow mapping, shadow volume algorithm.

1. Introduction

Cast shadows are crucial for the human perception of the 3D world. Probably the first thorough analysis of shadows was Leonardo Da Vinci's48 (see Figure 1), focusing on paintings and static images. Also of note is the work of Lambert35 who described the geometry underlying cast shadows (see Figure 1), and more recently the paper from Knill et al.34.

With the emergence of computer graphics technology, researchers have developed experiments to understand the impact of shadows on our perception of a scene. Through different psychophysical experiments they established the important role of shadows in understanding:
• the position and size of the occluder49, 38, 27, 30, 31;
• the geometry of the occluder38;
• the geometry of the receiver38.

Wanger49 studied the effect of shadow quality on the perception of object relationships, basing his experiments on

† University Pierre Mendès France – Grenoble II
‡ University Joseph Fourier – Grenoble I
§ INRIA
∗∗ Artis is a team of the GRAVIR/IMAG laboratory, a joint research unit of CNRS, INPG, INRIA, UJF.

© The Eurographics Association and Blackwell Publishers 2003. Published by Blackwell Publishers, 108 Cowley Road, Oxford OX4 1JF, UK and 350 Main Street, Malden, MA 02148, USA.

185
Hasenfratz et al. / Real-time Soft Shadows

shadow sharpness. Hubona et al.27 discuss the general role and effectiveness of object shadows in 3D visualization. In their experiments, they put in competition shadows, viewing mode (mono/stereo), number of lights (one/two), and background type (flat plane, “stair-step” plane, room) to measure the impact of shadows.

Kersten et al.30, 31 and Mamassian et al.38 study the relationship between object motion and the perception of relative depth. In fact, they demonstrate that simply adjusting the motion of a shadow is sufficient to induce dramatically different apparent trajectories of the shadow-casting object.

These psychophysical experiments convincingly establish that it is important to take shadows into account to produce images in computer graphics applications. Cast shadows help in our understanding of 3D environments and soft shadows take part in realism of the images.

Since the comprehensive survey of Woo et al.52, progress in computer graphics technology and the development of consumer-grade graphics accelerators have made real-time 3D graphics a reality3. However incorporating shadows, and especially realistic soft shadows, in a real-time application, has remained a difficult task (and has generated a great research effort). This paper presents a survey of shadow generation techniques that can create soft shadows in real time. Naturally the very notion of “real-time performance” is difficult to define, suffice it to say that we are concerned with the display of 3D scenes of significant complexity (several tens of thousands of polygons) on consumer-level hardware ca. 2003. The paper is organized as follows:

We first review in Section 2 basic notions about shadows: hard and soft shadows, the importance of shadow effects showing problems encountered when working with soft shadows and classical techniques for producing hard shadows in real time. Section 3 then presents existing algorithms for producing soft shadows in real time. Section 4 offers a discussion and classifies these algorithms based on their different abilities and limitations, allowing easier algorithm selection depending on the application's constraints.

Figure 1: Left: Study of shadows by Leonardo da Vinci48 — Right: Shadow construction by Lambert35.

2. Basic concepts of hard and soft shadows

2.1. What is a shadow?

Consider a light source L illuminating a scene: receivers are objects of the scene that are potentially illuminated by L. A point P of the scene is considered to be in the umbra if it can not see any part of L, i.e. it does not receive any light directly from the light source. If P can see a part of the light source, it is in the penumbra. The union of the umbra and the penumbra is the shadow, the region of space for which at least one point of the light source is occluded. Objects that hide a point from the light source are called occluders.

We distinguish between two types of shadows:
attached shadows, occurring when the normal of the receiver is facing away from the light source;
cast shadows, occurring when a shadow falls on an object whose normal is facing toward the light source.

Self-shadows are a specific case of cast shadows that occur when the shadow of an object is projected onto itself, i.e. the occluder and the receiver are the same. Attached shadows are easy to handle. We shall see later, in Section 4, that some algorithms cannot handle self-shadows.

2.2. Importance of shadow effects

As discussed in the introduction, shadows play an important role in our understanding of 3D geometry:
• Shadows help to understand relative object position and size in a scene49, 38, 27, 30, 31. For example, without a cast shadow, we are not able to determine the position of an object in space (see Figure 2(a)).
• Shadows can also help us understanding the geometry of a complex receiver38 (see Figure 2(b)).
• Finally, shadows provide useful visual cues that help in understanding the geometry of a complex occluder38 (see Figure 3).

2.3. Hard shadows vs. soft shadows

The common-sense notion of shadow is a binary status, i.e. a point is either “in shadow” or not. This corresponds to hard shadows, as produced by point light sources: indeed, a point light source is either visible or occluded from any receiving point. However, point light sources do not exist in practice and hard shadows give a rather unrealistic feeling to images (see Figure 4(c)). Note that even the sun, probably the most common shadow-creating light source in our daily life, has a significant angular extent and does not create hard shadows. Still, point light sources are easy to model in computer


(a) Shadows provide information about the relative positions of objects. On the left-hand image, we cannot determine the position of the robot, whereas on the other three images we understand that it is more and more distant from the ground. (b) Shadows provide information about the geometry of the receiver. Left: not enough cues about the ground. Right: shadow reveals ground geometry.

Figure 2: Shadows play an important role in our understanding of 3D geometry.

Figure 3: Shadows provide information about the geometry of the occluder. Here we see that the robot holds nothing in his left hand on Figure 3(a), a ring on Figure 3(b) and a teapot on Figure 3(c).

graphics and we shall see that several algorithms let us compute hard shadows in real time.

In the more realistic case of a light source with finite extent, a point on the receiver can have a partial view of the light, i.e. only a fraction of the light source is visible from that point. We distinguish the umbra region (if it exists) in which the light source is totally blocked from the receiver, and the penumbra region in which the light source is partially visible. The determination of the umbra and penumbra is a difficult task in general, as it amounts to solving visibility relationships in 3D, a notoriously hard problem. In the case of polygonal objects, the shape of the umbra and penumbra regions is embedded in a discontinuity mesh13 which can be constructed from the edges and vertices of the light source and the occluders (see Figure 4(b)).

Soft shadows are obviously much more realistic than hard shadows (see Figures 4(c) and 4(d)); in particular the degree of softness (blur) in the shadow varies dramatically with the distances involved between the source, occluder, and receiver. Note also that a hard shadow, with its crisp boundary, could be mistakenly perceived as an object in the scene, while this would hardly happen with a soft shadow.

In computer graphics we can approximate small or distant light sources as point sources only when the distance from the light to the occluder is much larger than the distance from the occluder to the receiver, and the resolution of the final image does not allow proper rendering of the penumbra. In all other cases great benefits can be expected from properly representing soft shadows.


(a) Geometry of hard shadows (b) Geometry of soft shadows
(c) Illustration of hard shadows (d) Illustration of soft shadows

Figure 4: Hard vs. soft shadows.
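To make the umbra/penumbra geometry of Figure 4 concrete, here is a small flatland sketch (the setup and all names are our own illustration, not taken from the paper): a linear light source, a semi-infinite occluder edge, and a ground receiver; the umbra and penumbra limits follow from similar triangles.

```python
# Flatland sketch of the Figure 4(b) configuration: a linear light
# spanning x in [-s, s] at height H, a semi-infinite occluder blocking
# x <= 0 at height h, and a ground receiver at y = 0. The shadow
# boundary cast by a light endpoint (xL, H) through the occluder edge
# (0, h) lands on the ground at x = -xL * h / (H - h) (similar triangles).

def shadow_boundaries(s, H, h):
    """Return (umbra_limit, lit_limit): ground points with x below the
    first see no part of the light; points above the second see all of
    it; in between lies the penumbra."""
    assert 0 < h < H and s >= 0
    b1 = -(-s) * h / (H - h)  # boundary cast by the left light endpoint
    b2 = -(+s) * h / (H - h)  # boundary cast by the right light endpoint
    return min(b1, b2), max(b1, b2)

umbra_limit, lit_limit = shadow_boundaries(s=1.0, H=4.0, h=2.0)
# Here the penumbra spans x in [-1, 1]. Shrinking the light to a point
# (s = 0) collapses the penumbra to a single hard-shadow boundary:
hard_lo, hard_hi = shadow_boundaries(s=0.0, H=4.0, h=2.0)
```

As the text notes, the penumbra extent grows with the light size s and with the occluder-to-receiver distance, and vanishes for a point source.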

2.4. Important issues in computing soft shadows

2.4.1. Composition of multiple shadows

While the creation of a shadow is easily described for a (light source, occluder, receiver) triple, care must be taken to allow for more complex situations.

Shadows from several light sources Shadows produced by multiple light sources are relatively easy to obtain if we know how to deal with a single source (see Figure 5). Due to the linear nature of light transfer we simply sum the contribution of each light (for each wavelength or color band).

Shadows from several objects For point light sources, shadows due to different occluders can be easily combined since the shadow area (where the light source is invisible) is the union of all individual shadows.

With an area light source, combining the shadows of several occluders is more complicated. Recall that the lighting contribution of the light source on the receiver involves a partial visibility function: a major issue is that no simple combination of the partial visibility functions of distinct occluders can yield the partial visibility function of the set of occluders considered together. For instance there may be points in the scene where the light source is not occluded by any object taken separately, but is totally occluded by the set of objects taken together. The correlation between the partial visibility functions of different occluders cannot be predicted easily, but can sometimes be approximated or bounded45, 5.

As a consequence, the shadow of the union of the objects can be larger than the union of the shadows of the objects (see Figure 6). This effect is quite real, but is not very visible on typical scenes, especially if the objects casting shadows are animated.

2.4.2. Physically exact or fake shadows

Shadows from an extended light source Soft shadows come from spatially extended light sources. To model properly the shadow cast by such light sources, we must take into account all the parts of the occluder that block light coming from the light source. This requires identifying all parts of the object casting shadow that are visible from at least one point of the extended light source, which is algorithmically much more complicated than identifying parts of the occluder that are visible from a single point.

Because this visibility information is much more difficult


Figure 5: Complex shadow due to multiple light sources. Note the complex interplay of colored lights and shadows in the
complementary colors.

Figure 7: When the light source is significantly larger than the occluder, the shape of the shadow is very different from the
shape computed using a single sample; the sides of the object are playing a part in the shadowing.
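The occluder-combination problem of Section 2.4.1 can be reproduced numerically. The following is a toy flatland sketch with made-up geometry (all names and values are ours): each occluder alone leaves part of the light visible from the receiver, yet together they block it completely, so no simple combination of the individual visibility values predicts the joint one.

```python
import numpy as np

def visibility(receiver, light_samples, occluders):
    """Fraction of light samples reaching the receiver past all occluders."""
    def blocked(p, q, seg):
        # Does the segment p -> q cross the horizontal occluder seg = (x0, x1, y)?
        x0, x1, y = seg
        if not min(p[1], q[1]) < y < max(p[1], q[1]):
            return False
        t = (y - p[1]) / (q[1] - p[1])
        return x0 <= p[0] + t * (q[0] - p[0]) <= x1

    clear = sum(not any(blocked(receiver, s, o) for o in occluders)
                for s in light_samples)
    return clear / len(light_samples)

receiver = (0.0, 0.0)
light = [(x, 4.0) for x in np.linspace(-1.0, 1.0, 201)]  # extended light
occ_a = (-1.5, 0.1, 2.0)   # blocks the left part of the light
occ_b = (-0.1, 1.5, 2.0)   # blocks the right part

va = visibility(receiver, light, [occ_a])           # > 0: A alone lets light through
vb = visibility(receiver, light, [occ_b])           # > 0: B alone lets light through
vab = visibility(receiver, light, [occ_a, occ_b])   # == 0: joint umbra
```

Note that a naive combination such as multiplying the individual visibilities (`va * vb > 0`) would wrongly predict some light in the central region, which is exactly the situation highlighted in Figure 6.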

to compute with extended light sources than with point light sources, most real-time soft shadow algorithms compute visibility information from just one point (usually the center of the light source) and then simulate the behavior of the extended light source using this visibility information (computed for a point).

This method produces shadows that are not physically exact, of course, but can be close enough to real shadows for most practical applications. The difference between the approximation and the real shadow is harder to notice if the objects and their shadow are animated — a common occurrence in real-time algorithms.

The difference becomes more noticeable if the difference between the actual extended light source and the point used for the approximation is large, as seen from the object casting shadow. A common example is for a large light source, close enough from the object casting shadow that points of the light source are actually seeing different sides of the object (see Figure 7). In that case, the physically exact shadow is very different from the approximated version.

While large light sources are not frequent in real-time algorithms, the same problem also occurs if the object casting shadow is extended along the axis of the light source, e.g. a character with elongated arms whose right arm is pointing toward light source, and whose left arm is close to the receiver.

In such a configuration, if we want to compute a better looking shadow, we can either:
• Use the complete extension of the light source for visibility computations. This is algorithmically too complicated to be used in real-time algorithms.
• Separate the light source into smaller light sources24, 5. This removes some of the artefacts, since each light source is treated separately, and is geometrically closer to the


point sample used to compute the silhouette. The speed of the algorithm is usually divided by the number of light sources.
• Cut the object into slices45. We then compute soft shadows separately for each slice, and combine these shadows. By slicing the object, we are removing some of the visibility problems, and we allow lower parts of the object — usually hidden by upper parts — to cast shadow. The speed of the algorithm is divided by the number of slices, and combining the shadows cast by different slices remains a difficult problem.

Approximating the penumbra region When real-time soft shadow algorithms approximate extended light sources using points, they are in fact computing a hard shadow, and extending it to compute a soft shadow.

There are several possible algorithms:
• extend the umbra region outwards, by computing an outer penumbra region,
• shrink the umbra region, and complete it with an inner penumbra region,
• compute both inner penumbra and outer penumbra.

The first method (outer penumbra only) will always create shadows made of an umbra and a penumbra. Objects will have an umbra, even if the light source is very large with respect to the occluders. This effect is quite noticeable, as it makes the scene appear much darker than anticipated, except for very small light sources.

On the other hand, computing the inner penumbra region can result in light leaks between neighboring objects whose shadows overlap.

Figure 6: The shadow of two occluders is not a simple combination of the two individual shadows. Note in particular the highlighted central region which lies in complete shadow (umbra) although the light source is never blocked by a single occluder.

Illumination in the umbra region An important question is the illumination in regions that are in the umbra — completely hidden from the light source. There is no light reaching these regions, so they should appear entirely black, in theory.

However, in practice, some form of ambient lighting is used to avoid completely dark regions and to simulate the fact that light eventually reaches these regions after several reflections.

Real-time shadow methods are usually combined with illumination computations, for instance using the simple OpenGL lighting model. Depending on whether the shadow method operates before or after the illumination phase, ambient lighting will be present or absent. In the latter case the shadow region appears completely dark, an effect that can be noticeable. A solution is to add the ambient shading as a subsequent pass; this extra pass slows down the algorithm, but clever re-use of the Z-buffer on recent graphics hardware makes the added cost manageable40.

Shadows from different objects As shown in Section 2.4.1, in presence of extended light sources, the shadow of the union of several objects is larger than the union of the individual shadows. Furthermore, the boundary of the shadow caused by the combination of several polygonal objects can be a curved line13.

Since these effects are linked with the fact that the light source is extended, they can not appear in algorithms that use a single point to compute surfaces visible from the light source. All real-time soft shadow algorithms therefore suffer from this approximation. However, while these effects are both clearly identifiable on still images, they are not as visible in animated scenes. There is currently no way to model these effects with real-time soft shadow algorithms.

2.4.3. Real-time

Our focus in this paper is on real-time applications, therefore we have chosen to ignore all techniques that are based on an expensive pre-process even when they allow later modifications at interactive rates37. Given the fast evolution of graphics hardware, it is difficult to draw a hard distinction between real-time and interactive methods, and we consider here that frame rates in excess of 10 fps, for a significant number of polygons, are an absolute requirement for “real-time” applications. Note that stereo viewing usually requires double this performance.

For real-time applications, the display refresh rate is often the crucial limiting factor, and must be kept high enough (if not constant) through time. An important feature to be considered in shadowing algorithms is therefore their ability to guarantee a sustained level of performance. This is of course


impossible to do for arbitrary scenes, and a more important property for these algorithms is the ability to parametrically vary the level of performance (typically at the price of greater approximation), which allows an adaptation to the scene's complexity.

2.4.4. Shadows of special objects

Most shadowing algorithms make use of an explicit representation of the object's shapes, either to compute silhouettes of occluders, or to create images and shadow maps. Very complex and volumetric objects such as clouds, hair, grass etc. typically require special treatment.

2.4.5. Constraints on the scene

Shadowing algorithms may place particular constraints on the scene. Examples include the type of object model (techniques that compute a shadow as a texture map typically require a parametric object, if not a polygon), or the necessity/possibility to identify a subset of the scene as occluders or shadow receivers. This latter property is important in adapting the performance of the algorithm to sustain real-time.

2.5. Basic techniques for real-time shadows

In this State of the Art Review, we focus solely on real-time soft shadows algorithms. As a consequence, we will not describe other methods for producing soft shadows, such as radiosity, ray-tracing, Monte-Carlo ray-tracing or photon mapping.

We now describe the two basic techniques for computing shadows from point light sources, namely shadow mapping and the shadow volume algorithm.

2.5.1. Shadow mapping

Method The basic operation for computing shadows is identifying the parts of the scene that are hidden from the light source. Intrinsically, it is equivalent to visible surface determination, from the point-of-view of the light source.

The first method to compute shadows17, 44, 50 starts by computing a view of the scene, from the point-of-view of the light source. We store the z values of this image. This Z-buffer is the shadow map (see Figure 8).

The shadow map is then used to render the scene (from the normal point-of-view) in a two pass rendering process:
• a standard Z-buffer technique, for hidden-surface removal.
• for each pixel of the scene, we now have the geometrical position of the object seen in this pixel. If the distance between this object and the light is greater than the distance stored in the shadow map, the object is in shadow. Otherwise, it is illuminated.
• The color of the objects is modulated depending on whether they are in shadow or not.

Figure 8: Shadow map for a point light source. Left: view from the camera. Right: depth buffer computed from the light source.

Shadow mapping is implemented in current graphics hardware. It uses an OpenGL extension for the comparison between Z values, GL_ARB_SHADOW†.

Improvements The depth buffer is sampled at a limited precision. If surfaces are too close from each other, sampling problems can occur, with surfaces shadowing themselves. A possible solution42 is to offset the Z values in the shadow map by a small bias51.

If the light source has a cut-off angle that is too large, it is not possible to project the scene in a single shadow map without excessive distortion. In that case, we have to replace the light source by a combination of light sources, and use several depth maps, thus slowing down the algorithm.

Shadow mapping can result in large aliasing problems if the light source is far away from the viewer. In that case, individual pixels from the shadow map are visible, resulting in a staircase effect along the shadow boundary. Several methods have been implemented to solve this problem:
• Storing the ID of objects in the shadow map along with their depth26.
• Using deep shadow maps, storing coverage information for all depths for each pixel36.
• Using multi-resolution, adaptive shadow maps18, computing more details in regions with shadow boundaries that are close to the eye.
• Computing the shadow map in perspective space46, effectively storing more details in parts of the shadow map that are closer to the eye.

The last two methods are directly compatible with existing OpenGL extensions, and therefore require only a small amount of coding to work with modern graphics hardware.

An interesting alternative version of this algorithm is to

† This extension (or the earlier version, GL_SGIX_SHADOW) is available on Silicon Graphics Hardware above Infinite Reality 2, on NVidia graphics cards after GeForce3 and on ATI graphics cards after Radeon9500.


warp the shadow map into camera space55 rather than the usual opposite: it has the advantage that we obtain a modulation image that can be mixed with a texture, or blurred to produce antialiased shadows.

Discussion Shadow mapping has many advantages:
• it can be implemented entirely using graphics hardware;
• creating the shadow map is relatively fast, although it still depends on the number and complexity of the occluders;
• it handles self-shadowing.

It also has several drawbacks:
• it is subject to many sampling and aliasing problems;
• it cannot handle omni-directional light sources;
• at least two rendering passes are required (one from the light source and one from the viewpoint);

2.5.2. The Shadow Volume Algorithm

Another way to think about shadow generation is purely geometrical. This method was first described by Crow12, and first implemented using graphics hardware by Heidmann23.

Method The algorithm consists in finding the silhouette of occluders along the light direction, then extruding this silhouette along the light direction, thus forming a shadow volume. Objects that are inside the shadow volume are in shadow, and objects that are outside are illuminated.

The shadow volume is calculated in two steps:
• the first step consists in finding the silhouette of the occluder as viewed from the light source. The simplest method is to keep edges that are shared by a triangle facing the light and another in the opposite direction. This actually gives a superset of the true silhouette, but it is sufficient for the algorithm.
• then we construct the shadow volume by extruding these edges along the direction of the point light source. For each edge of the silhouette, we build the half-plane subtended by the plane defined by the edge and the light source. All these half-planes define the shadow volume, and knowing if a point is in shadow is then a matter of knowing if it is inside or outside the volume.
• for each pixel in the image rendered, we count the number of faces of the shadow volume that we are crossing between the view point and the object rendered. Front-facing faces of the shadow volume (with respect to the view point) increment the count, back-facing faces decrement the count (see Figure 9). If the total number of faces is positive, then we are inside the shadow volume, and the pixel is rendered using only ambient lighting.

Figure 9: Shadow volume.

The rendering pass is easily done in hardware using a stencil buffer23, 32, 15; faces of the shadow volume are rendered in the stencil buffer with depth test enabled this way: in a first pass, front faces of the shadow volumes are rendered incrementing the stencil buffer; in a second pass, back faces are rendered, decrementing it. Pixels that are in shadow are “captured” between front and back faces of the shadow volume, and have a positive value in the stencil buffer. This way to render volumes is called zpass.

Therefore the complete algorithm to obtain a picture using the Shadow Volume method is:
• render the scene with only ambient/emissive lighting;
• calculate and render shadow volumes in the stencil buffer;
• render the scene illuminated with stencil test enabled: only pixels which stencil value is 0 are rendered, others are not updated, keeping their ambient color.

Improvements The cost of the algorithm is directly linked to the number of edges in the shadow volume. Batagelo and Júnior7 minimize the number of volumes rendered by precalculating in software a modified BSP tree. McCool39 extracts the silhouette by first computing a shadow map, then extracting the discontinuities of the shadow map, but this method requires reading back the depth buffer from the graphics board to the CPU, which is costly. Brabec and Seidel10 report a method to compute the silhouette of the occluders using programmable graphics hardware14, thus obtaining an almost completely hardware-based implementation of the shadow volume algorithm (they still have to read back a buffer into the CPU for parameter transfer).

Roettger et al.43 suggest an implementation that doesn't require the stencil buffer; they draw the shadow volume in the alpha buffer, replacing increment/decrement with a multiply/divide by 2 operation.

Everitt and Kilgard15 have described a robust implementation of the shadow volume algorithm. Their method includes capping the shadow volume, setting w = 0 for extruded vertices (effectively making infinitely long quads) and setting the far plane at an infinite distance (they prove that this step


only decreases Z-buffer precision by a few percent). Finally, they render the shadow volume using the zfail technique; it works by rendering the shadow volume backwards:

• we render the scene, storing the Z-buffer;
• in the first pass, we increment the stencil buffer for all back-facing faces, but only if the face is behind an existing object of the scene;
• in the second pass, we decrement the stencil buffer for all front-facing faces, but only if the face is behind an existing object;
• the stencil buffer then contains the intersection of the shadow volume and the objects of the scene.

The zfail technique was discovered independently by Bilodeau and Songy and by Carmack.

Recent extensions to OpenGL15, 16, 21 allow the use of shadow volumes with the stencil buffer in a single pass, instead of the two passes required so far. They also15 provide depth-clamping, a method in which polygons are not clipped at the near and far distance, but their vertices are projected onto the near and far plane. This provides in effect an infinite view pyramid, making the shadow volume algorithm more robust.

The main problem with the shadow volume algorithm is that it requires drawing large polygons, the faces of the shadow volume. The fillrate of the graphics card is often the bottleneck. Everitt and Kilgard15, 16 list different solutions to reduce the fillrate, either using software methods or using the graphics hardware, such as scissoring, constraining the shadow volume to a particular fragment.

Discussion The shadow volume algorithm has many advantages:

• it works for omnidirectional light sources;
• it renders eye-view pixel precision shadows;
• it handles self-shadowing.

It also has several drawbacks:

• the computation time depends on the complexity of the occluders;
• it requires the computation of the silhouette of the occluders as a preliminary step;
• at least two rendering passes are required;
• rendering the shadow volume consumes fillrate of the graphics card.

3. Soft shadow algorithms

In this section, we review algorithms that produce soft shadows, either interactively or in real time. As in the previous section, we distinguish two types of algorithms:

• Algorithms that are based on an image-based approach, and build upon the shadow map method described in Section 2.5.1. These algorithms are described in Section 3.1.
• Algorithms that are based on an object-based approach, and build upon the shadow volume method described in Section 2.5.2. These algorithms are described in Section 3.2.

3.1. Image-Based Approaches

In this section, we present soft shadow algorithms based on shadow maps (see Section 2.5.1). There are several methods to compute soft shadows using image-based techniques:

1. Combining several shadow textures taken from point samples on the extended light source25, 22.
2. Using layered attenuation maps1, replacing the shadow map with a Layered Depth Image, storing depth information about all objects visible from at least one point of the light source.
3. Using several shadow maps24, 54, taken from point samples on the light source, and an algorithm to compute the percentage of the light source that is visible.
4. Using a standard shadow map, combined with image analysis techniques to compute soft shadows9.
5. Convolving a standard shadow map with an image of the light source45.

The first two methods approximate the light source as a combination of several point samples. As a consequence, the time for computing the shadow textures is multiplied by the number of samples, resulting in significantly slower rendering. On the other hand, these methods actually compute more information than other soft shadow methods, and thus compute more physically accurate shadows. Most of the artefacts listed in Section 2.4.2 will not appear with these two methods.

3.1.1. Combination of several point-based shadow images25, 22

The simplest method22, 25 to compute soft shadows using image-based methods is to place sample points regularly on the extended light source. These sample points are used to compute binary occlusion maps, which are combined into an attenuation map, used to modulate the illumination (calculated separately).

Method Herf25 makes the following assumptions on the geometry of the scene:

• a light source of uniform color,
• subtending a small solid angle with respect to the receiver,
• and with distance from the receiver having small variance.

With these three assumptions, contributions from all sample points placed on the light source will be roughly equal.

The user identifies in advance the objects casting shadows, and the objects onto which we are casting shadows. For each object receiving shadow, we are going to compute a texture containing the soft shadow.
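The combination of per-sample binary occlusion maps into an attenuation map (Section 3.1.1) can be sketched as follows. This is a minimal CPU-side illustration only: the actual method renders each occlusion map on the graphics hardware, and the map layout and function names here are ours, not the paper's.

```python
# Sketch of Herf-style soft shadows: binary occlusion maps rendered from
# several point samples on the light source are averaged into a single
# attenuation map that modulates the (separately computed) illumination.

def attenuation_map(occlusion_maps):
    """Average n binary occlusion maps (0 = receiver visible from that
    light sample, 1 = occluded) into one attenuation map, where each
    pixel stores the fraction of light samples that are occluded."""
    n = len(occlusion_maps)
    h = len(occlusion_maps[0])
    w = len(occlusion_maps[0][0])
    return [[sum(m[y][x] for m in occlusion_maps) / n for x in range(w)]
            for y in range(h)]

# Example: 2x2 occlusion maps from 4 light samples. A pixel occluded in
# 3 of the 4 maps gets attenuation 0.75, i.e. 25% of the light reaches it.
maps = [
    [[1, 0], [0, 0]],
    [[1, 0], [1, 0]],
    [[1, 0], [0, 0]],
    [[0, 0], [1, 0]],
]
att = attenuation_map(maps)
```

With only a handful of samples the attenuation map can take only a few distinct gray levels, which is exactly the banding artefact discussed below.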
Figure 10: Combining several occlusion maps to compute soft shadows. Left: the occlusion map computed for a single sample. Center: the attenuation map computed using 4 samples. Right: the attenuation map computed using 64 samples.

Figure 11: With only a small number of samples on the light source, artefacts are visible. Left: soft shadow computed using 4 samples. Right: soft shadow computed using 1024 samples.

We start by computing a binary occlusion map for each sample point on the light source. For each sample point on the light source, we render the scene into an auxiliary buffer, using 0 for the receiver, and 1 for any other polygon. These binary occlusion maps are then combined into an attenuation map, where each pixel stores the number of sample points on the light source that are occluded. This attenuation map contains a precise representation of the soft shadow (see Figures 10 and 11).

In the rendering pass, this soft shadow texture is combined with standard textures and illumination, in a standard graphics pipeline.

Discussion The biggest problem of Herf's25 method is rendering the attenuation maps. This requires Np Ns rendering passes, where Np is the number of objects receiving shadows, and Ns is the number of samples on the light source. Each pass takes a time proportional to the number of polygons in the objects casting shadows. In practice, to make this method run in real time, we have to limit the number of receivers to a single planar receiver.

To speed up the computation of the attenuation map, we can lower the number of polygons in the occluders. We can also lower the number of samples (n) to increase the framerate, but this is done at the expense of image quality, as the attenuation map contains only n − 1 gray levels. With fewer than 9 samples (3 × 3), the user sees several hard shadows, instead of a single soft shadow (see Figure 11).

Herf's method is easy to parallelize, since all occlusion maps can be computed separately, and only one computer is needed to combine them. Isard et al.28 report that a parallel implementation of this algorithm on a 9-node Sepia-2a parallel computer with high-end graphics cards runs at more than 100 fps for moderately complex scenes.

3.1.2. Layered Attenuation Maps1

The Layered Attenuation Maps1 method is based on a modified layered depth image29. It is an extension of the previous method, where we compute a layered attenuation map for the entire scene, instead of a specific shadow map for each object receiving shadow.

Method It starts like the previous method: we place sample points on the area light source, and we use these sample points to compute a modified attenuation map:

• For each sample point, we compute a view of the scene, along the direction of the normal to the light source.
• These images are all warped to a central reference, the center of the light source.
• For each pixel of these images:
  – In each view of the scene, we have computed the distance to the light source in the Z-buffer.
  – We can therefore identify the object that is closest to the light source.
  – This object makes the first layer of the layered attenuation map.
  – We count the number of samples seeing this object, which gives us the percentage of occlusion for this object.
  – If other objects are visible for this pixel but further away from the light, they make the subsequent layers.
  – For each layer, we store the distance to the light source and the percentage of occlusion.

The computed Layered Attenuation Map contains, for all the objects that are visible from at least one sample point, the distance to the light source and the percentage of sample points seeing this object.

At rendering time, the Layered Attenuation Map is used like a standard attenuation map, with the difference that all the objects visible from the light source are stored in the map:

• First we render the scene, using standard illumination and textures. This first pass eliminates all objects invisible from the viewer.
• Then, for each pixel of the image, we find whether the corresponding point in the scene is in the Layered Attenuation Map or not. If it is, then we modulate the lighting
value found by the percentage of occlusion stored in the map. If it isn't, then the point is completely hidden from the light source.

Figure 12: Percentage of a linear light source that is visible.

Figure 13: Using the visibility channel to compute visibility from a polygonal light source. The shadow maps tell us that vertices P0, P1 and P4 are occluded and that vertices P2 and P3 are visible. The visibility channel for edge [P1P2] tells us that this edge is occluded for a fraction a; similarly, the visibility channel for edge [P3P4] tells us that this edge is occluded for a fraction b. The portion of the light that is occluded is the hatched region, whose area can be computed geometrically using a and b.
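The per-pixel lookup used at rendering time in the Layered Attenuation Map (end of Section 3.1.2) can be sketched as follows. The layer representation (a list of (depth, occlusion) pairs per pixel), the sign convention for the stored occlusion value, and all names are our assumptions, not the paper's data structures.

```python
# Sketch of the rendering-time query into a Layered Attenuation Map.
# Each pixel of the map holds one layer per object visible from at least
# one sample point on the light source: (distance_to_light, occlusion),
# where occlusion is the fraction of light samples that do NOT see it.

def light_factor(layers, depth, eps=1e-3):
    """Fraction of the light source reaching a scene point at distance
    'depth' from the light, for one pixel of the map. If the point
    matches a stored layer, modulate by that layer's occlusion value;
    otherwise no light sample saw it, so it is completely hidden."""
    for layer_depth, occlusion in layers:
        if abs(layer_depth - depth) < eps:
            return 1.0 - occlusion
    return 0.0

# Example pixel: an occluder at depth 2.0 (30% of the samples occluded)
# and a receiver at depth 5.0 (60% occluded).
pixel_layers = [(2.0, 0.3), (5.0, 0.6)]
lit_receiver = light_factor(pixel_layers, 5.0)   # in penumbra
hidden_point = light_factor(pixel_layers, 7.0)   # not in the map: umbra
```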
Discussion The main advantage of this method, compared to the previous method, is that a single image is used to store the shadowing information for the entire scene, compared to one shadow texture for each shadowed object. Also, we do not have to identify beforehand the objects casting shadows.

The extended memory cost of the Layered Attenuation Map is reasonable: experiments by the authors show that on average, about 4 layers are used in moderately complex scenes.

As with the previous method, the speed and realism are related to the number of samples used on the light source. We are rendering the entire scene Ns times, which precludes real-time rendering for complex scenes.

3.1.3. Quantitative Information in the Shadow Map24

Heidrich et al.24 introduced another extension of the shadow map method, where we compute not only a shadow map, but also a visibility channel (see Figure 12), which encodes the percentage of the light source that is visible. Heidrich et al.24's method only works for linear light sources, but it was later extended to polygonal area light sources by Ying et al.54.

Method We start by rendering a standard shadow map for each sample point on the linear light source. The number of sample points is very low; usually the samples are the two end vertices of the linear light source.

In each shadow map, we detect discontinuities using image analysis techniques. Discontinuities in the shadow map happen at shadow boundaries; they separate an object casting shadow from the object receiving shadow. For each discontinuity, we form a polygon linking the frontmost object (casting shadow) to the back object (receiving shadow). These polygons are then rendered from the point of view of the other sample, using Gouraud shading, with value 0 on the closer points, and 1 on the farthest points.

This gives us a visibility channel, which actually encodes the percentage of the edge linking the two samples that is visible.

The visibility channel is then used in a shadow mapping algorithm. For each pixel in the rendered image, we first check its position in the shadow map for each sample:

• if it is in shadow for all sample points, we assume that it is in shadow, and therefore it is rendered black.
• if it is visible from all sample points, we assume that it is visible, and therefore rendered using the standard OpenGL illumination model.
• if it is hidden for some sample point, and visible from another point, we use the visibility channel to modulate the light received by the pixel.

Ying et al.54 extended this algorithm to polygonal area light sources: we generate a shadow map for each vertex of the polygonal light source, and a visibility channel for each edge. We then use this information to compute the percentage of the polygonal light source that is visible from the current pixel.

For each vertex of the light source, we query the shadow map of this vertex. This gives us a boolean information: whether this vertex is occluded or not from the point of view of the object corresponding to the current pixel. If an edge
links an occluded vertex to a non-occluded one, the visibility channel for this edge gives us the percentage of the edge that is occluded (see Figure 13). Computing the visible area of the light source is then a simple 2D problem. This area can be expressed as a linear combination of the areas of triangles on the light source. By precomputing the areas of these triangles, we are left with a few multiplications and additions to perform at each pixel.

Figure 14: Extending the shadow of a point light source: for each occluder identified in the shadow map, we compute a penumbra, based on the distance between this occluder and the receiver.

Figure 15: Extending the shadow of a single sample: for each pixel in the image, we find the corresponding pixel P in the shadow map. Then we find the nearest blocked pixel. P is assumed to be in the penumbra of this blocker, and we compute an attenuation coefficient based on the relative distances between light source, occluder and P.

Discussion The strongest point of this algorithm is that it requires a small number of sampling points. Although it can work with just the vertices of the light source used as sampling points, a low number of samples can result in artefacts in moderately complex scenes. These artefacts are avoided by adding a few more samples on the light source.

This method creates fake shadows, but nicely approximated. The shadows are exact when only one edge of the occluder is intersecting the light source, and approximate if there is more than one edge, for example at the intersection of the shadows of two different occluders, or when an occluder blocks part of the light source without blocking any vertex.

The interactivity of the algorithm depends on the time it takes to generate the visibility channels, which itself depends on the complexity of the shadow. On simple scenes (a few occluders) the authors report computation times of 2 to 3 frames per second.

The algorithm requires having a polygonal light source, and organising the samples, so that samples are linked by edges, and for each edge, we know the sample points it links.

3.1.4. Single Sample Soft Shadows9, 33

A different image-based method to generate soft shadows was introduced by Parker et al.41 for parallel ray-tracing, and later modified to use graphics hardware by Brabec and Seidel9.

This method is very similar to standard shadow mapping. It starts by computing a standard shadow map, then uses the depth information available in the depth map to extend the shadow region and create a penumbra. In this method, we distinguish between the inner penumbra (the part of the penumbra that is inside the shadow of the point sample) and the outer penumbra (the part of the penumbra that is outside the shadow of the point sample, see Figure 14). Parker et al.41 compute only the outer penumbra; Brabec and Seidel9 compute both the inner and the outer penumbra; Kirsch and Doellner33 compute only the inner penumbra. In all cases, the computed penumbra goes from 0 to 1, to ensure continuity with areas in shadow and areas that are fully illuminated.

Method In a first pass, we create a single standard shadow map, for a single sample, usually at the center of the light source.

During rendering, as with standard shadow mapping, we identify the position of the current pixel in the shadow map. Then:

• if the current pixel is in shadow, we identify the nearest pixel in the shadow map that is illuminated.
• if the pixel is lit, we identify the nearest pixel in the shadow map that corresponds to an object that is closer to the light source than the current pixel (see Figure 15).

In both cases, we assume that the object found is casting a shadow on the receiver, and that the point we have found is in the penumbra. We then compute an attenuation coefficient based on the relative positions of the receiver, the occluder
and the light source:

    f = dist(PixelOccluder, PixelReceiver) / (R S zReceiver |zReceiver − zOccluder|)

where R and S are user-definable parameters. The intensity of the pixel is modulated using8:

• 0.5 ∗ (1 + f), clamped to [0.5, 1] if the pixel is outside the shadow,
• 0.5 ∗ (1 − f), clamped to [0, 0.5] if the pixel is inside the shadow.

For pixels that are far away from the boundary of the shadow, either deep inside the shadow or deep inside the fully lit area, f gets greater than 1, resulting in a modulation coefficient of respectively 0 or 1. On the original shadow boundary, f = 0, and the two curves meet each other continuously with a modulation coefficient of 0.5. The actual width of the penumbra region depends on the ratio of the distances to the light source of the occluder and the receiver, which is perceptually correct.

The slowest phase of this algorithm is the search of neighbouring pixels in the shadow map, to find the potential occluder. In theory, an object can cast a penumbra that spans the entire scene, if it is close enough to the light source. In practice, we limit the search to a maximal distance to the current pixel of Rmax = R zReceiver.

To ensure that an object is correctly identified as being in shadow or illuminated, the information from the depth map is combined with an item buffer, following Hourcade and Nicolas26.

Discussion The aim of this algorithm is to produce perceptually pleasing, rather than physically exact, soft shadows. The width of the penumbra region depends on the ratio of the respective distances to the light source of the occluder and the receiver. The penumbra region is larger if the occluder is far from the receiver, and smaller if the occluder is close to the receiver.

Of course, the algorithm suffers from several shortcomings. Since the shadow is only determined by a single sample shadow map, it can fail to identify the proper shadowing edge. It works better if the light source is far away from the occluder. The middle of the penumbra region is placed on the boundary of the shadow from the single sample, which is not physically correct.

The strongest point of this algorithm is its speed. Since it only needs to compute a single shadow map, it can achieve framerates of 5 to 20 frames per second, compared with 2 to 3 frames per second for multi-sample image-based methods. The key parameter in this algorithm is R, the search radius. For smaller values of R, the algorithm works faster, but can miss large penumbras. For larger values of R, the algorithm can identify larger penumbras, but takes longer for each rendering.

A faster version of this algorithm, by Kirsch and Doellner33, computes both the shadow map and a shadow-width map: for each point in shadow, we precompute the distance to the nearest point that is illuminated. For each pixel, we do a look-up in the shadow map and the shadow-width map. If the point is occluded, we have the depth of the current point (z), the depth of the occluder (zoccluder) and the shadow width (w). A 2D function gives us the modulation coefficient:

    I(z, w) = 1                                          if z = zoccluder
    I(z, w) = 1 + cbias − cscale · w / (zoccluder − z)   otherwise

The shadow-width map is generated from a binary occlusion map, transformed into the width map by repeated applications of a smoothing filter. This repeated filtering is done using graphics hardware, during rendering. Performance depends mostly on the size of the occlusion map and on the size of the filter; for a shadow map resolution of 512 × 512 pixels, and a large filter, they attain 20 frames per second. Performance depends linearly on the number of pixels in the occlusion map, thus doubling the resolution of the occlusion map divides the rendering speed by 4.

3.1.5. Convolution technique45

As noted earlier, soft shadows are a consequence of the partial visibility of an extended light source. Therefore the calculation of soft shadows is closely related to the calculation of the visible portion of the light source.

Soler and Sillion45 observe that the percentage of the source area visible from a receiving point can be expressed as a simple convolution for a particular configuration. When the light source, occluder, and receiver all lie in parallel planes, the soft shadow image on the receiver is obtained by convolving an image of the occluder and an image of the light source. While this observation is only mathematically valid in this very restrictive configuration, the authors describe how the same principle can be applied to more general configurations:

First, appropriate imaging geometries are found, even when the objects are non-planar and/or not parallel. More importantly, the authors also describe an error-driven algorithm in which the set of occluders is recursively subdivided according to an appropriate error estimate, and the shadows created by the subsets of occluders are combined to yield the final soft shadow image.

Discussion The convolution technique's main advantages are the visual quality of the soft shadows (not their physical fidelity), and the fact that it operates from images of the source and occluders; therefore, once the images are obtained, the complexity of the operations is entirely under control. Sampling is implicitly performed when creating a light source image, and the combination of samples is handled
by the convolution operation, allowing very complex light source shapes.

The main limitation of the technique is that the soft shadow is only correct in a restricted configuration, and the proposed subdivision mechanism can only improve the quality when the occluder can be broken down into smaller parts. Therefore the case of polygons elongated in the direction of the light source remains problematic. Furthermore, the subdivision mechanism, when it is effective in terms of quality, involves a significant performance drop.
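The parallel-planes observation behind the convolution technique can be illustrated with a small 1D sketch. The direct convolution below stands in for the FFT-based image convolution used in practice, and the example geometry (a 3-pixel uniform source, a 3-pixel occluder) is invented for illustration only.

```python
# 1D sketch of the convolution observation: with source, occluder and
# receiver in parallel planes, the fraction of the source hidden at each
# receiver point is the occluder's binary image convolved with the
# (normalized) light source image.

def convolve(signal, kernel):
    """Full discrete convolution of two 1D sequences."""
    n, m = len(signal), len(kernel)
    out = [0.0] * (n + m - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out

# Occluder image over a small strip: 1 = blocked, 0 = free.
occluder = [0, 0, 1, 1, 1, 0, 0]
# Uniform light source spanning 3 pixels, normalized to sum to 1.
source = [1 / 3, 1 / 3, 1 / 3]

shadow = convolve(occluder, source)
# The result ramps from 0 (fully lit) through a penumbra up to 1 (umbra)
# and back down: [0, 0, 1/3, 2/3, 1, 2/3, 1/3, 0, 0].
```

The smooth ramp is exactly the penumbra: a hard binary occluder blurred by the finite extent of the source.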
3.2. Object-Based Approaches

Several methods can be used to compute soft shadows in animated scenes using object-based methods:

1. Combining together several shadow volumes taken from point samples on the light source, in a manner similar to the method described for shadow maps in Section 3.1.1.
2. Extending the shadow volume19, 53, 11 using a specific heuristic (Plateaus19, Penumbra Maps53, Smoothies11).
3. Computing a penumbra volume for each edge of the shadow silhouette2, 4, 5.

3.2.1. Combining several hard shadows

Method The simplest way to produce soft shadows with the shadow volume algorithm is to take several samples on the light source, compute a hard shadow for each sample and average the pictures produced. This simulates an area light source, and gives us the soft shadow effect.

However, the main problem with this method, as with the equivalent method for shadow maps, is the number of samples it requires to produce a good-looking soft shadow, which precludes any real-time application. Also, it requires the use of an accumulation buffer, which is currently not supported on standard graphics hardware.

An interesting variation has been proposed by Vignaud47, in which shadow volumes from a light source whose position changes with time are added in the alpha buffer, mixed with older shadow volumes, producing a soft shadow after a few frames where the viewer position does not change.

3.2.2. Soft Planar Shadows Using Plateaus

The first geometric approach to generate soft shadows was implemented by Haines19. It assumes a planar receiver, and generates an attenuation map that represents the soft shadow. The attenuation map is created by converting the edges of the occluders into volumes, and is then applied to the receiver as a modulating texture.

Method The principle of the plateaus method19 is to generate an attenuation map representing the soft shadow. The attenuation map is first created using the shadow volume method, thus filling in black the parts of the map that are occluded.

Figure 16: Extending the shadow volume of an occluder with cones and planes.

Then, the edges of the silhouette of the objects are transformed into volumes (see Figure 16):

• All the vertices of the silhouette are first turned into cones, with the radius of the cone depending on the distance between the occluder vertex and the ground, thus simulating a spherical light source.
• Then the edges joining adjacent vertices are turned into surfaces. For continuity, the surface joining two cones is a hyperboloid, unless the two cones have the same radius (that is, if the two original vertices are at the same distance from the ground), in which case the hyperboloid degenerates to a plane.

These shadow volumes are then projected on the receiver and colored using textures: the axis of the cone is black, and the contour is white. This texture is superimposed on the shadow volume texture: Haines' algorithm only computes the outer penumbra.

One important parameter in the algorithm is the way we color the penumbra volume; it can be done using Gouraud shading, values from the Z-buffer, or a 1D texture. The latter gives more control over the algorithm, and allows the penumbra to decrease using any function, including a sinusoid.

Discussion The first limitation of this method is that it is limited to shadows on planar surfaces. It also assumes a spherical light source. The size of the penumbra only depends on the distance from the receiver to the occluders, not on the distance between the light source and the occluders. Finally, it suffers from the same fillrate bottleneck as the original shadow volume algorithm.

A significant improvement is Wyman and Hansen53's Penumbra Map method: the interpolation step is done using programmable graphics hardware6, 20, 14, generating a
penumbra map that is applied on the model, along with a shadow map. Using a shadow map to generate the umbra region removes the fill-rate bottleneck and makes the method very robust. Wyman and Hansen report framerates of 10 to 15 frames per second on scenes with more than 10,000 shadow-casting polygons.

The main limitation in both methods19, 53 is that they only compute the outer penumbra. As a consequence, objects will always have an umbra, even if the light source is very large with respect to the occluders. This effect is clearly noticeable, as it makes the scene appear much darker than anticipated, except for very small light sources.

3.2.3. Smoothies11

Chan and Durand11 present a variation of the shadow volume method that uses only graphics hardware for shadow generation.

Method We start by computing the silhouette of the object. This silhouette is then extended using "smoothies", which are planar surfaces connected to the edges of the occluder and perpendicular to the surface of the occluder.

We also compute a shadow map, which will be used for depth queries. The smoothies are then textured taking into account the distance of each silhouette vertex to the light source, and the distance between the light source and the receiver.

In the rendering step, we first compute the hard shadow using the shadow map, then the texture from the smoothies is projected onto the objects of the scene to create the penumbra.

Discussion As with Haines19, Wyman and Hansen53 and Parker41, this algorithm only computes the outer penumbra. As a consequence, occluders will always project an umbra, even if the light source is very large with respect to the occluders. As mentioned earlier, this makes the scene appear much darker than anticipated, an effect that is clearly noticeable except for very small light sources.

The size of the penumbra depends on the ratio of the distances between the occluder and the light source, and between the receiver and the light source, which is perceptually correct.

The connection between adjacent edges is still a problem with this algorithm, and artefacts appear clearly except for small light sources.

The shadow region is produced using the shadow map method, which removes the problem with the fill-rate bottleneck experienced with all other methods based on the shadow volume algorithm. As with the previous method53, the strong point of this algorithm is its robustness: the authors have achieved 20 frames per second on scenes with more than 50,000 polygons.

Figure 17: Computing the penumbra wedge of a silhouette edge: the wedge is a volume based on the silhouette edge and encloses the light source.

3.2.4. Soft Shadow Volumes2, 4, 5

Akenine-Möller and Assarsson2, Assarsson and Akenine-Möller4 and Assarsson et al.5 have developed an algorithm to compute soft shadows that builds on the shadow volume method and uses the programmable capabilities of modern graphics hardware6, 20, 14 to produce real-time soft shadows.

Method The algorithm starts by computing the silhouette of the object, as seen from a single sample on the light source. For each silhouette edge, we build a silhouette wedge that encloses the penumbra caused by this edge (see Figure 17). The wedge can be larger than the penumbra; that is, we err on the safe side.

Then, we render the shadow volume, using the standard method (described in Section 2.5.2), in a visibility buffer. After this first pass, the visibility buffer contains the hard shadow.

In a subsequent pass, this visibility buffer is updated so that it contains the soft shadow values. This is done by rendering the front-facing triangles of each wedge. For each pixel covered by these triangles, we compute the percentage of the light source that is occluded, using fragment programs20. For pixels that are covered by the wedge but in the hard shadow (as computed by the previous pass), we compute the percentage of the light source that is visible, and add this value to the visibility buffer. For pixels covered by the wedge but in the illuminated part of the scene, we compute the percentage of the light source that is occluded and subtract this value from the visibility buffer (see Figures 18 and 19).

After this second pass, the visibility buffer contains the percentage of visibility for all pixels in the picture. In a third pass, the visibility buffer is combined with the illumination computed using the standard OpenGL lighting model, giving the soft-shadowed picture of the scene.
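The visibility-buffer bookkeeping of the second pass can be sketched per pixel as follows. This is a CPU-side illustration with an invented interface; in the algorithm itself, the per-wedge visible/occluded fractions are computed by fragment programs and the additions happen in the framebuffer.

```python
# Sketch of the soft shadow volume visibility buffer for one pixel.
# Pass 1 (standard shadow volume) tells us whether the pixel is in the
# hard shadow. Pass 2 corrects this per wedge: shadowed pixels get back
# the fraction of the light they still see; lit pixels lose the fraction
# that is occluded. Pass 3 multiplies the shading by the result.

def soft_visibility(hard_shadow, visible_by_wedge=(), occluded_by_wedge=()):
    """hard_shadow: result of the standard shadow volume pass.
    visible_by_wedge: for a pixel inside the hard shadow, the fraction
    of the light source each covering wedge shows to be visible.
    occluded_by_wedge: for a lit pixel, the fraction of the light source
    each covering wedge shows to be occluded."""
    v = 0.0 if hard_shadow else 1.0      # visibility after the first pass
    for fraction in visible_by_wedge:
        v += fraction                    # add back visible light
    for fraction in occluded_by_wedge:
        v -= fraction                    # subtract occluded light
    return max(0.0, min(1.0, v))

# A pixel in the hard shadow whose wedge finds 20% of the light visible
# ends at 0.2; a lit pixel whose wedge occludes 10% of the light at 0.9.
inner = soft_visibility(True, visible_by_wedge=[0.2])
outer = soft_visibility(False, occluded_by_wedge=[0.1])
```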
Figure 18: Computing the area of the light source that is covered by a given edge. The fragment program computes the hatched area for each pixel inside the corresponding wedge.

Figure 19: Combining several connected edges. The portion of the light source that is occluded is equal to the sum of the portions of the light source occluded by the different edges.

Discussion: The complexity of the algorithm depends on the number of edges in the silhouette of the object, and on the number of pixels covered by each penumbra wedge. As a consequence, the easiest optimisation of the algorithm is to compute tighter penumbra wedges5.

The main advantage of this algorithm is its speed. Using programmable graphics hardware for all complex computations, and tabulating complex functions into pre-computed textures, framerates of 150 frames per second are obtained on simple scenes, and 50 frames per second on moderately complex scenes (1,000 shadow-casting polygons, with a large light source), with very convincing shadows. Performance depends mostly on the number of pixels covered by the penumbra wedges, so smaller light sources result in faster rendering.

It should be noted that although a single sample is used to compute the silhouette of the object, the soft shadow computed by this algorithm is physically exact in simple cases, since visibility is computed on the entire light source. More precisely, this happens when the silhouette of the occluder remains the same for all points on the light source, e.g. for a convex object that is distant enough from the light source.

The weak point of the algorithm is that it computes the silhouette of the object using only a single sample. It would fail on scenes where the actual silhouette of the object, as seen from the area light source, is very different from the silhouette computed using the single sample. Such scenes include those where a large area light source is close to the object (see Figure 7), and those where the shadows of several objects are combined together (as in Figure 6). In those circumstances, it is possible to compute a more accurate shadow by splitting the light source into smaller light sources. The authors report that splitting large light sources into 2 × 2 or 3 × 3 smaller light sources is usually enough to remove visible artefacts. It should be noted that splitting the light source into n light sources does not cut the speed of the algorithm by n, since the rendering time depends on the number of pixels covered by the penumbra wedges, and smaller light sources have smaller penumbra wedges.

One key to the efficiency of the algorithm is its use of fragment programs20. The fragment programs take as input the projections of the extremities of the edge onto the plane of the light source, and give as output the percentage of the light source that is occluded by the edge (see Figure 18). If several edges project onto the light source, their contributions are simply added (see Figure 19) — this addition is done in the framebuffer. The authors have implemented several fragment programs: for spherical light sources, for textured rectangular light sources and for non-textured rectangular light sources.

4. Classification

4.1. Controlling the time

Algorithms used in real-time or interactive applications must be able to run at a tuneable framerate, in order to spend less time on rendering at places where a lot of computation is taking place, and more time when the processor is available.

Ideally, soft shadow methods used in real-time applications should take as input the amount of time available for rendering, and return a soft shadow computed to the best of the algorithm within the prescribed time limit. Since this review focuses on current research algorithms, this feature has not been implemented in any of the algorithms reviewed here. However, all of these algorithms are tunable in the sense that there is some sort of parameter that the user can tweak, going from soft shadows that are computed very fast, but are possibly wrong, to soft shadows that can take more time to compute but are either more visually pleasing or more physically accurate.

Several of these parameters are available, to various degrees, in the methods reviewed:

• The easiest form of user control is the use of a different level-of-detail for the geometry of the occluders. Simpler geometry will result in faster rendering, either with image-based methods or with object-based methods. It can be expected that the difference in the shadow will not be noticeable with animated soft shadows.
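The additive combination of edge contributions illustrated in Figure 19 is the same principle as the shoelace formula for polygon area: each directed edge contributes a signed area, and the contributions sum to the occluded portion. A hedged numerical sketch, assuming the projected silhouette lies entirely inside a planar light source; all names are illustrative:

```python
def edge_contribution(p0, p1):
    """Signed area swept by one directed edge (a shoelace term).
    Successive edges contribute positive or negative area, mirroring
    the add/subtract combination of Figure 19."""
    (x0, y0), (x1, y1) = p0, p1
    return 0.5 * (x0 * y1 - x1 * y0)

def occluded_fraction(projected_silhouette, light_area):
    """Sum the per-edge contributions, then normalize by the light area."""
    n = len(projected_silhouette)
    total = sum(edge_contribution(projected_silhouette[i],
                                  projected_silhouette[(i + 1) % n])
                for i in range(n))
    return abs(total) / light_area

# A unit-square silhouette projected inside a light source of area 4
# occludes a quarter of it.
frac = occluded_fraction([(0, 0), (1, 0), (1, 1), (0, 1)], light_area=4.0)
```

On the GPU, the same summation is performed per pixel by accumulating each edge's fragment-program output in the framebuffer.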


Method                         Time  Quality  Tunable  Light            Scene              Required Hardware

Image-based
  Multi-samples22,25           I     *        Y        Polygon          1 planar receiver
  Distributed Multi-samples28  RT    **       Y        Planar                              ShadowMap
  Single sample9,33            RT    *        Y        Sphere                              ShadowMap
  Convolution45                I     **       Y        Polygon                             2D Convol.
  Visibility Channel24,54      I     **       Y        Linear, Polygon                     2D Convol.
Geometry-based
  Plateaus19                   I     **       Y        Sphere           1 planar receiver
  Penumbra Map53               RT    **       Y        Sphere                              Vertex & Frag. Programs
  Smoothie11                   RT    **       Y        Sphere                              Vertex & Frag. Programs
  Soft Shadow Volumes2,4,5     RT    ***      Y        Sphere, Rect.                       Fragment Programs

Table 1: Comparison of soft shadow algorithms (see Section 4 for details)

• Another form of user control is to add more samples on the light source22,25,1, or to subdivide large light sources into a set of smaller ones2,4,5,24,54. It should be noted that the order of magnitude of this parameter varies: 256 to 1024 samples are required for point-based methods22,25,1 to produce shadows without artefacts, while area-based methods2,4,5,24,54 just need to cut the light source into 2 × 2 or 3 × 3 smaller sources. Either way, the rendering time is usually multiplied by the number of samples or sources.
• All image-based methods are also tuneable by changing the resolution of the buffer.
• Other parameters are method-specific:
  – the single sample soft shadows9 method is tuneable by changing the search radius;
  – Convolution45 is tuneable by subdividing the occluders into several layers;
  – Plateaus19 are tuneable by changing the number of vertices used to discretize the cones and patches;
  – Smoothies11 are tuneable by changing the maximum width of the smoothies.

4.2. Controlling the aspect

Another important consideration when choosing a real-time soft shadow algorithm is the aspect of the shadow it produces. Some of the algorithms described in this review can produce a physically exact solution if we allow them sufficient rendering time. Other methods produce a physically exact solution in simple cases, but are approximate in more complex scenes. Finally, a third class of methods produces shadows that are always approximate, but usually faster to compute.

Physically exact (time permitting): Methods based on point samples on the light source22,25,1 will produce physically exact shadows if the number of samples is sufficient. However, with current hardware, the number of samples compatible with interactive applications gives shadows that are not visually excellent (hence the poor mark these methods receive in Table 1).

Physically exact on simple scenes: Methods that compute the percentage of the light source that is visible from the current pixel will give physically exact shadows in places where the assumptions they make on the respective geometry of the light source and the occluders are verified. For example, soft shadow volumes4,5 give physically exact shadows for isolated convex objects, provided that the computed silhouette is correct (that is, the occluder is far away from the light source). The visibility channel24,54 gives physically exact shadows for convex occluders and linear light sources24, and for isolated edges and polygonal light sources54. Convolution45 is physically exact for planar and parallel light source, receiver and occluder.

Always approximate: All methods that restrict themselves to computing only the inner- or the outer-penumbra are intrinsically always approximate. They include single-sample soft shadows using a shadow-width map33, plateaus19 and smoothies11. The original implementation of single sample soft shadows9 computes both the inner- and the outer-penumbra, but always gives them the same width, which is not physically exact.

The second class of methods is probably the most interesting for producing nice-looking pictures. While the conditions imposed seem excessively hard, it must be pointed out that they are conditions under which it is guaranteed that the shadow is exact at all points of the scene. In most places of a standard scene, these methods will also produce physically exact shadows.


4.3. Number and shape of the light sources

The first cause of soft shadows is the light source. Each real-time soft shadow method makes assumptions on the light sources: their shapes, their angles of emission and, more importantly, their number.

Field of emission: All the methods that are based on an image of the scene computed from the light source are restricted with respect to the field of emission of the light source, as a field of emission that is too large will result in distortions in the image. This restriction applies to all image-based algorithms, plus smoothies11 and volume-based algorithms if the silhouette is computed using discontinuities in the shadow map39. On the contrary, volume-based methods can handle omnidirectional illumination.

Shape: For extended light sources, the influence of the shape of the light source on a soft shadow is not directly perceptible. Most real-time soft shadow methods exploit this property by restricting themselves to simple light source shapes, such as spheres or rectangles:
• Single-sample soft shadows9,33, plateaus19 and smoothies11 assume a spherical light source. Soft shadow volumes5 also work with a spherical light source.
• The visibility channel24 was originally restricted to linear light sources.
• A subsequent implementation of the visibility channel works with polygonal light sources54.
• Other methods place fewer restrictions on the light source. Multi-sample methods25,1 can work with any kind of light source. Convolution45 is also not restricted. However, in both cases, the error in the algorithm is smaller for planar light sources.
• Convolution45 and soft shadow volumes4,5 work with textured rectangles, thus allowing any kind of planar light source. The texture can even be animated4,5.

Number: All real-time soft shadow algorithms assume a single light source. Usually, computing the shadow from several light sources results in multiplying the rendering time by the number of light sources. However, for all the methods that work with any kind of planar light source25,1,45,4,5, it is possible to simulate several co-planar light sources by placing the appropriate texture on a plane. This gives several soft shadows in a single application of the algorithm. However, it has a cost: since the textured light source is larger, the algorithms will run more slowly.

4.4. Constraints on the scene

The other elements causing shadows are the occluders and the receivers. Most real-time soft shadow methods make some assumptions on the scene, either explicit or implicit.

Receiver: The strongest restriction is when the object receiving shadows is a plane, as with the plateaus method19. Multi-sample soft shadows25,22 are also restricted to a small number of receivers for interactive rendering. In that case, self-shadowing is not applicable.

Self-shadowing: The convolution45 method requires that the scene be cut into clusters, within which no self-shadows are computed.

Silhouette: For all the methods that require a silhouette extraction — such as object-based methods — it is implicitly assumed that we can compute a silhouette for all the objects in the scene. In practice, this usually means that the scene is made of closed triangle meshes.

4.5. New generation of GPUs

Most real-time soft shadow methods use the features of the graphics hardware that were available to the authors at the time of writing:

Shadow map: all image-based methods use the GL_ARB_SHADOW extension for shadow maps. This extension (or an earlier version) is available, for example, on Silicon Graphics hardware from the InfiniteReality 2 onwards, on NVIDIA graphics cards from the GeForce 3 onwards, and on ATI graphics cards from the Radeon 9500 onwards.

Imaging subset: along with this extension, some methods also compute convolutions on the shadow map. These convolutions can be computed in hardware if the Imaging Subset of the OpenGL specification is present. This is the case on all Silicon Graphics machines and NVIDIA cards.

Programmable GPU: finally, the most recent real-time soft shadow methods use the programming capabilities introduced in recent graphics hardware. Vertex programs14 and fragment programs21 are used for single-sample soft shadows33, penumbra maps53, smoothies11 and soft shadow volumes4,5. In practice, this restricts these algorithms to the latest generation of graphics hardware, such as the NVIDIA GeForce FX or the ATI Radeon 9500 and above.

Many object-based algorithms suffer from the fact that they need to compute the silhouette of the occluders, a costly step that can only be done on the CPU. Wyman and Hansen53 report that computing the silhouette of a moderately complex occluder (5,000 polygons) takes 10 ms in their implementation. If the next generation of graphics hardware included the possibility of computing this silhouette entirely on the graphics card10, object-based algorithms53,11,2,4,5 would greatly benefit from the speed-up.
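The cost argument for multiple light sources can be made concrete: every light needs its own soft-shadow evaluation, so shading cost grows linearly with the number of lights. A toy sketch (all names are hypothetical, not from any reviewed method):

```python
from dataclasses import dataclass

@dataclass
class Light:
    intensity: float

def shade(point, lights, soft_shadow_factor):
    """One soft-shadow evaluation per light source: the rendering cost
    of every surveyed algorithm is multiplied by the number of lights."""
    return sum(l.intensity * soft_shadow_factor(point, l) for l in lights)

# Two lights, each half-occluded at this point: total = 0.5 + 1.0.
total = shade((0, 0, 0), [Light(1.0), Light(2.0)], lambda p, l: 0.5)
```

The co-planar trick described above sidesteps this loop by encoding several emitters in one light-source texture, trading the per-light loop for a larger textured source.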


5. Conclusions

In this State of the Art Review, we have described the issues encountered when working with soft shadows. We have presented existing algorithms that produce soft shadows in real time. Two main categories of approaches have been reviewed, based on shadow maps and shadow volumes. Each one has advantages and drawbacks, and none of them can simultaneously solve all the problems we have mentioned. This motivated a discussion and classification of these methods, hopefully allowing easier algorithm selection based on a particular application's constraints.

We have seen that the latest algorithms benefit from the programmability of recent graphics hardware. Two main directions appear attractive to render high-quality soft shadows in real time: by programming graphics hardware, and by taking advantage simultaneously of both image-based and object-based techniques. Distributed rendering, using for instance PC clusters, is another promising avenue, although little has been achieved so far. Interactive display speeds can be obtained today even on rather complex scenes. Continuing improvements of graphics technology — in performance and programmability — let us expect that soft shadows will soon become a common standard in real-time rendering.

Acknowledgments

The "Hugo" robot used in the pictures of this paper was created by Laurence Boissieux.

This work was supported in part by the "ACI Jeunes Chercheurs" CYBER of the French Ministry of Research, and by the "Région Rhône-Alpes" through the DEREVE research consortium.

We wish to express our gratitude to the authors of the algorithms described in this review, who have provided us with useful detailed information about their work, and to the anonymous reviewers whose comments and suggestions have significantly improved the paper.

Remark: All the smooth shadow pictures in this paper were computed with distributed ray-tracing, using 1024 samples on the area light sources.

References

1. Maneesh Agrawala, Ravi Ramamoorthi, Alan Heirich, and Laurent Moll. Efficient image-based methods for rendering soft shadows. In Computer Graphics (SIGGRAPH 2000), Annual Conference Series, pages 375–384. ACM SIGGRAPH, 2000.
2. Tomas Akenine-Möller and Ulf Assarsson. Approximate soft shadows on arbitrary surfaces using penumbra wedges. In Rendering Techniques 2002 (13th Eurographics Workshop on Rendering), pages 297–306. ACM Press, 2002.
3. Tomas Akenine-Möller and Eric Haines. Real-Time Rendering. A K Peters Ltd, 2nd edition, 2002.
4. Ulf Assarsson and Tomas Akenine-Möller. A geometry-based soft shadow volume algorithm using graphics hardware. ACM Transactions on Graphics (SIGGRAPH 2003), 22(3), 2003.
5. Ulf Assarsson, Michael Dougherty, Michael Mounier, and Tomas Akenine-Möller. An optimized soft shadow volume algorithm with real-time performance. In Graphics Hardware, 2003.
6. ATI. SmartShader™ technology white paper. http://www.ati.com/products/pdf/smartshader.pdf, 2001.
7. Harlen Costa Batagelo and Ilaim Costa Júnior. Real-time shadow generation using BSP trees and stencil buffers. In SIBGRAPI, volume 12, pages 93–102, October 1999.
8. Stefan Brabec. Personal communication, May 2003.
9. Stefan Brabec and Hans-Peter Seidel. Single sample soft shadows using depth maps. In Graphics Interface, 2002.
10. Stefan Brabec and Hans-Peter Seidel. Shadow volumes on programmable graphics hardware. Computer Graphics Forum (Eurographics 2003), 22(3), September 2003.
11. Eric Chan and Fredo Durand. Rendering fake soft shadows with smoothies. In Rendering Techniques 2003 (14th Eurographics Symposium on Rendering). ACM Press, 2003.
12. Franklin C. Crow. Shadow algorithms for computer graphics. Computer Graphics (SIGGRAPH 1977), 11(3):242–248, 1977.
13. George Drettakis and Eugene Fiume. A fast shadow algorithm for area light sources using backprojection. In Computer Graphics (SIGGRAPH 1994), Annual Conference Series, pages 223–230. ACM SIGGRAPH, 1994.
14. Cass Everitt. OpenGL ARB vertex program. http://developer.nvidia.com/docs/IO/8230/GDC2003_OGL_ARBVertexProgram.pdf, 2003.
15. Cass Everitt and Mark J. Kilgard. Practical and robust stenciled shadow volumes for hardware-accelerated rendering. http://developer.nvidia.com/object/robust_shadow_volumes.html, 2002.
16. Cass Everitt and Mark J. Kilgard. Optimized stencil shadow volumes. http://developer.nvidia.com/docs/IO/8230/GDC2003_ShadowVolumes.pdf, 2003.


17. Cass Everitt, Ashu Rege, and Cem Cebenoyan. Hardware shadow mapping. http://developer.nvidia.com/object/hwshadowmap_paper.html.
18. Randima Fernando, Sebastian Fernandez, Kavita Bala, and Donald P. Greenberg. Adaptive shadow maps. In Computer Graphics (SIGGRAPH 2001), Annual Conference Series, pages 387–390. ACM SIGGRAPH, 2001.
19. Eric Haines. Soft planar shadows using plateaus. Journal of Graphics Tools, 6(1):19–27, 2001.
20. Evan Hart. ARB Fragment Program: Fragment level programmability in OpenGL. http://www.ati.com/developer/gdc/GDC2003_OGL_ARBFragmentProgram.pdf, 2003.
21. Evan Hart. Other New OpenGL Stuff: Important stuff that doesn't fit elsewhere. http://www.ati.com/developer/gdc/GDC2003_OGL_MiscExtensions.pdf, 2003.
22. Paul S. Heckbert and Michael Herf. Simulating soft shadows with graphics hardware. Technical Report CMU-CS-97-104, Carnegie Mellon University, January 1997.
23. Tim Heidmann. Real shadows, real time. In Iris Universe, volume 18, pages 23–31. Silicon Graphics Inc., 1991.
24. Wolfgang Heidrich, Stefan Brabec, and Hans-Peter Seidel. Soft shadow maps for linear lights. In Rendering Techniques 2000 (11th Eurographics Workshop on Rendering), pages 269–280. Springer-Verlag, 2000.
25. Michael Herf. Efficient generation of soft shadow textures. Technical Report CMU-CS-97-138, Carnegie Mellon University, 1997.
26. J.-C. Hourcade and A. Nicolas. Algorithms for antialiased cast shadows. Computers & Graphics, 9(3):259–265, 1985.
27. Geoffrey S. Hubona, Philip N. Wheeler, Gregory W. Shirah, and Matthew Brandt. The role of object shadows in promoting 3D visualization. ACM Transactions on Computer-Human Interaction, 6(3):214–242, 1999.
28. M. Isard, M. Shand, and A. Heirich. Distributed rendering of interactive soft shadows. In 4th Eurographics Workshop on Parallel Graphics and Visualization, pages 71–76. Eurographics Association, 2002.
29. Brett Keating and Nelson Max. Shadow penumbras for complex objects by depth-dependent filtering of multi-layer depth images. In Rendering Techniques 1999 (10th Eurographics Workshop on Rendering), pages 205–220. Springer-Verlag, 1999.
30. Daniel Kersten, Pascal Mamassian, and David C. Knill. Moving cast shadows and the perception of relative depth. Technical Report no. 6, Max-Planck-Institut für biologische Kybernetik, 1994.
31. Daniel Kersten, Pascal Mamassian, and David C. Knill. Moving cast shadows and the perception of relative depth. Perception, 26(2):171–192, 1997.
32. Mark J. Kilgard. Improving shadows and reflections via the stencil buffer. http://developer.nvidia.com/docs/IO/1348/ATT/stencil.pdf, 1999.
33. Florian Kirsch and Juergen Doellner. Real-time soft shadows using a single light sample. Journal of WSCG (Winter School on Computer Graphics 2003), 11(1), 2003.
34. David C. Knill, Pascal Mamassian, and Daniel Kersten. Geometry of shadows. Journal of the Optical Society of America, 14(12):3216–3232, 1997.
35. Johann Heinrich Lambert. Die freye Perspektive. 1759.
36. Tom Lokovic and Eric Veach. Deep shadow maps. In Computer Graphics (SIGGRAPH 2000), Annual Conference Series, pages 385–392. ACM SIGGRAPH, 2000.
37. Céline Loscos and George Drettakis. Interactive high-quality soft shadows in scenes with moving objects. Computer Graphics Forum (Eurographics 1997), 16(3), September 1997.
38. Pascal Mamassian, David C. Knill, and Daniel Kersten. The perception of cast shadows. Trends in Cognitive Sciences, 2(8):288–295, 1998.
39. Michael D. McCool. Shadow volume reconstruction from depth maps. ACM Transactions on Graphics, 19(1):1–26, 2000.
40. Steve Morein. ATI Radeon HyperZ technology. In Graphics Hardware Workshop, 2000.
41. Steven Parker, Peter Shirley, and Brian Smits. Single sample soft shadows. Technical Report UUCS-98-019, Computer Science Department, University of Utah, October 1998.
42. William T. Reeves, David H. Salesin, and Robert L. Cook. Rendering antialiased shadows with depth maps. Computer Graphics (SIGGRAPH 1987), 21(4):283–291, 1987.
43. Stefan Roettger, Alexander Irion, and Thomas Ertl. Shadow volumes revisited. In Winter School on Computer Graphics, 2002.
44. Mark Segal, Carl Korobkin, Rolf van Widenfelt, Jim Foran, and Paul Haeberli. Fast shadows and lighting effects using texture mapping. Computer Graphics (SIGGRAPH 1992), 26(2):249–252, July 1992.


45. Cyril Soler and François X. Sillion. Fast calculation of soft shadow textures using convolution. In Computer Graphics (SIGGRAPH 1998), Annual Conference Series, pages 321–332. ACM SIGGRAPH, 1998.
46. Marc Stamminger and George Drettakis. Perspective shadow maps. ACM Transactions on Graphics (SIGGRAPH 2002), 21(3):557–562, 2002.
47. Sylvain Vignaud. Real-time soft shadows on GeForce class hardware. http://tfpsly.planet-d.net/english/3d/SoftShadows.html, 2003.
48. Leonardo Da Vinci. Codex Urbinas. 1490.
49. Leonard Wanger. The effect of shadow quality on the perception of spatial relationships in computer generated imagery. Computer Graphics (Interactive 3D Graphics 1992), 25(2):39–42, 1992.
50. Lance Williams. Casting curved shadows on curved surfaces. Computer Graphics (SIGGRAPH 1978), 12(3):270–274, 1978.
51. Andrew Woo. The shadow depth map revisited. In Graphics Gems III, pages 338–342. Academic Press, 1992.
52. Andrew Woo, Pierre Poulin, and Alain Fournier. A survey of shadow algorithms. IEEE Computer Graphics and Applications, 10(6):13–32, November 1990.
53. Chris Wyman and Charles Hansen. Penumbra maps: Approximate soft shadows in real-time. In Rendering Techniques 2003 (14th Eurographics Symposium on Rendering). ACM Press, 2003.
54. Zhengming Ying, Min Tang, and Jinxiang Dong. Soft shadow maps for area light by area approximation. In 10th Pacific Conference on Computer Graphics and Applications, pages 442–443. IEEE, 2002.
55. Hansong Zhang. Forward shadow mapping. In Rendering Techniques 1998 (9th Eurographics Workshop on Rendering), pages 131–138. Springer-Verlag, 1998.

206 CHAPTER 4. USING PROGRAMMABLE GRAPHICS CARDS

4.7.3 Soft shadow maps: efficient sampling of light source visibility (CGF 2006)

Authors: Lionel Atty, Nicolas Holzschuch, Marc Lapierre, Jean-Marc Hasenfratz, Charles Hansen and François X. Sillion.
Journal: Computer Graphics Forum, vol. 25, no. 4.
Date: December 2006

Soft Shadow Maps: Efficient Sampling of Light Source Visibility

Lionel Atty1, Nicolas Holzschuch1, Marc Lapierre2, Jean-Marc Hasenfratz1,3, Charles Hansen4 and François X. Sillion1

1 ARTIS/GRAVIR–IMAG INRIA   2 MOVI/GRAVIR–IMAG INRIA   3 Université Pierre Mendès-France   4 School of Computing, University of Utah

Figure 1: Our algorithm computes soft shadows in real-time (left) by replacing the occluders with a discretized version (right), using information from the shadow map. This scene runs at 84 fps.

Abstract
Shadows, particularly soft shadows, play an important role in the visual perception of a scene by providing visual
cues about the shape and position of objects. Several recent algorithms produce soft shadows at interactive rates,
but they do not scale well with the number of polygons in the scene or only compute the outer penumbra. In
this paper, we present a new algorithm for computing interactive soft shadows on the GPU. Our new approach
provides both inner- and outer-penumbra, and has a very small computational cost, giving interactive frame-rates
for models with hundreds of thousands of polygons.
Our technique is based on a sampled image of the occluders, as in shadow map techniques. These shadow samples
are used in a novel manner, computing their effect on a second projective shadow texture using fragment programs.
In essence, the fraction of the light source area hidden by each sample is accumulated at each texel position of
this Soft Shadow Map. We include an extensive study of the approximations caused by our algorithm, as well as
its computational costs.
Categories and Subject Descriptors (according to ACM CCS): I.3.1 [Computer Graphics]: Graphics processors I.3.7
[Computer Graphics]: Color, shading, shadowing, and texture
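The accumulation step described in the abstract (each occluder sample contributes the fraction of the light-source area it hides, summed at every texel of the Soft Shadow Map) can be sketched as a CPU-side toy model. This is not the paper's GPU implementation; all names and the hidden_fraction callback are hypothetical:

```python
def build_soft_shadow_map(width, height, occluder_samples, hidden_fraction):
    """Accumulate, at every texel, the fraction of the light-source area
    hidden by each occluder sample, then clamp to full occlusion."""
    ssm = [[0.0] * width for _ in range(height)]
    for sample in occluder_samples:
        for y in range(height):
            for x in range(width):
                ssm[y][x] += hidden_fraction(sample, x, y)
    # A texel cannot be more than fully occluded.
    return [[min(1.0, v) for v in row] for row in ssm]

# Two occluder samples, each hiding 30% of the light at every texel,
# accumulate to 60% occlusion.
ssm = build_soft_shadow_map(2, 2, ["s0", "s1"], lambda s, x, y: 0.3)
```

In the actual method, a fragment program plays the role of hidden_fraction and the summation is performed by the GPU per texel of the projective shadow texture.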

1. Introduction

Shadows add important visual information to computer-generated images. The perception of spatial relationships between objects can be altered or enhanced simply by modifying the shadow shape, orientation, or position [WFG92, Wan92, KMK97]. Soft shadows, in particular, provide robust contact cues by the hardening of the shadow due to proximity, resulting in a hard shadow upon contact. The advent of powerful graphics hardware on low-cost computers has led to the emergence of many interactive soft shadow algorithms (for a detailed study of these algorithms, please refer to [HLHS03]).

In this paper, we introduce a novel method based on shadow maps to interactively render soft shadows. Our
© The Eurographics Association and Blackwell Publishing 2006. Published by Blackwell Publishing, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA 02148, USA.

207
L. Atty et al. / Soft Shadow Maps

method interactively computes a projective shadow texture, the Soft Shadow Map, that incorporates soft shadows based on light source visibility from receiver objects (see Fig. 2). This texture is then projected onto the scene to provide interactive soft shadows of dynamic objects and dynamic area light sources.

Figure 2: Applying our algorithm (200,000 polygons, occluder map 256 × 256, displayed at 32 fps).

There are several advantages to our technique when compared to existing interactive soft-shadow algorithms. First, it is not necessary to compute silhouette edges. Second, the algorithm is not fill-bound, unlike methods based on shadow volumes. These properties provide better scaling for occluding geometry than other GPU-based soft shadow techniques [WH03, CD03, AAM03]. Third, unlike some other shadow-map-based soft shadow techniques, our algorithm does not dramatically overestimate the umbra region [WH03, CD03]. Fourth, while other methods have relied on an interpolation from the umbra to the non-shadowed region to approximate the penumbra for soft shadows [AHT04, WH03, CD03, BS02], our method computes the visibility of an area light source for receivers in the penumbra regions.

Our algorithm also has some limitations when compared to existing algorithms. First, our algorithm splits scene geometry into occluders and receivers, and self-shadowing is not accounted for. Also, since our algorithm uses shadow maps to approximate occluder geometry, it inherits the well-known aliasing issues of shadow map techniques. For large area light sources, the soft shadows tend to blur the artifacts, but for smaller area light sources such aliasing is apparent.

We acknowledge that these limitations are important, and they may prevent the use of our algorithm in some cases. However, there are many applications, such as video games or immersive environments, where the advantages of our algorithm (a very fast framerate and a convincing soft shadow) outweigh its limitations. We also think that this new algorithm could be the start of promising new research.

In the following section, we review previous work on interactive computation of soft shadows. In Section 3, we present the basis of our algorithm, and in the following section, we provide implementation details. In the next two sections, we conduct an extensive analysis of our algorithm: first, in Section 5, we study the approximations in our soft shadows, then in Section 6 we study the rendering times of our algorithm. Both studies are done first from a theoretical point of view, then experimentally. Finally, in Section 7, we conclude and expose possible future directions for research.

2. Previous Work

Researchers have investigated shadow algorithms for computer-generated images for nearly three decades. The reader is referred to a recent state-of-the-art report by Hasenfratz et al. [HLHS03], the overview by Woo et al. [WPF90] and the book by Akenine-Möller and Haines [AMH02].

The two most common methods for interactively producing shadows are shadow maps [Wil78] and shadow volumes [Cro77]. Both of these techniques have been extended for soft shadows. In the case of shadow volumes, Assarsson and Akenine-Möller [AAM03] used penumbra wedges in a technique based on shadow volumes to produce soft shadows. Their method depends on locating silhouette edges to form the penumbra wedges. While providing good soft shadows without an overestimate of the umbra, the algorithm is fill-limited, particularly when zoomed in on a soft shadow region. Since it is necessary to compute the silhouette edges at every frame, the algorithm also suffers from scalability issues when rendering occluders with large numbers of polygons.

The fill-rate limitation is a well-known limitation of shadow-volume-based algorithms. Recent publications [CD04, LWGM04] have focused on limiting the fill-rate for shadow-volume algorithms, thus removing this limitation.

On shadow maps, Chan and Durand [CD03] and Wyman and Hansen [WH03] both employed a technique which uses the standard shadow map method for the umbra region and builds a map containing an approximate penumbra region that can be used at run-time to give the appearance, including hard shadows at contact, of soft shadows. While these methods provide interactive rendering, both only compute the outer penumbra, the part of the penumbra that is outside the hard shadow. In effect, they overestimate the umbra region, resulting in the incorrect appearance of soft shadows in the case of large area light sources. These methods also depend on computing the silhouette edges in object space for each frame; this requirement limits the scalability for occluders with large numbers of polygons.

Arvo et al. [AHT04] used an image-space flood-fill method to produce approximate soft shadows. Their algorithm is image-based, like ours, but works on a detection of shadow boundary pixels, followed by several passes to replace the boundary by a soft shadow, gradually extending the soft shadow at each pass. The main drawback of their method is that the number of passes required is proportional

© The Eurographics Association and Blackwell Publishing 2006.
L. Atty et al. / Soft Shadow Maps

to the extent of the penumbra region, and the rendering time is proportional to the number of shadow-filling passes.

Guennebaud et al. [GBP06] also used the back projection of each pixel in the shadow map to compute the soft shadow. Their method was developed independently of ours, yet is very similar. The main differences between the two methods lie in the order of the computations: we compute the soft shadow in shadow map space, while they compute the soft shadow in screen space, requiring a search in the shadow map.

Brabec and Seidel [BS02] and Kirsch and Doellner [KD03] use a shadow map to compute soft shadows, by searching at each pixel of the shadow map for the nearest boundary pixel, then interpolating between illumination and shadow as a function of the distance between this pixel and the boundary pixel and the distances between the light source, the occluder and the receiver. Their algorithm requires scanning the shadow map to look for boundary pixels, a potentially costly step; in practical implementations they limit the search radius, thus limiting the actual size of the penumbra region.

Soler and Sillion [SS98] compute a soft shadow map as the convolution of two images representing the source and blocker. Their technique is only accurate for planar and parallel objects, although it can be extended using an object hierarchy. Our technique can be seen as an extension of this approach, where the convolution is computed for each sample of an occlusion map, and the results are then combined.

Finally, McCool [McC00] presented an algorithm merging shadow volume and shadow map algorithms by detecting silhouette pixels in the shadow map and computing a shadow volume based on these pixels. Our algorithm is similar in that we are computing a shadow volume for each pixel in the shadow map. However, we never display this shadow volume, thus avoiding fill-rate issues.

3. Algorithm

3.1. Presentation of the algorithm

Our algorithm assumes a rectangular light source and starts by separating potential occluders (such as moving characters) from potential receivers (such as the background in a scene) (Fig. 3(a)). We will compute the soft shadows only from the occluders onto the receivers.

Our algorithm computes a Soft Shadow Map (SSM) for each light source: a texture containing the texelwise percentage of occlusion from the light source. This soft shadow map is then projected onto the scene from the position of the light source, to give soft shadows (see Fig. 2).

Our algorithm is an extension of the shadow map algorithm: we start by computing depth buffers of the scene. Unlike the standard shadow map method, we will need two depth buffers: one for the occluders (the occluder map) and the other for the receivers.

The occluder map depth buffer is used to discretize the set of occluders (see Fig. 3(b)): each pixel in this occluder map is converted into a micro-patch that covers the same image area but is located in a plane parallel to the light source, at a distance corresponding to the pixel depth. Pixels that are close to the light source are converted into small rectangles and pixels that are far from the light source are converted into larger rectangles. At the end of this step, we have a discrete representation of the occluders.

The receiver map depth buffer will be used to provide the receiver depth, as our algorithm uses the distance between light source and receiver to compute the soft shadow values.

We compute the soft shadow of each of the micro-patches constituting the discrete representation of the occluders (see Fig. 3(c)), and sum them into the soft shadow map (SSM) (see Fig. 3(d)). This step would be potentially costly, but we achieve it in a reasonable amount of time thanks to two key points: 1) the micro-patches are parallel to the light source, so computing their penumbra extent and their percentage of occlusion only requires a small number of operations, and 2) these operations are computed on the graphics card, exploiting the parallelism of the GPU engine. The percentage of occlusion from each micro-patch takes into account the relative distances between the occluders, the receiver and the light source. Our algorithm introduces several approximations on the actual soft shadow. These approximations will be discussed in Section 5.

The pseudo-code for our algorithm is given in Fig. 4. In the following subsections, we will review in detail the individual steps of the algorithm: discretizing the occluders (Section 3.2), computing the penumbra extent for each micro-patch (Section 3.3) and computing the percentage of occlusion for each pixel in the Soft Shadow Map (Section 3.4). Specific implementation details will be given in Section 4.

Compute depth map of receivers
Compute depth map of occluders
for all pixels in occluder map
    Retrieve depth of occluder at this pixel
    Compute micro-patch associated with this pixel
    Compute extent of penumbra for this micro-patch
    for all pixels in penumbra extent for micro-patch
        Retrieve receiver depth at this pixel
        Compute percentage of occlusion for this pixel
        Add to current percentage in soft shadow map
    end
end
Project soft shadow map on the scene

Figure 4: Our algorithm
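The step "Compute micro-patch associated with this pixel" from the pseudo-code of Fig. 4 can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code: it assumes the occluder map is rendered with a symmetric square frustum centered on the light source, and the function name and `fov_deg` parameter are ours.

```python
import math

def micro_patch(i, j, depth, map_res, fov_deg=90.0):
    """Convert occluder-map pixel (i, j) at the given depth into an
    axis-aligned rectangle lying in a plane parallel to the light source.
    A pixel subtends a fixed angle, so its world-space footprint grows
    linearly with depth: pixels far from the light become larger patches."""
    # Half-width of the image plane at this depth (symmetric frustum).
    half_extent = depth * math.tan(math.radians(fov_deg) / 2.0)
    pixel_size = 2.0 * half_extent / map_res
    # Lower-left corner of the patch, with the light center at the origin.
    x = -half_extent + i * pixel_size
    y = -half_extent + j * pixel_size
    return (x, y, pixel_size, pixel_size)  # (x, y, width, height)
```

Doubling the depth doubles the patch size, matching the behavior described above: pixels close to the light source become small rectangles, distant pixels larger ones.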


Figure 3: The main steps of our algorithm. (a) Scene view; (b) discretizing the occluders; (c) soft shadows from one micro-patch; (d) summing the soft shadows.

Figure 5: The penumbra extent of a micro-patch is a rectangular pyramid.

Figure 6: Finding the apex of the pyramid is reduced to a 2D problem.

3.2. Discretizing the occluders

The first step in our algorithm is a discretization of the occluders. We compute a depth buffer of the occluders, as seen from the light source, then convert each pixel in this occluder map into the equivalent polygonal micro-patch that lies in a plane parallel to the light source, at the appropriate depth, and occupies the same image plane extent (see Fig. 1).

The occluder map is axis-aligned with the rectangular light source and has the same aspect ratio: all micro-patches created in this step are also axis-aligned with the light source and have the same aspect ratio.

3.3. Computing penumbra extents

Each micro-patch in the discretized occluder is potentially blocking some light between the light source and some portion of the receiver. To reduce the amount of computations, we compute the penumbra extent of the micro-patches, and we only compute occlusion values inside these extents.

Since the micro-patches are parallel, axis-aligned with the light source and have the same aspect ratio, the penumbra extent of each micro-patch is a rectangular pyramid (Fig. 5). Finding the penumbra extent of the light source is equivalent to finding the apex O of the pyramid (Fig. 6(a)). This reduces to a 2D problem, considering parallel edges (LL′) and (PP′) on both polygons (Fig. 6(b)). Since (LL′) and (PP′) are parallel lines, we have:

\[ \frac{OL}{OP} = \frac{OL'}{OP'} = \frac{LL'}{PP'} \]

This ratio is the same if we consider the center of each line segment:

\[ \frac{OC_L}{OC_P} = \frac{LL'}{PP'} \]

Since the micro-patch and the light source have the same aspect ratio, the ratio r = LL′/PP′ is the same for both sides of the micro-patch (thus, the penumbra extent of the micro-patch is indeed a pyramid).

We find the apex of the pyramid by applying a scaling to the center of the micro-patch (C_P), with respect to the center
of the light source (C_L):

\[ \overrightarrow{C_L O} = \frac{r}{1+r}\,\overrightarrow{C_L C_P} \]

where r is again the ratio r = LL′/PP′.

Figure 7: The intersection between the pyramid and the virtual plane is an axis-aligned rectangle.

Figure 8: Computing the position and extent of the penumbra rectangle for each micro-patch.

Figure 9: We reproject the occluding micro-patch onto the light source and compute the percentage of occlusion.

We now use this pyramid to compute occlusion in the soft shadow map (see Fig. 7). We use a virtual plane, parallel to the light source, to represent this map (which will be projected onto the scene). The intersection of the penumbra pyramid with this virtual plane is an axis-aligned rectangle. We only have to compute the percentage of occlusion inside this rectangle.

Computing the position and size of the penumbra rectangle uses the same formulas as for computing the apex of the pyramid (see Fig. 8):

\[ \overrightarrow{C_L C_R} = \frac{z_R}{z_O}\,\overrightarrow{C_L O} \qquad\qquad RR' = LL'\,\frac{z_R - z_O}{z_O} \]

3.4. Computing the soft shadow map

For all the pixels of the SSM lying inside this penumbra extent, we compute the percentage of the light source that is occluded by this micro-patch. This percentage of occlusion depends on the relative positions of the light source, the occluders and the receivers. To compute it, for each pixel on the receiver inside this extent, we project the occluding micro-facet back onto the light source [DF94] (Fig. 9). The result of this projection is an axis-aligned rectangle; we need to compute the intersection between this rectangle and the light source.

Computing this intersection is equivalent to computing the two intersections between the respective intervals on both axes. This part of the computation is done on the GPU, using a fragment program: the penumbra extent is converted into an axis-aligned quad, which we draw in a float buffer. For each pixel inside this quad, the fragment program computes the percentage of occlusion. These percentages are summed using the blending capability of the graphics card (see Section 4.2).

3.5. Two-sided soft-shadow maps

As with many other soft shadow computation algorithms [HLHS03], our algorithm exhibits artifacts because we are computing soft shadows using a single view of the occluder. Shadow effects linked to parts of the occluder that are not directly visible from the light source are not visible. In Fig. 10(a), our algorithm only computes the soft shadow for the front part of the occluder, because the back part of the occluder does not appear in the occluder map. This limitation is frequent in real-time soft-shadow algorithms [HLHS03].

For our algorithm, we have devised an extension that solves this limitation: we compute two occluder maps. In the first, we discretize the closest, front-facing faces of the occluders (see Fig. 10(b)). In the second, we discretize the furthest, back-facing faces of the occluders (see Fig. 10(c)).
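The geometric construction above — the pyramid apex, the penumbra rectangle, and the per-axis interval intersection on the normalized light source — can be condensed into a short CPU-side sketch. The names are ours, the light source is assumed to lie in the plane z = 0, and this illustrates the formulas rather than reproducing the authors' GPU code.

```python
def pyramid_apex(c_l, c_p, r):
    """Apex O of the penumbra pyramid: C_L O = r/(1+r) * C_L C_P,
    with r = LL'/PP' the light-size over patch-size ratio."""
    k = r / (1.0 + r)
    return tuple(cl + k * (cp - cl) for cl, cp in zip(c_l, c_p))

def penumbra_rectangle(c_l, apex, z_r, light_size):
    """Center C_R and side RR' of the penumbra extent on a receiver
    plane at depth z_R:  C_L C_R = (z_R / z_O) C_L O  and
    RR' = LL' (z_R - z_O) / z_O  (light source in the plane z = 0)."""
    z_o = apex[2]
    center = tuple(cl + (z_r / z_o) * (a - cl) for cl, a in zip(c_l, apex))
    side = light_size * (z_r - z_o) / z_o
    return center, side

def occlusion_percentage(a, b, c, d):
    """Fraction of the light source (normalized to [0,1] x [0,1]) covered
    by the back-projected patch [a,b] x [c,d]: clamp each interval to
    [0,1], then multiply the two interval lengths."""
    clamp = lambda v: min(1.0, max(0.0, v))
    return max(0.0, clamp(b) - clamp(a)) * max(0.0, clamp(d) - clamp(c))
```

The last function mirrors, on the CPU, the saturate-subtract-multiply computation performed per pixel by the fragment program (Section 4.2).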


Figure 10: The original algorithm fails for some geometry. The two-pass method gives the correct shadow. (a) Original algorithm; (b) closest, front faces of the occluder discretized with their shadow; (c) furthest, back faces of the occluder discretized with their shadow; (d) combining the two soft shadow maps.

Figure 11: Two-pass shadow computations enhance precision. (a) One pass (148 fps); (b) one pass with bottom patches (142 fps); (c) two passes (84 fps); (d) ground truth.
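Combining the two soft shadow maps (Fig. 10(d)) amounts to keeping, per texel, the larger occlusion percentage of the two maps. A minimal CPU illustration, with maps stored as nested lists (an assumed representation, not the GPU implementation):

```python
def merge_soft_shadow_maps(ssm_front, ssm_back):
    """Per-texel maximum of the occlusion maps built from the
    front-facing and back-facing occluder maps."""
    return [[max(f, b) for f, b in zip(row_f, row_b)]
            for row_f, row_b in zip(ssm_front, ssm_back)]
```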

We then compute a soft shadow map for each occluder map, and merge them, using the maximum of each occluder map. The resulting occlusion map has eliminated most artifacts (Fig. 10(d) and 11). Empirically, the cost of the two-pass algorithm is between 1.6 and 1.8 times the cost of the one-pass algorithm. Depending on the size of a model and the quality requirements of a given application, the second pass may be worth this extra cost. For example, for an animated model of less than 100,000 polygons, the one-pass algorithm renders at approximately 60 fps. Adding the second pass drops the framerate to 35 fps — which is still interactive.

4. Implementation details

4.1. Repartition between CPU and GPU

Our algorithm (see Fig. 4) starts by rendering two depth maps, one for the occluders and one for the receivers; these depth maps are both computed by the GPU. Then, in order to generate the penumbra extents for the micro-patches, the occluders depth map is transferred back to the CPU.

On the CPU, we generate the penumbra extents for the micro-patch associated to each non-empty pixel of the occluders depth map. We then render these penumbra extents, and for each pixel, we execute a small fragment program to compute the percentage of occlusion. Computing the percentage of occlusion at each pixel of the soft shadow map is done on the GPU (see Section 4.2).

The contributions from each micro-patch are added together; for this we use the blending ability of the GPU: occlusion percentages are rendered into a floating-point buffer with blending enabled, thus the percentage values for each micro-patch are automatically added to the previously computed percentage values.

4.2. Computing the intersection

For each pixel of the SSM lying inside the penumbra extent of a micro-patch, we compute the percentage of the light source that is occluded by this micro-patch, by projecting the occluding micro-patch back onto the light source (see Fig. 9). We have to compute the intersection of two axis-aligned rectangles, which is the product of the two intersections between the respective intervals on both axes.

We have therefore reduced our intersection problem from a 2D problem to two separate 1D problems. To further optimize the computations, we use the SAT instructions of the fragment program assembly language: without loss of generality, we can convert the rectangle corresponding to the light source to [0,1] × [0,1]. Each interval intersection becomes the intersection between one [a,b] interval and [0,1]. Exploiting the SAT instruction and swizzling, computing the area of the intersection between the projection of the occluder [a,b] × [c,d] and the light source [0,1] × [0,1] only requires three instructions:

MOV_SAT rs, {a, b, c, d}
SUB     rs, rs, rs.yxwz
MUL     result.color, rs.x, rs.z

Computing the [a,b] × [c,d] intervals requires projecting the micro-patch onto the light source and scaling the projection. This uses 8 other instructions: 6 basic operations (ADD, MUL, SUB), one reciprocal (RCP) and one texture lookup to get the depth of the receiver. The total length of our fragment program is therefore 11 instructions, including one texture lookup.

4.3. Possible improvements

As it stands, our algorithm makes a very light use of GPU resources: we only execute a very small fragment program, once for each pixel covered by the penumbra extent, and we exploit the blending ability for floating-point buffers.

The main bottleneck of our algorithm is that the penumbra extents have to be computed on the CPU. This requires transferring the occluders depth map to the CPU, and looping over the pixels of the occluders depth map on the CPU. It should be possible to remove this step by using the render-to-vertex-buffer function: instead of rendering the occluders depth map, we would directly render the penumbra extents for each micro-patch into a vertex buffer. This vertex buffer would be rendered in a second pass, generating the soft shadow map.

5. Error Analysis and comparison

In this section, we analyze our algorithm, its accuracy and how it compares with exact soft shadows. We first study potential sources of error from a theoretical point of view, in Section 5.1, then we conduct an experimental analysis, comparing the soft shadows produced with exact soft shadows, in Section 5.2.

5.1. Theoretical analysis

Our algorithm replaces the occluder with a discretized version. This discretization ensures interactive framerates, but it can also be a source of inaccuracies. From a given point on the receiver, we are separately estimating occlusion from several micro-patches, and adding these occlusion values together. We have identified three potential sources of error in our algorithm:

• We are only computing the shadow of the discretized occluder, not the shadow of the actual occluder. This source of error will be analyzed in Section 5.1.1.
• The reprojections of the micro-patches on the light source may overlap or be disjoint. This cause of error will be analyzed in Section 5.1.2.
• We are adding many small values (the occlusion from each micro-patch) to form a large value (the occlusion from the entire occluder). If the micro-patches are too small, we run into numerical accuracy issues, especially with floating-point numbers expressed on 16 bits. This cause of error will be analyzed in Section 5.1.3.

5.1.1. Discretization error

Our algorithm computes the shadow of the discretized occluder, not the shadow of the actual occluder. The discretized occluder corresponds to the part of the occluder that is visible from the camera used to compute the depth buffers, usually placed at the center of the light source. Although we reproject each micro-patch of the discretized occluder onto the area light source, we are missing the parts of the occluder that are not visible from the shadow map camera but are still visible from some points of the area light source. This limitation is frequent in real-time soft shadow algorithms [HLHS03], especially algorithms relying on the silhouette of the occluder as computed from a single point [WH03, CD03, AAM03].

We also use a discrete representation based on the shadow map, not a continuous representation of the occluder. For each pixel of the shadow map, we are potentially overestimating or underestimating the actual occluder by at most half a pixel.

If the occluder has one or more edges aligned with the edges of the shadow map, these discretization errors have the same sign over the whole edge and add up; the worst-case scenario is a square aligned with the axes of the shadow map.

For more practical occluders, the discretization errors on neighboring micro-patches compensate: some of the micro-patches overestimate the occluder while others underestimate it.

5.1.2. Overlapping reprojections

At any given point on the receiver, the parts of the light source that are occluded by two neighboring micro-patches should join exactly for our algorithm to compute the exact percentage of occlusion on the light source. This is typically not the case, and these parts may overlap or there may be a gap between them (Fig. 12). The amount of overlap (or gap) between the occluded parts of the light source depends on the relative positions of the light source, the occluding micro-patches and the receiver.

If we consider the 2D equivalent of this problem (Fig. 13), with two patches separated by δh and at a distance z_O from the light source, with the receiver being at a distance z_R from the light source, there is a point P_0 on the receiver where there is no overlap between the occluded parts. As we move away from this point, the overlap increases. For a point at a
distance x from P_0, the boundaries of the occluding micro-patches project at abscissae x_1 and x_2; as the occluding micro-patches and the light source lie in parallel planes, we have:

\[ \frac{x_1}{x} = \frac{z_O}{z_R - z_O} \qquad\qquad \frac{x_2}{x} = \frac{z_O + \delta h}{z_R - z_O - \delta h} \]

The amount of overlap is therefore:

\[ x_2 - x_1 = x \left( \frac{z_O + \delta h}{z_R - z_O - \delta h} - \frac{z_O}{z_R - z_O} \right) = x\, \frac{z_R\, \delta h}{(z_R - z_O)(z_R - z_O - \delta h)} \qquad (1) \]

x itself is limited, since the occlusion area must fall inside the light source:

\[ |x| < \frac{L}{2}\, \frac{z_R - z_O}{z_O} \qquad (2) \]

Figure 12: The reprojection of two neighboring micro-patches may overlap.

Figure 13: Computing the extent of overlap or gap between two neighboring micro-patches.

The amount of overlap is therefore limited by:

\[ |x_2 - x_1| < \frac{L}{2}\, \frac{z_R\, \delta h}{z_O\,(z_R - z_O - \delta h)} \qquad (3) \]

Equation 3 represents the error our algorithm makes for each pair of micro-patches. The overall error of our algorithm is the sum of the moduli of all these errors, for all the micro-patches projecting on the light source at a given point. This is a conservative estimate, as usually some patches overlap while others present gaps; the actual sum of the occlusion values from all the micro-patches is closer to the real value than this estimate suggests (see Section 5.2).

The theoretical error caused by our algorithm depends on several factors:

Size of the light source: The maximum amount of overlap (Eq. 3) depends directly on the size L of the light source. The larger the light source, the larger the error. Our practical experiments confirm this.

Distance between micro-patches: The maximum amount of overlap (Eq. 3) also depends linearly on δh, the distance in z between neighboring micro-patches. Since δh depends on the discretization of the occluder, the error introduced by our algorithm is related to the resolution of the bitmap: the smaller the resolution of the bitmap, the larger the error. Our practical experiments confirm this, but there is a maximum resolution after which the error does not decrease.

Note that this source of error is related to the effective resolution of the bitmap, that is, the number of pixels used for discretizing the occluder. If the occluder occupies only a small portion of the bitmap, the effective resolution of the bitmap is much smaller than its actual resolution. Fortunately, the cost of the algorithm is also related to the effective resolution of the bitmap.

Distance to the light source/the receiver: If the occluder touches either the light source or the receiver, the amount of overlap (Eq. 3) goes toward infinity. When the occluder is touching the receiver, the area where the overlap occurs (as defined by Equation 2) goes towards 0, so the error does not appear. When the occluder is touching the light source, the actual effect depends on the shape of the occluder. In some cases, overlaps and gaps can compensate, resulting in an acceptable shadow.

5.1.3. Floating-point blending accuracy

Our algorithm adds together many small-scale occlusion values — the occlusion from each micro-patch — to compute a large-scale occlusion value — the occlusion from the complete occluder. This addition is done with the blending ability of the GPU, using blending of floating-point buffers. At the time of writing, blending is only available in hardware for 16-bit floating-point buffers. As a result, we sometimes encounter problems of numerical accuracy.
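Equations 1–3 can be checked numerically. The sketch below follows the notations of Fig. 13 (z_O, z_R, δh, and the light size L); the function names are ours:

```python
def overlap(x, z_o, z_r, dh):
    """Overlap x2 - x1 between the light-source regions occluded by two
    neighboring micro-patches, computed directly from the projections."""
    x1 = x * z_o / (z_r - z_o)
    x2 = x * (z_o + dh) / (z_r - z_o - dh)
    return x2 - x1

def overlap_closed(x, z_o, z_r, dh):
    """Closed form of Eq. 1."""
    return x * z_r * dh / ((z_r - z_o) * (z_r - z_o - dh))

def overlap_bound(light_size, z_o, z_r, dh):
    """Upper bound on |x2 - x1| (Eq. 3): Eq. 1 evaluated at the largest
    |x| admitted by Eq. 2."""
    return (light_size / 2.0) * z_r * dh / (z_o * (z_r - z_o - dh))
```

The bound grows linearly with the light size L and with δh, and diverges when z_O goes to 0 or z_R − z_O − δh goes to 0, matching the discussion above.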


Figure 14: Blending with FP16 numbers: if the resolution of the shadow map is too high, numerical issues appear, resulting in wrong shadows. Using higher accuracy for blending removes this issue (here, FP32 blending was done on the CPU). (a) 128² pixels, FP16 blending (66 Hz); (b) 512² pixels, FP16 blending (20 Hz); (c) 512² pixels, FP32 blending (CPU); (d) ground truth (CPU).
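The failure mode of Figure 14 is easy to reproduce on the CPU by rounding every blend through IEEE 754 half precision (Python's struct module supports the binary16 'e' format since version 3.6). This is an illustrative simulation, not the GPU path:

```python
import struct

def to_fp16(x):
    """Round a double through IEEE 754 binary16, as FP16 blending
    does after every addition."""
    return struct.unpack('e', struct.pack('e', x))[0]

def fp16_accumulate(n):
    """Sum n equal micro-patch contributions that should total 1.0,
    rounding the running total to half precision at every step."""
    total = 0.0
    for _ in range(n):
        total = to_fp16(total + 1.0 / n)
    return total
```

With 128 contributions of 1/128 the sum is exact, but with 4096 contributions of 1/4096 the running total stalls once the increment is no larger than half a unit in the last place of the total: a higher-resolution occluder map produces smaller contributions and therefore a larger blending error, which is the counter-intuitive behavior described below.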

Figure 14 shows an example of these problems. Unconventionally, increasing the resolution of the shadow map makes these problems more likely to appear (for a complete study of floating-point blending accuracy, see Appendix A). The best workaround is therefore to use a relatively low resolution for the occluder map, such as 128 × 128 or 256 × 256. While this may seem a low resolution compared to other shadow map algorithms, our shadow map is focused on the moving occluder (such as a character), not on the entire scene, so 128 × 128 pixels is usually enough resolution.

We see this as only a temporary issue that will disappear as soon as hardware FP32 blending becomes available on graphics cards.

5.2. Comparison with ground truth

We ran several tests to experimentally compare the shadows produced by our algorithm with the actual shadows. The reference values were computed using occlusion queries, giving an accurate estimation of the real occlusion of the light source. In this section, we review the practical differences we observed.

5.2.1. Experimentation method

For each image, we computed an error metric as follows: for each pixel in the soft shadow map, we compute the actual occlusion value (using occlusion queries) and its difference with the occlusion value computed using our algorithm. We summed the moduli of the differences, then divided the result by the total number of pixels lying either in the shadow or in the penumbra, averaging the error over the actual soft shadow. We used the number of pixels that are either in shadow or in penumbra, and not the total number of pixels in the occluders depth map, because the soft shadow can occupy only a small part of the depth map. Dividing by the total number of pixels in the depth map would have underestimated the error.

We have used 3 different scenes (a square plane parallel to the light source, a Buddha model and a Bunny model). These scenes exhibit several interesting features. The Buddha and Bunny are complex models, with folds and creases. The Bunny also has important self-occlusion, and in our scene it is in contact with the ground, providing information on the behavior of our algorithm in that case. The square plane is an illustration of the special case of occluders aligned with the axes of the occluders depth map.

We have tested both the one-pass and the two-pass versions of our algorithm. We selected four separate parameters: the size of the light source, the resolution of the shadow map, and moving the occluder, either vertically from the receiver to the light source or laterally with respect to the light source. For each parameter, we plot the variation of the error introduced by our algorithm as a function of the parameter and analyze the results.

5.2.2. Visual comparison with ground truth

Fig. 16 shows a side-by-side comparison of our algorithm with ground truth. Even though there are slight differences with ground truth, our algorithm exhibits the proper behavior for soft shadows: sharp shadows at places where the object is close to the ground, and a large penumbra zone where the object is further away from the receiver. Our algorithm visibly computes both the inner and the outer penumbra of the object.

Looking at the pictures of the differences (Fig. 16(d) and 16(g)) between the shadow values computed by our algorithm and the ground truth values, it appears that the differences lie mostly on the silhouette: our algorithm only computes the soft shadow of the discretized object, as seen from the center of the light source, while the actual shape of the soft shadow depends on subtle effects happening at the boundary of the silhouette.

5.2.3. Size of the buffer

Figure 17 shows the average difference between the occlusion values computed with our algorithm and the actual occlusion values for our three test scenes, when changing the resolution of the shadow map. In these figures, the abscissa


Figure 15: The test scenes we have used. (a) Square plane; (b) Buddha; (c) Bunny.

Figure 16: Visual comparison of our algorithm with ground truth. (a) Scene view; (b, e) our algorithm; (c, f) ground truth; (d, g) difference between the occlusion values (shown on a 0–15% scale).

Figure 17: Variation of the error with respect to the resolution of the shadow map (average error vs. buffer resolution in pixels, single pass and double pass). (a) Square plane; (b) Buddha; (c) Bunny.


is the number of pixels for one side of the shadow map, so 128 corresponds to a 128 × 128 shadow map. For this test, we used non-power-of-two textures, in order to have enough sampling data. We can make several observations by looking at the data:

Two-pass version: the two-pass version of the algorithm consistently outperforms the single-pass version, always giving more accurate results. The only exception is of course the square plane: since it has no thickness, the single-pass and two-pass versions give the same results.

Shadow map resolution: as expected from the theoretical study (see Section 5.1.2), the error decreases as the resolution of the shadow map increases. What is interesting is that this effect reaches a limit quite rapidly. Roughly, increasing the shadow map resolution above 200 pixels does not bring an improvement in quality. Since the computation costs are related to the size of the shadow map, shadow map sizes of 200 × 200 pixels are close to optimal.

The fact that the error does not decrease continuously as we increase the resolution of the occluder map is a little surprising at first, but can be explained. It is linked to the silhouette effect. As we have seen in Fig. 16, the error introduced by our algorithm comes from the boundary of the silhouette of the occluder, from parts of the occluder that are not visible from the center of the light source but visible from other parts of the light source. Increasing the resolution of the shadow map does not solve this problem. The optimal size for the shadow map is related to the size of the light source. As the light source gets larger, we can use smaller bitmaps.

Discretization error: the error curve for the square plane presents many important spikes. Looking at the results, it appears that these spikes correspond to discretization error (see Section 5.1.1). Since the square occluder is aligned with the axes of the shadow map, it magnifies discretization error.

5.2.4. Size of the light source

Figure 18 shows the average difference between the occlusion values computed with our algorithm and the actual occlusion values when we change the size of the light

as half a pixel. The result is a large error, but it occurs only at the shadow boundary.

Light source size: except for the special case of point light sources, the error increases with the size of the light source. This is consistent with our theoretical analysis (see Section 5.1.2).

5.2.5. Occluder moving laterally

Figure 19 shows the average difference between the occlusion values computed with our algorithm and the actual occlusion values, when we move the occluder from left to right under the light source. The parameter corresponds to the position with respect to the center of the light, with 0 meaning that the center of the object is aligned with the center of the light. We used a bitmap of 128 × 128 for all these tests.

The error is at its minimum when the occluder is roughly under the light source, and increases as the occluder moves laterally. The Buddha and Bunny models are not symmetric, so their curves are slightly asymmetric, and the minimum does not correspond exactly to 0.

5.2.6. Occluder moving vertically

Figure 20 shows the average difference between the occlusion values computed with our algorithm and the actual occlusion values, when we move the occluder vertically. The smallest value of the parameter corresponds to an occluder touching the receiver, and the largest value corresponds to an occluder touching the light source. We used a bitmap of 128 × 128 for all these tests.

As predicted by the theory, the error increases as the occluder approaches the light source (see Section 5.1.2). For the Bunny, the error becomes quite large when the upper ear touches the light source.

6. Complexity

The main advantages of our algorithm are its rendering speed and its scalability. With a typical setup (a modern PC, an occluder map of 128 × 128 pixels, a scene between 50,000 and 300,000 polygons), we get framerates between 30 and 150 fps. In this section, we study the numerical complexity of our algorithm and its rendering speed.
source for our three test scenes. The parameter values We first conduct a theoretical analysis of the complexity of
range from a point light source (parameter=0.01) to a very our algorithm, in Section 6.1, then an experimental analysis,
large light source, approximately as large as the occluder where we test the variation of the rendering speed with re-
(parameter=0.2). We used a bitmap of 128 × 128 pixels for spect to several parameters: the size of the shadow map, the
all these tests. We can make several observations by looking number of polygons and the size of the light source (Sec-
at the data: tion 6.2). Finally, in Section 6.3, we compare the complex-
ity of our algorithm with a state-of-the-art algorithm, Soft
Point light sources: the beginning of the curves Shadow Volume [AAM03].
(parameter=0.01) corresponds to a point light source. In
that case, the error is quite large. This corresponds to an
6.1. Theoretical complexity
error of 1, over the entire shadow boundary; as we are
computing the shadow of the discretized occluder, we Our algorithm starts by rendering a shadow map and down-
miss the actual shadow boundary, sometimes by as much loading it into main memory. This preliminary step has a
© The Eurographics Association and Blackwell Publishing 2006.

L. Atty et al. / Soft Shadow Maps
[Figure 18: Variation of the error with respect to the size of the light source. Plots of average error vs. light source size, single-pass and double-pass curves, for (a) Square plane, (b) Buddha, (c) Bunny.]
[Figure 19: Variation of the error with respect to the lateral position of the occluder. Plots of average error vs. occluder position, single-pass and double-pass curves, for (a) Square plane, (b) Buddha, (c) Bunny.]
complexity linear with respect to the number of polygons in the scene, and linear with the size of the shadow map, measured in the total number of pixels.

Then, for each pixel of the shadow map corresponding to the occluder, we compute its extent in the occlusion map, and for each pixel of this extent we execute a small fragment program of 11 instructions, including one texture lookup.

The overall complexity of this second step of the algorithm is the number of pixels covered by the occluder, multiplied by the number of pixels covered by the extent for each of them, multiplied by the cost of the fragment program. This second step is executed on the GPU, and benefits from the high performance and the parallelism of the graphics card.

The worst case situation would be a case where each micro-patch in the shadow map covers a large number of pixels in the soft shadow map. But this situation corresponds to an object with a large penumbra zone, and if we have a large penumbra zone, we can use a lower resolution for the shadow maps. So we can compensate the cost of the algorithm by running it with bitmaps of lower resolution.

Using our algorithm with a large-resolution shadow map in a situation of large penumbra results in relatively high computing costs, but a low-resolution shadow map would give the same visual results, for a smaller computation time.

6.2. Experimental complexity

All measurements in this section were conducted on a 2.4 GHz Pentium 4 PC with a GeForce 6800 Ultra graphics card. All framerates and rendering times correspond to observed framerates, that is, the framerate for a user manipulating our system. We are therefore measuring the time it takes to display the scene and to compute soft shadows, not just the time it takes to compute soft shadows.

6.2.1. Number of polygons

We studied the influence of the polygon count. Fig. 21 shows the observed rendering time (in ms) as a function of the polygon count, with a constant occluder map size of 128 × 128 pixels. The first thing we note is the speed of our algorithm: even on a large scene of 340,000 polygons, we achieve real-time framerates (more than 30 frames per second). Second, we observe that the rendering time varies linearly with respect to the number of polygons. That was to be expected, as we must render the scene twice (once for the occluder map and once for the actual display), and the time it takes for the graphics card to display a scene varies linearly with respect to the number of polygons. For smaller scenes (less than 10,000 polygons, rendering time below 10 ms), some factors other than the polygon count play a more important role.

Our algorithm exhibits good scaling, and can handle significantly large scenes without incurring a high performance
[Figure 20: Variation of the error with respect to the vertical position of the occluder. Plots of average error vs. occluder vertical position, single-pass and double-pass curves, for (a) Square plane, (b) Buddha, (c) Bunny.]
[Figure 21: Influence of polygon count. (a) Rendering times (in ms) vs. number of polygons, with the 30 fps line; (b) Our largest test scene (565,203 polygons).]
[Figure 22: Influence of the size of the occluder map. (a) Rendering times (in ms) vs. number of pixels (128², 256², 512²), with the 30 fps and 10 fps lines; (b) Test scene (24,000 polygons).]
[Figure 23: Large light sources with small bitmaps. (a) Bitmap of 64² (184 fps); (b) Ground truth.]

cost. The maximum size of the scene depends on the requirements of the user.

6.2.2. Size of occluder map

Fig. 22 shows the observed rendering times (in ms) of our algorithm, on a scene with 24,000 polygons (Fig. 22(b)), when the size of the occluder map changes. We plotted the rendering time as a function of the number of pixels in the occluder map (that is, the square of the size of the occluder map) to illustrate the observed linear variation of rendering time with respect to the total number of pixels.

An occluder map of 512² pixels gives a rendering time of 150 ms — or 7 fps, too slow for interactive rendering. An occluder map of 128² or 256² pixels gives a rendering time of 10 to 50 ms, or 20 to 100 fps, fast enough for real-time rendering. For a large penumbra region, an occluder map of 128² pixels qualitatively gives a reasonable approximation, as in Fig. 22(b). For a small penumbra region, our algorithm behaves like the classical shadow mapping algorithm and artifacts can appear with a small occluder map of 128² pixels; in that case, it is better to use 256² pixels.

The fact that the rendering time of our algorithm is proportional to the number of pixels in the occluder map confirms that the bottleneck of our algorithm is its transfer to the CPU. Due to the cost of this transfer, we found that for some scenes it was actually faster to use textures whose dimensions are not a power of 2: if the difference in pixel count is sufficient, the gain in transfer time compensates the losses in rendering time.

6.2.3. Light source size

Another important parameter is the size of the light source, compared to the size of the scene itself. A large light source results in a large penumbra region for each micro-patch, resulting in more pixels of the soft shadow map covered, and a larger computational cost. Fig. 24(a) shows the observed framerate as a function of the size of the light source. We did the tests with several bitmap resolutions (256², 128², 64²). Fig. 24(b) shows the error as a function of the size of the light source, for the same bitmap resolutions.

As you can see from Fig. 24(a), the rendering time increases with the size of the light source. What is interesting is the error introduced by our algorithm (see Fig. 24(b)). The error logically increases with the size of the light source, and for small light sources, larger bitmaps result in more accurate images. But for large light sources, a smaller bitmap will give a soft shadow of similar quality. A visual comparison of the soft shadows with a small bitmap and ground truth shows the small bitmap gives a very acceptable soft shadow (see Fig. 23).

This effect was observed by previous researchers: as the light source becomes larger, the features in the soft shadow become blurrier, hence they can be modeled accurately with a smaller bitmap.

6.3. Comparison with Soft-Shadow Volumes

Finally, we performed a comparison with a state-of-the-art algorithm for computing soft shadows, the Soft-Shadow Volumes by Assarsson and Akenine-Möller [AAM03].

Fig. 25 shows the same scene, with soft shadows, computed by both algorithms. We ran the tests with a varying number of jeeps, to test how both algorithms scale with respect to the number of polygons. Fig. 25(c) shows the rendering times as a function of the number of polygons for both algorithms. These figures were computed using a window of 512 × 512 pixels for both algorithms, and with the two-pass version of our algorithm, with an occluder map resolution of 210 × 210.

Our algorithm scales better with respect to the number of polygons. On the other hand, soft shadow volumes provide a better-looking shadow (see Fig. 25(b)), closer to the ground truth.

It is important to remember that the rendering time for the Soft-Shadow Volumes algorithm varies with the number of screen pixels covered by the penumbra region. If the viewpoint is close to a large penumbra region, the rendering time becomes much larger. The figures we used for this comparison correspond to an observer walking around the scene (as in Fig. 25(b)).

7. Conclusion and Future Directions

In this paper, we have presented a new algorithm for computing soft shadows in real-time on dynamic scenes. Our algorithm is based on the shadow mapping algorithm, and is entirely image-based. As such, it benefits from the advantages of image-based algorithms, especially speed.

The largest advantage of our algorithm is its high framerate, hence there remains plenty of computational power available for performing other tasks, such as interacting with the user or performing non-graphics processing such as physics computations within game engines. Possibly the
[Figure 24: Changing the size of the light source (floating bunny scene). (a) Rendering time (ms) and (b) Average error, each vs. light source size, for bitmap resolutions 256², 128², 64².]
[Figure 25: Comparison with Soft-Shadow Volumes. (a) Soft Shadow Maps; (b) Soft Shadow Volumes; (c) Rendering times (ms) vs. number of polygons for both algorithms.]
largest limitation of our algorithm is the fact that it does not compute self-occlusion and it requires a separation between occluders and receivers. We know that this limitation is very important, and we plan to remove it in future work, possibly by using layered depth images.

An important aspect of our algorithm is that we can use low-resolution shadow maps in places with a large penumbra, even though we still need higher resolution shadow maps for places with small penumbra, for example close to the contact between the occluder and the receiver. An obvious improvement to our algorithm would be the ability to use hierarchical shadow maps, switching resolutions depending on the shadow being computed. This work could also be combined with perspective-corrected shadow maps [SD02, WSP04, MT04, CG04], in order to have higher resolution in places with sharp shadows close to the viewpoint.

In its current form, our algorithm still requires a transfer of the occluder map from the GPU to the main memory, and a loop, on the CPU, over all the pixels in the occluder map. We would like to design a GPU-only implementation of our algorithm, using the future render-to-vertex-buffer capabilities.

8. Acknowledgments

Most of this work was conducted while Charles Hansen was on sabbatical leave from the University of Utah, and was a visiting professor at ARTIS/GRAVIR IMAG, partly funded by INPG and INRIA.
The authors gratefully thank Ulf Assarsson for doing the tests for the comparison with the soft-shadow volumes algorithm.
The Stanford Bunny, Happy Buddha and dragon 3D models appearing in this paper and in the accompanying video were digitized and kindly provided by the Stanford University Computer Graphics Laboratory.
The smaller Buddha 3D model appearing in this paper was digitized and kindly provided by Inspeck.
The Jeep 3D model appearing in this paper was designed and kindly provided by Psionic.
The horse 3D model appearing in the accompanying video was digitized by Cyberware, Inc., and was kindly provided by the Georgia Tech “Large Geometric Models Archive”.
The skeleton foot 3D model appearing in the accompanying video was digitized and kindly provided by Viewpoint Datalabs Intl.

References

[AAM03] Assarsson U., Akenine-Möller T.: A
graphics hardware. ACM Transactions on Graphics (Proc. of SIGGRAPH 2003) 22, 3 (2003), 511–520.

[AHT04] Arvo J., Hirvikorpi M., Tyystjärvi J.: Approximate soft shadows using image-space flood-fill algorithm. Computer Graphics Forum (Proc. of Eurographics 2004) 23, 3 (2004), 271–280.

[AMH02] Akenine-Möller T., Haines E.: Real-Time Rendering, 2nd ed. A. K. Peters, 2002.

[BS02] Brabec S., Seidel H.-P.: Single sample soft shadows using depth maps. In Graphics Interface (2002).

[CD03] Chan E., Durand F.: Rendering fake soft shadows with smoothies. In Rendering Techniques 2003 (Proc. of the Eurographics Symposium on Rendering) (2003), pp. 208–218.

[CD04] Chan E., Durand F.: An efficient hybrid shadow rendering algorithm. In Rendering Techniques 2004 (Proc. of the Eurographics Symposium on Rendering) (2004), pp. 185–195.

[CG04] Chong H., Gortler S. J.: A lixel for every pixel. In Rendering Techniques 2004 (Proc. of the Eurographics Symposium on Rendering) (2004), pp. 167–172.

[Cro77] Crow F. C.: Shadow algorithms for computer graphics. Computer Graphics (Proc. of SIGGRAPH ’77) 11, 2 (1977), 242–248.

[DF94] Drettakis G., Fiume E.: A fast shadow algorithm for area light sources using backprojection. In SIGGRAPH ’94 (1994), pp. 223–230.

[GBP06] Guennebaud G., Barthe L., Paulin M.: Real-time soft shadow mapping by backprojection. In Rendering Techniques 2006 (Proc. of the Eurographics Symposium on Rendering) (2006).

[HLHS03] Hasenfratz J.-M., Lapierre M., Holzschuch N., Sillion F.: A survey of real-time soft shadows algorithms. Computer Graphics Forum 22, 4 (2003), 753–774.

[KD03] Kirsch F., Doellner J.: Real-time soft shadows using a single light sample. Journal of WSCG (Winter School on Computer Graphics) 11, 1 (2003).

[KMK97] Kersten D., Mamassian P., Knill D. C.: Moving cast shadows and the perception of relative depth. Perception 26, 2 (1997), 171–192.

[LWGM04] Lloyd B., Wendt J., Govindaraju N., Manocha D.: CC shadow volumes. In Rendering Techniques 2004 (Proc. of the Eurographics Symposium on Rendering) (2004), pp. 197–205.

[McC00] McCool M. D.: Shadow volume reconstruction from depth maps. ACM Transactions on Graphics 19, 1 (2000), 1–26.

[MT04] Martin T., Tan T.-S.: Anti-aliasing and continuity with trapezoidal shadow maps. In Rendering Techniques 2004 (Proc. of the Eurographics Symposium on Rendering) (2004), pp. 153–160.

[SD02] Stamminger M., Drettakis G.: Perspective shadow maps. ACM Transactions on Graphics (Proc. of SIGGRAPH 2002) 21, 3 (2002), 557–562.

[SS98] Soler C., Sillion F. X.: Fast calculation of soft shadow textures using convolution. In SIGGRAPH ’98 (1998), pp. 321–332.

[Wan92] Wanger L.: The effect of shadow quality on the perception of spatial relationships in computer generated imagery. In Symposium on Interactive 3D Graphics (1992), pp. 39–42.

[WFG92] Wanger L., Ferwerda J. A., Greenberg D. P.: Perceiving spatial relationships in computer-generated images. IEEE Computer Graphics and Applications 12, 3 (1992), 44–58.

[WH03] Wyman C., Hansen C.: Penumbra maps: Approximate soft shadows in real-time. In Rendering Techniques 2003 (Proc. of the Eurographics Symposium on Rendering) (2003), pp. 202–207.

[Wil78] Williams L.: Casting curved shadows on curved surfaces. Computer Graphics (Proc. of SIGGRAPH ’78) 12, 3 (1978), 270–274.

[WPF90] Woo A., Poulin P., Fournier A.: A survey of shadow algorithms. IEEE Computer Graphics & Applications 10, 6 (1990), 13–32.

[WSP04] Wimmer M., Scherzer D., Purgathofer W.: Light space perspective shadow maps. In Rendering Techniques 2004 (Proc. of the Eurographics Symposium on Rendering) (2004), pp. 143–152.

Appendix A: Floating-point blending accuracy

In this section, we review the issues behind the hardware blending accuracy problems we have encountered and propose a temporary fix for these issues.

All the accuracy issues are linked to the fact that hardware blending is, at the time of writing, only available for 16-bit floating-point numbers. NVidia graphics hardware stores these floating-point numbers using the s10e5 format: one bit of sign, 10 bits of mantissa, 5 bits of exponent, with a bias of 15 for the exponent. The important point for addition is that the mantissa is stored on 10 bits. As a result, adding a large number X and a small number ε will give an inaccurate result if ε < 2^-10 X:

X + ε = X if ε < 2^-10 X (in FP16)

For example, 2048 + 1 = 2048 (in FP16 format) and 0.5 + 1/2049 = 0.5 (also in FP16 format).

In some cases, the addition of the contribution from all micro-patches will be 1 (meaning complete occlusion of the light source). As a consequence, we can expect numerical
accuracy issues if some micro-patches hide less than 2^-10 of the light source. Because 32² = 2^10, it means that the width of the reprojection of one micro-patch should be larger than 1/32 of the width of the light source.

This translates easily into conditions for the position of the occluder:

1/z_O < 1/z_R + (64 tan α)/(N L)   (4)

where L is the width of the light source, N is the resolution of the bitmap, α is the half-angle of the camera used to generate the shadow map, z_O is the distance between the light source and the occluder and z_R is the distance between the light source and the receiver.

Bitmap resolution: The most important thing is that increasing N makes this error more likely to appear. This explains why using a bitmap of 512 × 512 pixels we see a poor-looking shadow, while the 128 × 128 bitmap gives the correct shadow (see Fig. 14).

Light source size: In Equation 4, the size of the light source appears in a product with the resolution of the bitmap. If the light source is large, the bitmap must be low resolution in order to avoid FP16 blending errors. Fortunately, a large light source means a large penumbra for most occluders, so a low-resolution bitmap might be enough for these penumbra effects.

Occluder position: As the occluder moves closer to the receiver, the likelihood of blending errors gets lower.

Camera half-angle: Similarly, increasing the camera half-angle improves the FP16 blending accuracy.

Basically, all these conditions amount to the same thing: using fewer pixels to describe the occluder in the shadow map. While this improves the FP16 blending accuracy, it obviously degrades the discretization of the occluder and also increases the overlap between reprojections of neighboring pixels.

In our experiments (see Fig. 14), the blending accuracy problem appears very often when the resolution of the shadow map is larger than 512 × 512, sometimes with a shadow map resolution of 256 × 256 and very rarely with a shadow map resolution of 128 × 128.

The problem will disappear when hardware blending becomes available for higher-accuracy floating-point numbers. FP32 numbers have a mantissa of 23 bits, allowing the use of micro-patches that block less than 2^-23 of the light source, meaning that the width of the back-projection of the micro-patch should be larger than 2^-11 times the width of the light source (64 times smaller than the current threshold). Compared with the current method, it would allow the use of shadow maps with a resolution above 4096 × 4096.

With FP16 blending only, the best solution is to use a hierarchical shadow map for soft-shadow computations, as was suggested by Guennebaud et al. [GBP06]: the low-resolution shadow map would be used for large penumbra regions, and the high-resolution shadow map for areas with hard shadows, e.g. when the occluder and the receiver are in contact.
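The rounding failures described in Appendix A can be reproduced on the CPU, since NumPy's float16 type uses the same s10e5 half-precision layout. This is an illustrative sketch only: CPU arithmetic rounds to nearest-even, which may differ from the behavior of the GPU blending units discussed above.

```python
import numpy as np

# IEEE half precision (s10e5): 1 sign bit, 5 exponent bits, 10 mantissa bits.
# Once the small term drops below the resolution of the mantissa, an
# addition leaves the large term unchanged -- the blending error analysed
# in Appendix A.
x = np.float16(2048.0)              # 2^11; spacing between FP16 values is 2
print(float(x + np.float16(1.0)))   # 2048.0 -- the +1 is lost

y = np.float16(0.5)                 # spacing around 0.5 is 2^-11
eps = np.float16(2.0 ** -13)        # below half an ulp of 0.5
print(float(y + eps))               # 0.5 -- the small occlusion term is lost
```

With round-to-nearest the exact loss threshold is half an ulp rather than the conservative 2^-10 bound used in the appendix, but the qualitative conclusion is identical: contributions from sufficiently small micro-patches vanish.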
CHAPTER 4. USING PROGRAMMABLE GRAPHICS CARDS
4.7.4 Fast Precomputed Ambient Occlusion for Proximity Shadows (JGT 2006)

Authors: Mattias Malmer, Fredrik Malmer, Ulf Assarsson and Nicolas Holzschuch
Journal: Journal of Graphics Tools (accepted, to appear)
Date: accepted October 2006.
“jgt” — 2006/10/27 — 15:33 — page 1 — #1
Vol. [VOL], No. [ISS]: 1–13

Fast Precomputed Ambient Occlusion for Proximity Shadows

Mattias Malmer and Fredrik Malmer
Syndicate

Ulf Assarsson
Chalmers University of Technology

Nicolas Holzschuch
ARTIS-GRAVIR/IMAG INRIA

Abstract. Ambient occlusion is used widely for improving the realism of real-time lighting
simulations. We present a new method for precomputed ambient occlusion, where we store
and retrieve unprocessed ambient occlusion values in a 3D grid. Our method is very easy to
implement, has a reasonable memory cost, and the rendering time is independent of the
complexity of the occluder or the receiving scene. This makes the algorithm highly suitable
for games and other real-time applications.

1. Introduction

An “ambient term” is commonly used in illumination simulations to account for the light that remains after secondary reflections. This ambient term illuminates areas of the scene that would not otherwise receive any light. In early implementations, ambient light was a uniform light, illuminating all points on all objects, regardless of their shape or position, flattening their features and giving them an unnatural look. To counter this effect, ambient occlusion was introduced by [Zhukov et al. 98]. By computing the accessibility to ambient lighting, and using it to modulate the effects, they achieve a much better look. Ambient occlusion is widely used in special effects for motion pictures [Landis 02] and for illumination simulations in commercial software [Christensen 02, Christensen 03].

© A K Peters, Ltd. 1086-7651/06 $0.50 per page
journal of graphics tools

Figure 1. Example of contact shadows. This scene runs at 800 fps.
Ambient occlusion also results in objects having contact shadows: for two close
objects, ambient occlusion alone creates a shadow of one object onto the other (see
Figure 1).
For offline rendering, ambient occlusion is usually precomputed at each vertex of
the model, and stored either as vertex information or into a texture. For real-time
rendering, recent work [Zhou et al. 05, Kontkanen and Laine 05] suggests storing
ambient occlusion as a field around moving objects, and projecting it onto the scene
as the object moves. These methods provide important visual cues for the spatial
position of the moving objects, in real-time, at the expense of extra storage. They
pre-process ambient occlusion, expressing it as a function of space whose parameters
are stored in a 2D texture wrapped around the object. In contrast, our method stores
these values unprocessed, in a 3D grid attached to the object. The benefits are numerous:

• faster run-time computations, and very low impact on the GPU, with a computational cost as low as 5 fragment shader instructions per pixel,

• very easy to implement, just by rendering one cube per shadow-casting object,

• shorter pre-computation time,

• inter-object occlusion has high quality even for receiving points inside the occluding object’s convex hull,

• handles both self-occlusion and inter-object occlusion in the same rendering pass,

• easy to combine with indirect lighting stored in environment maps.

The obvious drawback should be the memory cost, since our method’s memory costs are in O(n³), instead of O(n²). But since ambient occlusion is a low-frequency

Malmer et al.: Fast Precomputed Ambient Occlusion 3

phenomenon, it only needs low-resolution sampling. In [Kontkanen and Laine 05],
as in our own work, a texture size of n = 32 is sufficient. And since we are storing
a single component per texel, instead of several function coefficients, the overall
memory cost of our method is comparable to theirs. For a texture size of 32 pixels,
[Kontkanen and Laine 05] report a memory cost of 100 Kb for each unique moving
object. For the same resolution, the memory cost of our algorithm is 32 Kb if we only store ambient occlusion, and 128 Kb if we also store the average occluded
direction.
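The memory figures quoted above follow directly from the grid resolution; a quick arithmetic check (assuming one byte per texel for occlusion alone, and four bytes per RGBA texel):

```python
n = 32                      # grid resolution reported in the paper
kb = 1024

ao_only = n ** 3            # one byte per texel: occlusion value only
ao_plus_dir = 4 * n ** 3    # four bytes per texel: RGB direction + A occlusion

print(ao_only // kb, "Kb")      # 32 Kb
print(ao_plus_dir // kb, "Kb")  # 128 Kb
```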

2. Background

Ambient occlusion was first introduced by [Zhukov et al. 98]. In modern imple-
mentations [Landis 02, Christensen 02, Christensen 03, Pharr and Green 04, Bun-
nell 05, Kontkanen and Laine 05], it is defined as the percentage of ambient light
blocked by geometry close to point p:
ao(p) = (1/π) ∫_Ω (1 − V(ω)) ⌊n · ω⌋ dω   (1)

Occlusion values are weighted by the cosine of the angle of the occluded direction
with the normal n: occluders that are closer to the direction n contribute more, and
occluders closer to the horizon contribute less, corresponding to the importance of
each direction in terms of received lighting. Ambient occlusion is computed as a
percentage, with values between 0 and 1, hence the 1/π normalization factor.
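As an illustration of Equation (1), here is a small Monte Carlo estimator. This is not the authors' code: the receiver normal is fixed to +z, `blocked` stands for a hypothetical visibility query, and the hemisphere is sampled uniformly, which turns the 1/π-normalised integral into the estimator (2/N) Σ (1 − V(ω_i)) (n · ω_i).

```python
import math, random

def ao_estimate(blocked, n_samples=20000, seed=1):
    """Monte Carlo estimate of Eq. (1) for a receiver with normal n = +z.
    Uniform hemisphere sampling has pdf 1/(2*pi), so the estimator is
    (2/N) * sum over samples of (1 - V(w)) * (n . w)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        z = rng.random()                      # cos(theta), uniform on [0, 1]
        phi = 2.0 * math.pi * rng.random()
        r = math.sqrt(max(0.0, 1.0 - z * z))
        w = (r * math.cos(phi), r * math.sin(phi), z)
        if blocked(w):                        # V(w) = 0 when the ray is blocked
            total += z                        # accumulate (1 - V) * (n . w)
    return 2.0 * total / n_samples

# Sanity checks: an empty scene gives 0, a fully covering occluder gives ~1.
print(ao_estimate(lambda w: False))           # 0.0
print(ao_estimate(lambda w: True))            # close to 1.0
```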
Most recent algorithms [Bunnell 05, Kontkanen and Laine 05] also store the aver-
age occluded direction, using it to modulate the lighting, depending on the normal at
the receiving point and the environment.
[Greger et al. 98] also used a regular grid to store illumination values, but their
grid was attached to the scene, not to the object. [Sloan et al. 02] attached radiance
transfer values to a moving object, using it to recompute the effects of the moving
object on the environment.

3. Algorithm

3.1. Description of the algorithm

Our algorithm inserts itself in a classical framework where other shading informa-
tion, such as direct lighting, shadows, etc. are computed in separate rendering passes.
One rendering pass will be used to compute ambient lighting, combined with ambi-
ent occlusion. We assume we have a solid object moving through a 3D scene, and
we want to compute ambient occlusion caused by this object.


Figure 2. We construct a grid around the object. At the center of each grid element, we
compute a spherical occlusion sample. At runtime, this information is used to apply shadows
on receiving objects.

Our algorithm can either be used with classical shading, or with deferred shading.
In the latter case, the world-space position and the normal of all rendered pixels are
readily available. In the former, this information must be stored in a texture, using
the information from previous rendering passes.

Precomputation: The percentage of occlusion from the object is precomputed at every point of a 3D grid surrounding the object (see Figure 2). This grid is stored as a 3D texture, linked to the object.

Runtime:
• Render world-space position and normals of all shadow receivers in the scene, including occluders.
• For each occluder:
  1. Render the back faces of the occluder’s grid (depth-testing is disabled).
  2. For every pixel accessed, execute a fragment program:
     (a) retrieve the world-space position of the pixel;
     (b) convert this world-space position to a voxel position in the grid, passed as a 3D texture;
     (c) retrieve the ambient occlusion value in the grid, using linear interpolation.
  3. Ambient occlusion values a from each occluder are blended in the frame buffer using multiplicative blending with 1 − a.

The entire computation is thus done in just one extra rendering pass. We used the
back faces of the occluder’s grid, because it is unlikely that they are clipped by the
far clipping plane; using the front faces could result in artifacts if they are clipped by
the front clipping plane.
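Step 2 of the runtime pass (world-space position to voxel coordinates, then an interpolated lookup, performed by `tex3D` on the GPU) can be sketched on the CPU as follows. The grid layout and helper names are illustrative, not the paper's implementation.

```python
def sample_ao(grid, n, origin, size, p):
    """Trilinearly interpolated lookup in an n*n*n occlusion grid.
    `origin` and `size` place the grid in world space; `grid[x][y][z]`
    holds the precomputed occlusion value at a voxel center."""
    # world-space position -> continuous voxel coordinates in [0, n-1]
    u = [(p[k] - origin[k]) / size[k] * (n - 1) for k in range(3)]
    u = [min(max(c, 0.0), n - 1 - 1e-6) for c in u]   # clamp inside the grid
    i = [int(c) for c in u]
    f = [u[k] - i[k] for k in range(3)]

    def g(dx, dy, dz):
        return grid[i[0] + dx][i[1] + dy][i[2] + dz]

    # interpolate along x, then y, then z
    c00 = g(0, 0, 0) * (1 - f[0]) + g(1, 0, 0) * f[0]
    c10 = g(0, 1, 0) * (1 - f[0]) + g(1, 1, 0) * f[0]
    c01 = g(0, 0, 1) * (1 - f[0]) + g(1, 0, 1) * f[0]
    c11 = g(0, 1, 1) * (1 - f[0]) + g(1, 1, 1) * f[0]
    c0 = c00 * (1 - f[1]) + c10 * f[1]
    c1 = c01 * (1 - f[1]) + c11 * f[1]
    return c0 * (1 - f[2]) + c1 * f[2]

# 2x2x2 toy grid where the stored occlusion equals the x coordinate:
grid = [[[x * 1.0] * 2 for _ in range(2)] for x in range(2)]
print(sample_ao(grid, 2, (0, 0, 0), (1, 1, 1), (0.5, 0.25, 0.75)))  # 0.5
```

Step 3 then combines the values a_i retrieved for several occluders multiplicatively, visibility = Π(1 − a_i), which is what the frame-buffer blending computes.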

3.2. Shading surfaces with ambient occlusion alone

The ambient occlusion values we have stored correspond to the occlusion caused by
the occluder itself:

ao′(p) = (1/4π) ∫Ω (1 − V(ω)) dω    (2)
that is, the percentage of the entire sphere of directions that is occluded. When we
apply these occlusion values at a receiving surface, during rendering, the occlusion
only happens over a half-space, since the receiver itself is occluding the other half-
space. To account for this occlusion, we scale the occlusion value by a factor 2.
This shading does not take into account the position of the occluder with respect to
the normal of the receiver. It is an approximation, but we found it performs quite
well in several cases (see Figure 1). It is also extremely cheap in both memory and
computation time, as the value extracted from the 3D texture is used directly.
We use the following fragment program (using Cg notation):
float4 pworld = texRECT(PositionTex, pscreen);
float3 pgrid  = mul(MWorldToGrid, pworld).xyz;
out.color.w   = 1 - tex3D(GridTexture, pgrid).x;  // occlusion in a single channel

There are two important drawbacks with this simple approximation: first, the in-
fluence of the occluder is also visible where it should not, such as a character moving
on the other side of a wall; second, handling self-occlusion requires a specific treat-
ment, with a second pass and a separate grid of values.
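The half-space correction above amounts to a one-line helper. This is our sketch; the clamp to 1 is our addition, since doubling a full-sphere fraction can exceed a valid occlusion value:

```python
def receiver_occlusion(ao_sphere):
    """Occlusion applied at a receiving surface: the stored value is a
    fraction of the full sphere of directions, but a receiver only sees a
    half-space, so the paper scales it by a factor of 2 (clamped here)."""
    return min(1.0, 2.0 * ao_sphere)
```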

3.3. Shading surfaces with ambient occlusion and average occluded direction

For more accurate ambient occlusion effects, we also store the average occluded
direction. That is equivalent to storing the set of occluded directions as a cone (see
Figure 3). The cone is defined by its axis (d) and the percentage of occlusion a
(linked to its aperture angle α). Axis and percentage of occlusion are precomputed
for all moving objects and stored on the sample points of the grid, in an RGBA
texture, with the cone axis d stored in the RGB-channels and occlusion value a stored
in the A-channel.

3.3.1. Accounting for surface normal of receiver

In order to compute the percentage of ambient occlusion caused by the moving
occluder, we clip the cone of occluded directions by the tangent surface to the
receiver (see Figure 3(b)). The percentage of effectively occluded directions is a
function of two parameters: the angle β between the direction of the cone and the
normal at the receiving surface, and the percentage of occlusion a of the cone. We
precompute this percentage and store it in a lookup table Tclip. The lookup table
also stores the effect of the diffuse BRDF (the cosine of the angle between the
normal and the direction). For simplicity, we access the lookup table using cos β.

Figure 3. Ambient occlusion is stored as a cone: (a) the cone is defined by its
direction d and its aperture α; (b) the cone is clipped by the tangent plane to the
receiver to give the ambient occlusion value.

Figure 4. Ambient occlusion computed with our algorithm that accounts for the surface
normal of the receiver and the direction of occlusion.
We now use the following fragment program:

float4 pworld  = texRECT(PositionTex, pscreen);
float3 pgrid   = mul(MWorldToGrid, pworld).xyz;
float4 cone    = tex3D(GridTexture, pgrid);   // cone.xyz = dgrid, cone.w = a
float3 dworld  = mul(MGridToWorld, cone.xyz);
float3 n       = texRECT(NormalTex, pscreen).xyz;
float cosBeta  = dot(dworld, n);
float AO       = texRECT(Tclip, float2(cone.w, cosBeta)).x;
out.color.w    = 1 - AO;

This code translates to 16 shader assembly instructions. Figures 4 and 5 were
rendered using this method, with a grid resolution of 32³.
Compared to storing only ambient occlusion values, using the average occluded
direction has the advantage that results are more accurate and self-occlusion is natu-
rally treated.
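The lookup table Tclip could be precomputed offline along these lines. This Monte Carlo sketch is our reading of the text (the paper does not spell out the exact tabulated formula); the cone aperture is recovered from the occlusion value a, and the result is the occluded fraction of the sphere weighted by the diffuse cosine:

```python
import math
import random

def t_clip(a, cos_beta, samples=2000):
    """Monte Carlo estimate of a Tclip entry: fraction of the sphere occluded
    by a cone of occlusion a whose axis makes angle beta with the surface
    normal, each occluded direction weighted by the clamped cosine term."""
    cos_alpha = 1.0 - 2.0 * a  # cone aperture from the occlusion value a
    sin_beta = math.sqrt(max(0.0, 1.0 - cos_beta * cos_beta))
    total = 0.0
    for _ in range(samples):
        # Uniform direction inside the cone (around +z), then rotate so the
        # cone axis makes angle beta with the normal n = (0, 0, 1).
        cz = 1.0 - random.random() * (1.0 - cos_alpha)
        phi = 2.0 * math.pi * random.random()
        sz = math.sqrt(max(0.0, 1.0 - cz * cz))
        x, z = sz * math.cos(phi), cz
        zn = z * cos_beta - x * sin_beta   # component along the normal
        total += max(0.0, zn)              # tangent-plane clip + cosine
    # Scale by the solid-angle fraction a of the unclipped cone.
    return a * total / samples
```

As a sanity check, a cone covering the upper hemisphere (a = 0.5) aligned with the normal should give roughly 0.25, the cosine-weighted hemisphere integral over 4π.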


Figure 5. Ambient occlusion values, accounting for the normal of the occluder and the
direction of occlusion (135 to 175 fps); (c) shows the ground truth.

Figure 6. Checking the accuracy of our blending method: comparison of ambient
occlusion values with ground truth. (a) Gouraud shading; (b) blending occlusion from
multiple occluders; (c) ground truth.

3.3.2. Combining occlusion from several occluders

When we have several moving occluders in the scene, we compute occlusion values
from each moving occluder and merge these values together. The easiest way to
do this is to use the OpenGL blending operation: in a single rendering pass, we render
the occlusion values for all the moving occluders. The occlusion value computed
for the current occluder is blended into the color buffer, multiplicatively modulating it
with (1 − a).

[Kontkanen and Laine 05] show that modulating with (1 − ai), for all occluders i,
is statistically the best guess. Our experiments also show that it gives very satisfying
results for almost all scenes. This method has the added advantage of being very
simple to implement: the combined occlusion value for one pixel is independent
of the order in which the occluders are treated for this pixel, so we only need one
rendering pass.
Each occluder is rendered sequentially, using our ambient occlusion fragment
program, into an occlusion buffer. The cone axes are stored in the RGB channels and
the occlusion value is stored in the alpha channel. Occlusion values are blended
multiplicatively and cone axes are blended additively, weighted by their respective
solid angle:

αR = (1 − αA)(1 − αB)
dR = αA dA + αB dB

This is achieved using glBlendFuncSeparate in OpenGL. See Figures 5 and 6 for a
comparison of blending values from several occluders with the ground truth values,
computed with distributed ray-tracing: the two pictures exhibit the same important
features, although our method is noticeably lighter (see also Section 4.3).
We have designed a more advanced method for blending the occlusions between
two cones, taking into account the respective positions of the cones and their aper-
ture (see the supplemental materials), but our experiments show that the technique
described here generally gives similar results, runs faster and is easier to implement.
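On the CPU, the blending rule corresponds to the following sketch (our illustration only; the real implementation performs this per pixel with glBlendFuncSeparate, multiplying alpha and adding weighted cone axes):

```python
def blend_occluders(occluders):
    """Combine per-occluder occlusion: occlusion values multiply as
    (1 - a_i), cone axes accumulate additively weighted by occlusion.
    `occluders` is a list of (a, d) pairs with d a 3D cone axis."""
    transparency = 1.0           # running product of (1 - a_i)
    axis = [0.0, 0.0, 0.0]       # weighted sum of cone axes
    for a, d in occluders:
        transparency *= (1.0 - a)
        for c in range(3):
            axis[c] += a * d[c]
    return 1.0 - transparency, axis   # combined occlusion, blended axis
```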

3.3.3. Illumination from an environment map

The occlusion cones can also be used to approximate the incoming lighting from
an environment map, as suggested by [Pharr and Green 04]. For each pixel, we
first compute the lighting due to the environment map, using the surface normal for
Lambertian surfaces, or using the reflected cone for glossy objects. Then, we subtract
from this lighting the illumination corresponding to the cone of occluded directions.
We only need to change the last step of blending the color buffer and occlusion
buffer. Each shadow receiving pixel is rendered using the following code:
1. Read the cone (d, α) from the occlusion buffer.
2. Read the normal from the normal buffer.
3. Compute the mipmap level from the cone angle α.
4. A = EnvMap(d, α), i.e., look up the occluded light within the cone.
5. B = AmbientLighting(normal), i.e., look up the incoming light due to the
environment map.
6. Return B − A.

In order to use large filter sizes, we used lat-long maps. It is also possible to use
cube maps with a specific tool for mip-mapping across texture seams [Scheuermann
and Isidoro 06].
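In outline, the per-pixel computation might look like this. This is a hedged sketch: `env_lookup` and `ambient_for_normal` stand for the prefiltered environment-map accesses, and the aperture-to-mip-level mapping is our placeholder, not the paper's:

```python
def occluded_env_lighting(env_lookup, num_levels, cone_dir, cone_a,
                          ambient_for_normal, normal):
    """Light from the environment map minus the light inside the occluded
    cone: B - A, clamped so the result stays non-negative."""
    # Wider cones need coarser (more filtered) mip levels; the precise
    # mapping from aperture to level is an implementation choice.
    level = min(num_levels - 1, int(round(cone_a * (num_levels - 1))))
    blocked = env_lookup(cone_dir, level)        # A: light within the cone
    incoming = ambient_for_normal(normal)        # B: total incoming light
    return max(0.0, incoming - blocked)
```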

3.4. Details of the algorithm

3.4.1. Spatial extent of the grid

An important parameter of our algorithm is the spatial extent of the grid. If the grid
is too large, we run the risk of under-sampling the variations of ambient occlusion
unless we increase the resolution, thus increasing the memory cost. If the grid is too
small, we would miss some of the effects of ambient occlusion.

Figure 7. Using ambient occlusion with environment lighting ((c) robot parts). These
images are rendered at roughly 85 fps.

Figure 8. Our notations for computing the optimal grid extent based on the bounding
box of the occluder (a), and optimal grid extents computed with ε = 0.1 for a cubic
object (b) and an elongated object (c). Notice the grid is thinner along the longer axis.
To compute the optimal spatial extent of the grid, we use the bounding box of the
occluder. This bounding box has three natural axes, with dimension 2ri on each axis,
and a projected area of Ai perpendicular to axis i (see Figure 8(a)).
Along the i axis, the ambient occlusion of the bounding box is approximately:

ai ≈ (1/4π) · Ai / (d − ri)²    (3)

where d is the distance to the center of the bounding box.


If we decide to neglect occlusion values smaller than ε, we find that the spatial
extent ei of the grid along axis i should be:

ei = ri + √(Ai / (4πε))    (4)

We take ε = 0.1, giving an extent of ei ≈ 3ri for a cubic bounding box (see
Figure 8(b)). For elongated objects, equation 4 gives an elongated shape to the grid,
following the shape of the object, but with the grid being thinner on the longer axes
of the object (see Figure 8(c)).

Figure 9. We need to re-scale occlusion values inside the grid to avoid visible
artifacts: (a) using raw values, discontinuities can appear; (b) after re-scaling,
ambient occlusion blends continuously.
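Equation 4 reduces to a small helper (a Python sketch of the formula only; `r_i` is the half-extent of the bounding box along axis i and `A_i` the projected area perpendicular to it):

```python
import math

def grid_extent(r_i, A_i, eps=0.1):
    """Spatial extent e_i of the grid along axis i (equation 4):
    e_i = r_i + sqrt(A_i / (4 * pi * eps))."""
    return r_i + math.sqrt(A_i / (4.0 * math.pi * eps))
```

For a unit cube (r_i = 1, A_i = 4) with eps = 0.1, this gives about 2.78, i.e. the e_i ≈ 3 r_i quoted in the text.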
We use a relatively large epsilon value (0.1), resulting in a small spatial extent. As
a consequence, there can be visible discontinuities on the boundary of the grid (see
Figure 9(a)). To remove these discontinuities, we re-scale the values inside the grid
so that the largest value at the boundary is 0. If the largest value on the boundary of
the grid is VM, each cell of the grid is rescaled so that its new value V′ is:

V′ = V                                if V > 0.3
V′ = 0.3 (V − VM) / (0.3 − VM)        if V ≤ 0.3

The effect of this scaling can be seen in Figure 9(b). The overall aspect of ambient
occlusion is kept, while the contact shadow ends continuously at the border of the
grid.
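The re-scaling rule can be written directly (our sketch of the piecewise formula above; the 0.3 threshold is the value used in the paper):

```python
def rescale_boundary(v, v_max, threshold=0.3):
    """Re-scale grid values so occlusion falls to 0 at the grid boundary:
    values above `threshold` are kept; values below are remapped linearly
    so that the largest boundary value v_max maps to 0."""
    if v > threshold:
        return v
    return threshold * (v - v_max) / (threshold - v_max)
```

Note the mapping is continuous at the threshold and sends v_max exactly to 0, which is what removes the visible seam at the grid border.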

3.4.2. Voxels inside the occluder

Sampling points that are inside the occluder will have occlusion values of 1,
expressing that they are completely hidden. As we interpolate values on the grid, a
point located on the boundary of the occluder will often have incorrect values.
To counter this problem, we modify the values inside the occluder (which are never
used) so that the interpolated values on the surface are as correct as possible.

A simple but quite effective automatic way to do this is: for all grid cells where the
occlusion value is 1, replace this value by an average of the surrounding grid cells
that have an occlusion value smaller than 1. This algorithm was used for all the
figures in this paper.
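The interior-cell fix-up described above could be implemented as follows. This is our sketch, using a dictionary grid and the full 26-cell neighbourhood; the paper does not specify which neighbourhood to use:

```python
def fix_interior_values(grid):
    """Replace fully occluded interior cells (value 1) by the average of
    neighbouring cells with value < 1, so that interpolation near the
    occluder surface behaves better."""
    fixed = dict(grid)
    for (i, j, k), v in grid.items():
        if v < 1.0:
            continue
        neighbours = []
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                for dk in (-1, 0, 1):
                    if (di, dj, dk) == (0, 0, 0):
                        continue
                    w = grid.get((i + di, j + dj, k + dk))
                    if w is not None and w < 1.0:
                        neighbours.append(w)
        if neighbours:
            fixed[(i, j, k)] = sum(neighbours) / len(neighbours)
    return fixed
```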


4. Results

All timings and figures in this paper were computed on a Pentium 4 running at 2.8
GHz, with an NVIDIA GeForce 7800GTX, using a grid resolution of 32³.

4.1. Timing results

The strongest point of our method is its performance: adding ambient occlusion
to any scene increases the rendering time by ≈ 0.9 ms for each occluder. In our
experiments, this value stayed the same regardless of the complexity of the scene or
of the occluder. We can render scenes with 40 different occluders at nearly 30 fps.
The cost of the method depends on the number of pixels covered by the occluder’s
grid, so the cost of our algorithm decreases nicely for occluders that are far from the
viewpoint, providing an automatic level-of-detail.
The value of 0.9 ms corresponds to the typical situation, visible in all the pictures
in this paper: the occluder has a reasonable size, neither too small nor too large,
compared to the size of the viewport.

4.2. Memory costs

Precomputed values for ambient occlusion are stored in a 3D texture, with a memory
cost of O(n³) bytes. With a grid size of 32, the value we have used in all our tests,
the memory cost is 32 KB per channel. Thus, storing just the ambient occlusion
value gives a memory cost of 32 KB. Adding the average occluded direction requires
three extra channels, bringing the complete memory cost to 128 KB.
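These figures follow directly from the grid size, assuming one byte per channel (consistent with 32 KB per channel at 32³):

```python
def grid_memory_bytes(n=32, channels=4):
    """Memory cost of the n^3 grid texture at one byte per channel:
    1 channel (occlusion only) or 4 channels (occlusion + cone axis)."""
    return n ** 3 * channels
```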

4.3. Comparison with Ground Truth

Figures 5(b)-5(c) and 6(b)-6(c) show a side-by-side comparison between our
algorithm and ground truth. Our algorithm has computed all the relevant features of
ambient occlusion, including proximity shadows. The main difference is that our
algorithm tends to underestimate ambient occlusion.
There are several reasons for this difference: we have limited the spatial influence
of each occluder, by using a small grid, and the blending process (see Section 3.3.2)
can underestimate the combined occlusion value of several occluders.
While it would be possible to improve the accuracy of our algorithm (using a
more accurate blending method and a larger grid), we point out that ambient
occlusion methods are approximate by nature. What is important is to show all the
relevant features: proximity shadows and darkening of objects in contact, something
our algorithm does.

Acknowledgments. ARTIS is an INRIA research project and a research team in the GRAVIR
laboratory, a joint research unit of CNRS, INRIA, INPG and UJF.
This work was started while Ulf Assarsson was a post-doctoral student at the ARTIS re-
search team, funded by INRIA.
The space ship model used in this paper was designed by Max Shelekhov.

References

[Bunnell 05] Michael Bunnell. “Dynamic Ambient Occlusion and Indirect Lighting.” In
GPU Gems 2, edited by Matt Pharr, pp. 223–233. Addison Wesley, 2005.
[Christensen 02] Per H. Christensen. “Note 35: Ambient occlusion, image-based illumi-
nation, and global illumination.” In PhotoRealistic RenderMan Application Notes.
Emeryville, CA, USA: Pixar, 2002.
[Christensen 03] Per H. Christensen. “Global Illumination and All That.” In Siggraph 2003
course 9: Renderman, Theory and Practice, edited by Dana Batall, pp. 31 – 72. ACM
Siggraph, 2003.
[Greger et al. 98] G. Greger, P. Shirley, P. M. Hubbard, and D. P. Greenberg. “The Irradiance
Volume.” IEEE Computer Graphics and Applications 18:2 (1998), 32–43.
[Kontkanen and Laine 05] Janne Kontkanen and Samuli Laine. “Ambient Occlusion Fields.”
In Symposium on Interactive 3D Graphics and Games, pp. 41–48, 2005.
[Landis 02] Hayden Landis. “Production Ready Global Illumination.” In Siggraph 2002
course 16: Renderman in Production, edited by Larry Gritz, pp. 87 – 101. ACM Sig-
graph, 2002.
[Pharr and Green 04] Matt Pharr and Simon Green. “Ambient Occlusion.” In GPU Gems,
edited by Randima Fernando, pp. 279–292. Addison Wesley, 2004.
[Scheuermann and Isidoro 06] Thorsten Scheuermann and John Isidoro. “Cubemap Filtering
with CubeMapGen.” In Game Developer Conference 2006, 2006.
[Sloan et al. 02] Peter-Pike Sloan, Jan Kautz, and John Snyder. “Precomputed radiance trans-
fer for real-time rendering in dynamic, low-frequency lighting environments.” ACM
Transactions on Graphics (Proc. of Siggraph 2002) 21:3 (2002), 527–536.
[Zhou et al. 05] Kun Zhou, Yaohua Hu, Steve Lin, Baining Guo, and Heung-Yeung Shum.
“Precomputed Shadow Fields for Dynamic Scenes.” ACM Transactions on Graphics
(proceedings of Siggraph 2005) 24:3.
[Zhukov et al. 98] S. Zhukov, A. Iones, and G. Kronin. “An Ambient Light Illumination
Model.” In Rendering Techniques ’98 (Proceedings of the 9th EG Workshop on Render-
ing), pp. 45 – 56, 1998.


Web Information:

Two videos, recorded in real-time and demonstrating the effects of pre-computed ambient
occlusion on animated scenes are available at:

http://www.ce.chalmers.se/~uffe/ani.mov
http://www.ce.chalmers.se/~uffe/cubedance.mov

A technique for better accuracy in blending the occlusion from two cones is described in a
supplemental material.

Mattias Malmer, Syndicate, Grevgatan 53, 114 58 Stockholm, Sweden.
(www.syndicate.se)

Fredrik Malmer, Syndicate, Grevgatan 53, 114 58 Stockholm, Sweden.
(www.syndicate.se)

Ulf Assarsson, Department of Computer Science and Engineering, Chalmers University
of Technology, S-412 96 Gothenburg, Sweden.
([email protected])

Nicolas Holzschuch, ARTIS/GRAVIR IMAG INRIA, INRIA Rhône-Alpes, 655 avenue de
l'Europe, Innovallée, 38334 St-Ismier CEDEX, France.
([email protected])

Received [DATE]; accepted [DATE].

CHAPTER 4. USING PROGRAMMABLE GRAPHICS CARDS

4.7.5 Accurate specular reflections in real-time (EG 2006)

Authors: David Roger and Nicolas Holzschuch
Conference: Eurographics 2006, Vienna, Austria. This article was also published in
Computer Graphics Forum, vol. 25, no. 3.
Date: September 2006
EUROGRAPHICS 2006 / E. Gröller and L. Szirmay-Kalos Volume 25 (2006), Number 3
(Guest Editors)

Accurate Specular Reflections in Real-Time

David Roger and Nicolas Holzschuch

ARTIS–GRAVIR† IMAG INRIA

Figure 1: Left: Specular reflections computed with our algorithm. Middle: ray-traced reference. Right: Environment map reflection.

Abstract
Specular reflections provide many important visual cues in our daily environment. They inform us of the shape of
objects, of the material they are made of, of their relative positions, etc. Specular reflections on curved objects are
usually approximated using environment maps. In this paper, we present a new algorithm for real-time computation
of specular reflections on curved objects, based on an exact computation for the reflection of each scene vertex.
Our method exhibits all the required parallax effects and can handle arbitrary proximity between the reflector and
the reflected objects.
Categories and Subject Descriptors (according to ACM CCS): I.3.7 [Computer Graphics]: Three-Dimensional
Graphics and Realism

1. Introduction

Reflections on specular objects are important in our perception of a synthetic 3D
scene. They convey important information about the specular reflector itself,
conveying its shape and its fabric. They can also give information about the relative
spatial positions of objects or the distance between the reflector and the reflected
object. Finally, they give information about objects that are not directly visible (see
Figure 1).

Real-time computation of specular reflections is usually done using environment
mapping. While these techniques perform quite well in a wide variety of cases, they
have their shortcomings. They perform best if the reflected object is at a large
distance from the reflector, but as the reflected object moves closer to the specular
reflector, reflection errors become more visible. The worst case for environment
mapping techniques is when the reflector is in contact with the object being
reflected, as in Figure 1. Environment mapping techniques also suffer from the
parallax problem: from all the points on the specular reflector, we are seeing the
same side of the reflected objects, even if the specular reflector is large enough to
see the different sides of an object.

In this paper, we present a new method for computing specular reflections. Our
method is vertex-based: we compute the accurate reflected position of each vertex in
the scene, then interpolate between these positions. The advantage of our method is
that it computes the reflection of the object depending on the position on the
reflector. We therefore exhibit all parallax effects, and we can handle proximity and
even contact between the reflector and the reflected objects.

However, our method also has obvious limitations: as it is vertex-based and uses the
graphics hardware for linear interpolation between the projections of the vertices,
artifacts can appear if the model is not finely tessellated enough. These artifacts can
be overcome using either adaptive tessellation or curvilinear interpolation. If the
model is finely tessellated, these artifacts are not visible. Our algorithm provides
solutions for situations where no convincing solutions existed before.

Our paper is organized as follows: in the next section, we review previous work on
real-time computation of specular reflections. Then, in section 3, we present our
algorithm for computing vertex-based specular reflections on curved surfaces. In
section 4, we present experiments on various scenes and comparisons with existing
methods. Finally, in section 5, we conclude and present future directions for
research.

† GRAVIR is UMR 5527 GRAVIR, a joint research laboratory of CNRS, INRIA, INPG and UJF.

© The Eurographics Association and Blackwell Publishing 2006. Published by Blackwell
Publishing, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden,
MA 02148, USA.

2. Previous Works

Ray-tracing has historically been used to compute reflections on specular objects.
Despite several advances using either highly parallel computers [WSB01, WBWS01,
WSS05] or GPUs [CHH02, PBMH02], ray-tracing is not, currently, available for
real-time computations on a standard workstation.

Planar specular reflectors are easy to model, at the cost of a second rendering pass,
with a camera placed in the mirror position of the viewpoint [McR96]. Curved
reflectors are more complex; the easiest method uses environment mapping [BN76].

Environment mapping computes an image of the scene and maps it on the reflector
as if it was located at an infinite distance. The reflection only depends on the
direction of the incoming vector from the viewpoint, and can be easily computed in
real-time on graphics hardware. Obviously, environment mapping suffers from
parallax issues, since the reflection depends on a single image computed from a
single point of view. There is also the question of accuracy: since all objects are
assumed to be at an infinite distance, their reflection is not necessarily accurate, and
the difference becomes larger as the object gets closer to the reflector.

There has been much research to improve the original environment mapping
algorithm. To remove the parallax issues, Martin and Popescu [MP04] interpolate
between several environment maps. Yu et al. [YYM05] used an environment
light-field, containing all the information of a light field, but organized like an
environment map. Both methods remove parallax issues, at the cost of a longer
precomputation time. The specular reflector is also restricted, and can only be
moved inside the area where the light field or the environment maps were
computed. If it is moved outside of this area, the environment light field must be
recomputed, a costly step.

Other research has dealt with distance-based reflection. The simplest method is to
replace the infinite-radius sphere associated with the environment map by a
finite-radius sphere [Bjo04]; the reflection changes with the position of the reflector
in the environment, but parallax effects can not be modeled.

More accurate methods use the Z-buffer to compute a distance map along with the
environment map. For each pixel of the environment map, they know both its color
and the distance to the center of the reflected object. Patow [Pat95] and Kalos et
al. [SKALP05] used this information to select the proper pixel inside the
environment map. Their reflections change depending on the distance between the
reflector and the reflected object. Kalos et al. [SKALP05] use the GPU for a fast
computation of the reflected pixel, and achieve real-time rendering for moderately
complex scenes. Still, image-based methods are inherently limited to the
information included in the original image.

For planar reflectors, the easiest way to compute the reflection is vertex-based,
using an alternative camera to compute the image of the scene as reflected by the
planar reflector. For curved reflectors, there is no simple rule to tell the position of
the reflection of the objects. Even for a finite-radius sphere, the simplest specular
reflector, the position of the reflection depends on a 4th-order polynomial.

Mitchell and Hanrahan [MH92] used the equation of the underlying surface to
compute the characteristic points in the caustic created by a curved reflector.
Ofek [Ofe98] and Ofek and Rappoport [OR98] computed the explosion map to find
intersected triangle IDs based on the reflected vector. Chen and Arvo [CA00b,
CA00a] used ray-tracing to compute the reflection of some vertices, then applied
perturbation to these reflections to compute the reflection of neighboring vertices.

Estalella et al. [EMD∗05] computed the reflection of scene vertices on curved
specular objects by an iterative method. At each iteration, the position of the
reflection of the vertex is modified, using the angles between the normal, the vertex
and the viewpoint, in the direction where these angles will follow Descartes' law.
They did a fixed number of iterations, and have implemented the method only on
the CPU. In a subsequent work, developed concurrently with ours, Estalella et
al. [EMDT06] extended this work to the GPU, searching the position of the
reflection of the vertex in image space.

Our method is comparable to that of Estalella et al. [EMD∗05, EMDT06], but we
use a different refinement

criterion, keeping geometric bounds on the reflected position for robustness. We use
these geometric bounds for adaptive refinement, stopping the iteration as soon as
we reach sub-pixel accuracy. In our experience, these two elements are of great
importance: in all the scenes we used, we encountered robustness-related issues,
especially for reflections at grazing angle. We also noticed that the number of
iterations required to reach convergence varies greatly with the position of the
reflection.

3. Algorithm

3.1. Principles

Our algorithm is vertex-based: we compute the reflected position of all the scene
vertices, then let the graphics hardware interpolate between these vertices and solve
visibility issues with a Z-buffer. Our algorithm therefore inserts itself as a
replacement for the usual projection of the vertices. Knowing the position of the
viewpoint, E, for each vertex V, we find the point P on the specular reflector that
corresponds to the position of V (see Figure 2).

Figure 2: Finding the reflection of a given vertex.

The difficult part in this algorithm is computing P as a function of V and E. Except
in the most basic case of planar specular reflectors, there is no simple relationship
between P, V and E. Even for a sphere, the explicit position of P depends on a
polynomial of the fourth order; finding the roots of this polynomial is feasible, but
actually takes longer than the iterative method we use.

According to Fermat's principle, light travels along paths of extremal length, so P
must correspond to an extremum of the optical path length ℓ = EP + PV. We are
searching for extrema of ℓ, or equivalently, for zeros of its first-order derivative, the
gradient ∇ℓ.

This is an optimization problem, with a function of two parameters (the surface of
the specular reflector is a 2D manifold). Usually, optimization problems are solved
with line search methods, such as the gradient descent or conjugate gradient
methods. These methods progress iteratively from an initial guess. At each step,
they know the direction in which they should progress, but not necessarily the
distance along this direction. Knowing this distance accurately requires knowledge
about the second derivatives of the function.

Our application is inherently graphical: we are displaying the result of our
computations on the screen, and changing parameters — the viewpoint, the reflected
scene, the reflector — dynamically. One of the most important points for such
graphical applications is temporal coherency: the reflection of one point must not
change suddenly between frames. We therefore need spatial information about the
accuracy of the computations: if we have not yet computed the position of one point
with sub-pixel accuracy, we run the risk of seeing temporal discontinuities at the
next frame. We also observed in our experiments that the number of iterations
required for convergence varies greatly with the configuration of the vertex. Spatial
information about convergence helps in adapting the number of iterations to the
current case.

Line search methods typically use residuals to check the numerical accuracy of the
computations, but they do not provide information about the spatial accuracy. At
each step, we know the distance traveled from the previous step, but this
information is only linear. Since the reflector is a 2-dimensional surface, it can
happen that the algorithm has closed in on the result along one dimension, but is
still far from it on the other dimension.

The secant method searches for roots of a function f by replacing it with a linear
interpolation between samples, picking the root of the linear interpolation and
iterating. While the secant method does not guarantee that the root remains
bracketed, it provides good information about the accuracy achieved so far, and
converges faster than the simpler bisection method. Newton's method converges
faster than the secant method, but requires computing the derivative of f.

Since we are looking for zeros of ∇ℓ, we apply to it a variant of the secant method.
At each step, we maintain a triangle of sample points where we compute ∇ℓ and
linearly interpolate between these gradients. At each step, the triangle of sample
points gives us approximate geometric bounds on the projection of the vertex.

3.2. Algorithm for specular vertex reflection

Our algorithm for computing the reflection of a 3D scene in a specular reflector
uses the following steps:

1. render the scene into the framebuffer, with direct lighting and shadowing;
2. for all vertices of the scene, find their reflection on the specular reflector;
3. interpolate between these vertices, computing lighting and doing hidden surface
removal.

For each vertex, finding the position of its reflection is done iteratively, using a
variant of the secant method on the gradient of the optical path length: at each step,
we maintain a triangle of sample points, and we:


Figure 3: Convergence of our iterative system. (a) Example of successive triangles
generated by our algorithm; (b) example image rendered with our algorithm; (c)
number of iterations required for convergence.
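The extremal-path search performed per vertex can be illustrated in miniature. The toy below applies a secant-style search for a zero of the derivative of ℓ = EP + PV to a circular mirror in 2D; this is entirely our illustration, reduced to one parameter, whereas the actual method maintains a triangle of samples on the 2D reflector surface:

```python
import math

def reflection_on_circle(E, V, R=1.0, iters=20):
    """Find the reflection point of vertex V as seen from eye E on a
    circular mirror of radius R centred at the origin, via a secant
    search for a zero of the derivative of the optical path length
    l(t) = |E - P(t)| + |V - P(t)|, with P(t) = R (cos t, sin t)."""
    def path_len(t):
        p = (R * math.cos(t), R * math.sin(t))
        return (math.hypot(E[0] - p[0], E[1] - p[1]) +
                math.hypot(V[0] - p[0], V[1] - p[1]))
    def dl(t, h=1e-6):
        # Central-difference derivative of the path length.
        return (path_len(t + h) - path_len(t - h)) / (2.0 * h)
    t0, t1 = 0.0, 0.5
    for _ in range(iters):
        d0, d1 = dl(t0), dl(t1)
        if d1 == d0:
            break
        t0, t1 = t1, t1 - d1 * (t1 - t0) / (d1 - d0)
    return t1
```

For a symmetric configuration such as E = (2, 1), V = (2, -1), the extremal path touches the mirror at t = 0, and the iteration homes in on it in a handful of steps.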

• compute the gradient of the optical path length for each


sample point (see section 3.3.2),
• linearly interpolate between these gradients, ur
• find the resulting gradient with the smallest norm (see sec-
O
r(θ,φ)
tion 3.3.3).
• discard the original sample point with the largest gradi-
ent, replace it by the new sample point and iterate (see
Figure 3(a)).
At each step, the projected area of the triangle gives us an Figure 4: To reduce dimensionality, we assume that the re-
indication of the accuracy of our computations. We stop the flector is star-shaped.
computation if this area falls below a certain threshold.
Our method converges quickly in most cases, in 5 to 10 iterations in moderately complex cases, but can require up to 20 iterations for certain difficult points, such as vertices whose reflection is close to the boundary of the reflector (see Figure 3(c)).

The method is robust enough to converge even if the initial set of sample points is poorly chosen. However, it converges faster if the sample points are close to the actual solution. Section 3.3.4 describes our strategy for picking the initial sample points.

Once we have computed the reflection of each vertex, we project it on the screen and let the graphics hardware do linear interpolation between the vertices. We exploit the fact that we know the spatial position of the point being reflected to compute direction-dependent lighting (see Section 3.3.5).

Hidden surface removal requires special handling, as we have several possible sources of occlusion: the scene and the reflector may be hiding each other, parts of the reflector may be hiding themselves, and parts of the reflected scene are hiding other parts of the reflected scene. Section 3.3.6 describes our solution to these combined occlusion issues.

The entire algorithm was implemented on the GPU, using programmable capabilities for vertex and fragment processing. Hardware implementation issues are described in section 3.3.7.

3.3. Details of the algorithm

3.3.1. Specular reflector parameterization

In order to provide interesting reflections, it is better if our reflector is actually smooth. We also assume that it is parameterizable. Finally, to reduce the dimensionality of the problem, we assume that the reflector is star-shaped: there is a point O that is directly connected to all the points on the surface of the reflector (see Figure 4).

This reduces the equation of the specular reflector to a scalar function, r. Using spherical coordinates, for example, any point P(θ, φ) on the reflector can be expressed as:

P(θ, φ) = O + r(θ, φ) u_r    with    u_r = (sin θ cos φ, sin θ sin φ, cos θ)^T

For our algorithm, we will also need the variations of the surface of the reflector: we also compute the derivatives of the function r.

In a preliminary step, r and its partial derivatives are computed and stored in a texture. Although our algorithm works with any kind of reflector, the star-shaped hypothesis allows us to retrieve all the required information about the specular reflector at any given point with a single texture read. This


will be useful for implementing our algorithm efficiently on the GPU.

Using spherical coordinates introduces singularities in the parameterization, at the poles. To avoid numerical issues in our computations, we do not use r or its partial derivatives directly, but we only use 3-dimensional vectors such as P or ∇r. All computations and interpolations are done in 3D space, never in parameter space.

3.3.2. Optical path derivatives

Assuming we have a sample point on the surface of the reflector, we can compute the length ℓ of the optical path from the viewpoint E to the vertex V through P (see Figure 2):

ℓ = EP + PV

The gradient of the optical path length depends on the derivative of point P on the reflector surface:

∇ℓ = ∇(EP) + ∇(PV)
∇ℓ = d(P) ( \vec{EP}/EP + \vec{PV}/PV )

Here d(P) is the derivative of point P, a linear form operating on a vector. With our parameterization of P on a star-shaped reflector, d(P) is also reduced in dimension, and we can express ∇ℓ as a function of ∇r:

∇ℓ = (∇r · e) u_θ + (u_θ · e) u_θ + (u_φ · e) u_φ    (1)

with:

e = \vec{EP}/EP + \vec{PV}/PV
u_θ = (cos θ cos φ, cos θ sin φ, −sin θ)^T    u_φ = (−sin φ, cos φ, 0)^T

∇r can be expressed as a function of the partial derivatives of r, but it is not actually necessary in our case. We are storing information about r and its derivatives in a texture, which will be accessed by the GPU. As a single texture read gives access to 4 channels, we store r and its gradient ∇r, saving computations.

3.3.3. Finding a better estimate for vertex reflection

At each step, we have a triangle of sample points (A, B, C). For all points D, expressed in barycentric coordinates with respect to (A, B, C):

D = αA + βB + (1 − α − β)C

we compute an approximation ∇̃ℓ_D of the gradient of ℓ using a linear approximation:

∇̃ℓ_D = α∇ℓ_A + β∇ℓ_B + (1 − α − β)∇ℓ_C = αa + βb + c

(with a = ∇ℓ_A − ∇ℓ_C, b = ∇ℓ_B − ∇ℓ_C and c = ∇ℓ_C).

Ideally, we would like to select (α, β) such that ∇̃ℓ_D = 0. However, this is not always possible, unless the vectors a, b and c are linearly dependent. So we pick (α, β) so that ‖∇̃ℓ_D‖ is minimum: we differentiate ‖∇̃ℓ_D‖² with respect to α and β, and find (α, β) such that both derivatives are null. This is equivalent to solving the linear system:

α a² + β (a·b) + (a·c) = 0
α (a·b) + β b² + (b·c) = 0

whose determinant is:

δ = a²b² − (a·b)²

The (α, β) parameters give us a new point D. We discard the point in (A, B, C) with the largest gradient and replace it with point D, then iterate.

In some circumstances, the determinant δ of the system can be null or very small, making the system ill-conditioned. When it happens, we backtrack in time, replacing one of the points {A, B, C} by the most recently discarded point. Of course, we cannot replace the most recently added point, or the system would enter an infinite loop.

Figure 5: On a sphere, the reflection lies on the arc (A, B).

3.3.4. Initialization

Our method is efficient and converges even if arbitrary sample points are used as a starting triangle. However, the convergence is faster if the starting triangle is small and close to the result. It is not necessary for our initial guess to actually enclose the result, since our algorithm is able to extrapolate outside the triangle if necessary.

For a spherical reflector, the reflection of a vertex V is in the plane defined by V, the eye E and the center of the sphere O. Ofek [Ofe98] shows that the reflected vertex lies on the arc of circle [AB], where A (resp. B) is the projection of V (resp. E) on the reflector (see Figure 5).

For non-spherical reflectors, this property does not hold. We nevertheless use A and B as two of our initial points. The third point C is chosen so that ABC is an equilateral triangle.
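As a sanity check on this formulation, the snippet below (an illustrative sketch with assumed geometry: a unit-sphere reflector centred at O, and example positions for the eye E and the vertex V) brute-force minimises the optical path length ℓ over the (θ, φ) parameterization of section 3.3.1, and verifies that the minimiser satisfies the law of reflection, as Fermat's principle predicts.

```python
import math

O = (0.0, 0.0, 0.0)          # centre of the star-shaped reflector (assumed)
E = (3.0, 0.5, 0.0)          # eye position (example value)
V = (0.5, 3.0, 0.0)          # scene vertex to reflect (example value)

def u_r(theta, phi):
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))

def surface_point(theta, phi, r=1.0):
    # P(theta, phi) = O + r(theta, phi) u_r; unit sphere, so r is constant.
    u = u_r(theta, phi)
    return tuple(O[i] + r * u[i] for i in range(3))

def dist(p, q):
    return math.sqrt(sum((p[i] - q[i]) ** 2 for i in range(3)))

def path_length(theta, phi):
    p = surface_point(theta, phi)
    return dist(E, p) + dist(p, V)   # l = EP + PV

# Brute-force reference minimisation over a (theta, phi) grid.
best = min((path_length(t, f), t, f)
           for t in (i * math.pi / 200 for i in range(1, 200))
           for f in (j * 2 * math.pi / 400 - math.pi for j in range(400)))
_, t0, f0 = best
p0, n = surface_point(t0, f0), u_r(t0, f0)

# At the minimum, the incoming ray mirrored about the normal u_r
# must match the outgoing ray (law of reflection).
d_in = tuple((p0[i] - E[i]) / dist(E, p0) for i in range(3))
d_out = tuple((V[i] - p0[i]) / dist(p0, V) for i in range(3))
k = 2 * sum(d_in[i] * n[i] for i in range(3))
mirrored = tuple(d_in[i] - k * n[i] for i in range(3))
# Residual is bounded by the grid resolution; a proper line search, as in
# this paper, reaches the same point far more cheaply.
residual = max(abs(mirrored[i] - d_out[i]) for i in range(3))
```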


Figure 6: Computing the illumination of the reflected scene: illumination at the reflected point is computed using its BRDF, with \vec{VL} and \vec{VP} as incoming and outgoing directions; it is then multiplied by the BRDF on the reflector, with \vec{PV} and \vec{PE} as incoming and outgoing directions.

Figure 7: For a ray originating from the eye, we have to resolve visibility issues both between P and P′, on the reflector, and between V and V′, on the reflected ray.

3.3.5. Direction-dependent lighting on the reflected scene

When we display a fragment of the reflected scene, we know its spatial position V and the approximate spatial position of its reflection P. We use this information to compute directionally-dependent lighting:

• compute illumination at point V, using its BRDF, with the light source L as the incoming direction and the reflected point P as the outgoing direction (see Figure 6).
• multiply this by the BRDF of the specular reflector at point P, using the reflected point V as the incoming direction, and the viewpoint E as the outgoing direction.

This simple rule allows us to have directional lighting on the reflected scene. The lighting on the reflected scene is thus not necessarily the same as the lighting on the original scene.

3.3.6. Multiple Hidden-Surface Removal

Hidden surface removal requires special handling, as we have several possible sources of occlusion (see Figure 7): the scene and the reflector may be occluding each other, and we also have to conduct hidden-surface removal on the reflected scene. The ideal solution would be to use several depth buffers, or a multi-channel depth buffer. As these are not available, we have designed a workaround.

For each vertex V, when we compute its projection P, we store in the depth buffer the distance between P and V. This way, the Z-buffer of the graphics card naturally removes fragments of the reflected scene that are hidden by other objects.

To solve the other occlusion issues, we use the following strategy:

• pre-render the frontmost back-facing polygons of the reflector into a depth texture; clear the Z-buffer and frame-buffer.
• render the scene, with lighting and shadowing; clear the stencil buffer.
• render the reflector, with hidden surface removal. For pixels that are touched by the reflector, set the stencil buffer to 1.
• clear the depth buffer and render the reflected scene using our algorithm. The fragments generated are discarded if the stencil buffer is not equal to 1 (using the classical stencil test), and also if they are further away than the back-faces of the reflector (using the depth texture computed at the first step).
• (optional) enable blending and render the reflector, computing its illumination.

Our strategy correctly handles occlusions between the reflector and the scene (using the stencil test), as well as self-occlusion of the reflector, using the depth texture. Note that we have to use frontmost back-facing polygons: using the frontmost front-facing polygons would falsely remove all the reflected scene for locally convex reflectors, since we are linearly interpolating between reflected points that are on the surface of the reflector.

3.3.7. GPU implementation

We have implemented our algorithm on the GPU for better efficiency. To compute the reflected position of one vertex, we need access to the equation and derivatives of the specular reflector. Since we stored these in a texture to handle arbitrary specular reflectors, this limits us to two possible implementation strategies:

• place our algorithm in a vertex shader, using graphics hardware with vertex texture fetch (NVidia GeForce 6 and above).
• place our algorithm in a fragment shader and render the reflected positions of the vertices into a Vertex Buffer Object. In a subsequent pass, render this VBO. This requires hardware with render-to-vertex-buffer capability, which was not available to us at the time of writing.
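The two-step shading rule of section 3.3.5 can be sketched as follows (illustrative only: a simple Phong-style BRDF is assumed here, whereas the paper works with arbitrary BRDFs, and all point positions are example values):

```python
import math

def normalize(v):
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def direction(frm, to):
    return normalize(tuple(to[i] - frm[i] for i in range(3)))

def phong_brdf(n, w_in, w_out, kd=0.5, ks=0.5, shininess=32):
    # Simple Phong-style BRDF, assumed for the example.
    d = sum(n[i] * w_in[i] for i in range(3))
    r = tuple(2 * d * n[i] - w_in[i] for i in range(3))   # w_in mirrored about n
    spec = max(0.0, sum(r[i] * w_out[i] for i in range(3))) ** shininess
    return kd / math.pi + ks * spec

def reflected_fragment(L, V, P, E, n_V, n_P):
    """Shade the reflected point V with (V->L, V->P), then multiply by the
    reflector's BRDF at P with (P->V, P->E), as in section 3.3.5."""
    lit = phong_brdf(n_V, direction(V, L), direction(V, P))
    return lit * phong_brdf(n_P, direction(P, V), direction(P, E))

# A viewpoint aligned with the mirror direction at P sees a strong highlight:
aligned = reflected_fragment(L=(-1.0, 2.0, 1.0), V=(-1.0, 0.0, 1.0),
                             P=(0.0, 0.0, 0.0), E=(1.0, 0.0, 1.0),
                             n_V=(0.0, 0.0, 1.0), n_P=(0.0, 0.0, 1.0))
# A misaligned viewpoint only receives the diffuse part of the reflector:
off = reflected_fragment(L=(-1.0, 2.0, 1.0), V=(-1.0, 0.0, 1.0),
                         P=(0.0, 0.0, 0.0), E=(0.0, 1.0, 1.0),
                         n_V=(0.0, 0.0, 1.0), n_P=(0.0, 0.0, 1.0))
```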


We have used the first strategy, but found that it suffers from several limitations: there are fewer vertex processing units than fragment processing units on GPUs, so we are not taking full advantage of the parallel engine; a texture fetch in a vertex processor has a large latency; vertex processors cannot currently read from cube maps or rectangular textures, forcing us to use square textures.

As pointed out by [EMD∗05], it makes sense to use a cube map to store the information about the specular reflector, since reflector information is queried based on a direction vector d, as cube maps are. In the current implementation, we have to convert the vector d into spherical coordinates (θ, φ), a costly step.

An implementation of our algorithm using the second strategy is likely to have much better rendering times, as well as simpler code.

4. Experiments and Comparisons

4.1. Comparison with other reflection methods

The strongest point of our algorithm is its ability to produce reflections with great accuracy. Figure 1 and Figure 8 show, for comparison, pictures generated with our algorithm, ray-traced pictures for reference, and pictures generated with environment mapping. Our method handles all the reflection issues, including contacts between the reflector and the reflected object. Differences between our method and the environment mapping method especially appear for objects that are close to the reflector, such as the hand in Figure 1 and the handle of the kettle in Figure 8. Notice how the reflection of the handle of the kettle appears to be flying in the reflection of the room in Figure 8(c).

For objects that are close to the reflector, our algorithm exhibits all the required parallax effects. One of the problems with environment mapping techniques is when objects are visible from some parts of the reflector but not from its center. In Figure 9, our algorithm properly renders the back of the chair.

Another strong point of our algorithm is its robustness and temporal stability. As shown in the accompanying video, reflections computed by our algorithm exhibit great temporal stability, without temporal aliasing. This property is essential for practical applications, such as video games.

4.2. Rendering speed

As we have seen in Figure 3(c), the number of iterations required for convergence depends greatly on the position of the reflection. Reflections close to the center of the reflector converge quickly, in less than 5 iterations, while reflections of objects located close to the silhouette of the reflector take longer to reach convergence.

As a consequence, the rendering time depends on the respective position of the object and the reflector. We observe the worst timing results if the scene being reflected completely surrounds the reflector. In that case, many objects are reflected on the silhouette, dragging down the rendering process. These scenes are also more interesting to render, which is why we used them nevertheless in all our timing results.

Figure 10: Observed rendering time (in ms) for rendering scenes, with no specular reflections, with environment-mapping specular reflections and with our algorithm.

Figure 10 shows the rendering times for scenes of various sizes, surrounding the specular reflector. For comparison, we plotted the rendering time for the scene without specular reflections, with specular reflections simulated by environment mapping, and with specular reflections computed by our algorithm. The extra cost introduced by our algorithm is always larger than that of environment maps, but it remains within the same order of magnitude. We observe satisfying performances for scenes up to 40,000 polygons, and we also observe that rendering times depend linearly on the number of vertices (all timings in this section were measured on a 2-processor Pentium IV, running at 3 GHz, with an NVidia GeForce 7800 graphics card). We measure rendering times in ms by taking the reciprocal of the observed framerate, multiplied by 1000.

4.3. Robustness and early exit

As our method is based on a triangle of sample points and uses the gradient of the optical path at each sample point, it has several advantages:

• we make large steps if we are far from the solution, and smaller steps as we approach the solution (see Figure 3(a)). This ensures faster convergence, even with poor initial conditions.
• the method is remarkably robust, and converges even for difficult cases, such as vertices reflected at grazing angles; in that case, it takes longer to reach a converged solution, but it reaches one, sometimes after more than 10 iterations (see Figure 3(c)).


(a) Our method. (b) Ray-traced reference. (c) Environment mapping.

Figure 8: Comparison of our results (left) with ray-tracing (center, for reference) and environment mapping (right). The differences are especially visible for objects that are close to the kettle, such as its handle and the right hand of the character.

(a) Our algorithm. (b) Environment mapping.

Figure 9: Our algorithm is able to display objects that are not visible from the center of the reflector. Notice here how the back of the chair is properly rendered.

We note that for simple cases, our method reaches convergence very quickly (less than 5 iterations), while for difficult cases it requires more computations. As we are doing our computations on the vertex processing units, the fact that different vertices require different computation times is not a big issue. In our tests, we found that using the early exit greatly improved the speed of the computations compared to using a fixed number of iterations.

Spatial consistency could become a larger issue if we moved the computations to the fragment processing unit, but we observe (see Figure 3(c)) that all the vertices from one object have similar complexities; all these vertices should take roughly the same computation time, ensuring that early exit also works well in this situation.

Figure 11: Example of a reflection with a concave reflector. As our algorithm only captures the first reflection of the scene in the bowl, the top of the bowl looks empty.

4.4. Concave reflectors

Concave reflectors are a special case. As noted by [Ofe98, OR98], concave reflectors divide space into three zones. Objects that are in the first zone, close to the reflector, are reflected only once and upside-up. Objects that are in the


third zone, far from the reflector, are reflected only once, and upside-down (as in Figure 11). Objects that are in the second zone, between the other two, can have several reflections, sometimes an infinite number, and their reflection is numerically unstable.

As with [Ofe98, OR98], our algorithm properly handles objects that are either completely in the first or the third zone, but not objects that cross or are in the second zone.

In our experiments, another problem appeared: concave objects are highly likely to cause secondary reflections (reflections with several bounces inside the specular reflector). As our algorithm only captures the first reflection of the scene by the specular reflector, the place where these secondary reflections should be looks empty.

Figure 13: Example of Z-fighting when a small object is layered on top of a larger object.

4.5. Tessellation issues

One of the biggest drawbacks of our algorithm is that we are only computing the exact reflection position at the vertices, and we let the graphics hardware interpolate between the reflected positions. Currently, the graphics hardware is only able to interpolate linearly. This has several consequences. The first one is that the interpolated objects are located behind the front face of the reflector if the reflector is locally convex. Thus, the front face of the reflector would hide the reflection. We had to ensure that the front face of the reflector was not present in the Z-buffer to avoid this problem. The second one is that for objects that are not finely tessellated, we see interpolation artifacts. These artifacts can either be discontinuities between neighboring faces with different levels of tessellation, or a reflection that looks straight, as in Figure 12(a). The third consequence appears for thin objects layered on top of another, larger object (see Figure 13). Because we are linearly interpolating Z-values as well as position, the back object may pop in front of the other object, partially occluding it.

The solution to these issues would be to use curvilinear interpolation, or adaptive tessellation. In the meantime, we apply our algorithm to well-tessellated scenes (see Figure 12(b)). Note that curvilinear interpolation of depth values would be easier with current graphics hardware than curvilinear interpolation in pixel space.

5. Conclusion and Future Works

We have presented an algorithm for computing reflections on curved specular surfaces, using vertex-based computations. Our algorithm produces realistic specular reflections in real-time, showing all the required parallax effects. Our algorithm is iterative, with an adaptive number of iterations, and has a geometry-based criterion for deciding convergence.

In its current form, our algorithm uses linear interpolation between the projections of the vertices, resulting in artifacts for scenes that are not finely tessellated. Solutions to this problem are either adaptive tessellation or curvilinear interpolation techniques.

The strongest point of our algorithm is that it can handle arbitrary geometry on the reflector and the reflected object, including contact between the two surfaces. It is for this situation (close proximity between the reflected object and the reflector) that current environment-map methods do not provide convincing results. We think that our algorithm would be best used as a complement to existing methods, handling the reflection of close objects, while environment-map based methods would be used for the reflection of further objects and the background.

As our algorithm provides a method to compute the reflected ray passing by two endpoints, it can be used for other computations, such as caustics and refraction computations.

References

[Bjo04] Bjorke K.: Finite-radius sphere environment mapping. In GPU Gems. Addison-Wesley, 2004.

[BN76] Blinn J. F., Newell M. E.: Texture and reflection in computer generated images. Communications of the ACM 19, 10 (1976), 542–547.

[CA00a] Chen M., Arvo J.: Perturbation methods for interactive specular reflections. IEEE Transactions on Visualization and Computer Graphics 6, 3 (2000), 253–264.

[CA00b] Chen M., Arvo J.: Theory and application of specular path perturbation. ACM Transactions on Graphics 19, 4 (2000), 246–278.

[CHH02] Carr N. A., Hall J. D., Hart J. C.: The ray engine. In Graphics Hardware 2002 (2002).

[EMD∗05] Estalella P., Martin I., Drettakis G., Tost D., Devillers O., Cazals F.: Accurate interactive specular reflections on curved objects. In Proceedings of VMV 2005 (Nov. 2005).


(a) The bar is not tessellated, and its reflection is not curved, although it should be. (b) The problem disappears if we tessellate the bar.

Figure 12: The scene has to be well tessellated, or artifacts appear because we cannot render curved triangles.

[EMDT06] E P., M I., D G., T [WSB01] W I., S P., B C.: Interactive
D.: A gpu-driven algorithm for accurate interactive reflec- distributed ray tracing of highly complex models. In Ren-
tions on curved objects. In Rendering Techniques 2006 dering Techniques 2001 (Proc. 12th EUROGRAPHICS
(Proc. EG Symposium on Rendering) (June 2006). Workshop on Rendering) (2001), pp. 277–288.
[McR96] MR T.: Programming with OpenGL: [WSS05] W S., S J., S P.: Rpu: a
Advanced rendering. Siggraph’96 Course, 1996. programmable ray processing unit for realtime ray trac-
ing. ACM Transactions on Graphics (Proc. of Siggraph
[MH92] M D., H P.: Illumination from
2005) 24, 3 (2005), 434–444.
curved reflectors. Computer Graphics (Proc. of SIG-
GRAPH ’92) 26, 2 (1992), 283–291. [YYM05] Y J., Y J., MM L.: Real-time reflec-
tion mapping with parallax. In Proc. I3D 2005 (2005),
[MP04] M A., P V.: Reflection Morphing.
pp. 133–138.
Tech. Rep. CSD TR#04-015, Purdue University, 2004.
[Ofe98] O E.: Modeling and Rendering 3-D Objects.
PhD thesis, Institute of Computer Science, The Hebrew
University, 1998.
[OR98] O E., R A.: Interactive reflections
on curved objects. In Proc. of SIGGRAPH ’98 (1998),
pp. 333–342.
[Pat95] P G. A.: Accurate reflections through a Z-
buffered environment map. In Proceedings of Sociedad
Chilena de Ciencias de la Computacin (1995).
[PBMH02] P T. J., B I., M W. R., H-
 P.: Ray tracing on programmable graphics hardware.
ACM Transactions on Graphics (Proc. of Siggraph 2002)
21, 3 (July 2002), 703–712.
[SKALP05] S-K L., A B., L I., P-
 M.: Approximate ray-tracing on the GPU with dis-
tance impostors. Computer Graphics Forum(Proceedings
of Eurographics ’05) 24, 3 (2005).
[WBWS01] W I., B C., W M., S
P.: Interactive rendering with coherent ray tracing. Com-
puter Graphics Forum (Proc. of EUROGRAPHICS 2001)
20, 3 (2001).


4.7.6 Wavelet radiance transport for interactive indirect lighting (EGSR 2006)
Authors: Janne Kontkanen, Emmanuel Turquin, Nicolas Holzschuch and François X. Sillion
Conference: Eurographics Symposium on Rendering 2006.
Date: June 2006
Eurographics Symposium on Rendering (2006)
Tomas Akenine-Möller and Wolfgang Heidrich (Editors)

Wavelet Radiance Transport for Interactive Indirect Lighting

Janne Kontkanen¹, Emmanuel Turquin², Nicolas Holzschuch² and François X. Sillion²

1 Helsinki University of Technology 2 ARTIS† GRAVIR/IMAG INRIA

Figure 1 (panels, left to right: maze scene, direct lighting, indirect lighting, complete illumination): This scene is rendered at 15 FPS by our system with full global illumination. The light sources (spotlights) and the viewpoint can be modified interactively. The precomputation time was 23 minutes.

Abstract
Global illumination is a complex all-frequency phenomenon including subtle effects caused by indirect lighting.
Computing global illumination interactively for dynamic lighting conditions has many potential applications,
notably in architecture, motion pictures and computer games. It remains a challenging issue, despite the consid-
erable amount of research work devoted to finding efficient methods. This paper presents a novel method for fast
computation of indirect lighting; combined with a separate calculation of direct lighting, we provide interactive
global illumination for scenes with diffuse and glossy materials, and arbitrarily distributed point light sources. To
achieve this goal, we introduce three new tools: a 4D wavelet basis for concise radiance expression, an efficient
hierarchical pre-computation of the Global Transport Operator representing the entire propagation of radiance in
the scene in a single operation, and a run-time projection of direct lighting onto our wavelet basis. The resulting
technique allows unprecedented freedom in the interactive manipulation of lighting for static scenes.
Categories and Subject Descriptors (according to ACM CCS): I.3.7 [Computer Graphics]: Three-Dimensional
Graphics and Realism

† ARTIS is a research project in the GRAVIR/IMAG laboratory, a joint unit of CNRS, INPG, INRIA and UJF.

1. Introduction

Illumination simulation methods have many interesting applications, for example in architectural design, lighting design, computer games or motion pictures. These applications make use of global illumination algorithms, known for their high computational demands, and would greatly benefit from improved interactivity. We note that these applications are often dealing with indoor scenes, illuminated by local light sources whose position and orientation are subject to change. In this specific setup, interactive global illumination remains a particularly challenging issue.

However, by taking advantage of the linearity of the rendering equation it is possible to precompute light transport offline, and to use this data during run-time to obtain convincing global illumination effects: Precomputed Radiance Transport methods (PRT) [SKS02, Leh04] precompute the relationship between the emitted light and the radiance outgoing from the surfaces of the scene. In order to keep the complexity manageable, these methods usually express the emission in a low dimensional basis. The most common way to do this is to consider only infinitely distant lighting, and

© The Eurographics Association 2006.
J. Kontkanen, E. Turquin, N. Holzschuch & F. X. Sillion / Wavelet Radiance Transport for Interactive Indirect Lighting

thus reduce the dimensionality of the emission to 2. Yet, lo- multiple bounces of light, at the expense of rendering time.
cal light sources (a.k.a. near-field illumination) have 5 de- This approach can be seen as a dynamic generalization of
grees of freedom, that can be narrowed down to 4 without Greger’s irradiance volumes [GSHG98].
loss of generality if we consider that light travels through
Dachsbacher and Stamminger [DS06] introduce an ex-
a vacuum; this high dimensionality tends to make classical
tended shadow map to create first-bounce indirect light
PRT methods extremely costly.
sources. They splat the contribution of these sampled
In this paper, we present a technique for interactive com- sources onto the final image using deferred shading. They
putation of global illumination in static scenes with dif- only compute the first indirect light bounce, without taking
fuse and glossy materials, and arbitrarily placed dynamic visibility into account. Nevertheless, they observe that the
point/spotlights. Our algorithm uses a precomputed Global results look plausible in most situations.
Transport Operator that expresses the relationship between
Despite using GPUs or even custom hardware, the above
incident and outgoing surface radiance. During run-time we
methods currently barely run interactively, unless they re-
project the direct light form the light sources to the surfaces,
strict themselves to small scenes or degrade the accuracy of
and apply this precomputed operator to get full global illu-
the simulation.
mination. Rather than following the common compute, then
compress scheme, we try to generate the operator directly in In a separate research direction, PRT techniques [Leh04]
a compact representation. precompute light exchanges and store the relationship be-
tween incoming lighting and outgoing global illumination
Our contributions are: a new 4D wavelet basis for compact
on the surface of an object. The result of these precomputa-
representation of radiance, a method to efficiently compute
the Global Transport Operator, greatly speeding up the precomputation time, and a method to efficiently project direct lighting from point light sources on our hierarchical basis at runtime. These three contributions, combined together, result in interactive manipulation of light sources, with immediately visible results in the global lighting.

The most noticeable limitation of our approach is directly linked to a well-known problem of finite-element methods for global illumination: our basis functions have to be expressed on the surfaces of the scene. Incidentally, our example scenes are exclusively composed of large quads. Another important limitation is that BRDFs must be relatively low-frequency to be efficiently representable in our wavelet basis.

2. Previous work

Global illumination has been the subject of research in Computer Graphics for decades. Dutré et al. [DBB03] give a complete survey of the state of the art of global illumination techniques. There have been plentiful research efforts to speed up global illumination computations and achieve real-time or interactive framerates.

Ray-tracing has been ported to the GPU [PBMH02, PDC∗03] or to specific architectures [WSB01, SWS05]. The same has been done with the radiosity algorithm [Kel97, CHL04], while others use the GPU for fast computation of hierarchical form-factors [LMM05].

Nijasure et al. [NPG05] compute a representation of the incident radiance at several sample points sparsely covering the volume enclosed by the scene. Incident radiance is stored using spherical harmonics. Spherical harmonics coefficients are interpolated between the sample points and applied to the surfaces of the scene. The system can be iterated to compute multiple bounces.

In most precomputed radiance transfer (PRT) approaches, the result of the precomputations, the light transport operator, is compressed and used at runtime for interactive display of global illumination. Most PRT techniques start by precomputing the light transport operator with great accuracy, then compress it, typically using clustered principal component analysis [SHHS03].

The cost of the uncompressed light transport is directly related to the degrees of freedom (DOF) n in the operator, growing with O(k^n). As discussed in section 1, the general expression for emission space, assuming no participating media, has 4 DOF. Given that the outgoing surface radiance also has 4, the general form of the operator ends up with 8 DOF.

To keep memory and precomputation costs tractable, most PRT techniques somehow restrict these degrees of freedom. This is generally achieved by assuming infinitely distant lighting, as done by Sloan et al. [SKS02] and many others. Another option is to fix the locations of the light sources in space [DKNY95]. Yet another one is to fix the viewpoint [NRH03]: in this work, Ng et al. demonstrate that all-frequency lighting from an infinitely distant environment can be rendered efficiently by using a light transport operator expressed in the Haar wavelet basis and non-linearly compressed. The fixed-viewpoint restriction applies when the scene contains glossy materials. In a subsequent publication [NRH04] the authors remove this restriction by introducing triple wavelet product integrals. As a result they are able to generate high-quality pictures that solve the 6-dimensional transport problem, but not at real-time or interactive rates.

Haar wavelets have successfully been used by others [LSSS04, WTL06] to efficiently express all-frequency transport from detailed environment maps to glossy surfaces. These methods utilize a separable decomposition, consisting of a purely light-dependent term and a purely view-dependent term.
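To see why the 8 degrees of freedom discussed above are intractable without compression, a quick back-of-the-envelope count helps. The per-dimension resolution below is a hypothetical choice for illustration, not a figure from the text:

```python
# Illustrative storage count for a densely tabulated 8D transport
# operator (4 DOF for the sending radiance, 4 for the receiving one).
# The per-dimension resolution k is a hypothetical choice.

def dense_operator_entries(k: int, dof: int = 8) -> int:
    """Number of entries in a dense tabulation with k samples per DOF."""
    return k ** dof

k = 16  # a very modest 16 samples per dimension
entries = dense_operator_entries(k)
bytes_needed = entries * 4  # one 4-byte float per entry
print(entries)               # 16**8 = 4294967296 entries
print(bytes_needed / 2**30)  # 16.0 GiB, for a single color channel
```

Even at this coarse resolution, the dense operator is far beyond practical memory budgets, which is why all the methods above restrict the DOF or compress aggressively.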

© The Eurographics Association 2006.
J. Kontkanen, E. Turquin, N. Holzschuch & F. X. Sillion / Wavelet Radiance Transport for Interactive Indirect Lighting

The approaches closest to ours are those of Kristensen et al. [KAMJ05] and Hasan et al. [HPB06]. Both consider static scenes under near-field illumination, and they separate the computation of direct lighting, done on the fly using standard GPU-based methods, from indirect lighting, precomputed on some specific basis functions. Kristensen et al. [KAMJ05] use a 3D unstructured point-cloud basis, precomputing radiance transport from this basis to the surfaces of the scene. At run-time, uniform point light sources are projected onto the point-cloud basis, then they apply the precomputed transport operator to obtain indirect illumination on the surfaces.

Figure 2: Forming multi-dimensional wavelet bases: (a) standard refinement, (b) non-standard refinement.
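The four elementary 2D functions that share a support in Figure 2b are products of the 1D Haar smooth function φ and wavelet ψ defined in section 4.1. Below is a direct transcription (an illustrative sketch, not the paper's implementation), with a midpoint-rule check that the four functions are mutually orthogonal over the unit square:

```python
def phi(x):
    """Haar smooth (scaling) function: 1 on [0, 1), 0 elsewhere."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0

def psi(x):
    """Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    if 0.0 <= x < 0.5:
        return 1.0
    if 0.5 <= x < 1.0:
        return -1.0
    return 0.0

# The four elementary 2D functions sharing the unit-square support:
# smooth x smooth, smooth x wavelet, wavelet x smooth, wavelet x wavelet.
elementary_2d = {
    "pp": lambda x, y: phi(x) * phi(y),
    "pw": lambda x, y: phi(x) * psi(y),
    "wp": lambda x, y: psi(x) * phi(y),
    "ww": lambda x, y: psi(x) * psi(y),
}

# Midpoint-rule inner product over the unit square.
n = 64
pts = [(i + 0.5) / n for i in range(n)]
def dot(f, g):
    return sum(f(x, y) * g(x, y) for x in pts for y in pts) / n**2

names = list(elementary_2d)
ok = all(dot(elementary_2d[a], elementary_2d[b]) == 0.0
         for a in names for b in names if a != b)
print(ok)  # True: the four functions are mutually orthogonal
```

One such quadruple per 2D sub-space is what the 8D transport basis combines, which is where the 4^4 = 256 coefficients per link (discussed in section 4.3) come from.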
In a work concurrent to ours, Hasan et al. [HPB06] precompute direct-to-indirect transport corresponding to our GTO and express it in a wavelet basis. The receiving basis consists of the visible pixels, and the sending basis is built by distributing point samples into the scene, which are then hierarchically clustered in a preprocess. Each of these methods presents different limitations: the former is restricted to omni-directional point lights, and the latter renders high-quality pictures but only for diffuse-to-diffuse indirect transfer (although the last reflection can be arbitrarily glossy) and a fixed viewpoint. In comparison, our method doesn't suffer from these restrictions, but is limited to simple geometry and works best with diffuse materials.

A common problem to many of the existing PRT methods is their fairly inefficient approach to precomputation. A lot of information computed during this step is discarded during a subsequent compression stage. One of our main concerns is precisely to avoid unnecessary computations and rather try to directly generate a concise operator (Hasan et al. share this objective). This way, we greatly reduce the memory cost and computation time of the precomputation step. We show a clear improvement in precomputation times compared to Kristensen et al., but as stated earlier, our finite-element approach also brings restrictions not present in their work.

Our work draws inspiration from hierarchical finite-element methods for global illumination [HSA91, GSCH93]. Wavelet and hierarchical algorithms adapt the solution to the geometry and lighting conditions: a coarse resolution is used where the illumination is smooth, and a finer resolution where there are sharp variations.

In our precomputation, we adopt some of the solutions used in Wavelet Radiance [CSSD94, CSSD96], which solves global illumination using a hierarchical 4D wavelet basis. Wavelet Radiance uses a non-standard decomposition to represent surface radiance, but a standard decomposition for the transport operator. The latter enables direct computation of the transfer coefficients without needing a push-pull step.

3. Overview of the algorithm

Our method takes as input the geometric definition of a static and easily parametrized scene (e.g. composed of quads), and provides interactive visualization of global illumination effects in this scene under dynamic local lighting. We compute a 4D wavelet representation of indirect radiance on the surfaces of the scene. Our technique can be split into two high-level components: an offline component for precomputing the light transport operator, and a run-time component for rendering indirect lighting using this operator.

The run-time component uses the precomputed transport operator to interactively render global illumination:

1. Project direct lighting onto our wavelet basis.
2. Apply the precomputed global transport operator, resulting in indirect lighting, expressed in our wavelet basis.
3. Convert this result to outgoing radiance, and blend with direct lighting.

We treat direct lighting separately because it contains sharp details that are best rendered using specific techniques. Our run-time component is explained in section 5.

The offline component consists of these two steps:

1. Compute a Direct Transport Operator (DTO) for the scene. The DTO expresses the propagation of light inside the scene, and corresponds to a single bounce of light.
2. Compute a Global Transport Operator (GTO) using the DTO. The GTO expresses the full radiance propagation inside the scene, i.e. an infinite number of light bounces.

For efficient computation and compact representation, we express the operators in a wavelet basis. Our bases for expressing the surface radiance and the operators are described in section 4.2. The computation of the DTO is explained in detail in section 4.3. In section 4.4 we discuss how the DTO can be used to efficiently compute the GTO using Neumann series and non-linear compression.
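This Neumann-series construction with non-linear compression (detailed in section 4.4) can be sketched with a dense toy matrix standing in for the wavelet-projected operator. The matrix, its size and the threshold are placeholder values:

```python
import numpy as np

def compress(m, eps):
    """Non-linear compression: drop coefficients below the threshold."""
    out = m.copy()
    out[np.abs(out) < eps] = 0.0
    return out

def global_transport(t, eps=1e-6, max_bounces=50):
    """G = I + T + T^2 + ..., with T^(n+1) = T^n T, stopping once
    every coefficient of T^n falls below the threshold."""
    g = np.eye(t.shape[0])
    tn = t.copy()
    for _ in range(max_bounces):
        g += tn
        tn = compress(tn @ t, eps)  # pre-multiply by the sparser T^n
        if not tn.any():
            break
    return g

t = np.array([[0.0, 0.4], [0.4, 0.0]])  # toy single-bounce operator
g = global_transport(t)
print(g)  # close to inv(I - T) for this contractive T
```

Because every T^n is compressed before the next multiplication, the factors keep getting sparser and the series terminates on its own, mirroring the behavior described in section 4.4.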

even more pronounced for the 8-dimensional transport operator that expresses the radiance transport between surfaces.

An efficient expression and computation of the transport operator highly benefits from a hierarchical representation; hence we chose wavelets as our basis functions. We elected Haar wavelets for their computational simplicity, but our algorithm can be applied to any type of tree wavelet basis [GSCH93].

The building blocks of the Haar basis are the following smooth function φ and wavelet ψ:

    φ(x) = 1 if 0 ≤ x < 1, and 0 otherwise
    ψ(x) = 1 if 0 ≤ x < 1/2, −1 if 1/2 ≤ x < 1, and 0 otherwise     (1)

All the wavelets and smooth functions of the Haar basis are formed by scales and translates of the above elementary functions as follows:

    φ_i^j(x) = φ(2^i x − j),  ψ_i^j(x) = ψ(2^i x − j)     (2)

where i gives the scale and j gives the translation. For a comprehensive introduction to Haar wavelets, and wavelets in general, we refer to [SDS96].

Multi-dimensional wavelet bases are usually formed by combining one-dimensional wavelet bases. There are two systems for creating multi-dimensional wavelet bases: the standard refinement (see Figure 2a), where the dimensions are refined separately, and the non-standard refinement (see Figure 2b), where refinement is performed alternately along all dimensions.

The non-standard refinement method merges together the different dimensions, treating them equally. As a consequence, it is more widely used in fields such as Image Analysis and Image Synthesis, where the two spatial dimensions serve an equal purpose.

For radiance computations, the spatial and angular dimensions are not equivalent. A surface can exhibit large variations in the spatial domain and be more continuous over the angular domain, and linking the resolutions of the spatial and angular dimensions is not always efficient. For this reason, we decouple the spatial and angular domains, using standard refinement between these dimensions. For the 2D sub-domains of the angular and spatial dimensions, we still use non-standard refinement. Our wavelet basis for 4D radiance therefore uses a combination of standard and non-standard refinement.

For the angular domain, the hemisphere of directions is mapped to the unit square using a cosine-weighted concentric map [SC97]; we then apply wavelet analysis over the unit square. Using this mapping allows pre-integration of the cosine over the hemisphere of directions, with low angular distortion and a constant-area mapping.

Figure 3: Light transport from sending basis function b_s^s(y) b_a^s(ω) to receiving basis function b_s^r(x) b_a^r(α).

4.2. Wavelet Basis for Transport Operator

The projected transport operator consists of coefficients that describe the influence of each basis function on all the other ones. The non-standard operator decomposition is the more common choice in hierarchical radiosity, as in theory it gives a more compact representation than the standard decomposition. In spite of this we chose to follow [CSSD94] and used the standard operator decomposition. We see two advantages in using the standard decomposition: it decouples the resolution for sender and receiver, and there is no need for a push-pull step. The former is an obvious advantage, for example when the sender and receiver differ greatly in size or in complexity.

The latter requires an explanation: conventional global illumination methods, using the non-standard representation, require a push-pull step between light bounces [HSA91]. For these methods, the cost of the push-pull step is not prohibitive. However, we are using the DTO to compute the Global Transport Operator, using Neumann series (see section 4.4). During this computation, we perform several multiplications between operators. During these operator multiplications, the fact that we do not require a push-pull step greatly accelerates the computation.

4.3. Direct Transport Operator

We compute the Direct Transport Operator to express a single bounce of light. As we are going to conduct operator multiplications, we require the output space of the DTO to be equal to its input space. This leaves a choice: either we express the DTO in terms of incident radiance or in terms of outgoing radiance. We chose to use the incident form of the Direct Transport Operator.

The incident form of the Direct Transport Operator is defined as follows:

    (T L)(x, x←y) = ∫ f_r(ω, y, y→x) V(x, y) ⌊ω · n_y⌋ L(y, ω) dω     (3)

The transport operator maps the incident radiance arriving at location y from direction ω to the incident radiance at another location x from direction x←y. Given a certain distribution of incident radiance, applying this operator once gives the

distribution of light that has been reflected once from the surfaces of the scene. Here f_r refers to the BRDF and V to the visibility term. Along with the ⌊ω · n_y⌋ term, they form the kernel k(x, y, ω) of the light transport operator.

The projected form of the transport operator is obtained by integrating the 6D kernel against each 8D wavelet, in a similar fashion to [CSSD94]:

    ∫ K(x, y, ω) b_s^s(y) b_a^s(ω) b_s^r(x) b_a^r(x←y) dω dx dy     (4)

where K(x, y, ω) = k(x, y, ω) ⌊(x←y) · n_y⌋ / r_xy², and b_s^r, b_a^r, b_s^s and b_a^s refer to the elementary non-standard basis functions of the receiving spatial, receiving angular, sending spatial and sending angular dimensions, respectively (see section 4.1). x and y are integrated over the surfaces, while ω is integrated over the hemisphere oriented according to the corresponding surface normal. For a visual illustration, see Figure 3.

In the context of light transport, a wavelet coefficient obtained from Equation 4 has traditionally been called a link. We will use this term to refer to a group of coefficients for 8D basis functions sharing the same support on all 2D sub-spaces. In practice, this means that each link corresponds to 255 wavelets and a single smooth function coefficient. This can be seen by considering that each 2D sub-space has 4 elementary non-standard basis functions that share the same support, and 4^4 = 256. As an example of elementary 2D functions, see the four functions in the lower left corner of Figure 2b.

We compute the Direct Transport Operator by progressively refining the existing links. We start by creating interactions between the coarsest-level basis functions in the scene, and then refine these. At each step, we consider 256 basis function coefficients. Note, however, that not all 256 coefficients are stored. We only store the necessary parts of a link: a link between two diffuse surfaces does not need wavelets in the angular domain. In practice, each link contains between 1 and 256 wavelet coefficients depending on its type.

A refinement oracle (see section 4.3.2) tells us whether a link needs to be refined. For each link it has the choice to refine in any of the 2D sub-domains (spatial receiver, angular receiver, spatial sender, angular sender). The refinement oracle may independently choose each option, possibly refining both the sender and the receiver in space and angle, or simply refining the receiver in space, or any combination.

When a link is refined, we create the child wavelets, and recursively consider each newly created link. Consider a refinement of one of the spatial basis functions: when a spatial basis function is refined, four new child links are created (the spatial patch is divided into four child patches). However, when two of the 2D sub-domains are refined, there will be 4 × 4 = 16 child links to consider, and finally if all the dimensions are refined, 4^4 = 256 child links are created.

When performing progressive refinement in a standard basis, the refinement may arrive at a certain link from several parent links. This means that when we chose the standard method for combining dimensions in our 8D basis, we partially lost the tree property of our basis.

However, we use the same solution for this problem as was used in conventional Wavelet Radiance [CSSD94]: if, in the refinement process, we arrive at an 8D wavelet coefficient that has already been visited, we terminate the traversal. The difference with conventional Wavelet Radiance is that we have four independent subspaces instead of two.

An important point in our algorithm is that, as with Wavelet Radiance [CSSD94], we are computing wavelet transport coefficients directly between wavelet coefficients, and not between smooth functions. This eliminates the need for the push-pull step.

4.3.1. Numerical Integration

The actual coefficients corresponding to each link are computed by generating quasi-randomly distributed samples in the support area of the link. Thus, we are computing Equation 4 by quasi-Monte Carlo integration.

The coefficients of the coarsest links are difficult to compute accurately without a significantly large number of samples. On the other hand, the finer-scale wavelets do not require as many samples, since within a smaller support the kernel does not deviate as much. Because of this we adopt the adaptive integration procedure used in Wavelet Radiance [CSSD94]: we first refine the link structure to the finest level and then perform a wavelet transform to compute the coarser links in terms of the finer ones. As a result, only the finest-scale wavelet coefficients are computed directly. In our implementation, this procedure is done during a single recursive visit.

4.3.2. Refinement Oracle

The refinement oracle considers each link, i.e. a cluster of coefficients of wavelets sharing the same support, at a time. It works by testing quasi-random samples of the kernel, and using explicit knowledge of the BRDF. If the oracle finds that the operator is smooth, then the refinement stops and the kernel samples are used to compute the wavelet coefficients.

At each refinement step, the refinement oracle has to select whether to refine the sender or the receiver, or both, and whether to refine them spatially or angularly, or both. Thus, the oracle can refine between 0 and 4 dimensions, resulting in 16 possible combinations. The ability to make an independent refinement decision in each sub-space is a consequence of using standard refinement as described in sections 4.1 and 4.2.

The decision to refine the interaction in the angular domain is based solely on the BRDFs of the sender and receiver, unless the sender and the receiver are mutually invisible, in which case the interaction is not refined. The basis

functions are mutually invisible if no unobstructed ray can be generated between the supports of the sender and the receiver. Note that this can happen even if the spatial basis functions are not mutually occluded, but the angular support of the sending basis function is oriented in such a way that it does not point towards the receiving basis function.

Diffuse surfaces are never subdivided angularly. For an arbitrarily glossy BRDF, the maximum level of angular subdivision depends on the resolution of its wavelet representation. Note that at this point of the algorithm, the wavelet representation is only used to control the refinement resolution, whereas the kernel samples are evaluated using the original, possibly analytic, representation of the BRDF.

Spatial refinement is based on the kernel deviation estimated from the samples. To take advantage of the standard operator decomposition, i.e. the ability to refine the sender and receiver independently, we apply the following heuristics:

• Refine the sender if (max(K) − min(K)) A_s > ε_s
• Refine the receiver if (max(K) − min(K)) A_r > ε_r

with K as defined in Equation 4; A_s and A_r refer to the surface areas covered by the supports of the basis functions, and ε_s and ε_r to user-selected thresholds. We use separate thresholds for sending and receiving refinement, since it is useful to generate asymmetrically refined matrices, where the sending basis functions are coarser than the receiving ones (see section 4.4).
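The two spatial-refinement tests above translate directly into a small decision function. A sketch, where the kernel samples, areas and thresholds are hypothetical values:

```python
def spatial_refinement_decision(kernel_samples, area_s, area_r, eps_s, eps_r):
    """Apply the two heuristics: refine the sender (resp. receiver)
    if the kernel deviation times the support area exceeds the
    corresponding user-selected threshold."""
    deviation = max(kernel_samples) - min(kernel_samples)
    refine_sender = deviation * area_s > eps_s
    refine_receiver = deviation * area_r > eps_r
    return refine_sender, refine_receiver

# A large sender facing a small receiver, with asymmetric thresholds
# (eps_s > eps_r) to keep the sending side coarser:
samples = [0.2, 0.35, 0.3, 0.55]  # hypothetical kernel samples
print(spatial_refinement_decision(samples, area_s=1.0, area_r=0.25,
                                  eps_s=0.5, eps_r=0.05))
# (False, True): keep the sender coarse, refine the receiver
```

Choosing eps_s larger than eps_r, as in this toy call, is exactly what produces the asymmetrically refined matrices mentioned above.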
4.4. Global Transport Operator

Having computed the direct transport operator T, which expresses a single bounce of light transfer between the surfaces in the scene, we use it to compute the global transport operator, using the Neumann series:

    G = I + T + T^2 + T^3 + T^4 + ...

The global transport operator expresses the relationship between the converged incident lighting and the incoming incident lighting. G is computed iteratively, from T. At each step, we compute T^(n+1) = T^n T.

The computation of the above series is rather expensive using a high-resolution representation of T, since in the end all the basis functions interact with each other (unless the scene consists of separate local environments with no mutual visibility to each other). For this reason, we aggressively compress the matrices during the computation: after each computation of T^n, we apply non-linear compression to the result, removing all coefficients below a certain threshold.

Because of the compression, the number of coefficients in T^n decreases when n increases (see Figure 6). We stop the computation when all the coefficients in T^n are smaller than our threshold, which in our experiments required up to ten iterations. To speed up the computations, we pre-multiply by the sparsest matrix, computing T^n times T.

Figure 4: Multiplying the GTO by the original fine-resolution DTO (right column) improves the visual quality compared to the original GTO (left column). Notice that even with a larger numerical error, the gathered GTO gives a more pleasing result (compare the upper right corner with the lower left corner). The error E refers to the sum of squared differences of wavelet coefficients when compared to the uncompressed GTO. (Original GTO: E = 0.1555, 0.0216, 0.0025; multiplied by fine DTO: E = 0.0093, 0.0013, 0.0002.)

After a coarse GTO has been computed, we still perform one more step to improve the results: we multiply the series by the original fine-resolution DTO from the left. This operation can be thought of as a kind of final gathering that improves the visual aspect of the result by using a higher-resolution representation for the last bounce before the light meets the eye (see Figure 4), in the same spirit as Hasan et al. [HPB06].

5. Run-time component

The run-time component of our method works as follows:

1. Project direct lighting on the wavelet basis defined on the surfaces of the scene (section 5.1).
2. Use the precomputed GTO to transform the projected direct light into the converged incident radiance (section 5.2).
3. Transform incident radiance into outgoing radiance by applying the BRDF of the surfaces (section 5.3).
4. Render the indirect light using the wavelet basis and combine it with direct light computed separately (section 5.4).
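The four run-time steps above can be sketched end to end. This is a minimal dense illustration: all names, sizes and values are hypothetical, and the actual representations are sparse wavelet coefficient sets:

```python
import numpy as np

def render_frame(gto, project_direct, brdf_diag, blend):
    """One frame of the run-time component, in the order of the text:
    project direct light, apply the GTO, multiply by the BRDF,
    then blend with separately rendered direct light."""
    e = project_direct()      # 1. direct light on the wavelet basis
    x = gto @ e               # 2. converged incident radiance X = G E
    outgoing = brdf_diag * x  # 3. incident -> outgoing radiance
    return blend(outgoing)    # 4. combine with direct lighting

# Toy usage with a 4-coefficient basis:
gto = np.eye(4) + 0.2 * np.ones((4, 4))  # hypothetical operator
frame = render_frame(gto,
                     project_direct=lambda: np.array([1.0, 0.0, 0.0, 0.0]),
                     brdf_diag=np.full(4, 0.5),
                     blend=lambda indirect: indirect + 1.0)
print(frame)
```

Each of these steps corresponds to one of the subsections that follow; in the real system, steps 1 and 2 operate only on non-zero coefficients.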

5.1. Direct Light Projection

In order to use the GTO to generate indirect lighting, we need to project the light from the dynamic light sources onto the 4D radiance basis defined on the surfaces of the scene.

For each light source and for each surface of the scene, our method proceeds as follows:

1. Estimate the level of precision required.
2. Compute all the smooth coefficients at this level of precision, by integrating direct lighting on the support of each coefficient.
3. Perform a wavelet transform on these smooth coefficients to compute the wavelet coefficients, then discard the smooth coefficients. This generates a wavelet representation of the direct light on all the surfaces of the scene.

This projection of direct lighting onto our wavelet basis is fundamental for interactive rendering, so it is important to perform these computations efficiently. Unfortunately, step 2 involves computing direct lighting for all the smooth coefficients, a costly step for arbitrary light sources.

To estimate the level of precision required (step 1), we look at the solid angle subtended by the geometry of the object, multiplied by the intensity of the light source in the direction of the object.

The computation of the smooth coefficients (step 2) involves computing direct lighting in the scene, including visibility between the light source and the support of the smooth coefficient. In our implementation, we tried both area light sources and point light sources, but we found that only point light sources were currently compatible with interactive framerates.

For point light sources, we compute the visibility using occlusion queries, i.e. we render the smooth functions from the viewpoint of the light source using the GL_ARB_occlusion_query extension of OpenGL, and estimate the solid angle each basis function subtends based on the number of visible pixels. Our current implementation only supports direct light projection onto directionally smooth basis functions. This means that direct light falling on a specular surface gets reflected as if the surface were diffuse.

To benefit from our sparse wavelet representation of surface radiance, the elements need to be dynamically (de-)allocated. To avoid an excessive amount of dynamic memory management we use the following method: before projecting the direct light at each frame, we set the existing allocated coefficients to zero. Then we project the light as described, and after the projection we de-allocate the entries that are still null. This minimizes the number of dynamic allocations and de-allocations required during run-time.

5.2. Application of the GTO

Once we have a wavelet-projected representation of direct incident lighting, we multiply it by the GTO to obtain the converged incident radiance:

    X = G E

where E represents the projected direct light, G is the GTO, and X is the resulting converged incident radiance. All the wavelet representations above are in sparse format, so that only non-zero coefficients are stored.

For an efficient multiplication, it is important to take advantage of the sparseness of E: typically the direct light can be expressed with a small number of wavelet coefficients, since it is often either spatially localized or falling from a far-away light source, in which case only coarse basis functions are present in E. We perform the multiplication by considering only the non-zero elements of E and accumulating the results into X.

We use the same technique to minimize the amount of dynamic memory allocation in X as we used for computing E (section 5.1).

5.3. Multiplication by the BRDF

X represents the incident indirect radiance, yet we need the outgoing radiance for display. Thus, we need a final multiplication by the BRDF. In our implementation, we associate a wavelet representation of the BRDF with each surface of the scene, and this step simply translates into a multiplication in wavelet space. Note that we use the same wavelet representation that the oracle uses to determine the angular refinement (section 4.3.2).

5.4. Rendering from the Wavelet Basis

To generate the final view for the user, we first render the scene representing the indirect light, and then additively blend in the direct light using standard techniques.

The indirect light is synthesized from the 4D wavelet basis into textures using the CPU. Then the whole scene is drawn using standard texture mapping and optionally bi-linear filtering (results without this filtering can be seen in Figure 4).

To get rid of the discontinuities that would appear between neighboring coarse-level quads, we use border texels (supported by standard graphics hardware) to ensure a smooth reconstructed result across the edges of quads.

Each quad is associated with its own texture, and thus it is possible to use a specific texture resolution for each quad. In our current implementation we select the texture resolution according to the maximum of the spatial and angular resolutions present in the wavelet basis.

The texture synthesis is performed by traversing all non-zero wavelet coefficients for a given quad. For performance,

Figure 5: For texture synthesis, we traverse the wavelet hierarchy in the order shown here. We terminate the angular traversal as soon as we detect that the angular sub-tree points away from the viewer.

Figure 6: Number of non-zero entries in DTO^n, as a function of the number of bounces n, depending on the threshold used for wavelet compression, for the maze scene.

Figure 7: GTO error in the maze scene as a function of the threshold on wavelet coefficients, for both the standard GTO and the gathered GTO.

we exploit our knowledge of the view direction and avoid traversing the subtrees of wavelets that do not point towards the eye: we traverse the wavelet hierarchy first in spatial order, then in angular order (see Figure 5), and terminate the angular traversal if we detect that the whole subtree points away from the viewer.

6. Experiments and comparison

We computed the GTO at different resolutions for three different scenes: the Cornell box, a maze and a simple scene for testing glossy illumination (see Figure 10). The results are summarized in Table 1. As can be seen, all the results run either in real time or at least at interactive framerates.

6.1. Offline component

The most important result is the speed of the precomputation step. For comparison, the maze scene we used is an exact replica of the scene used by Kristensen et al. [KAMJ05]. The time it takes for our method to compute the GTO on this scene varies between 24 and 74 minutes depending on the required quality. Kristensen et al. report a precomputation time of 6.5 hours on a cluster of 32 PCs. Since their method is easily parallelizable, we may assume that the performance is almost linear in the number of machines, translating into a total computation time of approximately 8 days using a single PC. The comparison is based on a visual judgement; for a more exact evaluation, a numerical error metric would need to be used. Nevertheless, we believe it is fair to say that our method performs the precomputation faster.

This acceleration comes from our algorithm's ability to avoid the computation of unnecessary data. All the information computed during the pre-processing step is used for the runtime computation of indirect lighting. On the other hand, our technique suffers from well-known issues of finite-element methods: we need a parameterization of our scene, which restricted us to easily parametrizable surfaces (quads in our current implementation).

Our precomputation time is dominated by the hierarchical refinement to compute the DTO, while the Neumann series evaluation to compute the GTO is relatively fast. The threshold used for non-linear wavelet compression in the GTO computations has an immediate impact on the memory cost of our algorithm (see Figure 6): not using compression in the maze scene results in approximately 2 million links stored. As each link stores 9 to 16 wavelet coefficients in floating point and in three channels, the average cost of a link is 150 bytes. Thus these 2 million links correspond to approximately 300 MB, which is not practical for real-time use.

Even a very small threshold (ε = 10^-6) on our wavelet coefficients brings the number of links down to 400,000, corresponding to a memory cost of 60 MB. A more aggressive compression (ε = 10^-4) further divides these numbers by 6, bringing the memory cost down to 10 MB.

We checked the relationship between the level of wavelet compression used on the GTO and the error we make on the operator. We tested both the standard GTO and the "gathered GTO", where the GTO is premultiplied by the original, fine-resolution DTO. Using the non-compressed operator as a reference, we computed the error as a function of the threshold used for compression (see Figure 7). The error on both operators decreases regularly with the threshold, with the error on the gathered GTO being consistently smaller than the error on the standard GTO.

We also checked the relationship between the memory
© The Eurographics Association 2006.
J. Kontkanen, E. Turquin, N. Holzschuch & F. X. Sillion / Wavelet Radiance Transport for Interactive Indirect Lighting
Table 1: Summary of the performance of our algorithm. All matrices were computed with a single 3 GHz Pentium 4.

                       Cornell  Cornell hi-res  Maze    Maze hi-res  Glossy  Glossy hi-res
t_DTO (precomp.)       2min     25min           23min   1h 12min     1min    9min
t_GTO (precomp.)       <1s      2s              40s     1min         <1s     1min
FPS (run-time)         60       25              15      7            8       3
Links DTO              3477     30366           65501   288628       4260    36778
Links GTO              418      648             24151   24599        1712    34176
Links gathered GTO     14169    53100           164813  589361       44383   195037
Memory cons. in MB     1.7      6.4             19.7    70           5.3     23.4
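The memory figures quoted above (2 million links at 150 bytes each ≈ 300 Mb; 400,000 links ≈ 60 Mb) follow from simple arithmetic. A minimal sketch, purely illustrative — the flat coefficient list below stands in for the paper's actual hierarchy of wavelet links:

```python
LINK_SIZE_BYTES = 150  # size of one link, as quoted in the text

def memory_mb(n_links, link_size=LINK_SIZE_BYTES):
    """Memory footprint in MB (decimal) for a given number of links."""
    return n_links * link_size / 1e6

def compress(coeffs, epsilon):
    """Non-linear compression: keep only wavelet coefficients
    whose magnitude reaches the threshold epsilon."""
    return [c for c in coeffs if abs(c) >= epsilon]

# 2 million links at 150 bytes each: the ~300 Mb quoted above.
print(memory_mb(2_000_000))  # -> 300.0
print(memory_mb(400_000))    # -> 60.0, after thresholding at 1e-6
```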
Figure 8: Memory cost of the GTO as a function of the error on the operator (maze scene). [Plot: number of links vs. error, for the GTO and the gathered GTO.]

Figure 9: Rendering times for the different steps of our run-time component. For each scene, we tried a high resolution (moderately compressed) and a low resolution (aggressively compressed) GTO. [Bar chart: rendering time (ms) for direct lighting, projection of direct lighting onto the wavelet basis, and indirect lighting, on Cornell (low/high) and Maze (low/high).]
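The indirect-lighting step timed in Figure 9 is, at its core, a sparse operator applied to the projected direct lighting. A schematic sketch, assuming (for illustration only) that the GTO is stored as a flat list of links (sender basis function, receiver basis function, coefficient):

```python
def apply_gto(links, direct):
    """Indirect-lighting step: multiply the projected direct lighting
    by the sparse global transport operator (GTO). Each link carries
    light from one sender basis function to one receiver."""
    indirect = {}
    for sender, receiver, coeff in links:
        contribution = coeff * direct.get(sender, 0.0)
        indirect[receiver] = indirect.get(receiver, 0.0) + contribution
    return indirect

# Toy operator with three links and two lit sender coefficients.
gto = [(0, 1, 0.5), (1, 1, 0.25), (0, 2, 1.0)]
print(apply_gto(gto, {0: 1.0, 1: 2.0}))  # -> {1: 1.0, 2: 1.0}
```

The cost of this loop is proportional to the number of links, which is consistent with the observation below that the indirect-lighting time tracks the number of coefficients in the GTO.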
cost of the GTO and the error it represents (see Figure 8). For both versions of the GTO, the error decreases as the memory cost increases. We observe that, surprisingly, the GTO outperforms the gathered GTO: for a given error, it always provides a more compact representation of the operator. Even so, this compact representation does not always translate into visual quality (see Figure 4): comparing the two representations of the GTO with similar error levels, we found that the gathered GTO gives better visual results. The non-linear compression we used for computing the GTO removes links based on the energy they represent. However, the visual quality of the image is not directly linked to energy levels, but also to other spatial information.

6.2. Runtime component

We analyzed the performance of our runtime component. Timings for each step can be seen in Figure 9. We tried two different scenes, each of them with a different level of compression of the GTO. All these rendering times correspond to observed framerates.

Computations related to direct lighting depend only on scene complexity, as the projection of direct lighting does not depend on the accuracy of the GTO. Computing the projection of direct lighting is about as expensive as the computation of visible direct lighting on the GPU.

Not surprisingly, the computations related to indirect lighting (multiplying the direct lighting by the GTO, the multiplication by the BRDF and conversion to textures) dominate, especially when using a high quality GTO (moderate compression). We observe that the rendering time for indirect lighting is related to the number of coefficients in the GTO.

7. Conclusions and Future work

In this paper, we have presented a novel algorithm for fast computation of indirect lighting. Combined with a separate computation of direct lighting, our algorithm allows interactive global illumination.

Our algorithm makes use of three different contributions: first, a new wavelet basis for efficient storage of radiance, using standard refinement to separate the angular and spatial dimensions; secondly, a hierarchical precomputation method for PRT; and third, a fast projection of direct lighting onto our basis. Our method works in a top-down approach, and therefore aims to only compute the information that is necessary for PRT computations.

Our main limitation is inherited from finite element methods: the elements (i.e. the basis functions) need to be mapped to the surfaces of the scene. In the simplest case, the scene needs to initially consist of large quads as in the examples of this paper. Nonetheless, as future work we wish to study the possibility of relieving these restrictions in the spirit of bi-scale radiance transfer [SLSS03]. This would require an easily parameterizable coarse version of the scene
Figure 10: Example results on our test scenes (rows: direct, indirect, combined lighting; columns: Cornell 1, Cornell 2, Maze, Glossy hi-res). In the glossy case (right column), we show the indirect illumination non-filtered, and the composited image using bi-linear filtering.
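The GTO shown in these results is built by summing successive bounces of a one-bounce transport operator — the "Neumann series computation" mentioned below. A schematic dense-matrix sketch (the paper works on thresholded wavelet links, not dense matrices; the threshold placement here is an illustrative assumption):

```python
import numpy as np

def neumann_gto(T, epsilon=1e-6, max_bounces=50):
    """Sum the Neumann series T + T^2 + T^3 + ... to build a global
    transport operator from a one-bounce operator T, zeroing entries
    below epsilon after each term (the non-linear compression step)."""
    gto = np.zeros_like(T)
    term = T.copy()
    for _ in range(max_bounces):
        gto += term
        term = term @ T
        term[np.abs(term) < epsilon] = 0.0  # non-linear compression
        if not term.any():  # all remaining transport pruned away
            break
    return gto

# A toy operator reflecting half the light back at each bounce:
# the series 0.5 + 0.25 + ... converges to 1.
G = neumann_gto(0.5 * np.eye(2))
```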
and a method to transfer lighting from this coarse surface to a finer one.

Another direction for future work is establishing a more explicit relationship between the different compressions done in our algorithm. In the current method, we have separate thresholds for hierarchical refinement and the non-linear compression used during the Neumann series computation. It would be advantageous to only have a single threshold related to the quality of the GTO we want to obtain.

The oracle's ability to independently refine each subspace is both a strength and a challenge. The refinement heuristics we presented are not optimal, since they do not take into account the dependence of directional and spatial dimensions, as explained by Durand et al. [DHS∗05]. For instance, it might not make sense to link a sender with a narrow angular support to a spatially large receiver. We believe that this is a very promising direction for future research and that clear performance improvements are possible.

Yet another idea concerns applying knowledge of run-time importance, projected from the camera to the surfaces of the scene. This projection could be used to speed up the GTO multiplication, as we would know beforehand which basis functions really contribute to the image. We could then use only the parts of the GTO that actually have a visible effect on the result.

Finally, to leverage the full potential of our 4D representation, we plan to explore run-time projection of arbitrarily distributed area light-sources instead of point lights. This would have to be coupled with an accurate display of direct lighting, which could benefit from the information we collect during the projection.

Acknowledgements

We would like to thank the Computer Graphics Group of Helsinki University of Technology, the ARTIS team, and our anonymous reviewers for their valuable feedback. This work has been supported by Bitboys, Hybrid Graphics, Remedy Entertainment, Anima Vitae, and the National Technology Agency of Finland.

References

[CHL04] Coombe G., Harris M. J., Lastra A.: Radiosity on graphics hardware. In Graphics Interface 2004 (2004).

[CSSD94] Christensen P. H., Stollnitz E. J., Salesin D. H., DeRose T. D.: Wavelet Radiance. In Photorealistic Rendering Techniques (Proc. of EG Workshop on Rendering) (June 1994), pp. 287–302.

[CSSD96] Christensen P. H., Stollnitz E. J., Salesin
D. H., DR T. D.: Global Illumination of Glossy En- [NRH03] N R., R R., H P.: All-
vironments Using Wavelets and Importance. ACM Trans- frequency shadows using non-linear wavelet lighting ap-
actions on Graphics 15, 1 (Jan. 1996), 37–71. proximation. ACM Transactions on Graphics (Proc. of
[DBB03] D́ P., B K., B P.: Advanced Global SIGGRAPH 2003) 22, 3 (July 2003), 376–381.
Illumination. AK Peters, 2003. [NRH04] N R., R R., H P.: Triple
[DHS∗ 05] D F., H N., S C., C E., product wavelet integrals for all-frequency relighting.
S F. X.: A frequency analysis of light transport. ACM Transactions on Graphics (Proc. of SIGGRAPH
ACM Transactions on Graphics (Proc. of SIGGRAPH 2004) 23, 3 (Aug. 2004), 477–487.
2005) 24, 3 (Aug. 2005). [PBMH02] P T. J., B I., M W. R., H-
[DKNY95] D Y., K K., N H., Y-  P.: Ray tracing on programmable graphics hard-
 H.: A quick rendering method using basis func- ware. ACM Transactions on Graphics (Proc. of SIG-
tions for interactive lighting design. Computer Graph- GRAPH 2002) 21, 3 (July 2002), 703–712.
ics Forum (Proc. of EUROGRAPHICS 1995) 14, 3 (Sept. [PDC∗ 03] P T. J., D C., C M.,
1995). J H. W., H P.: Photon mapping on pro-
[DS06] D C., S M.: Splatting in- grammable graphics hardware. In Graphics Hardware
direct illumination. In Interactive 3D Graphics 2006 2003 (2003), pp. 41–50.
(2006). [SC97] S P., C K.: A low distortion map between
[GSCH93] G S. J., S P., C M. F., H- disk and square. Journal of Graphic Tools 2, 3 (1997),
 P.: Wavelet Radiosity. In SIGGRAPH ’93 (1993), 45–52.
pp. 221–230. [SDS96] S E. J., DR T. D., S D. H.:
[GSHG98] G G., S P., H P. M., G- Wavelets for Computer Graphics. Morgan Kaufmann,
 D. P.: The irradiance volume. IEEE Computer 1996.
Graphics and Applications 18, 2 (March/April 1998), 32– [SHHS03] S P.-P., H J., H J., S J.: Clus-
43. tered principal components for precomputed radiance
[HPB06] H M., P F., B K.: Direct-to- transfer. ACM Transactions on Graphics (Proc. of SIG-
indirect transfer for cinematic relighting. ACM Transac- GRAPH 2003) 22, 3 (2003).
tions on Graphics (Proc. of SIGGRAPH 2006) 25, 3 (Aug. [SKS02] S P.-P., K J., S J.: Precomputed
2006). radiance transfer for real-time rendering in dynamic, low-
[HSA91] H P., S D., A L.: A Rapid frequency lighting environments. ACM Transactions on
Hierarchical Radiosity Algorithm. Computer Graphics Graphics (Proc. of SIGGRAPH 2002) 21, 3 (July 2002).
(Proc. of SIGGRAPH ’91) 25, 4 (July 1991). [SLSS03] S P.-P., L X., S H.-Y., S J.: Bi-
[KAMJ05] K A. W., A-M̈ T., J scale radiance transfer. ACM Transactions on Graphics
H. W.: Precomputed local radiance transfer for real-time (Proc. of SIGGRAPH 2003) 22, 3 (2003).
lighting design. ACM Transactions on Graphics (Proc. of [SWS05] S W J. S., S P.: RPU: A pro-
SIGGRAPH 2005) 24, 3 (Aug. 2005). grammable ray processing unit for realtime ray tracing.
[Kel97] K A.: Instant radiosity. In SIGGRAPH ’97 ACM Transactions on Graphics (Proc. of SIGGRAPH
(1997), pp. 49–56. 2005) 24, 3 (Aug. 2005).

[Leh04] L J.: Foundations of Precomputed Radi- [WSB01] W I., S P., B C.: Interactive
ance Transfer. Master’s thesis, Helsinki University of distributed ray tracing of highly complex models. In Ren-
Technology, Sept. 2004. dering Techniques 2001 (Proc. EG Workshop on Render-
ing) (2001).
[LMM05] L E. B., M K.-L., M N.: Calculating
hierarchical radiosity form factors using programmable [WTL06] W R., T J., L D.: All-frequency re-
graphics hardware. Journal of Graphics Tools 10, 4 lighting of glossy objects. ACM Transactions on Graphics
(2005). (to appear) (2006).

[LSSS04] L X., S P.-P. J., S H.-Y., S J.:
All-frequency precomputed radiance transfer for glossy
objects. In Rendering Techniques (Proc. of EG Sympo-
sium on Rendering) (2004), pp. 337–344.
[NPG05] N M., P S. N., G V.: Real-time
global illumination on GPUs. Journal of Graphics Tools
10, 2 (2005).
5 Conclusion and perspectives

We have developed three main themes in this dissertation: lighting simulation with multi-scale finite element methods, the determination of the characteristics of the illumination function, and the real-time or interactive simulation of certain lighting effects.

Let us attempt to summarize our contribution to each theme:
– Regarding lighting simulation with finite elements, we have shown the efficiency of the hierarchical representation, including with higher-order wavelet functions. We have also shown how strongly finite element methods depend on the original mesh, and we have developed a method to free ourselves from the limitations of this mesh. Finally, we have shown how to combine higher-order wavelets with a discontinuity mesh.
– For the analysis of the properties of the illumination function, we have developed a method to predict the local frequency content of the lighting, for each interaction, as a function of the obstacles encountered. We have also developed a method for computing the derivatives of the illumination function.
– Finally, in the field of real-time rendering, we have developed methods for the real-time simulation of certain lighting effects: soft shadows, specular reflections, indirect lighting.

This work has met several limitations or difficulties, and raises a number of interesting problems to solve.

Finite element methods, even hierarchical ones, are strongly tied to the mesh used to represent the scene. This limitation is inherent to the finite element representation, even though several methods have been developed that partially overcome it (clustering, face-clustering, instantiation, virtual mesh...). Moreover, too large a share of the computation time is devoted to effects that are important for the visual appearance of the scene, such as shadow boundaries, but less important for the computation of indirect lighting.

The frequency analysis of the illumination function opens the way to many future studies. We have developed a tool for predicting the frequency behaviour of the lighting at each point. Much research remains to be done in this field, both for the actual computation of the frequency content in the scene and for the use of these frequencies in lighting simulation methods. It is important that the computation time saved by exploiting the frequency content be greater than the time taken to compute it.

Finally, the use of graphics cards for the simulation of lighting effects is a very promising field. By simulating certain lighting effects on the graphics card, it is possible to increase the realism of lighting simulations while reducing the computation time. At the same time, these programmable cards have limitations: they operate on a SIMD model, with no communication possible between the individual processors, with a limited number of instructions... The algorithms that will best exploit the power of these cards are those that adapt to these limitations. In general, they are better suited to simulating local or semi-local phenomena.

5.1 Perspectives

Our goal, for our future work, is to obtain a photorealistic simulation of global illumination in an arbitrary scene, in real time. To reach this goal, we will pursue several research directions:
– First, it is necessary to be able to simulate all the phenomena related to direct lighting in real time. Direct lighting computations with arbitrary reflectance functions and extended light sources, soft shadow computations, and reflections on semi-specular surfaces are some of the local or semi-local phenomena that we want to be able to simulate photorealistically in real time.
– Second, a number of phenomena related to indirect lighting are only observable in the immediate vicinity of objects, such as the ambient occlusion caused by an object, or the reflections caused by semi-diffuse BRDFs. For these phenomena, we would attach to each moving object a zone of influence, inside which the effect would be computed. This zone of influence could carry a number of precomputed coefficients for the simulation.
– In lighting simulation, one generally has several algorithms available to simulate a given effect, or a set of parameters for a given algorithm. Choices must therefore be made, and to guide these choices we propose to use our frequency analysis of lighting. One could then choose the best-suited algorithm, or limit the sampling for low-frequency phenomena.
– This frequency analysis of lighting also has applications for offline lighting simulations. Our approach should make it possible to guide lighting simulations based on photon tracing, or to adapt the spatial sampling in Precomputed Radiance Transfer methods.
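The zone-of-influence idea above — each moving object carrying a precomputed occlusion effect that is only evaluated nearby — can be illustrated with a deliberately naive sketch. The falloff model, radius and strength parameters are illustrative assumptions, not taken from any published method:

```python
import math

def ambient_occlusion(point, occluders):
    """Accumulate per-object precomputed occlusion, restricted to each
    object's zone of influence (a sphere of given radius). Returns a
    visibility factor in [0, 1]; 1.0 means fully unoccluded."""
    visibility = 1.0
    for center, radius, strength in occluders:
        d = math.dist(point, center)
        if d < radius:  # inside the zone of influence
            falloff = 1.0 - d / radius  # linear falloff, for illustration
            visibility *= 1.0 - strength * falloff
    return visibility

# One occluder of radius 2; a point outside its zone is unaffected.
occluders = [((0.0, 0.0, 0.0), 2.0, 0.5)]
print(ambient_occlusion((0.0, 0.0, 0.0), occluders))  # -> 0.5
print(ambient_occlusion((5.0, 0.0, 0.0), occluders))  # -> 1.0
```

The point of the scheme is that the cost is bounded by the number of zones a shaded point falls into, independently of scene size.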
List of publications

International journals

[1] Mattias Malmer, Fredrik Malmer, Ulf Assarsson and Nicolas Holzschuch. Fast Precomputed Ambient Occlusion for Proximity Shadows. Journal of Graphics Tools, 2007 (to appear).

[2] Lionel Atty, Nicolas Holzschuch, Marc Lapierre, Jean-Marc Hasenfratz, Chuck Hansen and François Sillion. Soft shadow maps: Efficient sampling of light source visibility. Computer Graphics Forum, vol. 25, no. 4, December 2006.

[3] David Roger and Nicolas Holzschuch. Accurate specular reflections in real-time. Computer Graphics Forum (Proceedings of Eurographics 2006), vol. 25, no. 3, September 2006.

[4] Aurélien Martinet, Cyril Soler, Nicolas Holzschuch and François Sillion. Accurate detection of symmetries in 3D shapes. ACM Transactions on Graphics, vol. 25, no. 2, April 2006.

[5] Frédo Durand, Nicolas Holzschuch, Cyril Soler, Eric Chan and François Sillion. A frequency analysis of light transport. ACM Transactions on Graphics (Proceedings of SIGGRAPH 2005), vol. 24, no. 3, August 2005.

[6] Cyrille Damez, Nicolas Holzschuch and François Sillion. Space-time hierarchical radiosity with clustering and higher-order wavelets. Computer Graphics Forum, vol. 23, no. 2, April 2004.

[7] Jean-Marc Hasenfratz, Marc Lapierre, Nicolas Holzschuch and François Sillion. A survey of real-time soft shadows algorithms. Computer Graphics Forum, vol. 22, no. 4, December 2003.

[8] François Cuny, Laurent Alonso and Nicolas Holzschuch. A novel approach makes higher order wavelets really efficient for radiosity. Computer Graphics Forum (Proceedings of Eurographics 2000), vol. 19, no. 3, September 2000.

[9] Laurent Alonso and Nicolas Holzschuch. Using graphics hardware to speed-up your visibility queries. Journal of Graphics Tools, vol. 5, no. 2, April 2000.

[10] François Cuny, Laurent Alonso, Christophe Winkler and Nicolas Holzschuch. Radiosité à base d'ondelettes sur des mailles quelconques. Revue internationale de CFAO et d'informatique graphique, vol. 14, no. 1, October 1999.

[11] Nicolas Holzschuch and François Sillion. An exhaustive error-bounding algorithm for hierarchical radiosity. Computer Graphics Forum, vol. 17, no. 4, December 1998.

International conferences and workshops with peer review and published proceedings

[12] Janne Kontkanen, Emmanuel Turquin, Nicolas Holzschuch and François Sillion. Wavelet radiance transport for interactive indirect lighting. In Rendering Techniques 2006 (Eurographics Symposium on Rendering), June 2006.

[13] Nicolas Holzschuch and Laurent Alonso. Combining higher-order wavelets and discontinuity meshing: a compact representation for radiosity. In Rendering Techniques 2004 (Eurographics Symposium on Rendering), June 2004.

[14] Nicolas Holzschuch, François Cuny and Laurent Alonso. Wavelet Radiosity on Arbitrary Planar Surfaces. In Rendering Techniques 2000 (Eurographics Workshop on Rendering), June 2000.

[15] Nicolas Holzschuch and François Sillion. Accurate Computation of the Radiosity Gradient for Constant and Linear Emitters. In Rendering Techniques '95 (Eurographics Workshop on Rendering), June 1995.

[16] Nicolas Holzschuch, François Sillion and George Drettakis. An Efficient Progressive Refinement Strategy for Hierarchical Radiosity. In Photorealistic Rendering Techniques (Eurographics Workshop on Rendering), June 1994.

Conferences without peer review or without published proceedings

[17] David Roger and Nicolas Holzschuch. Réflexions spéculaires en temps réel sur des surfaces lisses. In AFIG '05 (Actes des 18èmes journées de l'AFIG), November 2005.

[18] Aurélien Martinet, Cyril Soler, Nicolas Holzschuch and François Sillion. Accurately detecting symmetries of 3D shapes. In Symposium on Geometry Processing 2005, Poster Session, June 2005.

[19] Aurélien Martinet, Cyril Soler, Nicolas Holzschuch and François Sillion. Organisation automatique de scènes 3D. In AFIG '04 (Actes des 17èmes journées de l'AFIG), November 2004.

[20] Jean-Marc Hasenfratz, Marc Lapierre, Nicolas Holzschuch and François Sillion. A survey of real-time soft shadows algorithms. In Eurographics State-of-The-Art Reports, September 2003.

[21] Cyrille Damez, Nicolas Holzschuch and François Sillion. Space-Time Hierarchical Radiosity with Clustering and Higher-Order Wavelets. In Eurographics 2001 Short Presentations, Manchester, United Kingdom, September 2001.

[22] François Cuny, Laurent Alonso, Christophe Winkler and Nicolas Holzschuch. Gestion efficace des polygones complexes pour la radiosité. In AFIG 1998 (Actes des 6e journées de l'AFIG), November 1998.

[23] Nicolas Holzschuch. Current trends in research in visualisation. Intramural Conference on Digital Imaging, University of Cape Town, Cape Town, 1996.

[24] Nicolas Holzschuch. Un critère de raffinement efficace pour la radiosité et la radiosité hiérarchique. Poster at the Journées Jacques Cartier, Saint-Étienne, 1995.

[25] Nicolas Holzschuch. Les ondelettes en synthèse d'image. Colloquium IMAG sur les ondelettes, IMAG, Grenoble, 1994.

[26] Nicolas Holzschuch. Vers une radiosité évoluant en temps réel. Journées Groplan, Nantes, November 1992.

Internal reports and submitted articles

[27] Mattias Malmer, Fredrik Malmer, Ulf Assarsson and Nicolas Holzschuch. Fast Precomputed Ambient Occlusion for Proximity Shadows. Research report RR-5779, INRIA, December 2005.

[28] Lionel Atty, Nicolas Holzschuch, Marc Lapierre, Jean-Marc Hasenfratz, Chuck Hansen and François Sillion. Soft shadow maps: Efficient sampling of light source visibility. Research report RR-5750, INRIA, November 2005.

[29] Aurélien Martinet, Cyril Soler, Nicolas Holzschuch and François Sillion. Accurately detecting symmetries of 3D shapes. Research report RR-5692, INRIA, September 2005.

[30] Nicolas Holzschuch. Le Contrôle de l'erreur dans la méthode de radiosité hiérarchique. PhD thesis, Université Joseph Fourier – Grenoble I, March 1996.

[31] Nicolas Holzschuch. Synthèse d'images et radiosité hiérarchique. Magistère thesis, Magistère de Mathématiques Fondamentales et Appliquées et d'Informatique, École Normale Supérieure, Université Paris XI, November 1994.

[32] Nicolas Holzschuch. Vers une radiosité évoluant en temps réel. DEA thesis, DEA Informatique, Mathématique et Applications, Université Paris XI, September 1992.
