Physics of Light & Optics

Justin Peatross and Michael Ware, Brigham Young University
August 17, 2009
Preface
This book provides an introduction to the field of optics from a physics perspective. It focuses primarily on the wave and ray descriptions of light, but also includes a brief introduction to the quantum description of light. Topics covered include reflection and transmission at boundaries, dispersion, polarization effects, diffraction, coherence, ray optics and imaging, the propagation of light in matter, and the quantum nature of light.

The text is designed for upper-level undergraduate students with a physics background. It assumes that the student already has a basic background with complex numbers, vector calculus, and Fourier transforms, but a brief review of some of these mathematical tools is provided in Chapter 0. The main development of the book begins in Chapter 1 with Maxwell's equations. Subsequent chapters build on this foundation to develop the wave and ray descriptions of classical optics. The final two chapters of the book demonstrate the incomplete nature of classical optics and provide a brief introduction to quantum optics. A collection of electronic material related to the text is available at optics.byu.edu, including videos of students performing the lab assignments found in the book.

This curriculum was developed for a senior-level optics course at Brigham Young University. While the authors retain the copyright, we have made the book available electronically (at no cost) at optics.byu.edu. This site also provides a link to purchase a bound copy of the book for the cost of printing. The authors may be contacted via e-mail at [email protected]. We enjoy hearing reports of how the book is used, and welcome constructive feedback. The text is revised regularly, and the title page indicates the date of the last revision.

We would like to thank all those who have helped improve this material. We especially thank John Colton, Bret Hess, and Harold Stokes for their careful review and extensive suggestions. This curriculum benefits from a CCLI grant from the National Science Foundation Division of Undergraduate Education (DUE9952773).
Contents

Preface
Table of Contents

0 Mathematical Tools
  0.1 Introduction
  0.2 Vector Calculus
  0.3 Complex Numbers
  0.4 Fourier Theory
  0.5 Linear Algebra and Sylvester's Theorem
  Appendix 0.A Integral and Sum Table
  Exercises

1 Electromagnetic Phenomena
  1.1 Introduction
  1.2 Gauss's Law and Coulomb's Law
  1.3 The Lorentz Force, Biot-Savart Law, and Gauss's Law for Magnetic Fields
  1.4 Faraday's Law
  1.5 Ampere's Law
  1.6 Maxwell's Adjustment to Ampere's Law
  1.7 Polarization of Materials
  1.8 The Macroscopic Maxwell Equations
  1.9 The Wave Equation
  Exercises

2 Plane Waves and Refractive Index
  2.1 Introduction
  2.2 Plane Wave Solutions to the Wave Equation
  2.3 Index of Refraction in Dielectrics
  2.4 The Lorentz Model of Dielectrics
  2.5 Conductor Model of Refractive Index and Absorption
  2.6 Poynting's Theorem
  2.7 Irradiance of a Plane Wave
  Appendix 2.A Energy Density of Electric Fields
  Appendix 2.B Energy Density of Magnetic Fields
  Appendix 2.C Radiometry Versus Photometry
  Exercises

3 Reflection and Refraction
  3.1 Introduction
  3.2 Refraction at an Interface
  3.3 The Fresnel Coefficients
  3.4 Reflectance and Transmittance
  3.5 Brewster's Angle
  3.6 Total Internal Reflection
  3.7 Reflection from Metallic or other Absorptive Surfaces
  Appendix 3.A Boundary Conditions For Fields at an Interface
  Exercises

4 Polarization
  4.1 Introduction
  4.2 Linear, Circular, and Elliptical Polarization
  4.3 Jones Vectors for Representing Polarization
  4.4 Elliptically Polarized Light
  4.5 Linear Polarizers and Jones Matrices
  4.6 Jones Matrix for Polarizers at Arbitrary Angles
  4.7 Jones Matrices for Wave Plates
  4.8 Polarization Effects of Reflection and Transmission
  4.9 Ellipsometry
  Appendix 4.A Partially Polarized Light
  Exercises

5 Light Propagation in Crystals
  5.1 Introduction
  5.2 Constitutive Relation in Crystals
  5.3 Plane Wave Propagation in Crystals
  5.4 Fresnel's Equation
  5.5 Polarization in Crystals
  5.6 Biaxial and Uniaxial Crystals
  5.7 Refraction at a Crystal Surface
  5.8 Poynting Vector in a Uniaxial Crystal
  Appendix 5.A Rotation of Coordinates
  Appendix 5.B Huygens' Elliptical Construct for a Uniaxial Crystal
  Exercises

Review, Chapters 1–5

6 Multiple Parallel Interfaces
  6.1 Introduction
  6.2 Double Boundary Problem Solved Using Fresnel Coefficients
  6.3 Double Boundary Problem at Sub Critical Angles
  6.4 Beyond Critical Angle: Tunneling of Evanescent Waves
  6.5 Fabry-Perot
  6.6 Setup of a Fabry-Perot Instrument
  6.7 Distinguishing Nearby Wavelengths in a Fabry-Perot Instrument
  6.8 Multilayer Coatings
  6.9 Repeated Multilayer Stacks
  Exercises

7 Superposition of Quasi-Parallel Plane Waves
  7.1 Introduction
  7.2 Intensity
  7.3 Group vs. Phase Velocity: Sum of Two Plane Waves
  7.4 Frequency Spectrum of Light
  7.5 Group Delay of a Wave Packet
  7.6 Quadratic Dispersion
  7.7 Generalized Context for Group Delay
  Appendix 7.A Causality and Exchange of Energy with the Medium
  Exercises

8 Coherence Theory
  8.1 Introduction
  8.2 Michelson Interferometer
  8.3 Temporal Coherence
  8.4 Fringe Visibility and Coherence Length
  8.5 Fourier Spectroscopy
  8.6 Young's Two-Slit Setup and Spatial Coherence
  Appendix 8.A Spatial Coherence with a Continuous Source
  Appendix 8.B The van Cittert-Zernike Theorem
  Exercises

Review, Chapters 6–8

9 Light as Rays
  9.1 Introduction
  9.2 The Eikonal Equation
  9.3 Fermat's Principle
  9.4 Paraxial Rays and ABCD Matrices
  9.5 Reflection and Refraction at Curved Surfaces
  9.6 Image Formation by Mirrors and Lenses
  9.7 Image Formation by Complex Optical Systems
  9.8 Stability of Laser Cavities
  9.9 Aberrations and Ray Tracing
  Exercises

10 Diffraction
  10.1 Huygens' Principle
  10.2 Scalar Diffraction
  10.3 Babinet's Principle
  10.4 Fresnel Approximation
  10.5 Fraunhofer Approximation
  10.6 Diffraction with Cylindrical Symmetry
  Appendix 10.A Significance of the Scalar Wave Approximation
  Appendix 10.B Fresnel-Kirchhoff Diffraction Formula
  Appendix 10.C Green's Theorem
  Exercises

11 Diffraction Applications
  11.1 Introduction
  11.2 Diffraction of a Gaussian Field Profile
  11.3 Gaussian Laser Beams
  11.4 Fraunhofer Diffraction Through a Lens
  11.5 Resolution of a Telescope
  11.6 The Array Theorem
  11.7 Diffraction Grating
  11.8 Spectrometers
  Appendix 11.A ABCD Law for Gaussian Beams
  Exercises

Review, Chapters 9–11

12 Interferograms and Holography
  12.1 Introduction
  12.2 Interferograms
  12.3 Testing Optical Components
  12.4 Generating Holograms
  12.5 Holographic Wavefront Reconstruction
  Exercises

13 Blackbody Radiation
  13.1 Introduction
  13.2 Failure of the Equipartition Principle
  13.3 Planck's Formula
  13.4 Einstein's A and B Coefficients
  Appendix 13.A Thermodynamic Derivation of the Stefan-Boltzmann Law
  Exercises

References
Index
Physical Constants

© 2004-2009 Peatross and Ware
Chapter 0
Mathematical Tools
0.1 Introduction
Optics is an exciting area of study, but (as with most areas of physics) it requires a variety of mathematical tools to be fully appreciated. This chapter reviews a few of the needed mathematical skills. This is not a comprehensive review, but rather a few selected topics that we have found that students often struggle with. We assume that the student already has a basic understanding of differentiation, integration, and standard trigonometric and algebraic manipulation. Students can usually just begin with Chapter 1. The text will refer back to sections in this chapter when it may be helpful to brush up on some particular technique that is useful for understanding the topic at hand.

The topics in this chapter are essentially in the order they will be encountered in the text. Section 0.2 is an overview of vector calculus and related theorems, which are used extensively in electromagnetic theory. It is not essential to be well versed in all of the material presented in section 0.2 (since it is only occasionally needed in homework problems). However, vector calculus is invoked frequently throughout this book, and students will more fully appreciate the connection between electromagnetic principles and optical phenomena when they are comfortable with vector calculus. Section 0.3 reviews complex arithmetic, and students need to know this material by heart. Section 0.4 is an introduction to Fourier theory. Fourier transforms are used extensively in this course beginning with chapter 7. The presentation below is sufficiently comprehensive for the student who encounters Fourier transforms here for the first time, and such a student is strongly advised to study this section before starting chapter 7.
0.2 Vector Calculus

In optics we are concerned primarily with electromagnetic fields that are defined throughout space. Each position in space corresponds to a unique vector $\mathbf{r} \equiv x\hat{\mathbf{x}} + y\hat{\mathbf{y}} + z\hat{\mathbf{z}}$, where $\hat{\mathbf{x}}$, $\hat{\mathbf{y}}$, and $\hat{\mathbf{z}}$ are unit vectors of length one, pointing along their respective axes. Electric and magnetic fields are vectors whose magnitude and direction can depend on position, as denoted by $\mathbf{E}(\mathbf{r})$ or $\mathbf{B}(\mathbf{r})$. An example of such a field is $\mathbf{E}(\mathbf{r}) = q(\mathbf{r}-\mathbf{r}_0)/\left(4\pi\epsilon_0|\mathbf{r}-\mathbf{r}_0|^3\right)$, which is the static electric field surrounding a point charge located at position $\mathbf{r}_0$. The absolute value brackets indicate the magnitude (length) of the vector given by

$$
|\mathbf{r}-\mathbf{r}_0| = \left|(x-x_0)\hat{\mathbf{x}} + (y-y_0)\hat{\mathbf{y}} + (z-z_0)\hat{\mathbf{z}}\right| = \sqrt{(x-x_0)^2 + (y-y_0)^2 + (z-z_0)^2} \tag{0.1}
$$
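In addition to space, the electric and magnetic fields almost always depend on time in optics. For example, a time-dependent field common in optics is $\mathbf{E}(\mathbf{r},t) = \mathbf{E}_0 \exp\{i(\mathbf{k}\cdot\mathbf{r} - \omega t)\}$, where (as discussed in section 0.3) physicists have the agreement in advance that only the real part of this expression corresponds to the actual field. The dot product $\mathbf{k}\cdot\mathbf{r}$ is an example of vector multiplication, and signifies the following operation:

$$
\begin{aligned}
\mathbf{k}\cdot\mathbf{r} &= \left(k_x\hat{\mathbf{x}} + k_y\hat{\mathbf{y}} + k_z\hat{\mathbf{z}}\right)\cdot\left(x\hat{\mathbf{x}} + y\hat{\mathbf{y}} + z\hat{\mathbf{z}}\right) \\
&= k_x x + k_y y + k_z z \\
&= |\mathbf{k}||\mathbf{r}|\cos\phi
\end{aligned} \tag{0.2}
$$

where $\phi$ is the angle between the vectors $\mathbf{k}$ and $\mathbf{r}$. Another type of vector multiplication is the cross product, which is accomplished in the following manner:

$$
\mathbf{E}\times\mathbf{B} =
\begin{vmatrix}
\hat{\mathbf{x}} & \hat{\mathbf{y}} & \hat{\mathbf{z}} \\
E_x & E_y & E_z \\
B_x & B_y & B_z
\end{vmatrix}
= \left(E_y B_z - E_z B_y\right)\hat{\mathbf{x}} - \left(E_x B_z - E_z B_x\right)\hat{\mathbf{y}} + \left(E_x B_y - E_y B_x\right)\hat{\mathbf{z}} \tag{0.3}
$$

Note that the cross product results in a vector, whereas the dot product results in a scalar.

We will use several multidimensional derivatives in our study. The vector first derivatives are the gradient, the divergence, and the curl. In Cartesian coordinates, the gradient of a scalar is given by

$$
\nabla f(x,y,z) = \frac{\partial f}{\partial x}\hat{\mathbf{x}} + \frac{\partial f}{\partial y}\hat{\mathbf{y}} + \frac{\partial f}{\partial z}\hat{\mathbf{z}} \tag{0.4}
$$

the divergence is given by

$$
\nabla\cdot\mathbf{E} = \frac{\partial E_x}{\partial x} + \frac{\partial E_y}{\partial y} + \frac{\partial E_z}{\partial z} \tag{0.5}
$$

and the curl is given by

$$
\nabla\times\mathbf{E} =
\begin{vmatrix}
\hat{\mathbf{x}} & \hat{\mathbf{y}} & \hat{\mathbf{z}} \\
\partial/\partial x & \partial/\partial y & \partial/\partial z \\
E_x & E_y & E_z
\end{vmatrix}
= \left(\frac{\partial E_z}{\partial y} - \frac{\partial E_y}{\partial z}\right)\hat{\mathbf{x}} - \left(\frac{\partial E_z}{\partial x} - \frac{\partial E_x}{\partial z}\right)\hat{\mathbf{y}} + \left(\frac{\partial E_y}{\partial x} - \frac{\partial E_x}{\partial y}\right)\hat{\mathbf{z}} \tag{0.6}
$$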
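The dot and cross products in (0.2) and (0.3) can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `dot`, `cross`, and `norm` and the sample vectors are illustrative assumptions, not part of the text.

```python
import math

def dot(a, b):
    """Dot product of two 3-component vectors, as in (0.2)."""
    return sum(ai * bi for ai, bi in zip(a, b))

def cross(a, b):
    """Cross product of two 3-component vectors, expanded as in (0.3)."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by,
            -(ax * bz - az * bx),
            ax * by - ay * bx)

def norm(a):
    """Magnitude (length) of a vector, as in (0.1)."""
    return math.sqrt(dot(a, a))

# Illustrative vectors (arbitrary values chosen for easy arithmetic)
k = (1.0, 2.0, 2.0)
r = (3.0, 0.0, 4.0)

k_dot_r = dot(k, r)
cos_phi = k_dot_r / (norm(k) * norm(r))  # cosine of the angle between k and r
k_cross_r = cross(k, r)

print("k . r =", k_dot_r)
print("cos(phi) =", cos_phi)
print("k x r =", k_cross_r)
```

The cross product comes out perpendicular to both inputs, so its dot product with either `k` or `r` vanishes, which is a quick sanity check on the signs in (0.3).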
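We will need a number of vector second derivatives when we study the wave equations describing light. (Vector second derivatives result from various combinations of vector first derivatives.) For example, the scalar Laplacian is defined as the divergence of a gradient

$$
\nabla^2 f(x,y,z) \equiv \nabla\cdot\left[\nabla f(x,y,z)\right] \tag{0.7}
$$

and in Cartesian coordinates is given by

$$
\nabla^2 f(x,y,z) = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} \tag{0.8}
$$

We will also use the vector Laplacian defined by

$$
\nabla^2\mathbf{E} \equiv \nabla\left(\nabla\cdot\mathbf{E}\right) - \nabla\times\left(\nabla\times\mathbf{E}\right) \tag{0.9}
$$

which takes on a simple form in Cartesian coordinates

$$
\nabla^2\mathbf{E} = \left(\frac{\partial^2 E_x}{\partial x^2} + \frac{\partial^2 E_x}{\partial y^2} + \frac{\partial^2 E_x}{\partial z^2}\right)\hat{\mathbf{x}} + \left(\frac{\partial^2 E_y}{\partial x^2} + \frac{\partial^2 E_y}{\partial y^2} + \frac{\partial^2 E_y}{\partial z^2}\right)\hat{\mathbf{y}} + \left(\frac{\partial^2 E_z}{\partial x^2} + \frac{\partial^2 E_z}{\partial y^2} + \frac{\partial^2 E_z}{\partial z^2}\right)\hat{\mathbf{z}} \tag{0.10}
$$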
(All of the multidimensional derivatives take on more complicated forms in non-Cartesian coordinates.)

We will also encounter several integral theorems involving vector functions in the course of this book. The divergence theorem for a vector function $\mathbf{f}$ is

$$
\oint_S \mathbf{f}\cdot\hat{\mathbf{n}}\, da = \int_V \nabla\cdot\mathbf{f}\, dv \tag{0.11}
$$

The integration on the left-hand side is over the closed surface $S$, which contains the volume $V$ associated with the integration on the right-hand side. The unit vector $\hat{\mathbf{n}}$ points normal to the surface. The divergence theorem is especially useful in connection with Gauss's law, where the left-hand side is interpreted as the number of field lines exiting a closed surface.

Another important theorem is Stokes' theorem:

$$
\int_S \left(\nabla\times\mathbf{f}\right)\cdot\hat{\mathbf{n}}\, da = \oint_C \mathbf{f}\cdot d\boldsymbol{\ell} \tag{0.12}
$$

The integration on the left-hand side is over an open surface $S$ (not enclosing a volume). The integration on the right-hand side is around the edge of the surface. Again, $\hat{\mathbf{n}}$ is a unit vector that always points normal to the surface. The vector $d\boldsymbol{\ell}$ points along the curve $C$ that bounds the surface $S$. If the fingers of your right hand point in the direction of integration around $C$, then your thumb points in the direction of $\hat{\mathbf{n}}$. Stokes' theorem is especially useful in connection with Ampere's law and Faraday's law. The right-hand side is an integration of a field around a loop.

The following vector integral theorem will also be useful:

$$
\int_V \left[f\left(\nabla\cdot\mathbf{g}\right) + \mathbf{g}\cdot\nabla f\right] dv = \oint_S f\,\mathbf{g}\cdot\hat{\mathbf{n}}\, da \tag{0.13}
$$
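As a concrete illustration of the divergence theorem (0.11), the sketch below compares the two sides numerically for a unit cube. The vector field $\mathbf{f} = (xy,\, yz,\, zx)$, the cube, and the midpoint-rule grid are all assumptions chosen for illustration; for this field both sides should equal 3/2.

```python
def f(x, y, z):
    """Illustrative vector field f = (x*y, y*z, z*x)."""
    return (x * y, y * z, z * x)

def div_f(x, y, z):
    """Analytic divergence of f: d(xy)/dx + d(yz)/dy + d(zx)/dz."""
    return y + z + x

def volume_integral(n=20):
    """Right-hand side of (0.11): midpoint-rule integral of div f over [0,1]^3."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                x, y, z = (i + 0.5) * h, (j + 0.5) * h, (k + 0.5) * h
                total += div_f(x, y, z) * h ** 3
    return total

def surface_flux(n=20):
    """Left-hand side of (0.11): outward flux of f through the six cube faces."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        for j in range(n):
            u, v = (i + 0.5) * h, (j + 0.5) * h
            # Faces x = 1 (normal +x) and x = 0 (normal -x)
            total += (f(1.0, u, v)[0] - f(0.0, u, v)[0]) * h ** 2
            # Faces y = 1 (normal +y) and y = 0 (normal -y)
            total += (f(u, 1.0, v)[1] - f(u, 0.0, v)[1]) * h ** 2
            # Faces z = 1 (normal +z) and z = 0 (normal -z)
            total += (f(u, v, 1.0)[2] - f(u, v, 0.0)[2]) * h ** 2
    return total

print("volume integral of div f:", volume_integral())
print("surface flux of f:       ", surface_flux())
```

Because the integrands here are linear in each coordinate, the midpoint rule reproduces both integrals up to floating-point rounding, so the two printed numbers should agree closely.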
0.3 Complex Numbers

In optics, it is often convenient to represent electromagnetic wave phenomena as a superposition of sinusoidal functions having the form $A\cos(x + \alpha)$, where $x$ represents a variable, and $A$ and $\alpha$ represent parameters. The sine function is intrinsically present in this formula through the identity

$$
\cos(x + \alpha) = \cos x \cos\alpha - \sin x \sin\alpha \tag{0.14}
$$

The student of optics should retain this formula in memory, as well as the frequently used identity

$$
\sin(x + \alpha) = \sin x \cos\alpha + \sin\alpha \cos x \tag{0.15}
$$
With a basic familiarity with trigonometry, one can approach many optical problems including those involving the addition of multiple waves. However, the manipulation of trigonometric functions via identities (0.14) and (0.15) is often cumbersome and tedious. Fortunately, complex notation offers an equivalent approach with far less busy work. One could avoid using complex notation in the study of optics, and this may seem appealing to the student who is unfamiliar with its use. Such a student might opt to pursue all problems using sines, cosines, and real exponents, together with large quantities of trigonometric identities. This, however, would be far more effort than the modest investment needed to become comfortable with the use of complex notation. Optics problems can become cumbersome enough even with the complex notation, so keep in mind that it could be far more messy!

The convenience of complex notation has its origins in Euler's formula:

$$
e^{i\phi} = \cos\phi + i\sin\phi \tag{0.16}
$$

where $i \equiv \sqrt{-1}$. Euler's formula can be proven using Taylor's expansion:

$$
f(x) = f(x_0) + \frac{1}{1!}(x - x_0)\left.\frac{df}{dx}\right|_{x=x_0} + \frac{1}{2!}(x - x_0)^2\left.\frac{d^2 f}{dx^2}\right|_{x=x_0} + \cdots \tag{0.17}
$$

By expanding each function appearing in (0.16) in a Taylor's series about the origin we obtain

$$
\begin{aligned}
\cos\phi &= 1 - \frac{\phi^2}{2!} + \frac{\phi^4}{4!} - \cdots \\
i\sin\phi &= i\phi - i\frac{\phi^3}{3!} + i\frac{\phi^5}{5!} - \cdots \\
e^{i\phi} &= 1 + i\phi - \frac{\phi^2}{2!} - i\frac{\phi^3}{3!} + \frac{\phi^4}{4!} + i\frac{\phi^5}{5!} - \cdots
\end{aligned} \tag{0.18}
$$
The last line of (0.18) is seen to be the sum of the first two lines, from which Euler's formula directly follows. By inverting Euler's formula (0.16) we can obtain the following representation of the cosine and sine functions:

$$
\cos\phi = \frac{e^{i\phi} + e^{-i\phi}}{2}, \qquad \sin\phi = \frac{e^{i\phi} - e^{-i\phi}}{2i} \tag{0.19}
$$

This representation shows how ordinary sines and cosines are intimately related to hyperbolic cosines and hyperbolic sines. If $\phi$ happens to be imaginary such that $\phi = i\gamma$ where $\gamma$ is real, then we have

$$
\begin{aligned}
\sin i\gamma &= \frac{e^{-\gamma} - e^{\gamma}}{2i} = i\sinh\gamma \\
\cos i\gamma &= \frac{e^{-\gamma} + e^{\gamma}}{2} = \cosh\gamma
\end{aligned} \tag{0.20}
$$
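Euler's formula (0.16) and the hyperbolic relations (0.20) are easy to confirm numerically. The sketch below sums the Taylor series of the exponential, as in (0.18), and compares it with $\cos\phi + i\sin\phi$; the helper name `exp_series` and the sample values of `phi` and `gamma` are assumptions made for illustration.

```python
import cmath
import math

def exp_series(z, terms=30):
    """Partial sum of the Taylor series of e^z about the origin."""
    total = 0j
    term = 1 + 0j              # z^0 / 0!
    for n in range(terms):
        total += term
        term *= z / (n + 1)    # next term: z^(n+1) / (n+1)!
    return total

# Euler's formula (0.16): series for e^{i phi} versus cos(phi) + i sin(phi)
phi = 0.75
euler_lhs = exp_series(1j * phi)
euler_rhs = complex(math.cos(phi), math.sin(phi))

# Imaginary argument (0.20): sin(i gamma) = i sinh(gamma), cos(i gamma) = cosh(gamma)
gamma = 1.3
sin_imag = cmath.sin(1j * gamma)
cos_imag = cmath.cos(1j * gamma)

print("series e^{i phi}:      ", euler_lhs)
print("cos(phi) + i sin(phi): ", euler_rhs)
print("sin(i gamma), i sinh(gamma):", sin_imag, 1j * math.sinh(gamma))
print("cos(i gamma), cosh(gamma):  ", cos_imag, math.cosh(gamma))
```

Thirty terms are far more than needed for arguments of order one; the number is only a convenient default for the sketch.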
There are several situations in optics where one is interested in a complex angle, $\phi = \alpha + i\beta$ where $\alpha$ and $\beta$ are real numbers. For example, the solution to the wave equation when absorption or amplification takes place contains an exponential with a complex argument. In this case, the imaginary part of $\phi$ introduces exponential decay or growth as is apparent upon examination of (0.19). Another important situation occurs when one attempts to calculate the transmission angle for light incident upon a surface beyond the critical angle for total internal reflection. In this case, it is necessary to compute the arcsine of a number greater than one in an effort to satisfy Snell's law. Even though such an angle does not exist in the usual sense, a complex value for $\phi$ can be found which satisfies (0.19). The complex value for the angle is useful in computing the characteristics of the evanescent wave on the transmitted side of the surface.

As was mentioned previously, we will be interested in waves of the form $A\cos(x + \alpha)$. We can use complex notation to represent this wave simply by writing

$$
A\cos(x + \alpha) = \mathrm{Re}\left\{\tilde{A} e^{ix}\right\} \tag{0.21}
$$

where the phase $\alpha$ is conveniently contained within the complex factor $\tilde{A} \equiv A e^{i\alpha}$. The operation $\mathrm{Re}\{\}$ means to retain only the real part of the argument without regard for the imaginary part. As an example, we have $\mathrm{Re}\{1 + 2i\} = 1$. The expression (0.21) is a direct result of Euler's equation (0.16).

It is conventional in the study of optics to omit the explicit writing of $\mathrm{Re}\{\}$. Thus, physicists agree that $\tilde{A} e^{ix}$ actually means $A\cos(x + \alpha)$ (or $A\cos\alpha\cos x - A\sin\alpha\sin x$ via (0.14)). This laziness is permissible because it is possible to perform linear operations on $\mathrm{Re}\{f\}$ such as addition, differentiation, or integration while procrastinating the taking of the real part until the end:

$$
\begin{aligned}
\mathrm{Re}\{f\} + \mathrm{Re}\{g\} &= \mathrm{Re}\{f + g\} \\
\frac{d}{dx}\mathrm{Re}\{f\} &= \mathrm{Re}\left\{\frac{df}{dx}\right\} \\
\int \mathrm{Re}\{f\}\, dx &= \mathrm{Re}\left\{\int f\, dx\right\}
\end{aligned} \tag{0.22}
$$
As an example, note that $\mathrm{Re}\{1 + 2i\} + \mathrm{Re}\{3 + 4i\} = \mathrm{Re}\{(1 + 2i) + (3 + 4i)\} = 4$. However, one must be careful when performing other operations such as multiplication. In this case, it is essential to take the real parts before performing the operation. Notice that

$$
\mathrm{Re}\{f\} \times \mathrm{Re}\{g\} \neq \mathrm{Re}\{f \times g\} \tag{0.23}
$$

As an example, we see $\mathrm{Re}\{1 + 2i\} \times \mathrm{Re}\{3 + 4i\} = 3$, but $\mathrm{Re}\{(1 + 2i) \times (3 + 4i)\} = -5$.

When dealing with complex numbers it is often advantageous to transform between a Cartesian representation and a polar representation. With the aid of Euler's formula, it is possible to transform any complex number $a + ib$ into the form $\rho e^{i\phi}$, where $a$, $b$, $\rho$, and $\phi$ are real. From (0.16), the required connection between $(\rho, \phi)$ and $(a, b)$ is

$$
\rho e^{i\phi} = \rho\cos\phi + i\rho\sin\phi = a + ib \tag{0.24}
$$
Figure 1 A number in the complex plane can be represented either by Cartesian or polar coordinates.
The real and imaginary parts of this equation must separately be equal. Thus, we have

$$
\begin{aligned}
a &= \rho\cos\phi \\
b &= \rho\sin\phi
\end{aligned} \tag{0.25}
$$

These equations can be inverted to yield

$$
\begin{aligned}
\rho &= \sqrt{a^2 + b^2} \\
\phi &= \tan^{-1}\frac{b}{a} \qquad (a > 0)
\end{aligned} \tag{0.26}
$$

When $a < 0$, we must adjust $\phi$ by $\pi$ since the arctangent has a range only from $-\pi/2$ to $\pi/2$.

The transformations in (0.25) and (0.26) have a clear geometrical interpretation in the complex plane, and this makes it easier to remember them. They are just the usual connections between Cartesian and polar coordinates. As seen in Fig. 1, $\rho$ is the hypotenuse of a right triangle having legs with lengths $a$ and $b$, and $\phi$ is the angle that the hypotenuse makes with the $x$-axis. Again, students should be careful when $a$ is negative since the arctangent is defined in quadrants I and IV. An easy way to deal with the situation of a negative $a$ is to factor the minus sign out before proceeding (i.e. $-a + ib = -(a - ib)$). Then the transformation is made on $a - ib$ where $a$ is positive. The minus sign out in front is just carried along unaffected and can be factored back in at the end. Notice that $-\rho e^{i\phi}$ is the same as $\rho e^{i(\phi \pm \pi)}$.
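The quadrant adjustment described above can be sketched in code. The function below follows (0.26) literally, adding $\pi$ when $a < 0$; the name `to_polar`, the handling of $a = 0$, and the sample numbers are assumptions for illustration. (Python's built-in `cmath.polar` does the same job, returning an angle in the range $-\pi$ to $\pi$.)

```python
import cmath
import math

def to_polar(z):
    """Return (rho, phi) such that z = rho * exp(i*phi), following (0.26)."""
    a, b = z.real, z.imag
    rho = math.sqrt(a * a + b * b)
    if a > 0:
        phi = math.atan(b / a)
    elif a < 0:
        # The arctangent only covers (-pi/2, pi/2); shift by pi for the left half-plane.
        phi = math.atan(b / a) + math.pi
    else:
        # a == 0: the point lies on the imaginary axis (assumed convention;
        # phi is arbitrary when b is also zero).
        phi = math.copysign(math.pi / 2, b)
    return rho, phi

for z in (3 + 4j, -1 + 1j, -1 - 1j, 2j):
    rho, phi = to_polar(z)
    rebuilt = rho * cmath.exp(1j * phi)   # should reproduce z
    print(z, "->", (rho, phi), "->", rebuilt)
```

The angle returned for a point in the third quadrant may differ from `cmath.phase` by $2\pi$, which represents the same direction in the complex plane.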
Finally, we consider the concept of a complex conjugate. The conjugate of a complex number $z = a + ib$ is denoted with an asterisk and amounts to changing the sign on the imaginary part of the number:

$$
z^* = (a + ib)^* \equiv a - ib \tag{0.27}
$$

The complex conjugate is useful when computing the magnitude $\rho$ as defined in (0.26):

$$
|z| = \sqrt{z^* z} = \sqrt{(a - ib)(a + ib)} = \sqrt{a^2 + b^2} = \rho \tag{0.28}
$$

The complex conjugate is also useful for eliminating complex numbers from the denominator of expressions:

$$
\frac{a + ib}{c + id} = \frac{(a + ib)(c - id)}{(c + id)(c - id)} = \frac{ac + bd + i(bc - ad)}{c^2 + d^2} \tag{0.29}
$$

No matter how complicated an expression, the complex conjugate is calculated by simply inserting a minus sign in front of all occurrences of $i$ in the expression, and placing an asterisk on all complex variables in the expression. For example, the complex conjugate of $\rho e^{i\phi}$ is $\rho e^{-i\phi}$, as can be seen from Euler's formula (0.16). As another example consider $\left[E\exp\{i(\kappa z - \omega t)\}\right]^* = E^*\exp\{-i(\kappa^* z - \omega t)\}$, assuming $z$, $\omega$, and $t$ are real, but $E$ and $\kappa$ are complex.

A common way of obtaining the real part of an expression is simply by adding the complex conjugate and dividing the result by 2:

$$
\mathrm{Re}\{z\} = \frac{1}{2}\left(z + z^*\right) \tag{0.30}
$$

Notice that the expression for $\cos\phi$ in (0.19) is an example of this formula. Sometimes when a complicated expression is added to its complex conjugate, we let "C.C." represent the complex conjugate in order to avoid writing the expression twice.
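The conjugate manipulations in (0.28) through (0.30), along with the earlier warning (0.23) about multiplying real parts, can be tried out with Python's built-in complex type. This is a small sketch; the variable names and the sample numbers `z` and `w` are illustrative assumptions, while the pair $1 + 2i$ and $3 + 4i$ is the example used in the text.

```python
z = 3 + 4j
w = 1 - 2j

# (0.28): magnitude from the conjugate, |z| = sqrt(z* z)
magnitude = (z.conjugate() * z).real ** 0.5

# (0.29): clear the complex denominator by multiplying top and bottom by w*
quotient = (z * w.conjugate()) / (w * w.conjugate()).real

# (0.30): real part as half the sum of a number and its conjugate
real_part = (z + z.conjugate()) / 2

# (0.23): the real part of a product is not the product of the real parts
f_val, g_val = 1 + 2j, 3 + 4j
product_of_reals = f_val.real * g_val.real   # Re{f} x Re{g}
real_of_product = (f_val * g_val).real       # Re{f x g}

print("|z| =", magnitude)
print("z / w =", quotient, "(direct division gives", z / w, ")")
print("Re{z} =", real_part)
print("Re{f}Re{g} =", product_of_reals, " Re{fg} =", real_of_product)
```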
0.4 Fourier Theory

Fourier analysis is an important part of optics. We often decompose complicated light fields into a superposition of pure sinusoidal waves. This enables us to consider the behavior of the individual frequency components one at a time (important since, for example, the optical index is different for different frequencies). After determining how individual sine waves move through an optical system (say a piece of glass), we can reassemble the sinusoidal waves to see the effect of the system on the overall waveform. Fourier transforms are used for this purpose. In fact, it will be possible to work simultaneously with infinitely many sinusoidal waves, where the frequencies comprising a light field are spread over a continuous range. Fourier transforms are also used in diffraction problems where a single frequency is associated with a superposition of many plane waves propagating in different directions.
We begin with a derivation of the Fourier integral theorem. A periodic function can be represented in terms of the sine and the cosine in the following manner:

$$
f(t) = \sum_{n=0}^{\infty} a_n \cos(n\Delta\omega t) + b_n \sin(n\Delta\omega t) \tag{0.31}
$$
This is called a Fourier expansion. It is similar in idea to a Taylor's series (0.17), which rewrites a function as a polynomial. In both cases, the goal is to represent one function in terms of a linear combination of other functions (requiring a complete basis set). In a Taylor's series the basis functions are polynomials and in a Fourier expansion the basis functions are sines and cosines with different frequencies. The expansion (0.31) is possible even if $f(t)$ is complex (requiring $a_n$ and $b_n$ to be complex). By inspection, we see that all terms in (0.31) repeat with a maximum period of $2\pi/\Delta\omega$. This is why the expansion is limited in its use to periodic functions. A function represented by such an expansion is periodic, satisfying $f(t) = f(t + 2\pi/\Delta\omega)$.

We can rewrite the sines and cosines in the expansion (0.31) using (0.19) as follows:

$$
\begin{aligned}
f(t) &= \sum_{n=0}^{\infty} \left[ a_n \frac{e^{in\Delta\omega t} + e^{-in\Delta\omega t}}{2} + b_n \frac{e^{in\Delta\omega t} - e^{-in\Delta\omega t}}{2i} \right] \\
&= a_0 + \sum_{n=1}^{\infty} \frac{a_n - i b_n}{2} e^{in\Delta\omega t} + \sum_{n=1}^{\infty} \frac{a_n + i b_n}{2} e^{-in\Delta\omega t}
\end{aligned} \tag{0.32}
$$

Thus, we can rewrite (0.31) as

$$
f(t) = \sum_{n=-\infty}^{\infty} c_n e^{-in\Delta\omega t} \tag{0.33}
$$

where

$$
\begin{aligned}
c_{n<0} &\equiv \frac{a_{-n} - i b_{-n}}{2} \\
c_{n>0} &\equiv \frac{a_n + i b_n}{2} \\
c_0 &\equiv a_0
\end{aligned} \tag{0.34}
$$

Notice that if $c_{-n} = c_n^*$ for all $n$, then $f(t)$ is real (i.e. real $a_n$ and $b_n$); otherwise $f(t)$ is complex. The real parts of the $c_n$ coefficients are connected with the cosine terms in (0.31), and the imaginary parts of the $c_n$ coefficients are connected with the sine terms in (0.31).
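The equivalence of the trigonometric form (0.31) and the exponential form (0.33), with the coefficients related by (0.34), can be checked numerically. In this sketch the value of `dw` (standing for $\Delta\omega$), the coefficient lists `a` and `b`, and the function names are arbitrary choices made for illustration.

```python
import cmath
import math

dw = 2.0                        # fundamental angular frequency (illustrative value)
a = [0.5, 1.0, 0.0, -0.3]       # a_0 ... a_3 (illustrative values)
b = [0.0, 0.2, 0.7, 0.0]        # b_0 ... b_3 (b_0 is irrelevant since sin(0) = 0)

def f_trig(t):
    """Trigonometric Fourier expansion (0.31), truncated at n = 3."""
    return sum(a[n] * math.cos(n * dw * t) + b[n] * math.sin(n * dw * t)
               for n in range(len(a)))

def c(n):
    """Complex coefficients defined in (0.34)."""
    if n == 0:
        return complex(a[0])
    if n > 0:
        return (a[n] + 1j * b[n]) / 2
    return (a[-n] - 1j * b[-n]) / 2

def f_exp(t):
    """Exponential form (0.33), truncated to |n| <= 3."""
    n_max = len(a) - 1
    return sum(c(n) * cmath.exp(-1j * n * dw * t) for n in range(-n_max, n_max + 1))

for t in (0.0, 0.3, 1.7):
    print(t, f_trig(t), f_exp(t))
```

Since `a` and `b` are real here, `c(-n)` equals the conjugate of `c(n)` and the imaginary part of `f_exp(t)` vanishes up to rounding.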
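Given a known function $f(t)$, we can compute the various coefficients $c_n$. There is a trick for doing this. We multiply both sides of (0.33) by $e^{im\Delta\omega t}$, where $m$ is an integer, and integrate over the function period $2\pi/\Delta\omega$:

$$
\begin{aligned}
\int_{-\pi/\Delta\omega}^{\pi/\Delta\omega} f(t)\, e^{im\Delta\omega t}\, dt &= \sum_{n=-\infty}^{\infty} c_n \int_{-\pi/\Delta\omega}^{\pi/\Delta\omega} e^{i(m-n)\Delta\omega t}\, dt \\
&= \sum_{n=-\infty}^{\infty} c_n \left[ \frac{e^{i(m-n)\Delta\omega t}}{i(m-n)\Delta\omega} \right]_{-\pi/\Delta\omega}^{\pi/\Delta\omega} \\
&= \sum_{n=-\infty}^{\infty} \frac{2\pi c_n}{\Delta\omega} \left[ \frac{e^{i(m-n)\pi} - e^{-i(m-n)\pi}}{2i(m-n)\pi} \right] \\
&= \sum_{n=-\infty}^{\infty} \frac{2\pi c_n}{\Delta\omega} \frac{\sin\left[(m-n)\pi\right]}{(m-n)\pi}
\end{aligned} \tag{0.35}
$$

The function $\sin[(m-n)\pi]/[(m-n)\pi]$ is equal to zero for all $n \neq m$, and it is equal to one when $n = m$ (to see this, use L'Hospital's rule on the zero-over-zero situation). Thus, only one term contributes to the summation in (0.35). We now have

$$
c_m = \frac{\Delta\omega}{2\pi} \int_{-\pi/\Delta\omega}^{\pi/\Delta\omega} f(t)\, e^{im\Delta\omega t}\, dt \tag{0.36}
$$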
from which the coefficients $c_n$ can be computed, given a function $f(t)$. (Note that $m$ is a dummy index so we can change it back to $n$ if we like.) This completes the circle. If we know the function $f(t)$, we can find the coefficients $c_n$ via (0.36), and, if we know the coefficients $c_n$, we can generate the function $f(t)$ via (0.33). If we are feeling a bit silly, we can combine these into a single identity:

$$
f(t) = \sum_{n=-\infty}^{\infty} \left[ \frac{\Delta\omega}{2\pi} \int_{-\pi/\Delta\omega}^{\pi/\Delta\omega} f(t')\, e^{in\Delta\omega t'}\, dt' \right] e^{-in\Delta\omega t} \tag{0.37}
$$

We start with a function $f(t)$ followed by a lot of computation and obtain the function back again! (This is not quite as foolish as it first appears, as we will see later.)

As mentioned above, Fourier expansions represent functions $f(t)$ that are periodic over the interval $2\pi/\Delta\omega$. This is disappointing since many optical waveforms do not repeat (e.g. a single short laser pulse). Nevertheless, we can represent a function $f(t)$ that is not periodic if we let the period $2\pi/\Delta\omega$ become infinitely long. In other words, we can accommodate non-periodic functions if we take the limit as $\Delta\omega$ goes to zero so that the spacing of terms in the series becomes very fine. Applying this limit to (0.37) we obtain

$$
f(t) = \lim_{\Delta\omega \to 0} \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} \left[ e^{-in\Delta\omega t} \int_{-\infty}^{\infty} f(t')\, e^{in\Delta\omega t'}\, dt' \right] \Delta\omega \tag{0.38}
$$
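The round trip between (0.36) and (0.33) for a periodic function can be sketched numerically. The example below evaluates the coefficient integral (0.36) with a midpoint Riemann sum over one period and then rebuilds the function from the resulting coefficients. The test function, the value of `dw` (standing for $\Delta\omega$), the sample count, and the function names are all assumptions made for illustration.

```python
import cmath
import math

dw = 1.5                         # fundamental angular frequency (illustrative value)
period = 2 * math.pi / dw        # the function repeats over this interval

def f(t):
    """Illustrative periodic function: a_0 = 1, a_1 = 2, b_3 = 0.5 in (0.31)."""
    return 1.0 + 2.0 * math.cos(dw * t) + 0.5 * math.sin(3 * dw * t)

def coefficient(m, samples=2000):
    """c_m from (0.36), using a midpoint Riemann sum over one period."""
    h = period / samples
    total = 0j
    for j in range(samples):
        t = -period / 2 + (j + 0.5) * h
        total += f(t) * cmath.exp(1j * m * dw * t) * h
    return total * dw / (2 * math.pi)

def reconstruct(t, n_max=5):
    """Rebuild f(t) from the computed coefficients via (0.33), with |n| <= n_max."""
    return sum(coefficient(n) * cmath.exp(-1j * n * dw * t)
               for n in range(-n_max, n_max + 1))

print("c_0 =", coefficient(0))
print("c_1 =", coefficient(1))
print("c_3 =", coefficient(3))
print("f(0.37) =", f(0.37), " reconstructed:", reconstruct(0.37))
```

According to (0.34), the coefficients for this test function should come out close to $c_0 = 1$, $c_{\pm 1} = 1$, and $c_{\pm 3} = \pm i/4$, with all others near zero.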
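At this point, a brief review of the definition of an integral is helpful to better understand the next step that we administer to (0.38). Recall that an integral is really a summation of rectangles under a curve with finely spaced steps:

$$
\begin{aligned}
\int_a^b g(\omega)\, d\omega &\equiv \lim_{\Delta\omega \to 0} \sum_{n=0}^{\frac{b-a}{\Delta\omega}} g(a + n\Delta\omega)\, \Delta\omega \\
&= \lim_{\Delta\omega \to 0} \sum_{n=-\frac{b-a}{2\Delta\omega}}^{\frac{b-a}{2\Delta\omega}} g\left(\frac{a+b}{2} + n\Delta\omega\right) \Delta\omega
\end{aligned} \tag{0.39}
$$

The final expression has been manipulated so that the index ranges through both negative and positive numbers. If we set $a = -b$ and take the limit $b \to \infty$, then (0.39) becomes

$$
\int_{-\infty}^{\infty} g(\omega)\, d\omega = \lim_{\Delta\omega \to 0} \sum_{n=-\infty}^{\infty} g(n\Delta\omega)\, \Delta\omega \tag{0.40}
$$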
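This concludes our short review of calculus. We can use (0.40) in connection with (0.38) (where $g(n\Delta\omega)$ represents everything in the square brackets). The result is the Fourier integral theorem:

$$
f(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-i\omega t} \left[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t')\, e^{i\omega t'}\, dt' \right] d\omega \tag{0.41}
$$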
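The piece in brackets is called the Fourier transform, and the rest of the operation is called the inverse Fourier transform. The Fourier integral theorem (0.41) is often written with the following (potentially confusing) notation:

$$f(\omega) \equiv \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(t)\,e^{i\omega t}\,dt, \qquad f(t) \equiv \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(\omega)\,e^{-i\omega t}\,d\omega \tag{0.42}$$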
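The transform convention in (0.42) can be checked numerically. The sketch below is only an illustration: the pulse width `TAU`, the integration window, and the helper names are assumptions chosen for the example, and the closed form used for comparison comes from applying the Gaussian integral (0.52) to a Gaussian pulse.

```python
import cmath
import math

TAU = 2.0  # hypothetical pulse width (arbitrary units), chosen for illustration


def gaussian(t):
    """Test waveform f(t) = exp(-t^2 / (2 tau^2))."""
    return math.exp(-t**2 / (2.0 * TAU**2))


def fourier_transform(f, omega, t_max=40.0, steps=8000):
    """Approximate f(omega) = (1/sqrt(2 pi)) * integral of f(t) exp(+i omega t) dt
    with a midpoint Riemann sum on [-t_max, t_max]."""
    dt = 2.0 * t_max / steps
    total = 0j
    for k in range(steps):
        t = -t_max + (k + 0.5) * dt
        total += f(t) * cmath.exp(1j * omega * t) * dt
    return total / math.sqrt(2.0 * math.pi)


def gaussian_transform_exact(omega):
    """Closed form for the Gaussian above: tau * exp(-tau^2 omega^2 / 2)."""
    return TAU * math.exp(-((TAU * omega) ** 2) / 2.0)


for omega in (0.0, 0.7, 1.5):
    numeric = fourier_transform(gaussian, omega)
    print(omega, numeric.real, gaussian_transform_exact(omega))
```

Because the Gaussian is even in $t$, the imaginary part of the numerical result is zero to rounding error, which matches the statement that the sine amplitudes vanish for an even function.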
The transform and inverse transform are also sometimes written as $f(\omega) \equiv \mathcal{F}\{f(t)\}$ and $f(t) \equiv \mathcal{F}^{-1}\{f(\omega)\}$. Note that the functions $f(t)$ and $f(\omega)$ are entirely different, even taking on different units (e.g. the latter having extra units of per frequency). The two functions are distinguished by their arguments, which also have different units (e.g. time vs. frequency). Nevertheless, it is customary to use the same letter to denote either function since they form a transform pair.

You should be aware that it is arbitrary which of the expressions in (0.42) is called the transform and which is called the inverse transform. In other words, the signs in the exponents of (0.42) may be interchanged. The convention varies in published works. Also, the factor $2\pi$ may be placed on either the transform or the inverse transform, or divided equally between the two as has been done here.

As was previously mentioned, it would seem rather pointless to perform a Fourier transform on the function $f(t)$ followed by an inverse Fourier transform, just to end up with $f(t)$ again. Nevertheless, we are interested in this because we want to know the effect of an optical system on a waveform (represented by $f(t)$). It turns out that in many cases, the effect of the optical system can only be applied to $f(\omega)$ (if the effect is frequency dependent). Thus, we perform a Fourier transform on $f(t)$, then apply the frequency-dependent effect on $f(\omega)$, and finally perform an inverse Fourier transform on the result. The final function will be different from $f(t)$. Keep in mind that $f(\omega)$ is the continuous analog of the discrete coefficients $c_n$ (or the $a_n$ and $b_n$). The real part of $f(\omega)$ indicates the amplitudes of the cosine waves necessary to construct the function $f(t)$. The imaginary part of $f(\omega)$ indicates the amplitudes of the sine waves necessary to construct the function $f(t)$.

Finally, we note that a remarkable attribute of the delta function can be seen from the Fourier integral theorem. The delta function $\delta(t'-t)$ is defined indirectly through

$$f(t) = \int_{-\infty}^{\infty} f(t')\,\delta(t'-t)\,dt' \tag{0.43}$$

The delta function $\delta(t'-t)$ is zero everywhere except at $t' = t$, since the result of the integration only pays attention to the value of $f(t')$ at that point. At $t' = t$, the delta function is infinite in such a way as to make the integral take on the value of the function $f(t)$. (One can consider $\delta(t'-t)\,dt'$ with $t' = t$ to be the dimensions of an infinitely tall and infinitely thin rectangle with an area unity.) After rearranging the order of integration, the Fourier integral theorem (0.41) can be written as

$$f(t) = \int_{-\infty}^{\infty} f(t')\left[\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{i\omega(t'-t)}\,d\omega\right]dt' \tag{0.44}$$

A comparison of (0.43) and (0.44) reveals the delta function to be a uniform superposition of all frequency components:

$$\delta(t'-t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{i\omega(t'-t)}\,d\omega \tag{0.45}$$

This representation of the delta function comes in handy when proving Parseval's theorem (see P 0.31), which is used extensively in the study of light and optics.

0.5 Linear Algebra and Sylvester's Theorem

In this section we outline two useful results from linear algebra. The first result states that the inverse of a $2\times 2$ matrix is given by

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \frac{1}{AD-BC}\begin{bmatrix} D & -B \\ -C & A \end{bmatrix} \tag{0.46}$$
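This can be proven by direct substitution:

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1}\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \frac{1}{AD-BC}\begin{bmatrix} D & -B \\ -C & A \end{bmatrix}\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \frac{1}{AD-BC}\begin{bmatrix} AD-BC & 0 \\ 0 & AD-BC \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \tag{0.47}$$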
The next result is Sylvester's Theorem, which is useful when a $2\times 2$ matrix (with a determinant of unity) is raised to a high power. This situation occurs for modeling periodic multilayer mirror coatings or for light rays trapped in a laser cavity. Sylvester's Theorem states that if the determinant of a $2\times 2$ matrix is one, i.e.

$$\begin{vmatrix} A & B \\ C & D \end{vmatrix} = AD - BC = 1 \tag{0.48}$$

then the following holds:

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{N} = \frac{1}{\sin\theta}\begin{bmatrix} A\sin N\theta - \sin(N-1)\theta & B\sin N\theta \\ C\sin N\theta & D\sin N\theta - \sin(N-1)\theta \end{bmatrix} \tag{0.49}$$

where

$$\cos\theta = \frac{1}{2}(A+D) \tag{0.50}$$
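©2004-2009 Peatross and Ware

Example 0.1
Prove Sylvester's theorem by induction.

Solution: When $N = 1$, the equation is seen to be correct by direct substitution. Next we assume that the theorem holds for arbitrary $N$, and we check to see if it holds for $N + 1$:

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{N+1} = \frac{1}{\sin\theta}\begin{bmatrix} A & B \\ C & D \end{bmatrix}\begin{bmatrix} A\sin N\theta - \sin(N-1)\theta & B\sin N\theta \\ C\sin N\theta & D\sin N\theta - \sin(N-1)\theta \end{bmatrix}$$

$$= \frac{1}{\sin\theta}\begin{bmatrix} (A^2+BC)\sin N\theta - A\sin(N-1)\theta & (AB+BD)\sin N\theta - B\sin(N-1)\theta \\ (AC+CD)\sin N\theta - C\sin(N-1)\theta & (D^2+BC)\sin N\theta - D\sin(N-1)\theta \end{bmatrix}$$

Now we inject condition (0.48) on the determinant ($AD - BC = 1$) into the right-hand side and rearrange the result to give

$$\frac{1}{\sin\theta}\begin{bmatrix} (A^2+AD-1)\sin N\theta - A\sin(N-1)\theta & B\left[(A+D)\sin N\theta - \sin(N-1)\theta\right] \\ C\left[(A+D)\sin N\theta - \sin(N-1)\theta\right] & (D^2+AD-1)\sin N\theta - D\sin(N-1)\theta \end{bmatrix}$$

and then

$$\frac{1}{\sin\theta}\begin{bmatrix} A\left[(A+D)\sin N\theta - \sin(N-1)\theta\right] - \sin N\theta & B\left[(A+D)\sin N\theta - \sin(N-1)\theta\right] \\ C\left[(A+D)\sin N\theta - \sin(N-1)\theta\right] & D\left[(A+D)\sin N\theta - \sin(N-1)\theta\right] - \sin N\theta \end{bmatrix}$$

In each matrix element, the expression

$$(A+D)\sin N\theta = 2\cos\theta\,\sin N\theta = \sin(N+1)\theta + \sin(N-1)\theta$$

occurs, which we have rearranged using $\cos\theta = \frac{1}{2}(A+D)$ while twice invoking (0.15). The result is

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{N+1} = \frac{1}{\sin\theta}\begin{bmatrix} A\sin(N+1)\theta - \sin N\theta & B\sin(N+1)\theta \\ C\sin(N+1)\theta & D\sin(N+1)\theta - \sin N\theta \end{bmatrix} \tag{0.51}$$

which completes the proof.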
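As a numerical sanity check of Sylvester's theorem (0.49), the following sketch compares repeated matrix multiplication against the closed form. The sample matrix and the helper names are assumptions picked for illustration; the matrix has unit determinant and $|A + D| < 2$, so that $\theta$ is real.

```python
import math


def mat_mul(m1, m2):
    """Multiply two 2x2 matrices stored as ((a, b), (c, d))."""
    (a, b), (c, d) = m1
    (e, f), (g, h) = m2
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))


def mat_pow(m, n):
    """Raise a 2x2 matrix to the integer power n by repeated multiplication."""
    result = ((1.0, 0.0), (0.0, 1.0))
    for _ in range(n):
        result = mat_mul(result, m)
    return result


def sylvester(m, n):
    """Closed form (0.49) for the n-th power of a unit-determinant 2x2 matrix."""
    (a, b), (c, d) = m
    theta = math.acos((a + d) / 2.0)  # from (0.50)
    s = math.sin(theta)
    sn = math.sin(n * theta)
    sn1 = math.sin((n - 1) * theta)
    return (((a * sn - sn1) / s, b * sn / s),
            (c * sn / s, (d * sn - sn1) / s))


# Hypothetical example matrix: AD - BC = 0.8*1.0 - 0.5*(-0.4) = 1
M = ((0.8, 0.5), (-0.4, 1.0))
N = 7
print(mat_pow(M, N))
print(sylvester(M, N))
```

The closed form costs the same no matter how large $N$ is, which is why it is convenient for stacks of many identical layers.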
Appendix 0.A Integral and Sum Table

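The following formulas are useful for various problems encountered in the text.

$$\int_{-\infty}^{\infty} e^{-ax^2+bx+c}\,dx = \sqrt{\frac{\pi}{a}}\;e^{\frac{b^2}{4a}+c} \qquad \mathrm{Re}\{a\} > 0 \tag{0.52}$$

$$\int_{0}^{\infty} \frac{e^{iax}}{1+x^2/b^2}\,dx = \frac{\pi|b|}{2}\,e^{-|ab|} \qquad b > 0 \tag{0.53}$$

$$\int_{0}^{2\pi} e^{\pm i a\cos(\theta-\theta')}\,d\theta = 2\pi J_0(a) \tag{0.54}$$

$$\int_{0}^{a} J_0(bx)\,x\,dx = \frac{a}{b}J_1(ab) \tag{0.55}$$

$$\int_{0}^{\infty} e^{-ax^2} J_0(bx)\,x\,dx = \frac{e^{-b^2/4a}}{2a} \tag{0.56}$$

$$\int_{0}^{\infty} \frac{\sin^2(ax)}{(ax)^2}\,dx = \frac{\pi}{2a} \tag{0.57}$$

$$\int_{0}^{\pi} \sin(ax)\sin(bx)\,dx = \int_{0}^{\pi} \cos(ax)\cos(bx)\,dx = \frac{\pi}{2}\,\delta_{ab} \qquad (a, b\ \text{integer}) \tag{0.58}$$

$$\sum_{n=1}^{N} a r^{n-1} = \frac{a\left(1-r^{N}\right)}{1-r} \tag{0.59}$$

$$\sum_{n=1}^{\infty} a r^{n-1} = \frac{a}{1-r} \qquad (r < 1) \tag{0.60}$$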
Exercises
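Exercises for 0.2 Vector Calculus

P0.1 Let $\mathbf{r} = (\hat{\mathbf{x}} + 2\hat{\mathbf{y}} - 3\hat{\mathbf{z}})$ m and $\mathbf{r}_0 = (-\hat{\mathbf{x}} + 3\hat{\mathbf{y}} + 2\hat{\mathbf{z}})$ m. (a) Find the magnitude of $\mathbf{r}$. (b) Find $\mathbf{r} - \mathbf{r}_0$. (c) Find the angle between $\mathbf{r}$ and $\mathbf{r}_0$.
Answer: (a) $r = \sqrt{14}$ m; (c) $94^\circ$.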
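The quoted answers to P0.1 can be confirmed with a few lines of arithmetic. This is a sketch only; it assumes the components $\mathbf{r} = (1, 2, -3)$ m and $\mathbf{r}_0 = (-1, 3, 2)$ m, which is the reading of the problem statement consistent with the stated answers.

```python
import math

# Assumed components (in meters) of the two vectors in P0.1
r = (1.0, 2.0, -3.0)
r0 = (-1.0, 3.0, 2.0)


def dot(u, v):
    """Dot product of two 3-component vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def magnitude(u):
    """Length of a 3-component vector."""
    return math.sqrt(dot(u, u))


difference = tuple(a - b for a, b in zip(r, r0))
angle_deg = math.degrees(math.acos(dot(r, r0) / (magnitude(r) * magnitude(r0))))

print(magnitude(r), difference, angle_deg)
```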
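P0.2 Prove that the dot product between two vectors is the product of the magnitudes of the two vectors multiplied by the cosine of the angle between them.
Solution: Consider the plane containing the two vectors in (0.2). Call it the $xy$-plane. In this coordinate system, the two vectors can be written as $\mathbf{k} = k\cos\theta\,\hat{\mathbf{x}} + k\sin\theta\,\hat{\mathbf{y}}$ and $\mathbf{r} = r\cos\alpha\,\hat{\mathbf{x}} + r\sin\alpha\,\hat{\mathbf{y}}$, where $\theta$ and $\alpha$ are the respective angles that the two vectors make with the $x$-axis. The dot product gives $\mathbf{k}\cdot\mathbf{r} = kr(\cos\theta\cos\alpha + \sin\theta\sin\alpha)$. From (0.14) we have $\mathbf{k}\cdot\mathbf{r} = kr\cos(\theta-\alpha)$, which shows that $\theta - \alpha$ is the angle between the vectors.

P0.3 Prove that the cross product between two vectors is the product of the magnitudes of the two vectors multiplied by the sine of the angle between them. The result is a vector directed perpendicular to the plane containing the original two vectors in accordance with the right hand rule.

P0.4 Verify the "BAC-CAB" rule: $\mathbf{A}\times(\mathbf{B}\times\mathbf{C}) = \mathbf{B}(\mathbf{A}\cdot\mathbf{C}) - \mathbf{C}(\mathbf{A}\cdot\mathbf{B})$.

P0.5 Prove the following identity:

$$\nabla_{\mathbf{r}}\frac{1}{|\mathbf{r}-\mathbf{r}'|} = -\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3},$$

where $\nabla_{\mathbf{r}}$ operates only on $\mathbf{r}$, treating $\mathbf{r}'$ as a constant vector.

P0.6 Prove that $\nabla_{\mathbf{r}}\cdot\dfrac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}$ is zero, except at $\mathbf{r} = \mathbf{r}'$ where a singularity situation occurs.

P0.7 Verify $\nabla\cdot(\nabla\times\mathbf{f}) = 0$ for any vector function $\mathbf{f}$.

P0.8 Verify $\nabla\times(\nabla\times\mathbf{f}) = \nabla(\nabla\cdot\mathbf{f}) - \nabla^2\mathbf{f}$.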
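Solution: From (0.6), we have

$$\nabla\times\mathbf{f} = \left(\frac{\partial f_z}{\partial y} - \frac{\partial f_y}{\partial z}\right)\hat{\mathbf{x}} + \left(\frac{\partial f_x}{\partial z} - \frac{\partial f_z}{\partial x}\right)\hat{\mathbf{y}} + \left(\frac{\partial f_y}{\partial x} - \frac{\partial f_x}{\partial y}\right)\hat{\mathbf{z}}$$

and

$$\nabla\times(\nabla\times\mathbf{f}) = \begin{vmatrix} \hat{\mathbf{x}} & \hat{\mathbf{y}} & \hat{\mathbf{z}} \\ \partial/\partial x & \partial/\partial y & \partial/\partial z \\ \frac{\partial f_z}{\partial y} - \frac{\partial f_y}{\partial z} & \frac{\partial f_x}{\partial z} - \frac{\partial f_z}{\partial x} & \frac{\partial f_y}{\partial x} - \frac{\partial f_x}{\partial y} \end{vmatrix}$$

$$= \left[\frac{\partial}{\partial y}\left(\frac{\partial f_y}{\partial x} - \frac{\partial f_x}{\partial y}\right) - \frac{\partial}{\partial z}\left(\frac{\partial f_x}{\partial z} - \frac{\partial f_z}{\partial x}\right)\right]\hat{\mathbf{x}} + \left[\frac{\partial}{\partial z}\left(\frac{\partial f_z}{\partial y} - \frac{\partial f_y}{\partial z}\right) - \frac{\partial}{\partial x}\left(\frac{\partial f_y}{\partial x} - \frac{\partial f_x}{\partial y}\right)\right]\hat{\mathbf{y}} + \left[\frac{\partial}{\partial x}\left(\frac{\partial f_x}{\partial z} - \frac{\partial f_z}{\partial x}\right) - \frac{\partial}{\partial y}\left(\frac{\partial f_z}{\partial y} - \frac{\partial f_y}{\partial z}\right)\right]\hat{\mathbf{z}}$$

After rearranging, we get

$$\nabla\times(\nabla\times\mathbf{f}) = \left[\frac{\partial^2 f_x}{\partial x^2} + \frac{\partial^2 f_y}{\partial x\,\partial y} + \frac{\partial^2 f_z}{\partial x\,\partial z}\right]\hat{\mathbf{x}} + \left[\frac{\partial^2 f_x}{\partial x\,\partial y} + \frac{\partial^2 f_y}{\partial y^2} + \frac{\partial^2 f_z}{\partial y\,\partial z}\right]\hat{\mathbf{y}} + \left[\frac{\partial^2 f_x}{\partial x\,\partial z} + \frac{\partial^2 f_y}{\partial y\,\partial z} + \frac{\partial^2 f_z}{\partial z^2}\right]\hat{\mathbf{z}}$$

$$\quad - \left[\frac{\partial^2 f_x}{\partial x^2} + \frac{\partial^2 f_x}{\partial y^2} + \frac{\partial^2 f_x}{\partial z^2}\right]\hat{\mathbf{x}} - \left[\frac{\partial^2 f_y}{\partial x^2} + \frac{\partial^2 f_y}{\partial y^2} + \frac{\partial^2 f_y}{\partial z^2}\right]\hat{\mathbf{y}} - \left[\frac{\partial^2 f_z}{\partial x^2} + \frac{\partial^2 f_z}{\partial y^2} + \frac{\partial^2 f_z}{\partial z^2}\right]\hat{\mathbf{z}}$$

where we have added and subtracted $\dfrac{\partial^2 f_x}{\partial x^2}\hat{\mathbf{x}} + \dfrac{\partial^2 f_y}{\partial y^2}\hat{\mathbf{y}} + \dfrac{\partial^2 f_z}{\partial z^2}\hat{\mathbf{z}}$. After some factorization, we obtain

$$\nabla\times(\nabla\times\mathbf{f}) = \left[\hat{\mathbf{x}}\frac{\partial}{\partial x} + \hat{\mathbf{y}}\frac{\partial}{\partial y} + \hat{\mathbf{z}}\frac{\partial}{\partial z}\right]\left[\frac{\partial f_x}{\partial x} + \frac{\partial f_y}{\partial y} + \frac{\partial f_z}{\partial z}\right] - \left[\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}\right]\left[f_x\hat{\mathbf{x}} + f_y\hat{\mathbf{y}} + f_z\hat{\mathbf{z}}\right]$$

$$= \nabla(\nabla\cdot\mathbf{f}) - \nabla^2\mathbf{f}$$

where on the final line we invoked (0.4), (0.5), and (0.8).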
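P0.9 Verify $\nabla\times(\mathbf{f}\times\mathbf{g}) = \mathbf{f}(\nabla\cdot\mathbf{g}) - \mathbf{g}(\nabla\cdot\mathbf{f}) + (\mathbf{g}\cdot\nabla)\mathbf{f} - (\mathbf{f}\cdot\nabla)\mathbf{g}$.

P0.10 Verify $\nabla\cdot(\mathbf{f}\times\mathbf{g}) = \mathbf{g}\cdot(\nabla\times\mathbf{f}) - \mathbf{f}\cdot(\nabla\times\mathbf{g})$.

P0.11 Verify $\nabla\cdot(g\,\mathbf{f}) = \mathbf{f}\cdot\nabla g + g\,\nabla\cdot\mathbf{f}$.

P0.12 Verify $\nabla\times(g\,\mathbf{f}) = (\nabla g)\times\mathbf{f} + g\,\nabla\times\mathbf{f}$.

P0.13 Verify the divergence theorem (0.11) for $\mathbf{f}(x, y, z) = y^2\hat{\mathbf{x}} + xy\,\hat{\mathbf{y}} + x^2 z\,\hat{\mathbf{z}}$. Take as the volume a cube contained by the six planes $|x| = 1$, $|y| = 1$, and $|z| = 1$.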
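Solution:

$$\oint_S \mathbf{f}\cdot\hat{\mathbf{n}}\,da = \int_{-1}^{1}\!\!\int_{-1}^{1} dx\,dy\,\left(x^2 z\right)_{z=1} - \int_{-1}^{1}\!\!\int_{-1}^{1} dx\,dy\,\left(x^2 z\right)_{z=-1} + \int_{-1}^{1}\!\!\int_{-1}^{1} dx\,dz\,(xy)_{y=1} - \int_{-1}^{1}\!\!\int_{-1}^{1} dx\,dz\,(xy)_{y=-1} + \int_{-1}^{1}\!\!\int_{-1}^{1} dy\,dz\,\left(y^2\right)_{x=1} - \int_{-1}^{1}\!\!\int_{-1}^{1} dy\,dz\,\left(y^2\right)_{x=-1}$$

$$= 2\int_{-1}^{1}\!\!\int_{-1}^{1} dx\,dy\,x^2 + 2\int_{-1}^{1}\!\!\int_{-1}^{1} dx\,dz\,x = 4\left.\frac{x^3}{3}\right|_{-1}^{1} + 4\left.\frac{x^2}{2}\right|_{-1}^{1} = \frac{8}{3}.$$

$$\int_V \nabla\cdot\mathbf{f}\,dv = \int_{-1}^{1}\!\!\int_{-1}^{1}\!\!\int_{-1}^{1} dx\,dy\,dz\,\left[x + x^2\right] = 4\int_{-1}^{1} dx\,\left[x + x^2\right] = 4\left[\frac{x^2}{2} + \frac{x^3}{3}\right]_{-1}^{1} = \frac{8}{3}.$$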
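The two sides of the divergence theorem for P0.13 can also be compared numerically. The sketch below uses a simple midpoint rule on the cube; the grid size and function names are assumptions for illustration, and the field is taken to be $\mathbf{f} = y^2\hat{\mathbf{x}} + xy\,\hat{\mathbf{y}} + x^2 z\,\hat{\mathbf{z}}$, whose divergence is $x + x^2$.

```python
def field(x, y, z):
    """Vector field assumed for P0.13: (y^2, x*y, x^2*z)."""
    return (y**2, x * y, x**2 * z)


def divergence(x, y, z):
    """Analytic divergence of the field above: x + x^2."""
    return x + x**2


def midpoints(n):
    """Midpoints of n equal cells on [-1, 1], plus the cell width."""
    h = 2.0 / n
    return [-1.0 + (k + 0.5) * h for k in range(n)], h


def volume_integral(n=40):
    """Midpoint-rule estimate of the volume integral of div f over the cube."""
    pts, h = midpoints(n)
    return sum(divergence(x, y, z) for x in pts for y in pts for z in pts) * h**3


def surface_flux(n=40):
    """Midpoint-rule estimate of the outward flux of f through the six faces."""
    pts, h = midpoints(n)
    flux = 0.0
    for u in pts:
        for v in pts:
            flux += field(1.0, u, v)[0] - field(-1.0, u, v)[0]  # faces x = +1, -1
            flux += field(u, 1.0, v)[1] - field(u, -1.0, v)[1]  # faces y = +1, -1
            flux += field(u, v, 1.0)[2] - field(u, v, -1.0)[2]  # faces z = +1, -1
    return flux * h**2


print(volume_integral(), surface_flux(), 8.0 / 3.0)
```

Both estimates approach 8/3 as the grid is refined; the small residual difference from 8/3 at a finite grid size is the midpoint-rule discretization error.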
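P0.14 Verify Stokes' theorem (0.12) for the function given in P 0.13. Take the surface to be a square in the $xy$-plane contained by $|x| = 1$ and $|y| = 1$.

P0.15 Use the divergence theorem to show that the function in P 0.6 is $4\pi$ times the three-dimensional delta function.

Solution: We have by the divergence theorem

$$\oint_S \frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\cdot\hat{\mathbf{n}}\,da = \int_V \nabla_{\mathbf{r}}\cdot\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\,dv$$

From P 0.6, the argument in the integral on the right-hand side is zero except at $\mathbf{r} = \mathbf{r}'$. Therefore, if the volume $V$ does not contain the point $\mathbf{r} = \mathbf{r}'$, then the result of both integrals must be zero. Let us construct a volume between an arbitrary surface $S_1$ containing $\mathbf{r} = \mathbf{r}'$ and $S_2$, the surface of a tiny sphere centered on $\mathbf{r} = \mathbf{r}'$. Since the point $\mathbf{r} = \mathbf{r}'$ is excluded by the tiny sphere, the result of either integral in the divergence theorem is still zero. However, we have on the tiny sphere (of radius $r_\epsilon$)

$$\oint_{S_2} \frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\cdot\hat{\mathbf{n}}\,da = -\int_0^{2\pi}\!\!\int_0^{\pi} \frac{1}{r_\epsilon^2}\,r_\epsilon^2\sin\phi\,d\phi\,d\alpha = -4\pi$$

Therefore, for the outer surface $S_1$ (containing $\mathbf{r} = \mathbf{r}'$) we must have the equal and opposite result:

$$\oint_{S_1} \frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\cdot\hat{\mathbf{n}}\,da = 4\pi$$

This implies

$$\int_V \nabla_{\mathbf{r}}\cdot\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\,dv = \begin{cases} 4\pi & \text{if } V \text{ contains } \mathbf{r}' \\ 0 & \text{otherwise} \end{cases}$$

The argument of this integral exhibits the same characteristics as the delta function $\delta^3(\mathbf{r}'-\mathbf{r}) \equiv \delta(x'-x)\,\delta(y'-y)\,\delta(z'-z)$. Namely,

$$\int_V \delta^3(\mathbf{r}'-\mathbf{r})\,dv = \begin{cases} 1 & \text{if } V \text{ contains } \mathbf{r}' \\ 0 & \text{otherwise} \end{cases}$$

Therefore, $\nabla_{\mathbf{r}}\cdot\dfrac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3} = 4\pi\,\delta^3(\mathbf{r}'-\mathbf{r})$. The delta function is defined in (0.43).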
Exercises for 0.3 Complex Numbers

P0.16 Do the following complex arithmetic problems using real arithmetic functions along with the fundamentals of complex numbers (i.e. don't use your calculator's complex arithmetic abilities):
(a) For $z_1 = 2 + 3i$ and $z_2 = 3 - 5i$, calculate $z_1 + z_2$ and $z_1 \times z_2$ in both rectangular and polar form.
(b) For $z_1 = 1 - i$ and $z_2 = 3 + 4i$, calculate $z_1 - z_2$ and $z_1/z_2$ in both rectangular and polar form.

P0.17 Show that $-3 + 4i$ can be written as $5\exp\left(-i\tan^{-1}(4/3) + i\pi\right)$.

P0.18 Show $(a - ib)/(a + ib) = \exp\left(-2i\tan^{-1}(b/a)\right)$ regardless of the sign of $a$, assuming $a$ and $b$ are real.

P0.19 Invert (0.16) to get both formulas in (0.19).

P0.20 Show $\mathrm{Re}\{A\}\times\mathrm{Re}\{B\} = (AB + A^*B)/4 + \mathrm{C.C.}$

P0.21 If $E = |E|e^{i\alpha_E}$ and $B = |B|e^{i\alpha_B}$, and if $k$, $z$, $\omega$, and $t$ are all real, prove

$$\mathrm{Re}\left\{E e^{i(kz-\omega t)}\right\}\mathrm{Re}\left\{B e^{i(kz-\omega t)}\right\} = \frac{1}{4}\left(E^*B + EB^*\right) + \frac{1}{2}|E||B|\cos\left[2(kz-\omega t) + \alpha_E + \alpha_B\right]$$

P0.22 (a) If $\sin\phi = 2$, show that $\cos\phi = i\sqrt{3}$. HINT: Use $\sin^2\phi + \cos^2\phi = 1$.
(b) Show that the angle $\phi$ in (a) is $\pi/2 - i\ln(2 + \sqrt{3})$.

P0.23 Use the techniques/principles of complex numbers to write the following as simple phase-shifted cosine waves (i.e. find the amplitude and phase of the resultant cosine waves):
(a) $5\cos(4t) + 5\sin(4t)$
(b) $3\cos(5t) + 10\sin(5t + 0.4)$
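Exercises for 0.4 Fourier Theory

P0.24 Prove linear superposition of Fourier Transforms:

$$\mathcal{F}\{a\,g(t) + b\,h(t)\} = a\,g(\omega) + b\,h(\omega)$$

where $g(\omega) \equiv \mathcal{F}\{g(t)\}$ and $h(\omega) \equiv \mathcal{F}\{h(t)\}$.

P0.25 Prove $\mathcal{F}\{g(at)\} = \dfrac{1}{|a|}\,g\!\left(\dfrac{\omega}{a}\right)$.

P0.26 Prove $\mathcal{F}\{g(t-\tau)\} = g(\omega)\,e^{i\omega\tau}$.

P0.27 Show that the Fourier transform of $E(t) = E_0 e^{-(t/\tau)^2}\cos\omega_0 t$ is

$$E(\omega) = \frac{\tau E_0}{2\sqrt{2}}\left[e^{-\frac{(\omega+\omega_0)^2}{4/\tau^2}} + e^{-\frac{(\omega-\omega_0)^2}{4/\tau^2}}\right]$$

P0.28 Take the inverse Fourier transform of the result in P 0.27. Check that it returns exactly the original function.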
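P0.29 The following operation is referred to as the convolution of the functions $g(t)$ and $h(t)$:

$$g(t)\otimes h(t) \equiv \int_{-\infty}^{\infty} g(t)\,h(\tau - t)\,dt$$

A convolution measures the overlap of $g(t)$ and a reversed $h(t)$ as a function of the offset $\tau$.
(a) Prove the convolution theorem:

$$\mathcal{F}\{g(t)\otimes h(t)\} = \sqrt{2\pi}\,\mathcal{F}\{g(t)\}\,\mathcal{F}\{h(t)\}$$

(b) Prove this related form of the convolution theorem:

$$\mathcal{F}\{g(t)\,h(t)\} = \frac{1}{\sqrt{2\pi}}\,\mathcal{F}\{g(t)\}\otimes\mathcal{F}\{h(t)\}$$

Solution: Part (a)

$$\mathcal{F}\left\{\int_{-\infty}^{\infty} g(t)\,h(\tau-t)\,dt\right\} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} g(t)\,h(\tau-t)\,dt\right]e^{i\omega\tau}\,d\tau \qquad (\text{Let } \tau = t' + t)$$

$$= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} g(t)\,h(t')\,dt\right]e^{i\omega(t'+t)}\,dt'$$

$$= \sqrt{2\pi}\;\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} g(t)\,e^{i\omega t}\,dt\;\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} h(t')\,e^{i\omega t'}\,dt'$$

$$= \sqrt{2\pi}\,g(\omega)\,h(\omega)$$

P0.30 Prove the autocorrelation theorem:

$$\mathcal{F}\left\{\int_{-\infty}^{\infty} h(t)\,h^*(t-\tau)\,dt\right\} = \sqrt{2\pi}\,|h(\omega)|^2$$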
P0.31 Prove Parseval's theorem:

$$\int_{-\infty}^{\infty}|f(\omega)|^2\,d\omega = \int_{-\infty}^{\infty}|f(t)|^2\,dt$$

P0.32 (a) Compute the Fourier transform of a Gaussian function, $f_1(t) = e^{-t^2/2\tau^2}$. Do the integral by hand using the table in Appendix 0.A.
(b) Compute the Fourier transform of a sine function, $f_2(t) = \sin\omega_0 t$. Don't use a computer to do the integral; use the fact that $\sin(x) = \frac{1}{2i}\left(e^{ix} - e^{-ix}\right)$, combined with the integral formula (0.45).
(c) Use your results to parts (a) and (b) and a convolution theorem from P 0.29 to evaluate the Fourier transform of $g(t) = e^{-t^2/2\tau^2}\sin\omega_0 t$. (The answer should be similar to P 0.27.)
(d) Plot $g(t)$ and the imaginary part of its Fourier transform for the parameters $\omega_0 = 1$ and $\tau = 8$.

P0.33 (a) Use your results from P 0.32, along with a convolution theorem from P 0.29, to evaluate the Fourier transform of

$$h(t) = e^{-(t-t_0)^2/2\tau^2}\sin\omega_0 t + e^{-t^2/2\tau^2}\sin\omega_0 t + e^{-(t+t_0)^2/2\tau^2}\sin\omega_0 t$$

which consists of the sum of three Gaussian pulses, each separated by a time $t_0$. HINT: The three-pulse function $h(t)$ is a convolution of $e^{-t^2/2\tau^2}\sin\omega_0 t$ with three delta functions. Here is a good check for your final answer: if you set $t_0 = 0$, the three pulses are on top of each other, so you should get three times the answer to problem P 0.32(c).
(b) Plot $h(t)$ and the imaginary part of its Fourier transform for the parameters $\omega_0 = 1$, $\tau = 8$, and $t_0 = 30$.
(c) This $h(t)$ is longer than the single pulse in problem P 0.32(c). Should its Fourier transform be broader or narrower than in P 0.32(c)? Comment on what you see in the plots.
Chapter 1
Electromagnetic Phenomena
1.1 Introduction
In the mid 1800s James Maxwell assembled the various known relationships of electricity and magnetism into a concise¹ set of equations:

$$\nabla\cdot\mathbf{E} = \frac{\rho}{\epsilon_0} \qquad \text{(Gauss's Law from Coulomb's Law)} \tag{1.1}$$

$$\nabla\cdot\mathbf{B} = 0 \qquad \text{(Gauss's Law for magnetism from Biot-Savart)} \tag{1.2}$$

$$\nabla\times\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t} \qquad \text{(Faraday's Law)} \tag{1.3}$$

$$\nabla\times\frac{\mathbf{B}}{\mu_0} = \epsilon_0\frac{\partial\mathbf{E}}{\partial t} + \mathbf{J} \qquad \text{(Ampere's Law revised by Maxwell)} \tag{1.4}$$

Here $\mathbf{E}$ and $\mathbf{B}$ represent electric and magnetic fields, respectively. The charge density $\rho$ describes the charge per volume distributed through space. The current density $\mathbf{J}$ describes the motion of charge density (in units of $\rho$ times velocity). The constant $\epsilon_0$ is called the permittivity, and the constant $\mu_0$ is called the permeability.

After introducing a key revision into Ampere's law, Maxwell realized that together these equations comprise a complete self-consistent theory of electromagnetic phenomena. Moreover, the equations imply the existence of electromagnetic waves, which travel at the speed of light. Since the speed of light had been measured before Maxwell's time, it was immediately apparent (as was already suspected) that light is a high-frequency manifestation of the same phenomena that govern the influence of currents and charges upon each other. Previously, optics was considered to be a topic quite separate from electricity and magnetism. Once the connection was made, it became clear that Maxwell's equations form the theoretical foundation for optics, and this is where we begin our study of light.

In this chapter, we review the physical principles associated with each of Maxwell's equations and illustrate the connection between electromagnetic phenomena and light. While many of the details discussed in this chapter (e.g. static fields and magnetic effects) are not directly used in later chapters, they are included to better appreciate the basic physics that Maxwell's equations describe. It may be helpful to study the vector calculus review in section 0.2 before beginning this chapter.

¹ In Maxwell's original notation, this set of equations was hardly concise. Lacking the convenience of modern vector notation, he wrote them as twenty equations in twenty variables. (Try fitting that on a T-shirt!)
1.2 Gauss's Law and Coulomb's Law

The force on a point charge $q$ located at $\mathbf{r}$ exerted by another point charge $q'$ located at $\mathbf{r}'$ is

$$\mathbf{F} = q\,\mathbf{E}(\mathbf{r}) \tag{1.5}$$

where

$$\mathbf{E}(\mathbf{r}) = \frac{q'}{4\pi\epsilon_0}\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3} \tag{1.6}$$

Figure 1.1 The geometry of Coulomb's law for a point charge.

This relationship is known as Coulomb's law. The force is directed along the vector $\mathbf{r}-\mathbf{r}'$, which points from charge $q'$ to $q$ as seen in Fig. 1.1. The length or magnitude of this vector is given by $|\mathbf{r}-\mathbf{r}'|$ (i.e. the distance between $q'$ and $q$). The familiar inverse square law can be seen by noting that $(\mathbf{r}-\mathbf{r}')/|\mathbf{r}-\mathbf{r}'|$ is a unit vector. We have written the force in terms of an electric field $\mathbf{E}(\mathbf{r})$, which is defined throughout space (regardless of whether the second charge $q$ is actually present). The permittivity $\epsilon_0$ amounts to a proportionality constant.

The total force from a collection of charges is found by summing expression (1.5) over all charges $q'_n$ associated with their specific locations $\mathbf{r}'_n$. If the charges are distributed continuously throughout space, having density $\rho(\mathbf{r}')$ (units of charge per volume), the summation for finding the net electric field at $\mathbf{r}$ becomes an integral:

$$\mathbf{E}(\mathbf{r}) = \frac{1}{4\pi\epsilon_0}\int_V \rho(\mathbf{r}')\,\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\,dv' \tag{1.7}$$

Figure 1.2 The geometry of Coulomb's law for a charge distribution.

This three-dimensional integral gives the net electric field produced by the charge density $\rho$ distributed throughout the volume $V$.

Gauss's law follows directly from (1.7). By performing some mathematical operations on (1.7), we can demonstrate that the electric field uniquely satisfies the differential equation

$$\nabla\cdot\mathbf{E} = \frac{\rho}{\epsilon_0} \tag{1.8}$$

(see Example 1.1 for details). No new physical phenomenon is introduced by writing Gauss's law. It is simply a mathematical interpretation of Coulomb's law.
Example 1.1
Derive Gauss's law from (1.7).

Solution: To derive Gauss's law, we take the divergence of (1.7):

$$\nabla\cdot\mathbf{E}(\mathbf{r}) = \frac{1}{4\pi\epsilon_0}\int_V \rho(\mathbf{r}')\,\nabla_{\mathbf{r}}\cdot\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\,dv' \tag{1.9}$$

The subscript on $\nabla_{\mathbf{r}}$ indicates that it operates on $\mathbf{r}$ while treating $\mathbf{r}'$ as a constant. As messy as this integral appears, it contains a remarkable mathematical property that can be exploited, even without specifying the form of the charge distribution $\rho(\mathbf{r}')$. In modern mathematical language, the vector expression in the integral is a three-dimensional delta function:

$$\nabla_{\mathbf{r}}\cdot\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3} \equiv 4\pi\,\delta^3(\mathbf{r}'-\mathbf{r}) \equiv 4\pi\,\delta(x'-x)\,\delta(y'-y)\,\delta(z'-z) \tag{1.10}$$

A derivation of this formula and a description of its properties are addressed in problem P 0.15. The delta function allows the integral in (1.9) to be performed, and the relation becomes simply

$$\nabla\cdot\mathbf{E}(\mathbf{r}) = \frac{\rho(\mathbf{r})}{\epsilon_0} \tag{1.11}$$

which is the differential form of Gauss's law.

The (perhaps more familiar) integral form of Gauss's law can be obtained by integrating (1.8) over a volume $V$ and applying the divergence theorem (0.11) to the left-hand side:

$$\oint_S \mathbf{E}(\mathbf{r})\cdot\hat{\mathbf{n}}\,da = \frac{1}{\epsilon_0}\int_V \rho(\mathbf{r})\,dv \tag{1.12}$$

This form of Gauss's law shows that the total electric field flux extruding through a closed surface $S$ (i.e. the integral on the left side) is proportional to the net charge contained within it (i.e. within volume $V$ contained by $S$).
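The integral form of Gauss's law can be illustrated numerically with the Coulomb field of a single point charge. In the sketch below, units are chosen so that $q'/4\pi\epsilon_0 = 1$ (an assumption made for convenience), so the flux through any closed surface enclosing the charge should be $4\pi$, and zero if the charge sits outside. The cube geometry, the charge positions, and the function name are all hypothetical choices for the example.

```python
import math


def flux_through_cube(charge_pos, n=100):
    """Midpoint-rule flux of E = (r - r')/|r - r'|^3 through the surface of the
    cube |x|, |y|, |z| <= 1, for a point charge located at charge_pos."""
    h = 2.0 / n
    pts = [-1.0 + (k + 0.5) * h for k in range(n)]
    flux = 0.0
    for axis in range(3):            # faces perpendicular to x, y, z
        for sign in (-1.0, 1.0):     # the two opposite faces on that axis
            for u in pts:
                for v in pts:
                    point = [u, v]
                    point.insert(axis, sign)  # put the face coordinate in place
                    d = [p - c for p, c in zip(point, charge_pos)]
                    dist3 = (d[0] ** 2 + d[1] ** 2 + d[2] ** 2) ** 1.5
                    # Outward normal is sign * (unit vector along axis)
                    flux += sign * d[axis] / dist3 * h * h
    return flux


flux_inside = flux_through_cube((0.3, -0.2, 0.1))   # charge inside the cube
flux_outside = flux_through_cube((2.0, 0.0, 0.0))   # charge outside the cube
print(flux_inside, 4.0 * math.pi, flux_outside)
```

The result does not depend on where inside the cube the charge is placed, which is the content of (1.12) for a single point charge.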
Figure 1.3 Gauss's law in integral form relates the flux of the electric field through a surface to the charge contained inside that surface.

1.3 The Lorentz Force, Biot-Savart Law, and Gauss's Law for Magnetic Fields

The Lorentz force law describes the force on a charged particle due to a magnetic field. In this case, the charge $q$ must move with a velocity (call it $\mathbf{v}$) in order to experience the force. The magnetic field itself arises from charges that are in motion. We consider the magnetic field to be caused by a distribution of moving charges that form a current density $\mathbf{J}(\mathbf{r}')$ throughout space. The current density has units of charge times velocity per volume (or equivalently, current per cross sectional area). The magnetic force law analogous to Coulomb's law is

$$\mathbf{F} = q\,\mathbf{v}\times\mathbf{B} \tag{1.13}$$

where

$$\mathbf{B}(\mathbf{r}) = \frac{\mu_0}{4\pi}\int_V \mathbf{J}(\mathbf{r}')\times\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\,dv' \tag{1.14}$$

The first equation is known as the Lorentz force for a magnetic field, and the latter equation is referred to as the Biot-Savart law. The permeability $\mu_0$ dictates the strength of the force, given the current distribution.

As before, we can apply mathematics to the Biot-Savart law to obtain another of Maxwell's equations. Nevertheless, the essential physics is already inherent in the Biot-Savart law. With the result from P 0.5, we can rewrite (1.14) as

$$\mathbf{B}(\mathbf{r}) = -\frac{\mu_0}{4\pi}\int_V \mathbf{J}(\mathbf{r}')\times\nabla_{\mathbf{r}}\frac{1}{|\mathbf{r}-\mathbf{r}'|}\,dv' = \frac{\mu_0}{4\pi}\,\nabla\times\int_V \frac{\mathbf{J}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,dv' \tag{1.15}$$

Since the divergence of a curl is identically zero, taking the divergence of (1.15) gives (see P 0.7)

$$\nabla\cdot\mathbf{B} = 0 \tag{1.16}$$

This is another of Maxwell's equations (two down; two to go). The similarity between this equation and Gauss's law for electric fields (1.8) is apparent. In fact, (1.16) is known as Gauss's law for magnetic fields. In integral form, Gauss's law for magnetic fields looks like that for electric fields (1.12), only with zero on the right hand side. The law implies that the total magnetic flux extruding through any closed surface is zero (i.e. there will be as many field lines pointing inwards as pointing outwards). If one were to imagine the existence of magnetic "charges" (monopoles with either a north or south "charge"), then the right-hand side would not be zero. However, since magnetic charges have yet to be observed in nature, there is no point in introducing them.
1.4 Faraday's Law

Michael Faraday discovered and characterized the relationship between changing magnetic fluxes and induced electric fields. This effect, called induction, can be observed by moving a magnet around near a loop of wire and noting that the changing magnetic field induces an electric field (usually detected by the current it causes in a wire loop). Faraday showed that a change in magnetic flux through the area of a circuit loop (see Fig. 1.4) induces an electromotive force in the loop according to

$$\oint_C \mathbf{E}\cdot d\boldsymbol{\ell} = -\frac{\partial}{\partial t}\int_S \mathbf{B}\cdot\hat{\mathbf{n}}\,da \tag{1.17}$$

Figure 1.4 Faraday's law.

This relation is known as Faraday's law, the third in our list of Maxwell's equations. In (1.17), Faraday's law is written in integral form. The right side describes a change in the magnetic flux through a surface and the left side describes an electric field produced on the boundary of this surface.

To obtain the differential form of Faraday's law, we apply Stokes' theorem to the left-hand side and obtain

$$\int_S \nabla\times\mathbf{E}\cdot\hat{\mathbf{n}}\,da = -\frac{\partial}{\partial t}\int_S \mathbf{B}\cdot\hat{\mathbf{n}}\,da \tag{1.18}$$

or

$$\int_S \left(\nabla\times\mathbf{E} + \frac{\partial\mathbf{B}}{\partial t}\right)\cdot\hat{\mathbf{n}}\,da = 0 \tag{1.19}$$

Since this equation is true regardless of what surface we choose, it implies

$$\nabla\times\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t} \tag{1.20}$$

which is the differential form of Faraday's law.
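1.5 Ampere's Law

The Biot-Savart law (1.14) can also be used to obtain another of Maxwell's equations: Ampere's law. Ampere's law is obtained by inverting the Biot-Savart law (1.14) so that $\mathbf{J}$ appears by itself, unfettered by integrals or the like. This is accomplished through mathematics, so again no new physical phenomenon is introduced, only a new interpretation. The process for inverting the Biot-Savart law is given in Example 1.2. The result is

$$\nabla\times\mathbf{B} = \mu_0\mathbf{J} \tag{1.21}$$

which is the differential form of Ampere's law.

Michael Faraday (1791–1867, English) Faraday was one of the greatest experimental physicists in history. He is perhaps best known for his work that established the law of induction (i.e. changing magnetic fields produce electric fields). He also discovered that magnetic fields can interact with light. When a magnetic field is oriented along the direction of travel for light in a dielectric, the polarization of the light will rotate. This effect is used to build optical isolators, which prevent light from reflecting back into an optical system.

Example 1.2
Obtain Ampere's law from the Biot-Savart law under the assumption $\nabla\cdot\mathbf{J} = 0$.

Solution: We take the curl of (1.14):

$$\nabla\times\mathbf{B}(\mathbf{r}) = \frac{\mu_0}{4\pi}\int_V \nabla_{\mathbf{r}}\times\left[\mathbf{J}(\mathbf{r}')\times\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\right]dv' \tag{1.22}$$

We next apply the differential vector rule from P 0.9 while noting that $\mathbf{J}(\mathbf{r}')$ does not depend on $\mathbf{r}$ so that only two terms survive. The curl of $\mathbf{B}(\mathbf{r})$ then becomes

$$\nabla\times\mathbf{B}(\mathbf{r}) = \frac{\mu_0}{4\pi}\int_V \left(\mathbf{J}(\mathbf{r}')\left[\nabla_{\mathbf{r}}\cdot\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\right] - \left[\mathbf{J}(\mathbf{r}')\cdot\nabla_{\mathbf{r}}\right]\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\right)dv' \tag{1.23}$$

According to (1.10), the first term in the integral is $4\pi\,\mathbf{J}(\mathbf{r}')\,\delta^3(\mathbf{r}'-\mathbf{r})$, which is easily integrated. To make progress on the second term, we observe that the gradient can be changed to operate on the primed variables without affecting the final result (i.e. $\nabla_{\mathbf{r}} \to -\nabla_{\mathbf{r}'}$). In addition, we take advantage of the vector integral theorem (0.13) to arrive at

$$\nabla\times\mathbf{B}(\mathbf{r}) = \mu_0\mathbf{J}(\mathbf{r}) - \frac{\mu_0}{4\pi}\int_V \frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\left[\nabla_{\mathbf{r}'}\cdot\mathbf{J}(\mathbf{r}')\right]dv' + \frac{\mu_0}{4\pi}\oint_S \frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\left[\mathbf{J}(\mathbf{r}')\cdot\hat{\mathbf{n}}\right]da' \tag{1.24}$$

The last term in (1.24) vanishes if we assume that the current density $\mathbf{J}$ is completely contained within the volume $V$ so that it is zero at the surface $S$. Thus, the expression for the curl of $\mathbf{B}(\mathbf{r})$ reduces to

$$\nabla\times\mathbf{B}(\mathbf{r}) = \mu_0\mathbf{J}(\mathbf{r}) - \frac{\mu_0}{4\pi}\int_V \frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\left[\nabla_{\mathbf{r}'}\cdot\mathbf{J}(\mathbf{r}')\right]dv' \tag{1.25}$$

The latter term in (1.25) vanishes if $\nabla\cdot\mathbf{J} = 0$, yielding Ampere's law (1.21).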
The integral form of Ampere's law can be obtained by integrating both sides of (1.21) over an open surface $S$, bounded by contour $C$. Stokes' theorem (0.12) is applied to the left-hand side of Ampere's law to get

$$\oint_C \mathbf{B}(\mathbf{r})\cdot d\boldsymbol{\ell} = \mu_0\int_S \mathbf{J}(\mathbf{r})\cdot\hat{\mathbf{n}}\,da \equiv \mu_0 I \tag{1.26}$$

Figure 1.5 Ampere's law.

This law says that the line integral of $\mathbf{B}$ around a closed loop $C$ is proportional to the total current flowing through the loop (see Fig. 1.5). Recall that the units of $\mathbf{J}$ are current per area, so the surface integral containing $\mathbf{J}$ yields the current $I$ in units of charge per time.

It is important to note that during the derivation of Ampere's law (1.21) one must make the approximation

$$\nabla\cdot\mathbf{J} \cong 0 \qquad \text{(steady-state approximation)} \tag{1.27}$$

(see Example 1.2). This approximation is valid only if the current density $\mathbf{J}$ does not vary rapidly in time, which is not true in general (especially for optical phenomena). Thus, we need to do some additional work to get Ampere's law into a form that is true in non-steady-state situations.
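The link between the Biot-Savart law and Ampere's law can be seen in the standard case of a long straight wire carrying a steady current. The sketch below assumes the usual thin-wire reduction of (1.14), in which $\mathbf{J}\,dv'$ is replaced by $I\,d\boldsymbol{\ell}'$ (this reduction, the wire length, and the numerical values are assumptions for illustration), and compares the result with $B = \mu_0 I/2\pi s$ obtained from (1.26) on a circle of radius $s$ around the wire.

```python
import math

MU_0 = 4.0e-7 * math.pi  # permeability of vacuum in T*m/A (classical defined value)


def b_from_biot_savart(current, s, half_length=5.0, steps=100_000):
    """Field magnitude at distance s from a straight wire along z, found by
    summing Biot-Savart contributions over -half_length < z < half_length."""
    dz = 2.0 * half_length / steps
    total = 0.0
    for k in range(steps):
        z = -half_length + (k + 0.5) * dz
        # |dl x (r - r')| / |r - r'|^3 for a wire element at height z
        total += s * dz / (s * s + z * z) ** 1.5
    return MU_0 * current / (4.0 * math.pi) * total


def b_from_ampere(current, s):
    """Field magnitude from the integral form of Ampere's law (1.26)."""
    return MU_0 * current / (2.0 * math.pi * s)


I = 2.0    # hypothetical current in amperes
S = 0.05   # hypothetical distance from the wire in meters
print(b_from_biot_savart(I, S), b_from_ampere(I, S))
```

The two values agree to within the small error from truncating the wire at a finite length, which shrinks as `half_length` grows relative to `s`.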
1.6 Maxwell's Adjustment to Ampere's Law

Maxwell was the first to realize that Ampere's law is incomplete as written in (1.21) when the current density $\mathbf{J}$ varies dynamically in time. To arrive at the correct equation, we need to understand what the expression $\nabla\cdot\mathbf{J}$ (i.e. the term that we neglected) represents. Consider a volume of space enclosed by a surface $S$ through which current is flowing. The total current exiting the volume is

$$I = \oint_S \mathbf{J}\cdot\hat{\mathbf{n}}\,da \tag{1.28}$$

where $\hat{\mathbf{n}}$ is the outward normal to the surface. The units on this equation are that of current, or charge per time, leaving the volume. Since we have considered a closed surface $S$, the net current leaving the enclosed volume $V$ must be the same as the rate at which charge within the volume vanishes:

$$I = -\frac{\partial}{\partial t}\int_V \rho\,dv \tag{1.29}$$

Upon equating these two expressions for current, as well as applying the divergence theorem (0.11) to the former, we get

$$\int_V \nabla\cdot\mathbf{J}\,dv = -\int_V \frac{\partial\rho}{\partial t}\,dv \tag{1.30}$$

or

$$\int_V \left(\nabla\cdot\mathbf{J} + \frac{\partial\rho}{\partial t}\right)dv = 0 \tag{1.31}$$

James Clerk Maxwell (1831–1879, Scottish) Maxwell is best known for his fundamental contributions to electricity and magnetism and the kinetic theory of gases. He studied numerous other subjects, including the human perception of color and color-blindness, and is credited with producing the first color photograph. He originally postulated that electromagnetic waves propagated in a mechanical "luminiferous ether", but subsequent experiments have found this model untenable. He founded the Cavendish laboratory at Cambridge in 1874, which has produced 28 Nobel prizes to date.

Since (1.31) is true regardless of which volume $V$ we choose, it implies the following continuity equation:

$$\nabla\cdot\mathbf{J} = -\frac{\partial\rho}{\partial t} \tag{1.32}$$

The continuity equation (1.32) is simply a statement of the conservation of charge. It requires that the charge inside a volume must decrease in time if we are to have a net current flowing out of the volume. This is not a concern in the steady-state situation (where the form of Ampere's law derived in the previous section applies) since a steady current has equal amounts of charge flowing both into and out of any particular volume (i.e. $\partial\rho/\partial t = 0$).

Maxwell's main contribution (aside from organizing other people's formulas) was the injection of the continuity equation (1.32) into the derivation of Ampere's law to make it applicable to dynamical situations. If we insert the continuity equation (1.32) into the derivation of Ampere's law (see Example 1.3), we find the generalized form of Ampere's law:

$$\nabla\times\frac{\mathbf{B}}{\mu_0} = \mathbf{J} + \epsilon_0\frac{\partial\mathbf{E}}{\partial t} \tag{1.33}$$

The final term is known as the displacement current (density), which exists even in the absence of any actual charge density $\rho$. It indicates that a changing electric field behaves like a current in the sense that it produces magnetic fields. The similarity between Faraday's law and the corrected Ampere's law is apparent, and no doubt played a part in motivating Maxwell's work.
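A quick consistency check shows why the displacement-current term is exactly what charge conservation requires. Taking the divergence of the generalized Ampere's law (1.33), using the fact that the divergence of a curl vanishes (P 0.7), and substituting Gauss's law (1.8) recovers the continuity equation (1.32):

```latex
\begin{align*}
0 = \nabla\cdot\left(\nabla\times\frac{\mathbf{B}}{\mu_0}\right)
  &= \nabla\cdot\mathbf{J} + \epsilon_0\frac{\partial}{\partial t}\left(\nabla\cdot\mathbf{E}\right) \\
  &= \nabla\cdot\mathbf{J} + \epsilon_0\frac{\partial}{\partial t}\left(\frac{\rho}{\epsilon_0}\right) \\
  &= \nabla\cdot\mathbf{J} + \frac{\partial\rho}{\partial t}
\end{align*}
```

Without the displacement-current term, the same steps applied to (1.21) would force $\nabla\cdot\mathbf{J} = 0$, which is the steady-state restriction (1.27).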
Example 1.3
Use the continuity equation and the Biot-Savart law to derive the corrected form of Ampere's law.

Solution: The derivation proceeds as in Example 1.2 until we reach (1.25). Inserting the continuity equation into (1.25) yields:

$$\nabla\times\mathbf{B} = \mu_0\mathbf{J} + \frac{\mu_0}{4\pi}\frac{\partial}{\partial t}\int_V \rho(\mathbf{r}')\,\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\,dv'$$

Then substitution of (1.7) into this formula gives

$$\nabla\times\frac{\mathbf{B}}{\mu_0} = \mathbf{J} + \epsilon_0\frac{\partial\mathbf{E}}{\partial t}$$

In summary, in the previous section we saw that the basic physics in Ampere's law is present in the Biot-Savart law (the two laws are connected through mathematics). Adding the requirement of charge conservation yields the corrected form, which can be used with rapidly varying fields.
1.7 Polarization of Materials

We are essentially nished with our analysis of Maxwells equations except for a brief examination of current density J and charge density . The current density can be decomposed into three categories. The rst category is associated with charges that are free to move, such as electrons in a metal. We will denote this type of current density by Jfree . The second category is associated with effective currents inside individual atoms that give rise to paramagnetic and diamagnetic effects. These are seldom important in optics problems, and so we will ignore these types of currents. The third type of current occurs when molecules in a material become polarized (i.e. elongate or orient as dipoles) in response to an applied electric eld. We denote this type of current by Jp to distinguish it from free currents. The total current (ignoring magnetic effects) is then J = Jfree + Jp (1.34)
The polarization current Jp is associated with a dipole distribution function P (r), called the polarization (in units of dipoles per volume, or charge times length per volume). Physically, if the dipoles (depicted in Fig. 1.6) change their orientation as a function of time in some coordinated fashion, an effective current density results. (In a static P (r) no charges are moving, so there is no current.) Since the time-derivative of dipole moments renders charge times velocity, a distribution of sloshing dipoles gives a current density equal to Jp = P t (1.35)
With this, Maxwells equation (1.33) becomes B = Jfree + 0

0
E P + t t
(1.36)
1.7 Polarization of Materials
29
In the study of light and optics, we seldom consider the propagation of electromagnetic waveforms through electrically charged materials. In the case of no net charge, one might be tempted to set the right-hand side of Gausss law (1.1) to zero. However, this would be wrong because neutral materials can become polarized, as described by P (r). The degree of polarization can vary spatially within a material, leading to local concentrations of positive or negative charge even though on average the material is neutral. This local buildup of charge due to the polarization current obeys the continuity equation Jp = p t (1.37)
where p is the charge density created by variations in the polarization P(r). Substitution of (1.35) into this equation yields an expression for the resulting charge density p : p = P (1.38) To further appreciate local charge buildup due to variation medium polarization, consider the divergence theorem (0.11) applied to P (r) in a neutral medium:
S
da = P (r ) n
V
P (r) d v
(1.39)
The left-hand side of (1.39) is a surface integral, which after integrating gives units of charge. Physically, it is the sum of the charges touching the inside of surface $S$ (multiplied by a minus since dipole vectors point from the negatively charged end of a molecule to the positively charged end). The situation is depicted in Fig. 1.6. Keep in mind that $\mathbf{P}(\mathbf{r})$ is a continuous function so that Fig. 1.6 depicts crudely an enormous number of very tiny dipoles (no fair drawing a surface that avoids cutting the dipoles; cut through them at random). When $\nabla \cdot \mathbf{P}$ is zero, there are equal numbers of positive and negative charges touching $S$ from within. When $\nabla \cdot \mathbf{P}$ is not zero, the positive and negative charges touching $S$ are not balanced. Essentially, excess charge ends up within the volume because the non-uniform alignment of dipoles causes them to be cut preferentially at the surface. Since either side of (1.39) is equal to the excess charge inside the volume, $-\nabla \cdot \mathbf{P}$ may be interpreted as a charge density (it certainly has the right units: charge per volume), in agreement with (1.38). Again, the negative sign occurs since when $\mathbf{P}$ points out of the surface $S$, negative charges are left inside. The total charge density thus can be written as

$$\rho = \rho_{\rm free} + \rho_p \tag{1.40}$$

With (1.38), Gauss's law (1.8) becomes

$$\nabla \cdot \mathbf{E} = \frac{\rho_{\rm free} - \nabla \cdot \mathbf{P}}{\epsilon_0} \tag{1.41}$$

Figure 1.6 A polarized medium with (a) $\nabla \cdot \mathbf{P} = 0$ and with (b) $\nabla \cdot \mathbf{P} \neq 0$.

For typical optics problems (involving neutral materials), we have $\rho_{\rm free} = 0$.
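As a rough numerical illustration of (1.38) and (1.39), the sketch below uses a hypothetical one-dimensional polarization profile `polarization(x)` (an assumption chosen only for illustration, not taken from the text). It estimates $\rho_p = -dP_x/dx$ by finite differences and compares the bound charge per unit area between two planes with $-[P_x(b) - P_x(a)]$, which is the one-dimensional analogue of the two sides of (1.39).

```python
import math

def polarization(x):
    """Hypothetical 1D polarization profile P_x(x) in C/m^2 (illustrative only)."""
    return 1.0e-6 * math.exp(-x**2)

def bound_charge_density(x, h=1e-5):
    """rho_p = -dP_x/dx, the 1D form of (1.38), by central difference."""
    return -(polarization(x + h) - polarization(x - h)) / (2 * h)

def enclosed_charge_per_area(a, b, n=20000):
    """Integrate rho_p from a to b with the midpoint rule."""
    dx = (b - a) / n
    return sum(bound_charge_density(a + (i + 0.5) * dx) for i in range(n)) * dx

a, b = 0.0, 2.0
volume_side = enclosed_charge_per_area(a, b)         # integral of -dP/dx over the slab
surface_side = -(polarization(b) - polarization(a))  # -[P(b) - P(a)] on the two faces
print(volume_side, surface_side)
```

The two printed numbers should agree to within the finite-difference error, even though the slab as a whole was assumed to be built from neutral dipoles.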

1.8 The Macroscopic Maxwell Equations

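In summary, in electrically neutral non-magnetic materials, Maxwell's equations are

$$\nabla \cdot \mathbf{E} = -\frac{\nabla \cdot \mathbf{P}}{\epsilon_0} \qquad \text{(Coulomb's law} \Rightarrow \text{Gauss's law)} \tag{1.42}$$

$$\nabla \cdot \mathbf{B} = 0 \qquad \text{(Biot-Savart law} \Rightarrow \text{Gauss's law for magnetism)} \tag{1.43}$$

$$\nabla \times \mathbf{E} = -\frac{\partial \mathbf{B}}{\partial t} \qquad \text{(Faraday's law)} \tag{1.44}$$

$$\frac{\nabla \times \mathbf{B}}{\mu_0} = \epsilon_0 \frac{\partial \mathbf{E}}{\partial t} + \frac{\partial \mathbf{P}}{\partial t} + \mathbf{J}_{\rm free} \qquad \text{(Ampere's law; fixed by Maxwell)} \tag{1.45}$$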
Notice that the assumption of an electrically neutral material dismisses the possibility of a free charge density $\rho_{\rm free}$, but we have retained the possibility of free current density $\mathbf{J}_{\rm free}$. This is not a contradiction. In a neutral material, some charges may move differently than their oppositely charged counterparts, such as electrons versus ions in a metal. This gives rise to currents without the requirement of a net charge.

It is common to see the macroscopic Maxwell equations written in terms of two auxiliary fields: $\mathbf{H}$ and $\mathbf{D}$. For the benefit of those who have used these auxiliary fields before, we take a moment to review their definitions and explain why we don't use them in this book. The field $\mathbf{H}$ is useful in magnetic materials. In these materials, the combination $\mathbf{B}/\mu_0$ in Ampere's law is replaced by $\mathbf{H} \equiv \mathbf{B}/\mu_0 - \mathbf{M}$, where $\mathbf{M}$ is the material's magnetization. Since we only consider nonmagnetic materials ($\mathbf{M} = 0$), there is little point in using $\mathbf{H}$. The field $\mathbf{D}$, called the displacement, is defined as $\mathbf{D} \equiv \epsilon_0 \mathbf{E} + \mathbf{P}$. This combination of $\mathbf{E}$ and $\mathbf{P}$ occurs in Coulomb's law and Ampere's law. For the purposes of this book, it is conceptually clearer to retain the polarization $\mathbf{P}$ as a separate field in these two equations. For instance, in isotropic media $\nabla \cdot \mathbf{P}$ is zero in (1.42) while $\partial \mathbf{P}/\partial t$ is non-zero in (1.45).
1.9 The Wave Equation

When Maxwell unified electromagnetic theory, he immediately noticed that waves are solutions to this set of equations. In fact, his desire to find a set of equations that allowed for waves aided his effort to find the correct equations. After all, it was already known that light traveled as waves, Kirchhoff had previously noticed that $1/\sqrt{\epsilon_0 \mu_0}$ gives the correct speed of light $c = 3.00 \times 10^{8}$ m/s (which had already been measured), and Faraday and Kerr had observed that strong magnetic and electric fields affect light propagating in crystals. At first glance, Maxwell's equations might not immediately suggest (to the inexperienced eye) that waves are solutions. However, we can manipulate the equations (first order differential equations coupling $\mathbf{E}$ and $\mathbf{B}$) into the familiar wave equation (second order differential equations for either $\mathbf{E}$ or $\mathbf{B}$, decoupled).
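We will derive the wave equation for $\mathbf{E}$. The derivation of the wave equation for $\mathbf{B}$ is very similar (see problem P 1.7). We begin our derivation by taking the curl of (1.44), from which we obtain

$$\nabla \times (\nabla \times \mathbf{E}) + \frac{\partial}{\partial t}(\nabla \times \mathbf{B}) = 0 \tag{1.46}$$

Then we eliminate $\nabla \times \mathbf{B}$ by substitution from (1.45) to obtain the wave equation for the electric field:

$$\nabla \times (\nabla \times \mathbf{E}) + \mu_0 \epsilon_0 \frac{\partial^2 \mathbf{E}}{\partial t^2} = -\mu_0 \frac{\partial \mathbf{J}_{\rm free}}{\partial t} - \mu_0 \frac{\partial^2 \mathbf{P}}{\partial t^2} \tag{1.47}$$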
We can put the wave equation into a more familiar form using the differential vector identity (see P 0.8):

$$\nabla \times (\nabla \times \mathbf{E}) = \nabla (\nabla \cdot \mathbf{E}) - \nabla^2 \mathbf{E} \tag{1.48}$$
Using this identity with (1.47), and applying Coulomb's law (1.42) to the $\nabla (\nabla \cdot \mathbf{E})$ term, we get

$$\nabla^2 \mathbf{E} - \mu_0 \epsilon_0 \frac{\partial^2 \mathbf{E}}{\partial t^2} = \mu_0 \frac{\partial \mathbf{J}_{\rm free}}{\partial t} + \mu_0 \frac{\partial^2 \mathbf{P}}{\partial t^2} - \frac{1}{\epsilon_0} \nabla (\nabla \cdot \mathbf{P}) \tag{1.49}$$

The left-hand side of (1.49) is the familiar wave equation. However, the right-hand side contains a number of source terms, which arise when various currents and polarizations are present. The first term on the right-hand side of (1.49) describes electric currents, which are important for determining the reflection of light from a metallic surface or for determining the propagation of light within a plasma. The second term on the right-hand side of (1.49) describes dipole oscillations, which behave similarly to currents. In a non-conducting optical material such as glass, the free current is zero, but $\partial^2 \mathbf{P}/\partial t^2$ is not zero, as the medium polarization responds to the light field. This polarization current determines the refractive index of the material (discussed in chapter 2). The final term on the right-hand side of (1.49) is important in anisotropic media such as crystals. In this case, the polarization $\mathbf{P}$ responds to the electric field along a direction not necessarily parallel to $\mathbf{E}$, due to the influence of the crystal lattice (addressed in chapter 5).

For most problems in optics, some of the terms on the right-hand side of (1.49) are zero. However, usually at least one of the terms must be retained when considering propagation in a medium other than vacuum. After solving the wave equation (1.49) for $\mathbf{E}$, one may obtain $\mathbf{B}$ through an application of Faraday's law (1.44). Even though the magnetic field $\mathbf{B}$ satisfies a similar wave equation, decoupled from $\mathbf{E}$ (see P 1.7), the two waves are not independent. The fields for $\mathbf{E}$ and $\mathbf{B}$ must be chosen to be consistent with each other through Maxwell's equations.

In vacuum all of the terms on the right-hand side in (1.49) are zero, in which case the equation reduces to

$$\nabla^2 \mathbf{E} - \mu_0 \epsilon_0 \frac{\partial^2 \mathbf{E}}{\partial t^2} = 0 \qquad \text{(vacuum)} \tag{1.50}$$
The solutions to the vacuum wave equation (1.50) propagate with speed

$$c \equiv \frac{1}{\sqrt{\epsilon_0 \mu_0}} = 2.9979 \times 10^{8} \text{ m/s} \qquad \text{(vacuum)} \tag{1.51}$$

and any function $\mathbf{E}$ is a valid solution as long as it carries the dependence on the argument $\hat{\mathbf{u}} \cdot \mathbf{r} - ct$, where $\hat{\mathbf{u}}$ is a unit vector specifying the direction of propagation. The argument $\hat{\mathbf{u}} \cdot \mathbf{r} - ct$ preserves the shape of the waveform as it propagates in the $\hat{\mathbf{u}}$ direction; features occurring at a given position recur downstream at a distance $ct$ after a time $t$. By checking this solution in (1.50), one effectively verifies that the speed of propagation is $c$ (see P 1.9). Note that we may add together any combination of solutions (even with differing directions of propagation) to form other valid solutions.
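The claim that any waveform depending only on $\hat{\mathbf{u}} \cdot \mathbf{r} - ct$ solves (1.50) can be spot-checked numerically. The sketch below is a minimal one-dimensional illustration: the pulse shape `waveform`, the sample point, and the step size are arbitrary assumptions, and the numerical values of $\epsilon_0$ and $\mu_0$ are standard constants rather than values quoted in the text.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m (standard value, not from the text)
MU0 = 4e-7 * math.pi      # vacuum permeability, H/m (conventional value; an assumption)
C = 1.0 / math.sqrt(EPS0 * MU0)   # speed of light from (1.51)

def waveform(s):
    """Arbitrary pulse shape; any smooth function of s = z - c*t would do."""
    return math.exp(-s**2) * math.cos(3.0 * s)

def field(z, t):
    """E(z, t) for propagation along +z: depends only on z - c*t."""
    return waveform(z - C * t)

def wave_equation_residual(z, t, hz=1e-3):
    """Second-difference estimate of d2E/dz2 - (1/c^2) d2E/dt2."""
    ht = hz / C  # time step matched to the spatial step
    d2z = (field(z + hz, t) - 2 * field(z, t) + field(z - hz, t)) / hz**2
    d2t = (field(z, t + ht) - 2 * field(z, t) + field(z, t - ht)) / ht**2
    return d2z - d2t / C**2

print(C)                                   # about 2.998e8 m/s
print(wave_equation_residual(0.4, 1e-9))   # close to zero
```

Replacing `waveform` with a different smooth function should leave the residual near zero, which is the content of P 1.9 in numerical form.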
Exercises
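Exercises for 1.1 Introduction
P1.1 Suppose that an electric field is given by $\mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0 \cos(\mathbf{k} \cdot \mathbf{r} - \omega t + \phi)$, where $\mathbf{k} \perp \mathbf{E}_0$ and $\phi$ is a constant phase. Show that $\mathbf{B}(\mathbf{r}, t) = \dfrac{\mathbf{k} \times \mathbf{E}_0}{\omega} \cos(\mathbf{k} \cdot \mathbf{r} - \omega t + \phi)$ is consistent with (1.3).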
Exercises for 1.2 Gauss's Law and Coulomb's Law
P1.2 Consider an infinitely long hollow cylinder (inner radius $a$, outer radius $b$) which carries a volume charge density $\rho = k/s^2$ for $a < s < b$ and no charge elsewhere, where $s$ is the distance from the axis of the cylinder as shown in Fig. 1.7. Use Gauss's Law in integral form to find the electric field produced by this charge for each of the three regions: $s < a$, $a < s < b$, and $s > b$. HINT: For each region first draw an appropriate Gaussian surface and integrate the charge density over the volume to figure out the enclosed charge. Then use Gauss's law in integral form and the symmetry of the problem to solve for the electric field.
Figure 1.7 A charged cylinder with charge located between a and b .
Exercises for 1.5 Ampere's Law
P1.3 A conducting cylinder with the same geometry as P 1.2 carries a volume current density $\mathbf{J} = (k/s)\,\hat{\mathbf{z}}$ along the axis of the cylinder for $a < s < b$. Using Ampere's Law in integral form, find the magnetic field due to this current. Find the field for each of the three regions: $s < a$, $a < s < b$, and $s > b$. HINT: For each region first draw an appropriate Amperian loop and integrate the current density over the surface to figure out how much current passes through the loop. Then use Ampere's law in integral form and the symmetry of the problem to solve for the magnetic field.

Exercises for 1.6 Maxwell's Adjustment to Ampere's Law
P1.4 (a) Use Gauss's law to find the electric field in a gap introduced in a current-carrying wire, as shown in Fig. 1.8. Assume that the cross-sectional area of the wire $A$ is much wider than the gap separation $d$. Let the accumulated charge on the plates be $Q$. HINT: The electric field is essentially zero except in the gap.
(b) Find the strength of the magnetic field on contour $C$ using Ampere's law applied to surface $S_1$. Let the current in the wire be $I$.
(c) Show that the displacement current leads to the identical magnetic field when using surface $S_2$. HINT: Multiply $\epsilon_0\, \partial E/\partial t$ by the cross-sectional area to obtain a current. The current in the wire is related to the charge $Q$ through $I = \partial Q/\partial t$.

Figure 1.8 Charging capacitor.
Exercises for 1.8 The Macroscopic Maxwell Equations
P1.5 Memorize the Maxwell equations (1.1)-(1.4) and the macroscopic Maxwell equations (1.42)-(1.45). Be prepared to reproduce them from memory on an exam. Write them from memory in your homework to indicate that you have completed this problem. Also write the assumptions made in writing the macroscopic Maxwell equations in this way.
P1.6 For the fields given in P 1.1, what are the implications for $\mathbf{J}_{\rm free} + \partial \mathbf{P}/\partial t$?

Exercises for 1.9 The Wave Equation
P1.7 Derive the wave equation for the magnetic field $\mathbf{B}$ in vacuum (i.e. $\mathbf{J}_{\rm free} = 0$ and $\mathbf{P} = 0$).
P1.8 Show that the magnetic field in P 1.1 is consistent with the wave equation derived in P 1.7.
P1.9 Check that $\mathbf{E}(\hat{\mathbf{u}} \cdot \mathbf{r} - ct)$ satisfies the vacuum wave equation (1.50), where $\mathbf{E}$ is an arbitrary functional form.
P1.10 (a) Show that $\mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0 \cos\left[k(\hat{\mathbf{u}} \cdot \mathbf{r} - ct) + \phi\right]$ is a solution to the vacuum wave equation (1.50), where $\hat{\mathbf{u}}$ is an arbitrary unit vector and $k$ is a constant with units of inverse length.
(b) Show that each wave front forms a plane, which is why such solutions are often called plane waves. HINT: A wavefront is a surface in space where the argument of the cosine (i.e., the phase of the wave) has a constant value. Set the cosine argument to an arbitrary constant and see what positions are associated with that phase.
(c) Determine the speed $v = \Delta r/\Delta t$ that a wave front moves in the $\hat{\mathbf{u}}$ direction. HINT: Set the cosine argument to a constant, solve for $\mathbf{r}$, and differentiate.
(d) By analysis, determine the wavelength $\lambda$ in terms of $k$. HINT: Find the distance between identical wave fronts by changing the cosine argument by $2\pi$ at a given instant in time.
(e) Use (1.42) to show that $\mathbf{E}_0$ and $\hat{\mathbf{u}}$ must be perpendicular to each other in vacuum.
L1.11 Measure the speed of light using a rotating mirror. Provide an estimate of the experimental uncertainty in your answer (not the percentage error from the known value).
Figure 1.9 A schematic of the setup for lab 1.11.

Figure 1.10 shows a simplified geometry for the optical path for light in this experiment. Laser light from A reflects from a rotating mirror at B towards C. The light returns to B, where the mirror has rotated, sending the light to point D. Notice that a mirror rotation of $\theta$ deflects the beam by $2\theta$.

Figure 1.10 Geometry for lab 1.11.

P1.12 Ole Roemer made the first successful measurement of the speed of light in 1676 by observing the orbital period of Io, a moon of Jupiter with a period of 42.5 hours. When Earth is moving toward Jupiter, the period is measured to be shorter than 42.5 hours because light indicating the end of the moon's orbit travels less distance than light indicating the beginning. When Earth is moving away from Jupiter, the situation is reversed, and the period is measured to be longer than 42.5 hours.
(a) If you were to measure the time for 40 observed orbits of Io when Earth is moving directly toward Jupiter and then several months later measure the time for 40 observed orbits when Earth is moving directly away from Jupiter, what would you expect the difference between these two measurements to be? Take the Earth's orbital radius to be $1.5 \times 10^{11}$ m. To simplify the geometry, just assume that Earth moves directly toward or away from Jupiter over the entire 40 orbits (see Fig. 1.11).
Figure 1.11 Geometry for P 1.12 (labels in the figure: Earth, Sun, Io, Jupiter, Earth).

(b) Roemer did the experiment described in part (a), and experimentally measured a 22 minute difference. What speed of light would one deduce from that value?

Ole Roemer (1644–1710, Danish) Roemer was a man of many interests. In addition to measuring the speed of light, he created a temperature scale which with slight modification became the Fahrenheit scale, introduced a system of standard weights and measures, and was heavily involved in civic affairs (city planning, etc.). Scientists initially became interested in Io's orbit because its eclipse (when it went behind Jupiter) was an event that could be seen from many places on earth. By comparing accurate measurements of the local time when Io was eclipsed by Jupiter at two remote places on earth, scientists in the 1600s were able to determine the longitude difference between the two places.

P1.13 Suppose we have an electric field given by $\mathbf{E} = \left(7x^2y^3\,\hat{\mathbf{x}} + 2z^4\,\hat{\mathbf{y}}\right)\cos(\omega t)$.
(a) Use Gauss's law (1.1) to find the charge density $\rho(x, y, z, t)$.
(b) Use Faraday's law (1.3) to find $\partial \mathbf{B}(x, y, z, t)/\partial t$.
(c) Determine if $\mathbf{E}$ is a solution to the vacuum wave equation, (1.50).

P1.14 In an isotropic medium (i.e. $\nabla \cdot \mathbf{P} = 0$), the polarization can often be written as a function of the electric field: $\mathbf{P} = \epsilon_0 \chi(E)\,\mathbf{E}$, where $\chi(E) = \chi_1 + \chi_2 E + \chi_3 E^2 + \cdots$. The higher order coefficients in the expansion (i.e. $\chi_2$, $\chi_3$, ...) are typically small, so only the first term is important at low intensities. The field of nonlinear optics deals with intense light-matter interactions, where the higher order terms of the expansion are important. This can lead to phenomena such as harmonic generation. Starting with Maxwell's equations, derive the wave equation for nonlinear optics in an isotropic medium:

$$\nabla^2 \mathbf{E} - \mu_0 \epsilon_0 (1 + \chi_1) \frac{\partial^2 \mathbf{E}}{\partial t^2} = \mu_0 \epsilon_0 \frac{\partial^2 \left[\left(\chi_2 E + \chi_3 E^2 + \cdots\right)\mathbf{E}\right]}{\partial t^2} + \mu_0 \frac{\partial \mathbf{J}}{\partial t}$$

We have retained the possibility of current here since, for example, in a gas some of the molecules might ionize in the presence of a strong field, giving rise to a current.
Chapter 2
Plane Waves and Refractive Index

2.1 Introduction
In this chapter we focus on sinusoidal solutions of Maxwell's equations, called plane waves, and their interaction with matter. Restricting our attention to plane wave solutions may seem somewhat limiting at first, since (as mentioned in chapter 1) any waveform can satisfy the wave equation in vacuum (and therefore Maxwell's equations) as long as it travels at $c$ and has appropriate connections between $\mathbf{E}$ and $\mathbf{B}$. It turns out, however, that an arbitrary waveform can be constructed from a linear superposition of sinusoidal waves. Thus, we can model the behavior of more complicated waveforms by considering the behavior of many sinusoidal waves and then summing them to produce the desired waveform.

The electric field of a plane wave induces oscillating dipoles in a medium, and these oscillating dipoles feed back on the electric field to make the wave propagate at a different speed than it would in vacuum. We describe this effect using the index of refraction. Since materials respond differently to different frequencies of light, plane waves of different frequencies travel at different speeds in materials. Thus, when we have a waveform composed of many frequencies, it invariably changes its shape when it travels as its frequency components change relationship with one another. This phenomenon (called dispersion) is one of the primary reasons why physicists and engineers choose to work with sinusoidal waves.

When describing plane waves, it is convenient to use complex number notation. This is particularly true for problems involving absorption of light such as what takes place inside metals and, to a lesser degree (usually), inside dielectrics (e.g. glass). In such cases, oscillatory fields decay as they travel, owing to absorption. We will introduce complex electric field waves in section 2.2, and the index of refraction in section 2.3. When the electric field is represented using complex notation, the index of refraction also becomes a complex number. The imaginary part controls the rate at which the field decays, while the real part governs the familiar oscillatory behavior. Complex notation will be used extensively in this chapter (and throughout the rest of the book), so we recommend that the student thoroughly study the review on complex notation found in section 0.3 at this point.

In sections 2.4 and 2.5 we consider a very successful physical model developed by Hendrik Lorentz for describing the index of refraction in both dielectrics and conductors. In section 2.6, we introduce Poynting's theorem, which governs the flow of energy carried by electromagnetic fields. This leads to the concept of irradiance (or intensity), which we discuss in the plane-wave context in section 2.7.

| | Frequency $\nu = \omega/2\pi$ | Wavelength $\lambda_{\rm vac}$ |
|---|---|---|
| AM Radio | $10^{6}$ Hz | 300 m |
| FM Radio | $10^{8}$ Hz | 3 m |
| Radar | $10^{10}$ Hz | 0.03 m |
| Microwave | $10^{9}$ to $10^{12}$ Hz | 0.3 m to $3 \times 10^{-4}$ m |
| Infrared | $10^{12}$ to $4 \times 10^{14}$ Hz | $3 \times 10^{-4}$ to $7 \times 10^{-7}$ m |
| Light (red) | $4.6 \times 10^{14}$ Hz | $6.5 \times 10^{-7}$ m |
| Light (yellow) | $5.5 \times 10^{14}$ Hz | $5.5 \times 10^{-7}$ m |
| Light (blue) | $6.7 \times 10^{14}$ Hz | $4.5 \times 10^{-7}$ m |
| Ultraviolet | $10^{15}$ to $10^{17}$ Hz | $4 \times 10^{-7}$ to $3 \times 10^{-9}$ m |
| X-rays | $10^{17}$ to $10^{20}$ Hz | $3 \times 10^{-9}$ to $3 \times 10^{-12}$ m |
| Gamma rays | $10^{20}$ to $10^{23}$ Hz | $3 \times 10^{-12}$ to $3 \times 10^{-15}$ m |

Table 2.1 The electromagnetic spectrum.
2.2 Plane Wave Solutions to the Wave Equation

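Consider the wave equation for an electric field waveform propagating in vacuum (1.50):

$$\nabla^2 \mathbf{E} - \mu_0 \epsilon_0 \frac{\partial^2 \mathbf{E}}{\partial t^2} = 0 \tag{2.1}$$

We are interested in solutions to (2.1) that have the functional form (see P 1.10)

$$\mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0 \cos(\mathbf{k} \cdot \mathbf{r} - \omega t + \phi) \tag{2.2}$$

Here $\phi$ represents an arbitrary (constant) phase term. The vector $\mathbf{k}$, called the wave vector, may be written as

$$\mathbf{k} = k\hat{\mathbf{u}} \equiv \frac{2\pi}{\lambda_{\rm vac}} \hat{\mathbf{u}} \qquad \text{(vacuum)} \tag{2.3}$$

where $k$ has units of inverse length, $\hat{\mathbf{u}}$ is a unit vector defining the direction of propagation, and $\lambda_{\rm vac}$ is the length by which $\mathbf{r}$ must vary to cause the cosine to go through a complete cycle. This distance is known as the (vacuum) wavelength. The frequency of oscillation is related to the wavelength via

$$\omega = \frac{2\pi c}{\lambda_{\rm vac}} \qquad \text{(vacuum)} \tag{2.4}$$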
Figure 2.1 Depiction of electric and magnetic fields associated with a plane wave.
Notice that $k$ and $\omega$ are not independent of each other, but are related through the vacuum dispersion relation

$$k = \frac{\omega}{c} \qquad \text{(vacuum)} \tag{2.5}$$
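A short numerical sketch of (2.3) through (2.5) follows. The helper name `plane_wave_parameters` is hypothetical, and the sample wavelength is simply a red-light value of the kind listed in table 2.1; the point is only that $k$, $\omega$, and $\nu = \omega/2\pi$ all follow from a single choice of $\lambda_{\rm vac}$.

```python
import math

C = 2.9979e8  # speed of light in vacuum, m/s, as quoted in (1.51)

def plane_wave_parameters(wavelength_vac):
    """Return (k, omega, nu) for a given vacuum wavelength in meters."""
    k = 2 * math.pi / wavelength_vac           # magnitude of the wave vector, (2.3)
    omega = 2 * math.pi * C / wavelength_vac   # angular frequency, (2.4)
    nu = omega / (2 * math.pi)                 # ordinary frequency in Hz
    return k, omega, nu

k, omega, nu = plane_wave_parameters(6.5e-7)   # red light
print(k, omega, nu)
print(math.isclose(k, omega / C))              # vacuum dispersion relation (2.5)
```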
Typical values for $\lambda_{\rm vac}$ are given in table 2.1. Sometimes the spatial period of the wave is expressed as $1/\lambda_{\rm vac}$, in units of cm$^{-1}$, called the wave number. A magnetic wave accompanies any electric wave, and it obeys a similar wave equation (see P 1.7). The magnetic wave corresponding to (2.2) is

$$\mathbf{B}(\mathbf{r}, t) = \mathbf{B}_0 \cos(\mathbf{k} \cdot \mathbf{r} - \omega t + \phi), \tag{2.6}$$

but it is important to note that $\mathbf{B}_0$, $\mathbf{k}$, $\omega$, and $\phi$ are not independently chosen in (2.6). In order to satisfy Faraday's law (1.3), the arguments of the cosine in (2.2) and (2.6) must be identical. In addition, Faraday's law requires (see P 1.1)

$$\mathbf{B}_0 = \frac{\mathbf{k} \times \mathbf{E}_0}{\omega} \tag{2.7}$$
In vacuum, the electric and magnetic fields travel in phase. They are directed perpendicular to each other as defined by the cross product in (2.7). Since both fields are also perpendicular to the direction of propagation, given by $\mathbf{k}$, the magnitudes of the field vectors are related by $B_0 = kE_0/\omega$ or $B_0 = E_0/c$ in view of (2.5). The influence of the magnetic field only becomes important (in comparison to the electric field) for charged particles moving near the speed of light. This typically takes place only for extremely intense lasers (intensities above $10^{18}$ W/cm$^2$, see P 2.12) where the electric field is sufficiently strong to cause electrons to oscillate with velocities near the speed of light. Therefore, the magnetic field can be ignored in most optics problems. Throughout the remainder of this book, we will focus our attention mainly on the electric field with the understanding that we can at any time deduce the (less important) magnetic field from the electric field via Faraday's law.
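The geometry implied by (2.7) can be checked with a few lines of vector arithmetic. In the sketch below the wavelength, the propagation direction (along $z$), and the field amplitude are all hypothetical values chosen for illustration; the helper functions `cross`, `dot`, and `norm` are written out so that the block needs only the standard library.

```python
import math

C = 2.9979e8  # speed of light in vacuum, m/s

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

wavelength = 8.0e-7                  # hypothetical vacuum wavelength, m
k_mag = 2 * math.pi / wavelength
omega = k_mag * C                    # vacuum dispersion relation (2.5)
k_vec = (0.0, 0.0, k_mag)            # propagation along z
E0 = (3.0e3, 4.0e3, 0.0)             # hypothetical amplitude in V/m, transverse to k

B0 = tuple(comp / omega for comp in cross(k_vec, E0))   # (2.7)
print(B0)
print(norm(B0), norm(E0) / C)        # the two magnitudes should agree
print(dot(B0, E0), dot(B0, k_vec))   # both dot products should vanish
```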
The depiction of the electric field (2.2) and the associated magnetic field (2.6) in Fig. 2.1 shows the fields drawn like transverse waves on a string. However, they are actually large planar sheets containing uniform fields (different fields in different planes) that move in the direction of $\mathbf{k}$. The name plane wave is given since the argument in (2.2) at any moment is constant (and hence the electric field is uniform) across planes that are perpendicular to $\mathbf{k}$. A plane wave fills all space and may be thought of as a series of infinite sheets of uniform electric and magnetic field moving in the $\mathbf{k}$ direction.

At this point, we rewrite our plane wave solution using complex number notation. Although this change in notation will not make the task at hand any easier (and may even appear to complicate things), we introduce it here in preparation for later sections, where it will save considerable labor. (For a review of complex notation, see section 0.3.) Using complex notation we rewrite (2.2) as

$$\mathbf{E}(\mathbf{r}, t) = \mathrm{Re}\left\{\tilde{\mathbf{E}}_0 e^{i(\mathbf{k} \cdot \mathbf{r} - \omega t)}\right\} \tag{2.8}$$

where we have hidden the phase term $\phi$ inside of $\tilde{\mathbf{E}}_0$ as follows:

$$\tilde{\mathbf{E}}_0 \equiv \mathbf{E}_0 e^{i\phi} \tag{2.9}$$
The next step we take is to become intentionally sloppy. Physicists throughout the world have conspired to avoid writing $\mathrm{Re}\{\}$ in an effort (or lack thereof if you prefer) to make expressions less cluttered. Nevertheless, only the real part of the field is physically relevant even though expressions and calculations contain both real and imaginary terms. This sloppy notation is okay since the real and imaginary parts of complex numbers never intermingle when adding, subtracting, differentiating, or integrating. We can delay taking the real part of the expression until the end of the calculation. Also, when hiding a phase inside of the field amplitude as in (2.8), we drop the tilde (might as well since we are already being sloppy); when using complex notation, we will automatically assume that the complex field amplitude contains phase information. Putting this all together, our plane wave solution in complex notation is written simply as

$$\mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0 e^{i(\mathbf{k} \cdot \mathbf{r} - \omega t)} \tag{2.10}$$
It is possible to construct any electromagnetic disturbance from a linear superposition of such waves.
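The equivalence between the real form (2.2) and the real part of the complex form can be seen in a small numerical sketch. All numbers below (amplitude, phase, $k$, $\omega$, and the sample point) are arbitrary assumptions in arbitrary units, and the wave is taken to travel along $z$ for simplicity. The second half illustrates why delaying the real part is harmless under addition.

```python
import cmath
import math

def real_field(E0, phi, k, omega, z, t):
    """Real form (2.2) for propagation along z."""
    return E0 * math.cos(k * z - omega * t + phi)

def complex_field(E0_complex, k, omega, z, t):
    """Complex form (2.10); the amplitude carries the phase."""
    return E0_complex * cmath.exp(1j * (k * z - omega * t))

E0, phi = 2.0, 0.7       # hypothetical amplitude and phase
k, omega = 3.0, 5.0      # arbitrary units, chosen only for illustration
E0_complex = E0 * cmath.exp(1j * phi)   # phase hidden in the amplitude, (2.9)

z, t = 0.4, 1.3
a = real_field(E0, phi, k, omega, z, t)
b = complex_field(E0_complex, k, omega, z, t).real   # take Re{} at the end
print(a, b)

# Adding two waves: the real part of the sum equals the sum of the real parts.
E1 = complex_field(1.5 * cmath.exp(0.2j), k, omega, z, t)
E2 = complex_field(0.5 * cmath.exp(-1.1j), k, omega, z, t)
print((E1 + E2).real, E1.real + E2.real)
```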
Example 2.1
Verify that the complex plane wave (2.10) is a solution to the wave equation (2.1).
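Solution: The first term gives

$$\begin{aligned} \nabla^2 \mathbf{E}_0 e^{i(\mathbf{k} \cdot \mathbf{r} - \omega t)} &= \mathbf{E}_0 \left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}\right) e^{i(k_x x + k_y y + k_z z - \omega t)} \\ &= -\mathbf{E}_0 \left(k_x^2 + k_y^2 + k_z^2\right) e^{i(\mathbf{k} \cdot \mathbf{r} - \omega t)} \\ &= -k^2 \mathbf{E}_0 e^{i(\mathbf{k} \cdot \mathbf{r} - \omega t)} \end{aligned} \tag{2.11}$$

and the second term gives

$$\frac{1}{c^2} \frac{\partial^2}{\partial t^2} \left(\mathbf{E}_0 e^{i(\mathbf{k} \cdot \mathbf{r} - \omega t)}\right) = -\frac{\omega^2}{c^2} \mathbf{E}_0 e^{i(\mathbf{k} \cdot \mathbf{r} - \omega t)} \tag{2.12}$$

Upon insertion into (2.1) we obtain the vacuum dispersion relation (2.5), which specifies the connection between the wavenumber $k$ and the frequency $\omega$. While the vacuum dispersion relation is simple, it emphasizes that $k$ and $\omega$ cannot be independently chosen (as we saw in (2.3) and (2.4)).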
2.3 Index of Refraction in Dielectrics

Let's take a look at how plane waves behave in dielectric media (e.g. glass). We assume an isotropic, homogeneous, and non-conducting medium (i.e. $\mathbf{J}_{\rm free} = 0$). In this case, we expect $\mathbf{E}$ and $\mathbf{P}$ to be parallel to each other so $\nabla \cdot \mathbf{P} = 0$ from (1.42). The general wave equation (1.49) for the electric field reduces in this case to

$$\nabla^2 \mathbf{E} - \epsilon_0 \mu_0 \frac{\partial^2 \mathbf{E}}{\partial t^2} = \mu_0 \frac{\partial^2 \mathbf{P}}{\partial t^2} \tag{2.13}$$

Since we are considering sinusoidal waves, we consider solutions of the form

$$\mathbf{E} = \mathbf{E}_0 e^{i(\mathbf{k} \cdot \mathbf{r} - \omega t)}, \qquad \mathbf{P} = \mathbf{P}_0 e^{i(\mathbf{k} \cdot \mathbf{r} - \omega t)} \tag{2.14}$$

By writing this, we are making the (reasonable) assumption that if an electric field stimulates a medium at frequency $\omega$, then the polarization in the medium also oscillates at frequency $\omega$. This assumption is typically rather good except when extreme electric fields are used (see P 1.14). Recall that by our prior agreement, the complex amplitudes of $\mathbf{E}_0$ and $\mathbf{P}_0$ carry phase information. Thus, while $\mathbf{E}$ and $\mathbf{P}$ in (2.14) oscillate at the same frequency, they can be out of phase with each other. This phase mismatch is most pronounced for materials that absorb energy at the plane wave frequency. Substitution of the trial solutions (2.14) into (2.13) yields

$$-k^2 \mathbf{E}_0 e^{i(\mathbf{k} \cdot \mathbf{r} - \omega t)} + \mu_0 \epsilon_0 \omega^2 \mathbf{E}_0 e^{i(\mathbf{k} \cdot \mathbf{r} - \omega t)} = -\mu_0 \omega^2 \mathbf{P}_0 e^{i(\mathbf{k} \cdot \mathbf{r} - \omega t)} \tag{2.15}$$
At this point, we need to make an explicit connection between $\mathbf{E}_0$ and $\mathbf{P}_0$. In a linear medium (essentially any material if the electric field strength is not extreme), the polarization amplitude is proportional to the strength of the applied electric field:

$$\mathbf{P}_0(\omega) = \epsilon_0 \chi(\omega) \mathbf{E}_0(\omega) \tag{2.16}$$

We have introduced a dimensionless proportionality factor $\chi(\omega)$ called the susceptibility, which depends on the frequency of the field. We account for the possibility that $\mathbf{E}$ and $\mathbf{P}$ oscillate out of phase by allowing $\chi(\omega)$ to be a complex number in (2.16). By inserting (2.16) into (2.15) and canceling the field terms, we obtain the dispersion relation in dielectrics:

$$k^2 = \epsilon_0 \mu_0 \left[1 + \chi(\omega)\right] \omega^2 \tag{2.17}$$

or

$$k = \frac{\omega}{c} \sqrt{1 + \chi(\omega)} \tag{2.18}$$

where we have used $c \equiv 1/\sqrt{\epsilon_0 \mu_0}$. When absorption is small we can neglect the imaginary part of $\chi(\omega)$. By direct comparison with the vacuum case (2.5), we see that the speed of the phase fronts for a sinusoidal wave is

$$v = c/n(\omega) \tag{2.19}$$

where

$$n(\omega) = \sqrt{1 + \chi(\omega)} \qquad \text{(negligible absorption)} \tag{2.20}$$

The dimensionless quantity $n(\omega)$, called the index of refraction, is the ratio of the speed of the light in vacuum to the speed of the wave in the material. Note that the index of refraction, and hence the wave speed $v$, are functions of frequency. This reflects the fact that all materials have some frequency dependence in their response to light. In cases where absorption plays a role, $\chi(\omega)$ cannot be approximated as being real, and we must use the complex index of refraction, defined by$^1$

$$\mathcal{N} \equiv (n + i\kappa) = \sqrt{1 + \chi(\omega)} \tag{2.21}$$
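where $n$ and $\kappa$ are respectively the real and imaginary parts of the index. (Note that $\kappa$ is not $k$.) According to (2.18), the magnitude of the wave vector also becomes complex according to

$$k = \frac{\mathcal{N}\omega}{c} = \frac{(n + i\kappa)\,\omega}{c} \tag{2.22}$$

The complex index $\mathcal{N}$ takes account of absorption as well as the usual oscillatory behavior of the wave. We see this by explicitly placing (2.22) into (2.14):

$$\mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0\, e^{-\frac{\kappa\omega}{c} \hat{\mathbf{u}} \cdot \mathbf{r}}\, e^{i\left(\frac{n\omega}{c} \hat{\mathbf{u}} \cdot \mathbf{r} - \omega t\right)} \tag{2.23}$$

1 Electrodynamics books often use the electric displacement $\mathbf{D} \equiv \epsilon_0 \mathbf{E} + \mathbf{P} = \epsilon \mathbf{E}$. The permittivity $\epsilon$ encapsulates the constitutive relation that connects $\mathbf{P}$ with $\mathbf{E}$. In a linear medium we have $\epsilon \equiv \epsilon_0 (1 + \chi)$, so that the index of refraction is given by $n = \sqrt{\epsilon/\epsilon_0}$.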
As before, here $\hat{\mathbf{u}}$ is a real unit vector specifying the direction of $\mathbf{k}$. Again, when looking at (2.23), by special agreement in advance, we should just think of the real part, namely$^2$

$$\begin{aligned} \mathbf{E}(\mathbf{r}, t) &= \mathbf{E}_0\, e^{-\frac{\kappa\omega}{c} \hat{\mathbf{u}} \cdot \mathbf{r}} \cos\left(\frac{n\omega}{c} \hat{\mathbf{u}} \cdot \mathbf{r} - \omega t + \phi\right) \\ &= \mathbf{E}_0\, e^{-\mathrm{Im}\{\mathbf{k}\} \cdot \mathbf{r}} \cos\left(\mathrm{Re}\{\mathbf{k}\} \cdot \mathbf{r} - \omega t + \phi\right) \end{aligned} \tag{2.24}$$

where the phase $\phi$ was formerly held in the complex vector $\tilde{\mathbf{E}}_0$ (where the tilde had been suppressed). Figure 2.2 shows a graph of (2.24). The imaginary part of the index $\kappa$ causes the wave to decay as it travels. The real part of the index $n$ is associated with the oscillations of the wave. While the amplitude of the oscillations decays as the wave moves along, the speed of the wave is still controlled by $n(\omega)$ as in (2.19). Note that the complex index of refraction only makes sense when we use complex notation to represent the plane waves.

In a dielectric, the vacuum relations (2.3) and (2.4) are modified to read

$$\mathrm{Re}\{\mathbf{k}\} \equiv \frac{2\pi}{\lambda} \hat{\mathbf{u}}, \tag{2.25}$$

and

$$\omega = \frac{2\pi c}{\lambda n}, \tag{2.26}$$

where

$$\lambda \equiv \lambda_{\rm vac}/n. \tag{2.27}$$

While the frequency $\omega$ is the same, whether in a material or in vacuum, the wavelength in the material differs from the wavelength in vacuum, as indicated by (2.27).

Figure 2.2 Electric field of a decaying plane wave. For convenience in plotting, the direction of propagation is chosen to be in the $z$ direction (i.e. $\hat{\mathbf{u}} = \hat{\mathbf{z}}$).

2.4 The Lorentz Model of Dielectrics

To compute the index of refraction in either a dielectric or a conducting material, we require a model that describes the response of electrons in the material to the passing electric field wave. Of course, the model in turn influences how the electric field propagates, which is what influences the material in the first place! The model therefore must be solved together with the propagating field in a self-consistent manner. Hendrik Lorentz developed a very successful model in the late 1800s, which treats each (active) electron in the medium as a classical particle obeying Newton's second law ($\mathbf{F} = m\mathbf{a}$). In the case of a dielectric medium, electrons are subject to an elastic restoring force that keeps each electron bound to its respective atom and a damping force that dissipates energy and gives rise to absorption.

2 For the sake of simplicity in writing (2.24) we assumed linearly polarized light. That is, all vector components of $\mathbf{E}_0$ were assumed to have the same complex phase $\phi$. The expression would be somewhat more complicated, for example, in the case of circularly polarized light (described in chapter 4).
The Lorentz model determines the susceptibility $\chi(\omega)$ (the connection between the electric field $\mathbf{E}_0$ and the polarization $\mathbf{P}_0$) and hence the index of refraction. The model assumes that all atoms (or molecules) in the medium are identical, each with one (or a few) active electrons responding to the external field. The atoms are uniformly distributed throughout space with $N$ identical active electrons per volume (units of number per volume). The polarization of the material is then

$$\mathbf{P} = N q_e \mathbf{r}_{\rm micro} \tag{2.28}$$

Recall that polarization has units of dipoles per volume. Each dipole has strength $q_e \mathbf{r}_{\rm micro}$, where $\mathbf{r}_{\rm micro}$ is a microscopic displacement of the electron from equilibrium. At the time of Lorentz, atoms were thought to be clouds of positive charge wherein point-like electrons sat at rest unless stimulated by an applied electric field. In our modern quantum-mechanical viewpoint, $\mathbf{r}_{\rm micro}$ corresponds to an average displacement of the electronic cloud, which surrounds the nucleus (see Fig. 2.3). The displacement $\mathbf{r}_{\rm micro}$ of the electron charge in an individual atom depends on the local strength of the applied electric field $\mathbf{E}$ at the position of the atom. Since the diameter of the electronic cloud is tiny compared to a wavelength of (visible) light, we may consider the electric field to be uniform across any individual atom.

Figure 2.3 A distorted electronic cloud becomes a dipole. (Panels: unperturbed; in an electric field.)

The Lorentz model uses Newton's equation of motion to describe an electron displacement from equilibrium within an atom. In accordance with the classical laws of motion, the electron mass $m_e$ times its acceleration is equal to the sum of the forces on the electron:

$$m_e \ddot{\mathbf{r}}_{\rm micro} = q_e \mathbf{E} - m_e \gamma \dot{\mathbf{r}}_{\rm micro} - k_{\rm Hooke} \mathbf{r}_{\rm micro} \tag{2.29}$$

The electric field pulls on the electron with force $q_e \mathbf{E}$.$^3$ A dragging force $-m_e \gamma \dot{\mathbf{r}}_{\rm micro}$ opposes the electron motion and accounts for absorption of energy. Without this term, it is only possible to describe optical index at frequencies away from where absorption takes place. Finally, $-k_{\rm Hooke} \mathbf{r}_{\rm micro}$ is a force accounting for the fact that the electron is bound to the nucleus. This restoring force can be thought of as an effective spring that pulls the displaced electron back towards equilibrium with a force proportional to the amount of displacement, so this term is essentially the familiar Hooke's law. With some rearranging, (2.29) can be written as

$$\ddot{\mathbf{r}}_{\rm micro} + \gamma \dot{\mathbf{r}}_{\rm micro} + \omega_0^2 \mathbf{r}_{\rm micro} = \frac{q_e}{m_e} \mathbf{E} \tag{2.30}$$

where $\omega_0 \equiv \sqrt{k_{\rm Hooke}/m_e}$ is the natural oscillation frequency (or resonant frequency) associated with the electron mass and the spring constant. In accordance with our examination of a single sinusoidal wave, we insert (2.14) into (2.30) and obtain

$$\ddot{\mathbf{r}}_{\rm micro} + \gamma \dot{\mathbf{r}}_{\rm micro} + \omega_0^2 \mathbf{r}_{\rm micro} = \frac{q_e}{m_e} \mathbf{E}_0 e^{i(\mathbf{k} \cdot \mathbf{r} - \omega t)} \tag{2.31}$$

3 The electron also experiences a force due to the magnetic field of the light, $\mathbf{F} = q_e \mathbf{v}_{\rm micro} \times \mathbf{B}$, but this force is tiny for typical optical fields.
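Note that within a given atom the excursions of $\mathbf{r}_{\rm micro}$ are so small that $\mathbf{k} \cdot \mathbf{r}$ remains essentially constant, since $\mathbf{k} \cdot \mathbf{r}$ varies with displacements on the scale of an optical wavelength, which is huge compared to the size of an atom. The inhomogeneous solution to (2.31) is (see P 2.1)

$$\mathbf{r}_{\rm micro} = \frac{q_e}{m_e} \frac{\mathbf{E}_0 e^{i(\mathbf{k} \cdot \mathbf{r} - \omega t)}}{\omega_0^2 - i\omega\gamma - \omega^2} \tag{2.32}$$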
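One way to gain confidence in (2.32) is to substitute it back into (2.31) numerically. The sketch below does this for the complex amplitude only, using the fact that each time derivative of $e^{-i\omega t}$ brings down a factor of $-i\omega$. The field strength, resonance frequency, damping rate, and driving frequency are hypothetical values chosen for illustration, and the function name `displacement_amplitude` is not from the text.

```python
Q_E = -1.602e-19      # electron charge, C
M_E = 9.109e-31       # electron mass, kg

def displacement_amplitude(E0, omega, omega0, gamma):
    """Complex amplitude of r_micro from (2.32)."""
    return (Q_E / M_E) * E0 / (omega0**2 - 1j * omega * gamma - omega**2)

# Hypothetical parameter values, chosen only for illustration.
E0 = 1.0e5            # V/m
omega0 = 2.0e16       # rad/s, resonant frequency
gamma = 1.0e15        # 1/s, damping rate
omega = 1.5e16        # rad/s, driving frequency

r0 = displacement_amplitude(E0, omega, omega0, gamma)

# For r = r0 * exp(-i*omega*t), each time derivative brings down -i*omega.
lhs = (-1j * omega)**2 * r0 + gamma * (-1j * omega) * r0 + omega0**2 * r0
rhs = (Q_E / M_E) * E0
print(abs(r0))        # magnitude of the electron excursion, m
print(lhs, rhs)       # left and right sides of (2.31), amplitudes only
```

The nonzero imaginary part of `r0` is the phase lag between the electron motion and the driving field that the damping term introduces.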
The electron position rmicro oscillates (not surprisingly) with the same frequency as the driving electric eld. This solution illustrates the convenience of the complex notation. The imaginary part in the denominator implies that the electron oscillates with a different phase from the electric eld oscillations; the damping term (the imaginary part in the denominator) causes the two to be out of phase somewhat. The complex algebra in (2.32) accomplishes what would otherwise be cumbersome and require trigonometric manipulations. We are now able to write the polarization in terms of the electric eld. By substituting (2.32) into (2.28) and rearranging, we obtain P= 2 p
0
2 2 0 i
E0 e i (krt )
(2.33)
where the plasma frequency p is p =

2 N qe 0 me
(2.34)
A comparison of (2.33) with (2.16) in view of (2.14) reveals the (complex) susceptibility:

$$ \chi(\omega) = \frac{\omega_p^2}{\omega_0^2 - i\omega\gamma - \omega^2} \tag{2.35} $$

The index of refraction is then found by substituting the susceptibility (2.35) into (2.21). The real and imaginary parts of the index are solved by equating separately the real and imaginary parts of (2.21), namely

$$ (n + i\kappa)^2 = 1 + \chi(\omega) = 1 + \frac{\omega_p^2}{\omega_0^2 - i\omega\gamma - \omega^2} \tag{2.36} $$

Hendrik A. Lorentz (1853–1928, Dutch) Lorentz extended Maxwell's work in electromagnetic theory and used it to explain the reflection and refraction of light. He developed a simple and useful model for dielectric media and correctly hypothesized that the atoms were composed of charged particles, and that their movement was the source of light. He won the Nobel prize in 1902 for his contributions to electromagnetic theory.
A graph of $n$ and $\kappa$ is given in Fig. 2.4. Most materials actually have more than one species of active electron, and different active electrons behave differently. The generalization of (2.36) in this case is

$$ (n + i\kappa)^2 = 1 + \chi(\omega) = 1 + \sum_j \frac{f_j \omega_{pj}^2}{\omega_{0j}^2 - i\omega\gamma_j - \omega^2} \tag{2.37} $$

where $f_j$ is the aptly named oscillator strength for the $j$th species of active electron. Each species also has its own plasma frequency $\omega_{pj}$, natural frequency $\omega_{0j}$, and damping coefficient $\gamma_j$.
Figure 2.4 Real and imaginary parts of the index for a single Lorentz oscillator dielectric with $\omega_p = 10\gamma$.
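The relation (2.37) can be evaluated numerically by taking the complex square root of $1 + \chi(\omega)$. The sketch below is one minimal way to do this with the Python standard library; the function name and the tuple layout for each oscillator species are assumptions made here for illustration. Because $1 + \chi$ lies in the upper half of the complex plane for positive $\omega$ and $\gamma_j$, the principal square root gives non-negative $n$ and $\kappa$.

```python
import cmath


def lorentz_index(omega, oscillators):
    """Return (n, kappa) from (2.37).

    oscillators is an iterable of tuples (f_j, omega_pj, omega_0j, gamma_j),
    one tuple per species of active electron, all in the same frequency units
    as omega.
    """
    chi = sum(
        f * omega_p**2 / (omega_0**2 - 1j * omega * gamma - omega**2)
        for f, omega_p, omega_0, gamma in oscillators
    )
    root = cmath.sqrt(1 + chi)  # principal root: real part >= 0
    return root.real, root.imag
```

A single-species material is described by a one-element list such as `[(1.0, omega_p, omega_0, gamma)]`, in which case the function reduces to (2.36).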
Lorentz introduced this model well before the development of quantum mechanics. Even though the model pays no attention to quantum physics, it works surprisingly well for describing frequency-dependent optical index and absorption of light. As it turns out, the Schrödinger equation applied to two levels in an atom reduces in mathematical form to the Lorentz model in the limit of low-intensity light. Quantum mechanics also explains the oscillator strength, which before the development of quantum mechanics had to be inserted ad hoc to make the model agree with experiments.
2.5 Conductor Model of Refractive Index and Absorption

The details of the conductor model are very similar to those of the dielectric model. In a conducting medium, electrons don't experience a restoring force and are free to move outside of atoms, but they are still subject to a damping force due to collisions that removes energy and gives rise to absorption. Such collisions give rise to resistance in a conductor. In this model, we will ignore polarization (i.e. $\mathbf{P} = 0$), but take the current density $\mathbf{J}_{\text{free}}$ to be non-zero. The wave equation then becomes

$$ \nabla^2 \mathbf{E} - \epsilon_0 \mu_0 \frac{\partial^2 \mathbf{E}}{\partial t^2} = \mu_0 \frac{\partial \mathbf{J}_{\text{free}}}{\partial t} \tag{2.38} $$

In a manner similar to (2.14), we assume sinusoidal solutions:

$$ \mathbf{E} = \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}, \qquad \mathbf{J}_{\text{free}} = \mathbf{J}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \tag{2.39} $$

We assume that the current is made up of individual electrons traveling with velocity $\mathbf{v}_{\text{micro}}$:

$$ \mathbf{J}_{\text{free}} = N q_e \mathbf{v}_{\text{micro}} \tag{2.40} $$

Again, $N$ is the number density of free electrons (in units of number per volume). Recall that current density $\mathbf{J}_{\text{free}}$ has units of charge times velocity per volume (or current per cross sectional area), so (2.40) may be thought of as a definition of current density in a fundamental sense.

We have already solved Newton's equation of motion (2.29) for the active electrons in a dielectric. We can consider the electrons in a conductor as the special case of this solution where the restoring spring constant $k_{\text{Hooke}}$ is zero (and hence $\omega_0 = 0$). The solution for $\mathbf{r}_{\text{micro}}$ in a conductor is given by (2.32) with $\omega_0 = 0$:

$$ \mathbf{r}_{\text{micro}} = -\frac{q_e}{m_e} \frac{\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}}{i\omega\gamma + \omega^2} \tag{2.41} $$

We then find the electron velocity by taking a time derivative of (2.41):

$$ \mathbf{v}_{\text{micro}} = \frac{q_e}{m_e} \frac{\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}}{\gamma - i\omega} \tag{2.42} $$
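With (2.42), we can find an expression for the current density (2.40) in terms of the electric field:

$$ \mathbf{J}_{\text{free}} = \frac{N q_e^2}{m_e} \frac{\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}}{\gamma - i\omega} \tag{2.43} $$

Note that in the DC case (i.e. $\omega = 0$), this expression reduces to Ohm's law $\mathbf{J} = \sigma \mathbf{E}$, where $\sigma = N q_e^2 / (m_e \gamma)$ is the conductivity. Although this formula relates the dragging term $\gamma$ to the DC conductivity $\sigma$, the connection matches poorly with experimental observations made for visible frequencies. This is because the collision rate actually varies somewhat with frequency.

We substitute (2.43) together with the electric field in (2.39) into the wave equation (2.38) and obtain the dispersion relation in this model:

$$ -k^2 \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \frac{\omega^2}{c^2} \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} = -i\omega \mu_0 \frac{N q_e^2}{m_e} \frac{\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}}{\gamma - i\omega} \tag{2.44} $$

The solutions (2.39) then require the following relation to hold:

$$ k^2 = \frac{\omega^2}{c^2} \left( 1 - \frac{\omega_p^2}{i\gamma\omega + \omega^2} \right) \tag{2.45} $$

Comparing this expression with (2.22), we find that the complex index of refraction for the conductor model is given by

$$ (n + i\kappa)^2 = 1 - \frac{\omega_p^2}{i\gamma\omega + \omega^2} \tag{2.46} $$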
A graph of $n$ and $\kappa$ in the conductor model is given in Fig. 2.5. The form of the complex refractive index for the conductor model is quite similar to the index found for the dielectric model. The similarity is not surprising since both models include oscillating electrons. In the one case the electrons are free, and in the other case they are tethered to their atoms. In either model, the damping term removes energy from the electron oscillations. In the complex notation for the field, the damping term gives rise to an imaginary part of the index. Again, the imaginary part of the index causes an exponential attenuation of the plane wave as it propagates.
Figure 2.5 Real and imaginary parts of the index for a conductor with $\omega_p = 50\gamma$.
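The conductor result (2.46) can be evaluated in the same way as the dielectric result. The sketch below is a minimal illustration using only the standard library; the function name is an assumption made here. The imaginary part of $1 - \omega_p^2/(i\gamma\omega + \omega^2)$ is non-negative for positive $\omega$ and $\gamma$, so the principal square root again yields non-negative $n$ and $\kappa$.

```python
import cmath


def conductor_index(omega, omega_p, gamma):
    """Return (n, kappa) from (2.46) for a conductor.

    omega, omega_p, and gamma must share the same frequency units,
    and omega must be non-zero.
    """
    root = cmath.sqrt(1 - omega_p**2 / (1j * gamma * omega + omega**2))
    return root.real, root.imag
```

Below the plasma frequency the result has small $n$ and large $\kappa$ (strong attenuation), while above it $n$ approaches unity from below and $\kappa$ becomes small. Setting `gamma` to zero gives the undamped limit $n^2 = 1 - \omega_p^2/\omega^2$.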
2.6 Poynting's Theorem

We next turn our attention to the detection and measurement of light. Until now, we have described light as the propagation of an electromagnetic disturbance. However, we typically observe light by detecting absorbed energy rather than the field amplitude directly. In this section we examine the connection between propagating electromagnetic fields (such as the plane waves discussed above) and the energy transported by such fields.

John Henry Poynting (1852–1914) developed (from Maxwell's equations) the theoretical foundation that describes light energy transport. In this section we examine its development, which is surprisingly concise. Students should concentrate mainly on the ideas involved (rather than the details of the derivation), especially the definition and meaning of the Poynting vector, describing energy flow in an electromagnetic field.

Poynting's theorem derives from just two of Maxwell's equations: (1.44) and (1.45). We take the dot product of $\mathbf{B}/\mu_0$ with the first equation and the dot product of $\mathbf{E}$ with the second equation. Then by subtracting the second equation from the first we obtain

$$ \frac{\mathbf{B}}{\mu_0} \cdot (\nabla \times \mathbf{E}) - \mathbf{E} \cdot \left( \nabla \times \frac{\mathbf{B}}{\mu_0} \right) + \epsilon_0 \mathbf{E} \cdot \frac{\partial \mathbf{E}}{\partial t} + \frac{\mathbf{B}}{\mu_0} \cdot \frac{\partial \mathbf{B}}{\partial t} = -\mathbf{E} \cdot \left( \mathbf{J}_{\text{free}} + \frac{\partial \mathbf{P}}{\partial t} \right) \tag{2.47} $$
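The first two terms can be simplified using the vector identity P 0.10. The next two terms are the time derivatives of $\epsilon_0 E^2/2$ and $B^2/2\mu_0$, respectively. The relation (2.47) then becomes

$$ \nabla \cdot \left( \mathbf{E} \times \frac{\mathbf{B}}{\mu_0} \right) + \frac{\partial}{\partial t} \left( \frac{\epsilon_0 E^2}{2} + \frac{B^2}{2\mu_0} \right) = -\mathbf{E} \cdot \left( \mathbf{J}_{\text{free}} + \frac{\partial \mathbf{P}}{\partial t} \right) \tag{2.48} $$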
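This is Poynting's theorem. Each term in this equation has units of power per volume. The conventional way of writing Poynting's theorem is as follows:

$$ \nabla \cdot \mathbf{S} + \frac{\partial u_{\text{field}}}{\partial t} = -\frac{\partial u_{\text{medium}}}{\partial t} \tag{2.49} $$

where

$$ \mathbf{S} \equiv \mathbf{E} \times \frac{\mathbf{B}}{\mu_0} \tag{2.50} $$

$$ u_{\text{field}} \equiv \frac{\epsilon_0 E^2}{2} + \frac{B^2}{2\mu_0} \tag{2.51} $$

and

$$ \frac{\partial u_{\text{medium}}}{\partial t} \equiv \mathbf{E} \cdot \left( \mathbf{J}_{\text{free}} + \frac{\partial \mathbf{P}}{\partial t} \right) \tag{2.52} $$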
$\mathbf{S}$ is called the Poynting vector and has units of power per area, called irradiance. The quantity $u_{\text{field}}$ is the energy per volume stored in the electric and magnetic fields. Derivations of the electric field energy density and the magnetic field energy density are given in Appendices 2.A and 2.B. (See (2.67) and (2.74).) The term $\partial u_{\text{medium}}/\partial t$ is the power per volume delivered to the medium. Equation (2.52) is reminiscent of the familiar circuit power law, Power = Voltage × Current. Power is delivered when a charged particle traverses a distance while experiencing a force. This happens when currents flow in the presence of electric fields. Recall that $\partial \mathbf{P}/\partial t$ is a current density similar to $\mathbf{J}_{\text{free}}$, with units of charge times velocity per volume.

The interpretation of the Poynting vector is straightforward when we recognize Poynting's theorem as a statement of the conservation of energy. $\mathbf{S}$ describes the flow of energy. To see this more clearly, consider Poynting's theorem (2.49) integrated over a volume $V$ (enclosed by surface $S$). If we also apply the divergence theorem (0.11) to the term involving $\nabla \cdot \mathbf{S}$ we obtain

$$ \oint_S \mathbf{S} \cdot \hat{\mathbf{n}} \, da = -\frac{\partial}{\partial t} \int_V \left( u_{\text{field}} + u_{\text{medium}} \right) dv \tag{2.53} $$
Notice that the volume integral over energy densities $u_{\text{field}}$ and $u_{\text{medium}}$ gives the total energy stored in $V$, whether in the form of electromagnetic field energy density or as energy density that has been given to the medium. The integration of the Poynting vector over the surface gives the net Poynting vector flux directed outward. Equation (2.53) indicates that the outward Poynting vector flux matches the rate that total energy disappears from the interior of $V$. Conversely, if the Poynting vector is directed inward (negative), then the net inward flux matches the rate that energy increases within $V$. The vector $\mathbf{S}$ defines the flow of energy through space. Its units of power per area are just what are needed to describe the brightness of light impinging on a surface.
2.7 Irradiance of a Plane Wave

Consider the electric field wave described by (2.10). The magnetic field that accompanies this electric field can be found from Maxwell's equation (1.44), and it turns out to be

$$ \mathbf{B}(\mathbf{r}, t) = \frac{\mathbf{k} \times \mathbf{E}_0}{\omega} e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \tag{2.54} $$

When $\mathbf{k}$ is complex, $\mathbf{B}$ is out of phase with $\mathbf{E}$, and this occurs when absorption takes place. When there is no absorption, then $\mathbf{k}$ is real, and $\mathbf{B}$ and $\mathbf{E}$ carry the same complex phase.

Before computing the Poynting vector (2.50), which involves multiplication, we must remember our unspoken agreement that only the real parts of the fields are relevant. We necessarily remove the imaginary parts before multiplying (see (0.23)). We could rewrite $\mathbf{B}$ and $\mathbf{E}$ as in (2.23), imposing the assumption that the complex phase for each vector component of $\mathbf{E}_0$ is the same. However, we can defer making this assumption by taking the real parts of the fields in the following manner: obtain the real parts of the fields by adding their respective complex conjugates and dividing the result by 2 (see (0.30)). The real field associated with (2.10) is

$$ \mathbf{E}(\mathbf{r}, t) = \frac{1}{2} \left[ \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \mathbf{E}_0^* e^{-i(\mathbf{k}^*\cdot\mathbf{r} - \omega t)} \right] \tag{2.55} $$

and the real field associated with (2.54) is

$$ \mathbf{B}(\mathbf{r}, t) = \frac{1}{2} \left[ \frac{\mathbf{k} \times \mathbf{E}_0}{\omega} e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \frac{\mathbf{k}^* \times \mathbf{E}_0^*}{\omega} e^{-i(\mathbf{k}^*\cdot\mathbf{r} - \omega t)} \right] \tag{2.56} $$

By writing (2.55) and (2.56), we have merely exercised our previous agreement that only the real parts of (2.10) and (2.54) are to be retained.
Now we are ready to calculate the Poynting vector. The algebra is a little messy in general, so we restrict to the case of an isotropic medium to simplify.
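Example 2.2

Use (2.55) and (2.56) to calculate the Poynting vector (2.50) associated with the plane wave. Assume an isotropic medium with $\nabla \cdot \mathbf{E}(\mathbf{r}, t) = 0$, so that $\hat{\mathbf{u}} \cdot \mathbf{E}_0 = 0$.

Solution:

$$ \begin{aligned} \mathbf{S} &\equiv \mathbf{E} \times \frac{\mathbf{B}}{\mu_0} \\ &= \frac{1}{2} \left[ \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \mathbf{E}_0^* e^{-i(\mathbf{k}^*\cdot\mathbf{r} - \omega t)} \right] \times \frac{1}{2\mu_0} \left[ \frac{\mathbf{k} \times \mathbf{E}_0}{\omega} e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \frac{\mathbf{k}^* \times \mathbf{E}_0^*}{\omega} e^{-i(\mathbf{k}^*\cdot\mathbf{r} - \omega t)} \right] \\ &= \frac{1}{4\mu_0} \left[ \frac{\mathbf{E}_0 \times (\mathbf{k} \times \mathbf{E}_0)}{\omega} e^{2i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \frac{\mathbf{E}_0^* \times (\mathbf{k} \times \mathbf{E}_0)}{\omega} e^{i(\mathbf{k} - \mathbf{k}^*)\cdot\mathbf{r}} + \frac{\mathbf{E}_0 \times (\mathbf{k}^* \times \mathbf{E}_0^*)}{\omega} e^{i(\mathbf{k} - \mathbf{k}^*)\cdot\mathbf{r}} + \frac{\mathbf{E}_0^* \times (\mathbf{k}^* \times \mathbf{E}_0^*)}{\omega} e^{-2i(\mathbf{k}^*\cdot\mathbf{r} - \omega t)} \right] \\ &= \frac{1}{4\mu_0} \left[ \frac{k}{\omega} \mathbf{E}_0 \times (\hat{\mathbf{u}} \times \mathbf{E}_0) e^{2i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \frac{k}{\omega} \mathbf{E}_0^* \times (\hat{\mathbf{u}} \times \mathbf{E}_0) e^{-2\kappa\frac{\omega}{c}\hat{\mathbf{u}}\cdot\mathbf{r}} + \text{C.C.} \right] \end{aligned} \tag{2.57} $$

The letters "C.C." stand for the complex conjugate of what precedes. The direction of $\mathbf{k}$ is specified with the real unit vector $\hat{\mathbf{u}}$. We have also used (2.22) to rewrite $i(\mathbf{k} - \mathbf{k}^*)$ as $-2\kappa(\omega/c)\hat{\mathbf{u}}$.

From the assumption that we have an isotropic medium (not a crystal) we have $\hat{\mathbf{u}} \cdot \mathbf{E}_0 = 0$. We can use this fact together with the BAC-CAB rule P 0.4 to replace the above expression with

$$ \mathbf{S} = \frac{\hat{\mathbf{u}}}{4\mu_0} \left[ \frac{k}{\omega} (\mathbf{E}_0 \cdot \mathbf{E}_0) e^{2i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \frac{k}{\omega} (\mathbf{E}_0 \cdot \mathbf{E}_0^*) e^{-2\kappa\frac{\omega}{c}\hat{\mathbf{u}}\cdot\mathbf{r}} + \text{C.C.} \right] \tag{2.58} $$

The final expression (2.58) shows that in an isotropic medium the flow of energy is in the direction of $\hat{\mathbf{u}}$ (or $\mathbf{k}$). This agrees with our intuition that energy flows in the direction that the wave propagates.

Very often, we are interested in the time-average of the Poynting vector, denoted by $\langle \mathbf{S} \rangle_t$. Under the time averaging, the first term in (2.58) vanishes since it rapidly oscillates positive and negative by the same amount. Note that $k$ is the only factor in the second term that is (potentially) not real. The time-averaged Poynting vector becomes

$$ \langle \mathbf{S} \rangle_t = \hat{\mathbf{u}} \frac{k + k^*}{4\mu_0 \omega} (\mathbf{E}_0 \cdot \mathbf{E}_0^*) e^{-2\kappa\frac{\omega}{c}\hat{\mathbf{u}}\cdot\mathbf{r}} = \hat{\mathbf{u}} \frac{n \epsilon_0 c}{2} \left( |E_{0x}|^2 + |E_{0y}|^2 + |E_{0z}|^2 \right) e^{-2\kappa\frac{\omega}{c}\hat{\mathbf{u}}\cdot\mathbf{r}} \tag{2.59} $$

We have used (2.22) to rewrite $k + k^*$ as $2(n\omega/c)$. We have also used (1.51) to rewrite $1/\mu_0 c$ as $\epsilon_0 c$.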
The expression (2.59) is called irradiance (with the direction $\hat{\mathbf{u}}$ included). However, we often speak of the intensity of a field $I$, which amounts to the same thing, but without regard for the direction $\hat{\mathbf{u}}$. The definition of intensity is thus less specific, and it can be applied, for example, to standing waves where the net irradiance is technically zero (i.e. counter-propagating plane waves with zero net energy flow). Nevertheless, atoms in standing waves feel the oscillating field. In general, the intensity is written as

$$ I = \frac{n \epsilon_0 c}{2} \mathbf{E}_0 \cdot \mathbf{E}_0^* = \frac{n \epsilon_0 c}{2} \left( |E_{0x}|^2 + |E_{0y}|^2 + |E_{0z}|^2 \right) \tag{2.60} $$

where in this case we have ignored absorption (i.e. $\kappa \cong 0$), or, alternatively, we could have considered $|E_{0x}|^2$, $|E_{0y}|^2$, and $|E_{0z}|^2$ to possess the factor $\exp\{-2\kappa(\omega/c)\hat{\mathbf{u}}\cdot\mathbf{r}\}$ already.
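The intensity formula (2.60) and its inverse can be sketched in a few lines. The function names below are assumptions made for illustration, and the constants are standard SI values; absorption is ignored, as in (2.60). The field components may be complex, which allows for polarization states in which the components differ in phase.

```python
import math

EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, F/m
C = 299792458.0               # speed of light in vacuum, m/s


def intensity(e0_components, n=1.0):
    """Intensity in W/m^2 from (2.60).

    e0_components holds the (possibly complex) amplitudes E0x, E0y, E0z in V/m.
    """
    return 0.5 * n * EPSILON_0 * C * sum(abs(comp)**2 for comp in e0_components)


def peak_field(intensity_w_per_m2, n=1.0):
    """Field amplitude |E0| in V/m that corresponds to a given intensity."""
    return math.sqrt(2.0 * intensity_w_per_m2 / (n * EPSILON_0 * C))
```

For example, `intensity((1.0, 1.0j, 0.0))` is twice `intensity((1.0, 0.0, 0.0))`, since both transverse components contribute equally regardless of their relative phase.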
Appendix 2.A Energy Density of Electric Fields

In this appendix we prove that the term $\epsilon_0 E^2/2$ in (2.51) corresponds to the energy density of an electric field. The electric potential $\phi(\mathbf{r})$ (in units of energy per charge, or in other words volts) describes each point of an electric field in terms of the potential energy that a charge would experience if placed in that field. The electric field and the potential are connected through

$$ \mathbf{E}(\mathbf{r}) = -\nabla \phi(\mathbf{r}) \tag{2.61} $$

The energy $U$ necessary to assemble a distribution of charges (owing to attraction or repulsion) can be written in terms of a summation over all of the charges (or charge density $\rho(\mathbf{r})$) located within the potential:

$$ U = \frac{1}{2} \int_V \phi(\mathbf{r}) \rho(\mathbf{r}) \, dv \tag{2.62} $$

The factor 1/2 is necessary to avoid double counting. To appreciate this factor consider two charges: we need only count the energy due to one charge in the presence of the other's potential to obtain the energy required to bring the charges together.

A substitution of (1.8) for $\rho(\mathbf{r})$ into (2.62) gives

$$ U = \frac{\epsilon_0}{2} \int_V \phi(\mathbf{r}) \, \nabla \cdot \mathbf{E}(\mathbf{r}) \, dv \tag{2.63} $$

Next, we use the vector identity in P 0.11 and get

$$ U = \frac{\epsilon_0}{2} \int_V \nabla \cdot \left[ \phi(\mathbf{r}) \mathbf{E}(\mathbf{r}) \right] dv - \frac{\epsilon_0}{2} \int_V \mathbf{E}(\mathbf{r}) \cdot \nabla \phi(\mathbf{r}) \, dv \tag{2.64} $$
An application of the divergence theorem (0.11) on the first integral and a substitution of (2.61) into the second integral yields

$$ U = \frac{\epsilon_0}{2} \oint_S \phi(\mathbf{r}) \mathbf{E}(\mathbf{r}) \cdot \hat{\mathbf{n}} \, da + \frac{\epsilon_0}{2} \int_V \mathbf{E}(\mathbf{r}) \cdot \mathbf{E}(\mathbf{r}) \, dv \tag{2.65} $$

Finally, we consider the volume $V$ (enclosed by $S$) to be extremely large so that all charges are contained well within it. If we choose a large enough volume, say a sphere of radius $R$, the surface integral over $S$ vanishes. The integrand of the surface integral becomes negligibly small because $\phi \sim 1/R$ and $E \sim 1/R^2$, whereas $da \sim R^2$. Therefore, the energy associated with an electric field in a region of space is

$$ U = \int_V u_E(\mathbf{r}) \, dv \tag{2.66} $$

where

$$ u_E(\mathbf{r}) \equiv \frac{\epsilon_0 E^2}{2} \tag{2.67} $$

is interpreted as the energy density of the electric field.
Appendix 2.B Energy Density of Magnetic Fields

In a derivation similar to that in appendix 2.A, we consider the energy associated with magnetic fields. The magnetic vector potential $\mathbf{A}(\mathbf{r})$ (in units of energy per charge×velocity) describes the potential energy that a charge moving with velocity $\mathbf{v}$ would experience if placed in the field. The magnetic field and the vector potential are connected through

$$ \mathbf{B}(\mathbf{r}) = \nabla \times \mathbf{A}(\mathbf{r}) \tag{2.68} $$

The energy $U$ necessary to assemble a distribution of current can be written in terms of a summation over all of the currents (or current density $\mathbf{J}(\mathbf{r})$) located within the vector potential field:

$$ U = \frac{1}{2} \int_V \mathbf{J}(\mathbf{r}) \cdot \mathbf{A}(\mathbf{r}) \, dv \tag{2.69} $$

As in (2.62), the factor 1/2 is necessary to avoid double counting the influence of the currents on each other. Under the assumption of steady currents (no variations in time), we may substitute Ampere's law (1.21) into (2.69), which yields

$$ U = \frac{1}{2\mu_0} \int_V \left[ \nabla \times \mathbf{B}(\mathbf{r}) \right] \cdot \mathbf{A}(\mathbf{r}) \, dv \tag{2.70} $$
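Next we employ the vector identity P 0.10 from which the previous expression becomes

$$ U = \frac{1}{2\mu_0} \int_V \mathbf{B}(\mathbf{r}) \cdot \left[ \nabla \times \mathbf{A}(\mathbf{r}) \right] dv - \frac{1}{2\mu_0} \int_V \nabla \cdot \left[ \mathbf{A}(\mathbf{r}) \times \mathbf{B}(\mathbf{r}) \right] dv \tag{2.71} $$

Upon substituting (2.68) into the first integral and applying the divergence theorem (0.11) on the second integral, this expression for total energy becomes

$$ U = \frac{1}{2\mu_0} \int_V \mathbf{B}(\mathbf{r}) \cdot \mathbf{B}(\mathbf{r}) \, dv - \frac{1}{2\mu_0} \oint_S \left[ \mathbf{A}(\mathbf{r}) \times \mathbf{B}(\mathbf{r}) \right] \cdot \hat{\mathbf{n}} \, da \tag{2.72} $$

As was done in connection with (2.65), if we choose a large enough volume (a sphere with radius $R$), the surface integral vanishes because $A \sim 1/R$ and $B \sim 1/R^2$, whereas $da \sim R^2$. The total energy (2.72) then reduces to

$$ U = \int_V u_B(\mathbf{r}) \, dv \tag{2.73} $$

where

$$ u_B(\mathbf{r}) \equiv \frac{B^2}{2\mu_0} \tag{2.74} $$

is the energy density for a magnetic field.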
Appendix 2.C Radiometry Versus Photometry

Photometry refers to the characterization of light sources in the context of the spectral response of the human eye. However, physicists most often deal with radiometry, which treats light of any wavelength on equal footing. Table 2.2 lists several concepts important in radiometry. The last two entries are associated with the average Poynting flux described in section 2.7.

The concepts used in photometry are similar, except that the radiometric quantities are multiplied by the spectral response of the human eye, a curve that peaks at $\lambda_{\text{vac}} = 555$ nm and drops to near zero for wavelengths longer than $\lambda_{\text{vac}} = 700$ nm or shorter than $\lambda_{\text{vac}} = 400$ nm. Photometric units, which may seem a little obscure, were first defined in terms of an actual candle with prescribed dimensions made from whale tallow. The basic unit of luminous power is called the lumen, defined to be (1/683) W of light with wavelength $\lambda_{\text{vac}} = 555$ nm, the peak of the eye's response. More radiant power is required to achieve the same number of lumens for wavelengths away from the center of the eye's spectral response. Photometric units are often used to characterize room lighting as well as photographic, projection, and display equipment. Table 2.3 gives the names of the various photometric quantities, which parallel the entries in table 2.2. We include a variety of units that are sometimes encountered.
Name | Concept | Units
Radiant Power (of a source) | Electromagnetic energy emitted per time from a source | W = J/s
Radiant Solid-Angle Intensity (of a source) | Radiant power per steradian emitted from a point-like source (4π steradians in a sphere) | W/Sr
Radiance or Brightness (of a source) | Radiant solid-angle intensity per unit projected area of an extended source. The projected area foreshortens by cos θ, where θ is the observation angle relative to the surface normal. | W/(Sr cm²)
Radiant Emittance or Exitance (from a source) | Radiant power emitted per unit surface area of an extended source (the Poynting flux leaving). | W/cm²
Irradiance (to a receiver). Often called intensity | Electromagnetic power delivered per area to a receiver: Poynting flux arriving. | W/cm²

Table 2.2 Radiometric quantities and units.
Name | Concept | Typical Units
Luminous Power (of a source) | Visible light energy emitted per time from a source: lumen (lm). | lm = (1/683) W @ 555 nm
Luminous Solid-Angle Intensity (of a source) | Luminous power per steradian emitted from a point-like source: candela (cd). | cd = lm/Sr
Luminance (of a source) | Luminous solid-angle intensity per projected area of an extended source. (The projected area foreshortens by cos θ, where θ is the observation angle relative to the surface normal.) | cd/cm² = stilb; cd/m² = nit; 1 lambert = 3183 nit; 1 footlambert = 3.4 nit
Luminous Emittance or Exitance (from a source) | Luminous power emitted per unit surface area of an extended source | lm/cm²
Illuminance (to a receiver) | Incident luminous power delivered per area to a receiver: lux. | lm/m² = lux; lm/cm² = phot; lm/ft² = footcandle

Table 2.3 Photometric quantities and units.
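The definition of the lumen given above can be turned into a one-line conversion for monochromatic light. In the sketch below, the function name is an assumption, and the relative eye response (1 at 555 nm, falling toward 0 near 400 nm and 700 nm) must be supplied by the caller, since this text does not tabulate that curve.

```python
LUMENS_PER_WATT_AT_PEAK = 683.0  # from the definition 1 lm = (1/683) W at 555 nm


def luminous_power(radiant_power_w, eye_response):
    """Luminous power in lumens for monochromatic light.

    radiant_power_w is the radiant power in watts; eye_response is the
    relative spectral response of the eye at that wavelength, between 0 and 1.
    """
    if not 0.0 <= eye_response <= 1.0:
        raise ValueError("eye_response must lie between 0 and 1")
    return LUMENS_PER_WATT_AT_PEAK * eye_response * radiant_power_w
```

At the peak of the eye's response, `luminous_power(1.0, 1.0)` gives 683 lm for one watt of radiant power; at a wavelength where the response is half as large, the same radiant power yields half as many lumens.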
Exercises
Exercises for 2.3 Index of Refraction in Dielectrics

P2.1 Verify that (2.32) is a solution to (2.31).

P2.2 Derive the Sellmeier equation

$$ n^2 = 1 + \frac{A \lambda_{\text{vac}}^2}{\lambda_{\text{vac}}^2 - \lambda_{0,\text{vac}}^2} $$

from (2.36) for a gas with negligible absorption (i.e. $\gamma \cong 0$, valid far from resonance $\omega_0$), where $\lambda_{0,\text{vac}}$ corresponds to frequency $\omega_0$ and $A$ is a constant. Many materials (e.g. glass, air) have strong resonances in the ultraviolet. In such materials, do you expect the index of refraction for blue light to be greater than that for red light? Make a sketch of $n$ as a function of wavelength for visible light down to the ultraviolet (where $\lambda_{0,\text{vac}}$ is located).

P2.3 In the Lorentz model, take $N = 10^{28}$ m$^{-3}$ for the density of bound electrons in an insulator (note that $N$ is number per volume, not just number), and a single transition at $\omega_0 = 6 \times 10^{15}$ rad/sec (in the UV), and damping $\gamma = \omega_0/5$ (quite broad). Assume $E_0$ is $10^4$ V/m. For three frequencies $\omega = \omega_0 - 2\gamma$, $\omega = \omega_0$, and $\omega = \omega_0 + 2\gamma$ find the magnitude and phase of the following (give the phase relative to the phase of $E_0$). Give correct SI units with each quantity. You don't need to worry about vector directions.
(a) The charge displacement amplitude $r_{\text{micro}}$ (2.32)
(b) The polarization amplitude $P(\omega)$
(c) The susceptibility $\chi(\omega)$. What would the susceptibility be for twice the E-field strength as before?
For the following no phase is needed:
(d) Find $n$ and $\kappa$ at the three frequencies. You will have to solve for the real and imaginary parts of $(n + i\kappa)^2 = 1 + \chi(\omega)$.
(e) Find the three speeds of light in terms of $c$. Find the three wavelengths $\lambda$.
(f) Find how far light penetrates into the material before only 1/e of the amplitude of $E$ remains. Find how far light penetrates into the material before only 1/e of the intensity $I$ remains.

P2.4 (a) Use a computer graphing program and the Lorentz model to plot $n$ and $\kappa$ as a function of frequency for a dielectric (i.e. obtain graphs such as the ones in Fig. ??(a)). Use these parameters to keep things simple: $\omega_p = 1$, $\omega_0 = 10$, and $\gamma = 1$; plot your function from $\omega = 0$ to $\omega = 20$.
(b) Plot $n$ and $\kappa$ as a function of frequency for a material that has three resonant frequencies: $\omega_{0\,1} = 10$, $\gamma_1 = 1$, $f_1 = 0.5$; $\omega_{0\,2} = 15$, $\gamma_2 = 1$, $f_2 = 0.25$; and $\omega_{0\,3} = 25$, $\gamma_3 = 3$, $f_3 = 0.25$. Use $\omega_p = 1$ for all three resonances, and plot the results from $\omega = 0$ to $\omega = 30$. Comment on your plots.
Exercises for 2.5 Conductor Model of Refractive Index and Absorption

P2.5 For silver, the complex refractive index is characterized by $n = 0.2$ and $\kappa = 3.4$. Find the distance that light travels inside of silver before the field is reduced by a factor of 1/e. Assume a wavelength of $\lambda_{\text{vac}} = 633$ nm. What is the speed of the wave crests in the silver (written as a number times $c$)? Are you surprised?

P2.6 Show that the dielectric model and the conductor model give identical results for $n$ in the case of a low-density plasma where there is no restoring force (i.e. $\omega_0 = 0$) and no dragging term (i.e., $\gamma = 0$). Write $n$ in terms of the plasma frequency $\omega_p$.

P2.7 Use the result from P 2.6.
(a) If the index of refraction of the ionosphere is $n = 0.9$ for an FM station at $\nu = \omega/2\pi = 100$ MHz, calculate the number of free electrons per cubic meter.
(b) What is the complex refractive index for KSL radio at 1160 kHz? Assume the same density of free electrons as in part (a). For your information, AM radio reflects better than FM radio from the ionosphere (like visible light from a metal mirror). At night, the lower layer of the ionosphere goes away so that AM radio waves reflect from a higher layer.

P2.8 Use a computer graphing program to plot $n$ and $\kappa$ as a function of frequency for a conductor (obtain plots such as the ones in Fig. ??(b)). Use these parameters to keep things simple: $\omega_p = 1$ and $\gamma = 0.02$. Plot your function from $\omega = 0.6$ to $\omega = 2$.
Exercises for 2.7 Irradiance of a Plane Wave

P2.9 In the case of a linearly-polarized plane wave, where the phase of each vector component of $\mathbf{E}_0$ is the same, re-derive (2.59) directly from the real field (2.24). For simplicity, you may ignore absorption (i.e. $\kappa \cong 0$). HINT: The time-average of $\cos^2(\mathbf{k}\cdot\mathbf{r} - \omega t + \phi)$ is 1/2.

P2.10 (a) Find the intensity (in W/cm²) produced by a short laser pulse (linearly polarized) with duration $\Delta t = 2.5 \times 10^{-14}$ s and energy $E = 100$ mJ, focused in vacuum to a round spot with radius $r = 5$ μm.
(b) What is the peak electric field (in V/Å)? HINT: The SI units of electric field are N/C = V/m.
(c) What is the peak magnetic field (in T = kg/(s·C))?

P2.11 (a) What is the intensity (in W/cm²) on the retina when looking directly at the sun? Assume that the eye's pupil has a radius $r_{\text{pupil}} = 1$ mm. Take the Sun's irradiance at the earth's surface to be 1.4 kW/m², and neglect refractive index (i.e. set $n = 1$). HINT: The Earth-Sun distance is $d_o = 1.5 \times 10^8$ km and the pupil-retina distance is $d_i = 22$ mm. The radius of the Sun $r_{\text{Sun}} = 7.0 \times 10^5$ km is de-magnified on the retina according to the ratio $d_i/d_o$.
(b) What is the intensity at the retina when looking directly into a 1 mW HeNe laser? Assume that the smallest radius of the laser beam is $r_{\text{waist}} = 0.5$ mm positioned $d_o = 2$ m in front of the eye, and that the entire beam enters the pupil. Compare with part (a).

P2.12 Show that the magnetic field of an intense laser with $\lambda = 1$ μm becomes important for a free electron oscillating in the field at intensities above $10^{18}$ W/cm². This marks the transition to relativistic physics. Nevertheless, for convenience, use classical physics in making the estimate. HINT: At lower intensities, the oscillating electric field dominates, so the electron motion can be thought of as arising solely from the electric field. Use this motion to calculate the magnetic force on the moving electron, and compare it to the electric force. The forces become comparable at $10^{18}$ W/cm².
Chapter 3
Reflection and Refraction

3.1 Introduction
As we know from everyday experience, when light arrives at an interface between materials it is partially reflected and partially transmitted. In this chapter, we examine what happens when such a wave propagates from one material (characterized by index $n$ or even by complex index $\mathcal{N}$) to another material. We will derive expressions to quantify the amount of reflection and transmission. The results depend on the angle of incidence (i.e. the angle between $\mathbf{k}$ and the normal to the surface) as well as on the orientation of the electric field (called polarization, not to be confused with $\mathbf{P}$, also called polarization). As we develop the connection between incident, reflected, and transmitted light waves, many familiar relationships will emerge naturally (e.g. Snell's law and Brewster's angle). The formalism also describes polarization-dependent phase shifts upon reflection (especially interesting in the case of total internal reflection or in the case of reflections from absorbing surfaces such as metals), described in sections 3.6 and 3.7.

For simplicity, we initially neglect the imaginary part of the refractive index. Each plane wave is thus characterized by a real wave vector $\mathbf{k}$. We will write each plane wave in the form $\mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0 \exp[i(\mathbf{k}\cdot\mathbf{r} - \omega t)]$, where, as usual, only the real part of the field corresponds to the physical field. The restriction to real indices is not as serious as it might seem since the results can be extended to include complex indices, and we do this in section 3.7. The use of the letter $n$ instead of $\mathcal{N}$ hardly matters. The math is all the same, which demonstrates the power of the complex notation.

In an isotropic medium, the electric field amplitude $\mathbf{E}_0$ is confined to a plane perpendicular to $\mathbf{k}$. Therefore, $\mathbf{E}_0$ can always be broken into two orthogonal polarization components within that plane. The two vector components of $\mathbf{E}_0$ contain the individual phase information for each dimension. If the phases of the two components of $\mathbf{E}_0$ are the same, then the polarization of the electric field is said to be linear. If the components of the vector $\mathbf{E}_0$ differ in phase, then the electric field polarization is said to be elliptical (or circular) as will be studied in chapter 4.
3.2 Refraction at an Interface

To study the reflection and transmission of light at a material interface, we will examine three distinct waves traveling in the directions $\mathbf{k}_i$, $\mathbf{k}_r$, and $\mathbf{k}_t$ as depicted in Fig. 3.1. In the upcoming development, we will refer to Fig. 3.1 often. We assume a planar boundary between the two materials. The index $n_i$ characterizes the material on the left, and the index $n_t$ characterizes the material on the right. $\mathbf{k}_i$ specifies an incident plane wave making an angle $\theta_i$ with the normal to the interface. $\mathbf{k}_r$ specifies a reflected plane wave making an angle $\theta_r$ with the interface normal. These two waves exist only to the left of the interface. $\mathbf{k}_t$ specifies a transmitted plane wave making an angle $\theta_t$ with the interface normal. The transmitted wave exists only to the right of the material interface.

Figure 3.1 Incident, reflected, and transmitted plane wave fields at a material interface.

We choose the $y$–$z$ plane to be the plane of incidence, containing $\mathbf{k}_i$, $\mathbf{k}_r$, and $\mathbf{k}_t$ (i.e. the plane represented by the surface of this page). By symmetry, all three k-vectors must lie in a single plane, assuming an isotropic material. We are free to orient our coordinate system in many different ways (and every textbook seems to do it differently!). We choose the direction of normal incidence on the interface to be along the $z$-direction. The $x$-axis points into the page.

For a given $\mathbf{k}_i$, the electric field vector $\mathbf{E}_i$ can be decomposed into arbitrary components as long as they are perpendicular to $\mathbf{k}_i$. For convenience, we choose one of the electric field vector components to be that which lies within the plane of incidence as depicted in Fig. 3.1. $E_i^{(p)}$ denotes this component, represented by an arrow in the plane of the page. The remaining electric field vector component, denoted by $E_i^{(s)}$, is directed normal to the plane of incidence. The superscript $s$ stands for senkrecht, a German word meaning perpendicular. In Fig. 3.1, $E_i^{(s)}$ is represented by the tail of an arrow pointing into the page, or the $x$-direction, by our convention. The other fields $\mathbf{E}_r$ and $\mathbf{E}_t$ are similarly split into $s$ and $p$ components as indicated in Fig. 3.1. All field components are considered to be positive when they point in the direction of their respective arrows.¹

By inspection of Fig. 3.1, we can write the various k-vectors in terms of the $\hat{\mathbf{y}}$ and $\hat{\mathbf{z}}$ unit vectors:

$$ \begin{aligned} \mathbf{k}_i &= k_i \left( \hat{\mathbf{y}} \sin\theta_i + \hat{\mathbf{z}} \cos\theta_i \right) \\ \mathbf{k}_r &= k_r \left( \hat{\mathbf{y}} \sin\theta_r - \hat{\mathbf{z}} \cos\theta_r \right) \\ \mathbf{k}_t &= k_t \left( \hat{\mathbf{y}} \sin\theta_t + \hat{\mathbf{z}} \cos\theta_t \right) \end{aligned} \tag{3.1} $$

Also by inspection of Fig. 3.1 (following the conventions for the electric fields depicted by the arrows), we can write the incident, reflected, and transmitted fields in terms of $\hat{\mathbf{x}}$, $\hat{\mathbf{y}}$, and $\hat{\mathbf{z}}$:

$$ \begin{aligned} \mathbf{E}_i &= \left[ E_i^{(p)} \left( \hat{\mathbf{y}} \cos\theta_i - \hat{\mathbf{z}} \sin\theta_i \right) + \hat{\mathbf{x}} E_i^{(s)} \right] e^{i[k_i(y \sin\theta_i + z \cos\theta_i) - \omega_i t]} \\ \mathbf{E}_r &= \left[ E_r^{(p)} \left( \hat{\mathbf{y}} \cos\theta_r + \hat{\mathbf{z}} \sin\theta_r \right) + \hat{\mathbf{x}} E_r^{(s)} \right] e^{i[k_r(y \sin\theta_r - z \cos\theta_r) - \omega_r t]} \\ \mathbf{E}_t &= \left[ E_t^{(p)} \left( \hat{\mathbf{y}} \cos\theta_t - \hat{\mathbf{z}} \sin\theta_t \right) + \hat{\mathbf{x}} E_t^{(s)} \right] e^{i[k_t(y \sin\theta_t + z \cos\theta_t) - \omega_t t]} \end{aligned} \tag{3.2} $$
Each field has the form (2.8), and we have utilized the k-vectors (3.1) in the exponents of (3.2). Now we are ready to connect the fields on one side of the boundary to the fields on the other side. This is done using boundary conditions. As explained in appendix 3.A, Maxwell's equations require that the components of $\mathbf{E}$ that are parallel to the interface must be identical on either side of the interface. In our coordinate system, the $\hat{\mathbf{x}}$ and $\hat{\mathbf{y}}$ components are parallel to the interface, and $z = 0$ defines the interface. This means that at $z = 0$ the $\hat{\mathbf{x}}$ and $\hat{\mathbf{y}}$ components of the combined incident and reflected fields must match the corresponding components of the transmitted field:

$$ \left[ E_i^{(p)} \hat{\mathbf{y}} \cos\theta_i + \hat{\mathbf{x}} E_i^{(s)} \right] e^{i(k_i y \sin\theta_i - \omega_i t)} + \left[ E_r^{(p)} \hat{\mathbf{y}} \cos\theta_r + \hat{\mathbf{x}} E_r^{(s)} \right] e^{i(k_r y \sin\theta_r - \omega_r t)} = \left[ E_t^{(p)} \hat{\mathbf{y}} \cos\theta_t + \hat{\mathbf{x}} E_t^{(s)} \right] e^{i(k_t y \sin\theta_t - \omega_t t)} \tag{3.3} $$
Since this equation must hold for all conceivable values of $t$ and $y$, we are compelled to set all the phase factors in the complex exponentials equal to each other. The time portion of the phase factors requires the frequency of all waves to be the same:

$$ \omega_i = \omega_r = \omega_t \equiv \omega \tag{3.4} $$

(We could have guessed that all frequencies would be the same; otherwise wave fronts would be annihilated or created at the interface.) Equating the spatial terms in the exponents of (3.3) also requires

$$ k_i \sin\theta_i = k_r \sin\theta_r = k_t \sin\theta_t \tag{3.5} $$

¹ Many textbooks draw the arrow for $E_r^{(p)}$ in the direction opposite of ours. However, that choice leads to an awkward situation at normal incidence (i.e. $\theta_i = \theta_r = 0$) where the arrows for the incident and reflected fields are parallel for the $s$-component but antiparallel for the $p$-component.
Now recall from (2.22) the relations $k_i = k_r = n_i\omega/c$ and $k_t = n_t\omega/c$. With these relations, (3.5) yields the law of reflection
$$\theta_r = \theta_i \tag{3.6}$$
and Snell's law
$$n_i\sin\theta_i = n_t\sin\theta_t \tag{3.7}$$
The three angles $\theta_i$, $\theta_r$, and $\theta_t$ are not independent. The reflected angle matches the incident angle, and the transmitted angle obeys Snell's law. The phenomenon of refraction refers to the fact that $\theta_i$ and $\theta_t$ are different. Because the exponents are all identical, (3.3) reduces to two relatively simple equations (one for each dimension, $\hat{\mathbf{x}}$ and $\hat{\mathbf{y}}$):
$$E_i^{(s)} + E_r^{(s)} = E_t^{(s)} \tag{3.8}$$
and
$$\left(E_i^{(p)} + E_r^{(p)}\right)\cos\theta_i = E_t^{(p)}\cos\theta_t \tag{3.9}$$
Willebrord Snell (1580-1626, Dutch) Snell was an astronomer and mathematician. He is probably most famous for determining the law that connects refracted angles to incident angles when waves come to a boundary. He was an accomplished mathematician, and developed a new method for calculating $\pi$, and an improved method for measuring the circumference of the earth.
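As a quick numerical illustration of (3.6) and (3.7), the following minimal sketch computes the reflected and transmitted angles for a given incident angle. The function name `refraction_angle` and the sample values (30 degrees, air to glass with index 1.5) are assumptions chosen for illustration only.

```python
import math

def refraction_angle(theta_i, n_i, n_t):
    """Transmitted angle (radians) from Snell's law (3.7).

    Assumes real indices and a real transmitted angle
    (i.e. no total internal reflection).
    """
    s = n_i * math.sin(theta_i) / n_t
    if abs(s) > 1.0:
        raise ValueError("no real transmitted angle for these inputs")
    return math.asin(s)

theta_i = math.radians(30.0)          # illustrative incident angle
theta_r = theta_i                     # law of reflection (3.6)
theta_t = refraction_angle(theta_i, n_i=1.0, n_t=1.5)
print(math.degrees(theta_r), math.degrees(theta_t))
```

The transmitted angle comes out smaller than the incident angle, as expected when light enters a medium with a higher index.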
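We have derived these equations from the boundary condition (3.52) on the parallel component of the electric field. This set of equations has four unknowns ($E_r^{(p)}$, $E_r^{(s)}$, $E_t^{(p)}$, and $E_t^{(s)}$), assuming that the incident fields are given, so we require two further equations to solve the system. These are obtained using the separate boundary condition on the parallel component of magnetic fields given in (3.56) (also discussed in appendix 3.A). From Maxwell's equation (1.44), we have for a plane wave
$$\mathbf{B} = \frac{\mathbf{k}\times\mathbf{E}}{\omega} = \frac{n}{c}\,\hat{\mathbf{u}}\times\mathbf{E} \tag{3.10}$$
where $\hat{\mathbf{u}} \equiv \mathbf{k}/k$ is a unit vector in the direction of $\mathbf{k}$. We have also utilized (2.22). This expression is useful to obtain expressions for $\mathbf{B}_i$, $\mathbf{B}_r$, and $\mathbf{B}_t$ in terms of the electric field components that we have already introduced. By injecting (3.1) and (3.2) into (3.10), the incident, reflected, and transmitted magnetic fields are seen to be
$$\begin{aligned}
\mathbf{B}_i &= \frac{n_i}{c}\left[-\hat{\mathbf{x}}E_i^{(p)} + E_i^{(s)}\left(-\hat{\mathbf{z}}\sin\theta_i + \hat{\mathbf{y}}\cos\theta_i\right)\right]e^{i[k_i(y\sin\theta_i + z\cos\theta_i) - \omega_i t]} \\
\mathbf{B}_r &= \frac{n_r}{c}\left[\hat{\mathbf{x}}E_r^{(p)} + E_r^{(s)}\left(-\hat{\mathbf{z}}\sin\theta_r - \hat{\mathbf{y}}\cos\theta_r\right)\right]e^{i[k_r(y\sin\theta_r - z\cos\theta_r) - \omega_r t]} \\
\mathbf{B}_t &= \frac{n_t}{c}\left[-\hat{\mathbf{x}}E_t^{(p)} + E_t^{(s)}\left(-\hat{\mathbf{z}}\sin\theta_t + \hat{\mathbf{y}}\cos\theta_t\right)\right]e^{i[k_t(y\sin\theta_t + z\cos\theta_t) - \omega_t t]}
\end{aligned} \tag{3.11}$$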
Next, we apply the boundary condition (3.56), which requires the components of $\mathbf{B}$ parallel to the surface (i.e. the components in the $\hat{\mathbf{x}}$ and $\hat{\mathbf{y}}$ directions) to be the same on either side of the plane $z = 0$. Since we already know that the exponents are all equal and that $\theta_r = \theta_i$ and $n_i = n_r$, the boundary condition gives
$$\frac{n_i}{c}\left[-\hat{\mathbf{x}}E_i^{(p)} + E_i^{(s)}\hat{\mathbf{y}}\cos\theta_i\right] + \frac{n_i}{c}\left[\hat{\mathbf{x}}E_r^{(p)} - E_r^{(s)}\hat{\mathbf{y}}\cos\theta_i\right] = \frac{n_t}{c}\left[-\hat{\mathbf{x}}E_t^{(p)} + E_t^{(s)}\hat{\mathbf{y}}\cos\theta_t\right] \tag{3.12}$$
As before, (3.12) reduces to two relatively simple equations (one for the $\hat{\mathbf{x}}$ dimension and one for the $\hat{\mathbf{y}}$ dimension):
$$n_i\left(E_i^{(p)} - E_r^{(p)}\right) = n_t E_t^{(p)} \tag{3.13}$$
and
$$n_i\left(E_i^{(s)} - E_r^{(s)}\right)\cos\theta_i = n_t E_t^{(s)}\cos\theta_t \tag{3.14}$$
These two equations (wherein the permeability $\mu_0$ was considered to be the same on both sides of the boundary) together with (3.8) and (3.9) give a complete description of how the fields on each side of the boundary relate to each other. If we choose an incident field $\mathbf{E}_i$, these equations can be used to predict $\mathbf{E}_r$ and $\mathbf{E}_t$. To use these equations, we must break the fields into their respective s and p polarization components. However, (3.8), (3.9), (3.13), and (3.14) are not yet in their most convenient form.
3.3 The Fresnel Coefficients

Augustin Fresnel first developed the equations derived in the previous section. However, at the time he did not have the benefit of Maxwell's equations, since he lived well before Maxwell's time. Instead, Fresnel thought of light as transverse mechanical waves propagating within materials. (We can see why Fresnel was a great proponent of the later-discredited luminiferous ether.) Instead of relating the parallel components of the electric and magnetic fields across the boundary between the materials, Fresnel used the principle that, as a transverse mechanical wave propagates from one material to the other, the two materials should not slip past each other at the interface. This "gluing" of the materials at the interface also forbids the possibility of the materials detaching from one another (creating gaps) or passing through one another as they experience the wave vibration. This mechanical approach to light worked splendidly and explained polarization effects along with the variations in reflectance and transmittance as a function of the incident angle of the light. Fresnel wrote the relationships between the various plane waves depicted in Fig. 3.1 in terms of coefficients that compare the reflected and transmitted field amplitudes to those of the incident field. He then calculated the ratio of the reflected and transmitted field components to the incident field components for each polarization. In the following example, we illustrate this procedure for s-polarized light. It is left as a homework exercise to solve the equations for p-polarized light (see P 3.1).
Example 3.1
Calculate the ratio of transmitted field to the incident field and the ratio of the reflected field to incident field for s-polarized light.
Solution: We use (3.8)
$$E_i^{(s)} + E_r^{(s)} = E_t^{(s)} \tag*{[3.8]}$$
and (3.14), which with the help of Snell's law is written
$$E_i^{(s)} - E_r^{(s)} = \frac{\sin\theta_i\cos\theta_t}{\sin\theta_t\cos\theta_i}E_t^{(s)} \tag{3.15}$$
If we add these two equations, we get
$$2E_i^{(s)} = \left[1 + \frac{\sin\theta_i\cos\theta_t}{\sin\theta_t\cos\theta_i}\right]E_t^{(s)} \tag{3.16}$$
and after dividing by $E_i^{(s)}$ and doing a little algebra, we obtain
$$\frac{E_t^{(s)}}{E_i^{(s)}} = \frac{2\sin\theta_t\cos\theta_i}{\sin\theta_t\cos\theta_i + \sin\theta_i\cos\theta_t}.$$
To get the ratio of reflected to incident, we subtract (3.15) from (3.8) to obtain
$$2E_r^{(s)} = \left[1 - \frac{\sin\theta_i\cos\theta_t}{\sin\theta_t\cos\theta_i}\right]E_t^{(s)} \tag{3.17}$$
and then divide (3.17) by (3.16). After a little algebra, we arrive at
$$\frac{E_r^{(s)}}{E_i^{(s)}} = \frac{\sin\theta_t\cos\theta_i - \sin\theta_i\cos\theta_t}{\sin\theta_t\cos\theta_i + \sin\theta_i\cos\theta_t}$$
Augustin Fresnel (1788-1829, French) Fresnel was a major proponent of the wave theory of light. He studied polarization, and invented the Fresnel rhomb for generating circularly polarized light. He also invented the Fresnel lens, originally for use in lighthouses. Today Fresnel lenses are used in many applications such as overhead projectors.
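The ratios of the reflected and transmitted field components to the incident field components are specified by the following coefficients, called Fresnel coefficients:
$$r_s \equiv \frac{E_r^{(s)}}{E_i^{(s)}} = \frac{\sin\theta_t\cos\theta_i - \sin\theta_i\cos\theta_t}{\sin\theta_t\cos\theta_i + \sin\theta_i\cos\theta_t} = -\frac{\sin(\theta_i - \theta_t)}{\sin(\theta_i + \theta_t)} = \frac{n_i\cos\theta_i - n_t\cos\theta_t}{n_i\cos\theta_i + n_t\cos\theta_t} \tag{3.18}$$
$$t_s \equiv \frac{E_t^{(s)}}{E_i^{(s)}} = \frac{2\sin\theta_t\cos\theta_i}{\sin\theta_t\cos\theta_i + \sin\theta_i\cos\theta_t} = \frac{2\sin\theta_t\cos\theta_i}{\sin(\theta_i + \theta_t)} = \frac{2n_i\cos\theta_i}{n_i\cos\theta_i + n_t\cos\theta_t} \tag{3.19}$$
$$r_p \equiv \frac{E_r^{(p)}}{E_i^{(p)}} = \frac{\cos\theta_t\sin\theta_t - \cos\theta_i\sin\theta_i}{\cos\theta_t\sin\theta_t + \cos\theta_i\sin\theta_i} = -\frac{\tan(\theta_i - \theta_t)}{\tan(\theta_i + \theta_t)} = \frac{n_i\cos\theta_t - n_t\cos\theta_i}{n_i\cos\theta_t + n_t\cos\theta_i} \tag{3.20}$$
$$t_p \equiv \frac{E_t^{(p)}}{E_i^{(p)}} = \frac{2\cos\theta_i\sin\theta_t}{\cos\theta_t\sin\theta_t + \cos\theta_i\sin\theta_i} = \frac{2\cos\theta_i\sin\theta_t}{\sin(\theta_i + \theta_t)\cos(\theta_i - \theta_t)} = \frac{2n_i\cos\theta_i}{n_i\cos\theta_t + n_t\cos\theta_i} \tag{3.21}$$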
All of the above forms of the Fresnel coefficients are commonly used. Remember that the angles in the coefficients cannot be independently chosen, but are subject to Snell's law (3.7). (The right-most form of each coefficient is obtained from the other forms using Snell's law.) The Fresnel coefficients allow us to easily connect the electric field amplitudes on the two sides of the boundary. They also keep track of phase shifts at a boundary. In Fig. 3.2 we have plotted the Fresnel coefficients for the case of an air-glass interface. Notice that the reflection coefficients are sometimes negative in this plot, which corresponds to a phase shift of $\pi$ upon reflection (remember $e^{i\pi} = -1$). Later we will see that when absorbing materials are encountered, more complicated phase shifts can arise due to the complex index of refraction.
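The right-most forms of (3.18)-(3.21) are straightforward to evaluate numerically. The sketch below is one possible way to do so for real indices below the critical angle; the function name `fresnel_coefficients` and the default indices (1 and 1.5, as in Fig. 3.2) are illustrative assumptions.

```python
import math

def fresnel_coefficients(theta_i, n_i=1.0, n_t=1.5):
    """Return (r_s, t_s, r_p, t_p) using the index forms of (3.18)-(3.21).

    Assumes real indices and a real transmitted angle.
    """
    cos_i = math.cos(theta_i)
    sin_t = n_i * math.sin(theta_i) / n_t        # Snell's law (3.7)
    cos_t = math.sqrt(1.0 - sin_t**2)
    r_s = (n_i * cos_i - n_t * cos_t) / (n_i * cos_i + n_t * cos_t)
    t_s = 2 * n_i * cos_i / (n_i * cos_i + n_t * cos_t)
    r_p = (n_i * cos_t - n_t * cos_i) / (n_i * cos_t + n_t * cos_i)
    t_p = 2 * n_i * cos_i / (n_i * cos_t + n_t * cos_i)
    return r_s, t_s, r_p, t_p

# Normal incidence on an air-glass interface
print(fresnel_coefficients(0.0))
```

At normal incidence both reflection coefficients come out negative for this interface, which is the phase shift of $\pi$ mentioned above.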
3.4 Reflectance and Transmittance

We are often interested in knowing the fraction of intensity that transmits through or reflects from a boundary. Since intensity is proportional to the square of the amplitude of the electric field, we can write the fraction of the light reflected from the surface (called reflectance) in terms of the Fresnel coefficients as
$$R_s \equiv |r_s|^2 \quad\text{and}\quad R_p \equiv |r_p|^2 \tag{3.22}$$
Figure 3.2 The Fresnel coefficients plotted versus $\theta_i$ for the case of an air-glass interface with $n_i = 1$ and $n_t = 1.5$.
These expressions are applied individually to each polarization component (s or p). The intensity reflected for each of these orthogonal polarizations is additive because the two electric fields are orthogonal and do not interfere with each other. The total reflected intensity is therefore
$$I_r^{(\text{total})} = I_r^{(s)} + I_r^{(p)} = R_s I_i^{(s)} + R_p I_i^{(p)} \tag{3.23}$$
where the incident intensity is given by (2.60):
$$I_i^{(\text{total})} = I_i^{(s)} + I_i^{(p)} = \frac{1}{2}n_i\epsilon_0 c\left[\left|E_i^{(s)}\right|^2 + \left|E_i^{(p)}\right|^2\right] \tag{3.24}$$
Since intensity is power per area, we can rewrite (3.23) as incident and reflected power:
$$P_r^{(\text{total})} = P_r^{(s)} + P_r^{(p)} = R_s P_i^{(s)} + R_p P_i^{(p)} \tag{3.25}$$
Using this expression and requiring that energy be conserved (i.e. $P_i^{(\text{total})} = P_r^{(\text{total})} + P_t^{(\text{total})}$), we find the fraction of the power that transmits:
$$P_t^{(\text{total})} = \left(P_i^{(s)} + P_i^{(p)}\right) - \left(P_r^{(s)} + P_r^{(p)}\right) = (1 - R_s)P_i^{(s)} + (1 - R_p)P_i^{(p)} \tag{3.26}$$
From this expression we see that the transmittance (i.e. the fraction of the light that transmits) for either polarization is
$$T_s \equiv 1 - R_s \quad\text{and}\quad T_p \equiv 1 - R_p \tag{3.27}$$
Figure 3.3 shows typical reflectance and transmittance values for an air-glass interface. You might be surprised at first to learn that
$$T_s \neq |t_s|^2 \quad\text{and}\quad T_p \neq |t_p|^2 \tag{3.28}$$
However, recall that the transmitted intensity (in terms of the transmitted fields) depends also on the refractive index. The Fresnel coefficients $t_s$ and $t_p$ relate the bare electric fields to each other, whereas the transmitted intensity (similar to (3.24)) is
$$I_t^{(\text{total})} = I_t^{(s)} + I_t^{(p)} = \frac{1}{2}n_t\epsilon_0 c\left[\left|E_t^{(s)}\right|^2 + \left|E_t^{(p)}\right|^2\right] \tag{3.29}$$
Figure 3.3 The reflectance and transmittance plotted versus $\theta_i$ for the case of an air-glass interface with $n_i = 1$ and $n_t = 1.5$.
Therefore, we expect $T_s$ and $T_p$ to depend on the ratio of the refractive indices $n_t$ and $n_i$ as well as on the squares of $t_s$ and $t_p$. There is another more subtle reason for the inequalities in (3.28). Consider a lateral strip of the power associated with a plane wave incident upon the material interface in Fig. 3.4. Upon refraction into the second medium, the strip is seen to change its width by the factor $\cos\theta_t/\cos\theta_i$. This is a geometrical artifact, owing to the change in propagation direction at the interface. The change in direction alters the intensity (power per area) but not the power. In computing the transmittance, we must remove this geometrical effect from the ratio of the intensities, which leads to the following transmittance coefficients:
$$T_s = \frac{n_t\cos\theta_t}{n_i\cos\theta_i}|t_s|^2, \qquad T_p = \frac{n_t\cos\theta_t}{n_i\cos\theta_i}|t_p|^2 \qquad \text{(valid when no total internal reflection)} \tag{3.30}$$
Figure 3.4 Light refracting into a surface.
Note that (3.30) is valid only if a real angle $\theta_t$ exists; it does not hold when the incident angle exceeds the critical angle for total internal reflection, discussed in section 3.6. In that situation, we must stick with (3.27).
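The consistency of (3.22), (3.27), and (3.30) can be checked numerically. The following is a minimal sketch, assuming real indices and an incident angle below any critical angle; the function name `reflectance_transmittance` and the sample angles are illustrative choices.

```python
import math

def reflectance_transmittance(theta_i, n_i=1.0, n_t=1.5):
    """Return (R_s, T_s, R_p, T_p) from (3.22) and (3.30)."""
    cos_i = math.cos(theta_i)
    sin_t = n_i * math.sin(theta_i) / n_t        # Snell's law (3.7)
    cos_t = math.sqrt(1.0 - sin_t**2)
    r_s = (n_i * cos_i - n_t * cos_t) / (n_i * cos_i + n_t * cos_t)
    t_s = 2 * n_i * cos_i / (n_i * cos_i + n_t * cos_t)
    r_p = (n_i * cos_t - n_t * cos_i) / (n_i * cos_t + n_t * cos_i)
    t_p = 2 * n_i * cos_i / (n_i * cos_t + n_t * cos_i)
    geometry = (n_t * cos_t) / (n_i * cos_i)     # index and strip-width factor
    return r_s**2, geometry * t_s**2, r_p**2, geometry * t_p**2

for degrees in (0.0, 30.0, 60.0, 85.0):
    R_s, T_s, R_p, T_p = reflectance_transmittance(math.radians(degrees))
    print(degrees, R_s + T_s, R_p + T_p)         # each sum should be close to 1
```

Without the factor $n_t\cos\theta_t/(n_i\cos\theta_i)$ the sums would not equal one, which is the point of the inequalities in (3.28).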
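Example 3.2
Show analytically for p-polarized light that $R_p + T_p = 1$, where $R_p$ is given by (3.22) and $T_p$ is given by (3.30).
Solution: From (3.20) we have
$$\begin{aligned}
R_p &= \left[\frac{\cos\theta_t\sin\theta_t - \cos\theta_i\sin\theta_i}{\cos\theta_t\sin\theta_t + \cos\theta_i\sin\theta_i}\right]^2 \\
&= \frac{\cos^2\theta_t\sin^2\theta_t - 2\cos\theta_i\sin\theta_i\cos\theta_t\sin\theta_t + \cos^2\theta_i\sin^2\theta_i}{(\cos\theta_t\sin\theta_t + \cos\theta_i\sin\theta_i)^2}
\end{aligned}$$
From (3.21) and (3.30) we have
$$\begin{aligned}
T_p &= \frac{n_t\cos\theta_t}{n_i\cos\theta_i}\left[\frac{2\cos\theta_i\sin\theta_t}{\cos\theta_t\sin\theta_t + \cos\theta_i\sin\theta_i}\right]^2 \\
&= \frac{\sin\theta_i\cos\theta_t}{\sin\theta_t\cos\theta_i}\,\frac{4\cos^2\theta_i\sin^2\theta_t}{(\cos\theta_t\sin\theta_t + \cos\theta_i\sin\theta_i)^2} \\
&= \frac{4\cos\theta_i\sin\theta_t\sin\theta_i\cos\theta_t}{(\cos\theta_t\sin\theta_t + \cos\theta_i\sin\theta_i)^2}
\end{aligned}$$
Then
$$\begin{aligned}
R_p + T_p &= \frac{\cos^2\theta_t\sin^2\theta_t + 2\cos\theta_i\sin\theta_i\cos\theta_t\sin\theta_t + \cos^2\theta_i\sin^2\theta_i}{(\cos\theta_t\sin\theta_t + \cos\theta_i\sin\theta_i)^2} \\
&= \frac{(\cos\theta_t\sin\theta_t + \cos\theta_i\sin\theta_i)^2}{(\cos\theta_t\sin\theta_t + \cos\theta_i\sin\theta_i)^2} = 1
\end{aligned}$$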
3.5 Brewster's Angle

Notice $r_p$ and $R_p$ go to zero at a certain angle in Figs. 3.2 and 3.3, indicating that no p-polarized light is reflected at this angle. This behavior is quite general, as we can see from the second form of the Fresnel coefficient formula for $r_p$ in (3.20), which has $\tan(\theta_i + \theta_t)$ in the denominator. Since the tangent "blows up" at $\pi/2$, the reflection coefficient goes to zero when
$$\theta_i + \theta_t = \frac{\pi}{2} \qquad \text{(requirement for zero p-polarized reflection)} \tag{3.31}$$
By inspecting Fig. 3.1, we see that this condition occurs when the reflected and transmitted wave vectors, $\mathbf{k}_r$ and $\mathbf{k}_t$, are perpendicular to each other. If we insert (3.31) into Snell's law (3.7), we can solve for the incident angle $\theta_i$ that gives rise to this special circumstance:
$$n_i\sin\theta_i = n_t\sin\left(\frac{\pi}{2} - \theta_i\right) = n_t\cos\theta_i \tag{3.32}$$
The special incident angle that satisfies this equation, in terms of the refractive indices, is found to be
$$\theta_B = \tan^{-1}\frac{n_t}{n_i} \tag{3.33}$$
We have replaced the specific $\theta_i$ with $\theta_B$ in honor of Sir David Brewster (1781-1868) who first discovered the phenomenon. The angle $\theta_B$ is called Brewster's angle. At Brewster's angle, no p-polarized light reflects (see L 3.4). Physically, the p-polarized light cannot reflect because $\mathbf{k}_r$ and $\mathbf{k}_t$ are perpendicular. A reflection would require the microscopic dipoles at the surface of the second material to radiate along their axes, which they cannot do. Maxwell's equations "know" about this, and so everything is nicely consistent.
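Equation (3.33) together with the condition (3.31) can be checked with a few lines of code. This is a minimal sketch; the function names and the sample indices (air to water, with 1.33 taken as an illustrative value) are assumptions made for the example.

```python
import math

def brewster_angle(n_i, n_t):
    """Brewster's angle (radians) from (3.33)."""
    return math.atan2(n_t, n_i)

def r_p(theta_i, n_i, n_t):
    """p-polarized reflection coefficient, right-most form of (3.20)."""
    cos_i = math.cos(theta_i)
    cos_t = math.sqrt(1.0 - (n_i * math.sin(theta_i) / n_t) ** 2)
    return (n_i * cos_t - n_t * cos_i) / (n_i * cos_t + n_t * cos_i)

n_i, n_t = 1.0, 1.33                               # illustrative: air to water
theta_B = brewster_angle(n_i, n_t)
theta_t = math.asin(n_i * math.sin(theta_B) / n_t)  # Snell's law (3.7)
print(math.degrees(theta_B))
print(theta_B + theta_t, math.pi / 2)               # the two should agree, per (3.31)
print(r_p(theta_B, n_i, n_t))                       # should vanish to rounding error
```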
3.6 Total Internal Reflection

From Snell's law (3.7), we can compute the transmitted angle in terms of the incident angle:
$$\theta_t = \sin^{-1}\left(\frac{n_i}{n_t}\sin\theta_i\right) \tag{3.34}$$
The angle $\theta_t$ is real only if the argument of the inverse sine is less than or equal to one. If $n_i > n_t$, we can find a critical angle at which the argument begins to exceed one:
$$\theta_c \equiv \sin^{-1}\frac{n_t}{n_i} \tag{3.35}$$
When $\theta_i > \theta_c$, then there is total internal reflection and we can directly show that $R_s = 1$ and $R_p = 1$ (see P 3.7). To demonstrate this, one computes the Fresnel coefficients (3.18) and (3.20) while employing the following substitutions:
$$\sin\theta_t = \frac{n_i}{n_t}\sin\theta_i \qquad \text{(Snell's law)} \tag{3.36}$$
and
$$\cos\theta_t = i\sqrt{\frac{n_i^2}{n_t^2}\sin^2\theta_i - 1} \qquad (\theta_i > \theta_c) \tag{3.37}$$
(see P 0.22). In this case, $\theta_t$ is a complex number. However, we do not assign geometrical significance to it in terms of any direction. Actually, we don't even need to know the value for $\theta_t$; we need only the values for $\sin\theta_t$ and $\cos\theta_t$, as specified in (3.36) and (3.37). Even though $\sin\theta_t$ is greater than one and $\cos\theta_t$ is imaginary, we can use their values to compute $r_s$, $r_p$, $t_s$, and $t_p$. (Complex notation is wonderful!) Upon substitution of (3.36) and (3.37) into the Fresnel reflection coefficients (3.18) and (3.20) we obtain
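$$r_s = \frac{\dfrac{n_i}{n_t}\cos\theta_i - i\sqrt{\dfrac{n_i^2}{n_t^2}\sin^2\theta_i - 1}}{\dfrac{n_i}{n_t}\cos\theta_i + i\sqrt{\dfrac{n_i^2}{n_t^2}\sin^2\theta_i - 1}} \qquad (\theta_i > \theta_c) \tag{3.38}$$
and
$$r_p = -\frac{\cos\theta_i - i\dfrac{n_i}{n_t}\sqrt{\dfrac{n_i^2}{n_t^2}\sin^2\theta_i - 1}}{\cos\theta_i + i\dfrac{n_i}{n_t}\sqrt{\dfrac{n_i^2}{n_t^2}\sin^2\theta_i - 1}} \qquad (\theta_i > \theta_c) \tag{3.39}$$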
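These Fresnel coefficients can be manipulated (see P 3.7) into the forms
$$r_s = \exp\left\{-2i\tan^{-1}\left[\frac{n_t}{n_i\cos\theta_i}\sqrt{\frac{n_i^2}{n_t^2}\sin^2\theta_i - 1}\right]\right\} \qquad (\theta_i > \theta_c) \tag{3.40}$$
and
$$r_p = -\exp\left\{-2i\tan^{-1}\left[\frac{n_i}{n_t\cos\theta_i}\sqrt{\frac{n_i^2}{n_t^2}\sin^2\theta_i - 1}\right]\right\} \qquad (\theta_i > \theta_c) \tag{3.41}$$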
Each coefficient has a different phase (note $n_i/n_t$ vs. $n_t/n_i$ in the expressions), which means that the s- and p-polarized fields experience different phase shifts upon reflection. Nevertheless, we definitely have $|r_s| = 1$ and $|r_p| = 1$. We rightly conclude that 100% of the light reflects. The transmittance is zero as dictated by (3.27). We emphasize that one should not employ (3.30) in the case of total internal reflection, as the imaginary $\cos\theta_t$ makes the geometric factor in this equation invalid. Even with zero transmittance, the boundary conditions from Maxwell's equations (see appendix 3.A) require that the fields be non-zero on the transmitted side of the boundary, meaning $t_s \neq 0$ and $t_p \neq 0$. While this situation may seem like a contradiction at first, on closer examination we can see that it actually is an accurate description of what happens. The coefficients $t_s$ and $t_p$ characterize evanescent waves that exist on the transmitted side of the interface. The evanescent wave travels parallel to the interface so that no energy is conveyed away from the interface deeper into the medium on the transmission side. In the direction perpendicular to the boundary, the strength of the evanescent wave decays exponentially. To compute the explicit form of the evanescent wave, we plug (3.36) and (3.37) into the transmitted field (3.2):
$$\begin{aligned}
\mathbf{E}_t &= \left[E_t^{(p)}\left(\hat{\mathbf{y}}\cos\theta_t - \hat{\mathbf{z}}\sin\theta_t\right) + \hat{\mathbf{x}}E_t^{(s)}\right]e^{i[k_t(y\sin\theta_t + z\cos\theta_t) - \omega t]} \\
&= \left[t_p E_i^{(p)}\left(\hat{\mathbf{y}}\,i\sqrt{\frac{n_i^2}{n_t^2}\sin^2\theta_i - 1} - \hat{\mathbf{z}}\frac{n_i}{n_t}\sin\theta_i\right) + \hat{\mathbf{x}}\,t_s E_i^{(s)}\right]e^{-k_t z\sqrt{\frac{n_i^2}{n_t^2}\sin^2\theta_i - 1}}\,e^{i\left[k_t y\frac{n_i}{n_t}\sin\theta_i - \omega t\right]}
\end{aligned} \tag{3.42}$$
Figure 3.5 plots the evanescent wave described by (3.42) along with the associated incident wave. Note that the evanescent wave propagates parallel to the boundary (in the y-dimension) and its strength diminishes away from the boundary (in the z-dimension) as dictated by the exponential terms at the end of (3.42). We leave the calculation of $t_s$ and $t_p$ as an exercise (P 3.8).
Figure 3.5 A wave experiencing total internal reflection creates an evanescent wave that propagates parallel to the interface. (The reflected wave is not shown.)
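The decaying exponential in (3.42) sets a length scale for how far the evanescent wave reaches into the second medium. As a rough sketch, the distance at which the amplitude falls to $1/e$ of its surface value is the reciprocal of $k_t\sqrt{(n_i^2/n_t^2)\sin^2\theta_i - 1}$, with $k_t = 2\pi n_t/\lambda_{\text{vac}}$ from (2.22). The function name and the sample values below (water to air, 633 nm, 60 degrees) are illustrative assumptions.

```python
import math

def evanescent_decay_length(theta_i, n_i, n_t, lambda_vac):
    """Distance (same units as lambda_vac) at which the amplitude in (3.42) drops to 1/e.

    Assumes theta_i exceeds the critical angle (3.35).
    """
    k_t = 2 * math.pi * n_t / lambda_vac
    root = math.sqrt((n_i / n_t) ** 2 * math.sin(theta_i) ** 2 - 1.0)
    return 1.0 / (k_t * root)

n_i, n_t = 1.33, 1.0                 # illustrative: water to air
lambda_vac = 633e-9                  # meters
theta_i = math.radians(60.0)
depth = evanescent_decay_length(theta_i, n_i, n_t, lambda_vac)
print(depth)                         # a fraction of the vacuum wavelength
```

The decay length shrinks as the incident angle moves further past the critical angle and diverges as the critical angle is approached.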
3.7 Reflection from Metallic or other Absorptive Surfaces

In this section we extend our analysis to materials with complex refractive index $N \equiv n + i\kappa$. As a reminder, the imaginary part of the index controls attenuation of a wave as it propagates within a material. The real part of the index governs the oscillatory nature of the wave. It turns out that both the imaginary and real parts of the index strongly influence the reflection of light from a surface. The reader may be grateful that there is no need to re-derive the Fresnel coefficients (3.18)-(3.21) for the case of complex indices. The coefficients remain valid whether the index is real or complex; just replace the real index $n$ with the complex index $N$. However, we do need to be a bit careful when applying them. We restrict our discussion to reflections from a metallic or other absorbing material surface. As we found in the case of total internal reflection, we actually do not need to know the transmitted angle $\theta_t$ to employ Fresnel reflection coefficients (3.18) and (3.20). We need only acquire expressions for $\cos\theta_t$ and $\sin\theta_t$, and we can obtain these from Snell's law (3.7). To minimize complications, we let the incident refractive index be $n_i = 1$ (which is often the case). Let the index on the transmitted side be written simply as $N_t = N$. Then by Snell's law the sine of the transmitted angle is
$$\sin\theta_t = \frac{\sin\theta_i}{N} \tag{3.43}$$
This expression is of course complex since $N$ is complex, but that is just fine. The cosine of the same angle is
$$\cos\theta_t = \sqrt{1 - \sin^2\theta_t} = \frac{1}{N}\sqrt{N^2 - \sin^2\theta_i} \tag{3.44}$$
The positive sign in front of the square root is appropriate since it is clearly the right choice if the imaginary part of the index approaches zero. Upon substitution of these expressions, the Fresnel reflection coefficients (3.18) and (3.20) become
$$r_s = \frac{\cos\theta_i - \sqrt{N^2 - \sin^2\theta_i}}{\cos\theta_i + \sqrt{N^2 - \sin^2\theta_i}} \tag{3.45}$$
and
$$r_p = \frac{\sqrt{N^2 - \sin^2\theta_i} - N^2\cos\theta_i}{\sqrt{N^2 - \sin^2\theta_i} + N^2\cos\theta_i} \tag{3.46}$$
These expressions are tedious to evaluate. When evaluating the expressions, it is usually desirable to put them into the form
$$r_s = |r_s|e^{i\phi_s} \tag{3.47}$$
and
$$r_p = |r_p|e^{i\phi_p} \tag{3.48}$$
However, we refrain from putting (3.45) and (3.46) into this form using the general expressions; we would get a big mess. It is a good idea to let your calculator or a computer do it after a specific value for $N \equiv n + i\kappa$ is chosen. An important point to notice is that the phases upon reflection can be very different for s and p-polarization components (i.e. $\phi_p$ and $\phi_s$ can be very different). This is true in general, even when the reflectivity is high (i.e. $|r_s|$ and $|r_p|$ on the order of unity). Brewster's angle exists also for surfaces with complex refractive index. However, in general the expressions (3.46) and (3.48) do not go to zero at any incident angle $\theta_i$. Rather, the reflection of p-polarized light can go through a minimum at some angle $\theta_i$, which we refer to as Brewster's angle (see Fig. 3.6). This minimum is best found numerically since the general expression for $|r_p|$ in terms of $n$ and $\kappa$ and as a function of $\theta_i$ can be unwieldy.
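Letting a computer evaluate (3.45) and (3.46) might look like the following minimal sketch. The complex index used here ($N = 1.5 + 2.0i$) is a made-up illustrative value, not a property of any particular material, and the function name `metal_reflection` is likewise an assumption. NumPy's principal square root is used, which matches the sign choice in (3.44) when $n$ and $\kappa$ are positive.

```python
import numpy as np

def metal_reflection(theta_i, N):
    """Return (r_s, r_p) from (3.45) and (3.46) for n_i = 1 and complex index N."""
    cos_i = np.cos(theta_i)
    root = np.sqrt(N**2 - np.sin(theta_i) ** 2 + 0j)   # principal branch
    r_s = (cos_i - root) / (cos_i + root)
    r_p = (root - N**2 * cos_i) / (root + N**2 * cos_i)
    return r_s, r_p

N = 1.5 + 2.0j                                  # hypothetical absorbing material
theta = np.linspace(0.0, np.pi / 2, 901)        # incident angles in radians
r_s, r_p = metal_reflection(theta, N)

R_p = np.abs(r_p) ** 2
phi_s, phi_p = np.angle(r_s), np.angle(r_p)     # phases in (3.47) and (3.48)

i_min = int(np.argmin(R_p))                     # location of the minimum of R_p
print(np.degrees(theta[i_min]), R_p[i_min])
```

The minimum of $R_p$ found this way does not reach zero, in line with the discussion above, and the arrays `phi_s` and `phi_p` can be plotted to compare the two phase shifts.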
Appendix 3.A Boundary Conditions For Fields at an Interface

We are interested in the continuity of fields across a boundary from one medium with index $n_1$ to another medium with index $n_2$. We will show that the components of electric field parallel to the interface surface must be the same on the two sides of the surface (adjacent to the interface). This result is independent of the refractive index of the materials. We will also show that the component of magnetic field parallel to the interface surface is the same on the two sides (assuming the permeability $\mu_0$ is the same on both sides). To derive the boundary conditions, we consider a surface $S$ (a rectangle) that is perpendicular to the interface between the two media and which extends into both media, as depicted in Fig. 3.7.
Figure 3.6 The transmittance and reflectance (top) and the phase upon reflection (bottom) for a metal with $n = 0.2$ and $\kappa = 3.4$. Note the minimum of $R_p$ where Brewster's angle occurs.
Figure 3.7 Interface of two materials.
First we examine the implications of Faraday's law (1.17):
$$\oint_C \mathbf{E}\cdot d\boldsymbol{\ell} = -\frac{\partial}{\partial t}\int_S \mathbf{B}\cdot\hat{\mathbf{n}}\,da \tag{3.49}$$
We apply Faraday's law to the rectangular contour depicted in Fig. 3.7. We can perform the path integration on the left-hand side of (3.49). The integration around the loop gives
$$\oint \mathbf{E}\cdot d\boldsymbol{\ell} = E_{1\parallel}d - E_{1\perp}\ell_1 - E_{2\perp}\ell_2 - E_{2\parallel}d + E_{2\perp}\ell_2 + E_{1\perp}\ell_1 = \left(E_{1\parallel} - E_{2\parallel}\right)d \tag{3.50}$$
Here, $E_{1\parallel}$ refers to the component of the electric field in the material with index $n_1$ that is parallel to the interface. $E_{1\perp}$ refers to the component of the electric field in the material with index $n_1$ which is perpendicular to the interface. Similarly, $E_{2\parallel}$ and $E_{2\perp}$ are the parallel and perpendicular components of the electric field in the material with index $n_2$. We have assumed that the rectangle is small enough that the fields are uniform within the half rectangle on either side of the boundary. We can continue to shrink the loop down until it has zero surface area by letting the lengths $\ell_1$ and $\ell_2$ go to zero. In this situation, the right-hand side of Faraday's law goes to zero
$$-\frac{\partial}{\partial t}\int_S \mathbf{B}\cdot\hat{\mathbf{n}}\,da \rightarrow 0 \tag{3.51}$$
and we are left with
$$E_{1\parallel} = E_{2\parallel} \tag{3.52}$$
This simple relation is a general boundary condition, which is met at any material interface. The component of the electric field that lies in the plane of the interface must be the same on both sides of the interface. We now derive a similar boundary condition for the magnetic field using the integral form of Ampere's law:2
$$\oint_C \mathbf{B}\cdot d\boldsymbol{\ell} = \mu_0\int_S\left(\mathbf{J}_{\text{free}} + \frac{\partial\mathbf{P}}{\partial t} + \epsilon_0\frac{\partial\mathbf{E}}{\partial t}\right)\cdot\hat{\mathbf{n}}\,da \tag{3.53}$$
As before, we are able to perform the path integration on the left-hand side for the geometry depicted in the figure. When we integrate around the loop we get
$$\oint \mathbf{B}\cdot d\boldsymbol{\ell} = B_{1\parallel}d - B_{1\perp}\ell_1 - B_{2\perp}\ell_2 - B_{2\parallel}d + B_{2\perp}\ell_2 + B_{1\perp}\ell_1 = \left(B_{1\parallel} - B_{2\parallel}\right)d \tag{3.54}$$
The notation for parallel and perpendicular components on either side of the interface is similar to that used in (3.50). Again, we can continue to shrink the loop down until it has zero surface area by letting the lengths $\ell_1$ and $\ell_2$ go to zero. In this situation, the right-hand side of (3.53) goes to zero (not considering the possibility of surface currents):
$$\mu_0\int_S\left(\mathbf{J}_{\text{free}} + \frac{\partial\mathbf{P}}{\partial t} + \epsilon_0\frac{\partial\mathbf{E}}{\partial t}\right)\cdot\hat{\mathbf{n}}\,da \rightarrow 0 \tag{3.55}$$
and we are left with
$$B_{1\parallel} = B_{2\parallel} \tag{3.56}$$
This is a general boundary condition that must be satisfied at the material interface.
2 This form can be obtained from (1.26) by integration over the surface $S$ in Fig. 3.7 and applying Stokes' theorem (0.12) to the magnetic field term.
Exercises

Exercises for 3.3 The Fresnel Coefficients

P3.1 Derive the Fresnel coefficients (3.20) and (3.21) for p-polarized light.

P3.2 Verify that the alternative forms given in (3.18)-(3.21) are equivalent (given Snell's law). Show that at normal incidence (i.e. $\theta_i = \theta_t = 0$) the Fresnel coefficients reduce to
$$\lim_{\theta_i\to 0} r_s = \lim_{\theta_i\to 0} r_p = -\frac{n_t - n_i}{n_t + n_i}$$
and
$$\lim_{\theta_i\to 0} t_s = \lim_{\theta_i\to 0} t_p = \frac{2n_i}{n_t + n_i}$$

P3.3 Undoubtedly the most important interface in optics is when air meets glass. Use a computer graphing program to make the following plots for this interface as a function of the incident angle. Use $n_i = 1$ for air and $n_t = 1.6$ for glass. Explicitly label Brewster's angle on all of the applicable graphs.
(a) $r_p$ and $t_p$ (plot together on same graph)
(b) $R_p$ and $T_p$ (plot together on same graph)
(c) $r_s$ and $t_s$ (plot together on same graph)
(d) $R_s$ and $T_s$ (plot together on same graph)
Exercises for 3.4 Reflectance and Transmittance

L3.4 (a) In the laboratory, measure the reflectance for both s and p polarized light from a flat glass surface at about ten points. You can normalize the detector by placing it in the incident beam of light before the glass surface. Especially watch for Brewster's angle (described in section 3.5). Figure 3.8 illustrates the experimental setup.
Figure 3.8 Experimental setup for lab 3.4.
(b) Use a computer to calculate the theoretical air-to-glass reflectance as a function of incident angle (i.e. plot $R_s$ and $R_p$ as a function of $\theta_i$). Take the index of refraction for glass to be $n_t = 1.54$ and the index for air to be one. Plot this theoretical calculation as a smooth line on a graph. Plot your experimental data from (a) as points on this same graph (not points connected by lines).

P3.5 Show analytically for s-polarized light that $R_s + T_s = 1$, where $R_s$ is given by (3.22) and $T_s$ is given by (3.30).
Exercises for 3.5 Brewster's Angle

P3.6 Find Brewster's angle for glass $n = 1.5$.

Exercises for 3.6 Total Internal Reflection

P3.7 Derive (3.40) and (3.41) and show that $R_s = 1$ and $R_p = 1$.
HINT:
$$\frac{a - ib}{a + ib} = \frac{\sqrt{a^2 + b^2}\,e^{-i\tan^{-1}\frac{b}{a}}}{\sqrt{a^2 + b^2}\,e^{i\tan^{-1}\frac{b}{a}}} = \frac{e^{-i\tan^{-1}\frac{b}{a}}}{e^{i\tan^{-1}\frac{b}{a}}} = e^{-2i\tan^{-1}\frac{b}{a}}$$
where $a$ is positive and real and $b$ is real.

P3.8 Compute $t_s$ and $t_p$ in the case of total internal reflection.

P3.9 Use a computer to plot the air-to-water transmittance as a function of incident angle (i.e. plot (3.27) as a function of $\theta_i$). Also plot the water-to-air transmittance on a separate graph. Plot both $T_s$ and $T_p$ on each graph. The index of refraction for water is $n = 1.33$. Take the index of air to be one.

P3.10 Light ($\lambda_{\text{vac}} = 500$ nm) reflects internally from a glass surface ($n = 1.5$) surrounded by air. The incident angle is $\theta_i = 45^\circ$. An evanescent wave travels parallel to the surface on the air side. At what distance from the surface is the amplitude of the evanescent wave $1/e$ of its value at the surface?
Exercises for 3.7 Reflection from Metallic or other Absorptive Surfaces

P3.11 Using a computer graphing program that understands complex numbers (e.g. Matlab), plot $|r_s|$, $|r_p|$ versus $\theta_i$ for silver ($n = 0.2$ and $\kappa = 3.4$). Make a separate plot of the phases $\phi_s$ and $\phi_p$ from (3.47) and (3.48). Clearly label each plot, and comment on how the phase shifts are different from those experienced when reflecting from glass.

P3.12 Find Brewster's angle for silver ($n = 0.2$ and $\kappa = 3.4$) by calculating $R_p$ and finding its minimum. You will want to use a computer program to do this (Matlab, Maple, Mathematica, etc.).

P3.13 The complex index for silver is given by $n = 0.2$ and $\kappa = 3.4$. Find $r_s$ and $r_p$ when $\theta_i = 80^\circ$ and put them into the forms (3.47) and (3.48). Find the result using the rules of complex arithmetic and real-valued functions on your calculator. (You can use the complex number abilities of your calculator to check your answer.)
Figure 3.9 Geometry for P 3.13
Chapter 4
Polarization
4.1 Introduction
When the direction of the electric field of light oscillates in a regular, predictable fashion, we say that the light is polarized. In this chapter, we develop a formalism for describing polarized light and for describing the effect of devices that modify polarization. First we introduce the vocabulary used to describe the various types of polarization that plane waves exhibit. Then in section 4.3 we introduce a convenient way for keeping track of polarization using a two-dimensional Jones vector, and in the next section we show how this formalism is used to describe a general form of polarization referred to as elliptically polarized light. In section 4.5 we discuss a method for describing devices that can change polarization, and show how their effect on a light field can be represented by 2 × 2 Jones matrices operating on the polarization vector. In the next section we derive a general Jones matrix for a linear polarizer oriented at an arbitrary angle with respect to the coordinate system. We then apply this analysis to describe wave plates, which are devices that introduce a relative phase delay of one field component with respect to the other. A wave plate can be used to convert, for example, linearly polarized light into circularly polarized light. Beginning in section 4.8, we investigate how reflection and transmission at a material interface influences field polarization. The Fresnel coefficients studied in the previous chapter can be conveniently incorporated into the 2 × 2 matrix formulation for handling polarization. As we saw previously, the amount of light reflected from a surface depends on the type of polarization, s or p. In addition, upon reflection, s-polarized light can acquire a phase lag or phase advance relative to p-polarized light. This is especially true at metal surfaces, which have complex indices of refraction (i.e. highly absorptive).
In section 4.9 we briefly discuss ellipsometry, which is the science of characterizing optical properties of materials by observing the polarization of light reflected from surfaces. Throughout this chapter, we consider light to have well characterized polarization. However, in most natural sources of light (e.g. sunlight or the light from an incandescent lamp) the direction of the electric field varies rapidly and randomly. Such sources are commonly referred to as unpolarized. It is common to have a mixture of unpolarized and polarized light, called partially polarized light. The Jones vector formalism used in this chapter is inappropriate for describing the unpolarized portions of the light. In appendix 4.A we describe a more general formalism for dealing with light having an arbitrary degree of polarization.
4.2 Linear, Circular, and Elliptical Polarization

Consider the plane-wave solution to Maxwell's equations given by
$$\mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \tag{4.1}$$
The wave vector $\mathbf{k}$ specifies the direction of propagation. We neglect absorption so that the refractive index is real and $k = n\omega/c = 2\pi n/\lambda_{\text{vac}}$ (see (2.22)-(2.27)). In an isotropic medium we know that $\mathbf{k}$ and $\mathbf{E}_0$ are perpendicular, but even after the direction of $\mathbf{k}$ is specified, we are still free to have $\mathbf{E}_0$ point anywhere in the two dimensions perpendicular to $\mathbf{k}$. If we orient our coordinate system with the z-axis in the direction of $\mathbf{k}$, we can write (4.1) as
$$\mathbf{E}(z, t) = \left(E_x\hat{\mathbf{x}} + E_y\hat{\mathbf{y}}\right)e^{i(kz - \omega t)} \tag{4.2}$$
As always, only the real part of (4.2) is physically relevant. The complex amplitudes $E_x$ and $E_y$ keep track of the phase of the oscillating field components. In general the complex phases of $E_x$ and $E_y$ can differ, so that the wave in one of the dimensions lags or leads the wave in the other dimension. The relationship between $E_x$ and $E_y$ describes the polarization of the light. For example, if the y-component of the field $E_y$ is zero, the plane wave is said to be linearly polarized along the x-dimension. Linearly polarized light can have any orientation in the x-y plane, and it occurs whenever $E_x$ and $E_y$ have the same complex phase (or differ by an integer times $\pi$). For our purposes, we will take the x-dimension to be horizontal and the y-dimension to be vertical unless otherwise noted. As an example, suppose $E_y = iE_x$, where $E_x$ is real. The y-component of the field is then out of phase with the x-component by the factor $i = e^{i\pi/2}$. Taking the real part of the field (4.2) we get
$$\begin{aligned}
\mathbf{E}(z, t) &= \text{Re}\left[E_x e^{i(kz - \omega t)}\right]\hat{\mathbf{x}} + \text{Re}\left[e^{i\pi/2}E_x e^{i(kz - \omega t)}\right]\hat{\mathbf{y}} \\
&= E_x\cos(kz - \omega t)\,\hat{\mathbf{x}} + E_x\cos(kz - \omega t + \pi/2)\,\hat{\mathbf{y}} \\
&= E_x\left[\cos(kz - \omega t)\,\hat{\mathbf{x}} - \sin(kz - \omega t)\,\hat{\mathbf{y}}\right] \qquad \text{(left circular)}
\end{aligned} \tag{4.3}$$
In this example, the field in the y-dimension lags the field in the x-dimension by a quarter cycle. That is, the behavior seen in the x-dimension happens in the y-dimension a quarter cycle later. The field never goes to zero simultaneously in both dimensions. In fact, in this example the strength of the electric field is constant, and it rotates in a circular pattern in the x-y dimensions. For this reason, this type of field is called circularly polarized. Figure 4.1 graphically shows the two linear polarized pieces in (4.3) adding to make circularly polarized light.
Figure 4.1 The combination of two orthogonally polarized plane waves that are out of phase results in elliptically polarized light. Here we have left circularly polarized light created as specified by (4.3).
If we view a circularly polarized light field throughout space at a frozen instant in time (as shown in Fig. 4.1), the electric field vector spirals as we move along the z-dimension. If the sense of the spiral (with time frozen) matches that of a common wood screw oriented along the z-axis, the polarization is called right handed. (It makes no difference whether the screw is flipped end for end.) If instead the field spirals in the opposite sense, then the polarization is called left handed. The field shown at the right side of Fig. 4.1 is an example of left-handed circularly polarized light. An equivalent way to view the handedness convention is to imagine the light impinging on a screen as a function of time. The field of a right-handed circularly polarized wave rotates counterclockwise at the screen, when looking along the $\mathbf{k}$ direction (towards the front side of the screen). The field rotates clockwise for a left-handed circularly polarized wave.
Linearly polarized light can become circularly or, in general, elliptically polarized after reflection from a metal surface if the incident light has both s- and p-polarized components. Every good experimentalist working with light needs to know this. For reflections involving materials with real indices such as glass (for visible light), the situation is less complicated and linearly polarized light remains linear. However, even if the index is real, there are interesting phase shifts (different for s and p components) for total internal reflection.
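The circular-polarization example in (4.3) can be reproduced numerically by taking the real part of (4.2) with $E_y = iE_x$. The following minimal sketch samples the phase $kz - \omega t$ over one cycle; the variable names and the unit amplitude are illustrative assumptions.

```python
import numpy as np

E_x = 1.0                 # real amplitude, arbitrary units
E_y = 1j * E_x            # y-component shifted by a factor of i = exp(i*pi/2)

phase = np.linspace(0.0, 2 * np.pi, 9)           # sampled values of (kz - omega*t)
field_x = np.real(E_x * np.exp(1j * phase))      # E_x cos(kz - omega*t)
field_y = np.real(E_y * np.exp(1j * phase))      # -E_x sin(kz - omega*t)

magnitude = np.hypot(field_x, field_y)
print(magnitude)          # constant field strength: the tip traces a circle
```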
4.3 Jones Vectors for Representing Polarization

In 1941, R. Clark Jones introduced a two-dimensional matrix algebra that is useful for keeping track of light polarization and the effects of optical elements that influence polarization. The algebra deals with light having a definite polarization, such as plane waves. It does not apply to unpolarized or partially polarized light (e.g. sunlight). For partially polarized light, a four-dimensional algebra known as Stokes calculus is used (see Appendix 4.A). In preparation for introducing Jones vectors, we explicitly write the complex
phases of the field components in (4.2) as

$$\mathbf{E}(z,t) = \left(|E_x| e^{i\delta_x}\hat{\mathbf{x}} + |E_y| e^{i\delta_y}\hat{\mathbf{y}}\right) e^{i(kz-\omega t)} \tag{4.4}$$

and then factor (4.4) as follows:

$$\mathbf{E}(z,t) = E_{\text{eff}}\left(A\hat{\mathbf{x}} + B e^{i\delta}\hat{\mathbf{y}}\right) e^{i(kz-\omega t)} \tag{4.5}$$

where

$$E_{\text{eff}} \equiv \sqrt{|E_x|^2 + |E_y|^2}\; e^{i\delta_x} \tag{4.6}$$

$$A \equiv \frac{|E_x|}{\sqrt{|E_x|^2 + |E_y|^2}} \tag{4.7}$$

$$B \equiv \frac{|E_y|}{\sqrt{|E_x|^2 + |E_y|^2}} \tag{4.8}$$

$$\delta \equiv \delta_y - \delta_x \tag{4.9}$$

Please notice that $A$ and $B$ are real non-negative dimensionless numbers that satisfy $A^2 + B^2 = 1$. If the $x$-component of the field $E_x$ happens to be zero, then its phase $e^{i\delta_x}$ is indeterminate. In this case we let $E_{\text{eff}} = |E_y| e^{i\delta_y}$, $B = 1$, and $\delta = 0$. (If $E_y$ is zero, then $e^{i\delta_y}$ is indeterminate. However, this is not a problem since $B = 0$ in this case, so that (4.5) is still well-defined.)

The overall field strength $E_{\text{eff}}$ is often unimportant in a discussion of polarization. It represents the strength of an effective linearly polarized field that would give the same intensity that (4.4) would yield. Specifically, from (4.5) and (2.60) we have

$$I = \langle S \rangle_t = \frac{1}{2} n c \epsilon_0\, \mathbf{E}\cdot\mathbf{E}^* = \frac{1}{2} n c \epsilon_0 |E_{\text{eff}}|^2 \tag{4.10}$$

The phase of $E_{\text{eff}}$ represents an overall phase shift that one can trivially adjust by physically moving the light source (a laser, say) forward or backward by a fraction of a wavelength.

The portion of (4.5) that is interesting in the current discussion is the vector $A\hat{\mathbf{x}} + B e^{i\delta}\hat{\mathbf{y}}$, referred to as the Jones vector. This vector contains the essential information regarding field polarization. Notice that the Jones vector is a kind of unit vector, in that $(A\hat{\mathbf{x}} + B e^{i\delta}\hat{\mathbf{y}})\cdot(A\hat{\mathbf{x}} + B e^{i\delta}\hat{\mathbf{y}})^* = 1$ (the asterisk represents the complex conjugate). When writing a Jones vector we dispense with the $\hat{\mathbf{x}}$ and $\hat{\mathbf{y}}$ notation and organize the components into a column vector (for later use in matrix algebra) as follows:

$$\begin{bmatrix} A \\ B e^{i\delta} \end{bmatrix} \tag{4.11}$$

This vector can describe the polarization state of any plane wave field. Table 4.1 lists some Jones vectors representing various polarization states.

| Vector | Description |
|---|---|
| $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ | linearly polarized along the $x$-dimension |
| $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ | linearly polarized along the $y$-dimension |
| $\begin{bmatrix} \cos\alpha \\ \sin\alpha \end{bmatrix}$ | linearly polarized at an angle $\alpha$ from the $x$-axis |
| $\frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -i \end{bmatrix}$ | right circularly polarized |
| $\frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ i \end{bmatrix}$ | left circularly polarized |

Table 4.1 Jones vectors for various polarization states.
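The factoring in (4.5) through (4.9) can be sketched in a few lines of code. This is an illustrative sketch rather than a prescribed routine: the function names `jones_from_amplitudes` and `jones_vector` are assumptions, and the sample amplitudes $E_x = 2$, $E_y = 2i$ are hypothetical values corresponding to the left-circular example of (4.3).

```python
import cmath
import math

def jones_from_amplitudes(E_x, E_y):
    """Split complex amplitudes into (E_eff, A, B, delta) as in (4.5)-(4.9)."""
    magnitude = math.hypot(abs(E_x), abs(E_y))
    if magnitude == 0.0:
        raise ValueError("both field components are zero")
    if E_x == 0:
        # Phase of E_x is indeterminate: E_eff = |E_y| e^{i delta_y}, B = 1, delta = 0.
        return complex(E_y), 0.0, 1.0, 0.0
    delta_x = cmath.phase(E_x)
    E_eff = magnitude * cmath.exp(1j * delta_x)
    A = abs(E_x) / magnitude
    B = abs(E_y) / magnitude
    # If E_y is zero its phase is arbitrary; B = 0 makes the choice irrelevant.
    delta = (cmath.phase(E_y) - delta_x) % (2.0 * math.pi) if E_y != 0 else 0.0
    return E_eff, A, B, delta

def jones_vector(A, B, delta):
    """Column vector (4.11), written as a 2-tuple."""
    return (A, B * cmath.exp(1j * delta))

# Hypothetical amplitudes: E_y = i E_x, the left-circular case of (4.3).
E_eff, A, B, delta = jones_from_amplitudes(2.0, 2.0j)
vector = jones_vector(A, B, delta)
print(E_eff, A, B, delta, vector)
```

Multiplying `E_eff` by each component of `vector` recovers the original amplitudes, which is a quick way to confirm that nothing was lost in the factoring.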
4.4 Elliptically Polarized Light

In general, the Jones vector (4.11) represents a polarization state in between linear and circular. This in-between state is known as elliptically polarized light. As the wave travels, the field vector undergoes a spiral motion. If we observe the field vector at a point as the field goes by, the field vector traces out an ellipse oriented perpendicular to the direction of travel (i.e. in the $x$-$y$ plane). One of the axes of the ellipse occurs at the angle

$$\alpha = \frac{1}{2}\tan^{-1}\left(\frac{2AB\cos\delta}{A^2 - B^2}\right) \tag{4.12}$$

with respect to the $x$-axis (see P 4.8). This angle sometimes corresponds to the minor axis and sometimes to the major axis of the ellipse, depending on the exact values of $A$, $B$, and $\delta$. The other axis of the ellipse (major or minor) then occurs at $\alpha \pm \pi/2$ (see Fig. 4.2).

We can deduce whether (4.12) corresponds to the major or minor axis of the ellipse by comparing the strength of the electric field when it spirals through the direction specified by $\alpha$ and when it spirals through $\alpha \pm \pi/2$. The strength of the electric field at $\alpha$ is given by (see P 4.8)

$$E_\alpha = |E_{\text{eff}}|\sqrt{A^2\cos^2\alpha + B^2\sin^2\alpha + AB\cos\delta\,\sin 2\alpha} \qquad (E_{\max}\ \text{or}\ E_{\min}) \tag{4.13}$$

and the strength of the field when it spirals through the orthogonal direction ($\alpha \pm \pi/2$) is given by

$$E_{\alpha\pm\pi/2} = |E_{\text{eff}}|\sqrt{A^2\sin^2\alpha + B^2\cos^2\alpha - AB\cos\delta\,\sin 2\alpha} \qquad (E_{\max}\ \text{or}\ E_{\min}) \tag{4.14}$$

After computing (4.13) and (4.14), we decide which represents $E_{\min}$ and which $E_{\max}$ according to

$$E_{\max} \geq E_{\min} \tag{4.15}$$

Figure 4.2 The electric field of elliptically polarized light traces an ellipse in the plane perpendicular to its propagation direction. The two plots are for different values of $A$, $B$, and $\delta$. The angle $\alpha$ can describe the major axis (left figure) or the minor axis (right figure), depending on the values of these parameters.

R. Clark Jones (1916–2004, United States): Jones was educated at Harvard and spent his professional career working for Polaroid Corporation. He is well known for his work in polarization, but also studied many other fields. He was an avid train enthusiast, and even wrote papers on railway engineering.
We could predict in advance which of (4.13) and (4.14) corresponds to the major axis and which corresponds to the minor axis. However, making this prediction is as complicated as simply evaluating (4.13) and (4.14) and determining which is greater.

Elliptically polarized light is often characterized by the ratio of the minor axis to the major axis. This ratio is called the ellipticity, which is a dimensionless number:

$$e \equiv \frac{E_{\min}}{E_{\max}} \tag{4.16}$$

The ellipticity $e$ ranges between zero (corresponding to linearly polarized light) and one (corresponding to circularly polarized light). Finally, the helicity or handedness of elliptically polarized light is as follows (see P 4.2):

$$0 < \delta < \pi \qquad \text{(left-handed helicity)} \tag{4.17}$$

$$\pi < \delta < 2\pi \qquad \text{(right-handed helicity)} \tag{4.18}$$
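The recipe of (4.12) through (4.18) lends itself to a short numerical sketch. The function name `ellipse_parameters` and the sample Jones vectors are assumptions for illustration. One deliberate departure from (4.12) is labeled in the code: the two-argument arctangent `math.atan2` is used so that the case $A = B$ does not divide by zero. This still returns an axis of the ellipse, but it fixes the branch of the inverse tangent, so both field strengths are still evaluated and compared as in (4.15).

```python
import math

def ellipse_parameters(A, B, delta):
    """Return (alpha, ellipticity, helicity) for the Jones vector (A, B e^{i delta})."""
    cross = A * B * math.cos(delta)
    # Assumption: atan2 replaces the plain arctangent of (4.12) to handle A == B.
    alpha = 0.5 * math.atan2(2.0 * cross, A * A - B * B)
    c2, s2 = math.cos(alpha) ** 2, math.sin(alpha) ** 2
    mixed = cross * math.sin(2.0 * alpha)
    # Field strengths relative to |E_eff| from (4.13) and (4.14); max() guards rounding.
    e_alpha = math.sqrt(max(0.0, A * A * c2 + B * B * s2 + mixed))
    e_perp = math.sqrt(max(0.0, A * A * s2 + B * B * c2 - mixed))
    e_max, e_min = max(e_alpha, e_perp), min(e_alpha, e_perp)  # (4.15)
    ellipticity = e_min / e_max                                 # (4.16)
    spin = A * B * math.sin(delta)  # positive exactly when 0 < delta < pi
    if abs(spin) < 1e-12:
        helicity = "linear"
    elif spin > 0:
        helicity = "left"    # (4.17)
    else:
        helicity = "right"   # (4.18)
    return alpha, ellipticity, helicity

r = 1.0 / math.sqrt(2.0)
left_circ = ellipse_parameters(r, r, math.pi / 2)         # left circular from Table 4.1
diagonal = ellipse_parameters(r, r, 0.0)                  # linear at 45 degrees
right_ellip = ellipse_parameters(0.8, 0.6, 3 * math.pi / 2)  # hypothetical elliptical state
print(left_circ, diagonal, right_ellip)
```

The limiting cases behave as the text describes: circular light gives an ellipticity of one, and linear light gives zero.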
4.5 Linear Polarizers and Jones Matrices

In 1928, Edwin Land invented Polaroid at the age of nineteen. He did it by stretching a polymer sheet and infusing it with iodine. The stretching causes the polymer chains to align along a common direction, whereupon the sheet is cemented to a substrate. The infusion of iodine causes the individual chains to become conductive. When light impinges upon the Polaroid sheet, the component of electric field that is parallel to the polymer chains causes a current $\mathbf{J}_{\text{free}}$ to oscillate in that dimension. The resistance to the current quickly dissipates the energy (i.e. the refractive index is complex) and the light is absorbed. The thickness of the Polaroid sheet is chosen sufficiently large to ensure that virtually none of the light with electric field component oscillating along the chains makes it through the device.

The component of electric field that is orthogonal to the polymer chains encounters electrons that are essentially bound, unable to leave their polymer chains. For this polarization component, the wave passes through the material like it does through typical dielectrics such as glass (i.e. the refractive index is real). Today, there is a wide variety of technologies for making polarizers, many very different from Polaroid.

A polarizer can be represented as a $2\times 2$ matrix that operates on Jones vectors. The function of a polarizer is to pass only the component of electric field that is oriented along the polarizer transmission axis. Thus, if a polarizer is oriented with its transmission axis along the $x$-dimension, then only the $x$-component of polarization transmits; the $y$-component is killed. If the polarizer is oriented with its transmission axis along the $y$-dimension, then only the $y$-component of the field transmits, and the $x$-component is killed. These two scenarios can be represented with the following Jones matrices:

$$\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \qquad \text{(polarizer with transmission along $x$-axis)} \tag{4.19}$$

$$\begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} \qquad \text{(polarizer with transmission along $y$-axis)} \tag{4.20}$$

These matrices operate on any Jones vector representing the polarization of incident light. The result gives the Jones vector for the light exiting the polarizer.

Figure 4.3 Light transmitting through a Polaroid sheet. The conducting polymer chains run vertically in this drawing, and light polarized along the chains is absorbed. Light polarized perpendicular to the polymer chains passes through the polarizer.
Example 4.1
Use the Jones matrix (4.19) to calculate the effect of a horizontal polarizer on light that is initially horizontally polarized, vertically polarized, and arbitrarily polarized.
Solution: First we consider a horizontally polarized plane wave traversing a polarizer with its transmission axis also oriented horizontally ($x$-dimension):

$$\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix} \qquad \text{(horizontal polarizer on horizontally polarized field)} \tag{4.21}$$

As expected, the polarization state is unaffected by the polarizer (ignoring small surface reflections). Now consider vertically polarized light traversing the same horizontal polarizer. In this case, we have:

$$\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} 0 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix} \qquad \text{(horizontal polarizer on vertical linear polarization)} \tag{4.22}$$

As expected, the polarizer extinguishes the light. Finally, when a horizontally oriented polarizer operates on light with an arbitrary Jones vector (4.11), we have

$$\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}\begin{bmatrix} A \\ B e^{i\delta} \end{bmatrix} = \begin{bmatrix} A \\ 0 \end{bmatrix} \qquad \text{(horizontal polarizer on arbitrary polarization)} \tag{4.23}$$

Only the horizontal component of polarization is transmitted through the polarizer.
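The three products in Example 4.1 can be reproduced with a plain matrix-vector multiply. This is a minimal sketch using tuples instead of a matrix library; the helper `apply` and the sample values of $A$, $B$, and $\delta$ are assumptions made for illustration.

```python
import cmath

def apply(matrix, vector):
    """Multiply a 2x2 Jones matrix onto a 2-component Jones vector."""
    (a, b), (c, d) = matrix
    x, y = vector
    return (a * x + b * y, c * x + d * y)

POLARIZER_X = ((1, 0), (0, 0))  # (4.19)
POLARIZER_Y = ((0, 0), (0, 1))  # (4.20)

horizontal = (1, 0)
vertical = (0, 1)
A, B, delta = 0.6, 0.8, 0.3     # hypothetical values with A^2 + B^2 = 1
arbitrary = (A, B * cmath.exp(1j * delta))

out_horizontal = apply(POLARIZER_X, horizontal)  # (4.21)
out_vertical = apply(POLARIZER_X, vertical)      # (4.22)
out_arbitrary = apply(POLARIZER_X, arbitrary)    # (4.23)
print(out_horizontal, out_vertical, out_arbitrary)
```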
While students will readily agree that the matrices given in (4.19) and (4.20) can be used to get the right result for light traversing a horizontal or a vertical polarizer, the real advantage of the matrix formulation has yet to be demonstrated. In the next few sections we will derive Jones matrices for a number of optical elements that can modify polarization: polarizers at arbitrary angle, wave plates at arbitrary angle, and reflection or transmission at an interface. Table 4.2 shows Jones matrices for each of these devices.

Before deriving these specific Jones matrices, however, we take a moment to more fully appreciate why the Jones matrix formulation is useful. The real power of the formalism becomes clear as we consider situations where light encounters multiple polarization elements in sequence. In these situations, we use a product of Jones matrices to represent the effect of the compound system. We can represent this situation by

$$\begin{bmatrix} A' \\ B' e^{i\delta'} \end{bmatrix} = \mathbf{J}_{\text{system}}\begin{bmatrix} A \\ B e^{i\delta} \end{bmatrix} \tag{4.24}$$

where the unprimed Jones vector represents light going into the system and the primed Jones vector represents light emerging from the system. The matrix $\mathbf{J}_{\text{system}}$ is a Jones matrix formed by the series of polarization devices. If there are $N$ devices in the system, the compound matrix is calculated as

$$\mathbf{J}_{\text{system}} \equiv \mathbf{J}_N \mathbf{J}_{N-1} \cdots \mathbf{J}_2 \mathbf{J}_1 \tag{4.25}$$

where $\mathbf{J}_n$ is the matrix for the $n^{\text{th}}$ optical element encountered in the system. Notice that the matrices operate on the Jones vector in the order that the light encounters the devices. Therefore, the matrix for the first device ($\mathbf{J}_1$) is written on the right, next to the Jones vector, and so on until the last device encountered, which is written on the left, farthest from the Jones vector.

When part of the light is absorbed by passing through one or more polarizers in a system, the Jones vector of the exiting light is no longer normalized to magnitude one. Since the components of a Jones vector represent the electric field, we find the factor by which the intensity of the light decreases by dotting the vector with its complex conjugate. In accordance with (4.10), the intensity of the exiting light is

$$
\begin{aligned}
I' &= \frac{1}{2} n c \epsilon_0 |E_{\text{eff}}|^2 \left(A'\hat{\mathbf{x}} + B' e^{i\delta'}\hat{\mathbf{y}}\right)\cdot\left(A'\hat{\mathbf{x}} + B' e^{i\delta'}\hat{\mathbf{y}}\right)^* \\
&= \frac{1}{2} n c \epsilon_0 |E_{\text{eff}}|^2 \left(|A'|^2 + |B'|^2\right)
\end{aligned}
\tag{4.26}
$$
Notice that the intensity is attenuated by the factor $|A'|^2 + |B'|^2$ after propagating through the system. Recall that $E_{\text{eff}}$ represents the effective strength of the field before it enters the polarizer (or other device), so that the initial Jones vector is normalized to one (see (4.10)). By convention we normally remove an overall phase factor from the Jones vector so that $A'$ is real and non-negative, and we choose $\delta'$ so that $B'$ is real and non-negative. However, if we don't bother doing this, the absolute value signs on $A'$ and $B'$ in (4.26) ensure that we get the correct value for intensity.

| Optical Element | Jones Matrix |
|---|---|
| Linear polarizer, transmission axis oriented at $\theta$ with respect to the $x$-axis | $\begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix}$ |
| Half wave plate, fast axis oriented at $\theta$ with respect to the $x$-axis | $\begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix}$ |
| Quarter wave plate, fast axis oriented at $\theta$ with respect to the $x$-axis | $\begin{bmatrix} \cos^2\theta + i\sin^2\theta & \sin\theta\cos\theta - i\sin\theta\cos\theta \\ \sin\theta\cos\theta - i\sin\theta\cos\theta & \sin^2\theta + i\cos^2\theta \end{bmatrix}$ |
| Right circular polarizer | $\frac{1}{2}\begin{bmatrix} 1 & i \\ -i & 1 \end{bmatrix}$ |
| Left circular polarizer | $\frac{1}{2}\begin{bmatrix} 1 & -i \\ i & 1 \end{bmatrix}$ |
| Reflection from an interface | $\begin{bmatrix} -r_p & 0 \\ 0 & r_s \end{bmatrix}$ |
| Transmission through an interface | $\begin{bmatrix} t_p & 0 \\ 0 & t_s \end{bmatrix}$ |

Table 4.2 Summary of Jones Matrices.

4.6 Jones Matrix for Polarizers at Arbitrary Angles

In this section we derive a Jones matrix to describe a plane wave with arbitrary polarization passing through a polarizer with its transmission axis aligned at an arbitrary angle $\theta$ with the $x$-axis. We derive this Jones matrix in a general context so that we can take advantage of present work when we discuss wave plates. To help keep things on a more conceptual level, we revert to using electric field components directly. We will make the connection with Jones calculus at a later point. The electric field of our plane wave is

$$\mathbf{E}(z,t) = \left(E_x\hat{\mathbf{x}} + E_y\hat{\mathbf{y}}\right) e^{i(kz-\omega t)} \tag{4.27}$$

The transmission axis of the polarizer is specified by the unit vector $\hat{\mathbf{e}}_1$, and the absorption axis of the polarizer is specified by $\hat{\mathbf{e}}_2$ (orthogonal to the transmission axis). The vector $\hat{\mathbf{e}}_1$ is oriented at an angle $\theta$ from the $x$-axis as depicted in Fig. 4.4.

Figure 4.4 (a) Light transmitting through a polarizer oriented with transmission axis at angle $\theta$ from the $x$-axis. (b) The unit vectors specifying the coordinate system of the polarizer.

We need to write the electric field components in terms of the new basis specified by the unit vectors $\hat{\mathbf{e}}_1$ and $\hat{\mathbf{e}}_2$ as shown in Fig. 4.5. The $\hat{\mathbf{x}}$ and $\hat{\mathbf{y}}$ unit vectors are connected to the new coordinate system via (see Fig. 4.4(b)):

$$
\begin{aligned}
\hat{\mathbf{x}} &= \cos\theta\,\hat{\mathbf{e}}_1 - \sin\theta\,\hat{\mathbf{e}}_2 \\
\hat{\mathbf{y}} &= \sin\theta\,\hat{\mathbf{e}}_1 + \cos\theta\,\hat{\mathbf{e}}_2
\end{aligned}
\tag{4.28}
$$

By direct substitution of (4.28) into (4.27), the electric field can be written as

$$\mathbf{E}(z,t) = \left(E_1\hat{\mathbf{e}}_1 + E_2\hat{\mathbf{e}}_2\right) e^{i(kz-\omega t)} \tag{4.29}$$

where

$$
\begin{aligned}
E_1 &\equiv E_x\cos\theta + E_y\sin\theta \\
E_2 &\equiv -E_x\sin\theta + E_y\cos\theta
\end{aligned}
\tag{4.30}
$$

Figure 4.5 Electric field components written in either the $\hat{\mathbf{x}}$-$\hat{\mathbf{y}}$ basis or the $\hat{\mathbf{e}}_1$-$\hat{\mathbf{e}}_2$ basis.
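Now we introduce the effect of the polarizer on the field: $E_1$ is transmitted unaffected, while $E_2$ is extinguished. To account for the effect of the device, we multiply $E_2$ by a parameter $\xi$. In the case of the polarizer, $\xi$ is simply zero, but when we consider wave plates we can have other values for $\xi$. After traversing the polarizer, the field becomes

$$\mathbf{E}_{\text{after}}(z,t) = \left(E_1\hat{\mathbf{e}}_1 + \xi E_2\hat{\mathbf{e}}_2\right) e^{i(kz-\omega t)} \tag{4.31}$$

We now have the field after the polarizer, but it would be nice to rewrite it in terms of the original $x$-$y$ basis. By inverting (4.28), or by inspection of Fig. 4.4, if preferred, we see that

$$
\begin{aligned}
\hat{\mathbf{e}}_1 &= \cos\theta\,\hat{\mathbf{x}} + \sin\theta\,\hat{\mathbf{y}} \\
\hat{\mathbf{e}}_2 &= -\sin\theta\,\hat{\mathbf{x}} + \cos\theta\,\hat{\mathbf{y}}
\end{aligned}
\tag{4.32}
$$

Substitution of these relationships into (4.31) together with the definitions (4.30) for $E_1$ and $E_2$ yields

$$
\begin{aligned}
\mathbf{E}_{\text{after}}(z,t) ={}& \left(E_x\cos\theta + E_y\sin\theta\right)\left(\cos\theta\,\hat{\mathbf{x}} + \sin\theta\,\hat{\mathbf{y}}\right) e^{i(kz-\omega t)} \\
&+ \xi\left(-E_x\sin\theta + E_y\cos\theta\right)\left(-\sin\theta\,\hat{\mathbf{x}} + \cos\theta\,\hat{\mathbf{y}}\right) e^{i(kz-\omega t)} \\
={}& \left[E_x\left(\cos^2\theta + \xi\sin^2\theta\right) + E_y\left(\sin\theta\cos\theta - \xi\sin\theta\cos\theta\right)\right]\hat{\mathbf{x}}\, e^{i(kz-\omega t)} \\
&+ \left[E_x\left(\sin\theta\cos\theta - \xi\sin\theta\cos\theta\right) + E_y\left(\sin^2\theta + \xi\cos^2\theta\right)\right]\hat{\mathbf{y}}\, e^{i(kz-\omega t)}
\end{aligned}
\tag{4.33}
$$

Notice that if $\xi = 1$ (i.e. no polarizer), then we get back exactly what we started with (i.e. (4.33) reduces to (4.27)).

To get to the Jones matrix for the polarizer, we note that (4.33) is a linear mixture of $E_x$ and $E_y$ which can be represented with matrix algebra. If we represent the electric field as a two-dimensional column vector with its $x$-component in the top and its $y$-component in the bottom (like a Jones vector), then we can rewrite (4.33) as

$$\mathbf{E}_{\text{after}}(z,t) = \begin{bmatrix} \cos^2\theta + \xi\sin^2\theta & \sin\theta\cos\theta - \xi\sin\theta\cos\theta \\ \sin\theta\cos\theta - \xi\sin\theta\cos\theta & \sin^2\theta + \xi\cos^2\theta \end{bmatrix}\begin{bmatrix} E_x \\ E_y \end{bmatrix} e^{i(kz-\omega t)} \tag{4.34}$$

The matrix here is a proper Jones matrix, but the vector it operates on is not a properly normalized Jones vector. However, we can make it into a proper Jones vector simply by factoring as specified in (4.5). We can now write down the Jones matrix for a polarizer by inserting $\xi = 0$ into the matrix:

$$\begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix} \qquad \text{(polarizer with transmission axis at angle $\theta$)} \tag{4.35}$$

Notice that when $\theta = 0$ this matrix reduces to that of a horizontal polarizer (4.19), and when $\theta = \pi/2$, it reduces to that of a vertical polarizer (4.20).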
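The matrix in (4.34), the polarizer special case (4.35), and the ordering rule (4.25) can be combined in one short sketch. The helper names (`element_matrix`, `system_matrix`, `intensity_factor`) and the two-polarizer arrangement are assumptions chosen for illustration; the point is only that swapping the order of two elements changes the transmitted intensity factor of (4.26).

```python
import math

def element_matrix(theta, xi):
    """Matrix of (4.34): axis 1 at angle theta, axis-2 component multiplied by xi."""
    c, s = math.cos(theta), math.sin(theta)
    off_diagonal = s * c - xi * s * c
    return ((c * c + xi * s * s, off_diagonal),
            (off_diagonal, s * s + xi * c * c))

def polarizer(theta):
    """Polarizer with transmission axis at angle theta, (4.35)."""
    return element_matrix(theta, 0.0)

def matmul(left, right):
    return tuple(
        tuple(sum(left[i][k] * right[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def system_matrix(elements):
    """Compound matrix (4.25); `elements` is listed in the order light meets them."""
    total = ((1.0, 0.0), (0.0, 1.0))
    for matrix in elements:
        total = matmul(matrix, total)  # each later element multiplies from the left
    return total

def apply(matrix, vector):
    return tuple(matrix[i][0] * vector[0] + matrix[i][1] * vector[1] for i in range(2))

def intensity_factor(vector):
    """Attenuation factor |A'|^2 + |B'|^2 from (4.26)."""
    return abs(vector[0]) ** 2 + abs(vector[1]) ** 2

horizontal = (1.0, 0.0)
diagonal_then_vertical = system_matrix([polarizer(math.pi / 4), polarizer(math.pi / 2)])
vertical_then_diagonal = system_matrix([polarizer(math.pi / 2), polarizer(math.pi / 4)])
factor_a = intensity_factor(apply(diagonal_then_vertical, horizontal))
factor_b = intensity_factor(apply(vertical_then_diagonal, horizontal))
print(factor_a, factor_b)
```

With horizontally polarized input, a 45° polarizer followed by a vertical one passes a quarter of the intensity, while the reverse order passes essentially nothing, because the vertical polarizer extinguishes the light first.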
4.7 Jones Matrices for Wave Plates

Another device that influences polarization is called a wave plate (or retarder). Wave plates are usually made from an anisotropic material such as a crystal with low symmetry. Such materials have different indices of refraction, depending on the orientation of the electric field polarization. A wave plate has the appearance of a thin window through which the light passes. However, it has a fast and a slow axis, which are 90° apart in the plane of the window. If the light is polarized along the fast axis, it experiences an index of refraction $n_{\text{fast}}$. This index is less than the index $n_{\text{slow}}$ that light experiences when polarized along the orthogonal (slow) axis.

When a plane wave passes through a wave plate, the component of the electric field oriented along the fast axis travels faster than its orthogonal counterpart. The fast component gets ahead, and this introduces a relative phase between the two polarization components. The wave vectors associated with the two electric field components within the wave plate are given by

$$k_{\text{slow}} = \frac{2\pi n_{\text{slow}}}{\lambda_{\text{vac}}} \qquad \text{and} \qquad k_{\text{fast}} = \frac{2\pi n_{\text{fast}}}{\lambda_{\text{vac}}} \tag{4.36}$$

As light passes through a wave plate of thickness $d$, the phase difference that accumulates between the fast and the slow polarization components is

$$k_{\text{slow}}d - k_{\text{fast}}d = \frac{2\pi d}{\lambda_{\text{vac}}}\left(n_{\text{slow}} - n_{\text{fast}}\right) \tag{4.37}$$

By adjusting the thickness of the wave plate, we can introduce any desired phase difference between the two components. The most common types of wave plates are the quarter-wave plate and the half-wave plate. The quarter-wave plate introduces a phase difference between the two polarization components equal to

$$k_{\text{slow}}d - k_{\text{fast}}d = \pi/2 + 2\pi m \qquad \text{(quarter-wave plate)} \tag{4.38}$$

where $m$ is an integer. This means that the polarization component along the slow axis is delayed spatially by one quarter of a wavelength (or five quarters, etc.). The half-wave plate introduces a phase delay between the two polarization components equal to

$$k_{\text{slow}}d - k_{\text{fast}}d = \pi + 2\pi m \qquad \text{(half-wave plate)} \tag{4.39}$$

where $m$ is an integer. This means that the polarization component along the slow axis is delayed spatially by half a wavelength (or three halves, etc.).

Figure 4.6 Wave plate interacting with a plane wave. The transmitted polarization components along the fast and slow axes have an altered relative phase.

The derivation of the Jones matrix for the two wave plates is essentially the same as the derivation for the polarizer in the previous section. Let $\hat{\mathbf{e}}_1$ correspond to the fast axis, and let $\hat{\mathbf{e}}_2$ correspond to the slow axis. We proceed as before. However, instead of setting $\xi$ equal to zero in (4.34), we must choose values for $\xi$ appropriate for each wave plate. Since nothing is absorbed, $\xi$ should have a magnitude equal to one. The important feature is the phase of $\xi$. As seen in (4.37), the field component along the slow axis accumulates excess phase relative to the component along the fast axis, and we let $\xi$ account for this. In the case of the quarter-wave plate, the appropriate factor from (4.38) is

$$\xi = e^{i\pi/2} = i \qquad \text{(quarter-wave plate)} \tag{4.40}$$

This describes a relative phase delay for the light emerging with polarization along the slow axis. Substituting (4.40) into the matrix of (4.34) yields the Jones matrix for a quarter-wave plate:

$$\begin{bmatrix} \cos^2\theta + i\sin^2\theta & \sin\theta\cos\theta - i\sin\theta\cos\theta \\ \sin\theta\cos\theta - i\sin\theta\cos\theta & \sin^2\theta + i\cos^2\theta \end{bmatrix} \qquad \text{(quarter-wave plate)} \tag{4.41}$$

For the half-wave plate, the appropriate relative phase delay for the slow axis is

$$\xi = e^{i\pi} = -1 \qquad \text{(half-wave plate)} \tag{4.42}$$

and the Jones matrix becomes:

$$\begin{bmatrix} \cos^2\theta - \sin^2\theta & 2\sin\theta\cos\theta \\ 2\sin\theta\cos\theta & \sin^2\theta - \cos^2\theta \end{bmatrix} = \begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix} \qquad \text{(half-wave plate)} \tag{4.43}$$

Remember that $\theta$ refers to the angle that the fast axis makes with respect to the $x$-axis. Before moving on, consider the following two examples that illustrate how wave plates are often used:
Example 4.2

Calculate the Jones matrix for a quarter-wave plate at $\theta = 45°$, and calculate its effect on horizontally polarized light.

Solution: At $\theta = 45°$, the Jones matrix for the quarter-wave plate (4.41) reduces to

$$\frac{e^{i\pi/4}}{\sqrt{2}}\begin{bmatrix} 1 & -i \\ -i & 1 \end{bmatrix} \qquad \text{(quarter-wave plate, fast axis at $\theta = 45°$)} \tag{4.44}$$

The overall phase factor $e^{i\pi/4}$ in front is not important since it merely accompanies the overall phase of the beam, which can be adjusted arbitrarily by moving the light source forwards or backwards through a fraction of a wavelength. Now we calculate the effect of the quarter-wave plate (oriented at $\theta = 45°$) operating on horizontally polarized light:

$$\frac{1}{\sqrt{2}}\begin{bmatrix} 1 & -i \\ -i & 1 \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \frac{1}{\sqrt{2}}\begin{bmatrix} 1 \\ -i \end{bmatrix} \tag{4.45}$$

Notice that a quarter-wave plate (properly oriented) turns linearly polarized light into right-circularly polarized light (see Table 4.1).

Example 4.3

Calculate the effect of a half-wave plate at an arbitrary $\theta$ on horizontally polarized light.

Solution: Carrying out the multiplication, we obtain

$$\begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix}\begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} \cos 2\theta \\ \sin 2\theta \end{bmatrix} \tag{4.46}$$

The resulting Jones vector describes linearly polarized light at an angle of $2\theta$ from the $x$-axis.

This example illustrates that a half-wave plate rotates the polarization angle of linearly polarized light to another angle (the amount of rotation depending on the value of $\theta$) while preserving the linear polarization.
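Examples 4.2 and 4.3 can be checked numerically by inserting $\xi = i$ and $\xi = -1$ into the general matrix of (4.34). The sketch below is self-contained; the helper names and the sample angle `theta = 0.3` rad are assumptions for illustration. The function `drop_overall_phase` removes the physically unimportant overall phase so the result can be compared directly with Table 4.1.

```python
import math

def wave_plate(theta, xi):
    """Matrix of (4.34) with fast axis at theta and slow-axis factor xi."""
    c, s = math.cos(theta), math.sin(theta)
    off_diagonal = s * c - xi * s * c
    return ((c * c + xi * s * s, off_diagonal),
            (off_diagonal, s * s + xi * c * c))

def quarter_wave_plate(theta):
    return wave_plate(theta, 1j)      # xi from (4.40)

def half_wave_plate(theta):
    return wave_plate(theta, -1.0)    # xi from (4.42)

def apply(matrix, vector):
    return tuple(matrix[i][0] * vector[0] + matrix[i][1] * vector[1] for i in range(2))

def drop_overall_phase(vector):
    """Rotate the overall phase so the first nonzero component is real and positive."""
    for component in vector:
        if abs(component) > 1e-12:
            phase = component / abs(component)
            return tuple(v / phase for v in vector)
    return vector

horizontal = (1.0, 0.0)

# Example 4.2: quarter-wave plate at 45 degrees on horizontal light.
circular = drop_overall_phase(apply(quarter_wave_plate(math.pi / 4), horizontal))

# Example 4.3: half-wave plate at a hypothetical angle theta.
theta = 0.3
rotated = apply(half_wave_plate(theta), horizontal)

print(circular)  # compare with the right-circular entry of Table 4.1
print(rotated)   # compare with (cos 2 theta, sin 2 theta)
```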
4.8 Polarization Effects of Reflection and Transmission

When light encounters a material interface, the amount of reflected and transmitted light depends on the polarization. The Fresnel coefficients (3.18)–(3.21) dictate how much of each polarization is reflected and how much is transmitted. In addition, the Fresnel coefficients keep track of phases intrinsic in the reflection phenomenon. To the extent that the $s$ and $p$ components of the field behave differently, the overall polarization state is altered. For example, a linearly polarized field upon reflection can become elliptically polarized (see L 4.9). Even when a wave reflects at normal incidence so that the $s$ and $p$ components are indistinguishable, right-circular polarized light becomes left-circular polarized. This is the same effect that causes a right-handed person to appear left-handed when viewed in a mirror.

We can use Jones calculus to keep track of how reflection and transmission influence polarization. However, before proceeding, we emphasize that in this context we do not strictly adhere to the coordinate system depicted in Fig. 3.1. (Please refer to Fig. 3.1 right now.) For purposes of examining polarization, we consider each plane wave as though traveling in its own $z$-direction, regardless of the incident angle in the figure. This loose manner of defining coordinate systems has a great advantage. The individual $x$ and $y$ dimensions for each of the three separate plane waves are each aligned parallel to their respective $s$ and $p$ field component. Let us adopt the convention that $p$-polarized light in all cases is associated with the $x$-dimension (horizontal). The $s$-polarized component then lies along the $y$-dimension (vertical).

We are now in a position to see why there is a handedness inversion upon reflection from a mirror. While referring to Fig. 3.1, notice that for the incident light, the $s$-component of the field crossed (vector cross product) into the $p$-component yields that beam's propagation direction. However, for the reflected light, the $s$-component crossed into the $p$-component points opposite to that beam's propagation direction. The Jones matrix corresponding to reflection from a surface is simply

$$\begin{bmatrix} -r_p & 0 \\ 0 & r_s \end{bmatrix} \qquad \text{(Jones matrix for reflection)} \tag{4.47}$$

By convention, we place the minus sign on the coefficient $r_p$ to take care of handedness inversion (the effect that moves your watch from your left wrist to the right wrist when looking in a mirror). We could alternately have put the minus sign on $r_s$; the important point is that the two polarizations acquire a relative phase differential of $\pi$ when the propagation direction flips. This effect changes right-hand polarized light into left-hand polarized light. The Fresnel coefficients specify the ratios of the exiting fields to the incident ones. When (4.47) operates on an arbitrary Jones vector such as (4.11), $-r_p$ multiplies the horizontal component of the field, and $r_s$ multiplies the vertical component of the field.

In the case of reflection from an absorbing surface such as a metal, the phases of the two polarization components can be very different (see P 4.11). Thus, linearly polarized light containing both $s$- and $p$-components in general becomes elliptically polarized when reflected from a metal surface. When light undergoes total internal reflection, again the phases of the $s$- and $p$-components can be very different, thus enabling the conversion of linearly polarized light into elliptically polarized light (see P 4.12).

Transmission through a material interface can also influence the polarization of the field. However, there is no handedness inversion, since the light continues on in a forward sense. Nevertheless, the relative amplitudes (and phases if materials are absorbing) of the field components are modified by the Fresnel transmission coefficients. The Jones matrix for this effect is

$$\begin{bmatrix} t_p & 0 \\ 0 & t_s \end{bmatrix} \qquad \text{(Jones matrix for transmission)} \tag{4.48}$$

If a beam of light encounters a series of mirrors, the final polarization is determined by multiplying the sequence of appropriate Jones matrices (4.47) onto the initial polarization. This procedure is straightforward if the normals to all of the mirrors lie in a single plane (say parallel to the surface of an optical bench). However, if the beam path deviates from this plane (due to vertical tilt on the mirrors), then we must reorient our coordinate system before each mirror to have a new horizontal ($p$-polarized dimension) and a new vertical ($s$-polarized dimension). We have already examined the rotation of a coordinate system through an angle $\theta$ in (4.30). This rotation can be accomplished by multiplying the following matrix onto the incident Jones vector:

$$\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \qquad \text{(rotation of coordinates through an angle $\theta$)} \tag{4.49}$$

This is a rotation about the $z$-axis, and the angle of rotation $\theta$ is chosen such that the rotated $x$-axis lies in the plane of incidence for the mirror. When such a reorientation of coordinates is necessary, the two orthogonal field components in the initial coordinate system are stirred together to form the field components in the new system. This does not change the fundamental characteristics of the polarization, just its representation.

Figure 4.7 When light is reflected out of an optical system's plane of incidence, a rotation matrix must be applied so that the rotated $x$-axis is in the new plane of incidence (i.e. so that $p$-polarized light remains associated with the $x$-component of a Jones vector).
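Two statements from this section can be illustrated numerically: the reflection matrix (4.47) flips handedness, and the coordinate rotation (4.49) leaves the polarization itself unchanged. The sketch below makes several assumptions that are not from the text: the helper names, the hypothetical normal-incidence value $r_p = r_s = -0.2$, the rotation angle of 0.4 rad, and the use of the sign of $\mathrm{Im}(E_x^* E_y) = AB\sin\delta$ as a handedness test consistent with (4.17) and (4.18).

```python
import math

def reflection(r_p, r_s):
    """Jones matrix for reflection, (4.47)."""
    return ((-r_p, 0.0), (0.0, r_s))

def rotation(theta):
    """Rotation of coordinates through an angle theta, (4.49)."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, s), (-s, c))

def apply(matrix, vector):
    return tuple(matrix[i][0] * vector[0] + matrix[i][1] * vector[1] for i in range(2))

def intensity_factor(vector):
    return abs(vector[0]) ** 2 + abs(vector[1]) ** 2

def handedness(vector):
    """Sign of Im(conj(E_x) E_y) = A B sin(delta); independent of overall phase."""
    spin = (complex(vector[0]).conjugate() * complex(vector[1])).imag
    if abs(spin) < 1e-12:
        return "linear"
    return "left" if spin > 0 else "right"

r = 1.0 / math.sqrt(2.0)
right_circular = (r, -1j * r)  # from Table 4.1

# Hypothetical normal-incidence coefficients with r_p = r_s.
reflected = apply(reflection(-0.2, -0.2), right_circular)

# Re-expressing the same light in rotated coordinates (hypothetical angle).
rotated = apply(rotation(0.4), right_circular)

print(handedness(right_circular), handedness(reflected), handedness(rotated))
```

The reflected vector is proportional to the left-circular entry of Table 4.1, while the rotated vector keeps both its handedness and its normalization.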
4.9 Ellipsometry
Measuring the polarization of light reflected from a surface can yield information regarding the optical constants of that surface (i.e. $n$ and $\kappa$). As done in L 4.9, it is possible to characterize the polarization of a beam of light using a quarter-wave plate and a polarizer. However, we often want to know $n$ and $\kappa$ at a range of frequencies, and this would require a different quarter-wave plate thickness $d$ for each wavelength used (see (4.38)). Therefore, many commercial ellipsometers do not try to extract the helicity of the light, but only the ellipticity. In this case only polarizers are used, which can be made to work over a wide range of wavelengths.

Inasmuch as most commercial ellipsometers do not determine directly the helicity of the reflected light, the measurement is usually made for a variety of different incident angles on the sample. This adds enough redundancy that $n$ and $\kappa$ can be pinned down (allowing a computer to take care of the busy work). If many different incident angles are measured at many different wavelengths, it is possible to extract detailed information about the optical constants and the thicknesses of possibly many layers of materials influencing the reflection. (We will learn to deal with multilayer coatings in chapter 6.)

Commercial ellipsometers typically employ two polarizers, one before and one after the sample, where $s$- and $p$-polarized reflections take place. The first polarizer ensures that linearly polarized light arrives at the test surface (polarized at angle $\alpha$ to give both $s$- and $p$-components). The Jones matrix for the test surface reflection is given by (4.47), and the Jones matrix for the analyzing polarizer oriented at angle $\theta$ is given by (4.35). The Jones vector for the light arriving at the detector is then

$$\begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix}\begin{bmatrix} -r_p & 0 \\ 0 & r_s \end{bmatrix}\begin{bmatrix} \cos\alpha \\ \sin\alpha \end{bmatrix} = \begin{bmatrix} -r_p\cos\alpha\cos^2\theta + r_s\sin\alpha\sin\theta\cos\theta \\ -r_p\cos\alpha\sin\theta\cos\theta + r_s\sin\alpha\sin^2\theta \end{bmatrix} \tag{4.50}$$

In an ellipsometer, the angle $\theta$ of the analyzing polarizer often rotates at a high speed, and the time dependence of the light reaching a detector is analyzed and correlated with the polarizer orientation. From the measurement of the intensity where $\theta$ and $\alpha$ are continuously varied, it is possible to extract the values of $n$ and $\kappa$ (with the aid of a computer!).
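As a rough illustration of the detector signal, the sketch below evaluates the Jones vector of (4.50) and the corresponding intensity factor of (4.26) as the analyzer angle $\theta$ steps through half a turn. The complex values of `r_p` and `r_s` are hypothetical numbers chosen only to exercise the formula; they are not Fresnel coefficients of any particular material, and the function names are likewise assumptions.

```python
import math

def detector_vector(r_p, r_s, alpha, theta):
    """Jones vector arriving at the detector, from (4.50)."""
    c, s = math.cos(theta), math.sin(theta)
    ex = -r_p * math.cos(alpha) * c * c + r_s * math.sin(alpha) * s * c
    ey = -r_p * math.cos(alpha) * s * c + r_s * math.sin(alpha) * s * s
    return ex, ey

def relative_intensity(r_p, r_s, alpha, theta):
    """Intensity factor |A'|^2 + |B'|^2 of (4.26) for the detector vector."""
    ex, ey = detector_vector(r_p, r_s, alpha, theta)
    return abs(ex) ** 2 + abs(ey) ** 2

# Hypothetical complex reflection coefficients (illustrative only).
r_p, r_s = 0.5 + 0.3j, -0.7 + 0.2j
alpha = math.pi / 4  # first polarizer at 45 degrees, giving both s and p components

# Signal versus analyzer angle, stepped in increments of pi/8.
signal = [relative_intensity(r_p, r_s, alpha, n * math.pi / 8) for n in range(8)]
for n, value in enumerate(signal):
    print(f"theta = {n} * pi/8: relative intensity = {value:.4f}")
```

Because the analyzer projects the reflected field onto its transmission axis, the signal equals $|-r_p\cos\alpha\cos\theta + r_s\sin\alpha\sin\theta|^2$, so the variation with $\theta$ carries information about both the magnitudes and the relative phase of $r_p$ and $r_s$.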
Appendix 4.A Partially Polarized Light

In this appendix, we outline an approach for dealing with partially polarized light, which is a mixture of polarized and unpolarized light. Most natural light such as sunshine is unpolarized. The transverse electric field direction in natural light varies rapidly (and quasi-randomly). Such variations imply the superposition of multiple frequencies rather than the single frequency assumed in the formulation of Jones calculus earlier in this chapter. Unpolarized light can become partially polarized when it, for example, reflects from a surface at oblique incidence, since the $s$ and $p$ components of the polarization might reflect with differing strength.

Stokes vectors are used to keep track of the partial polarization (and attenuation) of a light beam as the light progresses through an optical system. In contrast, Jones vectors only deal with pure polarization states. Partially polarized light is a mixture of polarized and unpolarized light. In fact, a beam of light can always be considered as an intensity sum of completely unpolarized light and perfectly polarized light:

$$I = I_{\text{pol}} + I_{\text{un}} \tag{4.51}$$

It is assumed that both types of light propagate in the same direction. The main characteristic of unpolarized light is that it cannot be extinguished by a single polarizer (or combination of a wave plate and polarizer). Moreover, the transmission of unpolarized light through an ideal polarizer is always 50%. On the other hand, polarized light (be it linearly, circularly, or elliptically polarized) can always be represented by a Jones vector, and it is always possible to extinguish polarized light with a combination of a wave plate and a single polarizer.

We may introduce the degree of polarization as the fraction of the intensity that is in a definite polarization state:

$$P \equiv \frac{I_{\text{pol}}}{I_{\text{pol}} + I_{\text{un}}} \tag{4.52}$$

The degree of polarization takes on values between zero and one. Thus, if the light is completely unpolarized (such that $I_{\text{pol}} = 0$), then the degree of polarization is zero. On the other hand, if the beam is fully polarized (such that $I_{\text{un}} = 0$), then the degree of polarization is one.

A Stokes vector, which characterizes a partially polarized beam, is a column vector written as

$$\begin{bmatrix} S_0 \\ S_1 \\ S_2 \\ S_3 \end{bmatrix}$$

The parameter

$$S_0 \equiv \frac{I}{I_{\text{in}}} \tag{4.53}$$

is a comparison of the beam's intensity (or power) with a benchmark intensity, $I_{\text{in}}$, measured before the beam enters an optical system under consideration. $I$ represents the intensity at the point of investigation, where one wishes to characterize the beam. Thus, $S_0$ is normalized such that a value of one represents the input intensity. After the light goes through a polarizing system, $S_0$ can drop to values less than one, to account for attenuation of light by polarizers in the system. (Alternatively, $S_0$ could grow in the atypical case of amplification.)
The next parameter, $S_1$, describes how much the light looks either horizontally or vertically polarized, and it is defined as
$$S_1 \equiv \frac{2 I_{\rm hor}}{I_{\rm in}} - S_0 \tag{4.54}$$
Here, $I_{\rm hor}$ represents the amount of light detected if an ideal linear polarizer is placed with its axis aligned horizontally directly in front of the detector (inserted where the light is characterized). $S_1$ ranges between negative one and one, reaching $+1$ for unattenuated horizontally polarized light and $-1$ for unattenuated vertically polarized light. If the light has been attenuated, it may still be perfectly horizontally polarized even if $S_1$ has a magnitude less than one. (For convenience, one may wish to renormalize the beam, taking $I_{\rm in}$ to be the intensity at the point of investigation, or one can simply examine $S_1/S_0$, which is guaranteed to be a number ranging between negative one and one.)

The parameter $S_2$ describes how much the light looks linearly polarized along the diagonals. It is given by
$$S_2 \equiv \frac{2 I_{45^\circ}}{I_{\rm in}} - S_0 \tag{4.55}$$
Similar to the previous case, $I_{45^\circ}$ represents the amount of light detected if an ideal linear polarizer is placed with its axis at 45° directly in front of the detector (inserted where the light is characterized). As before, $S_2$ ranges between negative one and one, reaching $+1$ when the light is linearly polarized at 45° and $-1$ when it is linearly polarized at 135°.

Finally, $S_3$ characterizes the extent to which the beam is either right or left circularly polarized:
$$S_3 \equiv \frac{2 I_{\rm r\text{-}cir}}{I_{\rm in}} - S_0 \tag{4.56}$$
Here, $I_{\rm r\text{-}cir}$ represents the amount of light detected if an ideal right-circular polarizer is placed directly in front of the detector. A right-circular polarizer is one that passes right-handed polarized light but blocks left-handed polarized light. One way to construct such a polarizer is a quarter wave plate, followed by a linear polarizer with the transmission axis aligned 45° from the wave-plate fast axis, followed by another quarter wave plate at 45° from the polarizer (see P 4.13). Again, this parameter ranges between negative one and one, reaching $+1$ for right circular polarization and $-1$ for left circular polarization.
Importantly, if any of the parameters $S_1$, $S_2$, or $S_3$ takes on an extreme value (i.e. a magnitude equal to $S_0$), the other two parameters necessarily equal zero. As an example, if a beam is linearly horizontally polarized with $I = I_{\rm in}$, then we have $I_{\rm hor} = I_{\rm in}$, $I_{45^\circ} = I_{\rm in}/2$, and $I_{\rm r\text{-}cir} = I_{\rm in}/2$. This yields $S_0 = 1$, $S_1 = 1$, $S_2 = 0$, and $S_3 = 0$. As a second example, suppose that the light has been attenuated to $I = I_{\rm in}/3$ but is purely left circularly polarized. Then we have $I_{\rm hor} = I_{\rm in}/6$, $I_{45^\circ} = I_{\rm in}/6$, and $I_{\rm r\text{-}cir} = 0$, so the Stokes parameters are $S_0 = 1/3$, $S_1 = 0$, $S_2 = 0$, and $S_3 = -1/3$. Another interesting case is completely unpolarized light, which transmits 50% through any of the polarizers discussed above. In this case, $I_{\rm hor} = I_{45^\circ} = I_{\rm r\text{-}cir} = I/2$ and $S_1 = S_2 = S_3 = 0$.
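The definitions (4.53) through (4.56) translate directly into a few lines of arithmetic. The following is a minimal sketch, not part of the original text; the function name `stokes_from_intensities` and its argument names are hypothetical, and the readings are assumed to come from ideal test polarizers.

```python
def stokes_from_intensities(i_total, i_hor, i_45, i_rcir, i_in):
    """Return (S0, S1, S2, S3) from detector readings, per Eqs. (4.53)-(4.56).

    i_total : intensity at the point of investigation (no test polarizer)
    i_hor   : intensity behind an ideal horizontal linear polarizer
    i_45    : intensity behind an ideal linear polarizer at 45 degrees
    i_rcir  : intensity behind an ideal right-circular polarizer
    i_in    : benchmark intensity measured before the optical system
    """
    s0 = i_total / i_in
    s1 = 2.0 * i_hor / i_in - s0
    s2 = 2.0 * i_45 / i_in - s0
    s3 = 2.0 * i_rcir / i_in - s0
    return (s0, s1, s2, s3)


# Horizontally polarized, unattenuated beam (I = I_in = 1).
horizontal = stokes_from_intensities(1.0, 1.0, 0.5, 0.5, 1.0)

# Purely left-circular beam attenuated to I = I_in / 3 (here I_in = 3, I = 1).
left_circular = stokes_from_intensities(1.0, 0.5, 0.5, 0.0, 3.0)

# Completely unpolarized, unattenuated beam: 50% through every test polarizer.
unpolarized = stokes_from_intensities(1.0, 0.5, 0.5, 0.5, 1.0)

print(horizontal, left_circular, unpolarized)
```

The three calls reproduce the three cases discussed in the text: a horizontally polarized beam, an attenuated left-circular beam, and unpolarized light.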
Example 4.4
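Find the Stokes parameters for perfectly polarized light, represented by an arbitrary Jones vector
$$\begin{pmatrix} A \\ B e^{i\delta} \end{pmatrix}$$
where $A$, $B$, and $\delta$ are all real. (Recall that depending on the values $A$, $B$, and $\delta$, the polarization can follow any ellipse.)

Solution: The intensity of this polarized beam is $I_{\rm pol} = A^2 + B^2$, according to Eq. (4.26), where we absorb the factor $\frac{1}{2}\epsilon_0 c |E_{\rm eff}|^2$ into $A$ and $B$ for convenience. The Jones vector for the light that passes through a horizontal polarizer is
$$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} A \\ B e^{i\delta} \end{pmatrix} = \begin{pmatrix} A \\ 0 \end{pmatrix}$$
which gives a measured intensity of $I_{\rm hor} = A^2$. Similarly, the Jones vector when the beam is passed through a polarizer oriented at 45° is
$$\frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} A \\ B e^{i\delta} \end{pmatrix} = \frac{1}{2}\begin{pmatrix} A + B e^{i\delta} \\ A + B e^{i\delta} \end{pmatrix}$$
leading to an intensity of
$$I_{45^\circ} = \frac{A^2 + B^2 + 2AB\cos\delta}{2}$$
Finally, the Jones vector for light passing through a right-circular polarizer (see P 4.13) is
$$\frac{1}{2}\begin{pmatrix} 1 & i \\ -i & 1 \end{pmatrix} \begin{pmatrix} A \\ B e^{i\delta} \end{pmatrix} = \frac{1}{2}\begin{pmatrix} A + iB e^{i\delta} \\ -iA + B e^{i\delta} \end{pmatrix} = \frac{1}{2}\begin{pmatrix} (A - B\sin\delta) + iB\cos\delta \\ B\cos\delta + i(B\sin\delta - A) \end{pmatrix}$$
giving an intensity of
$$I_{\rm r\text{-}cir} = \frac{A^2 - 2AB\sin\delta + B^2\sin^2\delta + B^2\cos^2\delta}{2} = \frac{A^2 + B^2 - 2AB\sin\delta}{2}$$
Thus, the Stokes parameters become
$$S_0 = \frac{A^2 + B^2}{I_{\rm in}}$$
$$S_1 = \frac{2A^2}{I_{\rm in}} - \frac{A^2 + B^2}{I_{\rm in}} = \frac{A^2 - B^2}{I_{\rm in}}$$
$$S_2 = \frac{A^2 + B^2 + 2AB\cos\delta}{I_{\rm in}} - \frac{A^2 + B^2}{I_{\rm in}} = \frac{2AB\cos\delta}{I_{\rm in}}$$
$$S_3 = \frac{A^2 + B^2 - 2AB\sin\delta}{I_{\rm in}} - \frac{A^2 + B^2}{I_{\rm in}} = -\frac{2AB\sin\delta}{I_{\rm in}}$$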
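The result of Example 4.4 can be checked numerically by sending a Jones vector through the three test polarizers and applying (4.53) through (4.56). This is a sketch under stated assumptions: the helper names (`apply`, `intensity`, `stokes_from_jones`, `stokes_closed_form`) are hypothetical, the constant prefactor is absorbed into $A$ and $B$ as in the example, and the sign of $S_3$ follows from the right-circular polarizer matrix of P 4.13.

```python
import cmath
import math


def apply(matrix, vec):
    """Multiply a 2x2 Jones matrix by a two-component Jones vector."""
    return (matrix[0][0] * vec[0] + matrix[0][1] * vec[1],
            matrix[1][0] * vec[0] + matrix[1][1] * vec[1])


def intensity(vec):
    """Intensity with the constant prefactor absorbed into the amplitudes."""
    return abs(vec[0]) ** 2 + abs(vec[1]) ** 2


# Jones matrices of the three ideal test polarizers used in Example 4.4.
HORIZONTAL = ((1, 0), (0, 0))
DIAGONAL_45 = ((0.5, 0.5), (0.5, 0.5))
RIGHT_CIRCULAR = ((0.5, 0.5j), (-0.5j, 0.5))


def stokes_from_jones(a, b, delta, i_in=1.0):
    """Stokes parameters of the pure state (A, B e^{i delta}) via Eqs. (4.53)-(4.56)."""
    jones = (a, b * cmath.exp(1j * delta))
    s0 = intensity(jones) / i_in
    s1 = 2 * intensity(apply(HORIZONTAL, jones)) / i_in - s0
    s2 = 2 * intensity(apply(DIAGONAL_45, jones)) / i_in - s0
    s3 = 2 * intensity(apply(RIGHT_CIRCULAR, jones)) / i_in - s0
    return (s0, s1, s2, s3)


def stokes_closed_form(a, b, delta, i_in=1.0):
    """Closed-form result of Example 4.4."""
    return ((a * a + b * b) / i_in,
            (a * a - b * b) / i_in,
            2 * a * b * math.cos(delta) / i_in,
            -2 * a * b * math.sin(delta) / i_in)


# Arbitrary illustrative values for A, B, and delta.
measured = stokes_from_jones(0.8, 0.6, 0.7)
expected = stokes_closed_form(0.8, 0.6, 0.7)
print(measured)
print(expected)
```

The two printed tuples should agree to rounding error for any real $A$, $B$, and $\delta$.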

It is clear from the linear dependence of $S_0$, $S_1$, $S_2$, and $S_3$ on intensity (see Eqs. (4.53)-(4.56)) that the overall Stokes vector may be regarded as the sum of the individual Stokes vectors for polarized and unpolarized light. That is, we may write $S_i = S_i^{(\rm pol)} + S_i^{(\rm un)}$, $i = 0, 1, 2, 3$. This is certainly true for
$$S_0 = \frac{I}{I_{\rm in}} = \frac{I_{\rm pol} + I_{\rm un}}{I_{\rm in}} \tag{4.57}$$
and in the other cases the unpolarized portion does not contribute to the Stokes parameters, since an equal contribution from the unpolarized light appears in both terms in each of Eqs. (4.54)-(4.56) and therefore cancels out. A completely general form of the Stokes vector may then be written as (see Example 4.4)
$$\begin{pmatrix} S_0 \\ S_1 \\ S_2 \\ S_3 \end{pmatrix} = \frac{1}{I_{\rm in}} \begin{pmatrix} I_{\rm pol} + I_{\rm un} \\ A^2 - B^2 \\ 2AB\cos\delta \\ -2AB\sin\delta \end{pmatrix} \tag{4.58}$$
where the Jones vector
$$\begin{pmatrix} A \\ B e^{i\delta} \end{pmatrix}$$
describes the polarized portion of the light, which has intensity
$$I_{\rm pol} = A^2 + B^2 \tag{4.59}$$
We would like to express the degree of polarization in terms of the Stokes parameters. We first note that the quantity $\sqrt{S_1^2 + S_2^2 + S_3^2}$ can be expressed as
$$\sqrt{S_1^2 + S_2^2 + S_3^2} = \sqrt{\left(\frac{A^2 - B^2}{I_{\rm in}}\right)^2 + \left(\frac{2AB\cos\delta}{I_{\rm in}}\right)^2 + \left(\frac{2AB\sin\delta}{I_{\rm in}}\right)^2} = \frac{1}{I_{\rm in}}\sqrt{\left(A^2 - B^2\right)^2 + 4A^2B^2\left(\cos^2\delta + \sin^2\delta\right)} = \frac{A^2 + B^2}{I_{\rm in}} = \frac{I_{\rm pol}}{I_{\rm in}} \tag{4.60}$$
Substituting (4.57) and (4.60) into the expression for the degree of polarization (4.52) yields
$$P \equiv \frac{1}{S_0}\sqrt{S_1^2 + S_2^2 + S_3^2} \tag{4.61}$$
If the light is polarized such that it perfectly transmits through or is perfectly extinguished by one of the three test polarizers associated with $S_1$, $S_2$, or $S_3$, then the degree of polarization will be unity. Obviously, it is possible to have pure polarization states that are not aligned with the axes of any one of these test polarizers. In this situation, the degree of polarization is still one, although the values $S_1$, $S_2$, and $S_3$ may all three contribute to (4.61).

Finally, it is possible to represent polarizing devices as matrices that operate on the Stokes vectors in much the same way that Jones matrices operate on Jones vectors. Since Stokes vectors are four-dimensional, the matrices used are four-by-four. These are known as Mueller matrices.
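As a small illustration of (4.61), the degree of polarization can be computed from a Stokes vector in one line. The sketch below is illustrative only; the function name `degree_of_polarization` and the sample beam are assumptions, not taken from the text.

```python
import math


def degree_of_polarization(s0, s1, s2, s3):
    """Degree of polarization P = sqrt(S1^2 + S2^2 + S3^2) / S0, Eq. (4.61)."""
    if s0 <= 0:
        raise ValueError("S0 must be positive for a beam with nonzero intensity")
    return math.sqrt(s1 * s1 + s2 * s2 + s3 * s3) / s0


# Illustrative beam with I_in = 1: a horizontally polarized part of intensity 0.3
# plus an unpolarized part of intensity 0.5, so S = (0.8, 0.3, 0, 0).
p_mixed = degree_of_polarization(0.8, 0.3, 0.0, 0.0)

# A pure state not aligned with any single test polarizer still gives P = 1.
p_pure = degree_of_polarization(1.0, 0.6, 0.0, -0.8)

print(p_mixed, p_pure)
```

For the mixed beam the result is $I_{\rm pol}/(I_{\rm pol} + I_{\rm un}) = 0.3/0.8$, in agreement with the definition (4.52).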
Example 4.5
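Determine the Mueller matrix that represents a linear polarizer with transmission axis at arbitrary angle $\theta$.

Solution: We know that 50% of the unpolarized light transmits through the polarizer, ending up with Jones vector
$$\begin{pmatrix} A_1 \\ B_1 \end{pmatrix} = \sqrt{\frac{I_{\rm un}}{2}} \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}$$
(see table 4.1). We also know that the Jones matrix (4.26) acts on the polarized portion of the light, represented by the arbitrary Jones vector
$$\begin{pmatrix} A \\ B e^{i\delta} \end{pmatrix}$$
This gives a transmitted Jones vector of
$$\begin{pmatrix} A_2 \\ B_2 e^{i\delta_2} \end{pmatrix} = \begin{pmatrix} \cos^2\theta & \cos\theta\sin\theta \\ \cos\theta\sin\theta & \sin^2\theta \end{pmatrix} \begin{pmatrix} A \\ B e^{i\delta} \end{pmatrix} = \left(A\cos\theta + B\sin\theta\, e^{i\delta}\right) \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix}$$
One might be tempted to add the two Jones vectors, but this would be wrong, since the two beams are not coherent. As mentioned previously, unpolarized light necessarily contains multiple frequencies, and so the fields from the polarized and unpolarized beams destructively interfere as often as they constructively interfere. In this case, we add intensities rather than fields. Expressing intensities in units of $I_{\rm in}$ for brevity, we have
$$A'^2 = |A_1|^2 + |A_2|^2 = \frac{I_{\rm un}}{2}\cos^2\theta + \left(A^2\cos^2\theta + B^2\sin^2\theta + 2AB\cos\theta\sin\theta\cos\delta\right)\cos^2\theta$$
$$= \left[\frac{I_{\rm un} + A^2 + B^2}{2} + \frac{A^2 - B^2}{2}\cos 2\theta + \frac{2AB\cos\delta}{2}\sin 2\theta\right]\cos^2\theta = \left[\frac{S_0}{2} + \frac{\cos 2\theta}{2} S_1 + \frac{\sin 2\theta}{2} S_2\right]\cos^2\theta$$
Similarly,
$$B'^2 = |B_1|^2 + |B_2|^2 = \left[\frac{S_0}{2} + \frac{\cos 2\theta}{2} S_1 + \frac{\sin 2\theta}{2} S_2\right]\sin^2\theta$$
This gives
$$S_0' = A'^2 + B'^2 = \frac{S_0}{2} + \frac{\cos 2\theta}{2} S_1 + \frac{\sin 2\theta}{2} S_2$$
$$S_1' = A'^2 - B'^2 = \left[\frac{S_0}{2} + \frac{\cos 2\theta}{2} S_1 + \frac{\sin 2\theta}{2} S_2\right]\left(\cos^2\theta - \sin^2\theta\right) = \frac{\cos 2\theta}{2} S_0 + \frac{\cos^2 2\theta}{2} S_1 + \frac{\sin 4\theta}{4} S_2$$
and since $\delta' = 0$ we have
$$S_2' = 2A'B'\cos\delta' = 2\left[\frac{S_0}{2} + \frac{\cos 2\theta}{2} S_1 + \frac{\sin 2\theta}{2} S_2\right]\cos\theta\sin\theta = \frac{\sin 2\theta}{2} S_0 + \frac{\sin 4\theta}{4} S_1 + \frac{\sin^2 2\theta}{2} S_2$$
$$S_3' = -2A'B'\sin\delta' = 0$$
These transformations expressed in matrix format become
$$\begin{pmatrix} S_0' \\ S_1' \\ S_2' \\ S_3' \end{pmatrix} = \frac{1}{2}\begin{pmatrix} 1 & \cos 2\theta & \sin 2\theta & 0 \\ \cos 2\theta & \cos^2 2\theta & \frac{1}{2}\sin 4\theta & 0 \\ \sin 2\theta & \frac{1}{2}\sin 4\theta & \sin^2 2\theta & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} S_0 \\ S_1 \\ S_2 \\ S_3 \end{pmatrix}$$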
which reveals the Mueller matrix for a linear polarizer.
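The Mueller matrix of Example 4.5 is easy to exercise numerically. The following sketch is not from the original text; the function names `linear_polarizer_mueller` and `apply_mueller` are hypothetical, and angles are assumed to be in radians. It applies the matrix to unpolarized light and to horizontally polarized light, where the transmitted $S_0$ should follow Malus's law.

```python
import math


def linear_polarizer_mueller(theta):
    """4x4 Mueller matrix of an ideal linear polarizer with axis at angle theta (radians)."""
    c2 = math.cos(2 * theta)
    s2 = math.sin(2 * theta)
    # Note that (1/2) * (1/2) sin(4 theta) equals (1/2) cos(2 theta) sin(2 theta).
    return [
        [0.5,      0.5 * c2,      0.5 * s2,      0.0],
        [0.5 * c2, 0.5 * c2 * c2, 0.5 * c2 * s2, 0.0],
        [0.5 * s2, 0.5 * c2 * s2, 0.5 * s2 * s2, 0.0],
        [0.0,      0.0,           0.0,           0.0],
    ]


def apply_mueller(matrix, stokes):
    """Multiply a 4x4 Mueller matrix by a Stokes vector (S0, S1, S2, S3)."""
    return tuple(sum(row[j] * stokes[j] for j in range(4)) for row in matrix)


# Unpolarized light: half the intensity emerges, fully polarized along theta.
unpolarized_out = apply_mueller(linear_polarizer_mueller(0.3), (1.0, 0.0, 0.0, 0.0))

# Horizontally polarized light through a polarizer at 30 degrees: S0' = cos^2(30 deg).
malus_out = apply_mueller(linear_polarizer_mueller(math.radians(30)), (1.0, 1.0, 0.0, 0.0))

# Horizontally polarized light through a vertical polarizer: extinguished.
crossed_out = apply_mueller(linear_polarizer_mueller(math.pi / 2), (1.0, 1.0, 0.0, 0.0))

print(unpolarized_out, malus_out, crossed_out)
```

Working with Stokes vectors here, rather than Jones vectors, is what allows the unpolarized input to be handled at all.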
Exercises
Exercises for 4.3 Jones Vectors for Representing Polarization

P4.1 Show that $\left(A\hat{\mathbf{x}} + Be^{i\delta}\hat{\mathbf{y}}\right)\cdot\left(A\hat{\mathbf{x}} + Be^{i\delta}\hat{\mathbf{y}}\right)^* = 1$, as defined in connection with (4.5).

P4.2 Prove that if $0 < \delta < \pi$, the helicity is left-handed, and if $\pi < \delta < 2\pi$ the helicity is right-handed.
HINT: Write the relevant real field associated with (4.5)
$$\mathbf{E}(z, t) = |E_{\rm eff}|\left[\hat{\mathbf{x}} A\cos(kz - \omega t + \alpha) + \hat{\mathbf{y}} B\cos(kz - \omega t + \alpha + \delta)\right]$$
where $\alpha$ is the phase of $E_{\rm eff}$. Freeze time at, say, $t = \alpha/\omega$. Determine the field at $z = 0$ and at $z = \lambda/4$ (a quarter cycle), say. If $\mathbf{E}(0, t) \times \mathbf{E}(\lambda/4, t)$ points in the direction of $\mathbf{k}$, then the helicity matches that of a wood screw.

P4.3 For the following cases, what is the orientation of the major axis, and what is the ellipticity of the light?
Case I: $A = B = 1/\sqrt{2}$; $\delta = 0$
Case II: $A = B = 1/\sqrt{2}$; $\delta = \pi/2$
Case III: $A = B = 1/\sqrt{2}$; $\delta = \pi/4$

L4.4 Determine how much right-handed circularly polarized light ($\lambda_{\rm vac} = 633$ nm) is delayed (or advanced) with respect to left-handed circularly polarized light as it goes through approximately 3 cm of Karo syrup (the neck of the bottle). This phenomenon is called optical activity. Because of a definite handedness to the molecules in the syrup, right- and left-handed polarized light experience slightly different refractive indices.

Figure 4.8 Lab schematic for L 4.4

HINT: Linearly polarized light contains equal amounts of right and left circularly polarized light. Consider
$$\frac{1}{2}\begin{pmatrix} 1 \\ i \end{pmatrix} + \frac{e^{i\phi}}{2}\begin{pmatrix} 1 \\ -i \end{pmatrix}$$
where $\phi$ is the phase delay of the right circular polarization. Show that this can be written as
$$e^{i\phi/2}\begin{pmatrix} \cos\phi/2 \\ \sin\phi/2 \end{pmatrix}$$
Compare this with
$$\begin{pmatrix} \cos\alpha \\ \sin\alpha \end{pmatrix}$$
where $\alpha$ is the angle through which the polarization is rotated, beginning with horizontally polarized light. The overall phase is unimportant.
Exercises for 4.5 Linear Polarizers and Jones Matrices

P4.5 (a) Suppose that linearly polarized light is oriented at an angle $\alpha$ with respect to the horizontal axis ($x$-axis) (see table 4.1). What fraction of the original intensity gets through a vertically oriented polarizer?
(b) If the original light is right-circularly polarized, what fraction of the original intensity gets through the same polarizer?

Exercises for 4.6 Jones Matrix for Polarizers at Arbitrary Angles

P4.6 Horizontally polarized light ($\alpha = 0$) is sent through two polarizers, the first oriented at $\theta_1 = 45^\circ$ and the second at $\theta_2 = 90^\circ$. What fraction of the original intensity emerges? What is the fraction if the ordering of the polarizers is reversed?

P4.7 (a) Suppose that linearly polarized light is oriented at an angle $\alpha$ with respect to the horizontal or $x$-axis. What fraction of the original intensity emerges from a polarizer oriented with its transmission axis at angle $\theta$ from the $x$-axis?
Answer: $\cos^2(\theta - \alpha)$; compare with P 4.5.
(b) If the original light is right circularly polarized, what fraction of the original intensity emerges from the same polarizer?

P4.8 Derive (4.12), (4.13), and (4.14).
HINT: Analyze the Jones vector just as you would analyze light in the laboratory. Put a polarizer in the beam and observe the intensity of the light as a function of polarizer angle. Compute the intensity via (4.26). Then find the polarizer angle (call it $\alpha$) that gives a maximum (or a minimum) of intensity. The angle then corresponds to an axis of the ellipse followed by the E-field as it spirals. When taking the arctangent, remember that it is defined only over a range of $\pi$. You can add $\pi$ for another valid result (which corresponds to the second ellipse axis).
Exercises for 4.7 Jones Matrices for Wave Plates

L4.9 Create a source of unknown elliptical polarization by reflecting a linearly polarized laser beam (with both s- and p-components) from a metal mirror with a large incident angle (i.e. $\theta_i \geq 80^\circ$). Use a quarter-wave plate and a polarizer to determine the Jones vector of the reflected beam. Find the ellipticity, the helicity (right or left handed), and the orientation of the major axis.

Figure 4.9 Lab schematic for L 4.9

HINT: A polarizer alone can reveal the direction of the major and minor axes and the ellipticity, but it does not reveal the helicity. Use a quarter-wave plate (oriented at a special angle $\theta$) to convert the unknown elliptically polarized light into linearly polarized light. A subsequent polarizer can then extinguish the light, from which you can determine the Jones vector of the light coming through the wave plate. This must equal the original (unknown) Jones vector (4.11) operated on by the wave plate (4.41). As you solve the matrix equation, it is helpful to note that the inverse of (4.41) is its own complex conjugate.

P4.10 What is the minimum thickness (called zero-order thickness) of a quartz plate made to operate as a quarter-wave plate for $\lambda_{\rm vac} = 500$ nm? The indices of refraction are $n_{\rm fast} = 1.54424$ and $n_{\rm slow} = 1.55335$.

Exercises for 4.8 Polarization Effects of Reflection and Transmission

P4.11 Light is linearly polarized at $\alpha = 45^\circ$ with a Jones vector according to table 4.1. The light is reflected from a vertical silver mirror with angle of incidence $\theta_i = 80^\circ$, as described in (P 3.13). Find the Jones vector representation for the polarization of the reflected light. NOTE: The answer may be somewhat different than the result measured in L 4.9. For one thing, we have not considered that a silver mirror inevitably has a thin oxide layer.
Figure 4.10 Geometry for P 4.11

Answer: 0.668 0.702e 1.13i .
P4.12 Calculate the angle $\theta$ to cut the glass in a Fresnel rhomb such that after the two internal reflections there is a phase difference of $\pi/2$ between the two polarization states. The rhomb then acts as a quarter wave plate.

Figure 4.11 Fresnel Rhomb geometry for P 4.12

HINT: You need to find the phase difference between (3.40) and (3.41). Set the difference equal to $\pi/4$ for each bounce. The equation you get does not have a clean analytic solution, but you can plot it to find a numerical solution.

Answer: There are two angles that work: $\theta = 50^\circ$ and $\theta = 53^\circ$.
Exercises for 4.A Partially Polarized Light

P4.13 (a) One way to construct a right-circular polarizer is using a quarter wave plate with fast axis at $45^\circ$, followed by a linear polarizer oriented vertically, and finally a quarter wave plate with fast axis at $-45^\circ$. Calculate the Jones matrix for this system.
Answer:
$$\frac{1}{2}\begin{pmatrix} 1 & i \\ -i & 1 \end{pmatrix}$$
(b) Check that the device leaves right-circularly polarized light unaltered while killing left-circularly polarized light.

P4.14 Derive the Mueller matrix for a half wave plate.

P4.15 Derive the Mueller matrix for a quarter wave plate.
Chapter 5
Light Propagation in Crystals

5.1 Introduction
To this point, we have considered only isotropic media with $\mathbf{P} = \epsilon_0 \chi(\omega)\mathbf{E}$. The fact that the susceptibility $\chi(\omega)$ is the same in all directions for these materials leads to an index of refraction that is independent of the direction of travel and polarization. When we introduced wave plates in chapter 4, we saw that in some materials it is possible for light to experience a different index of refraction depending on the polarization of the electric field $\mathbf{E}$. This difference in the index of refraction occurs in anisotropic media, where the direction and strength of the induced dipoles depend in a non-trivial way on the direction and strength of the electric field. This behavior is observed in crystals where the lattice structure has a low degree of symmetry.1 The unique properties of anisotropic materials make them important elements in many optical systems.

In section 5.2 we discuss how to connect $\mathbf{E}$ and $\mathbf{P}$ in anisotropic media using the susceptibility tensor. In section 5.3 we apply Maxwell's equations to a plane wave traveling in a crystal. The analysis leads to Fresnel's equation (section 5.4), which connects the components of the k-vector with the components of the susceptibility tensor. In section 5.6 we apply Fresnel's equation to a uniaxial crystal (e.g. quartz, sapphire) where $\chi_x = \chi_y \neq \chi_z$. In section 5.8 we examine the flow of energy in a uniaxial crystal and show that the Poynting vector and the k-vector in general are not parallel.

In Appendix 5.B we describe light propagation in a crystal using the method of Christian Huygens (1629-1695), who lived more than a century before Fresnel. Huygens successfully described birefringence in crystals using the idea of elliptical wavelets. His method gives the direction of the Poynting vector associated with the extraordinary ray in a crystal. It was Huygens who coined the term "extraordinary," since one of the rays in a birefringent material appeared not to obey Snell's law. Actually, the k-vector always obeys Snell's law, but in a crystal the k-vector points in a different direction than the Poynting vector, and it is the Poynting vector that delivers the energy seen by an observer.

1 Not all crystals are anisotropic. For instance, crystals with a cubic lattice structure (such as NaCl) are highly symmetric and respond to electric fields the same in any direction.
5.2 Constitutive Relation in Crystals

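In this section, we explore the connection between the polarization $\mathbf{P}$ of the medium and the electric field $\mathbf{E}$ in an anisotropic crystal. In this type of crystal, the lattice can cause directional asymmetries so that the $\mathbf{P}$ in the crystal does not necessarily respond in the same direction as the electric field $\mathbf{E}$ (i.e. $\mathbf{P} \neq \epsilon_0 \chi \mathbf{E}$). However, at low intensities the response of materials is still linear (or proportional) to the strength of the electric field. The linear constitutive relation which connects $\mathbf{P}$ to $\mathbf{E}$ in a crystal can be expressed in its most general form as
$$\begin{pmatrix} P_x \\ P_y \\ P_z \end{pmatrix} = \epsilon_0 \begin{pmatrix} \chi_{xx} & \chi_{xy} & \chi_{xz} \\ \chi_{yx} & \chi_{yy} & \chi_{yz} \\ \chi_{zx} & \chi_{zy} & \chi_{zz} \end{pmatrix} \begin{pmatrix} E_x \\ E_y \\ E_z \end{pmatrix} \tag{5.1}$$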
The matrix in (5.1) is called the susceptibility tensor. To visualize the behavior of electrons in such a material we imagine the electron bound as though by tiny springs with different strengths in different dimensions to represent the anisotropy (see Fig. 5.1). When an external electric field is applied, the electron experiences a force that moves it from its equilibrium position. The springs (actually the electric force from ions bound in the crystal lattice) exert a restoring force, but the restoring force is not equal in all directions; the electron tends to move more along the dimension of the weaker spring. The displaced electron creates a microscopic dipole, but the asymmetric restoring force causes $\mathbf{P}$ to be in a direction different than $\mathbf{E}$.

To understand the geometrical interpretation of the many coefficients $\chi_{ij}$, assume, for example, that the electric field is directed along the $x$-axis (i.e. $E_y = E_z = 0$) as depicted in Fig. 5.1. In this case, the three equations encapsulated in (5.1) reduce to
$$P_x = \epsilon_0 \chi_{xx} E_x$$
$$P_y = \epsilon_0 \chi_{yx} E_x$$
$$P_z = \epsilon_0 \chi_{zx} E_x$$
Figure 5.1 A physical model of an electron bound in a crystal lattice.
The coefficient $\chi_{xx}$ connects the strength of $\mathbf{P}$ in the $x$ direction with the strength of $\mathbf{E}$ in that same direction. The coefficient $\chi_{yx}$ indicates the amount of $\mathbf{P}$ in the $y$-dimension produced by the electric field component in the $x$-dimension. The other coefficients with mixed subscripts in (5.1) likewise describe the contribution to $\mathbf{P}$ in one dimension made by an electric field component in another dimension.

As you might imagine, working with nine susceptibility coefficients can get complicated. Fortunately, we can greatly reduce the complexity of the description by a judicious choice of coordinate system. For ordinary crystals, it turns out that if we assume that there is no absorption in the crystal, conservation of energy requires that the susceptibility tensor in (5.1) be real and symmetric ($\chi_{ij} = \chi_{ji}$).2 In Appendix 5.A we show that these requirements allow us to find a coordinate system for which off-diagonal elements of the tensor vanish. This is true even if the lattice planes in the crystal are not mutually orthogonal (e.g. rhombus, hexagonal, etc.). We allow the crystal to dictate the orientation of the coordinate system, aligned to the principal axes of the crystal for which the off-diagonal elements of (5.1) are zero. In that frame the constitutive relation simplifies to
$$\begin{pmatrix} P_x \\ P_y \\ P_z \end{pmatrix} = \epsilon_0 \begin{pmatrix} \chi_x & 0 & 0 \\ 0 & \chi_y & 0 \\ 0 & 0 & \chi_z \end{pmatrix} \begin{pmatrix} E_x \\ E_y \\ E_z \end{pmatrix} \tag{5.2}$$
or
$$\mathbf{P} = \hat{\mathbf{x}}\,\epsilon_0 \chi_x E_x + \hat{\mathbf{y}}\,\epsilon_0 \chi_y E_y + \hat{\mathbf{z}}\,\epsilon_0 \chi_z E_z \tag{5.3}$$
The assumption of no absorption requires that the diagonal elements of the matrix in (5.2) be real.

Example 5.1
Show that the assumption that a medium is non-absorbing implies that the susceptibility tensor is symmetric.
Figure 5.2 A physical model of an electron bound in a crystal lattice with the coordinate system specially chosen along the principal axes so that the susceptibility tensor takes on a simple form.
Solution: We assume that $\mathbf{P}$ is due to a single species of electron, so that we have $\mathbf{P} = N\mathbf{p}$. Here $N$ is the number of microscopic dipoles per volume and $\mathbf{p} = q_e \mathbf{r}$, where $q_e$ is the charge on the electron and $\mathbf{r}$ is the microscopic displacement of the electron. The force on this electron due to the electric field is given by $\mathbf{F} = \mathbf{E} q_e$. With these definitions, we can use (5.1) to write a connection between the force due to a static $\mathbf{E}$ and the electron displacement:
$$N q_e \begin{pmatrix} x \\ y \\ z \end{pmatrix} = \frac{\epsilon_0}{q_e} \begin{pmatrix} \chi_{xx} & \chi_{xy} & \chi_{xz} \\ \chi_{yx} & \chi_{yy} & \chi_{yz} \\ \chi_{zx} & \chi_{zy} & \chi_{zz} \end{pmatrix} \begin{pmatrix} F_x \\ F_y \\ F_z \end{pmatrix} \tag{5.4}$$
The column vector on the left represents the components of the displacement $\mathbf{r}$. We next invert (5.4) to find the force of the electric field on an electron as a function of its displacement3
$$\begin{pmatrix} F_x \\ F_y \\ F_z \end{pmatrix} = \begin{pmatrix} k_{xx} & k_{xy} & k_{xz} \\ k_{yx} & k_{yy} & k_{yz} \\ k_{zx} & k_{zy} & k_{zz} \end{pmatrix} \begin{pmatrix} x \\ y \\ z \end{pmatrix} \tag{5.5}$$
where
$$\begin{pmatrix} k_{xx} & k_{xy} & k_{xz} \\ k_{yx} & k_{yy} & k_{yz} \\ k_{zx} & k_{zy} & k_{zz} \end{pmatrix} \equiv \frac{N q_e^2}{\epsilon_0} \begin{pmatrix} \chi_{xx} & \chi_{xy} & \chi_{xz} \\ \chi_{yx} & \chi_{yy} & \chi_{yz} \\ \chi_{zx} & \chi_{zy} & \chi_{zz} \end{pmatrix}^{-1} \tag{5.6}$$
The total work done on an electron in moving it to its displaced position is given by
$$W = \int_{\rm path} \mathbf{F}(\mathbf{r}') \cdot d\mathbf{r}' \tag{5.7}$$
While there are many possible paths for getting the electron to any specific displacement (each path specified by a different history of the electric field), the work done along any of these paths must be the same if the system is conservative (i.e. no absorption). For example, for a final displacement of $\mathbf{r} = x\hat{\mathbf{x}} + y\hat{\mathbf{y}}$ we could have the following two paths: path 1 moves first along $x$ and then along $y$, while path 2 moves first along $y$ and then along $x$.

We can use (5.5) in (5.7) to calculate the total work done on the electron along path 1:
$$W = \int_0^x F_x(x', y' = 0, z' = 0)\,dx' + \int_0^y F_y(x' = x, y', z' = 0)\,dy'$$
$$= \int_0^x k_{xx} x'\,dx' + \int_0^y \left(k_{yx} x + k_{yy} y'\right)dy'$$
$$= \frac{k_{xx}}{2} x^2 + k_{yx} x y + \frac{k_{yy}}{2} y^2$$
If we take path 2, the total work is
$$W = \int_0^y F_y(x' = 0, y', z' = 0)\,dy' + \int_0^x F_x(x', y' = y, z' = 0)\,dx'$$
$$= \int_0^y k_{yy} y'\,dy' + \int_0^x \left(k_{xx} x' + k_{xy} y\right)dx'$$
$$= \frac{k_{yy}}{2} y^2 + k_{xy} x y + \frac{k_{xx}}{2} x^2$$
Since the work must be the same for these two paths, we clearly have $k_{xy} = k_{yx}$. Similar arguments for other pairs of dimensions ensure that the matrix of $k$ coefficients is symmetric. From linear algebra, we learn that if the inverse of a matrix is symmetric then the matrix itself is also symmetric. When we combine this result with the definition (5.6), we see that the assumption of no absorption requires the susceptibility matrix to be symmetric.

2 By ordinary we mean that the crystal does not exhibit optical activity. Optically active crystals have a complex susceptibility tensor, even when no absorption takes place. Conservation of energy in this more general case requires that the susceptibility tensor be Hermitian ($\chi_{ij} = \chi_{ji}^*$). See Section ?? for more details.

3 This inversion assumes the field changes slowly so the forces on the electron are always essentially balanced. This is not true for optical fields, but the proof gives the right flavor for why conservation of energy results in the symmetry. A more formal proof that doesn't make this assumption can be found in Principles of Optics, 7th Ed., Born and Wolf, pp. 790-791 (Ref. [1]).
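The path argument of Example 5.1 can be illustrated numerically: for a linear force $\mathbf{F} = k\,\mathbf{r}$ restricted to the $x$-$y$ plane, the work along the two paths differs by $(k_{yx} - k_{xy})\,x y$, which vanishes only when the matrix is symmetric. The sketch below is an illustration, not part of the original text; the function `work_along_path` and the numerical values of the $k$ coefficients are hypothetical.

```python
def work_along_path(k, corners, steps=1000):
    """Midpoint-rule line integral of F = k r along straight segments joining `corners`.

    k       : 2x2 matrix ((kxx, kxy), (kyx, kyy)) relating force to displacement
    corners : sequence of (x, y) points visited in order
    """
    total = 0.0
    for (x0, y0), (x1, y1) in zip(corners, corners[1:]):
        dx = (x1 - x0) / steps
        dy = (y1 - y0) / steps
        for i in range(steps):
            # Evaluate the force at the midpoint of each small step.
            xm = x0 + (i + 0.5) * dx
            ym = y0 + (i + 0.5) * dy
            fx = k[0][0] * xm + k[0][1] * ym
            fy = k[1][0] * xm + k[1][1] * ym
            total += fx * dx + fy * dy
    return total


x_final, y_final = 1.0, 2.0
path_1 = [(0.0, 0.0), (x_final, 0.0), (x_final, y_final)]  # along x, then along y
path_2 = [(0.0, 0.0), (0.0, y_final), (x_final, y_final)]  # along y, then along x

# Illustrative coefficient values (arbitrary units).
k_symmetric = ((2.0, 0.5), (0.5, 3.0))
k_asymmetric = ((2.0, 0.5), (0.9, 3.0))

difference_symmetric = (work_along_path(k_symmetric, path_1)
                        - work_along_path(k_symmetric, path_2))
difference_asymmetric = (work_along_path(k_asymmetric, path_1)
                         - work_along_path(k_asymmetric, path_2))

print(difference_symmetric, difference_asymmetric)
```

With the asymmetric matrix the two paths disagree, so such a force could not describe a conservative (non-absorbing) system.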
5.3 Plane Wave Propagation in Crystals

Now we are ready to search for solutions to the wave equation in a crystal. As a trial solution, we consider a plane wave with frequency $\omega$, similar to the plane-wave solution we have studied in isotropic materials. In this case, the fields $\mathbf{E}$, $\mathbf{B}$, and $\mathbf{P}$ are all associated with the same plane wave according to
$$\mathbf{E} = \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}, \qquad \mathbf{B} = \mathbf{B}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}, \qquad \mathbf{P} = \mathbf{P}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \tag{5.8}$$
As usual, the phase of each wave is included in the amplitudes $\mathbf{E}_0$, $\mathbf{B}_0$, and $\mathbf{P}_0$. We can make several observations about the behavior of these fields by applying Maxwell's equations directly. Gauss's law for electric fields requires
$$\nabla \cdot (\epsilon_0 \mathbf{E} + \mathbf{P}) = 0 \tag{5.9}$$
and Gauss's law for magnetism gives
$$\nabla \cdot \mathbf{B} = 0 \tag{5.10}$$
When our trial solutions (5.8) are inserted into these, we find
$$\mathbf{k} \cdot (\epsilon_0 \mathbf{E} + \mathbf{P}) = 0 \tag{5.11}$$
and
$$\mathbf{k} \cdot \mathbf{B} = 0 \tag{5.12}$$
Notice that we have the following peculiarity: From its definition, the Poynting vector $\mathbf{S} \equiv \mathbf{E} \times \mathbf{B}/\mu_0$ is perpendicular to both $\mathbf{E}$ and $\mathbf{B}$, and by (5.12) the k-vector is perpendicular to $\mathbf{B}$. However, by (5.11) the k-vector is not necessarily perpendicular to $\mathbf{E}$, since in general $\mathbf{k} \cdot \mathbf{E} \neq 0$ if $\mathbf{P}$ points in a direction other than $\mathbf{E}$. Therefore, $\mathbf{k}$ and $\mathbf{S}$ are not necessarily parallel in a crystal. In other words, the flow of energy and the direction of the wave propagation can be different.

Now we consider the behavior of our trial fields in the wave equation (1.49). Under the assumption $\mathbf{J}_{\rm free} = 0$, we have
$$\nabla^2 \mathbf{E} - \mu_0 \epsilon_0 \frac{\partial^2 \mathbf{E}}{\partial t^2} = \mu_0 \frac{\partial^2 \mathbf{P}}{\partial t^2} + \nabla(\nabla \cdot \mathbf{E}) \tag{5.13}$$
As with the isotropic case, we next substitute our trial solutions (5.8) into the wave equation to find a dispersion relation that imposes requirements that $\mathbf{k}$ and $\omega$ must satisfy in order for the fields to be consistent with Maxwell's equations. Example 5.2 illustrates how the dispersion relation is found.

Example 5.2

Obtain the dispersion relation in a crystal by substituting the proposed solution (5.8) into the wave equation (5.13).

Solution: After substituting the fields into the wave equation and carrying out the derivatives, we find
$$k^2 \mathbf{E} - \omega^2 \mu_0 (\epsilon_0 \mathbf{E} + \mathbf{P}) = \mathbf{k}(\mathbf{k} \cdot \mathbf{E}) \tag{5.14}$$
Inserting the constitutive relation (5.3) for crystals into (5.14) yields
$$k^2 \mathbf{E} - \omega^2 \mu_0 \epsilon_0 \left[(1 + \chi_x) E_x \hat{\mathbf{x}} + (1 + \chi_y) E_y \hat{\mathbf{y}} + (1 + \chi_z) E_z \hat{\mathbf{z}}\right] = \mathbf{k}(\mathbf{k} \cdot \mathbf{E}) \tag{5.15}$$
This relationship is unwieldy because of the mix of electric field components that appear in the expression. This was not a problem when we investigated isotropic materials for which the k-vector is perpendicular to $\mathbf{E}$, making the right-hand side of the equations zero. Nevertheless, through a direct procedure, we can eliminate the electric field components from the expressions. Relation (5.15) actually contains three equations, one for each dimension. Explicitly, these equations are
$$\left[k^2 - \frac{\omega^2}{c^2}(1 + \chi_x)\right] E_x = k_x (\mathbf{k} \cdot \mathbf{E}) \tag{5.16}$$
$$\left[k^2 - \frac{\omega^2}{c^2}(1 + \chi_y)\right] E_y = k_y (\mathbf{k} \cdot \mathbf{E}) \tag{5.17}$$
and
$$\left[k^2 - \frac{\omega^2}{c^2}(1 + \chi_z)\right] E_z = k_z (\mathbf{k} \cdot \mathbf{E}) \tag{5.18}$$
We have replaced the constants $\mu_0 \epsilon_0$ with $1/c^2$ according to (1.51). We multiply (5.16)-(5.18) respectively by $k_x$, $k_y$, and $k_z$. We also move the factor in square brackets in each equation to the denominator on the right-hand side. Then if we add the three equations together we get
$$\frac{k_x^2 (\mathbf{k} \cdot \mathbf{E})}{k^2 - \frac{\omega^2 (1 + \chi_x)}{c^2}} + \frac{k_y^2 (\mathbf{k} \cdot \mathbf{E})}{k^2 - \frac{\omega^2 (1 + \chi_y)}{c^2}} + \frac{k_z^2 (\mathbf{k} \cdot \mathbf{E})}{k^2 - \frac{\omega^2 (1 + \chi_z)}{c^2}} = k_x E_x + k_y E_y + k_z E_z = (\mathbf{k} \cdot \mathbf{E}) \tag{5.19}$$
This nice trick allows us to get rid of the electric field by dividing the equation by $\mathbf{k} \cdot \mathbf{E}$. If we also multiply the equation by $\omega^2/c^2$ we have our dispersion relation unencumbered by field components:
$$\frac{k_x^2}{k^2 c^2/\omega^2 - (1 + \chi_x)} + \frac{k_y^2}{k^2 c^2/\omega^2 - (1 + \chi_y)} + \frac{k_z^2}{k^2 c^2/\omega^2 - (1 + \chi_z)} = \frac{\omega^2}{c^2} \tag{5.20}$$
The dispersion relation (5.20) found in Example 5.2 allows us to find a suitable $\mathbf{k}$, given values for $\omega$, $\chi_x$, $\chi_y$, and $\chi_z$. Nevertheless, with only this information the solution to this equation is far from unique. In particular, we must decide on a direction for the wave to travel (i.e. we must choose the ratios between $k_x$, $k_y$, and $k_z$). To remind ourselves of this fact, we introduce a unit vector that points in the direction of $\mathbf{k}$:
$$\mathbf{k} = k_x \hat{\mathbf{x}} + k_y \hat{\mathbf{y}} + k_z \hat{\mathbf{z}} = k\left(u_x \hat{\mathbf{x}} + u_y \hat{\mathbf{y}} + u_z \hat{\mathbf{z}}\right) = k\hat{\mathbf{u}} \tag{5.21}$$
With this unit vector inserted, the dispersion relation (5.20) for plane waves in a crystal becomes
$$\frac{u_x^2}{k^2 c^2/\omega^2 - (1 + \chi_x)} + \frac{u_y^2}{k^2 c^2/\omega^2 - (1 + \chi_y)} + \frac{u_z^2}{k^2 c^2/\omega^2 - (1 + \chi_z)} = \frac{\omega^2}{k^2 c^2} \tag{5.22}$$
5.4 Fresnel's Equation

We are now ready to introduce the refractive index for anisotropic materials. As we have seen before, the speed of a wave having the form (5.8) is $v = \omega/k$ (see P 1.10). By definition, the refractive index of a material is the ratio of $c$ to the speed $v$ (see (2.19)). Therefore, the refractive index for the wave is
$$n = \frac{kc}{\omega} \tag{5.23}$$
Although (5.23) looks innocent enough (and we have seen it before), the relationship between $k$ and $\omega$ now depends on the direction of propagation in the crystal according to (5.22). Motivated by the relation between the index and the susceptibility in the isotropic case (2.19), we replace the susceptibility parameters in (5.22) with three new constants:
$$n_x \equiv \sqrt{1 + \chi_x}, \qquad n_y \equiv \sqrt{1 + \chi_y}, \qquad n_z \equiv \sqrt{1 + \chi_z} \tag{5.24}$$
Using these definitions along with (5.23) we can write the dispersion relation (5.22) in the following convenient form:
$$\frac{u_x^2}{n^2 - n_x^2} + \frac{u_y^2}{n^2 - n_y^2} + \frac{u_z^2}{n^2 - n_z^2} = \frac{1}{n^2} \tag{5.25}$$
Equation (5.25) is called Fresnel's equation (not to be confused with the Fresnel coefficients studied in chapter 3). The relationship contains the yet unknown index $n$ that varies with the direction of the k-vector (i.e. the direction of the unit vector $\hat{\mathbf{u}}$). For a given k-vector there are two possible values for $n$, one associated with each of two polarization directions of the electric field. When Fresnel's equation (5.25) is solved to find the two values of $n$ associated with a given $\hat{\mathbf{u}}$ (see P 5.1), the resulting solution is quadratic in $n^2$. The solutions can be written as4
$$n^2 = \frac{B \pm \sqrt{B^2 - 4AC}}{2A} \tag{5.26}$$
where
$$A \equiv u_x^2 n_x^2 + u_y^2 n_y^2 + u_z^2 n_z^2 \tag{5.27}$$
$$B \equiv u_x^2 n_x^2 \left(n_y^2 + n_z^2\right) + u_y^2 n_y^2 \left(n_x^2 + n_z^2\right) + u_z^2 n_z^2 \left(n_x^2 + n_y^2\right) \tag{5.28}$$
$$C \equiv n_x^2 n_y^2 n_z^2 \tag{5.29}$$
The upper and lower signs in (5.26) give two solutions for $n$, one large and one small (associated with the $+$ and $-$, respectively). These are often referred to as the slow and fast index, respectively, because the waves associated with these indexes propagate at speed $v = c/n$. In the special cases of propagation along one of the principal axes of the crystal, the index $n$ takes on the two values from among $n_x$, $n_y$, and $n_z$ that are associated with the axes orthogonal to propagation (see Example 5.3).

4 It is possible to write this solution in many equivalent forms using the identity $u_x^2 + u_y^2 + u_z^2 = 1$.
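The solution (5.26) through (5.29) is simple to evaluate numerically. The following sketch is illustrative rather than part of the original text; the function names `fresnel_indices` and `fresnel_residual` are hypothetical, and the principal indices 1.5, 1.6, and 1.7 are made-up values, not data for a real crystal. The direction vector is assumed to be normalized.

```python
import math


def fresnel_indices(u, n_x, n_y, n_z):
    """Return (n_slow, n_fast) from Eqs. (5.26)-(5.29) for a unit direction vector u."""
    ux2, uy2, uz2 = (component * component for component in u)
    nx2, ny2, nz2 = n_x ** 2, n_y ** 2, n_z ** 2
    a = ux2 * nx2 + uy2 * ny2 + uz2 * nz2
    b = (ux2 * nx2 * (ny2 + nz2)
         + uy2 * ny2 * (nx2 + nz2)
         + uz2 * nz2 * (nx2 + ny2))
    c = nx2 * ny2 * nz2
    # Guard against a tiny negative discriminant caused by rounding.
    root = math.sqrt(max(b * b - 4.0 * a * c, 0.0))
    n_slow = math.sqrt((b + root) / (2.0 * a))
    n_fast = math.sqrt((b - root) / (2.0 * a))
    return n_slow, n_fast


def fresnel_residual(u, n, n_x, n_y, n_z):
    """Left side minus right side of Fresnel's equation (5.25); zero for a valid n."""
    lhs = sum(u_i ** 2 / (n ** 2 - n_i ** 2) for u_i, n_i in zip(u, (n_x, n_y, n_z)))
    return lhs - 1.0 / n ** 2


N_X, N_Y, N_Z = 1.5, 1.6, 1.7  # illustrative principal indices

# Propagation along z: the two indices should be n_x and n_y.
along_z = fresnel_indices((0.0, 0.0, 1.0), N_X, N_Y, N_Z)

# A generic direction (polar angle 0.7 rad, azimuthal angle 0.4 rad).
U = (math.sin(0.7) * math.cos(0.4), math.sin(0.7) * math.sin(0.4), math.cos(0.7))
generic = fresnel_indices(U, N_X, N_Y, N_Z)
residuals = [fresnel_residual(U, n, N_X, N_Y, N_Z) for n in generic]

print(along_z, generic, residuals)
```

The residuals provide a direct check that both roots of the quadratic satisfy Fresnel's equation in its original form (5.25).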
Example 5.3
Calculate the two possible values for the index of refraction when k is in the z direction (in the crystal principal frame).
Solution: With $u_z = 1$ and $u_x = u_y = 0$ we have
$$A = n_z^2; \qquad B = n_z^2\left(n_x^2 + n_y^2\right); \qquad C = n_x^2 n_y^2 n_z^2$$
The square-root term is then
$$\sqrt{B^2 - 4AC} = \sqrt{n_z^4\left(n_x^4 + 2n_x^2 n_y^2 + n_y^4\right) - 4 n_x^2 n_y^2 n_z^4} = \sqrt{n_z^4\left(n_x^2 - n_y^2\right)^2} = n_z^2\left(n_x^2 - n_y^2\right)$$
Inserting this expression into the solution (5.26) of Fresnel's equation, we find the two values for the index
$$n = n_x,\; n_y$$
The index $n_x$ is experienced by light polarized in the $x$-dimension, and the index $n_y$ is experienced by light polarized in the $y$-dimension.
It is often convenient to use spherical coordinates to represent the components of $\hat{\mathbf{u}}$:
$$u_x = \sin\theta\cos\phi, \qquad u_y = \sin\theta\sin\phi, \qquad u_z = \cos\theta \tag{5.30}$$
Here $\theta$ is the polar angle measured from the $z$-axis of the crystal and $\phi$ is the azimuthal angle measured from the $x$-axis of the crystal. These equations emphasize the fact that there are only two degrees of freedom when specifying propagation direction ($\theta$ and $\phi$). It is important to remember that these angles must be specified in the frame of the crystal's principal axes, which is usually not aligned with the faces of a cut crystal in an optical setup.
5.5 Polarization in Crystals

The two values for $n$ associated with a given $\hat{\mathbf{u}}$ correspond to two distinct polarization components of the field. In Example 5.3 the natural polarization components were along the principal axes, but for propagation in an arbitrary direction the polarization components are in other directions. Therefore, every propagation direction $\hat{\mathbf{u}}$ has its own natural set of polarization components. The two polarization components travel at different speeds, so even though the frequency is the same for both components, the wavelength for each is different (within the crystal).

To find the direction of the polarization components associated with the two values of $n$, we return to Maxwell's equations. When we substitute our solution (5.8) into Faraday's Law (1.44) and Ampere's Law (1.45) (under the assumption $\mathbf{J}_{\rm free} = 0$) we obtain the following requirements on the solution:
$$\mathbf{k} \times \mathbf{E} = \omega \mathbf{B} \tag{5.31}$$
and
$$\mathbf{k} \times \mathbf{B} = -\omega \mu_0 (\epsilon_0 \mathbf{E} + \mathbf{P}) \tag{5.32}$$
We can combine these two equations to eliminate $\mathbf{B}$
$$\mathbf{k} \times (\mathbf{k} \times \mathbf{E}) + \omega^2 \mu_0 (\epsilon_0 \mathbf{E} + \mathbf{P}) = 0 \tag{5.33}$$
and then apply the linear constitutive relation (5.2) to find (after carrying out the cross products and a little algebra) the following requirement on the electric field vector
$$\begin{pmatrix} \frac{n_x^2}{n^2} - u_y^2 - u_z^2 & u_x u_y & u_x u_z \\ u_x u_y & \frac{n_y^2}{n^2} - u_x^2 - u_z^2 & u_y u_z \\ u_x u_z & u_y u_z & \frac{n_z^2}{n^2} - u_x^2 - u_y^2 \end{pmatrix} \begin{pmatrix} E_x \\ E_y \\ E_z \end{pmatrix} = 0 \tag{5.34}$$
For (5.34) to have a non-trivial solution, the determinant of the matrix must be zero. Imposing this requirement is an equivalent way to derive Fresnel's equation (5.25) for $n$.

Given a direction for $\hat{\mathbf{u}}$ and a value for $n$ (from Fresnel's equation), we can use (5.34) to determine which direction the electric field associated with that index must oscillate. When all three equations represented by (5.34) are coupled, the appropriate field direction for a value of $n$ is given by (see P 5.3)

Ex Ez
Ey
uy 2 n n2 y uz
ux 2 2 n nx
(5.35)
2 n2 nz
This is a proportionality rather than an equation, since Maxwell's equations only specify the direction of $\mathbf{E}$; we are free to choose the amplitude. Because Fresnel's equation gives two values for $n$, (5.35) specifies two distinct polarization components associated with each $\hat{\mathbf{u}}$. These polarization components form a natural basis for describing light propagation in a crystal. When light is composed of a mixture of these two polarizations, the two polarization components experience different indexes of refraction.

If any of the components of $\hat{\mathbf{u}}$ (i.e. $u_x$, $u_y$, or $u_z$) is precisely zero, (5.35) fails to provide a well-defined direction for the polarization. This is because at least one of the dimensions in the system of equations represented by (5.34) is decoupled from the others. In these cases, it is necessary to go back to (5.34) and re-solve for the polarization directions.
Example 5.4
Determine the directions of the two polarization components associated with the values of $n$ for a propagation direction $\hat{\mathbf{u}} = \hat{\mathbf{z}}$.
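Solution: In this case we have $u_x = u_y = 0$, so as noted above, we have to go back to (5.34) and re-solve. In our case, the set of equations becomes

$$\begin{bmatrix} \dfrac{n_x^2}{n^2}-1 & 0 & 0 \\ 0 & \dfrac{n_y^2}{n^2}-1 & 0 \\ 0 & 0 & \dfrac{n_z^2}{n^2} \end{bmatrix} \begin{bmatrix} E_x \\ E_y \\ E_z \end{bmatrix} = 0 \tag{5.36}$$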
Notice that all three dimensions are decoupled in this system (i.e. there are no off-diagonal terms). In Example 5.3 we found that the two values of $n$ associated with $\hat{\mathbf{u}} = \hat{\mathbf{z}}$ are $n_x$ and $n_y$. If we use $n = n_x$ in our set of equations, we have

$$\begin{bmatrix} 0 & 0 & 0 \\ 0 & \dfrac{n_y^2}{n_x^2}-1 & 0 \\ 0 & 0 & \dfrac{n_z^2}{n_x^2} \end{bmatrix} \begin{bmatrix} E_x \\ E_y \\ E_z \end{bmatrix} = 0$$

Assuming $n_x$, $n_y$, and $n_z$ are all unique so that $n_y/n_x \neq 1$ and $n_z/n_x \neq 1$, these equations require $E_y = E_z = 0$ but allow $E_x$ to be non-zero. This proves our earlier assertion that the index $n_x$ is associated with light polarized in the x-dimension in the special case of $\hat{\mathbf{u}} = \hat{\mathbf{z}}$. Similarly, when $n_y$ is inserted into (5.36), we find that it is associated with light polarized in the y-dimension.
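The result of Example 5.4 is easy to check numerically. The following Python sketch builds the matrix of (5.34) for propagation along the z-axis with $n = n_x$ and applies it to fields polarized along each axis; a zero result means the field is allowed. The function names are illustrative, and the sample indices are the potassium niobate values quoted in the caption of Fig. 5.3.

```python
def requirement_matrix(n, n_xyz, u):
    """Matrix of (5.34) for index n, principal indices n_xyz, unit vector u."""
    nx, ny, nz = n_xyz
    ux, uy, uz = u
    return [
        [nx**2 / n**2 - uy**2 - uz**2, ux * uy, ux * uz],
        [ux * uy, ny**2 / n**2 - ux**2 - uz**2, uy * uz],
        [ux * uz, uy * uz, nz**2 / n**2 - ux**2 - uy**2],
    ]


def apply(matrix, vec):
    """Matrix-vector product; an all-zero result means vec satisfies (5.34)."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


n_xyz = (2.22, 2.34, 2.41)   # sample values (KNbO3 at 500 nm, Fig. 5.3)
u_z = (0.0, 0.0, 1.0)        # propagation along the z-axis

M = requirement_matrix(n_xyz[0], n_xyz, u_z)   # choose n = n_x
res_x = apply(M, (1.0, 0.0, 0.0))   # x-polarized trial field
res_y = apply(M, (0.0, 1.0, 0.0))   # y-polarized trial field
res_z = apply(M, (0.0, 0.0, 1.0))   # z-polarized trial field
print(res_x, res_y, res_z)
```

Only the x-polarized trial field gives a vanishing result, in line with the conclusion of the example.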
We can use (5.35) to study the behavior of polarization direction as the direction of propagation varies. Figure 5.3 shows plots of the polarization direction (i.e. normalized $E_x$, $E_y$, and $E_z$) in Potassium Niobate as the propagation direction $\hat{\mathbf{u}}$ is varied. The plot is created by inserting the spherical representation of $\hat{\mathbf{u}}$ (5.30) into Fresnel's equation (5.26) for a chosen sign of the $\pm$, and then inserting the resulting $n$ into (5.35) to find the associated electric field. As we saw in Example 5.4, at $\theta = 0$ the light associated with the slow index is polarized along the y-axis and the light associated with the fast index is polarized along the x-axis.
In Fig. 5.3(c) we have plotted the angle between the two polarization components. At $\theta = 0$, the two polarization components are orthogonal, as one would expect. However, notice that in other propagation directions the two polarization components are not orthogonal. This feature is one of the things that makes describing light in crystals challenging.

Before moving on, let us briefly summarize what has been accomplished so far. Given values for $\chi_x$, $\chi_y$, and $\chi_z$ associated with light in a crystal at a given frequency, one defines the indices $n_x$, $n_y$, and $n_z$ according to (5.24). Next, a direction for the k-vector is chosen (i.e. $u_x$, $u_y$, and $u_z$, or equivalently $\theta$ and $\phi$). This direction potentially has two values for the index of refraction associated with it, found using Fresnel's equation (5.26). Each index can then be associated with a polarization direction for the electric field, found using (5.35).
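The procedure just summarized can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive implementation: the function names are made up for the example, and the quadratic coefficients A, B, and C are those obtained by clearing the denominators of Fresnel's equation (5.25), as in the hint to P5.1 (they reduce to the uniaxial expressions quoted in the hint to P5.5). The indices are the potassium niobate values from Fig. 5.3, and the polar angle of 40 degrees is an arbitrary choice.

```python
import math


def fresnel_indices(nx, ny, nz, u):
    """Two roots n of Fresnel's equation, written as A n^4 - B n^2 + C = 0."""
    ux2, uy2, uz2 = (c * c for c in u)
    A = ux2 * nx**2 + uy2 * ny**2 + uz2 * nz**2
    B = (ux2 * nx**2 * (ny**2 + nz**2)
         + uy2 * ny**2 * (nx**2 + nz**2)
         + uz2 * nz**2 * (nx**2 + ny**2))
    C = nx**2 * ny**2 * nz**2
    root = math.sqrt(max(B * B - 4 * A * C, 0.0))   # clip round-off
    return math.sqrt((B - root) / (2 * A)), math.sqrt((B + root) / (2 * A))


def polarization(n, nx, ny, nz, u):
    """Unit vector along E from (5.35); needs all components of u nonzero."""
    e = [u[0] / (n**2 - nx**2), u[1] / (n**2 - ny**2), u[2] / (n**2 - nz**2)]
    norm = math.sqrt(sum(c * c for c in e))
    return [c / norm for c in e]


nx, ny, nz = 2.22, 2.34, 2.41            # KNbO3 at 500 nm (Fig. 5.3)
theta, phi = math.radians(40), math.pi / 4
u = (math.sin(theta) * math.cos(phi),
     math.sin(theta) * math.sin(phi),
     math.cos(theta))

n_fast, n_slow = fresnel_indices(nx, ny, nz, u)
e_fast = polarization(n_fast, nx, ny, nz, u)
e_slow = polarization(n_slow, nx, ny, nz, u)

# Angle between the two polarization components, as in Fig. 5.3(c)
cos_angle = sum(a * b for a, b in zip(e_fast, e_slow))
angle_deg = math.degrees(math.acos(abs(cos_angle)))
print(n_fast, n_slow, angle_deg)
```

For this direction the two components come out close to, but not exactly, perpendicular, which is the behavior described for Fig. 5.3(c).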
5.6 Biaxial and Uniaxial Crystals

All anisotropic crystals have certain special propagation directions where the two values for $n$ from Fresnel's equation are equal. These directions are referred to as the optic axes of the crystal. When propagating along an optic axis, all polarization components experience the same index of refraction. If the values of $n_x$, $n_y$, and $n_z$ are all unique, a crystal will have two optic axes, and hence is referred to as a biaxial crystal. By convention, we order the crystal axes for biaxial crystals so that $n_x < n_y < n_z$. Under this convention, the two optic axes occur in the x-z plane ($\phi = 0$) at the following polar angles $\theta$, measured from the z-axis (see P 5.4):

$$\cos\theta = \pm\frac{n_x}{n_y}\sqrt{\frac{n_z^2-n_y^2}{n_z^2-n_x^2}} \qquad \text{(Optic axes directions, biaxial crystal)} \tag{5.37}$$
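As a numerical check of (5.37), the Python sketch below computes the optic-axis angle for the potassium niobate indices quoted in Fig. 5.3 and confirms that the two roots of Fresnel's equation coincide in that direction. The helper names are illustrative, and the quadratic coefficients are the ones obtained by clearing denominators in (5.25), as in the hint to P5.1.

```python
import math


def optic_axis_angle(nx, ny, nz):
    """Polar angle (radians) of an optic axis from (5.37); assumes nx < ny < nz."""
    cos_theta = (nx / ny) * math.sqrt((nz**2 - ny**2) / (nz**2 - nx**2))
    return math.acos(cos_theta)


def index_squared_roots(nx, ny, nz, theta, phi=0.0):
    """Both roots n^2 of Fresnel's equation, as A n^4 - B n^2 + C = 0."""
    ux2 = (math.sin(theta) * math.cos(phi))**2
    uy2 = (math.sin(theta) * math.sin(phi))**2
    uz2 = math.cos(theta)**2
    A = ux2 * nx**2 + uy2 * ny**2 + uz2 * nz**2
    B = (ux2 * nx**2 * (ny**2 + nz**2)
         + uy2 * ny**2 * (nx**2 + nz**2)
         + uz2 * nz**2 * (nx**2 + ny**2))
    C = nx**2 * ny**2 * nz**2
    disc = max(B * B - 4 * A * C, 0.0)   # clip tiny negative round-off
    return (B - math.sqrt(disc)) / (2 * A), (B + math.sqrt(disc)) / (2 * A)


nx, ny, nz = 2.22, 2.34, 2.41            # KNbO3 at 500 nm (Fig. 5.3)
theta_axis = optic_axis_angle(nx, ny, nz)
n2_low, n2_high = index_squared_roots(nx, ny, nz, theta_axis)
print(math.degrees(theta_axis), math.sqrt(n2_low), math.sqrt(n2_high))
```

Along the optic axis both roots collapse onto $n_y$, the middle index, to within round-off.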
Describing light in biaxial crystals can get complicated because, among other things, the polarization components specified by (5.35) are not necessarily orthogonal to one another. Thus, for the remainder of this chapter, we will focus on the simpler case of uniaxial crystals.
Figure 5.3 Polarization direction associated with the two values of $n$ in Potassium Niobate (KNbO$_3$) at $\lambda = 500$ nm ($n_x = 2.22$, $n_y = 2.34$, and $n_z = 2.41$) and $\phi = \pi/4$. Frame (c) shows the angle between the two polarization components.
In uniaxial crystals two of the coefficients $\chi_x$, $\chi_y$, and $\chi_z$ are the same. In this case, there is only one optic axis for the crystal (hence the name uniaxial). By convention, in uniaxial crystals we label the dimension that has the unique susceptibility as the z-axis (i.e. $\chi_x = \chi_y \neq \chi_z$). This makes the z-axis the optic axis, since the two values for $n$ in this direction are the same (i.e. $n_x = n_y$). The unique index of refraction is called the extraordinary index

$$n_z = n_e \tag{5.38}$$

and the other index is the ordinary index

$$n_x = n_y = n_o \tag{5.39}$$
These names were coined by Huygens, one of the early scientists to study light in crystals (see Appendix 5.B). A uniaxial crystal with $n_e > n_o$ is referred to as a positive crystal, and one with $n_e < n_o$ is referred to as a negative crystal. To calculate the index of refraction for a wave propagating in a uniaxial crystal, we use definitions (5.38) and (5.39) along with the spherical representation of $\hat{\mathbf{u}}$ (5.30) in Fresnel's equation (5.26) to find the following two values for $n$ (see P 5.5):

$$n = n_o \qquad \text{(uniaxial crystal)} \tag{5.40}$$

and

$$n = n_e(\theta) \equiv \frac{n_o n_e}{\sqrt{n_o^2\sin^2\theta + n_e^2\cos^2\theta}} \qquad \text{(uniaxial crystal)} \tag{5.41}$$
The index $n_e(\theta)$ in (5.41) is also commonly referred to as the extraordinary index along with the constant $n_e = n_z$. While this has the potential for some confusion, the practice is so common that we will continue it here. In this book we will write $n_e(\theta)$ when the angle dependent quantity specified by (5.41) is required, and write $n_e$ in formulas where the constant (5.38) is called for (as in the right hand side of (5.41)). Notice that $n_e(\theta)$ depends only on $\theta$ (the polar angle measured to the optic axis $\hat{\mathbf{z}}$) and not $\phi$ (the azimuthal angle).

The first index (5.40) corresponds to a polarization component which points orthogonal to the plane containing $\hat{\mathbf{u}}$ and $\hat{\mathbf{z}}$ (e.g. if $\hat{\mathbf{u}}$ is in the x-z plane, $n_o$ is associated with light polarized in the y-dimension):

$$\mathbf{E}_o(\hat{\mathbf{u}}) \propto \begin{bmatrix} -\sin\phi \\ \cos\phi \\ 0 \end{bmatrix} \tag{5.42}$$
This is shown by inserting $n = n_o$ into the requirement (5.34), and finding the allowed fields (see P 5.6). This field component is associated with the ordinary wave because, just as in an isotropic medium such as glass, the index of refraction for light with this polarization does not vary with $\theta$. The polarization component associated with $n_e(\theta)$ is found by using (5.35):

$$\mathbf{E}_e(\hat{\mathbf{u}}) \propto \begin{bmatrix} \dfrac{\sin\theta\cos\phi}{n_e^2(\theta)-n_o^2} \\ \dfrac{\sin\theta\sin\phi}{n_e^2(\theta)-n_o^2} \\ \dfrac{\cos\theta}{n_e^2(\theta)-n_e^2} \end{bmatrix} \tag{5.43}$$
Notice that this polarization component is partially directed along the optic axis (i.e. it has a z-component), and it is not perpendicular to $\mathbf{k}$ since $\hat{\mathbf{u}}\cdot\mathbf{E}_e(\hat{\mathbf{u}}) \neq 0$ (see P 5.7). It is, however, perpendicular to the ordinary polarization component, since $\mathbf{E}_e\cdot\mathbf{E}_o = 0$.

If $\theta = 0$, then the k-vector is directed exactly along the optic axis, and no polarization component experiences the unusual dimension (i.e. the z-direction). Notice that when $\theta = 0$, (5.41) reduces to $n = n_o$ so that both indices are the same. On the other hand, if $\theta = \pi/2$ then (5.41) reduces to $n = n_e$. This is why a wave plate is cut with the optic axis parallel to the surface. For light entering the wave plate at normal incidence, the propagation direction is $\theta = \pi/2$, and there is a slow and a fast axis with indices $n_o$ and $n_e$.
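The properties of the uniaxial solutions stated above can be checked with a short Python sketch. It evaluates (5.41), (5.42), and (5.43) for an arbitrary direction and prints the two dot products discussed in the text. The function names are illustrative, the quartz indices are the ones quoted in P5.9, and the chosen angles are arbitrary (any direction away from $\theta = 0$ and $\theta = \pi/2$, where (5.43) is not well defined, would do).

```python
import math


def n_e_theta(n_o, n_e, theta):
    """Angle-dependent extraordinary index, (5.41)."""
    return n_o * n_e / math.sqrt(n_o**2 * math.sin(theta)**2
                                 + n_e**2 * math.cos(theta)**2)


def ordinary_field(phi):
    """Direction of E_o from (5.42)."""
    return (-math.sin(phi), math.cos(phi), 0.0)


def extraordinary_field(n_o, n_e, theta, phi):
    """Direction of E_e from (5.43); undefined at theta = 0 or pi/2."""
    n2 = n_e_theta(n_o, n_e, theta)**2
    return (math.sin(theta) * math.cos(phi) / (n2 - n_o**2),
            math.sin(theta) * math.sin(phi) / (n2 - n_o**2),
            math.cos(theta) / (n2 - n_e**2))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


n_o, n_e = 1.54424, 1.55335              # quartz values quoted in P5.9
theta, phi = math.radians(30), math.radians(20)
u = (math.sin(theta) * math.cos(phi),
     math.sin(theta) * math.sin(phi),
     math.cos(theta))

E_o = ordinary_field(phi)
E_e = extraordinary_field(n_o, n_e, theta, phi)
eo_dot_ee = dot(E_e, E_o)                # vanishes: components are orthogonal
u_dot_ee = dot(u, E_e)                   # nonzero: E_e is not transverse to k
print(eo_dot_ee, u_dot_ee, 1 / n_e_theta(n_o, n_e, theta)**2)
```

With the normalization used in (5.43), Fresnel's equation (5.25) implies that $\hat{\mathbf{u}}\cdot\mathbf{E}_e$ equals $1/n_e^2(\theta)$, which is why the last two printed numbers agree.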
5.7 Refraction at a Crystal Surface

Now we consider refraction as light enters a uniaxial crystal. Snell's law (3.7) describes the connection between the k-vectors incident upon and transmitted through the surface. In a crystal, one must consider the portion of the incoming light that will experience the ordinary and extraordinary indexes separately. Because they experience different indexes, the ordinary and extraordinary polarized light refract into the crystal at two different angles, they travel at two different velocities in the crystal, and they have two different wavelengths in the crystal.

The ordinary polarization component obeys Snell's law as usual, with index $n_o$. If we assume that the index outside of the crystal is $n_i = 1$, Snell's law may be written as

$$\sin\theta_i = n_o\sin\theta_t \qquad \text{(ordinary polarized light)} \tag{5.44}$$

where $n_o$ is the index inside the crystal. The extraordinary polarized light still obeys Snell's law, but now the index of refraction in the crystal depends on the direction of propagation inside the crystal relative to the optic axis. Mathematically, we have

$$\sin\theta_i = n_e(\theta')\sin\theta_t \qquad \text{(extraordinary polarized light)} \tag{5.45}$$

where $\theta'$ is the angle between the optic axis inside the crystal and the direction of propagation in the crystal (given by $\theta_t$ in the plane of incidence). When the optic axis is at an arbitrary angle with respect to the surface, the connection between $\theta'$ and $\theta_t$ must be specified by rotation matrices, and getting a general expression for Snell's law involves solving a quartic equation, so that a general solution is awkward. In Example 5.5 we illustrate a specific case where it is possible to find a simple analytic expression for Snell's law, but in the general case one would usually just solve the equations numerically.

Example 5.5
Find Snell's law for extraordinary polarized light in the special case where a crystal is cut such that the optic axis lies perpendicular to the surface (not the way a wave plate is cut) as shown in Fig. 5.4.
Figure 5.4 Propagation of light in a uniaxial crystal with its optic axis perpendicular to the surface.
Solution: In this case, the connection between propagation direction and the optic axis is simply $\theta_t = \theta'$. If the light hits the crystal surface straight on, the index of refraction is $n_o$, regardless of the orientation of polarization, since $\theta' = 0$ for normal incidence. When the light strikes the surface at an angle, s-polarized light continues to experience the index $n_o$, but p-polarized light experiences an index that varies with angle via (5.41), since the electric field has a component of polarization along the optic axis. (The correspondence between s and p and ordinary and extraordinary polarization components is specific to the orientation of the optic axis in this example. For arbitrary orientations of the optic axis with respect to the surface, the ordinary and extraordinary components will generally be mixtures of s and p polarized light.) When we insert (5.41) into Snell's law (5.45) with $\theta' = \theta_t$, we can invert it to find the transmitted angle $\theta_t$ in terms of $\theta_i$ (see P 5.8):

$$\tan\theta_t = \frac{n_e\sin\theta_i}{n_o\sqrt{n_e^2-\sin^2\theta_i}} \qquad \text{(extraordinary polarized, optic axis $\perp$ surface)} \tag{5.46}$$
As strange as this formula looks, it is Snell's law, but with an angularly dependent index.
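That (5.46) really is Snell's law with an angle-dependent index can be confirmed numerically. The Python sketch below computes $\theta_t$ from (5.46) and then substitutes it back into (5.45) using (5.41). The function names, the quartz indices (taken from P5.9), and the 50 degree incident angle are illustrative choices; the index outside the crystal is assumed to be 1, as in the text.

```python
import math


def n_e_theta(n_o, n_e, theta):
    """Angle-dependent extraordinary index, (5.41)."""
    return n_o * n_e / math.sqrt(n_o**2 * math.sin(theta)**2
                                 + n_e**2 * math.cos(theta)**2)


def theta_t_extraordinary(n_o, n_e, theta_i):
    """Transmitted k-vector angle from (5.46): optic axis normal to surface."""
    s = math.sin(theta_i)
    return math.atan(n_e * s / (n_o * math.sqrt(n_e**2 - s**2)))


n_o, n_e = 1.54424, 1.55335              # quartz values quoted in P5.9
theta_i = math.radians(50)

theta_t = theta_t_extraordinary(n_o, n_e, theta_i)
lhs = math.sin(theta_i)                                   # left side of (5.45)
rhs = n_e_theta(n_o, n_e, theta_t) * math.sin(theta_t)    # right side of (5.45)

# Ordinary ray for comparison, from (5.44)
theta_t_ordinary = math.asin(math.sin(theta_i) / n_o)
print(math.degrees(theta_t), math.degrees(theta_t_ordinary), lhs - rhs)
```

For a positive crystal such as quartz the extraordinary k-vector refracts slightly more strongly than the ordinary one, since $n_e(\theta) > n_o$ for any $\theta > 0$.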
5.8 Poynting Vector in a Uniaxial Crystal

When an object is observed through a crystal (acting as a window), the energy associated with ordinary and extraordinary polarized light follows different paths, giving rise to two different images. This phenomenon is called birefringence. Since the Poynting vector dictates the direction of energy flow, it is the direction of $\mathbf{S}$ that determines the separation of the double image seen when looking through a birefringent crystal.

Snell's law dictates the connection between the directions of the incident and transmitted k-vectors. The Poynting vector $\mathbf{S}$ for purely ordinary polarized light points in the same direction as the k-vector, so the direction of energy flow for ordinary polarized light also obeys Snell's law. However, for extraordinary polarized light, the Poynting vector $\mathbf{S}$ is not parallel to $\mathbf{k}$ (recall the discussion in connection with (5.11) and (5.12)). Thus, the energy flow associated with extraordinary polarized light does not obey Snell's law. When Huygens saw this, he said "how extraordinary!" (See Appendix 5.B.)

To analyze this situation, it is necessary to derive an expression for extraordinary polarized light similar to Snell's law, but which applies to $\mathbf{S}$ rather than to $\mathbf{k}$. This describes the direction that the energy associated with extraordinary rays takes upon entering the crystal. To calculate the direction that the extraordinary polarized $\mathbf{S}$ takes upon entering a crystal, we first calculate the direction of $\mathbf{k}$ inside the crystal using Snell's law (5.45). Then we use the expression (5.43) for $\mathbf{E}$ along with $\mathbf{B} = (\mathbf{k}\times\mathbf{E})/\omega$ to evaluate $\mathbf{S} = \mathbf{E}\times\mathbf{B}/\mu_0$. In general, this process is best done numerically, since Snell's law (5.45) for extraordinary polarized light usually does not have simple analytic solutions. In Example 5.6 we study a special case where an analytic solution can be found relatively easily.
Example 5.6
Derive the connection between the incident and transmitted Poynting vectors when extraordinary polarized light is incident on a uniaxial crystal cut with its optic axis perpendicular to the surface as in Fig. 5.4.
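Solution: This is the same orientation as Example 5.5, so we can use the expression derived there for Snell's law. We assume that the light propagates in the y-z plane as shown in Fig. 5.4 (i.e. $\phi = \pi/2$). From (5.43) we have

$$\mathbf{E}_e(\hat{\mathbf{u}}) = E_0\begin{bmatrix} 0 \\ \dfrac{\sin\theta_t}{n_e^2(\theta_t)-n_o^2} \\ \dfrac{\cos\theta_t}{n_e^2(\theta_t)-n_e^2} \end{bmatrix}$$

where $\theta_t$ describes the direction of the k-vector:

$$\mathbf{k} = k\begin{bmatrix} 0 \\ \sin\theta_t \\ \cos\theta_t \end{bmatrix} \tag{5.47}$$

To find the direction of energy flow, we must calculate $\mathbf{S} = \mathbf{E}\times\mathbf{B}/\mu_0$. To do this we need to know $\mathbf{E}$. From the constitutive relation (5.3) and the definitions (5.24) we have

$$\epsilon_0\mathbf{E}+\mathbf{P} = \epsilon_0\left[(1+\chi_x)E_x\hat{\mathbf{x}} + (1+\chi_y)E_y\hat{\mathbf{y}} + (1+\chi_z)E_z\hat{\mathbf{z}}\right] = \epsilon_0\left[n_o^2E_x\hat{\mathbf{x}} + n_o^2E_y\hat{\mathbf{y}} + n_e^2E_z\hat{\mathbf{z}}\right] \tag{5.48}$$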
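Upon substitution of (5.47) and (5.48) into (5.11) we have

$$\mathbf{k}\cdot\left(\epsilon_0\mathbf{E}+\mathbf{P}\right) = k\left(\hat{\mathbf{y}}\sin\theta_t + \hat{\mathbf{z}}\cos\theta_t\right)\cdot\epsilon_0\left[n_o^2E_x\hat{\mathbf{x}} + n_o^2E_y\hat{\mathbf{y}} + n_e^2E_z\hat{\mathbf{z}}\right] = \epsilon_0 k\left(n_o^2E_y\sin\theta_t + n_e^2E_z\cos\theta_t\right) = 0 \tag{5.49}$$

Therefore, the y and z components of the field are related through

$$E_z = -\frac{n_o^2E_y}{n_e^2}\tan\theta_t \tag{5.50}$$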
Note that $E_y$ and $E_z$ are exactly the components of the electric field that comprise extraordinary polarized light; the ordinary component of the field points in the x-direction. We may write the extraordinary polarized electric field as

$$\mathbf{E} = E_y\left(\hat{\mathbf{y}} - \hat{\mathbf{z}}\frac{n_o^2}{n_e^2}\tan\theta_t\right) \qquad \text{(extraordinary polarized)} \tag{5.51}$$
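Before computing the Poynting vector, we also need to express the magnetic field in terms of the electric field. To do this we take advantage of (5.47) to find

$$\mathbf{B} = \frac{\mathbf{k}\times\mathbf{E}}{\omega} = \frac{k\left(\hat{\mathbf{y}}\sin\theta_t + \hat{\mathbf{z}}\cos\theta_t\right)\times E_y\left(\hat{\mathbf{y}} - \hat{\mathbf{z}}\dfrac{n_o^2}{n_e^2}\tan\theta_t\right)}{\omega} = -\hat{\mathbf{x}}\frac{kE_y}{\omega}\left(\frac{n_o^2}{n_e^2}\sin\theta_t\tan\theta_t + \cos\theta_t\right) \tag{5.52}$$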

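We proceed with the computation of the Poynting vector. Using (5.51) and (5.52) we get

$$\mathbf{S} = \mathbf{E}\times\frac{\mathbf{B}}{\mu_0} = E_y\left(\hat{\mathbf{y}} - \hat{\mathbf{z}}\frac{n_o^2}{n_e^2}\tan\theta_t\right)\times\left[-\hat{\mathbf{x}}\frac{kE_y}{\omega\mu_0}\left(\frac{n_o^2}{n_e^2}\sin\theta_t\tan\theta_t + \cos\theta_t\right)\right] = \frac{kE_y^2}{\omega\mu_0}\left(\frac{n_o^2}{n_e^2}\sin\theta_t\tan\theta_t + \cos\theta_t\right)\left(\hat{\mathbf{z}} + \hat{\mathbf{y}}\frac{n_o^2}{n_e^2}\tan\theta_t\right) \tag{5.53}$$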
Keep in mind that $\theta_t$ refers to the direction of the k-vector. The above equation demonstrates that the Poynting vector $\mathbf{S}$ lies along another direction. Let us label the direction of the Poynting vector with the angle $\theta_S$. This angle can be obtained from the ratio of the two vector components of $\mathbf{S}$ as follows:

$$\tan\theta_S \equiv \frac{S_y}{S_z} = \frac{n_o^2}{n_e^2}\tan\theta_t \qquad \text{(extraordinary polarized)} \tag{5.54}$$
While the k-vector is characterized by the angle $\theta_t$, the Poynting vector is characterized by the angle $\theta_S$. We can find the connection between the incident angle $\theta_i$ and $\theta_S$ by taking advantage of the already known connection between $\theta_i$ and $\theta_t$. Combining (5.46) and (5.54), we obtain

$$\tan\theta_S = \frac{n_o\sin\theta_i}{n_e\sqrt{n_e^2-\sin^2\theta_i}} \qquad \text{(extraordinary polarized)} \tag{5.55}$$
As we noted in the last example, we have the case where ordinary polarized light is s -polarized light, and extraordinary polarized light is p -polarized light due to our specic choice of orientation for the optic axis in this section. In general, the s - and p -polarized portions of the incident light can each give rise to both extraordinary and ordinary rays.
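The steps of Example 5.6 can be retraced numerically. The Python sketch below builds the field (5.51), forms the cross products for $\mathbf{B}$ and $\mathbf{S}$ explicitly, and compares the resulting direction of $\mathbf{S}$ with (5.55). Constant prefactors ($k$, $\omega$, $\mu_0$, $E_y$) are dropped because only the direction matters here; the `cross` helper, the quartz indices (from P5.9), and the incident angle are illustrative assumptions.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


n_o, n_e = 1.54424, 1.55335              # quartz values quoted in P5.9
theta_i = math.radians(50)
s = math.sin(theta_i)

# k-vector angle inside the crystal, (5.46)
theta_t = math.atan(n_e * s / (n_o * math.sqrt(n_e**2 - s**2)))

# Field direction from (5.51) with E_y = 1, and k direction from (5.47)
E = (0.0, 1.0, -(n_o**2 / n_e**2) * math.tan(theta_t))
k_hat = (0.0, math.sin(theta_t), math.cos(theta_t))

B = cross(k_hat, E)                      # direction of B, as in (5.52)
S = cross(E, B)                          # direction of S, as in (5.53)

theta_S_numeric = math.atan2(S[1], S[2])                                  # (5.54)
theta_S_formula = math.atan(n_o * s / (n_e * math.sqrt(n_e**2 - s**2)))   # (5.55)
print(math.degrees(theta_t), math.degrees(theta_S_numeric),
      math.degrees(theta_S_formula))
```

The small difference between $\theta_t$ and $\theta_S$ is the walk-off between the k-vector and the direction of energy flow for the extraordinary ray.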
Appendix 5.A Rotation of Coordinates

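In this appendix, we go through the labor of showing that (5.1) can always be written as (5.3), given that the susceptibility tensor is symmetric (i.e. $\chi_{ij} = \chi_{ji}$). This amounts to an eigenvalue problem, which we accomplish here via rotations of the coordinate system. We have

$$\mathbf{P} = \epsilon_0\boldsymbol{\chi}\mathbf{E} \tag{5.56}$$

where

$$\mathbf{E} \equiv \begin{bmatrix} E_x \\ E_y \\ E_z \end{bmatrix}, \qquad \mathbf{P} \equiv \begin{bmatrix} P_x \\ P_y \\ P_z \end{bmatrix}, \qquad \boldsymbol{\chi} \equiv \begin{bmatrix} \chi_{xx} & \chi_{xy} & \chi_{xz} \\ \chi_{xy} & \chi_{yy} & \chi_{yz} \\ \chi_{xz} & \chi_{yz} & \chi_{zz} \end{bmatrix} \tag{5.57}$$

Our task is to find a new coordinate system $x'$, $y'$, and $z'$ for which the susceptibility tensor is diagonal. That is, we want to choose $x'$, $y'$, and $z'$ such that

$$\mathbf{P}' = \epsilon_0\boldsymbol{\chi}'\mathbf{E}' \tag{5.58}$$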
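where

$$\mathbf{E}' \equiv \begin{bmatrix} E'_{x'} \\ E'_{y'} \\ E'_{z'} \end{bmatrix}, \qquad \mathbf{P}' \equiv \begin{bmatrix} P'_{x'} \\ P'_{y'} \\ P'_{z'} \end{bmatrix}, \qquad \boldsymbol{\chi}' \equiv \begin{bmatrix} \chi'_{x'x'} & 0 & 0 \\ 0 & \chi'_{y'y'} & 0 \\ 0 & 0 & \chi'_{z'z'} \end{bmatrix} \tag{5.59}$$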
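To arrive at the new coordinate system, we are free to make pure rotation transformations. From (4.32), a rotation through an angle $\alpha$ about the z-axis, followed by a rotation through an angle $\beta$ about the resulting y-axis, and finally a rotation through an angle $\gamma$ about the new x-axis, can be written as

$$\mathbf{R} \equiv \begin{bmatrix} R_{11} & R_{12} & R_{13} \\ R_{21} & R_{22} & R_{23} \\ R_{31} & R_{32} & R_{33} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\gamma & \sin\gamma \\ 0 & -\sin\gamma & \cos\gamma \end{bmatrix} \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix} \begin{bmatrix} \cos\alpha & \sin\alpha & 0 \\ -\sin\alpha & \cos\alpha & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

$$= \begin{bmatrix} \cos\alpha\cos\beta & \sin\alpha\cos\beta & \sin\beta \\ -\sin\alpha\cos\gamma - \cos\alpha\sin\beta\sin\gamma & \cos\alpha\cos\gamma - \sin\alpha\sin\beta\sin\gamma & \cos\beta\sin\gamma \\ \sin\alpha\sin\gamma - \cos\alpha\sin\beta\cos\gamma & -\cos\alpha\sin\gamma - \sin\alpha\sin\beta\cos\gamma & \cos\beta\cos\gamma \end{bmatrix} \tag{5.60}$$

The matrix $\mathbf{R}$ produces an arbitrary rotation of coordinates in three dimensions. Specifically, we can write:

$$\mathbf{E}' = \mathbf{R}\mathbf{E}, \qquad \mathbf{P}' = \mathbf{R}\mathbf{P} \tag{5.61}$$

These transformations can be inverted to give

$$\mathbf{E} = \mathbf{R}^{-1}\mathbf{E}', \qquad \mathbf{P} = \mathbf{R}^{-1}\mathbf{P}' \tag{5.62}$$

where

$$\mathbf{R}^{-1} = \begin{bmatrix} \cos\alpha\cos\beta & -\sin\alpha\cos\gamma - \cos\alpha\sin\beta\sin\gamma & \sin\alpha\sin\gamma - \cos\alpha\sin\beta\cos\gamma \\ \sin\alpha\cos\beta & \cos\alpha\cos\gamma - \sin\alpha\sin\beta\sin\gamma & -\cos\alpha\sin\gamma - \sin\alpha\sin\beta\cos\gamma \\ \sin\beta & \cos\beta\sin\gamma & \cos\beta\cos\gamma \end{bmatrix} = \begin{bmatrix} R_{11} & R_{21} & R_{31} \\ R_{12} & R_{22} & R_{32} \\ R_{13} & R_{23} & R_{33} \end{bmatrix} = \mathbf{R}^T \tag{5.63}$$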
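Note that the inverse of the rotation matrix is the same as its transpose, an important feature that we exploit in what follows. Upon inserting (5.62) into (5.56) we have

$$\mathbf{R}^{-1}\mathbf{P}' = \epsilon_0\boldsymbol{\chi}\mathbf{R}^{-1}\mathbf{E}' \tag{5.64}$$

or

$$\mathbf{P}' = \epsilon_0\mathbf{R}\boldsymbol{\chi}\mathbf{R}^{-1}\mathbf{E}' \tag{5.65}$$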
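From this equation we see that the new susceptibility tensor we seek for (5.58) is

$$\boldsymbol{\chi}' \equiv \mathbf{R}\boldsymbol{\chi}\mathbf{R}^{-1} = \begin{bmatrix} R_{11} & R_{12} & R_{13} \\ R_{21} & R_{22} & R_{23} \\ R_{31} & R_{32} & R_{33} \end{bmatrix} \begin{bmatrix} \chi_{xx} & \chi_{xy} & \chi_{xz} \\ \chi_{xy} & \chi_{yy} & \chi_{yz} \\ \chi_{xz} & \chi_{yz} & \chi_{zz} \end{bmatrix} \begin{bmatrix} R_{11} & R_{21} & R_{31} \\ R_{12} & R_{22} & R_{32} \\ R_{13} & R_{23} & R_{33} \end{bmatrix}$$

$$= \begin{bmatrix} \chi'_{x'x'} & \chi'_{x'y'} & \chi'_{x'z'} \\ \chi'_{x'y'} & \chi'_{y'y'} & \chi'_{y'z'} \\ \chi'_{x'z'} & \chi'_{y'z'} & \chi'_{z'z'} \end{bmatrix} \tag{5.66}$$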
We have expressly indicated that the off-diagonal terms of $\boldsymbol{\chi}'$ are symmetric (i.e. $\chi'_{ij} = \chi'_{ji}$). This can be verified by performing the multiplication in (5.66). It is a consequence of $\boldsymbol{\chi}$ being symmetric and $\mathbf{R}^{-1}$ being equal to $\mathbf{R}^T$. The three off-diagonal elements of $\boldsymbol{\chi}'$ (appearing both above and below the diagonal) are found by performing the matrix multiplication in the second line of (5.66). The specific expressions for these three elements are not particularly enlightening. The important point is that we can make all three of them equal to zero since we have three degrees of freedom in the angles $\alpha$, $\beta$, and $\gamma$. Although we do not expressly solve for the angles, we have demonstrated that it is always possible to set

$$\chi'_{x'y'} = 0, \qquad \chi'_{x'z'} = 0, \qquad \chi'_{y'z'} = 0 \tag{5.67}$$

This justifies (5.3).
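The appendix does not solve for the rotation angles, but a numerical illustration is straightforward. The Python sketch below uses the classical Jacobi method (a standard technique swapped in here, not the procedure of the text), which applies a sequence of plane rotations, each chosen to zero one off-diagonal element, and accumulates them into a single rotation matrix $\mathbf{R}$ so that $\mathbf{R}\boldsymbol{\chi}\mathbf{R}^T$ is diagonal. The sample tensor and all function names are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]


def identity():
    return [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]


def diagonalize(chi, max_rotations=50, tol=1e-14):
    """Return (R, chi_prime) with chi_prime = R chi R^T nearly diagonal."""
    R = identity()
    A = [row[:] for row in chi]
    for _ in range(max_rotations):
        # Pick the largest off-diagonal element A[p][q]
        p, q = max(((0, 1), (0, 2), (1, 2)),
                   key=lambda pq: abs(A[pq[0]][pq[1]]))
        if abs(A[p][q]) < tol:
            break
        # Plane-rotation angle that zeroes A[p][q]
        angle = 0.5 * math.atan2(2 * A[p][q], A[q][q] - A[p][p])
        c, s = math.cos(angle), math.sin(angle)
        J = identity()
        J[p][p], J[q][q], J[p][q], J[q][p] = c, c, s, -s
        A = matmul(matmul(transpose(J), A), J)
        R = matmul(transpose(J), R)      # accumulate the total rotation
    return R, A


# Hypothetical symmetric susceptibility tensor in the original coordinates
chi = [[1.5, 0.2, 0.1],
       [0.2, 1.8, 0.3],
       [0.1, 0.3, 2.1]]
R, chi_prime = diagonalize(chi)
for row in chi_prime:
    print(["%.6f" % value for value in row])
```

The diagonal entries of the result play the role of $\chi_x$, $\chi_y$, and $\chi_z$ in (5.3), and the rows of `R` give the directions of the principal axes in the original coordinates.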
Appendix 5.B Huygens Elliptical Construct for a Uniaxial Crystal

In 1690 Christiaan Huygens developed a way to predict the direction of extraordinary rays in a crystal by examining an elliptical wavelet. The point on the elliptical wavelet that propagates along the optic axis is assumed to experience the index $n_o$. The point on the elliptical wavelet that propagates perpendicular to the optic axis is assumed to experience the index $n_e$. It turns out that Huygens' approach agreed with the direction of energy propagation (5.55) (as opposed to the direction of the k-vector). This was quite satisfactory in Huygens' day (except that he was largely ignored for a century, owing to Newton's corpuscular theory) since the direction of energy propagation is what an observer sees.

Consider a plane wave entering a uniaxial crystal. In Huygens' point of view, each point on a wave front acts as a wavelet source which combines with neighboring wavelets to preserve the overall plane wave pattern. Inside the crystal, the wavelets propagate in the shape of an ellipse. The equation for an elliptical wave front after propagating during a time $t$ is

$$\frac{y^2}{(ct/n_e)^2} + \frac{z^2}{(ct/n_o)^2} = 1 \tag{5.68}$$

After rearranging, the equation of the ellipse inside the crystal can also be written as

$$z = \frac{ct}{n_o}\sqrt{1-\frac{y^2}{(ct/n_e)^2}} \tag{5.69}$$

In order to have the wavelet join neatly with other wavelets to build a plane wave, the wave front of the ellipse must be parallel to a new wave front entering the surface at a distance $ct/\sin\theta_i$ above the original point. This distance is represented by the hypotenuse of the right triangle seen in Fig. 5.5. Let the point where the wave front touches the ellipse be denoted by $(y, z) = (z\tan\theta_S, z)$. The slope (rise over run) of the line that connects these two points is then

$$\frac{dz}{dy} = -\frac{z}{ct/\sin\theta_i - z\tan\theta_S} \tag{5.70}$$
Christiaan Huygens (1629-1695, Dutch) Huygens championed the wave theory of light. He was able to explain birefringence in terms of different indexes of refraction that varied with direction. (Newton was also able to explain birefringence with particles by assuming that the crystal sorted light-particles according to their geometric properties.) Huygens made many advancements in clock-making technology and statistical theory.
At the point where the wave front touches the ellipse (i.e., $(y, z) = (z\tan\theta_S, z)$), the slope of the curve for the ellipse is

$$\frac{dz}{dy} = -\frac{yn_e^2}{n_o ct\sqrt{1-\dfrac{y^2}{(ct/n_e)^2}}} = -\frac{n_e^2 y}{n_o^2 z} = -\frac{n_e^2}{n_o^2}\tan\theta_S \tag{5.71}$$
Figure 5.5 Elliptical wavelet.

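We would like these two slopes to be the same. We therefore set them equal to each other:

$$-\frac{n_e^2}{n_o^2}\tan\theta_S = -\frac{z}{ct/\sin\theta_i - z\tan\theta_S} \quad\Rightarrow\quad \frac{n_e^2}{n_o^2}\frac{\tan\theta_S}{\sin\theta_i}\frac{ct}{z} = \frac{n_e^2}{n_o^2}\tan^2\theta_S + 1 \tag{5.72}$$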
If we evaluate (5.68) for the point $(y, z) = (z\tan\theta_S, z)$, we obtain

$$\frac{ct}{z} = n_o\sqrt{\frac{n_e^2}{n_o^2}\tan^2\theta_S + 1} \tag{5.73}$$
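Upon substitution of this into (5.72) we arrive at

$$\frac{n_e^2}{n_o}\frac{\tan\theta_S}{\sin\theta_i} = \sqrt{\frac{n_e^2}{n_o^2}\tan^2\theta_S + 1} \quad\Rightarrow\quad \frac{n_e^4\tan^2\theta_S}{n_o^2\sin^2\theta_i} = \frac{n_e^2}{n_o^2}\tan^2\theta_S + 1 \quad\Rightarrow\quad \left(\frac{n_e^2}{\sin^2\theta_i} - 1\right)\tan^2\theta_S = \frac{n_o^2}{n_e^2} \tag{5.74}$$

$$\Rightarrow\quad \tan\theta_S = \frac{n_o\sin\theta_i}{n_e\sqrt{n_e^2-\sin^2\theta_i}} \tag{5.75}$$

This agrees with (5.55) as anticipated. Again, Huygens' approach obtained the correct direction of the Poynting vector associated with the extraordinary wave.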
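The tangent construction can be verified numerically. Taking the direction given by (5.75), the Python sketch below locates the corresponding point on the elliptical wavelet using (5.73) and checks that it lies on the ellipse (5.68) and that the slope of the connecting line (5.70) matches the slope of the ellipse (5.71). The variable names, the quartz indices (from P5.9), the incident angle, and the choice $ct = 1$ are illustrative assumptions.

```python
import math

n_o, n_e = 1.54424, 1.55335              # quartz values quoted in P5.9
theta_i = math.radians(50)
ct = 1.0                                 # arbitrary propagation distance c*t

# Direction of energy flow predicted by (5.75)
sin_i = math.sin(theta_i)
tan_S = n_o * sin_i / (n_e * math.sqrt(n_e**2 - sin_i**2))

# Tangent point (y, z) = (z tan(theta_S), z), with z taken from (5.73)
z = ct / (n_o * math.sqrt((n_e**2 / n_o**2) * tan_S**2 + 1))
y = z * tan_S

on_ellipse = y**2 / (ct / n_e)**2 + z**2 / (ct / n_o)**2   # (5.68), should be 1
slope_line = -z / (ct / sin_i - y)                          # (5.70)
slope_ellipse = -(n_e**2 * y) / (n_o**2 * z)                # (5.71)
print(on_ellipse, slope_line, slope_ellipse)
```

The two slopes agree, so the plane wave front drawn from the surface point at distance $ct/\sin\theta_i$ is indeed tangent to the wavelet at the point reached by the extraordinary ray.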
Exercises
Exercises for 5.4 Fresnel's Equation

P5.1 Solve Fresnel's equation (5.25) to find the two values of $n$ associated with a given $\hat{\mathbf{u}}$. Show that both solutions yield a positive index of refraction.

HINT: Show that (5.25) can be manipulated into the form

$$0 = \left[\left(u_x^2+u_y^2+u_z^2\right)-1\right]n^6 + \left[\left(n_x^2+n_y^2+n_z^2\right) - u_x^2\left(n_y^2+n_z^2\right) - u_y^2\left(n_x^2+n_z^2\right) - u_z^2\left(n_x^2+n_y^2\right)\right]n^4 - \left[\left(n_x^2n_y^2+n_x^2n_z^2+n_y^2n_z^2\right) - u_x^2n_y^2n_z^2 - u_y^2n_x^2n_z^2 - u_z^2n_x^2n_y^2\right]n^2 + n_x^2n_y^2n_z^2$$

The coefficient of $n^6$ is identically zero since by definition we have $u_x^2+u_y^2+u_z^2 = 1$.

P5.2 Suppose you have a crystal with $n_x = 1.5$, $n_y = 1.6$, and $n_z = 2.0$. Use Fresnel's equation to determine what the two indices of refraction are for a k-vector in the crystal along the $\hat{\mathbf{u}} = (\hat{\mathbf{x}} + 2\hat{\mathbf{y}} + 3\hat{\mathbf{z}})/\sqrt{14}$ direction.
Exercises for 5.6 Biaxial and Uniaxial Crystals

P5.3 Show that (5.35) is a solution to (5.34).

P5.4 Given that the optic axes are in the x-z plane, show that the direction of the optic axes are given by (5.37).

HINT: The two indexes are the same when $B^2 - 4AC = 0$. You will want to use polar coordinates for the direction unit vector, as in (5.30). Set $\phi = 0$ so you are in the x-z plane. Use $\sin^2\theta + \cos^2\theta = 1$ to get an equation that only has cosine terms and solve for $\cos^2\theta$.

P5.5 Use definitions (5.38) and (5.39) along with the spherical representation of $\hat{\mathbf{u}}$ (5.30) in Fresnel's equation (5.26) to calculate the two values for the index in a uniaxial crystal (i.e. (5.40) and (5.41)).

HINT: First show that

$$A = n_o^2\sin^2\theta + n_e^2\cos^2\theta$$
$$B = n_o^2n_e^2 + n_o^4\sin^2\theta + n_e^2n_o^2\cos^2\theta$$
$$C = n_o^4n_e^2$$

and then use these expressions to evaluate Fresnel's equation.

P5.6 Show that the field polarization component associated with $n = n_o$ in a uniaxial crystal is directed perpendicular to the plane containing $\hat{\mathbf{u}}$ and $\hat{\mathbf{z}}$ by substituting this value for $n$ into (5.34) and determining what combination of field components are allowable.

HINT: Use (5.30) to represent $\hat{\mathbf{u}}$ with $\phi = 0$ (the index is the same for all $\phi$, so you may as well use one that makes calculation easy). When you substitute into (5.34) you will find that $E_y$ can be any value because of the location of zeros in the matrix. To get a requirement on $E_x$ and $E_z$, collapse the matrix equation down to a $2\times 2$ system. For non-trivial solutions to exist (i.e. $E_x \neq 0$ or $E_z \neq 0$), the determinant of the matrix must be zero. Show that this is only the case if $n_o = n_e$ (i.e. the crystal is isotropic).

P5.7 Show that the electric field for extraordinary polarized light $\mathbf{E}_e(\hat{\mathbf{u}})$ in a uniaxial crystal is not perpendicular to $\mathbf{k}$ (i.e. $\hat{\mathbf{u}}$), but that it is perpendicular to the ordinary polarization component $\mathbf{E}_o(\hat{\mathbf{u}})$.

P5.8 Derive (5.46).

P5.9 A quartz plate (uniaxial crystal with the optic axis perpendicular to the surfaces) has thickness $d = 0.96$ mm. The indices of refraction are $n_o = 1.54424$ and $n_e = 1.55335$. A plane wave with wavelength $\lambda_{\rm vac} = 633$ nm passes through the plate. After emerging from the crystal, there is a phase difference $\Delta$ between the two polarization components of the plane wave, and this phase difference depends on incident angle $\theta_i$. Use a computer to plot $\Delta$ as a function of incident angle from zero to 90°.
Figure 5.6 Diagram for P 5.9.

HINT: For s-polarized light, show that the number of wavelengths that fit in the plate is $\dfrac{d}{(\lambda_{\rm vac}/n_o)\cos\theta_s}$. For p-polarized light, show that the number of wavelengths that fit in the plate and the extra leg $\delta$ outside of the plate (see Fig. 5.6) is $\dfrac{d}{(\lambda_{\rm vac}/n_p)\cos\theta_p} + \dfrac{\delta}{\lambda_{\rm vac}}$, where $\delta = d\left[\tan\theta_s - \tan\theta_p\right]\sin\theta_i$ and $n_p$ is given by (5.41). Find the difference between these expressions and multiply by $2\pi$ to find $\Delta$.

L5.10 In the laboratory, send a HeNe laser ($\lambda_{\rm vac} = 633$ nm) through two crossed polarizers, oriented at 45° and 135°. Place the quartz plate described in P 5.9 between the polarizers on a rotation stage. Now equal amounts of s- and p-polarized light strike the crystal as it is rotated from normal incidence.

Figure 5.7 Schematic for L 5.10.

If the phase shift between the two paths is an odd integer times $\pi$, the crystal acts as a half wave plate and maximum transmission through the second polarizer results. If the phase shift is an even integer times $\pi$, then minimum transmission through the second polarizer results. Plot these measured maximum and minimum points on your computer-generated graph of the previous problem.
Figure 5.8 Plot for P 5.9 and L 5.10. (Phase difference versus incident angle, with the measured dim spots and bright spots marked.)
Review, Chapters 1-5

Students preparing for an exam will want to understand the following questions and problems thoroughly enough to be able to work them without referring back to previous chapters.

True and False Questions

R1 T or F: The optical index of any material (not vacuum) varies with frequency.

R2 T or F: The frequency of light can change as it enters a crystal (consider low intensity, with no nonlinear effects).

R3 T or F: The entire expression $\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r}-\omega t)}$ associated with a light field (both the real part and the imaginary parts) is physically relevant.

R4 T or F: The real part of the refractive index cannot be less than one.

R5 T or F: s-polarized light and p-polarized light experience the same phase shift upon reflection from a material with complex index.

R6 T or F: When light is incident upon a material interface at Brewster's angle, only one polarization can transmit.

R7 T or F: When light is incident upon a material interface at Brewster's angle one of the polarizations stimulates dipoles in the material to oscillate with orientation along the direction of the reflected k-vector.

R8 T or F: The critical angle for total internal reflection exists on both sides of a material interface.

R9 T or F: From any given location above a (smooth flat) surface of water, it is possible to see objects positioned anywhere under the water.

R10 T or F: From any given location beneath a (smooth flat) surface of water, it is possible to see objects positioned anywhere above the water.

R11 T or F: An evanescent wave travels parallel to the surface interface on the transmitted side.

R12 T or F: When p-polarized light enters a material at Brewster's angle, the intensity of the transmitted beam is the same as the intensity of the incident beam.

R13 T or F: For incident angles beyond the critical angle for total internal reflection, the Fresnel coefficients $t_s$ and $t_p$ are both zero.

R14 T or F: As light enters a crystal, the Poynting vector always obeys Snell's law.

R15 T or F: As light enters a crystal, the k-vector does not obey Snell's law for the extraordinary wave.
Problems

R16 (a) Write down Maxwell's equations.
(b) Derive the wave equation for $\mathbf{E}$ under the assumptions that $\mathbf{J}_{\rm free} = 0$ and $\mathbf{P} = \epsilon_0\chi\mathbf{E}$. Note: $\nabla\times(\nabla\times\mathbf{f}) = \nabla(\nabla\cdot\mathbf{f}) - \nabla^2\mathbf{f}$.
(c) Show by direct substitution that $\mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r}-\omega t)}$ is a solution to the wave equation. Find the resulting connection between $k$ and $\omega$. Give appropriate definitions for $c$ and $n$, assuming that $\chi$ is real.
(d) If $\mathbf{k} = k\hat{\mathbf{z}}$ and $\mathbf{E}_0 = E_0\hat{\mathbf{x}}$, find the associated B-field.
(e) The Poynting vector is $\mathbf{S} = \mathbf{E}\times\mathbf{B}/\mu_0$, where the fields are real. Derive an expression for $I \equiv \langle S\rangle_t$.

R17 A horizontal and a vertical polarizer are placed in series, and horizontally polarized light with Jones vector $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ enters the system.

Figure 5.9

(a) What is the Jones vector of the transmitted field?
(b) Now a polarizer at 45° is inserted between the other two polarizers. What is the Jones vector of the transmitted field? How does the final intensity compare to initial intensity?
(c) Now a quarter wave plate with a fast-axis angle of 45° is inserted between the two polarizers (instead of the polarizer of part (b)). What is the Jones vector of the transmitted field? How does the final intensity compare to initial intensity?

R18 (a) Find the Jones matrix for a half wave plate with its fast axis making an arbitrary angle $\theta$ with the x-axis.

HINT: Project an arbitrary polarization with $E_x$ and $E_y$ onto the fast and slow axes of the wave plate. Shift the slow axis phase by $\pi$, and then project the field components back onto the horizontal and vertical axes. The answer is

$$\begin{bmatrix} \cos^2\theta - \sin^2\theta & 2\sin\theta\cos\theta \\ 2\sin\theta\cos\theta & \sin^2\theta - \cos^2\theta \end{bmatrix}$$

(b) We desire to attenuate continuously a polarized laser beam using a half wave plate and a polarizer aligned to the initial polarization of the beam (see figure). The fast axis of the half wave plate is initially aligned in the direction of polarization and then rotated through an angle $\theta$. What is the ratio of the intensity exiting the polarizer to the incoming intensity as a function of $\theta$?
Figure 5.10 Polarizing Elements

R19 Consider an interface between two isotropic media where the incident field is defined by

$$\mathbf{E}_i = \left[E_i^{(p)}\left(\hat{\mathbf{y}}\cos\theta_i - \hat{\mathbf{z}}\sin\theta_i\right) + \hat{\mathbf{x}}E_i^{(s)}\right]e^{i\left[k_i(y\sin\theta_i + z\cos\theta_i) - \omega_i t\right]}$$

The plane of incidence is shown in Fig. 5.11.
(a) By inspection of the figure, write down similar expressions for the reflected and transmitted fields (i.e. $\mathbf{E}_r$ and $\mathbf{E}_t$).
(b) Find an expression relating $\mathbf{E}_i$, $\mathbf{E}_r$, and $\mathbf{E}_t$ using the boundary condition at the interface. From this expression obtain the law of reflection and Snell's law.
(c) The boundary condition requiring that the tangential component of $\mathbf{B}$ must be continuous leads to

$$n_i\left(E_i^{(p)} - E_r^{(p)}\right) = n_t E_t^{(p)}$$
$$n_i\left(E_i^{(s)} - E_r^{(s)}\right)\cos\theta_i = n_t E_t^{(s)}\cos\theta_t$$

Use this and the results from part (b) to derive

$$r_p \equiv \frac{E_r^{(p)}}{E_i^{(p)}} = -\frac{\tan(\theta_i - \theta_t)}{\tan(\theta_i + \theta_t)}$$

You may use the identity

$$\frac{\sin\theta_i\cos\theta_i - \sin\theta_t\cos\theta_t}{\sin\theta_i\cos\theta_i + \sin\theta_t\cos\theta_t} = \frac{\tan(\theta_i - \theta_t)}{\tan(\theta_i + \theta_t)}$$
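Figure 5.11

R20 The Fresnel equations are
$$r_s \equiv \frac{E_r^{(s)}}{E_i^{(s)}} = \frac{\sin\theta_t\cos\theta_i - \sin\theta_i\cos\theta_t}{\sin\theta_t\cos\theta_i + \sin\theta_i\cos\theta_t}$$
$$t_s \equiv \frac{E_t^{(s)}}{E_i^{(s)}} = \frac{2\sin\theta_t\cos\theta_i}{\sin\theta_t\cos\theta_i + \sin\theta_i\cos\theta_t}$$
$$r_p \equiv \frac{E_r^{(p)}}{E_i^{(p)}} = \frac{\cos\theta_t\sin\theta_t - \cos\theta_i\sin\theta_i}{\cos\theta_t\sin\theta_t + \cos\theta_i\sin\theta_i}$$
$$t_p \equiv \frac{E_t^{(p)}}{E_i^{(p)}} = \frac{2\cos\theta_i\sin\theta_t}{\cos\theta_t\sin\theta_t + \cos\theta_i\sin\theta_i}$$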
(a) Find what each of these equations reduces to when $\theta_i = 0$. Give your answer in terms of $n_i$ and $n_t$.
(b) What percent of light (intensity) reflects from a glass surface (n = 1.5) when light enters from air (n = 1) at normal incidence?
(c) What percent of light reflects from a glass surface when light exits into air at normal incidence?

R21 Light goes through a glass prism with optical index n = 1.55. The light enters at Brewster's angle and exits at normal incidence.

Figure 5.12

(a) Derive and calculate Brewster's angle $\theta_B$. You may use the results of R19 (c).
(b) Calculate $\phi$.
(c) What percent of the light (power) goes all the way through the prism if it is p-polarized? Ignore light that might make multiple reflections within the prism and come out with directions other than that shown by the arrow. You may use the Fresnel coefficients given in R20.
(d) What percent for s-polarized light?

R22 A 45°-90°-45° prism is a good device for reflecting a beam of light parallel to the initial beam. The exiting beam will be parallel to the entering beam even when the incoming beam is not normal to the front surface (although it needs to be in the plane of the drawing).
(a) How large an angle $\theta$ can be tolerated before there is no longer total internal reflection at both interior surfaces? Assume n = 1 outside of the prism and n = 1.5 inside.
Figure 5.13

(b) If the light enters and leaves the prism at normal incidence, what will the difference in phase be between the s- and p-polarizations? You may use the Fresnel coefficients given in R20.

R23 Second harmonic generation (the conversion of light with frequency $\omega$ into light with frequency $2\omega$) can occur when very intense laser light travels in a material. For good harmonic production, the laser light and the second harmonic light need to travel at the same speed in the material. In other words, both frequencies need to have the same index of refraction so that harmonic light produced downstream joins in phase with the harmonic light produced upstream, a condition referred to as phase matching. This ensures a coherent building of the second harmonic field rather than destructive cancellations. Unfortunately, the index of refraction is almost never the same for different frequencies in a given material, owing to dispersion. However, we can achieve phase matching in some crystals where one frequency propagates as an ordinary wave and the other propagates as an extraordinary wave. We cause the two indices to be precisely the same by tuning the angle of the crystal. Consider a ruby laser propagating and generating the second harmonic in a uniaxial KDP crystal (potassium dihydrogen phosphate). The indices of refraction are given by $n_o$ and
$$\frac{n_o n_e}{\sqrt{n_o^2\sin^2\theta + n_e^2\cos^2\theta}}$$
where $\theta$ is the angle made with the optic axis. At the frequency of a ruby laser, KDP has indices $n_o(\omega) = 1.505$ and $n_e(\omega) = 1.465$. At the frequency of the second harmonic, the indices are $n_o(2\omega) = 1.534$ and $n_e(2\omega) = 1.487$. Show that phase matching can be achieved if the laser is polarized so that it experiences only the ordinary index and the second harmonic light is polarized perpendicular to that. At what angle does this phase matching occur?
Selected Answers
R17: (b) 1/4, (c) 1/2. R20: (b) 4%, (c) 4%. R21: (b) 33°, (c) 95%, (d) 79%. R22: (a) 4.8°, (b) 74°. R23: 51.12°.
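The R23 answer can be checked numerically. The sketch below assumes the phase-matching condition is that the ordinary index at the laser frequency equals the angle-dependent extraordinary index at the second harmonic, and solves that condition for the angle in closed form. The function names are illustrative and not part of the text.

```python
import math

def extraordinary_index(theta, n_o, n_e):
    """Index seen by the extraordinary wave at angle theta (radians) from the optic axis."""
    return n_o * n_e / math.sqrt(n_o**2 * math.sin(theta)**2 + n_e**2 * math.cos(theta)**2)

def phase_match_angle(n_o_fund, n_o_sh, n_e_sh):
    """Angle (radians) where the second-harmonic extraordinary index equals n_o_fund."""
    # Square the index formula and solve the resulting linear equation for sin^2(theta).
    target_sq = (n_o_sh * n_e_sh / n_o_fund) ** 2
    sin_sq = (target_sq - n_e_sh**2) / (n_o_sh**2 - n_e_sh**2)
    return math.asin(math.sqrt(sin_sq))

# KDP indices quoted in R23: n_o(w) = 1.505, n_o(2w) = 1.534, n_e(2w) = 1.487
theta_pm = phase_match_angle(1.505, 1.534, 1.487)
print(f"phase-matching angle: {math.degrees(theta_pm):.2f} degrees")
```

A solution exists because 1.505 lies between the two second-harmonic indices, 1.487 and 1.534, so the extraordinary index sweeps through it as the angle varies.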
Chapter 6
Multiple Parallel Interfaces

6.1 Introduction
In chapter 3, we studied the transmission and reflection of light at a single interface between two isotropic and homogeneous materials with indices $n_0$ and $n_2$. We found that the fraction of light reflected and transmitted depends on the incident angle $\theta_0$ and on whether the light is s- or p-polarized. The connection between the reflected and transmitted fields and the incident field is given by the Fresnel coefficients (3.18)-(3.21). The fraction of the incident power going into the reflected or transmitted beams is given by either $R_s$ and $T_s$ or $R_p$ and $T_p$, depending on the polarization of the incident light (see (3.22) and (3.25)).

In this chapter we consider the overall transmission and reflection through two parallel interfaces, where a layer of a third material is inserted between the initial and final materials. This situation occurs frequently in optics. For example, lenses are often coated with a thin layer of material in an effort to reduce reflections. A metal mirror usually has a thin oxide layer or a protective coating between the metal and the air.

Section 6.2 introduces the general formalism for the double boundary problem. In section 6.3 the results are manipulated into an easier-to-interpret form, valid as long as the critical angle for total internal reflection is not exceeded at the first interface. In section 6.4 we examine the tunneling of evanescent waves across a gap between two parallel surfaces when the critical angle for total internal reflection is exceeded.

The formalism we develop for the double-boundary problem is useful for describing a simple instrument called a Fabry-Perot etalon (or interferometer if the instrument has the capability of variable spacing between the two surfaces). The Fabry-Perot etalon, which is useful for distinguishing closely spaced wavelengths, is constructed from two partially reflective surfaces separated by a fixed distance.

Beginning in section 6.8, we study multilayer coatings, where an arbitrary number of interfaces exist between many material layers. Multilayers are often used to make highly reflective mirror coatings from dielectric materials (as opposed to metallic materials). Such mirror coatings can reflect with efficiencies greater than 99.9% at certain wavelengths. In contrast, metallic mirrors typically reflect with about 96% efficiency, which can be a significant loss if there are many mirrors in an optical system. Dielectric multilayer coatings also have the advantage of being more durable and harder to damage with high-intensity lasers.
6.2 Double Boundary Problem Solved Using Fresnel Coefficients

Consider a slab of material sandwiched between two other materials as depicted in Fig. 6.1. Because there are multiple reflections inside the middle layer, we have dropped the subscripts i, r, and t used in chapter 3 and instead use the symbols $\rightarrow$ and $\leftarrow$ to indicate forward and backward traveling waves, respectively. Let $n_1$ stand for the refractive index of the middle layer. In preparation for our treatment of many-layer systems, we use $n_0$ and $n_2$ to represent the indices of the other two regions. For simplicity, we assume that the indices are real. As with the single-boundary problem, we are interested in finding the transmitted fields $E_{\rightarrow 2}^{(s)}$ and $E_{\rightarrow 2}^{(p)}$ in terms of the incident fields $E_{\rightarrow 0}^{(s)}$ and $E_{\rightarrow 0}^{(p)}$. Similarly, we can also find the reflected fields $E_{\leftarrow 0}^{(s)}$ and $E_{\leftarrow 0}^{(p)}$ in terms of the incident fields $E_{\rightarrow 0}^{(s)}$ and $E_{\rightarrow 0}^{(p)}$.

Figure 6.1 Waves propagating through a dual interface between materials.

Both forward and backward-traveling plane waves exist in the middle material. Our intuition rightly tells us that in this region there are many reflections, bouncing both forward and backwards between the two surfaces. It might therefore seem that there should be an infinite number of fields represented, each corresponding to a different bounce. Fortunately, the forward-traveling plane waves arising from the many bounces in the middle layer all travel in the same direction. Similarly, the backwards-traveling plane waves arising from the many bounces travel in a single direction. Hence, these many fields join neatly into a net forward-moving and a net backwards-moving plane wave field.

As of yet, we do not know the amplitudes and phases of the two resulting plane waves in the middle layer, but we can denote them by $E_{\rightarrow 1}^{(s)}$ and $E_{\leftarrow 1}^{(s)}$ or by $E_{\rightarrow 1}^{(p)}$ and $E_{\leftarrow 1}^{(p)}$, separated into their s- or p-components, as usual. Similarly, $E_{\leftarrow 0}^{(s)}$ and $E_{\leftarrow 0}^{(p)}$ as well as $E_{\rightarrow 2}^{(s)}$ and $E_{\rightarrow 2}^{(p)}$ are understood to include all fields which leak through the surfaces on each of the repeated bounces. All of these are included in the overall reflection and transmission of the fields. Thus, we need not concern ourselves with the infinite number of plane wave fields arising from the many bounces; we need only consider the five plane waves depicted in Fig. 6.1.

The fields at the boundaries are connected via the Fresnel coefficients (3.18)-(3.21), which are direct consequences of Maxwell's equations. At the first surface we define
$$\begin{aligned}
r_s^{0\to1} &\equiv \frac{\sin\theta_1\cos\theta_0 - \sin\theta_0\cos\theta_1}{\sin\theta_1\cos\theta_0 + \sin\theta_0\cos\theta_1}, &
t_s^{0\to1} &\equiv \frac{2\sin\theta_1\cos\theta_0}{\sin\theta_1\cos\theta_0 + \sin\theta_0\cos\theta_1}, \\
r_p^{0\to1} &\equiv \frac{\cos\theta_1\sin\theta_1 - \cos\theta_0\sin\theta_0}{\cos\theta_1\sin\theta_1 + \cos\theta_0\sin\theta_0}, &
t_p^{0\to1} &\equiv \frac{2\cos\theta_0\sin\theta_1}{\cos\theta_1\sin\theta_1 + \cos\theta_0\sin\theta_0}.
\end{aligned} \tag{6.1}$$
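The notation $0\to1$ indicates the first surface from the perspective of starting on the incident side and propagating towards the middle layer. The coefficients (6.1) are written as though the problem involves only a single interface. They do not take into account any feedback from the second surface. Similarly, the single-boundary Fresnel coefficients for light approaching the first interface from within the middle layer are
$$\begin{aligned}
r_s^{1\to0} &= -r_s^{0\to1}, &
t_s^{1\to0} &\equiv \frac{2\sin\theta_0\cos\theta_1}{\sin\theta_0\cos\theta_1 + \sin\theta_1\cos\theta_0}, \\
r_p^{1\to0} &= -r_p^{0\to1}, &
t_p^{1\to0} &\equiv \frac{2\cos\theta_1\sin\theta_0}{\cos\theta_0\sin\theta_0 + \cos\theta_1\sin\theta_1}.
\end{aligned} \tag{6.2}$$
The notation $1\to0$ indicates connections at the first interface, but from the perspective of beginning inside the middle layer. Finally, the single-boundary coefficients for light approaching the second interface are
$$\begin{aligned}
r_s^{1\to2} &\equiv \frac{\sin\theta_2\cos\theta_1 - \sin\theta_1\cos\theta_2}{\sin\theta_2\cos\theta_1 + \sin\theta_1\cos\theta_2}, &
t_s^{1\to2} &\equiv \frac{2\sin\theta_2\cos\theta_1}{\sin\theta_2\cos\theta_1 + \sin\theta_1\cos\theta_2}, \\
r_p^{1\to2} &\equiv \frac{\cos\theta_2\sin\theta_2 - \cos\theta_1\sin\theta_1}{\cos\theta_2\sin\theta_2 + \cos\theta_1\sin\theta_1}, &
t_p^{1\to2} &\equiv \frac{2\cos\theta_1\sin\theta_2}{\cos\theta_2\sin\theta_2 + \cos\theta_1\sin\theta_1}.
\end{aligned} \tag{6.3}$$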
The notation $1\to2$ indicates connections made at the second interface from the perspective of beginning in the middle layer.

Our task is to connect the five plane waves depicted in Fig. 6.1 using the various Fresnel coefficients (6.1)-(6.3). For simplicity, we will consider s-polarized light, but the analysis can be extended to p-polarized light simply by changing the subscripts in the derivation. We begin at the second interface, which looks like a single-boundary problem (i.e. only one plane wave on the transmitted side). The field $E_{\rightarrow 1}^{(s)}$ represents the forward-traveling field of the middle region evaluated at the origin $(y, z) = (0, 0)$, which we arbitrarily define to be located at the first interface. At the second interface, the forward-traveling wave is given by $E_{\rightarrow 1}^{(s)} e^{i\mathbf{k}_1\cdot\mathbf{r}}$, where $\mathbf{r} = \hat{\mathbf{z}}d$ and $\mathbf{k}_1 = k_1(\hat{\mathbf{y}}\sin\theta_1 + \hat{\mathbf{z}}\cos\theta_1)$. The transmitted field in the third medium is related to the forward-traveling field of the middle region via
$$E_{\rightarrow 2}^{(s)} = t_s^{1\to2}\, E_{\rightarrow 1}^{(s)}\, e^{i k_1 d\cos\theta_1} \tag{6.4}$$
where we have adjusted the phase of the field in (6.4) by $\mathbf{k}_1\cdot\mathbf{r} = k_1 d\cos\theta_1$. Keep in mind that (6.4) represents the connection made at the point $(y, z) = (0, d)$ on the second interface. In the case of the transmitted field, we let $E_{\rightarrow 2}^{(s)}$ stand for the transmitted field at the point $(y, z) = (0, d)$; its phase is built into its definition. The factor $t_s^{1\to2}$ is the single-boundary Fresnel transmission coefficient at the interface (6.3), and we have used it in a manner consistent with our previous analysis in chapter 3.

We have written (6.4) for s-polarized light. The equation looks the same for p-polarized light; just replace the subscript s with p. Through the remainder of this section and the next, we will continue to economize by writing the equations only for s-polarized light with the understanding that they apply equally well to p-polarized light.

The backward-traveling plane wave in the middle region arises from the reflection of the forward-traveling plane wave in that same region. In this case, the connection using the appropriate Fresnel coefficient gives
$$E_{\leftarrow 1}^{(s)}\, e^{-i k_1 d\cos\theta_1} = r_s^{1\to2}\, E_{\rightarrow 1}^{(s)}\, e^{i k_1 d\cos\theta_1} \tag{6.5}$$
Here again we have chosen to let $E_{\leftarrow 1}^{(s)}$ represent a plane wave field referenced to the origin $(y, z) = (0, 0)$. Therefore, the factor $e^{-i k_1 d\cos\theta_1}$ is needed at $(y, z) = (0, d)$ (i.e. $\mathbf{r} = \hat{\mathbf{z}}d$) since the k-vector for the reverse-traveling field in the middle region is $\mathbf{k}_1 = k_1(\hat{\mathbf{y}}\sin\theta_1 - \hat{\mathbf{z}}\cos\theta_1)$.

We next connect the two plane waves in the middle region with the incident plane wave. In this case we must simultaneously connect $E_{\rightarrow 1}^{(s)}$ with both $E_{\rightarrow 0}^{(s)}$ and $E_{\leftarrow 1}^{(s)}$ since they each give a contribution:
$$E_{\rightarrow 1}^{(s)} = t_s^{0\to1}\, E_{\rightarrow 0}^{(s)} + r_s^{1\to0}\, E_{\leftarrow 1}^{(s)} \tag{6.6}$$
Since all fields in (6.6) are evaluated at the origin $(y, z) = (0, 0)$, there is no need for any phase factors like those in (6.4) or (6.5). The relation (6.6) shows that the forward-traveling wave in the middle region arises from both a transmission of the incident wave and a reflection of the backwards-traveling wave in the middle region. (We could also write an expression involving the overall reflected field $E_{\leftarrow 0}^{(s)}$, but we refrain.) In summary, we have used the single-boundary Fresnel coefficients to construct the necessary connections in the double-boundary problem.

We next solve (6.4)-(6.6) to find the final transmitted field in terms of the incident field. We do this by eliminating $E_{\rightarrow 1}^{(s)}$ and $E_{\leftarrow 1}^{(s)}$ from the expressions. Equation (6.4) can be inverted as follows:
$$E_{\rightarrow 1}^{(s)} = \frac{E_{\rightarrow 2}^{(s)}}{t_s^{1\to2}\, e^{i k_1 d\cos\theta_1}} \tag{6.7}$$
When this is substituted into (6.5), we obtain
$$E_{\leftarrow 1}^{(s)} = \frac{r_s^{1\to2}\, e^{i k_1 d\cos\theta_1}}{t_s^{1\to2}}\, E_{\rightarrow 2}^{(s)} \tag{6.8}$$
Substitution of (6.7) and (6.8) into (6.6) yields
$$\frac{E_{\rightarrow 2}^{(s)}}{t_s^{1\to2}\, e^{i k_1 d\cos\theta_1}} = t_s^{0\to1}\, E_{\rightarrow 0}^{(s)} + r_s^{1\to0}\, \frac{r_s^{1\to2}\, e^{i k_1 d\cos\theta_1}}{t_s^{1\to2}}\, E_{\rightarrow 2}^{(s)} \tag{6.9}$$
This can be simplified to
$$\frac{E_{\rightarrow 2}^{(s)}}{E_{\rightarrow 0}^{(s)}} = \frac{t_s^{0\to1}\, t_s^{1\to2}}{e^{-i k_1 d\cos\theta_1} - r_s^{1\to0}\, r_s^{1\to2}\, e^{i k_1 d\cos\theta_1}} \tag{6.10}$$
where the factor
$$k_1 d\cos\theta_1 = \frac{2\pi n_1 d}{\lambda_{\rm vac}}\cos\theta_1 \tag{6.11}$$
represents the phase acquired by either plane wave in traversing the middle region (see (2.25) and (2.27)). Actually, we are mainly interested in the fraction of the power that emerges through the final surface. As in (3.29), the fraction of power transmitted is given by
$$T_s^{\rm tot} \equiv \frac{n_2\cos\theta_2}{n_0\cos\theta_0} \left| \frac{E_{\rightarrow 2}^{(s)}}{E_{\rightarrow 0}^{(s)}} \right|^2 \qquad (\theta_2\ \text{real}). \tag{6.12}$$
Of course the relationship
$$T_s^{\rm tot} + R_s^{\rm tot} = 1 \tag{6.13}$$
still applies, but it is convenient for us to compute $T_s^{\rm tot}$ directly through (6.12) instead of indirectly from $R_s^{\rm tot}$. When the transmitted angle $\theta_2$ is real, we may write the fraction of the transmitted power as
$$T_s^{\rm tot} = \frac{n_2\cos\theta_2}{n_0\cos\theta_0}\, \frac{\left| t_s^{0\to1} \right|^2 \left| t_s^{1\to2} \right|^2}{\left| e^{-i k_1 d\cos\theta_1} - r_s^{1\to0}\, r_s^{1\to2}\, e^{i k_1 d\cos\theta_1} \right|^2} \qquad (\theta_2\ \text{real}) \tag{6.14}$$
in accordance with (6.10) and (6.12). As was mentioned, (6.14) applies equally well to p-polarized light (just change the subscripts). Equation (6.14) remains valid even if the angle $\theta_1$ is complex. Thus, it can be applied to the case of evanescent waves tunneling through a gap where $\theta_0$ is beyond the critical angle for total internal reflection from the middle layer. This will be studied further in section 6.4. Note that even if $\theta_1$ is complex, the angle $\theta_2$ is still real if the critical angle in the absence of the middle layer is not exceeded.
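As a rough numerical illustration of (6.14), the sketch below evaluates the s-polarized transmittance of a single layer. It makes two choices that are assumptions of this example rather than part of the text: the Fresnel coefficients are written in the equivalent index form (obtained from (6.1)-(6.3) with Snell's law) so that normal incidence does not produce 0/0, and the cosine of a possibly complex angle is taken as the principal square root, which reproduces the positive-imaginary branch used for evanescent waves. All function names are hypothetical.

```python
import cmath
import math

def cos_in_medium(n0, theta0, n_target):
    """Cosine of the (possibly complex) propagation angle in a medium, via Snell's law."""
    s = n0 * math.sin(theta0) / n_target
    return cmath.sqrt(1 - s * s)  # equals i*sqrt(s^2 - 1) beyond the critical angle

def fresnel_s(n_a, cos_a, n_b, cos_b):
    """Single-boundary (r_s, t_s) going from medium a into medium b, index form."""
    denom = n_a * cos_a + n_b * cos_b
    return (n_a * cos_a - n_b * cos_b) / denom, 2 * n_a * cos_a / denom

def double_boundary_transmittance_s(n0, n1, n2, theta0, d, lam_vac):
    """Total s-polarized power transmittance through two interfaces, Eq. (6.14)."""
    cos0 = math.cos(theta0)
    cos1 = cos_in_medium(n0, theta0, n1)
    cos2 = cos_in_medium(n0, theta0, n2)
    if abs(cos2.imag) > 1e-12:
        raise ValueError("theta_2 must be real for Eq. (6.14)")
    _, t01 = fresnel_s(n0, cos0, n1, cos1)
    r10, _ = fresnel_s(n1, cos1, n0, cos0)
    r12, t12 = fresnel_s(n1, cos1, n2, cos2)
    phase = 2 * math.pi * n1 * d * cos1 / lam_vac  # k1 d cos(theta1), Eq. (6.11)
    denom = cmath.exp(-1j * phase) - r10 * r12 * cmath.exp(1j * phase)
    return (n2 * cos2.real) / (n0 * cos0) * abs(t01 * t12) ** 2 / abs(denom) ** 2

# Air / MgF2-like layer (n = 1.38) / glass (n = 1.5) at normal incidence
lam = 633e-9
bare = double_boundary_transmittance_s(1.0, 1.38, 1.5, 0.0, 0.0, lam)
coated = double_boundary_transmittance_s(1.0, 1.38, 1.5, 0.0, lam / (4 * 1.38), lam)
print(f"no layer: {bare:.4f}, quarter-wave layer: {coated:.4f}")
```

With zero layer thickness the result reduces to the single air-glass interface, and a quarter-wave layer raises the transmittance, which is the antireflection behavior mentioned in the introduction.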
6.3 Double Boundary Problem at Sub Critical Angles

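In the case that $\theta_1$ is real (or in other words when $\cos\theta_1$ is real so that no evanescent wave is to be considered), we may simplify (6.14) as follows:
$$\begin{aligned}
T_s^{\rm tot} &= \frac{n_2\cos\theta_2}{n_0\cos\theta_0}\, \frac{\left| t_s^{0\to1} \right|^2 \left| t_s^{1\to2} \right|^2}{\left( e^{-i k_1 d\cos\theta_1} - r_s^{1\to0} r_s^{1\to2} e^{i k_1 d\cos\theta_1} \right)\left( e^{i k_1 d\cos\theta_1} - \left( r_s^{1\to0} \right)^* \left( r_s^{1\to2} \right)^* e^{-i k_1 d\cos\theta_1} \right)} \\
&= \frac{n_2\cos\theta_2}{n_0\cos\theta_0}\, \frac{\left| t_s^{0\to1} \right|^2 \left| t_s^{1\to2} \right|^2}{1 + \left| r_s^{1\to0} \right|^2 \left| r_s^{1\to2} \right|^2 - 2\,{\rm Re}\left\{ r_s^{1\to0} r_s^{1\to2} e^{2 i k_1 d\cos\theta_1} \right\}} \\
&= \frac{n_2\cos\theta_2}{n_0\cos\theta_0}\, \frac{\left| t_s^{0\to1} \right|^2 \left| t_s^{1\to2} \right|^2}{1 + \left| r_s^{1\to0} \right|^2 \left| r_s^{1\to2} \right|^2 - 2\,{\rm Re}\left\{ \left| r_s^{1\to0} \right| \left| r_s^{1\to2} \right| e^{i\delta_{r_s^{1\to0}}} e^{i\delta_{r_s^{1\to2}}} e^{2 i k_1 d\cos\theta_1} \right\}} \\
&= \frac{n_2\cos\theta_2}{n_0\cos\theta_0}\, \frac{\left| t_s^{0\to1} \right|^2 \left| t_s^{1\to2} \right|^2}{1 + \left| r_s^{1\to0} \right|^2 \left| r_s^{1\to2} \right|^2 - 2 \left| r_s^{1\to0} \right| \left| r_s^{1\to2} \right| \cos\left( \delta + \delta_{r_s} \right)} \qquad (\theta_2\ \text{and}\ \theta_1\ \text{real})
\end{aligned} \tag{6.15}$$
On the last line we have introduced the definitions
$$\delta \equiv 2 k_1 d\cos\theta_1 \tag{6.16}$$
and
$$\delta_{r_s} \equiv \delta_{r_s^{1\to0}} + \delta_{r_s^{1\to2}} \tag{6.17}$$
The phase terms $\delta_{r_s^{1\to0}}$ and $\delta_{r_s^{1\to2}}$ are defined indirectly and may be extracted from the relationships
$$r_s^{1\to0} = \left| r_s^{1\to0} \right| e^{i\delta_{r_s^{1\to0}}} \tag{6.18}$$
$$r_s^{1\to2} = \left| r_s^{1\to2} \right| e^{i\delta_{r_s^{1\to2}}} \tag{6.19}$$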
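We can continue our simplification of (6.15) by using the following identity:
$$\cos\Phi = 1 - 2\sin^2\frac{\Phi}{2} \tag{6.20}$$
where $\Phi \equiv \delta + \delta_{r_s}$. With this, (6.15) can be written as
$$\begin{aligned}
T_s^{\rm tot} &= \frac{n_2\cos\theta_2}{n_0\cos\theta_0}\, \frac{\left| t_s^{0\to1} \right|^2 \left| t_s^{1\to2} \right|^2}{1 + \left| r_s^{1\to0} \right|^2 \left| r_s^{1\to2} \right|^2 - 2 \left| r_s^{1\to0} \right| \left| r_s^{1\to2} \right| \left( 1 - 2\sin^2\frac{\Phi}{2} \right)} \\
&= \frac{n_2\cos\theta_2}{n_0\cos\theta_0}\, \frac{\left| t_s^{0\to1} \right|^2 \left| t_s^{1\to2} \right|^2}{\left( 1 - \left| r_s^{1\to0} \right| \left| r_s^{1\to2} \right| \right)^2}\, \frac{1}{1 + \dfrac{4 \left| r_s^{1\to0} \right| \left| r_s^{1\to2} \right|}{\left( 1 - \left| r_s^{1\to0} \right| \left| r_s^{1\to2} \right| \right)^2} \sin^2\dfrac{\Phi}{2}} \\
&= \frac{T_s^{\max}}{1 + F_s \sin^2\frac{\Phi}{2}} \qquad (\theta_2\ \text{and}\ \theta_1\ \text{real})
\end{aligned} \tag{6.21}$$
where
$$T_s^{\max} \equiv \frac{n_2\cos\theta_2}{n_0\cos\theta_0}\, \frac{\left| t_s^{0\to1} \right|^2 \left| t_s^{1\to2} \right|^2}{\left( 1 - \left| r_s^{1\to0} \right| \left| r_s^{1\to2} \right| \right)^2} \tag{6.22}$$
$$F_s \equiv \frac{4 \left| r_s^{1\to0} \right| \left| r_s^{1\to2} \right|}{\left( 1 - \left| r_s^{1\to0} \right| \left| r_s^{1\to2} \right| \right)^2} \tag{6.23}$$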
The quantity $T_s^{\max}$ is the maximum possible transmittance of power through the surfaces, and $F_s$ is called the coefficient of finesse (not to be confused with reflecting finesse discussed in section 6.7), which determines how strongly the transmittance is influenced by varying the spacing $d$ or the wavelength $\lambda_{\rm vac}$ (causing $\Phi$ to vary). The maximum transmittance $T_s^{\max}$ can be manipulated as follows:
$$T_s^{\max} = \frac{\left[ \dfrac{n_1\cos\theta_1}{n_0\cos\theta_0} \left| t_s^{0\to1} \right|^2 \right] \left[ \dfrac{n_2\cos\theta_2}{n_1\cos\theta_1} \left| t_s^{1\to2} \right|^2 \right]}{\left( 1 - \left| r_s^{1\to0} \right| \left| r_s^{1\to2} \right| \right)^2} = \frac{T_s^{0\to1}\, T_s^{1\to2}}{\left( 1 - \sqrt{R_s^{1\to0} R_s^{1\to2}} \right)^2} \tag{6.24}$$
where we have introduced the familiar single-boundary reflectance and transmittance of the power at each of the interfaces. Similarly, we can simplify the expression for the finesse coefficient:
$$F_s = \frac{4\sqrt{R_s^{1\to0} R_s^{1\to2}}}{\left( 1 - \sqrt{R_s^{1\to0} R_s^{1\to2}} \right)^2} \tag{6.25}$$
Please note that $R_s^{1\to0} = R_s^{0\to1}$, as verified from (6.2). Again, although the above equations have been written expressly for s-polarized light, they can be used for p-polarized light by changing all subscripts to p.
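The forms (6.21), (6.24), and (6.25) are easy to evaluate once the single-boundary reflectances are known. The sketch below is a minimal illustration under assumed conditions that are not from the text: normal incidence, lossless interfaces (so each single-boundary transmittance is one minus the reflectance), and an air / n = 1.38 / n = 1.5 stack. The function names are hypothetical.

```python
import math

def airy_parameters(T01, T12, R10, R12):
    """Return (T_max, F) from single-boundary power coefficients, Eqs. (6.24)-(6.25)."""
    root = math.sqrt(R10 * R12)
    return T01 * T12 / (1 - root) ** 2, 4 * root / (1 - root) ** 2

def total_transmittance(T_max, F, Phi):
    """Double-boundary transmittance, last line of Eq. (6.21)."""
    return T_max / (1 + F * math.sin(Phi / 2) ** 2)

# Normal-incidence reflectances for air (1.0) / layer (1.38) / glass (1.5)
R10 = ((1.38 - 1.0) / (1.38 + 1.0)) ** 2   # same as R for 0 -> 1
R12 = ((1.38 - 1.5) / (1.38 + 1.5)) ** 2
T_max, F = airy_parameters(1 - R10, 1 - R12, R10, R12)

# Here r(1->0) is positive and r(1->2) is negative, so delta_r = 0 + pi.
# A vanishing layer (delta = 0) gives Phi = pi; a quarter-wave layer gives Phi = 2*pi.
T_no_layer = total_transmittance(T_max, F, math.pi)
T_quarter_wave = total_transmittance(T_max, F, 2 * math.pi)
print(f"T_max = {T_max:.4f}, F = {F:.4f}")
print(f"no layer: {T_no_layer:.4f}, quarter-wave layer: {T_quarter_wave:.4f}")
```

Because F is small for weakly reflecting surfaces, the transmittance changes only by a few percent as the thickness varies; a large F, discussed later for the Fabry-Perot instrument, makes the dependence on the phase much sharper.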
Example 6.1
You desire to make a beam splitter for s-polarized light as shown in Fig. 6.2 by coating a piece of glass (n = 1.5) with a thin film of zinc sulfide (n = 2.32). The idea is to get about half of the light to reflect from the front of the glass. An antireflection coating is applied to the back surface of the glass. The light is incident at 45° as shown in Fig. 6.2. Find the highest transmittance possible through an antireflection film of magnesium fluoride (n = 1.38) at the back surface of the beam splitter. Find the smallest possible $d_2$ that accomplishes this for light with wavelength $\lambda_{\rm vac} = 633$ nm. (In P 6.3 you will consider the reflection from the front coating.) NOTE: Since antireflection films are usually imperfect, beam splitter substrates are often slightly wedged so that unwanted reflections from the second surface exit in a different direction.

Figure 6.2

Solution: We have
$$n_0 = 1.5, \qquad n_1 = 1.38, \qquad n_2 = 1, \qquad \theta_2 = 45^\circ$$
$$n_1\sin\theta_1 = \sin\theta_2 \;\Rightarrow\; \theta_1 = \sin^{-1}\left( \frac{\sin 45^\circ}{1.38} \right) = 30.82^\circ$$
$$r_s^{1\to2} = -\frac{\sin(\theta_1 - \theta_2)}{\sin(\theta_1 + \theta_2)} = -\frac{\sin(30.82^\circ - 45^\circ)}{\sin(30.82^\circ + 45^\circ)} = 0.253$$
$$n_0\sin\theta_0 = \sin\theta_2 \;\Rightarrow\; \theta_0 = \sin^{-1}\left( \frac{\sin 45^\circ}{1.5} \right) = 28.13^\circ$$
$$r_s^{1\to0} = -\frac{\sin(\theta_1 - \theta_0)}{\sin(\theta_1 + \theta_0)} = -\frac{\sin(30.82^\circ - 28.13^\circ)}{\sin(30.82^\circ + 28.13^\circ)} = -0.0549$$
$$R_s^{1\to0} = \left| r_s^{1\to0} \right|^2 = |-0.0549|^2 = 0.0030, \qquad R_s^{1\to2} = \left| r_s^{1\to2} \right|^2 = |0.253|^2 = 0.0640$$
$$T_s^{0\to1} = 1 - R_s^{0\to1} = 1 - 0.0030 = 0.997, \qquad T_s^{1\to2} = 1 - R_s^{1\to2} = 1 - 0.0640 = 0.936$$
$$\delta_{r_s} = \delta_{r_s^{1\to0}} + \delta_{r_s^{1\to2}} = \pi + 0 = \pi$$
$$F = \frac{4\sqrt{R_s^{1\to0} R_s^{1\to2}}}{\left( 1 - \sqrt{R_s^{1\to0} R_s^{1\to2}} \right)^2} = \frac{4\sqrt{(0.0030)(0.0640)}}{\left( 1 - \sqrt{(0.0030)(0.0640)} \right)^2} = 0.0570$$
$$T_s^{\max} = \frac{T_s^{0\to1}\, T_s^{1\to2}}{\left( 1 - \sqrt{R_s^{1\to0} R_s^{1\to2}} \right)^2} = \frac{(0.997)(0.936)}{\left( 1 - \sqrt{(0.0030)(0.0640)} \right)^2} = 0.960$$
$$T_s^{\rm tot} = \frac{0.960}{1 + 0.0570\,\sin^2\left( \dfrac{\delta + \pi}{2} \right)}$$
The maximum transmittance occurs when $\sin^2\left( \frac{\delta + \pi}{2} \right) = 0$. In that case, $T^{\rm tot} = 0.960$, meaning that 96% of the light is transmitted. The smallest spacing that accomplishes this satisfies
$$\delta + \pi = 2 k_1 d_2\cos\theta_1 + \pi = 2\pi \;\Rightarrow\; d_2 = \frac{\lambda_{\rm vac}}{4 n_1\cos\theta_1} = \frac{633\ \text{nm}}{4\,(1.38)\cos 30.82^\circ} = 134\ \text{nm}$$
Without the coating (i.e. $d_2 = 0$), the transmittance through the back surface would be 0.908, so the coating does give an improvement.
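The numbers in Example 6.1 can be reproduced with a few lines of arithmetic. The sketch below follows the same steps as the solution; the variable names are chosen for this illustration, and unrounded intermediate values are carried through, so the last digit can differ slightly from the rounded figures quoted above.

```python
import math

n0, n1, n2 = 1.5, 1.38, 1.0          # glass, MgF2 film, air
theta2 = math.radians(45)            # exit angle in air
lam_vac = 633e-9                     # vacuum wavelength in meters

# Snell's law gives the angles inside the film and inside the glass
theta1 = math.asin(n2 * math.sin(theta2) / n1)
theta0 = math.asin(n2 * math.sin(theta2) / n0)

# Single-boundary s-polarized reflection coefficients seen from inside the film
r12 = -math.sin(theta1 - theta2) / math.sin(theta1 + theta2)
r10 = -math.sin(theta1 - theta0) / math.sin(theta1 + theta0)
R12, R10 = r12**2, r10**2

root = math.sqrt(R10 * R12)
F = 4 * root / (1 - root) ** 2                      # Eq. (6.25)
T_max = (1 - R10) * (1 - R12) / (1 - root) ** 2     # Eq. (6.24), lossless interfaces

# r10 < 0 and r12 > 0, so delta_r = pi; the peak needs delta = pi
d2 = lam_vac / (4 * n1 * math.cos(theta1))
T_uncoated = T_max / (1 + F)                        # d2 = 0 gives Phi = pi

print(f"theta1 = {math.degrees(theta1):.2f} deg, theta0 = {math.degrees(theta0):.2f} deg")
print(f"F = {F:.4f}, T_max = {T_max:.3f}, d2 = {d2 * 1e9:.0f} nm, uncoated T = {T_uncoated:.3f}")
```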
6.4 Beyond Critical Angle: Tunneling of Evanescent Waves

The formula (6.14) for the transmittance holds, even if the middle angle $\theta_1$ doesn't exist in a physical sense (i.e. if it is complex). We can use (6.14) to describe frustrated total internal reflection where $\theta_0$ and $\theta_2$ exceed the critical angle. In this case an evanescent wave occurs in the middle region. If the second surface is brought close to the first and the spacing between the two surfaces is small enough, the evanescent wave stimulates the second surface and a transmitted wave results. It is often inconvenient to deal with a complex angle $\theta_1$ when calculating the single-boundary Fresnel coefficients, so we rewrite $\sin\theta_1$ using Snell's law:
$$\sin\theta_1 = \frac{n_0}{n_1}\sin\theta_0 = \frac{n_2}{n_1}\sin\theta_2 \tag{6.26}$$
and $\cos\theta_1$ as
$$\cos\theta_1 = i\sqrt{\sin^2\theta_1 - 1} \tag{6.27}$$
Note that beyond the critical angle, $\sin\theta_1$ is greater than one.
Example 6.2
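Calculate the transmittance of p-polarized light through the region between two closely spaced 45° right prisms as a function of the vacuum wavelength $\lambda_{\rm vac}$ and the prism spacing $d$, as shown in Fig. 6.3 (see P 6.4 for the s-polarized case). Take the index of refraction of the prisms to be n = 1.5, surrounded by index n = 1, and use $\theta_0 = \theta_2 = 45^\circ$. Neglect possible reflections from the exterior surfaces of the prisms.

Figure 6.3 Frustrated total internal reflection in two prisms.

Solution: First we must compute the Fresnel coefficients appearing in (6.14). From (6.1)-(6.3) we compute the various necessary Fresnel coefficients, using (6.26) and (6.27) to handle the complex angles:
$$\left| t_p^{0\to1} \right|^2 = \left| \frac{2\cos\theta_0\sin\theta_1}{\cos\theta_1\sin\theta_1 + \cos\theta_0\sin\theta_0} \right|^2 = \left| \frac{2\cos\theta_0\,(n\sin\theta_0)}{i\sqrt{n^2\sin^2\theta_0 - 1}\,(n\sin\theta_0) + \cos\theta_0\sin\theta_0} \right|^2 = 5.76 \tag{6.28}$$
$$\left| t_p^{1\to2} \right|^2 = \left| \frac{2\cos\theta_1\sin\theta_2}{\cos\theta_2\sin\theta_2 + \cos\theta_1\sin\theta_1} \right|^2 = \left| \frac{2 i\sqrt{n^2\sin^2\theta_2 - 1}\,\sin\theta_2}{\cos\theta_2\sin\theta_2 + i\sqrt{n^2\sin^2\theta_0 - 1}\,(n\sin\theta_0)} \right|^2 = 0.64 \tag{6.29}$$
$$r_p^{1\to2} = r_p^{1\to0} = -r_p^{0\to1} = -\frac{\cos\theta_1\sin\theta_1 - \cos\theta_0\sin\theta_0}{\cos\theta_1\sin\theta_1 + \cos\theta_0\sin\theta_0} = -\frac{i\sqrt{n^2\sin^2\theta_0 - 1}\,(n\sin\theta_0) - \cos\theta_0\sin\theta_0}{i\sqrt{n^2\sin^2\theta_0 - 1}\,(n\sin\theta_0) + \cos\theta_0\sin\theta_0} = e^{-i 1.287} \tag{6.30}$$
We also need
$$k_1 d\cos\theta_1 = \frac{2\pi}{\lambda_{\rm vac}}\, d\cos\theta_1 = 2\pi i\sqrt{n^2\sin^2\theta_0 - 1}\,\frac{d}{\lambda_{\rm vac}} = i\,2.22\,\frac{d}{\lambda_{\rm vac}} \tag{6.31}$$
Now we are ready to compute the net transmittance (6.14). Since $\theta_0 = \theta_2$ and $n_0 = n_2$, we have
$$\begin{aligned}
T_p^{\rm tot} &= \frac{\left| t_p^{0\to1} \right|^2 \left| t_p^{1\to2} \right|^2}{\left| e^{-i k_1 d\cos\theta_1} - r_p^{1\to0}\, r_p^{1\to2}\, e^{i k_1 d\cos\theta_1} \right|^2} \\
&= \frac{(5.76)(0.64)}{\left| e^{2.22\, d/\lambda_{\rm vac}} - e^{-i 1.287} e^{-i 1.287} e^{-2.22\, d/\lambda_{\rm vac}} \right|^2} \\
&= \frac{3.69}{\left( e^{2.22\, d/\lambda_{\rm vac}} - e^{-2.22\, d/\lambda_{\rm vac} - i 2.574} \right)\left( e^{2.22\, d/\lambda_{\rm vac}} - e^{-2.22\, d/\lambda_{\rm vac} + i 2.574} \right)} \\
&= \frac{3.69}{e^{4.44\, d/\lambda_{\rm vac}} + e^{-4.44\, d/\lambda_{\rm vac}} - \left( e^{i 2.574} + e^{-i 2.574} \right)} \\
&= \frac{3.69}{e^{4.44\, d/\lambda_{\rm vac}} + e^{-4.44\, d/\lambda_{\rm vac}} - 2\cos(2.574)} \\
&= \frac{3.69}{e^{4.44\, d/\lambda_{\rm vac}} + e^{-4.44\, d/\lambda_{\rm vac}} + 1.69}
\end{aligned} \tag{6.32}$$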
Figure 6.4 shows a plot of the transmittance (6.32) calculated in Example 6.2. Notice that the transmittance goes to one as expected when the two prisms are brought together: $T_p^{\rm tot}(d/\lambda_{\rm vac} = 0) = 1$. When the prisms get to be about a wavelength apart, the transmittance is significantly reduced, and as the distance gets large compared to a wavelength, the transmittance quickly goes to zero ($T_p^{\rm tot}(d/\lambda_{\rm vac} \gg 1) \rightarrow 0$).
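The behavior described above can be checked by evaluating (6.14) directly with the complex cosine (6.27) and comparing against the rounded closed form (6.32). This is a sketch for the specific geometry of Example 6.2 (prism index 1.5, gap index 1, 45° incidence); the function names are illustrative only.

```python
import cmath
import math

N_PRISM = 1.5
THETA0 = math.radians(45)

def tunneling_transmittance_p(x):
    """T_p^tot from Eq. (6.14) for a gap of width x = d / lambda_vac (gap index 1)."""
    sin0, cos0 = math.sin(THETA0), math.cos(THETA0)
    sin1 = N_PRISM * sin0                    # Snell's law, Eq. (6.26); exceeds one here
    cos1 = 1j * math.sqrt(sin1**2 - 1)       # Eq. (6.27)
    denom = cos1 * sin1 + cos0 * sin0
    t01 = 2 * cos0 * sin1 / denom            # Eq. (6.1), p-polarization
    t12 = 2 * cos1 * sin0 / denom            # Eq. (6.3) with theta2 = theta0
    r10 = -(cos1 * sin1 - cos0 * sin0) / denom
    r12 = r10                                # identical prisms on both sides
    phase = 2 * math.pi * x * cos1           # k1 d cos(theta1), Eq. (6.31)
    amp = cmath.exp(-1j * phase) - r10 * r12 * cmath.exp(1j * phase)
    return abs(t01 * t12) ** 2 / abs(amp) ** 2

def closed_form(x):
    """Rounded closed-form result, Eq. (6.32)."""
    return 3.69 / (math.exp(4.44 * x) + math.exp(-4.44 * x) + 1.69)

for x in (0.0, 0.25, 0.5, 1.0, 2.0):
    print(f"d/lambda = {x:4.2f}: exact {tunneling_transmittance_p(x):.4f}, Eq. (6.32) {closed_form(x):.4f}")
```

The two columns agree to within the rounding of the coefficients in (6.32), and both drop roughly exponentially once the gap exceeds a fraction of a wavelength.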
6.5 Fabry-Perot
Marie Paul Auguste Charles Fabry (1867-1945) and Jean Baptiste Gaspard Gustave Alfred Perot (1863-1925) realized that a double interface could be used to distinguish wavelengths of light that are very close together. The Fabry-Perot instrument consists simply of two identical (parallel) surfaces separated by spacing $d$. Our analysis in section 6.3 applies. For simplicity, we choose the refractive index before the initial surface and after the final surface to be the same (i.e. $n_0 = n_2$). We assume that the transmission angles are such that total internal reflection is avoided. Whether the double-boundary setup transmits light well or poorly depends on the exact spacing between the two boundaries and on the reflectivity of the surfaces, as well as on the wavelength of the light.

If the spacing $d$ separating the two parallel surfaces is adjustable (scanned), the instrument is called a Fabry-Perot interferometer. If the spacing is fixed while the angle of the incident light is varied, the instrument is called a Fabry-Perot etalon. An etalon can therefore be as simple as a piece of glass with parallel surfaces. Sometimes, a thin optical membrane called a pellicle is used as an etalon (occasionally inserted into laser cavities to discriminate against certain wavelengths). However, to achieve sharp discrimination between closely spaced wavelengths, a large spacing $d$ is desirable. The two surfaces should also reflect relatively well, much better than, say, a simple air-glass interface.

As we previously derived (6.21), the transmittance through a double boundary is
$$T^{\rm tot} = \frac{T^{\max}}{1 + F\sin^2\frac{\Phi}{2}} \tag{6.33}$$
In the case of identical interfaces on the incident and transmitted sides, the transmittance and reflectance coefficients are the same at each surface (i.e. $T = T^{0\to1} = T^{1\to2}$ and $R = R^{1\to0} = R^{1\to2}$). In this case, the maximum transmittance and the finesse coefficient are
$$T^{\max} = \frac{T^2}{(1 - R)^2} \tag{6.34}$$
and
$$F = \frac{4R}{(1 - R)^2} \tag{6.35}$$
Figure 6.4 Transmittance of p-polarized light through a gap between two 45° prisms with n = 1.5 as the gap width is varied (Example 6.2).
In principle, these equations should be evaluated for either s- or p-polarized light. However, a Fabry-Perot interferometer or etalon is usually operated near normal incidence so that there is little difference between the two polarizations.
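To get a feel for (6.33)-(6.35), the sketch below tabulates the finesse coefficient and the minimum transmittance for a few surface reflectances. It assumes lossless surfaces (T = 1 - R), which is an assumption of this example only; the function names are hypothetical.

```python
import math

def finesse_coefficient(R):
    """Finesse coefficient for two identical surfaces, Eq. (6.35)."""
    return 4 * R / (1 - R) ** 2

def fabry_perot_transmittance(R, T, Phi):
    """Transmittance of two identical surfaces, Eqs. (6.33)-(6.34)."""
    T_max = T**2 / (1 - R) ** 2
    return T_max / (1 + finesse_coefficient(R) * math.sin(Phi / 2) ** 2)

for R in (0.04, 0.5, 0.9):
    T = 1 - R  # lossless surface assumed
    on_peak = fabry_perot_transmittance(R, T, 0.0)
    off_peak = fabry_perot_transmittance(R, T, math.pi)
    print(f"R = {R:.2f}: F = {finesse_coefficient(R):7.2f}, "
          f"T(peak) = {on_peak:.3f}, T(minimum) = {off_peak:.4f}")
```

For a bare air-glass reflectance of about 4% the transmittance barely dips between peaks, while at 90% reflectance the coefficient is 360 and the transmittance nearly vanishes away from the peaks. This is why the surfaces should reflect much better than a simple air-glass interface.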
Figure 6.5 Transmittance as the phase $\Phi$ is varied. The different curves correspond to different values of the finesse coefficient. $\Phi_0$ represents a large multiple of $2\pi$.
When using a Fabry-Perot instrument, one observes the transmittance $T^{\rm tot}$ as the parameter $\Phi$ is varied (see (6.16) and (6.20)). The parameter $\Phi$ can be varied by altering $d$, $\theta_1$, or $\lambda$ as prescribed by
$$\Phi = \frac{4\pi n_1 d}{\lambda_{\rm vac}}\cos\theta_1 + \delta_r \tag{6.36}$$
To increase the sensitivity of the instrument, it is desirable to have the transmittance $T^{\rm tot}$ vary strongly when $\Phi$ is varied. By inspection of (6.33), we see that $T^{\rm tot}$ varies most strongly if the finesse coefficient $F$ is large. We achieve a large finesse coefficient by increasing the reflectance $R$.

The total transmittance $T^{\rm tot}$ (6.33) through a Fabry-Perot instrument is depicted in Fig. 6.5 as a function of $\Phi$. The various curves correspond to different values of $F$. Typical values of $\Phi$ can be extremely large. For example, suppose that the instrument is used at near-normal incidence (i.e. $\cos\theta_1 = 1$) with a wavelength of $\lambda_{\rm vac} = 500$ nm and an interface separation of $d_0 = 1$ cm. From (6.36) the value of $\Phi$ (ignoring the constant phase term $\delta_r$) is approximately
$$\Phi_0 = \frac{4\pi\,(1\ \text{cm})}{500\ \text{nm}} = 80{,}000\,\pi \tag{6.37}$$
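The size of the phase in (6.37), and how little the spacing or wavelength must change to shift it by a full cycle, can be worked out directly from (6.36). The sketch below assumes an air gap (index 1), normal incidence, and a dropped constant phase term; the function name is illustrative.

```python
import math

def fabry_perot_phase(n1, d, theta1, lam_vac, delta_r=0.0):
    """Phase of Eq. (6.36) for layer index n1, spacing d, internal angle theta1."""
    return 4 * math.pi * n1 * d * math.cos(theta1) / lam_vac + delta_r

lam_vac = 500e-9   # meters
d0 = 1e-2          # meters

Phi0 = fabry_perot_phase(1.0, d0, 0.0, lam_vac)
print(f"Phi0 / pi = {Phi0 / math.pi:.1f}")

# Spacing change that advances the phase by exactly 2*pi (from Eq. (6.36))
delta_d = lam_vac / 2
# Wavelength increase that lowers the phase by 2*pi: solve Phi(lam_vac + dl) = Phi0 - 2*pi
delta_lam = 4 * math.pi * d0 / (Phi0 - 2 * math.pi) - lam_vac
print(f"spacing change for one fringe: {delta_d * 1e9:.0f} nm")
print(f"wavelength change for one fringe: {delta_lam * 1e9:.4f} nm")
```

A change of only 250 nm in a 1 cm spacing, or roughly 0.0125 nm in wavelength, moves the instrument through a complete fringe.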
As we vary $d$, $\lambda$, or $\theta_1$ by small amounts, we can easily cause $\Phi$ to change by $2\pi$ as depicted in Fig. 6.5. The figure shows small changes in $\Phi$ above a value $\Phi_0$, which represents a large multiple of $2\pi$.

The basic setup of a Fabry-Perot instrument is shown in Fig. 6.6. In order to achieve a relatively high finesse coefficient $F$, we require fairly high reflectivities at the two surfaces. To accomplish this, special coatings can be applied to the surfaces, for example, a thin layer of silver (or some other coating) to achieve a partial reflection, say 90%. Typically, two glass substrates are separated by distance $d$, with the coated surfaces facing each other as shown in the figure. The substrates are aligned so that the interior surfaces are parallel to each other. It is typical for each substrate to be slightly wedge-shaped so that unwanted reflections from the outer surfaces do not interfere with the double boundary situation between the two plates.

Figure 6.6 Typical Fabry-Perot setup. If the spacing d is variable, it is called an interferometer; otherwise, it is called an etalon.

Actually, each interior coating may be thought of as its own double-boundary problem (or multiple-boundary as the case may be). However, without regard for the details of the coatings, we can say that each coating has a certain overall transmittance $T$ and a certain overall reflectance $R$. As light goes through the coating, it can also be attenuated through absorption. Therefore, at each coating surface we have
$$R + T + A = 1 \tag{6.38}$$
where $A$ represents the fraction of light absorbed at a coating. Notice from (6.38) that when we increase the value of $R$, the value of $T$ must decrease. Thus, to the extent that $A$ is nonzero, there is an apparent tradeoff between increasing the finesse coefficient $F$ and maintaining a bright (observable) transmittance $T^{\max}$ through the instrument (see (6.34) and (6.35)). However, in Fig. 6.5, each curve is plotted in terms of its own $T^{\max}$.

The reflection phase $\delta_r$ in (6.36) depends on the exact nature of the coatings in the Fabry-Perot instrument. However, we do not need to know the value of $\delta_r$ (which depends on both the complex index of the coating material and its thickness). Whatever the value of $\delta_r$, we only care that it is constant. Experimentally, we can always compensate for $\delta_r$ by tweaking the spacing $d$. Note that the required tweak on the spacing need only be a fraction of a wavelength, which is tiny when compared to the overall spacing $d$, typically many thousands of wavelengths.

In the next section, we examine the transmittance (6.33) in detail as the spacing $d$ and the angle $\theta_1$ are adjusted. We also discuss typical experimental arrangements for a Fabry-Perot interferometer or etalon. In section 6.7, we examine how a Fabry-Perot instrument is able to distinguish closely spaced wavelengths, and we will introduce the concept of free spectral range and resolving power of the instrument.

Figure 6.7 Setup for a Fabry-Perot interferometer.
6.6 Setup of a Fabry-Perot Instrument

Figure 6.7 shows the typical experimental setup for a Fabry-Perot interferometer. A collimated beam of light is sent through the instrument. The beam is aligned so that it is normal to the surfaces. It is critical for the two surfaces of the interferometer to be very close to parallel. For initial alignment, the back-reflected beams from each surface can be monitored to ensure rough alignment. Then as fringes appear, the alignment is further adjusted until the entire transmitted beam becomes one large fringe, which blinks all together as the spacing $d$ changes (by tiny amounts). A mechanical actuator is then used to vary the spacing between the plates, and the transmittance of the light is observed with a detector connected to an oscilloscope. The sweep of the oscilloscope must be synchronized with the period of the (oscillating) mechanical driver. To make the alignment of the instrument less critical, a small aperture can be placed in front of the detector so that it observes only a small portion of the beam.

The transmittance as a function of plate separation is shown in Fig. 6.8. In this case, $\Phi$ varies via changes in $d$ only (see (6.36) with $\cos\theta_1 = 1$ and fixed wavelength). As the spacing is increased by only a half wavelength, the transmittance changes through a complete period. Figure 6.8 shows what is seen on an oscilloscope when the mechanical driver travels at constant velocity. The various peaks in the figure are called fringes.

The setup for a Fabry-Perot etalon is similar to that of the interferometer. The key difference is that the angle of the incident light is varied rather than the plate separation. One way to do this is to observe light from a point source which forms a conical beam that traverses the device, as depicted in Fig. 6.9. Different portions of the beam go through the device at different angles. When aligned straight on, the transmitted light forms a bull's-eye pattern on a screen, as will be described below.
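The half-wavelength periodicity of the scanned interferometer can be seen by sampling the transmittance at a few spacings. The sketch below assumes normal incidence, an air gap, a dropped constant reflection phase, a peak transmittance of one, and F = 100 (the value quoted for Fig. 6.8); the function name is hypothetical.

```python
import math

def transmittance_vs_spacing(d, lam_vac, F, n1=1.0, T_max=1.0):
    """Eq. (6.33) with the phase of Eq. (6.36) at normal incidence (delta_r dropped)."""
    Phi = 4 * math.pi * n1 * d / lam_vac
    return T_max / (1 + F * math.sin(Phi / 2) ** 2)

lam_vac = 500e-9
d0 = 1e-2   # for this wavelength the phase at d0 is a multiple of 2*pi

# Step the spacing in eighths of a wavelength across one full wavelength
samples = [transmittance_vs_spacing(d0 + i * lam_vac / 8, lam_vac, 100) for i in range(9)]
for i, T in enumerate(samples):
    print(f"d0 + {i}/8 wavelength: T = {T:.4f}")
```

Peaks recur at steps 0, 4, and 8 (every half wavelength), with the minimum of 1/(1 + F) halfway between them.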
Figure 6.8 Transmittance as the separation d is varied (F = 100). $d_0$ represents a large distance for which $\Phi$ is a multiple of $2\pi$.

Often the two surfaces in the etalon are held parallel to each other by a precision ring spacer to eliminate the need for alignment. In Fig. 6.10 we graph the transmittance $T^{\rm tot}$ (6.33) as a function of angle (holding wavelength and plate separation fixed). Since $\cos\theta_1$ is not a linear function, the spacing of the peaks varies with angle. Actually, as $\theta_1$ increases from zero, the cosine steadily decreases, causing $\Phi$ to decrease. Each time $\Phi$ decreases by $2\pi$ we get a new peak. Not surprisingly, only a modest change in angle is necessary to cause the transmittance to vary from maximum to minimum, or vice versa. In Fig. 6.10, we have again assumed $\lambda_{\rm vac} = 500$ nm and $d_0 = 1$ cm. The advantage of the Fabry-Perot etalon (as opposed to the interferometer) is that no moving parts are needed. The disadvantage is that light must be sent through the instrument at many angles to see the variation in the transmittance. The peaks in the figure are called fringes.

An example of the bull's-eye pattern observed with this setup is shown in Fig. 6.10(b). An increase in radius corresponds to an increase in the cone angle. Thus, the bull's-eye pattern can be understood as the curve in Fig. 6.10(a) rotated about a circle. If the wavelength or the spacing between the plates were to vary, the radii (or angles) where the fringes appear would shift accordingly. For example, the center spot could become dark.

Finally, consider the setup shown in Fig. 6.11, which is used to observe light from a diffuse source. The earlier setup shown in Fig. 6.9 won't work for a diffuse source unless all of the light is blocked except for a small point source. This is impractical if there remains insufficient illumination at the final screen for observation. In order to preserve as much light as possible we can sandwich the etalon between two lenses. We place the diffuse source at the focal point of the first lens. We place the screen at the focal point of the second lens. This causes an image of the source to appear on the screen. (If the diffuse source has the shape of Mickey Mouse, then an image of Mickey Mouse appears on the screen.)
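The uneven angular spacing of the etalon fringes follows from setting the phase (6.36) equal to an integer multiple of 2π. The sketch below lists the first few fringe angles under assumptions made for this example only: the constant reflection phase is dropped, the gap is air, and the angle is the internal angle in the gap. The function name is hypothetical.

```python
import math

def fringe_angles(n1, d, lam_vac, count):
    """Internal angles (radians) of the first `count` transmission peaks.

    Peaks occur where 4*pi*n1*d*cos(theta)/lam_vac = 2*pi*m for integer m,
    i.e. cos(theta) = m * lam_vac / (2 * n1 * d).
    """
    # Largest allowed order; the small offset guards against floating-point round-off
    m_max = math.floor(2 * n1 * d / lam_vac + 1e-9)
    angles = []
    for m in range(m_max, m_max - count, -1):
        cos_theta = min(1.0, m * lam_vac / (2 * n1 * d))
        angles.append(math.acos(cos_theta))
    return angles

angles = fringe_angles(1.0, 1e-2, 500e-9, 5)
for k, theta in enumerate(angles):
    print(f"fringe {k}: theta = {math.degrees(theta):.3f} degrees")
```

For the 500 nm, 1 cm case the first ring sits near 0.4° and successive rings crowd closer together, which matches the statement that only a modest change in angle moves the transmittance between maximum and minimum.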
Each point of the diffuse source is mapped to a corresponding point on the screen; the orientation of the points is preserved (albeit inverted). In addition, the light associated with any particular point of the source travels as a collimated beam in
the region between the lenses. Each collimated beam traverses the etalon at a unique angle. Because of the differing angles, the light associated with each point traverses the etalon with higher or lower transmittance. The result is that the bull's-eye pattern seen in Fig. 6.10 becomes superimposed on the image of the diffuse source. One can observe the pattern directly by substituting the lens and retina of the eye for the final lens and screen.
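The angular fringe pattern described above can be sketched numerically. The snippet below is a minimal illustration, assuming the transmittance has the form $T_{\rm max}/[1 + F\sin^2(\Phi/2)]$ with $\Phi = 4\pi n_1 d\cos\theta_1/\lambda_{\rm vac} + \delta_r$, as used later in (6.39) and (6.45). The function name `etalon_transmittance`, its argument names, and the small-angle ring estimate $\theta_m \approx \sqrt{m\lambda_{\rm vac}/(n_1 d)}$ are illustrative choices, not notation from the text.

```python
import numpy as np

def etalon_transmittance(theta1, lam_vac, d, n1=1.0, F=10.0, T_max=1.0, delta_r=0.0):
    """Transmittance T_max / (1 + F sin^2(Phi/2)) of a Fabry-Perot etalon.

    Phi = 4 pi n1 d cos(theta1) / lam_vac + delta_r is the round-trip phase.
    """
    phi = 4.0 * np.pi * n1 * d * np.cos(theta1) / lam_vac + delta_r
    return T_max / (1.0 + F * np.sin(phi / 2.0) ** 2)

lam_vac = 500e-9   # vacuum wavelength in meters
d0 = 1e-2          # plate separation in meters; Phi is a multiple of 2 pi at theta1 = 0

# Small-angle estimate of the bright rings (assumes cos(theta) ~ 1 - theta^2 / 2):
# Phi drops by 2 pi m when theta_m ~ sqrt(m * lam_vac / (n1 * d0)).
m = np.arange(0, 4)
theta_rings = np.sqrt(m * lam_vac / d0)
T_rings = etalon_transmittance(theta_rings, lam_vac, d0)

# Halfway (in Phi) between the center and the first ring the transmittance is 1 / (1 + F).
theta_dark = np.sqrt(0.5 * lam_vac / d0)
T_dark = etalon_transmittance(theta_dark, lam_vac, d0)

print(np.degrees(theta_rings))   # ring angles in degrees
print(T_rings, T_dark)
```

For these parameters the first few rings lie within about one degree of the axis, which illustrates why only a modest change in angle sweeps the transmittance between maximum and minimum.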
6.7 Distinguishing Nearby Wavelengths in a Fabry-Perot Instrument

Thus far, we have examined how the transmittance through a Fabry-Perot instrument varies with surface separation $d$ and angle $\theta_1$. However, the main purpose of a Fabry-Perot instrument is to measure small changes in the wavelength of light, which similarly affect the value of $\Phi$ (see (6.36)). Consider a Fabry-Perot interferometer where the transmittance through the instrument is plotted as a function of surface separation $d$. (For purposes of the following discussion, we could have instead chosen a Fabry-Perot etalon at various transmittance angles.) Let the spacing $d_0$ correspond to the case when $\Phi$ is a multiple of $2\pi$ for the wavelength $\lambda_{\rm vac}$. Next suppose we adjust the wavelength of the light from $\lambda_{\rm vac} = \lambda_0$ to $\lambda_{\rm vac} = \lambda_0 + \Delta\lambda$ while observing the transmittance. As we do this, the value of $\Phi$ changes. Fig. 6.12 shows what happens as we scan the spacing $d$ of the interferometer in the neighborhood of $d_0$. A change in wavelength causes the position of the fringes to shift so that a peak no longer occurs when the spacing is $d_0$. The dashed line corresponds to a different wavelength.
We now find the connection between a change in wavelength and the amount that $\Phi$ changes, giving rise to the fringe shift seen in Fig. 6.12. Suppose that the transmittance through the Fabry-Perot instrument is maximum at the wavelength $\lambda_0$. That is, we have
$$\Phi_0 = \frac{4\pi n_1 d_0 \cos\theta_1}{\lambda_0} + \delta_r \tag{6.39}$$
where $\Phi_0$ is an integer multiple of $2\pi$. Now consider what happens to $\Phi$ as the wavelength increases. At a new wavelength (all else remaining the same) we have
$$\Phi = \Phi_0 - \Delta\Phi = \frac{4\pi n_1 d_0 \cos\theta_1}{\lambda_0 + \Delta\lambda} + \delta_r \tag{6.40}$$
The change in wavelength $\Delta\lambda$ is usually very small compared to $\lambda_0$, so we can represent the denominator with the first two terms of a Taylor-series expansion:
$$\frac{1}{\lambda_0 + \Delta\lambda} = \frac{1}{\lambda_0\left(1 + \Delta\lambda/\lambda_0\right)} \cong \frac{1 - \Delta\lambda/\lambda_0}{\lambda_0} \tag{6.41}$$
Then (6.40) can be rewritten as
$$\Delta\Phi = \frac{4\pi n_1 d_0 \cos\theta_1}{\lambda_0^2}\,\Delta\lambda \tag{6.42}$$
Figure 6.9 A diverging monochromatic beam traversing a Fabry-Perot etalon.
Figure 6.10 (a) Transmittance as the angle $\theta_1$ is varied. It is assumed that the distance $d$ is chosen such that $\Phi$ is a multiple of $2\pi$ when the angle is zero. (b) Pattern on the screen of a diverging monochromatic beam traversing a Fabry-Perot etalon with $F = 10$.
Figure 6.11 Setup of a Fabry-Perot etalon for looking at a diffuse source.
Figure 6.12 Transmittance as the spacing $d$ is varied for two different wavelengths ($F = 100$). The solid line plots the transmittance of light with a wavelength of $\lambda_0$, and the dashed line plots the transmittance of a wavelength shorter than $\lambda_0$. Note that the fringes shift positions for different wavelengths.
Equation (6.42) enables us to compute the amount of fringe shift (like those seen in Fig. 6.12) for a given change in wavelength. Conversely, if we observe a certain shift in the location of the fringes we can say by what amount the wavelength must have changed. If the change in wavelength is enough to cause $\Phi$ to decrease by $2\pi$, the fringes in Fig. 6.12 shift through a whole period, and the picture looks the same. This behavior shows an important limitation of the instrument. If the fringes shift by too much, we might become confused as to whether anything has changed at all, owing to the periodic nature of the fringes. We can avoid this confusion if we are able to watch the fringes shift as we continuously vary the wavelength, but for many applications we do not have continuous control over the wavelength. For example, we may want to send two nearby wavelengths through the instrument simultaneously to make a comparison. If the wavelengths are separated by too much, we may be confused. The fringes of one wavelength may be shifted past several fringes of the other wavelength, and we will not be able to tell by how much they differ. This introduces the concept of free spectral range, which is the wavelength change $\Delta\lambda_{\rm FSR}$ that causes the fringes to shift through one period. We find this by setting (6.42) equal to $2\pi$. After rearranging, we get
$$\Delta\lambda_{\rm FSR} = \frac{\lambda_{\rm vac}^2}{2 n_1 d_0 \cos\theta_1} \tag{6.43}$$
If the wavelength is $\lambda_{\rm vac} = 500$ nm and the spacing is $d_0 = 1$ cm, the free spectral range is $\Delta\lambda_{\rm FSR} = (500\ \text{nm})^2/[2(1\ \text{cm})] = 0.013$ nm, assuming near normal incidence and an index $n_1 = 1$. This extremely narrow wavelength range is the widest that should be examined for the given parameters. In summary, the free spectral range
Figure 6.13 Transmittance as function of angle through a Fabry-Perot etalon. Two nearby wavelengths are sent through the instrument simultaneously, (left) barely resolved and (right) easily resolved.
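is the largest change in wavelength permissible while avoiding confusion. To convert this wavelength difference $\Delta\lambda_{\rm FSR}$ into a corresponding frequency difference, one differentiates $\omega = 2\pi c/\lambda_{\rm vac}$ to get
$$|\Delta\omega| = \frac{2\pi c\,\Delta\lambda}{\lambda_{\rm vac}^2} \tag{6.44}$$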
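As a quick numerical check of (6.43) and (6.44), the following sketch evaluates the free spectral range for the parameters quoted in the text ($\lambda_{\rm vac} = 500$ nm, $d_0 = 1$ cm, $n_1 = 1$, normal incidence). The function names are hypothetical and chosen only for this illustration.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s

def free_spectral_range(lam_vac, d0, n1=1.0, theta1=0.0):
    """Wavelength free spectral range, Eq. (6.43), in meters."""
    return lam_vac ** 2 / (2.0 * n1 * d0 * math.cos(theta1))

def angular_frequency_spread(dlam, lam_vac):
    """Magnitude of the angular-frequency change for a wavelength change, Eq. (6.44)."""
    return 2.0 * math.pi * C * dlam / lam_vac ** 2

lam_vac = 500e-9   # m
d0 = 1e-2          # m

fsr = free_spectral_range(lam_vac, d0)
domega = angular_frequency_spread(fsr, lam_vac)

print(f"FSR = {fsr * 1e9:.4f} nm")            # 0.0125 nm, which rounds to 0.013 nm
print(f"Delta omega = {domega:.3e} rad/s")    # equals pi * c / d0 for n1 = 1
```

Note that the frequency-domain free spectral range reduces to $\pi c/(n_1 d_0 \cos\theta_1)$, which does not depend on the wavelength.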
We next consider the smallest change in wavelength that can be noticed, or resolved, with a Fabry-Perot instrument. For example, if two very nearby wavelengths are sent through the instrument simultaneously, we can distinguish them only if the separation between their corresponding fringe peaks is at least as large as the width of individual peaks. This situation of two barely resolvable fringe peaks is shown on the left of Fig. 6.13. We will look for the wavelength change that causes a peak to shift by its own width. We define the width of a peak by its full width at half maximum (FWHM). Again, let $\Phi_0$ be a multiple of $2\pi$ so that a peak in transmittance occurs when $\Phi = \Phi_0$. In this case, we have from (6.33) that
$$T_{\rm tot} = \frac{T_{\rm max}}{1 + F\sin^2\left(\frac{\Phi_0}{2}\right)} = T_{\rm max} \tag{6.45}$$
If $\Phi$ varies from $\Phi_0$ to $\Phi_0 \pm \Phi_{\rm FWHM}/2$, then, by definition, the transmittance drops to one half. Therefore, we may write
$$T_{\rm tot} = \frac{T_{\rm max}}{1 + F\sin^2\left(\frac{\Phi_0 \pm \Phi_{\rm FWHM}/2}{2}\right)} = \frac{T_{\rm max}}{2} \tag{6.46}$$
We solve (6.46) for $\Phi_{\rm FWHM}$, and we see that this equation requires
$$F\sin^2\left(\frac{\Phi_{\rm FWHM}}{4}\right) = 1 \tag{6.47}$$
where we have taken advantage of the fact that $\Phi_0$ is a multiple of $2\pi$. Next, we suppose that $\Phi_{\rm FWHM}$ is rather small so that we may represent the sine by its argument. This approximation is okay if the finesse coefficient $F$ is rather large (say, 100). With this approximation, (6.47) simplifies to
$$\Phi_{\rm FWHM} \cong \frac{4}{\sqrt{F}} \tag{6.48}$$
The ratio of the period between peaks $2\pi$ to the width $\Phi_{\rm FWHM}$ of individual peaks is called the reflecting finesse (or just finesse):
$$f \equiv \frac{2\pi}{\Phi_{\rm FWHM}} = \frac{\pi\sqrt{F}}{2} \tag{6.49}$$
This parameter is often used to characterize the performance of a Fabry-Perot instrument. Note that a higher finesse $f$ implies sharper fringes in comparison to the fringe spacing.
Finally, we are ready to compute the minimum wavelength difference that can be resolved using the instrument. The free spectral range $\Delta\lambda_{\rm FSR}$ compared to the minimum wavelength difference $\Delta\lambda_{\rm FWHM}$ is the same as a whole period $2\pi$ compared to $\Phi_{\rm FWHM}$, or the reflecting finesse $f$. Therefore, we have
$$\Delta\lambda_{\rm FWHM} = \frac{\Delta\lambda_{\rm FSR}}{f} = \frac{\lambda_{\rm vac}^2}{\pi n_1 d_0 \cos\theta_1 \sqrt{F}} \tag{6.50}$$
For $\lambda_{\rm vac} = 500$ nm, $d_0 = 1$ cm, and $F = 100$ (again assuming near normal incidence and $n_1 = 1$), this minimum resolvable wavelength change is
$$\Delta\lambda_{\rm FWHM} = \frac{(500\ \text{nm})^2}{\pi\,(1\ \text{cm})\,\sqrt{100}} = 0.00080\ \text{nm} \tag{6.51}$$
This means that a wavelength spread of 0.00080 nm centered on $\lambda_0 = 500$ nm looks about the same in the Fabry-Perot instrument as a pure wavelength at $\lambda_0 = 500$ nm. However, a wavelength variation larger than this will be noticed.
As a final note, a common characterization of how well an instrument distinguishes close-together wavelengths is given by the ratio of $\lambda_0$ to $\Delta\lambda_{\rm min}$, where $\Delta\lambda_{\rm min}$ is the minimum change of wavelength that the instrument can distinguish in the neighborhood of $\lambda_0$. (We are not as impressed when $\Delta\lambda_{\rm min}$ is small, if $\lambda_0$ also is small.) This ratio is called the resolving power of the instrument:
$$RP \equiv \frac{\lambda_0}{\Delta\lambda_{\rm min}} \tag{6.52}$$
In the case of a Fabry-Perot instrument, we have $\Delta\lambda_{\rm min} = \Delta\lambda_{\rm FWHM}$. Fabry-Perot instruments tend to have very high resolving powers since they respond to very small differences in wavelength. When $\Delta\lambda_{\rm min} = 0.00080$ nm and $\lambda_0 = 500$ nm, the resolving power of a Fabry-Perot instrument is an impressive $RP \approx 600{,}000$. For comparison, the resolving power of a typical grating spectrometer is much less (a few thousand). However, a spectrometer has the advantage that it can observe a much wider range of wavelengths at once (not confined within the narrow free spectral range of a Fabry-Perot instrument).
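The numbers in this section can be reproduced with a few lines of code. This is a minimal sketch of (6.49), (6.50), and (6.52) using the same parameters as the text; the function names are illustrative and not part of the text.

```python
import math

def reflecting_finesse(F):
    """Reflecting finesse f = pi sqrt(F) / 2, Eq. (6.49); assumes F is large."""
    return math.pi * math.sqrt(F) / 2.0

def fwhm_wavelength(lam_vac, d0, F, n1=1.0, theta1=0.0):
    """Minimum resolvable wavelength change, Eq. (6.50), in meters."""
    return lam_vac ** 2 / (math.pi * n1 * d0 * math.cos(theta1) * math.sqrt(F))

def resolving_power(lam0, dlam_min):
    """Resolving power lam0 / dlam_min, Eq. (6.52)."""
    return lam0 / dlam_min

lam_vac, d0, F = 500e-9, 1e-2, 100.0

f = reflecting_finesse(F)
dlam_fwhm = fwhm_wavelength(lam_vac, d0, F)
rp = resolving_power(lam_vac, dlam_fwhm)

print(f"finesse f         = {f:.2f}")                     # about 15.7
print(f"Delta lambda FWHM = {dlam_fwhm * 1e9:.5f} nm")    # about 0.00080 nm
print(f"resolving power   = {rp:.3g}")                    # about 6.3e5
```

Carrying the unrounded width through gives a resolving power near $6.3\times 10^5$, consistent with the figure of roughly 600,000 quoted above.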
6.8 Multilayer Coatings

In this section, we generalize our previous analysis of a double interface to an arbitrary number of parallel interfaces (i.e. multilayer coatings). As we saw in section 6.3, a single coating applied to an optical surface is often insufficient to accomplish the desired effect, especially if the goal is to make a highly reflective mirror. For example, if we want to make a mirror surface using a dielectric coating (with the advantage of being less fragile and more reflective than a metal coating), a single layer is insufficient to reflect the majority of the light, even if a relatively high index is used. In P6.3 we compute that a single dielectric layer deposited on glass can reflect at most about 46% of the light. We would like to do much better (e.g. >99%), and this can be accomplished with multilayer dielectric coatings, which can have considerably better reflectivities than metal surfaces such as silver.
We now proceed to develop the formalism of the general multi-boundary problem. Rather than incorporate the single-interface Fresnel coefficients into the problem as we did in section 6.2, we return to the basic boundary conditions for the electric and magnetic fields at each interface between the layers. We examine $p$-polarized light incident on an arbitrary multilayer coating (all interfaces parallel to each other). We leave it as an exercise to re-derive the formalism for $s$-polarized light (see P6.11). The upcoming derivation is valid also for complex refractive indices, although our notation suggests real indices. The ability to deal with complex indices is very important if, for example, we want to make mirror coatings work in the extreme ultraviolet wavelength range where virtually every material is absorptive.
Consider the diagram of a multilayer coating in Fig. 6.14, for which the angle of light propagation in each region may be computed from Snell's law:
$$n_0\sin\theta_0 = n_1\sin\theta_1 = \cdots = n_N\sin\theta_N = n_{N+1}\sin\theta_{N+1} \tag{6.53}$$
where $N$ denotes the number of layers in the coating. The subscript 0 represents the initial medium outside of the multilayer, and the subscript $N+1$ represents the final material, or the substrate on which the layers are deposited. In each layer, only two plane waves exist, each of which is composed of light arising from the many possible bounces from various layer interfaces. The arrows pointing right indicate plane wave fields in individual layers that travel roughly in the forward (incident) direction, and the arrows pointing left indicate plane wave fields that travel roughly in the backward (reflected) direction. In the final region, there is only one plane wave traveling with a forward direction ($E_{\rightarrow N+1}^{(p)}$), which gives the overall transmitted field.
As we have studied in chapter 3 (see (3.9) and (3.13)), the boundary conditions for the parallel components of the $\mathbf{E}$ field and for the parallel components of the $\mathbf{B}$ field lead respectively to
$$\cos\theta_0\left(E_{\rightarrow 0}^{(p)} + E_{\leftarrow 0}^{(p)}\right) = \cos\theta_1\left(E_{\rightarrow 1}^{(p)} + E_{\leftarrow 1}^{(p)}\right) \tag{6.54}$$
Figure 6.14 Light propagation through multiple layers.
and
$$n_0\left(E_{\rightarrow 0}^{(p)} - E_{\leftarrow 0}^{(p)}\right) = n_1\left(E_{\rightarrow 1}^{(p)} - E_{\leftarrow 1}^{(p)}\right) \tag{6.55}$$
These equations are applicable only for $p$-polarized light. Similar equations give the field connection for $s$-polarized light (see (3.8) and (3.14)). We have applied these boundary conditions at the first interface only. Of course there are many more interfaces in the multilayer. For the connection between the $j$th layer and the next, we may similarly write
$$\cos\theta_j\left(E_{\rightarrow j}^{(p)} e^{i k_j \ell_j \cos\theta_j} + E_{\leftarrow j}^{(p)} e^{-i k_j \ell_j \cos\theta_j}\right) = \cos\theta_{j+1}\left(E_{\rightarrow j+1}^{(p)} + E_{\leftarrow j+1}^{(p)}\right) \tag{6.56}$$
and
$$n_j\left(E_{\rightarrow j}^{(p)} e^{i k_j \ell_j \cos\theta_j} - E_{\leftarrow j}^{(p)} e^{-i k_j \ell_j \cos\theta_j}\right) = n_{j+1}\left(E_{\rightarrow j+1}^{(p)} - E_{\leftarrow j+1}^{(p)}\right) \tag{6.57}$$
Here we have set the origin within each layer at the left surface. Then when making the connection with the subsequent layer at the right surface, we must specifically take into account the phase $\mathbf{k}_j \cdot \ell_j \hat{\mathbf{z}} = k_j \ell_j \cos\theta_j$. This corresponds to the phase acquired by the plane wave field in traversing the layer with thickness $\ell_j$. The right-hand sides of (6.56) and (6.57) need no phase adjustment since the $(j+1)$th field is evaluated on the left side of its layer. At the final interface, the boundary conditions reduce to
$$\cos\theta_N\left(E_{\rightarrow N}^{(p)} e^{i k_N \ell_N \cos\theta_N} + E_{\leftarrow N}^{(p)} e^{-i k_N \ell_N \cos\theta_N}\right) = \cos\theta_{N+1} E_{\rightarrow N+1}^{(p)} \tag{6.58}$$
and
$$n_N\left(E_{\rightarrow N}^{(p)} e^{i k_N \ell_N \cos\theta_N} - E_{\leftarrow N}^{(p)} e^{-i k_N \ell_N \cos\theta_N}\right) = n_{N+1} E_{\rightarrow N+1}^{(p)} \tag{6.59}$$
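These equations are the same as (6.56) and (6.57) when $j = N$. However, we have written them here explicitly since they are unique in that $E_{\leftarrow N+1}^{(p)} \equiv 0$.
At this point we are ready to solve (6.54)–(6.59). We would like to eliminate all fields besides $E_{\rightarrow 0}^{(p)}$, $E_{\leftarrow 0}^{(p)}$, and $E_{\rightarrow N+1}^{(p)}$. Then we will be able to find the overall reflectance and transmittance of the multilayer coating. In solving (6.54)–(6.59), we must proceed with care, or the algebra can quickly get out of hand. Fortunately, most students have had training in linear algebra, and this is a case where that training pays off. We first write a general matrix equation that summarizes the mathematics in (6.54)–(6.59), as follows:
$$\begin{bmatrix}\cos\theta_j e^{i\beta_j} & \cos\theta_j e^{-i\beta_j}\\ n_j e^{i\beta_j} & -n_j e^{-i\beta_j}\end{bmatrix}\begin{bmatrix}E_{\rightarrow j}^{(p)}\\ E_{\leftarrow j}^{(p)}\end{bmatrix} = \begin{bmatrix}\cos\theta_{j+1} & \cos\theta_{j+1}\\ n_{j+1} & -n_{j+1}\end{bmatrix}\begin{bmatrix}E_{\rightarrow j+1}^{(p)}\\ E_{\leftarrow j+1}^{(p)}\end{bmatrix} \tag{6.60}$$
where
$$\beta_j \equiv \begin{cases} 0 & j = 0\\ k_j \ell_j \cos\theta_j & 1 \le j \le N \end{cases} \tag{6.61}$$
and
$$E_{\rightarrow N+1}^{(p)} \equiv E_{N+1}^{(p)}, \qquad E_{\leftarrow N+1}^{(p)} \equiv 0 \tag{6.62}$$
Then we solve (6.60) for the incident fields as follows:
$$\begin{bmatrix}E_{\rightarrow j}^{(p)}\\ E_{\leftarrow j}^{(p)}\end{bmatrix} = \begin{bmatrix}\cos\theta_j e^{i\beta_j} & \cos\theta_j e^{-i\beta_j}\\ n_j e^{i\beta_j} & -n_j e^{-i\beta_j}\end{bmatrix}^{-1}\begin{bmatrix}\cos\theta_{j+1} & \cos\theta_{j+1}\\ n_{j+1} & -n_{j+1}\end{bmatrix}\begin{bmatrix}E_{\rightarrow j+1}^{(p)}\\ E_{\leftarrow j+1}^{(p)}\end{bmatrix} \tag{6.63}$$
We can use (6.63) to connect the fields in the initial and final layers. If we write (6.63) for the $j = 0$ case, and then substitute using (6.63) again with $j = 1$, we find
$$\begin{aligned}\begin{bmatrix}E_{\rightarrow 0}^{(p)}\\ E_{\leftarrow 0}^{(p)}\end{bmatrix} &= \begin{bmatrix}\cos\theta_0 & \cos\theta_0\\ n_0 & -n_0\end{bmatrix}^{-1}\begin{bmatrix}\cos\theta_1 & \cos\theta_1\\ n_1 & -n_1\end{bmatrix}\begin{bmatrix}E_{\rightarrow 1}^{(p)}\\ E_{\leftarrow 1}^{(p)}\end{bmatrix}\\ &= \begin{bmatrix}\cos\theta_0 & \cos\theta_0\\ n_0 & -n_0\end{bmatrix}^{-1} M_1^{(p)} \begin{bmatrix}\cos\theta_2 & \cos\theta_2\\ n_2 & -n_2\end{bmatrix}\begin{bmatrix}E_{\rightarrow 2}^{(p)}\\ E_{\leftarrow 2}^{(p)}\end{bmatrix}\end{aligned} \tag{6.64}$$
where we have grouped the matrices related to the $j = 1$ layer together via
$$M_1^{(p)} \equiv \begin{bmatrix}\cos\theta_1 & \cos\theta_1\\ n_1 & -n_1\end{bmatrix}\begin{bmatrix}\cos\theta_1 e^{i\beta_1} & \cos\theta_1 e^{-i\beta_1}\\ n_1 e^{i\beta_1} & -n_1 e^{-i\beta_1}\end{bmatrix}^{-1} \tag{6.65}$$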
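By repeating this procedure for all $N$ layers, we connect the fields in the initial medium with the final medium as follows:
$$\begin{bmatrix}E_{\rightarrow 0}^{(p)}\\ E_{\leftarrow 0}^{(p)}\end{bmatrix} = \begin{bmatrix}\cos\theta_0 & \cos\theta_0\\ n_0 & -n_0\end{bmatrix}^{-1}\left(\prod_{j=1}^{N} M_j^{(p)}\right)\begin{bmatrix}\cos\theta_{N+1} & \cos\theta_{N+1}\\ n_{N+1} & -n_{N+1}\end{bmatrix}\begin{bmatrix}E_{\rightarrow N+1}^{(p)}\\ 0\end{bmatrix} \tag{6.66}$$
where the matrices related to the $j$th layer are grouped together according to
$$\begin{aligned}M_j^{(p)} &\equiv \begin{bmatrix}\cos\theta_j & \cos\theta_j\\ n_j & -n_j\end{bmatrix}\begin{bmatrix}\cos\theta_j e^{i\beta_j} & \cos\theta_j e^{-i\beta_j}\\ n_j e^{i\beta_j} & -n_j e^{-i\beta_j}\end{bmatrix}^{-1}\\ &= \begin{bmatrix}\cos\beta_j & -i\sin\beta_j\cos\theta_j/n_j\\ -i n_j\sin\beta_j/\cos\theta_j & \cos\beta_j\end{bmatrix}\end{aligned} \tag{6.67}$$
The matrix inversion in the first line was performed using (0.46). The symbol $\prod$ signifies the product of the matrices with the lowest subscripts on the left:
$$\prod_{j=1}^{N} M_j^{(p)} \equiv M_1^{(p)} M_2^{(p)} \cdots M_N^{(p)} \tag{6.68}$$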
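As a finishing touch, we divide (6.66) by the incident field $E_{\rightarrow 0}^{(p)}$ and perform the matrix inversion using (0.46) to obtain
$$\begin{bmatrix}1\\ E_{\leftarrow 0}^{(p)}/E_{\rightarrow 0}^{(p)}\end{bmatrix} = A^{(p)}\begin{bmatrix}E_{\rightarrow N+1}^{(p)}/E_{\rightarrow 0}^{(p)}\\ 0\end{bmatrix} \tag{6.69}$$
where
$$A^{(p)} \equiv \begin{bmatrix}a_{11}^{(p)} & a_{12}^{(p)}\\ a_{21}^{(p)} & a_{22}^{(p)}\end{bmatrix} = \frac{1}{2 n_0\cos\theta_0}\begin{bmatrix}n_0 & \cos\theta_0\\ n_0 & -\cos\theta_0\end{bmatrix}\left(\prod_{j=1}^{N} M_j^{(p)}\right)\begin{bmatrix}\cos\theta_{N+1} & 0\\ n_{N+1} & 0\end{bmatrix} \tag{6.70}$$
In the final matrix after the product in (6.70) we have replaced the entries in the right column with zeros. This is permissible since the column vector that $A^{(p)}$ operates on in (6.69) has a zero in the bottom component. (Having zeros in the matrix can save computation time when calculating with large $N$.)
Equation (6.69) represents two equations, which must be solved simultaneously to find the ratios $E_{\leftarrow 0}^{(p)}/E_{\rightarrow 0}^{(p)}$ and $E_{\rightarrow N+1}^{(p)}/E_{\rightarrow 0}^{(p)}$. Once the matrix $A^{(p)}$ is computed, this is a relatively simple task:
$$t_p \equiv \frac{E_{\rightarrow N+1}^{(p)}}{E_{\rightarrow 0}^{(p)}} = \frac{1}{a_{11}^{(p)}} \qquad \text{(Multilayer)} \tag{6.71}$$
$$r_p \equiv \frac{E_{\leftarrow 0}^{(p)}}{E_{\rightarrow 0}^{(p)}} = \frac{a_{21}^{(p)}}{a_{11}^{(p)}} \qquad \text{(Multilayer)} \tag{6.72}$$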
The convenience of this notation lies in the fact that we can deal with an arbitrary number of layers $N$ with varying thickness and index. The essential information for each layer is contained succinctly in its respective $2\times 2$ matrix. To find the overall effect of the many layers, we need only multiply the matrices for each layer together to find $A$, and then we can use (6.71) and (6.72) to compute the reflection and transmission coefficients for the whole system.
The derivation for $s$-polarized light is similar to the derivation for $p$-polarized light. The equation corresponding to (6.69) for $s$-polarized light turns out to be
$$\begin{bmatrix}1\\ E_{\leftarrow 0}^{(s)}/E_{\rightarrow 0}^{(s)}\end{bmatrix} = A^{(s)}\begin{bmatrix}E_{\rightarrow N+1}^{(s)}/E_{\rightarrow 0}^{(s)}\\ 0\end{bmatrix} \tag{6.73}$$
where
$$A^{(s)} \equiv \begin{bmatrix}a_{11}^{(s)} & a_{12}^{(s)}\\ a_{21}^{(s)} & a_{22}^{(s)}\end{bmatrix} = \frac{1}{2 n_0\cos\theta_0}\begin{bmatrix}n_0\cos\theta_0 & 1\\ n_0\cos\theta_0 & -1\end{bmatrix}\left(\prod_{j=1}^{N} M_j^{(s)}\right)\begin{bmatrix}1 & 0\\ n_{N+1}\cos\theta_{N+1} & 0\end{bmatrix} \tag{6.74}$$
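and
$$M_j^{(s)} = \begin{bmatrix}\cos\beta_j & -i\sin\beta_j/(n_j\cos\theta_j)\\ -i n_j\cos\theta_j\sin\beta_j & \cos\beta_j\end{bmatrix} \tag{6.75}$$
We can then compute the transmission and reflection coefficients in the same manner that we found the $p$-components:
$$t_s \equiv \frac{E_{\rightarrow N+1}^{(s)}}{E_{\rightarrow 0}^{(s)}} = \frac{1}{a_{11}^{(s)}} \qquad \text{(Multilayer)} \tag{6.76}$$
$$r_s \equiv \frac{E_{\leftarrow 0}^{(s)}}{E_{\rightarrow 0}^{(s)}} = \frac{a_{21}^{(s)}}{a_{11}^{(s)}} \qquad \text{(Multilayer)} \tag{6.77}$$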
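The matrix formalism of this section translates almost line by line into code. The following is a minimal sketch, assuming NumPy and the characteristic matrices in the forms (6.67) and (6.75); the names `layer_matrix` and `multilayer_rt`, and the convention of passing the indices as a list `[n_0, n_1, ..., n_N, n_{N+1}]`, are hypothetical choices made for this illustration. Complex indices are accepted, but the branch of the complex square root used for $\cos\theta_j$ is left to NumPy's default and may need attention for strongly absorbing or evanescent cases.

```python
import numpy as np

def layer_matrix(n, cos_t, beta, pol):
    """Characteristic matrix of one layer: Eq. (6.67) for 'p', Eq. (6.75) for 's'."""
    c, s = np.cos(beta), np.sin(beta)
    if pol == "p":
        return np.array([[c, -1j * s * cos_t / n],
                         [-1j * n * s / cos_t, c]])
    return np.array([[c, -1j * s / (n * cos_t)],
                     [-1j * n * cos_t * s, c]])

def multilayer_rt(n, thickness, lam_vac, theta0=0.0, pol="p"):
    """Return (r, t) for a stack with indices n = [n_0, ..., n_{N+1}]
    and layer thicknesses thickness = [l_1, ..., l_N] (same units as lam_vac)."""
    n = np.asarray(n, dtype=complex)
    # Snell's law (6.53): n_j sin(theta_j) = n_0 sin(theta_0) in every region.
    sin_t = n[0] * np.sin(theta0) / n
    cos_t = np.sqrt(1.0 - sin_t ** 2)
    # Ordered product M_1 M_2 ... M_N, Eq. (6.68).
    M = np.eye(2, dtype=complex)
    for j, ell in enumerate(thickness, start=1):
        beta = 2.0 * np.pi * n[j] * ell * cos_t[j] / lam_vac   # k_j l_j cos(theta_j)
        M = M @ layer_matrix(n[j], cos_t[j], beta, pol)
    if pol == "p":   # Eq. (6.70)
        first = np.array([[n[0], cos_t[0]], [n[0], -cos_t[0]]])
        last = np.array([[cos_t[-1], 0.0], [n[-1], 0.0]])
    else:            # Eq. (6.74)
        first = np.array([[n[0] * cos_t[0], 1.0], [n[0] * cos_t[0], -1.0]])
        last = np.array([[1.0, 0.0], [n[-1] * cos_t[-1], 0.0]])
    A = first @ M @ last / (2.0 * n[0] * cos_t[0])
    return A[1, 0] / A[0, 0], 1.0 / A[0, 0]   # r = a21 / a11, t = 1 / a11

# Example: a quarter-wave layer with n_1 = 1.38 on glass (n = 1.5), normal incidence.
lam = 500e-9
r, t = multilayer_rt([1.0, 1.38, 1.5], [lam / (4 * 1.38)], lam, pol="s")
print(abs(r) ** 2)   # reflectance of the coated surface
```

With no layers at all (`thickness=[]`), the routine reduces to the single-interface Fresnel coefficients, which is a convenient sanity check before trusting it on a long stack.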
6.9 Repeated Multilayer Stacks

In general, high-reflection coatings are designed with alternating high and low refractive indices. For high reflectivity, each layer should have a quarter-wave thickness. That is, we need
$$\beta_j = \frac{\pi}{2} \qquad \text{(high reflector)} \tag{6.78}$$
This amounts to the condition on the thickness of
$$\ell_j = \frac{\lambda_{\rm vac}}{4 n_j\cos\theta_j} \qquad \text{(high reflector)} \tag{6.79}$$
Since the layers alternate high and low indices, at every other boundary there is a phase shift of $\pi$ upon reflection from the interface. Hence, the quarter-wavelength spacing gives maximum reflectivity since the reflected wave in each layer meets the wave in the previous layer in phase. In this situation, the matrix for each layer becomes
$$M_j^{(p)} = \begin{bmatrix}0 & -i\cos\theta_j/n_j\\ -i n_j/\cos\theta_j & 0\end{bmatrix} \qquad \text{(high reflector, } p\text{-polarized)} \tag{6.80}$$
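The matrices for a high and a low refractive index layer are multiplied together in the usual manner. Each layer pair takes the form
$$\begin{bmatrix}0 & -\dfrac{i\cos\theta_H}{n_H}\\ -\dfrac{i n_H}{\cos\theta_H} & 0\end{bmatrix}\begin{bmatrix}0 & -\dfrac{i\cos\theta_L}{n_L}\\ -\dfrac{i n_L}{\cos\theta_L} & 0\end{bmatrix} = \begin{bmatrix}-\dfrac{n_L\cos\theta_H}{n_H\cos\theta_L} & 0\\ 0 & -\dfrac{n_H\cos\theta_L}{n_L\cos\theta_H}\end{bmatrix} \tag{6.81}$$
To extend to $q = N/2$ layer pairs, we have
$$\prod_{j=1}^{N} M_j^{(p)} = \begin{bmatrix}-\dfrac{n_L\cos\theta_H}{n_H\cos\theta_L} & 0\\ 0 & -\dfrac{n_H\cos\theta_L}{n_L\cos\theta_H}\end{bmatrix}^{q} = \begin{bmatrix}\left(-\dfrac{n_L\cos\theta_H}{n_H\cos\theta_L}\right)^{q} & 0\\ 0 & \left(-\dfrac{n_H\cos\theta_L}{n_L\cos\theta_H}\right)^{q}\end{bmatrix} \tag{6.82}$$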
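and using (6.82) we can compute $A^{(p)}$:
$$A^{(p)} = \frac{1}{2}\begin{bmatrix}\left(-\dfrac{n_L\cos\theta_H}{n_H\cos\theta_L}\right)^{q}\dfrac{\cos\theta_{N+1}}{\cos\theta_0} + \left(-\dfrac{n_H\cos\theta_L}{n_L\cos\theta_H}\right)^{q}\dfrac{n_{N+1}}{n_0} & 0\\ \left(-\dfrac{n_L\cos\theta_H}{n_H\cos\theta_L}\right)^{q}\dfrac{\cos\theta_{N+1}}{\cos\theta_0} - \left(-\dfrac{n_H\cos\theta_L}{n_L\cos\theta_H}\right)^{q}\dfrac{n_{N+1}}{n_0} & 0\end{bmatrix} \tag{6.83}$$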
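To get a feel for how quickly the reflectance grows with the number of layer pairs, (6.83) can be evaluated directly. The sketch below assumes $p$-polarized light, real indices, a high-index layer adjacent to the incident medium, and illustrative index values ($n_H = 2.32$, $n_L = 1.38$ on a substrate of index 1.5); the function name `high_reflector_R` is hypothetical.

```python
import math

def high_reflector_R(q, n_H, n_L, n_0=1.0, n_sub=1.5, theta0=0.0):
    """Reflectance |r_p|^2 of q quarter-wave high/low pairs, from Eqs. (6.83) and (6.72)."""
    sin0 = n_0 * math.sin(theta0)

    def cos_in(n):
        # Snell's law (6.53) gives the propagation angle inside a medium of index n.
        return math.sqrt(1.0 - (sin0 / n) ** 2)

    c0, cH, cL, cs = cos_in(n_0), cos_in(n_H), cos_in(n_L), cos_in(n_sub)
    a = (-n_L * cH / (n_H * cL)) ** q * cs / c0
    b = (-n_H * cL / (n_L * cH)) ** q * n_sub / n_0
    a11, a21 = 0.5 * (a + b), 0.5 * (a - b)
    r_p = a21 / a11
    return r_p ** 2

for q in (1, 2, 4, 8):
    print(q, high_reflector_R(q, n_H=2.32, n_L=1.38))
```

For these assumed indices at normal incidence, a single pair reflects roughly 38% of the light, while eight pairs reflect more than 99.9%.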
This stack of $q$ periods can achieve extraordinarily high reflectivity. In the limit of $q \rightarrow \infty$, we have $t_p \rightarrow 0$ and $r_p \rightarrow -1$ from (6.71) and (6.72), giving 100% reflection.
Sometimes multilayer coatings are made with repeated stacks of layers. In general, if the same series of layers in (6.82) is repeated many times, say $q$ times, the following formula known as Sylvester's theorem (see appendix 0.5) comes in handy:
$$\begin{bmatrix}A & B\\ C & D\end{bmatrix}^{q} = \frac{1}{\sin\theta}\begin{bmatrix}A\sin q\theta - \sin(q-1)\theta & B\sin q\theta\\ C\sin q\theta & D\sin q\theta - \sin(q-1)\theta\end{bmatrix} \tag{6.84}$$
where
$$\cos\theta \equiv \frac{1}{2}(A + D) \tag{6.85}$$
This formula relies on the condition $AD - BC = 1$, which is true for matrices of the form (6.67) and (6.75) or any product of them. Here, $A$, $B$, $C$, and $D$ represent the elements of a matrix composed of a block of matrices corresponding to a repeated pattern within the stack.
Many different types of multilayer coatings are possible. For example, a Brewster's-angle polarizer has a coating designed to transmit $p$-polarized light with high efficiency while simultaneously reflecting $s$-polarized light with high efficiency. The back side of the substrate is left uncoated, where $p$-polarized light passes with 100% efficiency at Brewster's angle.
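Sylvester's theorem (6.84) is easy to verify numerically against brute-force matrix multiplication. The sketch below assumes NumPy; the helper names `layer` and `sylvester_power` are hypothetical, and the layer matrix is the normal-incidence form of (6.67). A complex arccosine is used so that the formula also covers the case $|A + D| > 2$, which occurs inside a high-reflection band. The formula breaks down when $\sin\theta = 0$ (that is, $A + D = \pm 2$), which this sketch does not handle.

```python
import numpy as np

def layer(n, beta):
    """Normal-incidence layer matrix of the form (6.67); its determinant is 1."""
    return np.array([[np.cos(beta), -1j * np.sin(beta) / n],
                     [-1j * n * np.sin(beta), np.cos(beta)]])

def sylvester_power(M, q):
    """q-th power of a 2x2 matrix with unit determinant via Eqs. (6.84)-(6.85)."""
    (A, B), (C, D) = M
    theta = np.arccos((A + D) / 2 + 0j)   # complex arccos allows |A + D| > 2
    s_q, s_qm1 = np.sin(q * theta), np.sin((q - 1) * theta)
    return np.array([[A * s_q - s_qm1, B * s_q],
                     [C * s_q, D * s_q - s_qm1]]) / np.sin(theta)

# One repeated block: a high-index layer followed by a low-index layer.
block = layer(2.32, 0.7) @ layer(1.38, 1.1)
direct = np.linalg.matrix_power(block, 5)
via_theorem = sylvester_power(block, 5)
print(np.max(np.abs(direct - via_theorem)))   # agreement to rounding error
```

For large $q$ the closed form avoids repeated multiplication, although for modest $q$ direct multiplication is just as practical.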
Exercises
Exercises for 6.2 Double Boundary Problem Solved Using Fresnel Coefficients
P6.1 You have a 1 micron thick coating of dielectric material ($n = 2$) on a piece of glass ($n = 1.5$). Use a computer to plot the magnitude of the Fresnel coefficient (6.10) from air into the glass at normal incidence. Plot as a function of wavelength for wavelengths between 200 nm and 800 nm (assume the index remains constant over this range).
Exercises for 6.3 Double Boundary Problem at Sub Critical Angles
P6.2 A light wave impinges at normal incidence on a thin glass plate with index $n$ and thickness $d$.
(a) Show that the transmittance through the plate as a function of wavelength is
$$T_{\rm tot} = \frac{1}{1 + \dfrac{(n^2 - 1)^2}{4n^2}\sin^2\left(\dfrac{2\pi n d}{\lambda_{\rm vac}}\right)}$$
HINT: Find $r^{1\to 2} = r^{1\to 0} = -r^{0\to 1} = \dfrac{n - 1}{n + 1}$ and then use $T^{0\to 1} = 1 - R^{0\to 1}$ and $T^{1\to 2} = 1 - R^{1\to 2}$.
(b) If $n = 1.5$, what is the maximum and minimum transmittance through the plate?
(c) If the plate thickness is $d = 150\ \mu$m, what wavelengths transmit with maximum efficiency? HINT: Give a formula involving an integer $N$.
P6.3 Consider the beam splitter introduced in Example 6.1. Show that the maximum reflectance possible from the single coating at the first surface is 46%. Find the smallest possible $d_1$ that accomplishes this for light with wavelength $\lambda_{\rm vac} = 633$ nm.
Exercises for 6.4 Beyond Critical Angle: Tunneling of Evanescent Waves
P6.4 Re-compute (6.32) in the case of $s$-polarized light. Write the result in the same form as the last expression in (6.32). HINT: You need to redo (6.28)–(6.30).
L6.5 Consider $s$-polarized microwaves ($\lambda_{\rm vac} = 3$ cm) encountering an air gap separating two paraffin wax prisms ($n = 1.5$). The 45° right-angle prisms are arranged with the geometry shown in Fig. 6.3. The presence of the second prism frustrates the total internal reflection that would have occurred if the first prism were by itself. This occurs because feedback from the second surface disrupts the evanescent waves.
Figure 6.15
(a) Use a computer to plot the transmittance through the gap as a function of separation $d$ (normal to gap surface). Do not consider reflections from other surfaces of the prisms. HINT: Plot the result of P6.4.
(b) Measure the transmittance of the microwaves through the prisms as a function of spacing $d$ (normal to the surface) and superimpose the results on the graph of part (a). Figure 6.16 shows a plot of some typical data taken with this setup. Presumably experimental error causes some discrepancy, but the trend is clear.
Figure 6.16
Exercises for 6.7 Distinguishing Nearby Wavelengths in a Fabry-Perot Instrument
P6.6 A Fabry-Perot interferometer has silver-coated plates each with reflectance $R = 0.9$, transmittance $T = 0.05$, and absorbance $A = 0.05$. The plate separation is $d = 0.5$ cm with interior index $n_1 = 1$. Suppose that the wavelength being observed near normal incidence is 587 nm.
(a) What is the maximum and minimum transmittance through the interferometer?
(b) What are the free spectral range $\Delta\lambda_{\rm FSR}$ and the fringe width $\Delta\lambda_{\rm FWHM}$?
(c) What is the resolving power?
P6.7 Generate a plot like Fig. 6.10(a), showing the fringes you get in a Fabry-Perot etalon when $\theta_1$ is varied. Let $T_{\rm max} = 1$, $F = 10$, $\lambda = 500$ nm, $d = 1$ cm, and $n_1 = 1$.
(a) Plot $T$ vs. $\theta_1$ over the angular range used in Fig. 6.10(a).
(c) Suppose $d$ was slightly different, say 1.00001 cm. Make a plot of $T$ vs. $\theta_1$ for this situation.
P6.8 Consider the configuration depicted in Fig. 6.9, where the center of the diverging light beam ($\lambda_{\rm vac} = 633$ nm) approaches the plates at normal incidence. Suppose that the spacing of the plates (near $d = 0.5$ cm) is just right to cause a bright fringe to occur at the center. Let $n_1 = 1$. Find the angle for the $m$th circular bright fringe surrounding the central spot (the 0th fringe corresponding to the center). HINT: $\cos\theta \cong 1 - \theta^2/2$. The answer has the form $a\sqrt{m}$; find the value of $a$.
L6.9 Characterize a Fabry-Perot etalon in the laboratory using a HeNe laser ($\lambda_{\rm vac} = 633$ nm). Assume that the bandwidth $\Delta\lambda_{\rm HeNe}$ of the HeNe laser is very narrow compared to the fringe width of the etalon $\Delta\lambda_{\rm FWHM}$. Assume two identical reflective surfaces separated by 5.00 mm. Deduce the free spectral range $\Delta\lambda_{\rm FSR}$, the fringe width $\Delta\lambda_{\rm FWHM}$, the resolving power, and the reflecting finesse (small $f$).
Figure 6.17
L6.10 Use the same Fabry-Perot etalon to observe the Zeeman splitting of the yellow line $\lambda = 587.4$ nm emitted by a krypton lamp when a magnetic field is applied. As the line splits and moves through half of the free spectral range, the peak of the decreasing wavelength and the peak of the increasing wavelength meet on the screen. When this happens, by how much has each wavelength shifted?
Figure 6.18
Exercises for 6.8 Multilayer Coatings
P6.11 (a) Write (6.54) through (6.59) for $s$-polarized light.
(b) From these equations, derive (6.73)–(6.75).
P6.12 Beginning with (6.76) for a single layer between two materials (i.e. two interfaces), derive (6.21). WARNING: This is more work than it may appear at first.
Exercises for 6.9 Repeated Multilayer Stacks
P6.13 (a) What should be the thickness of the high and the low index layers in a periodic high-reflector mirror? Let the light be $p$-polarized and strike the mirror surface at 45°. Take the indices of the layers to be $n_H = 2.32$ and $n_L = 1.38$, deposited on a glass substrate with index $n = 1.5$. Let the wavelength be $\lambda_{\rm vac} = 633$ nm.
(b) Find the reflectance $R$ with 1, 2, 4, and 8 periods in the high-low stack.
P6.14 Find the high-reflector matrix for $s$-polarized light that corresponds to (6.82).
P6.15 Design an anti-reflection coating for use in air (assume the index of air is 1):
(a) Show that for normal incidence and $\lambda/4$ films (thickness $= \frac{1}{4}$ the wavelength of light inside the material), the reflectance of a single layer ($n_1$) coating on a glass is
$$R = \left(\frac{n_g - n_1^2}{n_g + n_1^2}\right)^2$$
(b) Show that for a two coating setup (air-$n_1$-$n_2$-glass; $n_1$ and $n_2$ are each a $\lambda/4$ film),
$$R = \left(\frac{n_2^2 - n_g n_1^2}{n_2^2 + n_g n_1^2}\right)^2$$
(c) If $n_g = 1.5$, and you have a choice of these common coating materials: ZnS ($n = 2.32$), CeF ($n = 1.63$) and MgF ($n = 1.38$), find the combination that gives you the lowest $R$ for part (b). (Be sure to specify which material is $n_1$ and which is $n_2$.) What $R$ does this combination give?
P6.16 Suppose you design a two-coating anti-reflection optic (each coating set for $\lambda/4$, as in the last problem) using $n_1 = 1.6$ and $n_2 = 2.1$. Assume you've got $n_g = 1.5$ and normal incidence. If you design your coatings to be quarter-wave for $\lambda = 550$ nm (in the middle of the visible range), the $R$ that you found in P6.15(b) will be true only for that specific wavelength for two reasons: the index changes with $\lambda$, but more importantly, the thicknesses used in the coatings will not be $\lambda/4$ for other wavelengths. Let's ignore the index change with $\lambda$ and focus on the wavelength dependence. Use the matrix techniques and a computer to plot $R(\lambda_{\rm air})$ for 400 to 700 nm (visible range). Do this for a single bilayer (one layer of each coating), two bilayers, four bilayers, and 25 bilayers.
Chapter 7
Superposition of Quasi-Parallel Plane Waves

7.1 Introduction
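Through the remainder of our study of optics, we will be interested in the superposition of many plane waves, which interfere to make an overall waveform. Such a waveform can be represented as follows:
$$\mathbf{E}(\mathbf{r}, t) = \sum_j \mathbf{E}_j e^{i(\mathbf{k}_j\cdot\mathbf{r} - \omega_j t)} \tag{7.1}$$
The corresponding magnetic field (see (2.54)) is
$$\mathbf{B}(\mathbf{r}, t) = \sum_j \mathbf{B}_j e^{i(\mathbf{k}_j\cdot\mathbf{r} - \omega_j t)} = \sum_j \frac{\mathbf{k}_j\times\mathbf{E}_j}{\omega_j} e^{i(\mathbf{k}_j\cdot\mathbf{r} - \omega_j t)} \tag{7.2}$$
In section 7.2, we show that the intensity of this overall field under certain assumptions can be expressed as
$$I(\mathbf{r}, t) = \frac{n\epsilon_0 c}{2}\,\mathbf{E}(\mathbf{r}, t)\cdot\mathbf{E}^*(\mathbf{r}, t) \tag{7.3}$$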
where $\mathbf{E}(\mathbf{r}, t)$ represents the entire complex expression for the electric field rather than just the real part. Although this expression is reminiscent of (2.60), it should be kept in mind that we previously considered only a single plane wave (perhaps with two distinct polarization components). It may not be immediately obvious, but (7.3) automatically time-averages over rapid oscillations so that $I$ retains only a slowly varying time dependence.
Equation (7.3) is exact only if the vectors $\mathbf{k}_j$ are all parallel. This is not as serious a restriction as it might seem at first. For example, the output of a Michelson interferometer (studied in the next chapter) is the superposition of two fields, each composed of a range of frequencies with parallel $\mathbf{k}_j$'s. We can relax the restriction of parallel $\mathbf{k}_j$'s slightly and apply (7.3) also to plane waves with nearly parallel $\mathbf{k}_j$'s, such as occur in a Young's two-slit diffraction experiment (studied in the next chapter). In such diffraction problems, (7.3) is viewed as an approximation valid to the extent that the vectors $\mathbf{k}_j$ are close to parallel.
In section 7.3 we introduce the concept of group velocity, which is distinct from phase velocity that we encountered previously. As we saw in chapter 2, the real part of refractive index in certain situations can be less than one, indicating superluminal wave crest propagation (i.e. greater than c )! In this case, the group velocity is usually less than c . Group velocity tracks the speed of the interference or ripples resulting from the superposition of multiple waves. Thus, the intensity of a waveform is more connected with the group velocity, rather than the phase velocity.
Nevertheless, it is possible for the group velocity also to become superluminal when absorption or amplification is involved. Group velocity tracks the presence or locus of field energy, which is indirectly influenced by an exchange of energy with the medium. For a complete picture, one must consider the energy stored in both the field and the medium. So-called superluminal pulse propagation occurs when a magician invites the audience to look only at the field energy while energy transfers into and out of the unwatched domain of the medium. Extra field energy can seemingly appear prematurely downstream, but only if there is already non-zero field energy downstream to stimulate a transfer of energy between the field and the medium. As is explained in Appendix 7.A, the actual transport of energy is strictly bounded by $c$; superluminal signal propagation is impossible.
In section 7.4, we reconsider waveforms composed of a continuum of plane waves, each with a distinct frequency $\omega$. We discuss superpositions of plane waves in terms of Fourier theory. (For an introductory overview of Fourier transforms, see section 0.4.) Essentially, a Fourier transform enables us to determine which plane waves are necessary to construct a given waveform $\mathbf{E}(\mathbf{r}_1, t)$. This is important if we want to know what happens to a waveform as it travels from point $\mathbf{r}_1$ to $\mathbf{r}_2$ in a material with a frequency-dependent index. Different frequency components of the waveform experience different phase velocities, causing the waveform to undergo distortion as it propagates, a phenomenon called dispersion. Since we already know how individual plane waves propagate in a material, we can reassemble them at the end of propagation to obtain the new overall pulse $\mathbf{E}(\mathbf{r}_2, t)$ (i.e. by performing an inverse Fourier transform). This procedure is examined in section 7.6 specifically for a light pulse with a Gaussian temporal profile. We shall see that the group velocity tracks the movement of the center of the wave packet. The arguments are presented in a narrowband context where the pulse maintains its characteristic shape while spreading. In section 7.7, we examine group velocity in a generalized broadband context where the wave packet can become severely distorted during propagation.
7.2 Intensity
In this section we justify the expression for intensity given in (7.3). The Poynting vector (2.50) is Re{E (r, t )} Re{B (r, t )} (7.4) S(r, t ) = 0 Upon substitution of (7.1) and (7.2) into the above expression, we obtain S(r, t ) = 1
j ,m m 0
Re E j e i (k j r j t ) km Re Em e i (km rm t )
(7.5)
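For simplicity, we assume that all vectors $\mathbf{k}_j$ are real. If the wave vectors are complex, the same upcoming result can be obtained. In that case, as in (2.60), the field amplitudes $\mathbf{E}_j$ would correspond to local amplitudes (as energy is absorbed or amplified during propagation). Next we apply the BAC-CAB rule (P 0.4) to (7.5) and obtain
$$\mathbf{S}(\mathbf{r},t) = \sum_{j,m} \frac{1}{\omega_m \mu_0} \Big[ \mathbf{k}_m \left( \mathrm{Re}\{\mathbf{E}_j e^{i(\mathbf{k}_j\cdot\mathbf{r}-\omega_j t)}\} \cdot \mathrm{Re}\{\mathbf{E}_m e^{i(\mathbf{k}_m\cdot\mathbf{r}-\omega_m t)}\} \right) - \mathrm{Re}\{\mathbf{E}_m e^{i(\mathbf{k}_m\cdot\mathbf{r}-\omega_m t)}\} \left( \mathrm{Re}\{\mathbf{E}_j e^{i(\mathbf{k}_j\cdot\mathbf{r}-\omega_j t)}\} \cdot \mathbf{k}_m \right) \Big] \tag{7.6}$$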
The last term in (7.6) can be dismissed if all of the $\mathbf{k}_m$ are perpendicular to each of the $\mathbf{E}_j$. This can only be ensured if all k-vectors are parallel to each other. Let us make this rather stringent assumption and drop the last term in (7.6). The magnitude of the Poynting vector then becomes
$$S(\mathbf{r},t) = \epsilon_0 c \sum_{j,m} n_m\, \mathrm{Re}\{\mathbf{E}_j e^{i(\mathbf{k}_j\cdot\mathbf{r}-\omega_j t)}\} \cdot \mathrm{Re}\{\mathbf{E}_m e^{i(\mathbf{k}_m\cdot\mathbf{r}-\omega_m t)}\} \quad \text{(parallel k-vectors)} \tag{7.7}$$
where in accordance with (1.51) and (2.22) we have introduced
$$\frac{k_m}{\omega_m \mu_0} = n_m \epsilon_0 c. \tag{7.8}$$
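Here $n_m$ refers to the refractive index associated with the frequency $\omega_m$. If we assume that the index does not vary dramatically with frequency, we may approximate it as a constant. We usually measure intensity outside of materials (in air or in vacuum), so this approximation is often quite fine. With these approximations the magnitude of the Poynting vector becomes (with the help of (0.30))
$$S(\mathbf{r},t) = n\epsilon_0 c \sum_{j,m} \frac{\mathbf{E}_j e^{i(\mathbf{k}_j\cdot\mathbf{r}-\omega_j t)} + \mathbf{E}_j^* e^{-i(\mathbf{k}_j\cdot\mathbf{r}-\omega_j t)}}{2} \cdot \frac{\mathbf{E}_m e^{i(\mathbf{k}_m\cdot\mathbf{r}-\omega_m t)} + \mathbf{E}_m^* e^{-i(\mathbf{k}_m\cdot\mathbf{r}-\omega_m t)}}{2}$$
$$= \frac{n\epsilon_0 c}{4} \sum_{j,m} \Big[ \mathbf{E}_j\cdot\mathbf{E}_m\, e^{i[(\mathbf{k}_j+\mathbf{k}_m)\cdot\mathbf{r}-(\omega_j+\omega_m)t]} + \mathbf{E}_j^*\cdot\mathbf{E}_m^*\, e^{-i[(\mathbf{k}_j+\mathbf{k}_m)\cdot\mathbf{r}-(\omega_j+\omega_m)t]} + \mathbf{E}_j\cdot\mathbf{E}_m^*\, e^{i[(\mathbf{k}_j-\mathbf{k}_m)\cdot\mathbf{r}-(\omega_j-\omega_m)t]} + \mathbf{E}_j^*\cdot\mathbf{E}_m\, e^{-i[(\mathbf{k}_j-\mathbf{k}_m)\cdot\mathbf{r}-(\omega_j-\omega_m)t]} \Big]$$
(parallel k-vectors, constant $n$) (7.9)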
Notice that each of the first two terms in (7.9) oscillates very rapidly (at frequency $\omega_j + \omega_m$). The time average of these terms goes to zero. The second two terms oscillate slowly, or not at all if $j = m$. Taking the time average over the rapid oscillation in (7.9), we then get
$$\langle S(\mathbf{r},t)\rangle_{\text{osc.}} = \frac{n\epsilon_0 c}{4} \sum_{j,m} \Big[ \mathbf{E}_j\cdot\mathbf{E}_m^*\, e^{i[(\mathbf{k}_j-\mathbf{k}_m)\cdot\mathbf{r}-(\omega_j-\omega_m)t]} + \mathbf{E}_j^*\cdot\mathbf{E}_m\, e^{-i[(\mathbf{k}_j-\mathbf{k}_m)\cdot\mathbf{r}-(\omega_j-\omega_m)t]} \Big]$$
$$= \frac{n\epsilon_0 c}{2}\, \mathrm{Re}\Big\{ \sum_j \mathbf{E}_j e^{i(\mathbf{k}_j\cdot\mathbf{r}-\omega_j t)} \cdot \sum_m \mathbf{E}_m^* e^{-i(\mathbf{k}_m\cdot\mathbf{r}-\omega_m t)} \Big\} = \frac{n\epsilon_0 c}{2}\, \mathrm{Re}\{\mathbf{E}(\mathbf{r},t)\cdot\mathbf{E}^*(\mathbf{r},t)\}.$$
(parallel k-vectors, constant $n$, time-averaged over rapid oscillations) (7.10)
In writing the final line we have again invoked (7.1). Notice that the expression $\mathbf{E}(\mathbf{r},t)\cdot\mathbf{E}^*(\mathbf{r},t)$ is already real. Therefore, we may drop the function Re[], and (7.3) is verified. The assumptions behind (7.3) are now clear. In dropping the vector symbol from all $\mathbf{k}_m$ to get (7.7) we assumed that all $\mathbf{k}_m$ are nearly parallel to each other. If some of the $\mathbf{k}_m$ point in an anti-parallel direction, we can still proceed with the above approximations but with negative signs entered explicitly into (7.7) for those components. For example, a standing wave has no net flow of energy and the net Poynting vector is zero. This brings out the distinction between irradiance $S$ and intensity $I$. Intensity is a measure of what atoms feel, which is not zero for standing waves. On the other hand, $S$ is identically zero for standing waves because there is no net flow of energy. Thus, we often apply (7.10) to standing waves (technically incorrect in the above context), but we refer to the result as intensity instead of irradiance or Poynting flux. We do this because for many experiments it is not important whether the field is traveling or standing; it is only important that atoms locally experience an oscillating electric field. At extreme intensities, however, where the influence of the magnetic field becomes comparable to that of the electric field, the distinction between propagating and standing fields can become important. In summary, the intensity of the field (time-averaged over rapid oscillations) may be expressed approximately as
$$I(\mathbf{r},t) \cong \frac{n\epsilon_0 c}{2}\, \mathbf{E}(\mathbf{r},t)\cdot\mathbf{E}^*(\mathbf{r},t) \quad \text{(parallel or antiparallel k-vectors, constant } n\text{)} \tag{7.11}$$
where $\mathbf{E}(\mathbf{r},t)$ is entered in complex format.
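The time-averaging argument of this section can be checked numerically. The following is a minimal sketch, assuming two equal-amplitude plane waves in vacuum ($n = 1$) observed at $\mathbf{r} = 0$; the frequencies, the amplitude, and the function names are illustrative choices, not values from the text. It compares the complex-field expression for intensity with $\epsilon_0 c\,(\mathrm{Re}\,E)^2$ averaged over one period of the mean frequency.

```python
import numpy as np

EPS0 = 8.8541878128e-12  # vacuum permittivity (F/m)
C = 299792458.0          # speed of light in vacuum (m/s)

# Assumed illustrative parameters: two equal-amplitude waves, n = 1, r = 0.
E0 = 1.0                   # field amplitude (V/m)
W1, W2 = 2.35e15, 2.40e15  # angular frequencies (rad/s)

def intensity_complex(t):
    """Intensity from the complex field, (eps0 c / 2) E E*, with n = 1."""
    E = E0 * (np.exp(-1j * W1 * t) + np.exp(-1j * W2 * t))
    return 0.5 * EPS0 * C * (E * np.conj(E)).real

def poynting_cycle_average(t0, samples=4000):
    """Average eps0 c (Re E)^2 over one period of the mean frequency, centered on t0."""
    period = 2 * np.pi / (0.5 * (W1 + W2))
    t = t0 + period * (np.arange(samples) / samples - 0.5)
    E_real = E0 * (np.cos(W1 * t) + np.cos(W2 * t))
    return np.mean(EPS0 * C * E_real**2)

beat_period = 2 * np.pi / abs(W2 - W1)
peak = intensity_complex(0.0)
for t0 in (0.0, 0.25 * beat_period, 0.5 * beat_period):
    print(t0, intensity_complex(t0), poynting_cycle_average(t0))
```

The two columns should agree to within about a percent of the peak value; the residual comes from averaging over a window of finite length, so that the terms oscillating at $2\omega_1$ and $2\omega_2$ do not cancel exactly.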
7.3 Group vs. Phase Velocity: Sum of Two Plane Waves

Consider the sum of two plane waves with equal amplitudes:
$$\mathbf{E}(\mathbf{r},t) = \mathbf{E}_0 e^{i(\mathbf{k}_1\cdot\mathbf{r}-\omega_1 t)} + \mathbf{E}_0 e^{i(\mathbf{k}_2\cdot\mathbf{r}-\omega_2 t)} \tag{7.12}$$
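As we previously studied (see P 1.10), the velocities of the individual wave crests are
$$v_{p1} = \omega_1/k_1, \qquad v_{p2} = \omega_2/k_2 \tag{7.13}$$
These are known as the phase velocities of the individual plane waves. As the two plane waves propagate, they interfere, giving regions of higher and lower intensity. As we now show, the peaks in the intensity distribution (7.11) can move at a velocity quite different from the phase velocities in (7.13). The intensity associated with (7.12) is computed as follows:
$$I(\mathbf{r},t) = \frac{n\epsilon_0 c}{2}\, \mathbf{E}_0\cdot\mathbf{E}_0^* \left[ e^{i(\mathbf{k}_1\cdot\mathbf{r}-\omega_1 t)} + e^{i(\mathbf{k}_2\cdot\mathbf{r}-\omega_2 t)} \right] \left[ e^{-i(\mathbf{k}_1\cdot\mathbf{r}-\omega_1 t)} + e^{-i(\mathbf{k}_2\cdot\mathbf{r}-\omega_2 t)} \right]$$
$$= \frac{n\epsilon_0 c}{2}\, \mathbf{E}_0\cdot\mathbf{E}_0^* \left[ 2 + e^{i[(\mathbf{k}_2-\mathbf{k}_1)\cdot\mathbf{r}-(\omega_2-\omega_1)t]} + e^{-i[(\mathbf{k}_2-\mathbf{k}_1)\cdot\mathbf{r}-(\omega_2-\omega_1)t]} \right]$$
$$= n\epsilon_0 c\, \mathbf{E}_0\cdot\mathbf{E}_0^* \left[ 1 + \cos\left[(\mathbf{k}_2-\mathbf{k}_1)\cdot\mathbf{r}-(\omega_2-\omega_1)t\right] \right] = n\epsilon_0 c\, \mathbf{E}_0\cdot\mathbf{E}_0^* \left[ 1 + \cos(\Delta\mathbf{k}\cdot\mathbf{r}-\Delta\omega\, t) \right] \tag{7.14}$$
where
$$\Delta\mathbf{k} \equiv \mathbf{k}_2 - \mathbf{k}_1, \qquad \Delta\omega \equiv \omega_2 - \omega_1 \tag{7.15}$$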
Keep in mind that this intensity is averaged over rapid oscillations. The solid line in Fig. 7.1 shows this time-averaged version of the intensity given by the above expression. The dashed line shows the intensity with the rapid oscillations retained, according to (7.9). It is left as an exercise (see P 7.3) to show that the rapid-oscillation peaks in Fig. 7.1 (dashed) move at the average of the phase velocities in (7.13). An examination of (7.14) reveals that the time-averaged curve in Fig. 7.1 (solid) travels with speed
$$v_g \equiv \frac{\Delta\omega}{\Delta k} \tag{7.16}$$
This is known as the group velocity. Essentially, $v_g$ may be thought of as the velocity for the envelope that encloses the rapid oscillations. In general, $v_g$ and $v_p$ are not the same. This means that as the waveform propagates, the rapid oscillations move within the larger modulation pattern, for example, continually disappearing at the front and reappearing at the back of each modulation. The presence of field energy (which gives rise to intensity) is clearly tied more to $v_g$ than to $v_p$. The group velocity is identified with the propagation of overall waveforms. As an example of the behavior of group velocity, consider the propagation of two plane waves in a plasma (see P 2.7) for which the index is real over a range of frequencies. The index of refraction is given by
$$n_{\text{plasma}}(\omega) = \sqrt{1 - \omega_p^2/\omega^2} < 1 \quad \text{(assuming } \omega > \omega_p\text{)} \tag{7.17}$$
Figure 7.1 Intensity of two interfering plane waves. The solid line shows intensity averaged over rapid oscillations.
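The phase velocity for each frequency is computed by
$$v_{p1} = c/n_{\text{plasma}}(\omega_1), \qquad v_{p2} = c/n_{\text{plasma}}(\omega_2) \tag{7.18}$$
Since $n_{\text{plasma}} < 1$, both of these velocities exceed $c$. However, the group velocity is
$$v_g = \frac{\Delta\omega}{\Delta k} \cong \frac{d\omega}{dk} = \left(\frac{dk}{d\omega}\right)^{-1} = \left[\frac{d}{d\omega}\left(\frac{\omega\, n_{\text{plasma}}(\omega)}{c}\right)\right]^{-1} = n_{\text{plasma}}(\omega)\, c \tag{7.19}$$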
which is clearly less than $c$ (deriving the final expression in (7.19) from the previous one is left as an exercise). For convenience, we have taken $\omega_1$ and $\omega_2$ to lie very close to each other. This example shows that in an environment where the index of refraction is real (i.e. no net exchange of energy with the medium), the group velocity does not exceed $c$, although the phase velocity does. The group velocity tracks the presence of field energy, whether that energy propagates or is extracted from a material. The universal speed limit $c$ is always obeyed in energy transportation. The fact that the phase velocity can exceed $c$ should not disturb students. In the above example, the fast-moving phase oscillations result merely from an interplay between the field and the plasma. In a similar sense, the intersection of an ocean wave with the shoreline can also exceed $c$, if different points on the wave front happen to strike the shore nearly simultaneously. The point of intersection between the wave and the shoreline does not constitute an actual object under motion. Similarly, wave crests of individual plane waves do not necessarily constitute actual objects that are moving; in general, $v_p$ is not the relevant speed at which events upstream influence events downstream. From another perspective, individual plane waves have infinite length and infinite duration. They do not exist in isolation except in our imagination. All real waveforms are composed of a range of frequency components, and so interference always happens. Energy is associated with regions of constructive interference between those waves.
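The plasma example can be reproduced with a few lines of arithmetic. The sketch below assumes illustrative values for the plasma frequency and the two wave frequencies (they are not taken from the text) and evaluates the phase velocity from (7.18) and the group velocity from the finite-difference form (7.16).

```python
import numpy as np

C = 299792458.0  # speed of light in vacuum (m/s)

def n_plasma(omega, omega_p):
    """Plasma refractive index (7.17), valid for omega > omega_p."""
    return np.sqrt(1.0 - (omega_p / omega) ** 2)

def k_plasma(omega, omega_p):
    """Wave number k = n(omega) * omega / c."""
    return n_plasma(omega, omega_p) * omega / C

# Assumed illustrative values (not from the text)
omega_p = 1.0e15                # plasma frequency (rad/s)
omega_1 = 2.0e15                # first wave (rad/s)
omega_2 = omega_1 * (1 + 1e-6)  # second wave, very close to the first

# Phase velocity of the first wave, as in (7.18)
v_p = omega_1 / k_plasma(omega_1, omega_p)
# Group velocity as the ratio of differences, as in (7.16)
v_g = (omega_2 - omega_1) / (k_plasma(omega_2, omega_p) - k_plasma(omega_1, omega_p))

print(f"n_plasma = {n_plasma(omega_1, omega_p):.6f}")
print(f"v_p / c  = {v_p / C:.6f}")
print(f"v_g / c  = {v_g / C:.6f}")
```

With these numbers $v_p$ exceeds $c$ while $v_g$ stays below it, and $v_g/c$ agrees with $n_{\text{plasma}}$ as (7.19) predicts. For this particular index the product $v_p v_g$ equals $c^2$.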
If there is an exchange of energy between the field and the medium (i.e. if the index of refraction is complex), $v_g$ still describes where field energy may be found, but it does not give the whole story in terms of energy flow (addressed in Appendix 7.A).
7.4 Frequency Spectrum of Light

We continue our study of waveforms. An arbitrary waveform can be constructed from a superposition of plane waves. The discrete summation in (7.1) is of limited use, since a waveform constructed from a discrete sum must eventually repeat over and over. To create a waveform that does not repeat (e.g. a single laser pulse or, technically speaking, any waveform that exists in the physical world) a continuum of plane waves is necessary. Several examples of waveforms are shown in Fig. 7.2. To construct non-repeating waveforms, the summation in (7.1) must be replaced by an integral, and the waveform at a point $\mathbf{r}$ can be expressed as
$$\mathbf{E}(\mathbf{r},t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \mathbf{E}(\mathbf{r},\omega)\, e^{-i\omega t}\, d\omega \tag{7.20}$$
The function $\mathbf{E}(\mathbf{r},\omega)$ has units of field per frequency. It gives the contribution of each frequency component to the overall waveform and includes all spatial dependence such as the factor $\exp\{i\mathbf{k}(\omega)\cdot\mathbf{r}\}$. The function $\mathbf{E}(\mathbf{r},\omega)$ is distinguished from the function $\mathbf{E}(\mathbf{r},t)$ by its argument (i.e. $\omega$ instead of $t$). The factor $1/\sqrt{2\pi}$ is introduced to match our Fourier transform convention. Given knowledge of $\mathbf{E}(\mathbf{r},\omega)$, the waveform $\mathbf{E}(\mathbf{r},t)$ can be constructed. Similarly, if the waveform $\mathbf{E}(\mathbf{r},t)$ is known, the field per frequency can be obtained via
$$\mathbf{E}(\mathbf{r},\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \mathbf{E}(\mathbf{r},t)\, e^{i\omega t}\, dt \tag{7.21}$$
Isaac Newton (1643–1727, English) Newton demonstrated that white light is composed of many different colors. He realized that the amount of refraction experienced by light depends on its color, so that refracting telescopes would suffer from chromatic aberration. He advanced a corpuscular theory of light, although his notion of light particles bears little resemblance to the modern notion of light quanta.
This operation, which produces $\mathbf{E}(\mathbf{r},\omega)$ from $\mathbf{E}(\mathbf{r},t)$, is called a Fourier transform. The operation (7.20) is called the inverse Fourier transform. For a review of Fourier theory, see section 0.4. Even though $\mathbf{E}(\mathbf{r},t)$ can be written as a real function (since, after all, only the real part is relevant), $\mathbf{E}(\mathbf{r},\omega)$ is in general complex. The real and imaginary parts of $\mathbf{E}(\mathbf{r},\omega)$ keep track of how much cosine and how much sine, respectively, make up $\mathbf{E}(\mathbf{r},t)$. Keep in mind that both positive and negative frequency components go into the cosine and sine according to (0.19). Therefore, it should not seem strange that we integrate (7.20) over all frequencies, both positive and negative. If $\mathbf{E}(\mathbf{r},t)$ is taken to be a real function, then we have the symmetry relation
$$\mathbf{E}(\mathbf{r},-\omega) = \mathbf{E}^*(\mathbf{r},\omega) \quad \text{(if } \mathbf{E}(\mathbf{r},t) \text{ is real)} \tag{7.22}$$
However, often $\mathbf{E}(\mathbf{r},t)$ is written in complex notation, where taking the real part is implied. For example, the real waveform
$$\mathbf{E}_r(\mathbf{r},t) = \mathbf{E}_{0r}(\mathbf{r})\, e^{-t^2/2\tau^2} \cos(\omega_0 t - \alpha) \tag{7.23}$$
is usually written as
$$\mathbf{E}_c(\mathbf{r},t) = \mathbf{E}_{0c}(\mathbf{r})\, e^{-t^2/2\tau^2}\, e^{-i\omega_0 t} \tag{7.24}$$
where $\mathbf{E}_r(\mathbf{r},t) = \mathrm{Re}\{\mathbf{E}_c(\mathbf{r},t)\}$. The phase $\alpha$ is hidden within the complex amplitude $\mathbf{E}_{0c}(\mathbf{r})$, where in writing (7.23) we have assumed (for simplicity) that each field vector component contains the same phase. This waveform is shown in Fig. 7.2 for various parameters $\tau$. Consider the Fourier transforms of the waveform (7.23). Upon applying (7.21) we get (see P 0.27)
$$\mathbf{E}_r(\mathbf{r},\omega) = \tau\, \mathbf{E}_{0r}(\mathbf{r})\, \frac{e^{-i\alpha}\, e^{-\tau^2(\omega+\omega_0)^2/2} + e^{i\alpha}\, e^{-\tau^2(\omega-\omega_0)^2/2}}{2} \tag{7.25}$$
Similarly, the Fourier transform of (7.24), i.e. the complex version of the same waveform, is
$$\mathbf{E}_c(\mathbf{r},\omega) = \tau\, \mathbf{E}_{0c}(\mathbf{r})\, e^{-\tau^2(\omega-\omega_0)^2/2} \tag{7.26}$$
Figure 7.2 (a) Electric field (7.23) with $\tau = T/4$, where $T$ is the period of the carrier frequency: $T = 2\pi/\omega_0$. (b) Electric field (7.23) with $\tau = 2T$. (c) Electric field (7.23) with $\tau = 5T$.
The latter transform is less cumbersome to perform, and for this reason is more often used. Figure 7.3 shows graphs of $|\mathbf{E}_r(\mathbf{r},\omega)|^2$ associated with the waveforms in Fig. 7.2. Figure 7.4 shows graphs of $\mathbf{E}_c(\mathbf{r},\omega)\cdot\mathbf{E}_c^*(\mathbf{r},\omega)/2$ obtained from the complex versions of the same waveforms. The graphs show the power spectra of the field (aside from some multiplicative constants). A waveform that lasts for a brief interval of time (i.e. small $\tau$) has the widest spectral distribution in the frequency domain. In Figs. 7.3a and 7.4a, we have chosen an extremely short waveform (perhaps even physically difficult to create, with $\tau = \pi/(2\omega_0)$, see Fig. 7.2a) to illustrate the distinction between working with the real and the complex representations of the field. Notice that the Fourier transform (7.25) of the real field depicted in Fig. 7.3 obeys the symmetry relation (7.22), whereas the Fourier transform of the complex field (7.26) does not. Essentially, the power spectrum of the complex representation of the field can be understood to be twice the power spectrum of the real representation, but plotted only for the positive frequencies. This works well as long as the spectrum is well localized so that there is essentially no spectral amplitude near $\omega = 0$ (i.e. no DC component). This is not the case in Figs. 7.3a and 7.4a. Because the waveform is extremely short in time, the extraordinarily wide spectral peaks spread to the origin, and Fig. 7.4a does not accurately depict the positive-frequency side of Fig. 7.3a since the two peaks merge into each other. In practice, we almost never run into this problem in optics (i.e. waveforms are typically much longer in time). For one thing, in the above examples, the waveform or pulse duration is so short that there is only about one oscillation within the pulse. Typically, there are several oscillations within a waveform and no DC component.
Throughout the remainder of this book, we shall assume that the frequency spread is localized around $\omega_0$, so that we can use the complex representation with impunity. The intensity defined by (7.3) is also useful for the continuous superposition of plane waves as defined by the inverse Fourier transform (7.20). We can plug in the expression for the field in complex format. The intensity in (7.3) takes care of the time-average over rapid oscillations. While this is very convenient, it also points out why the complex notation should not be used for extremely short waveforms (e.g. for optical pulses a few femtoseconds long): there needs to be a sufficient number of oscillations within the waveform to make the rapid time average meaningful (as opposed to that in Fig. 7.2a).
Figure 7.3 (a) Power spectrum based on (7.25) with $\tau = T/4$, where $T$ is the period of the carrier frequency: $T = 2\pi/\omega_0$. (b) Power spectrum based on (7.25) with $\tau = 2T$. (c) Power spectrum based on (7.25) with $\tau = 5T$.
Figure 7.4 (a) Power spectrum based on (7.26) with $\tau = T/4$, where $T$ is the period of the carrier frequency: $T = 2\pi/\omega_0$. (b) Power spectrum based on (7.26) with $\tau = 2T$. (c) Power spectrum based on (7.26) with $\tau = 5T$.
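Parseval's theorem (see P 0.31) imposes an interesting connection between the time-integral of the intensity and the frequency-integral of the power spectrum:
$$\int_{-\infty}^{\infty} I(\mathbf{r},t)\, dt = \int_{-\infty}^{\infty} I(\mathbf{r},\omega)\, d\omega \tag{7.27}$$
where
$$I(\mathbf{r},t) \equiv \frac{n\epsilon_0 c}{2}\, \mathbf{E}(\mathbf{r},t)\cdot\mathbf{E}^*(\mathbf{r},t), \qquad I(\mathbf{r},\omega) \equiv \frac{n\epsilon_0 c}{2}\, \mathbf{E}(\mathbf{r},\omega)\cdot\mathbf{E}^*(\mathbf{r},\omega) \tag{7.28}$$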
The power spectrum $I(\mathbf{r},\omega)$ is observed when the waveform is sent into a spectral analyzer such as a diffraction spectrometer. Please excuse the potentially confusing notation (in wide usage): $I(\mathbf{r},\omega)$ is not the Fourier transform of $I(\mathbf{r},t)$!
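The Gaussian transform pair and Parseval's theorem can be checked by direct numerical integration. The sketch below is only an illustration: it assumes dimensionless units with $\tau = 1$ and $\omega_0 = 10$, sets the complex amplitude to one, and evaluates the Fourier transform (7.21) as a Riemann sum rather than with an FFT, so that the sign and normalization conventions of this section are followed literally.

```python
import numpy as np

# Dimensionless units (an assumption for illustration): tau = 1, carrier omega0 = 10.
tau, omega0 = 1.0, 10.0

t = np.arange(-10.0, 10.0, 0.01)             # time grid
dt = t[1] - t[0]
omega = omega0 + np.arange(-6.0, 6.0, 0.05)  # frequency grid around the carrier
domega = omega[1] - omega[0]

# Complex Gaussian pulse of the form (7.24), with unit amplitude
E_t = np.exp(-t**2 / (2 * tau**2)) * np.exp(-1j * omega0 * t)

# Fourier transform (7.21) as a Riemann sum:
# E(w) = (1/sqrt(2 pi)) * sum over t of E(t) exp(+i w t) dt
E_w = (np.exp(1j * np.outer(omega, t)) @ E_t) * dt / np.sqrt(2 * np.pi)

# Closed-form spectrum of the form (7.26)
E_w_exact = tau * np.exp(-tau**2 * (omega - omega0)**2 / 2)

# Parseval (7.27); the common factor n eps0 c / 2 is dropped from both sides
time_integral = np.sum(np.abs(E_t)**2) * dt
freq_integral = np.sum(np.abs(E_w)**2) * domega

print(np.max(np.abs(E_w - E_w_exact)))
print(time_integral, freq_integral)
```

Both integrals should come out equal to $\sqrt{\pi}\,\tau$ for this pulse, and the numerical spectrum should match the closed form to within rounding error.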
7.5 Group Delay of a Wave Packet

When all k-vectors associated with a waveform point in the same direction, it becomes straightforward to predict the form of a pulse at different locations given knowledge of the waveform at another. Being able to predict the shape and arrival time of a waveform is very important since a waveform traversing a material such as glass can undergo significant temporal dispersion as different frequency components experience different indices of refraction. For example, an ultra-short laser pulse traversing a glass window or a lens can emerge with significantly longer duration, owing to this effect. An example of this is given in the next section. The Fourier transform (7.21) gives the amplitudes of the individual plane wave components making up a waveform. We already know how to propagate individual plane waves through a material (see (2.23)). A phase shift associated with a displacement $\Delta\mathbf{r}$ modifies the field according to
$$\mathbf{E}(\mathbf{r}_0 + \Delta\mathbf{r}, \omega) = \mathbf{E}(\mathbf{r}_0, \omega)\, e^{i\mathbf{k}(\omega)\cdot\Delta\mathbf{r}} \tag{7.29}$$
The k-vector contains the pertinent information about the material via $k = n(\omega)\,\omega/c$. (A complex wave vector $\mathbf{k}$ may also be used if absorption or amplification is present.) The procedure for finding what happens to a pulse when it propagates through a material is clear. Take the Fourier transform of the known incident pulse $\mathbf{E}(\mathbf{r}_0, t)$ to find the plane-wave coefficients $\mathbf{E}(\mathbf{r}_0, \omega)$ at the beginning of propagation. Apply the phase adjustment in (7.29) to find the plane-wave coefficients $\mathbf{E}(\mathbf{r}_0 + \Delta\mathbf{r}, \omega)$ at the end of propagation. Then take the inverse Fourier transform to determine the waveform $\mathbf{E}(\mathbf{r}_0 + \Delta\mathbf{r}, t)$ at the new position:
$$\mathbf{E}(\mathbf{r}_0 + \Delta\mathbf{r}, t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \mathbf{E}(\mathbf{r}_0 + \Delta\mathbf{r}, \omega)\, e^{-i\omega t}\, d\omega = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \mathbf{E}(\mathbf{r}_0, \omega)\, e^{i(\mathbf{k}(\omega)\cdot\Delta\mathbf{r} - \omega t)}\, d\omega \tag{7.30}$$
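John Strutt (3rd Baron Rayleigh) (1842–1919, British) As head of the Cavendish laboratory, Rayleigh studied a wide variety of subjects. He developed the notion of group velocity and used it to understand the propagation of vibration in numerous systems. He won the Nobel prize in physics in 1904.
The exponent in (7.29) is called the phase delay for the pulse propagation. It is often expanded in a Taylor series about a carrier frequency $\bar\omega$:
$$\mathbf{k}\cdot\Delta\mathbf{r} \cong \left[ \mathbf{k}\big|_{\bar\omega} + \frac{\partial\mathbf{k}}{\partial\omega}\bigg|_{\bar\omega} (\omega - \bar\omega) + \frac{1}{2} \frac{\partial^2\mathbf{k}}{\partial\omega^2}\bigg|_{\bar\omega} (\omega - \bar\omega)^2 + \cdots \right] \cdot \Delta\mathbf{r} \tag{7.31}$$
The k-vector has a sometimes-complicated frequency dependence through the functional form of $n(\omega)$. If we retain only the first two terms in this expansion then (7.30) becomes
$$\mathbf{E}(\mathbf{r}_0 + \Delta\mathbf{r}, t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \mathbf{E}(\mathbf{r}_0, \omega)\, e^{i\left\{\left[\mathbf{k}(\bar\omega) + \frac{\partial\mathbf{k}}{\partial\omega}\big|_{\bar\omega}(\omega - \bar\omega)\right]\cdot\Delta\mathbf{r} - \omega t\right\}}\, d\omega$$
$$= e^{i\left[\mathbf{k}(\bar\omega) - \bar\omega \frac{\partial\mathbf{k}}{\partial\omega}\big|_{\bar\omega}\right]\cdot\Delta\mathbf{r}}\, \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \mathbf{E}(\mathbf{r}_0, \omega)\, e^{-i\omega\left(t - \frac{\partial\mathbf{k}}{\partial\omega}\big|_{\bar\omega}\cdot\Delta\mathbf{r}\right)}\, d\omega$$
$$= e^{i[\mathbf{k}(\bar\omega)\cdot\Delta\mathbf{r} - \bar\omega t']}\, \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \mathbf{E}(\mathbf{r}_0, \omega)\, e^{-i\omega(t - t')}\, d\omega \tag{7.32}$$
where in the last line we have used the definition
$$t' \equiv \frac{\partial\mathbf{k}}{\partial\omega}\bigg|_{\bar\omega} \cdot \Delta\mathbf{r}. \tag{7.33}$$
If we assume that the imaginary part of $\mathbf{k}$ is constant near $\bar\omega$ so that $t'$ is real, i.e.
$$t' = \frac{\partial\, \mathrm{Re}\,\mathbf{k}}{\partial\omega}\bigg|_{\bar\omega} \cdot \Delta\mathbf{r} \tag{7.34}$$
then the last integral in (7.32) is simply the inverse Fourier transform of the original pulse spectrum with a new time argument, so we can carry out the integral to obtain
$$\mathbf{E}(\mathbf{r}_0 + \Delta\mathbf{r}, t) = e^{i[\mathbf{k}(\bar\omega)\cdot\Delta\mathbf{r} - \bar\omega t']}\, \mathbf{E}(\mathbf{r}_0, t - t') \tag{7.35}$$
The first term in (7.35) gives an overall phase shift due to propagation, and is related to the phase velocity of the carrier frequency (see (7.18)):
$$\frac{1}{v_p(\bar\omega)} = \frac{k(\bar\omega)}{\bar\omega} \tag{7.36}$$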
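To compare the intensity profile of the pulse at $\mathbf{r}_0 + \Delta\mathbf{r}$ with the profile at $\mathbf{r}_0$ we compute the square magnitude of (7.35):
$$I(\mathbf{r}_0 + \Delta\mathbf{r}, t) \propto \left|\mathbf{E}(\mathbf{r}_0, t - t')\right|^2 e^{-2\,\mathrm{Im}\,\mathbf{k}(\bar\omega)\cdot\Delta\mathbf{r}} \tag{7.37}$$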
In (7.37) we see that (to first order) $t'$ is the time required for the pulse to traverse the displacement $\Delta\mathbf{r}$. The exponential in (7.37) describes the amplitude of the pulse at the new point, which may have changed during propagation due to absorption. The function $\partial\,\mathrm{Re}\,\mathbf{k}/\partial\omega \cdot \Delta\mathbf{r}$ is known as the group delay function, and in (7.34) it is evaluated only at the carrier frequency $\bar\omega$. Traditional group velocity is obtained by dividing the displacement $\Delta r$ by the group delay time $t'$ to obtain
$$\frac{1}{v_g(\bar\omega)} = \frac{\partial\, \mathrm{Re}\{k(\omega)\}}{\partial\omega}\bigg|_{\bar\omega} \tag{7.38}$$

Group delay (or group velocity) essentially tracks the center of the packet. In our derivation we have assumed that the phase delay $\mathbf{k}(\omega)\cdot\Delta\mathbf{r}$ could be well represented by the first two terms of the expansion (7.31). While this assumption gives results that are often useful, the other terms also play a role. In section 7.6 we'll study what happens if you keep the next higher order term in the expansion. We'll find that this term controls the rate at which the wave packet spreads as it travels. We should also note that there are times when the expansion (7.31) fails to converge (usually when $\bar\omega$ is near a resonance of the medium), and the expansion approach is not valid. We'll address how to analyze pulse propagation for these situations in section 7.7.
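The three-step procedure of this section (transform, apply the phase delay, transform back) can be sketched numerically. The example below assumes a Gaussian spectrum of the form (7.26) in dimensionless units and a wave number that is exactly linear in frequency; the values of `k_bar`, `dk_domega`, and `delta_r` are hypothetical. Under these assumptions the intensity profile should arrive unchanged in shape, shifted by the group delay $t'$.

```python
import numpy as np

# Dimensionless illustrative values (assumptions): tau = 1, carrier frequency = 10.
tau, omega_bar = 1.0, 10.0
k_bar = 12.0      # k evaluated at the carrier (hypothetical)
dk_domega = 1.5   # dk/domega at the carrier, i.e. 1/v_g (hypothetical)
delta_r = 4.0     # propagation distance

omega = omega_bar + np.arange(-8.0, 8.0, 0.02)
domega = omega[1] - omega[0]
t = np.arange(-5.0, 15.0, 0.01)

# Spectrum of the incident Gaussian pulse, of the form (7.26)
E_w0 = tau * np.exp(-tau**2 * (omega - omega_bar)**2 / 2)

# Phase delay (7.29), with k kept to first order in (omega - omega_bar)
k = k_bar + dk_domega * (omega - omega_bar)
E_w1 = E_w0 * np.exp(1j * k * delta_r)

def to_time(E_w):
    """Inverse Fourier transform (7.20) evaluated as a Riemann sum."""
    return (np.exp(-1j * np.outer(t, omega)) @ E_w) * domega / np.sqrt(2 * np.pi)

I0 = np.abs(to_time(E_w0))**2   # intensity profile before propagation
I1 = np.abs(to_time(E_w1))**2   # intensity profile after propagation

t_prime = dk_domega * delta_r   # group delay, as in (7.33)
peak_shift = t[np.argmax(I1)] - t[np.argmax(I0)]
print(t_prime, peak_shift)
```

Because the wave number here is real, the peak height is preserved; adding an imaginary part to `k_bar` would scale the output intensity by the exponential factor in (7.37).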
7.6 Quadratic Dispersion

A light pulse traversing a material in general undergoes dispersion because different frequency components take on different phase velocities. As an example, consider a short laser pulse traversing an optical component such as a lens or window, as depicted in Fig. 7.5. The light can undergo temporal dispersion, where a short light pulse spreads out in time with the different frequency components becoming separated (often called stretching or chirping). Dispersion can occur even if the optic absorbs very little of the light. Dispersion does not alter the power spectrum of the light pulse (7.28), ignoring absorption or reflections at the surfaces of the component. This is because the amplitude of $\mathbf{E}(\mathbf{r},\omega)$ does not change, but merely its phase according to (7.29). In other words, the plane-wave components that make up the pulse can have their relative phases adjusted, while their individual amplitudes remain unchanged. To compute the effect of dispersion on a pulse after it travels a distance in glass, we need to choose a specific pulse form. Suppose that just before entering the glass, the pulse has a Gaussian temporal profile given by (7.24). We'll place $\mathbf{r}_0$ at the start of the glass at $z = 0$ and assume that all plane-wave components travel in the $\hat{\mathbf{z}}$-direction, so that $\mathbf{k}\cdot\Delta\mathbf{r} = kz$. The polarization of the field will be the same for all frequencies. The Fourier transform of the Gaussian pulse is given in (7.26). Hence we have
$$\mathbf{E}(0,t) = \mathbf{E}_0\, e^{-t^2/2\tau^2}\, e^{-i\omega_0 t}, \qquad \mathbf{E}(0,\omega) = \tau\, \mathbf{E}_0\, e^{-\tau^2(\omega-\omega_0)^2/2} \tag{7.39}$$
Figure 7.5 A 25 fs pulse traversing a 1 cm piece of BK7 glass.
To find the field downstream we invoke (7.29), which gives the appropriate phase shift for each plane wave component:
$$\mathbf{E}(z,\omega) = \mathbf{E}(0,\omega)\, e^{ik(\omega)z} = \tau\, \mathbf{E}_0\, e^{-\tau^2(\omega-\omega_0)^2/2}\, e^{ik(\omega)z} \tag{7.40}$$
To find the waveform at the new position $z$ (where the pulse presumably has just exited the glass), we take the inverse Fourier transform of (7.40). However, before doing this we must specify the function $k(\omega)$. For example, if the glass material is replaced by vacuum, the wave number is simply $k_{\text{vac}}(\omega) = \omega/c$. In this case, the final waveform is
$$\mathbf{E}(z,t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \tau\, \mathbf{E}_0\, e^{-\tau^2(\omega-\omega_0)^2/2}\, e^{i\frac{\omega}{c}z}\, e^{-i\omega t}\, d\omega = \mathbf{E}_0\, e^{-\frac{1}{2}\left(\frac{t - z/c}{\tau}\right)^2} e^{i(k_0 z - \omega_0 t)} \quad \text{(vacuum)} \tag{7.41}$$
where $k_0 \equiv \omega_0/c$. Not surprisingly, after traveling a distance $z$ through vacuum, the pulse looks identical to the original pulse, only its peak occurs at a later time $z/c$. The term $k_0 z$ appropriately adjusts the phase at different points in space so that at the time $z/c$ the overall phase at $z$ goes to zero. Of course the functional form of the k-vector is different (and more complicated) in glass than in vacuum. One could represent the index with a multi-resonant Sellmeier equation with coefficients appropriate to the particular material (even more complicated than in P 2.2). For this example, however, we again resort to an expansion of the type (7.31), but this time we keep three terms. Let us choose the carrier frequency to be $\bar\omega = \omega_0$, so the expansion is
$$k(\omega)\, z \cong k(\omega_0)\, z + \frac{\partial k}{\partial\omega}\bigg|_{\omega_0} (\omega - \omega_0)\, z + \frac{1}{2} \frac{\partial^2 k}{\partial\omega^2}\bigg|_{\omega_0} (\omega - \omega_0)^2 z + \cdots = k_0 z + v_g^{-1} (\omega - \omega_0)\, z + \alpha\, (\omega - \omega_0)^2 z + \cdots \tag{7.42}$$
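where
$$k_0 \equiv k(\omega_0) = \frac{\omega_0\, n(\omega_0)}{c} \tag{7.43}$$
$$v_g^{-1} \equiv \frac{\partial k}{\partial\omega}\bigg|_{\omega_0} = \frac{n(\omega_0)}{c} + \frac{\omega_0\, n'(\omega_0)}{c} \tag{7.44}$$
$$\alpha \equiv \frac{1}{2} \frac{\partial^2 k}{\partial\omega^2}\bigg|_{\omega_0} = \frac{n'(\omega_0)}{c} + \frac{\omega_0\, n''(\omega_0)}{2c} \tag{7.45}$$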
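With this approximation for $k(\omega)$, we are now able to perform the inverse Fourier transform on (7.40):
$$\mathbf{E}(z,t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \tau\, \mathbf{E}_0\, e^{-\tau^2(\omega-\omega_0)^2/2}\, e^{ik_0 z + i v_g^{-1}(\omega-\omega_0)z + i\alpha(\omega-\omega_0)^2 z}\, e^{-i\omega t}\, d\omega$$
$$= \frac{\tau\, \mathbf{E}_0\, e^{i(k_0 z - \omega_0 t)}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{i v_g^{-1}(\omega-\omega_0)z - i(\omega-\omega_0)t - (\tau^2/2 - i\alpha z)(\omega-\omega_0)^2}\, d\omega \tag{7.46}$$
We can avoid considerable clutter if we change variables to $\omega' \equiv \omega - \omega_0$. Then the inverse Fourier transform becomes
$$\mathbf{E}(z,t) = \frac{\tau\, \mathbf{E}_0\, e^{i(k_0 z - \omega_0 t)}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{\tau^2}{2}\left(1 - i\,2\alpha z/\tau^2\right)\omega'^2 - i(t - z/v_g)\,\omega'}\, d\omega' \tag{7.47}$$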
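The above integral can be performed with the aid of (0.52). The result is
$$\mathbf{E}(z,t) = \frac{\tau\, \mathbf{E}_0\, e^{i(k_0 z - \omega_0 t)}}{\sqrt{2\pi}} \sqrt{\frac{\pi}{\frac{\tau^2}{2}\left(1 - i\,2\alpha z/\tau^2\right)}}\; e^{-\frac{(t - z/v_g)^2}{4\,\frac{\tau^2}{2}\left(1 - i\,2\alpha z/\tau^2\right)}}$$
$$= \mathbf{E}_0\, e^{i(k_0 z - \omega_0 t)}\, \frac{e^{\frac{i}{2}\tan^{-1}\left(2\alpha z/\tau^2\right)}}{\sqrt[4]{1 + \left(2\alpha z/\tau^2\right)^2}}\; e^{-\frac{(t - z/v_g)^2\left(1 + i\,2\alpha z/\tau^2\right)}{2\tau^2\left[1 + \left(2\alpha z/\tau^2\right)^2\right]}} \tag{7.48}$$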
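Next, we spruce up the appearance of this rather cumbersome formula as follows:
$$\mathbf{E}(z,t) = \frac{\mathbf{E}_0}{\sqrt{T(z)/\tau}}\; e^{-\frac{1}{2}\left(\frac{t - z/v_g}{T(z)}\right)^2}\, e^{-\frac{i}{2}\left(\frac{t - z/v_g}{T(z)}\right)^2 \Phi(z) + i(k_0 z - \omega_0 t) + \frac{i}{2}\tan^{-1}\Phi(z)} \tag{7.49}$$
where
$$\Phi(z) \equiv \frac{2\alpha}{\tau^2}\, z \tag{7.50}$$
and
$$T(z) \equiv \tau \sqrt{1 + \Phi^2(z)} \tag{7.51}$$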
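To get a feel for the size of the broadening, the dispersion phase and the stretched duration can be evaluated directly. The sketch below assumes the quantities $\Phi(z) = 2\alpha z/\tau^2$ and $T(z) = \tau\sqrt{1+\Phi^2(z)}$ as read from the closed-form result above. The numbers are illustrative assumptions rather than values from the text: $\tau = 25$ fs (Fig. 7.5 quotes a 25 fs pulse without specifying how the duration is defined) and a second derivative $\partial^2 k/\partial\omega^2$ of about 45 fs²/mm, which is a rough figure for BK7 near 800 nm.

```python
import math

def dispersion_phase(alpha, z, tau):
    """Phi(z) = 2 alpha z / tau^2."""
    return 2.0 * alpha * z / tau**2

def broadened_duration(alpha, z, tau):
    """T(z) = tau * sqrt(1 + Phi(z)^2)."""
    return tau * math.sqrt(1.0 + dispersion_phase(alpha, z, tau) ** 2)

def peak_intensity_ratio(alpha, z, tau):
    """Peak intensity relative to z = 0; |E|^2 scales as tau / T(z)."""
    return tau / broadened_duration(alpha, z, tau)

# Assumed illustrative numbers (not from the text)
tau_fs = 25.0                 # Gaussian duration parameter tau (fs)
gvd_fs2_per_mm = 44.7         # d^2k/domega^2 (fs^2/mm), rough value for BK7 near 800 nm
alpha = 0.5 * gvd_fs2_per_mm  # alpha is half the second derivative of k
z_mm = 10.0                   # 1 cm of glass

T_out = broadened_duration(alpha, z_mm, tau_fs)
print(f"Phi  = {dispersion_phase(alpha, z_mm, tau_fs):.3f}")
print(f"T(z) = {T_out:.2f} fs")
print(f"peak intensity ratio = {peak_intensity_ratio(alpha, z_mm, tau_fs):.3f}")
```

With these assumed inputs the pulse stretches from 25 fs to roughly 31 fs. Because $\Phi$ scales as $1/\tau^2$, a pulse half as long would be stretched far more severely by the same piece of glass.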
We can immediately make a few observations about (7.49). First, note that at $z = 0$ (i.e. zero thickness of glass), (7.49) reduces to the input pulse given in (7.39), as we would expect. Secondly, the peak of the pulse moves at speed $v_g$ since the term $e^{-\frac{1}{2}\left(\frac{t - z/v_g}{T(z)}\right)^2}$ controls the pulse amplitude, while the other terms (multiplied by $i$) in the exponent of (7.49) merely alter the phase. Also note that the duration of the pulse increases and its peak intensity decreases as it travels, since $T(z)$ increases with $z$. In P 7.9 we will find that (7.49) also predicts that for large $z$, the field of the spread-out pulse oscillates less rapidly at the beginning of the pulse than at the end (assuming $\alpha > 0$). This phenomenon is known as chirp, and indicates that red frequencies get ahead of blue frequencies during propagation since they experience a lower index of refraction. While we have derived these results for the specific case of a Gaussian pulse, the results are applicable to other pulse shapes also. Although the exact details will vary by pulse shape, all short pulses eventually broaden and chirp as they propagate through a dispersive medium such as glass (as long as the medium responds linearly to the field). Higher order terms in the expansion (7.31) contribute to the spreading, chirping, and other deformation of the pulse as it propagates, but they become progressively more cumbersome to study analytically.
7.7 Generalized Context for Group Delay

The expansion of $k(\omega)$ in (7.31) is inconvenient if the frequency content (bandwidth) of a waveform encompasses a substantial portion of a resonance structure such as shown in Fig. 7.6. In this case, it becomes necessary to retain a large number of terms in (7.31) to describe accurately the phase delay $\mathbf{k}(\omega)\cdot\Delta\mathbf{r}$. Moreover, if the bandwidth of the waveform is wider than the spectral resonance of the medium (as shown in Fig. 7.7), the series altogether fails to converge. These difficulties have led to the traditional viewpoint that group velocity loses meaning for broadband waveforms (interacting with a resonance in a material) since it is associated with the second term in the expansion (7.31), evaluated at a carrier frequency $\bar\omega$. In this section, we study a broader context for group velocity (or rather its inverse, group delay), which is always valid, even for broadband pulses where the expansion (7.31) utterly fails. The analysis avoids the expansion and so is not restricted to a narrowband context. We are interested in the arrival time of a waveform (or pulse) to a point, say, where a detector is located. The definition of the arrival time of pulse energy need only involve the Poynting flux (or the intensity), since it alone is responsible for energy transport. To deal with arbitrary broadband pulses, the arrival time should avoid presupposing a specific pulse shape, since the pulse may evolve in complicated ways during propagation. For example, the pulse peak or the midpoint on the rising edge of a pulse are poor indicators of arrival time if the pulse contains multiple peaks or a long and non-uniform rise time.
Figure 7.6 Index of refraction in the neighborhood of a resonance.
Figure 7.7 Normalized spectrum of a broadband pulse before and after propagation through an absorbing medium.
Figure 7.8 Pulse undergoing distortion during transit.
For the reasons given, we use a time expectation integral (or time center-of-mass) to describe the arrival time of the pulse:
$$\langle t \rangle_{\mathbf{r}} \equiv \int_{-\infty}^{\infty} t\, \rho(\mathbf{r},t)\, dt \tag{7.52}$$
Here $\rho(\mathbf{r},t)$ is a normalized distribution function associated with the intensity:
$$\rho(\mathbf{r},t) \equiv \frac{I(\mathbf{r},t)}{\int_{-\infty}^{\infty} I(\mathbf{r},t)\, dt} \tag{7.53}$$
For simplification, we assume that the light travels in a uniform direction. As we shall see, the function $dk/d\omega$ (inverse of group velocity) is linked to this temporal expectation of the incoming intensity. Consider a pulse as it travels from point $\mathbf{r}_0$ to point $\mathbf{r} = \mathbf{r}_0 + \Delta\mathbf{r}$ in a homogeneous medium (see Fig. 7.9). The difference in arrival times at the two points is
$$\Delta t \equiv \langle t \rangle_{\mathbf{r}} - \langle t \rangle_{\mathbf{r}_0} \tag{7.54}$$
The pulse shape can evolve in complicated ways between the two points, spreading with different portions being absorbed (or amplified) during transit. Nevertheless, (7.54) renders an unambiguous time interval between the passage of the pulse center at each point.
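The time center-of-mass is simple to evaluate for sampled data. The following is a minimal sketch on a uniform time grid; the two test profiles are made-up shapes chosen so that the answer is known in advance, and `arrival_time` is a hypothetical helper name.

```python
import numpy as np

def arrival_time(t, intensity):
    """Time center-of-mass of an intensity profile sampled on a uniform grid."""
    weights = intensity / np.sum(intensity)  # normalized distribution
    return np.sum(t * weights)

t = np.linspace(-20.0, 30.0, 5001)

# A single Gaussian centered at t = 3
single = np.exp(-(t - 3.0)**2)

# Two lobes of equal area: a tall narrow one at t = 0 and a low broad one at t = 6
double = np.exp(-t**2) + 0.5 * np.exp(-(t - 6.0)**2 / 4.0)

print(arrival_time(t, single))   # center of mass of the single pulse
print(arrival_time(t, double))   # center of mass of the two-lobed pulse
print(t[np.argmax(double)])      # location of the highest peak, for comparison
```

For the two-lobed profile the highest peak sits near $t = 0$, while the center of mass lies midway between the lobes because they carry equal energy. This is the sense in which the expectation integral avoids presupposing a pulse shape. The transit time between two points is then the difference of two such arrival times.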
Figure 7.9 Transit time defined as the difference between arrival time at two points.
This difference in arrival time can be shown to consist of two terms (see P 7.12):
$$\Delta t = \Delta t_G(\mathbf{r}) + \Delta t_R(\mathbf{r}_0) \tag{7.55}$$
The first term, called the net group delay, dominates if the field waveform is initially symmetric in time (e.g. an unchirped Gaussian). It amounts to a spectral average of the group delay function taken with respect to the spectral content of the pulse arriving at the final point $\mathbf{r} = \mathbf{r}_0 + \Delta\mathbf{r}$:
$$\Delta t_G(\mathbf{r}) = \int_{-\infty}^{\infty} \rho(\mathbf{r},\omega) \left( \frac{\partial\, \mathrm{Re}\,\mathbf{k}}{\partial\omega} \cdot \Delta\mathbf{r} \right) d\omega \tag{7.56}$$
where the spectral weighting function is
$$\rho(\mathbf{r},\omega) \equiv \frac{I(\mathbf{r},\omega)}{\int_{-\infty}^{\infty} I(\mathbf{r},\omega')\, d\omega'} \tag{7.57}$$
and $I(\mathbf{r},\omega)$ is given in (7.28). The two curves in Fig. 7.7 show $\rho(\mathbf{r}_0,\omega)$ (before propagation) and $\rho(\mathbf{r},\omega)$ (after propagation) for an initially Gaussian pulse. As seen in (7.57), the pulse travel time depends on the spectral shape of the pulse at the end of propagation. Note the close resemblance between the formulas (7.52) and (7.56). Both are expectation integrals. The former is executed as a center-of-mass integral on time; the latter is executed in the frequency domain on $\partial\,\mathrm{Re}\,\mathbf{k}/\partial\omega \cdot \Delta\mathbf{r}$, the group delay function. The group delay at every frequency present in the pulse influences the result. If the pulse has a narrow bandwidth in the neighborhood of $\bar\omega$, the integral reduces to $\partial\,\mathrm{Re}\,\mathbf{k}/\partial\omega|_{\bar\omega} \cdot \Delta\mathbf{r}$, in agreement with (7.38) (see P 7.10). The net group delay depends only on the spectral content of the pulse, independent of its temporal organization (i.e., the phase of $\mathbf{E}(\mathbf{r},\omega)$ has no influence). Only the real part of the k-vector plays a direct role in (7.56). The second term in (7.55), called the reshaping delay, represents a delay that arises solely from a reshaping of the spectral amplitude. This term takes into account how the pulse time center-of-mass shifts as portions of the spectrum are removed (or added). It is computed at $\mathbf{r}_0$ before propagation takes place:
$$\Delta t_R(\mathbf{r}_0) = \langle t \rangle_{\mathbf{r}_0}^{\text{altered}} - \langle t \rangle_{\mathbf{r}_0} \tag{7.58}$$
Here $\langle t \rangle_{\mathbf{r}_0}$ represents the usual arrival time of the pulse at the initial point $\mathbf{r}_0$, according to (7.52). The intensity at this point is associated with a field $\mathbf{E}(\mathbf{r}_0, t)$, connected to $\mathbf{E}(\mathbf{r}_0, \omega)$ through an inverse Fourier transform (7.20). On the other hand, $\langle t \rangle_{\mathbf{r}_0}^{\text{altered}}$ is the arrival time of a pulse associated with the modified field $\mathbf{E}(\mathbf{r}_0, \omega)\, e^{-\mathrm{Im}\,\mathbf{k}\cdot\Delta\mathbf{r}}$. Notice that $\mathbf{E}(\mathbf{r}_0, \omega)\, e^{-\mathrm{Im}\,\mathbf{k}\cdot\Delta\mathbf{r}}$ is still evaluated at the initial point $\mathbf{r}_0$. Only the spectral amplitude (not the phase) is modified, according to what is anticipated to be lost (or gained) during the trip. In contrast to the net group delay, the reshaping delay is sensitive to how a pulse is organized. The reshaping delay is negligible if the pulse is initially symmetric (in amplitude and phase) before propagation. The reshaping delay also goes to zero in the narrowband limit, and the total delay reduces to the net group delay.
Figure 7.10 Narrowband pulse traversing an absorbing medium.
Figure 7.11 Real and imaginary parts of the refractive index for an absorptive medium.
Figure 7.12 Pulse transit time for a narrowband pulse in an absorbing medium as a function of carrier frequency.
Figure 7.13 Pulse transit time for a broadband pulse in an absorbing medium.
As an example, consider the Gaussian pulse (7.24) with duration either $\tau_1 = 10/\gamma$ (narrowband) or $\tau_2 = 1/\gamma$ (broadband), where $\gamma$ is the damping term in the Lorentz model described in section 2.3. Let the pulse travel a distance $\Delta\mathbf{r} = \hat{\mathbf{z}}\, c/(10\gamma)$ through the absorbing medium (as depicted in Fig. 7.10), which has a resonance at frequency $\omega_0$. The index of refraction is shown in Fig. 7.11. Its resonance has a width of $\gamma$. Fig. 7.12 shows the delay between the pulse arrival times at $\mathbf{r}_0$ and $\mathbf{r} = \mathbf{r}_0 + \Delta\mathbf{r}$ as the pulse's central frequency $\bar\omega$ is varied in the neighborhood of the resonance. The solid line gives the total delay $\Delta t \cong \Delta t_G(\mathbf{r})$ experienced by the narrowband pulse in traversing the displacement. The reshaping delay in this case is negligible (i.e. $\Delta t_R(\mathbf{r}_0) \cong 0$) and is shown by the dotted line. Near resonance, superluminal behavior results as the transit time for the pulse becomes small and even negative. The peak of the attenuated pulse exits the medium even before the peak of the incoming pulse enters the medium! Keep in mind that the exiting pulse is tiny and resides well within the original envelope of the pulse propagated forward at speed $c$, as indicated in Fig. 7.10. Thus, with or without the absorbing material in place, the signal is detectable just as early. Similar results can be obtained in amplifying media. As the injected pulse becomes more sharply defined in time, the superluminal behavior does not persist. Fig. 7.13 shows the clearly subluminal transit time for the broadband pulse with the shorter duration $\tau_2$. While Fig. 7.12 can be generated using the traditional narrowband context of group delay, Fig. 7.13 requires the new context presented in this section. It demonstrates that sharply defined waveforms (i.e. broadband) do not propagate superluminally.
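The narrowband behavior described above can be explored with a short calculation of the group delay function near an absorption line. The sketch below is an illustration under stated assumptions, not a reproduction of the figures: it assumes a single-resonance Lorentz index of the form $n^2 = 1 + \omega_p^2/(\omega_0^2 - \omega^2 - i\gamma\omega)$, measures frequencies in units of $\gamma$ with $c = 1$, and uses hypothetical values for the resonance frequency and the plasma frequency (the parameters behind the figures are not given here).

```python
import numpy as np

# Units (assumption): frequencies in units of gamma and c = 1, so times are in units of 1/gamma.
GAMMA = 1.0
OMEGA_0 = 100.0   # resonance frequency (hypothetical)
OMEGA_P = 3.0     # plasma frequency, sets the resonance strength (hypothetical)
C = 1.0
DELTA_R = C / (10.0 * GAMMA)  # propagation distance c / (10 gamma)

def index(omega):
    """Complex refractive index of an assumed single-resonance Lorentz medium."""
    return np.sqrt(1.0 + OMEGA_P**2 / (OMEGA_0**2 - omega**2 - 1j * GAMMA * omega))

def group_delay(omega, h=1e-4):
    """d(Re k)/d(omega) times delta_r, by central difference, with k = n omega / c."""
    re_k = lambda w: (index(w) * w / C).real
    return (re_k(omega + h) - re_k(omega - h)) / (2.0 * h) * DELTA_R

vacuum_delay = DELTA_R / C
for detuning in (-20.0, -1.0, 0.0, 1.0, 20.0):
    print(detuning, group_delay(OMEGA_0 + detuning), vacuum_delay)
```

With these assumed parameters the narrowband group delay is negative at line center and slightly longer than the vacuum transit time far from resonance. For a broadband pulse this single-frequency evaluation is not sufficient, and the spectral average of the group delay function together with the reshaping delay would be needed.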
In addition, while a long smooth pulse can exhibit so-called superluminal behavior over short propagation distances, the behavior does not persist as the pulse spectrum is modified by the medium.
As we have mentioned, the group delay function indicates the average arrival of field energy to a point. Since this is only part of the whole energy story, there is no problem when it becomes superluminal. The overly rapid appearance of electromagnetic energy at one point and its simultaneous disappearance at another point merely indicates an exchange of energy between the electric field and the medium. In appendix 7.A we discuss the energy transport velocity (which involves all energy and is strictly luminal) and the velocity of the locus of electromagnetic field energy.
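The negative narrowband transit time near resonance can be reproduced numerically. The following is a minimal Python sketch, not the procedure used to generate Fig. 7.12. It assumes a single-resonance Lorentz index of the form $n = \sqrt{1+\chi}$ with $\chi = \omega_p^2/(\omega_0^2 - \omega^2 - i\gamma\omega)$, and the values chosen for `omega0` and `omega_p` (and the units $\gamma = c = 1$) are illustrative assumptions. The function names are hypothetical.

```python
import numpy as np

def lorentz_index(omega, omega0, gamma, omega_p):
    """Complex index n = sqrt(1 + chi) for a single Lorentz resonance (assumed form)."""
    chi = omega_p**2 / (omega0**2 - omega**2 - 1j * gamma * omega)
    return np.sqrt(1.0 + chi)

def narrowband_group_delay(omega, dr, omega0, gamma, omega_p, c=1.0, h=1e-4):
    """Central-difference estimate of d(Re k)/d(omega) * dr, with k = n * omega / c."""
    def re_k(w):
        return np.real(lorentz_index(w, omega0, gamma, omega_p)) * w / c
    return (re_k(omega + h) - re_k(omega - h)) / (2.0 * h) * dr

# Illustrative parameters in units where gamma = c = 1 (all assumed values).
gamma, c = 1.0, 1.0
omega0 = 100.0 * gamma      # resonance frequency
omega_p = 2.0 * gamma       # strength of the resonance
dr = c / (10.0 * gamma)     # propagation distance used in the example above

delay_on_resonance = narrowband_group_delay(omega0, dr, omega0, gamma, omega_p, c)
delay_far_away = narrowband_group_delay(omega0 + 50.0 * gamma, dr, omega0, gamma, omega_p, c)
vacuum_delay = dr / c

print(delay_on_resonance, delay_far_away, vacuum_delay)
```

With these assumed parameters the group delay on resonance comes out negative, while far from resonance it approaches the vacuum transit time $\Delta r/c$.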
Appendix 7.A Causality and Exchange of Energy with the Medium

In accordance with Poynting's theorem (2.49), the total energy density stored in an electromagnetic field and in a medium is given by
$$u(\mathbf{r},t) = u_{\rm field}(\mathbf{r},t) + u_{\rm exchange}(\mathbf{r},t) + u(\mathbf{r},-\infty) \tag{7.59}$$
This expression for the energy density includes all (relevant) forms of energy, including a non-zero integration constant $u(\mathbf{r},-\infty)$ corresponding to energy stored in the medium before the arrival of any pulse (important in the case of an amplifying medium). $u_{\rm field}(\mathbf{r},t)$ and $u_{\rm exchange}(\mathbf{r},t)$ are both zero before the arrival of the pulse (i.e. at $t = -\infty$). In addition, $u_{\rm field}(\mathbf{r},t)$, given by (2.51), returns to zero after the pulse has passed (i.e. at $t = +\infty$). The time-dependent accumulation of energy transferred into the medium from the field is given by
$$u_{\rm exchange}(\mathbf{r},t) = \int_{-\infty}^{t} \mathbf{E}(\mathbf{r},t') \cdot \frac{\partial \mathbf{P}(\mathbf{r},t')}{\partial t'}\,dt' \tag{7.60}$$
where we ignore the possibility of any free current $\mathbf{J}_{\rm free}$ in (2.52). As $u_{\rm exchange}$ increases, the energy in the medium increases. Conversely, as $u_{\rm exchange}$ decreases, the medium surrenders energy to the electromagnetic field. While it is possible for $u_{\rm exchange}$ to become negative, the combination $u_{\rm exchange} + u(-\infty)$ (i.e. the net energy in the medium) can never go negative since a material cannot surrender more energy than it has to begin with.
We next consider the concept of the energy transport velocity. Poynting's theorem (2.49) has the form of a continuity equation which when integrated spatially over a small volume $V$ yields
$$\oint_A \mathbf{S}\cdot d\mathbf{a} = -\frac{\partial}{\partial t}\int_V u\,dV \tag{7.61}$$
where the left-hand side has been transformed into a surface integral representing the power leaving the volume. Let the volume be small enough to take $\mathbf{S}$ to be uniform throughout $V$. The energy transport velocity (directed along $\mathbf{S}$) is then defined to be the effective speed at which the energy contained in the volume (i.e. the result of the volume integral) would need to travel in order to achieve the power transmitted through one side of the volume (e.g. the power transmitted through one end of a tiny cylinder aligned with $\mathbf{S}$). The energy transport velocity as traditionally written is then
$$\mathbf{v}_E \equiv \frac{\mathbf{S}}{u} \tag{7.62}$$
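As a numerical aside on (7.62), the ratio $|\mathbf{S}|/u$ can be checked directly when only field energy is counted. The sketch below is illustrative only. It assumes the standard vacuum-form expressions $\mathbf{S} = \mathbf{E}\times\mathbf{B}/\mu_0$ and $u_{\rm field} = \epsilon_0 E^2/2 + B^2/(2\mu_0)$, draws random real field vectors, and confirms that the resulting speed never exceeds $c$. The function name `field_energy_transport_speed` is hypothetical.

```python
import numpy as np

eps0 = 8.8541878128e-12          # vacuum permittivity (F/m)
mu0 = 4.0e-7 * np.pi             # vacuum permeability (H/m), classical defined value
c = 1.0 / np.sqrt(eps0 * mu0)    # speed of light consistent with the two constants above

def field_energy_transport_speed(E, B):
    """|S| / u_field for real field vectors E (V/m) and B (T); the last axis holds x, y, z."""
    S = np.cross(E, B) / mu0
    u_field = 0.5 * eps0 * np.sum(E**2, axis=-1) + 0.5 * np.sum(B**2, axis=-1) / mu0
    return np.linalg.norm(S, axis=-1) / u_field

# Random field vectors, with B scaled by 1/c so both energy terms are comparable.
rng = np.random.default_rng(0)
E = rng.normal(size=(10000, 3))
B = rng.normal(size=(10000, 3)) / c
v_E = field_energy_transport_speed(E, B)

# A vacuum plane wave (E perpendicular to B, |B| = |E|/c) sits exactly at the bound.
v_plane_wave = field_energy_transport_speed(np.array([1.0, 0.0, 0.0]),
                                            np.array([0.0, 1.0 / c, 0.0]))

print(v_E.max() / c, v_plane_wave / c)
```

The bound is reached only when the fields are perpendicular and carry equal electric and magnetic energy densities, as in a vacuum plane wave.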
When the total energy density $u$ is used in computing (7.62), the energy transport velocity has a fictitious nature; it is not the actual velocity of the total energy (since part is stationary), but rather the effective velocity necessary to achieve the same energy transport that the electromagnetic flux alone delivers. There is no behind-the-scenes flow of mechanical energy. Note that if only $u_{\rm field}$ is used in evaluating (7.62), the Cauchy-Schwarz inequality (i.e. $\alpha^2 + \beta^2 \ge 2\alpha\beta$) ensures an energy transport velocity $v_E$ that is strictly bounded by the speed of light in vacuum $c$. The total energy density $u$ is at least as great as the field energy density $u_{\rm field}$. Hence, this strict luminality is maintained.
Since the point-wise energy transport velocity defined by (7.62) is strictly luminal, it follows that the global energy transport velocity (the average speed of all energy) is also bounded by $c$. To obtain the global properties of energy transport, we begin with a weighted average of the energy transport velocity at each point in space. A suitable weighting parameter is the energy density at each position. The global energy transport velocity is then
$$\langle\mathbf{v}_E\rangle \equiv \frac{\int \mathbf{v}_E\,u\,d^3r}{\int u\,d^3r} = \frac{\int \mathbf{S}\,d^3r}{\int u\,d^3r} \tag{7.63}$$
where we have substituted from (7.62). The integral is taken over all relevant space (note $d^3r = dV$). Integration by parts leads to
$$\langle\mathbf{v}_E\rangle = -\frac{\int \mathbf{r}\,\nabla\cdot\mathbf{S}\,d^3r}{\int u\,d^3r} = \frac{\int \mathbf{r}\,\frac{\partial u}{\partial t}\,d^3r}{\int u\,d^3r} \tag{7.64}$$
where we have assumed that the volume for the integration encloses all energy in the system and that the field near the edges of this volume is zero. Since we have included all energy, Poynting's theorem (2.49) can be written with no source terms (i.e. $\nabla\cdot\mathbf{S} + \partial u/\partial t = 0$). This means that the total energy in the system is conserved and is given by the integral in the denominator of (7.64). This allows the derivative to be brought out in front of the entire expression, giving
$$\langle\mathbf{v}_E\rangle = \frac{\partial\langle\mathbf{r}\rangle}{\partial t} \tag{7.65}$$
where
$$\langle\mathbf{r}\rangle \equiv \frac{\int \mathbf{r}\,u\,d^3r}{\int u\,d^3r} \tag{7.66}$$
The latter expression represents the center-of-mass or centroid of the total energy in the system. This precise relationship between the energy transport velocity and the centroid requires that all forms of energy be included in the energy density $u$. If, for example, only the field energy density $u_{\rm field}$ is used in defining the energy transport velocity, the steps leading to (7.66) would not be possible. Although (7.66) guarantees that the centroid of the total energy moves strictly luminally, there is no such limitation on the centroid of field energy alone. Explicitly we have
$$\frac{\int \mathbf{S}\,d^3r}{\int u_{\rm field}\,d^3r} \neq \frac{\partial}{\partial t}\,\frac{\int \mathbf{r}\,u_{\rm field}\,d^3r}{\int u_{\rm field}\,d^3r} \tag{7.67}$$
While, as was pointed out, the left-hand side of (7.67) is strictly luminal, the right-hand side can easily exceed $c$ as the medium exchanges energy with the field. In an amplifying medium exhibiting superluminal behavior, the rapid appearance of a pulse downstream is merely an artifact of not recognizing the energy already present in the medium until it converts to the form of field energy. The traditional group velocity is connected to this method of accounting, which is why it can become superluminal. Note the similarity between (7.52), which is a time center-of-mass, and the right-hand side of (7.67), which is the spatial center of mass. Both expressions can be connected to group velocity. Group velocity tracks the presence of field energy alone without necessarily implying the actual motion of that energy.
It is enlightening to consider $u_{\rm exchange}$ within a frequency-domain context. We utilize the field represented in terms of an inverse Fourier transform (7.20). Similarly, the polarization $\mathbf{P}$ can be written as an inverse Fourier transform:
$$\mathbf{P}(\mathbf{r},t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{P}(\mathbf{r},\omega)\,e^{-i\omega t}\,d\omega \quad\Rightarrow\quad \frac{\partial\mathbf{P}(\mathbf{r},t)}{\partial t} = \frac{-i}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\omega\,\mathbf{P}(\mathbf{r},\omega)\,e^{-i\omega t}\,d\omega \tag{7.68}$$
In an isotropic medium, the polarization for an individual plane wave can be written in terms of the linear susceptibility defined in (1.46):
$$\mathbf{P}(\mathbf{r},\omega) = \epsilon_0\,\chi(\mathbf{r},\omega)\,\mathbf{E}(\mathbf{r},\omega) \tag{7.69}$$
With (7.21), (7.68), and (7.69), the exchange energy density (7.60) can be written as
$$u_{\rm exchange}(\mathbf{r},t) = -i\epsilon_0\int_{-\infty}^{t}\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}(\mathbf{r},\omega')\,e^{-i\omega' t'}\,d\omega'\right]\cdot\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\omega\,\chi(\mathbf{r},\omega)\,\mathbf{E}(\mathbf{r},\omega)\,e^{-i\omega t'}\,d\omega\right]dt' \tag{7.70}$$
After interchanging the order of integration, the expression becomes

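$$u_{\rm exchange}(\mathbf{r},t) = -i\epsilon_0\int_{-\infty}^{\infty}d\omega\,\omega\,\chi(\mathbf{r},\omega)\,\mathbf{E}(\mathbf{r},\omega)\cdot\int_{-\infty}^{\infty}d\omega'\,\mathbf{E}(\mathbf{r},\omega')\,\frac{1}{2\pi}\int_{-\infty}^{t}e^{-i(\omega+\omega')t'}\,dt' \tag{7.71}$$
The final integral in (7.71) becomes the delta function $\delta(\omega'+\omega)$ when $t$ goes to $+\infty$. In this case, the middle integral can also be performed. Therefore, after the point $\mathbf{r}$ experiences the entire pulse, the final amount of energy density exchanged between the field and the medium at that point is
$$u_{\rm exchange}(\mathbf{r},+\infty) = -i\epsilon_0\int_{-\infty}^{\infty}\omega\,\chi(\mathbf{r},\omega)\,\mathbf{E}(\mathbf{r},\omega)\cdot\mathbf{E}(\mathbf{r},-\omega)\,d\omega \tag{7.72}$$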
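In this appendix, for convenience we consider the fields to be written using real notation. Then we can employ the symmetry (7.22) along with the symmetry
$$\mathbf{P}^*(\mathbf{r},\omega) = \mathbf{P}(\mathbf{r},-\omega) \tag{7.73}$$
and hence
$$\chi^*(\mathbf{r},\omega) = \chi(\mathbf{r},-\omega) \tag{7.74}$$
Then we obtain
$$u_{\rm exchange}(\mathbf{r},+\infty) = \epsilon_0\int_{-\infty}^{\infty}\omega\,{\rm Im}\,\chi(\mathbf{r},\omega)\,\mathbf{E}(\mathbf{r},\omega)\cdot\mathbf{E}^*(\mathbf{r},\omega)\,d\omega \tag{7.75}$$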
This expression describes the net exchange of energy density after all action has finished. It involves the power spectrum of the pulse. We can modify this formula in an intuitive way so that it describes the exchange energy density for any time during the pulse. The principle of causality guides us in considering how the medium perceives the electric field for any time. Since the medium is unable to anticipate the spectrum of the entire pulse before experiencing it, the material responds to the pulse according to the history of the field up to each instant. In particular, the material has to be prepared for the possibility of an abrupt cessation of the pulse at any moment, in which case all exchange of energy with the medium immediately ceases. In this extreme scenario, there is no possibility for the medium to recover from previously incorrect attenuation or amplification, so it must have gotten it right already. If the pulse were in fact to abruptly terminate at a given instant, then the expression (7.75) would immediately apply since the pulse would be over; it would not be necessary to integrate the inverse Fourier transform (7.21) beyond the termination time $t$ for which all contributions are zero. Causality requires that the medium be indifferent to whether a pulse actually terminates if it hasn't happened yet. Therefore, (7.75) applies at all times where the spectrum (7.21) is evaluated over that portion of the field previously experienced by the medium. The following is then an exact representation for the exchange energy density defined in (7.60):
$$u_{\rm exchange}(\mathbf{r},t) = \epsilon_0\int_{-\infty}^{\infty}\omega\,{\rm Im}\,\chi(\mathbf{r},\omega)\,\mathbf{E}_t(\mathbf{r},\omega)\cdot\mathbf{E}_t^*(\mathbf{r},\omega)\,d\omega \tag{7.76}$$
where
$$\mathbf{E}_t(\mathbf{r},\omega) \equiv \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t}\mathbf{E}(\mathbf{r},t')\,e^{i\omega t'}\,dt' \tag{7.77}$$
This time dependence enters only through $\mathbf{E}_t(\mathbf{r},\omega)\cdot\mathbf{E}_t^*(\mathbf{r},\omega)$, known as the instantaneous power spectrum. The expression (7.76) for the exchange energy reveals physical insights into the manner in which causal dielectric materials exchange energy with different parts
of an electromagnetic pulse. Since the function $\mathbf{E}_t(\omega)$ is the Fourier transform of the pulse truncated at the current time $t$ and set to zero thereafter, it can include many frequency components that are not present in the pulse taken in its entirety. This explains why the medium can respond differently to the front of a pulse than to the back. Even though absorption or amplification resonances may lie outside of the spectral envelope of a pulse taken in its entirety, the instantaneous spectrum on a portion of the pulse can momentarily lap onto or off of resonances in the medium.
In view of (7.76) and (7.77) it is straightforward to predict when the electromagnetic energy of a pulse will exhibit superluminal or subluminal behavior. In section 7.7, we saw that this behavior is controlled by the group velocity function. However, with (7.76) and (7.77), it is not necessary to examine the group velocity directly, but only the imaginary part of the susceptibility $\chi(\mathbf{r},\omega)$. If the entire pulse passing through point $\mathbf{r}$ has a spectrum in the neighborhood of an amplifying resonance, but not on the resonance, superluminal behavior can result (Chiao effect). The instantaneous spectrum during the front portion of the pulse is generally wider and can therefore lap onto the nearby gain peak. The medium accordingly amplifies this perceived spectrum, and the front of the pulse grows. The energy is then returned to the medium from the latter portion of the pulse as the instantaneous spectrum narrows and withdraws from the gain peak. The effect is not only consistent with the principle of causality, it is a direct and general consequence of causality as demonstrated by (7.76) and (7.77).
As an illustration, consider the broadband waveform with $\tau_2 = 1/\gamma$ described in section 7.7. Consider an amplifying medium with index shown in Fig. 7.14 with the amplifying resonance (negative oscillator strength) set on the frequency $\omega_0 = \bar{\omega} + 2\gamma$, where $\bar{\omega}$ is the carrier frequency. Thus, the resonance structure is centered a modest distance above the carrier frequency, and there is only minor spectral overlap between the pulse and the resonance structure. Superluminal behavior can occur in amplifying materials when the forward edge of a narrow-band pulse can receive extra amplification.
Fig. 7.15(a) shows the broadband waveform experienced by the initial position $\mathbf{r}_0$ in the medium. Fig. 7.15(b) shows the real and imaginary parts of the refractive index in the neighborhood of the carrier frequency $\bar{\omega}$. Fig. 7.15(c) depicts the exchange energy density $u_{\rm exchange}$ as a function of time, where rapid oscillations have been averaged out. The overshooting of the curve indicates excess amplification during the early portion of the pulse. The energy is then returned (in part) to the medium during the later portion of the pulse, a clear indication of superluminal behavior. Fig. 7.15(d) displays the instantaneous power spectrum (used in computing $u_{\rm exchange}$) evaluated at various times during the pulse. The corresponding times are indicated with vertical lines in both Figs. 7.15(a) and 7.15(c). The format of each vertical line matches a corresponding spectral curve. The instantaneous spectrum exhibits wings, which lap onto the nearby resonance and vary in strength depending on when the integral (7.77) truncates the pulse. As the wings grow and access the neighboring resonance, the pulse extracts excess energy from the
Figure 7.14 Real and imaginary parts of the refractive index for an amplifying medium.
Figure 7.15 (a) Electric field envelope in units of $E_0$. Vertical lines indicate times for assessment of the instantaneous spectrum. (b) Refractive index associated with an amplifying resonance. (c) Exchange energy density in units of $\epsilon_0 E_0^2/2$. (d) Instantaneous spectra of the field pulse in units of $E_0^2/\gamma^2$. Spectra are assessed at the times indicated in (a) and (c).
medium. As the wings diminish, the pulse surrenders that energy back to the medium, which gives the appearance of superluminal transit times.
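The growth of spectral wings on a truncated pulse can be illustrated with a short numerical sketch of (7.77). The code below is not the calculation behind Fig. 7.15. It assumes a complex Gaussian pulse $e^{-t^2/2\tau^2}e^{-i\bar{\omega}t}$ (the appendix itself uses real notation), an arbitrary carrier value, and a simple Riemann sum for the truncated integral. The function name `instantaneous_power_spectrum` is hypothetical.

```python
import numpy as np

gamma = 1.0                 # damping rate; sets the unit of frequency
tau = 1.0 / gamma           # broadband pulse duration from the text
omega_bar = 100.0 * gamma   # carrier frequency (assumed value)

t = np.linspace(-10.0 * tau, 10.0 * tau, 20001)
dt = t[1] - t[0]
# Complex Gaussian pulse with unit peak amplitude (assumed waveform).
field = np.exp(-t**2 / (2.0 * tau**2)) * np.exp(-1j * omega_bar * t)

def instantaneous_power_spectrum(omega, t_cut):
    """|E_t(omega)|^2 from (7.77): transform of the field truncated after t_cut."""
    keep = t <= t_cut
    E_t = np.sum(field[keep] * np.exp(1j * omega * t[keep])) * dt / np.sqrt(2.0 * np.pi)
    return np.abs(E_t) ** 2

omega_res = omega_bar + 2.0 * gamma   # location of the nearby resonance in the example

full_on_res = instantaneous_power_spectrum(omega_res, t[-1])    # entire pulse
front_on_res = instantaneous_power_spectrum(omega_res, 0.0)     # front half only
full_on_carrier = instantaneous_power_spectrum(omega_bar, t[-1])
front_on_carrier = instantaneous_power_spectrum(omega_bar, 0.0)

print(front_on_res / full_on_res, front_on_carrier / full_on_carrier)
```

Under these assumptions the front half of the pulse carries more spectral weight at the resonance frequency than the entire pulse does, even though it carries less weight at the carrier frequency.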
Exercises
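Exercises for 7.2 Intensity
P7.1 (a) Let $\hat{\mathbf{x}}E_1 e^{i(kz-\omega t)}$ and $\hat{\mathbf{x}}E_2 e^{i(-kz-\omega t)}$ be two counter-propagating plane waves where $E_1$ and $E_2$ are both real. Show that their sum can be written as
$$\hat{\mathbf{x}}E_{\rm tot}(z)\,e^{i(\Phi(z)-\omega t)}$$
where
$$E_{\rm tot}(z) = E_1\sqrt{\left(1-\frac{E_2}{E_1}\right)^2 + 4\frac{E_2}{E_1}\cos^2 kz}$$
and
$$\Phi(z) = \tan^{-1}\left[\frac{(1-E_2/E_1)}{(1+E_2/E_1)}\tan kz\right]$$
Outside the range $-\pi/2 \le kz \le \pi/2$ the pattern repeats.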
(b) Suppose that two counter-propagating laser fields have separate intensities, $I_1$ and $I_2 = I_1/100$. The ratio of the fields is then $E_2/E_1 = 1/10$. In the standing interference pattern that results, what is the ratio of the peak intensity to the minimum intensity? Are you surprised how high this is?
P7.2 Equation (7.11) implies that there is no interference between fields that are polarized along orthogonal dimensions. That is, the intensity of
$$\mathbf{E}(\mathbf{r},t) = \hat{\mathbf{x}}E_0 e^{i[(k\hat{\mathbf{z}})\cdot\mathbf{r}-\omega t]} + \hat{\mathbf{y}}E_0 e^{i[(k\hat{\mathbf{x}})\cdot\mathbf{r}-\omega t]}$$
according to (7.11) is uniform throughout space. Of course (7.11) does not apply since the k-vectors are not parallel. Show that the time-average of $\mathbf{S}(\mathbf{r},t)$ according to (7.6) exhibits interference in the distribution of net energy flow.
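Exercises for 7.3 Group vs. Phase Velocity: Sum of Two Plane Waves
P7.3 Show that (7.12) can be written as
$$\mathbf{E}(\mathbf{r},t) = 2\mathbf{E}_0\,e^{i\left(\frac{\mathbf{k}_2+\mathbf{k}_1}{2}\cdot\mathbf{r}-\frac{\omega_2+\omega_1}{2}t\right)}\cos\left(\frac{\Delta\mathbf{k}}{2}\cdot\mathbf{r}-\frac{\Delta\omega}{2}t\right)$$
From this show that the speed at which the rapid-oscillation peaks move in Fig. 7.1 is
$$\frac{v_{p1}+v_{p2}}{2}$$
P7.4 Confirm the right-hand side of (7.19).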
Exercises for 7.4 Frequency Spectrum of Light
P7.5 The continuous field of a very narrowband continuous laser may be approximated as a pure plane wave: $\mathbf{E}(\mathbf{r},t) = \mathbf{E}_0 e^{i(k_0 z-\omega_0 t)}$. Suppose the wave encounters a shutter at the plane $z = 0$.
(a) Compute the power spectrum of the light before the shutter. HINT: The answer is proportional to the square of a delta function centered on $\omega_0$ (see (0.45)).
(b) Compute the power spectrum after the shutter if it is opened during the interval $-\tau/2 \le t \le \tau/2$. Plot the result. Are you surprised that the shutter appears to create extra frequency components? HINT: Write your answer in terms of the sinc function defined by ${\rm sinc}\,\alpha \equiv \sin\alpha/\alpha$.
P7.6 (a) Determine the Full-Width-at-Half-Maximum of the intensity (i.e. the width of $I(\mathbf{r},t)$ represented by $\Delta t_{\rm FWHM}$) and of the power spectrum (i.e. the width of $I(\mathbf{r},\omega)$ represented by $\Delta\omega_{\rm FWHM}$) for the Gaussian pulse defined in (7.26). HINT: Both answers are in terms of $\tau$.
(b) Give an uncertainty principle for the product of $\Delta t_{\rm FWHM}$ and $\Delta\omega_{\rm FWHM}$.
P7.7 Verify (7.27) for the Gaussian pulse defined by (7.24) and (7.26).
Exercises for 7.6 Quadratic Dispersion
P7.8 Suppose that the intensity of a Gaussian laser pulse has duration $\Delta t_{\rm FWHM} = 25$ fs with carrier frequency $\omega_0$ corresponding to $\lambda_{\rm vac} = 800$ nm. The pulse goes through a lens of thickness $\ell = 1$ cm (laser quality glass type BK7) with index of refraction given approximately by
$$n(\omega) \cong 1.4948 + 0.016\,\frac{\omega}{\omega_0}$$
What is the full-width-at-half-maximum of the intensity for the emerging pulse? HINT: For the input pulse we have $\tau = \Delta t_{\rm FWHM}/(2\sqrt{\ln 2})$ (see P 7.6).
P7.9 If the pulse defined in (7.49) travels through the material for a very long distance $z$ such that $T(z) \to \tau\Phi(z)$ and $\tan^{-1}\Phi(z) \to \pi/2$, show that the instantaneous frequency of the pulse is
$$\omega_0 + \frac{t - 2z/v_g}{4\alpha z}$$
COMMENT: As the wave travels, the earlier part of the pulse oscillates more slowly than the later part. This is called chirp, and it means that the red frequencies get ahead of the blue ones since they experience a lower index.
Exercises for 7.7 Generalized Context for Group Delay
P7.10 When the spectrum is narrow compared to features in a resonance (such as in Fig. 7.11), the reshaping delay (7.58) tends to zero and can be ignored. Show that when the spectrum is narrow the net group delay (7.56) reduces to
$$\lim_{\tau\to\infty}\Delta t_G(\mathbf{r}) = \left.\frac{\partial\,{\rm Re}\,\mathbf{k}}{\partial\omega}\right|_{\bar{\omega}}\cdot\Delta\mathbf{r}$$
P7.11 When the spectrum is very broad the reshaping delay (7.58) also tends to zero and can be ignored. Show that when the spectrum is extremely broad, the net group delay reduces to
$$\lim_{\tau\to 0}\Delta t_G(\mathbf{r}) = \frac{\Delta r}{c}$$
assuming $\mathbf{k}$ and $\Delta\mathbf{r}$ are parallel. This implies that a sharply defined signal cannot travel faster than $c$. HINT: The real index of refraction $n$ goes to unity far from resonance, and the imaginary part $\kappa$ goes to zero.
P7.12 Work through the derivation of (7.55). HINT: This somewhat lengthy derivation can be found in Optics Express 9, 506-518 (2001).
Chapter 8
Coherence Theory
8.1 Introduction
Most students of physics become familiar with a Michelson interferometer (shown in Fig. 8.1) early in their course work. This preliminary understanding is usually gained in terms of a single-frequency plane wave that travels through the instrument. A Michelson interferometer divides the initial beam into two identical beams and then delays one beam with respect to the other before bringing them back together. Depending on the relative path difference $d$ (round-trip by our convention) between the two arms of the system, the light can interfere constructively or destructively in the direction of the detector. One way to view the relative path difference is in terms of the relative time delay $\tau \equiv d/c$. The intensity seen at the detector as a function of path difference is computed to be
$$I_{\rm det}(\tau) = \frac{c\epsilon_0}{2}\left[\mathbf{E}_0 e^{i(kz-\omega t)} + \mathbf{E}_0 e^{i(kz-\omega(t-\tau))}\right]\cdot\left[\mathbf{E}_0 e^{i(kz-\omega t)} + \mathbf{E}_0 e^{i(kz-\omega(t-\tau))}\right]^*$$
$$= \frac{c\epsilon_0}{2}\left[2\mathbf{E}_0\cdot\mathbf{E}_0^* + 2\mathbf{E}_0\cdot\mathbf{E}_0^*\cos(\omega\tau)\right]$$
$$= 2I_0\left[1+\cos(\omega\tau)\right] \tag{8.1}$$
where $I_0 \equiv \frac{c\epsilon_0}{2}\mathbf{E}_0\cdot\mathbf{E}_0^*$ is the intensity from one beam alone (when the other arm of the interferometer is blocked). This formula is familiar and it describes how the intensity at the detector oscillates between zero and four times the intensity of one beam alone. Notice that the intensity of one beam alone will be one fourth of the intensity originating from the source since it meets the beam splitter twice (assuming a 50:50 beam splitter).
In this chapter, we consider what happens when light containing a continuous band of frequencies is sent through the interferometer. In section 8.2, we derive an appropriate replacement for (8.1), which describes the intensity arriving at the detector when broadband light is sent through the interferometer. We will find that oscillations in the intensity at the detector become less pronounced as the mirror in one arm of the interferometer is scanned away from the position where the two paths are equal. Remarkably, this decrease in fringe visibility depends only
Figure 8.1 Michelson interferometer.
upon the frequency content of the light without regard to whether the frequency components are organized into a short pulse or left as a longer pattern in time. In section 8.3, the concept of temporal coherence is explained in the context of what is observed in a Michelson interferometer. Section 8.4 gives an interpretation of the results in terms of the fringe visibility and the coherence length. In section 8.5, we discuss a practical application known as Fourier spectroscopy. This powerful technique makes it possible to deduce the spectral content of light using a Michelson interferometer. In section 8.6, we examine a Young's two-slit setup and show how it is similar to a Michelson interferometer. Finally, the concept of spatial coherence is introduced in section 8.A in the context of a Young's two-slit setup.
8.2 Michelson Interferometer

Consider a waveform $\mathbf{E}(t)$ that has traveled through the first arm of a Michelson interferometer to arrive at the detector in Fig. 8.1. Specifically, $\mathbf{E}(t)$ is the value of the field at the detector when the second arm of the interferometer is blocked. The waveform $\mathbf{E}(t)$ in general may be composed of many frequency components according to the inverse Fourier transform (7.20). For convenience we will think of $\mathbf{E}(t)$ as a pulse containing a finite amount of energy. (We will comment on continuous light sources in the next section.) The beam that travels through the second arm of the interferometer is associated with the same waveform, albeit with a delay $\tau$ according to the path difference between the two arms. Thus, $\mathbf{E}(t-\tau)$ indicates the field at the detector from the second arm when the first arm of the interferometer is blocked. Again, $\tau$ represents the round-trip delay of the adjustable path relative to the position where the two paths have equal lengths. The total field at the detector is composed of the two waveforms:
$$\mathbf{E}_{\rm det}(t,\tau) = \mathbf{E}(t) + \mathbf{E}(t-\tau) \tag{8.2}$$
With (7.28) we compute the intensity at the detector:
$$I_{\rm det}(t,\tau) = \frac{c\epsilon_0}{2}\mathbf{E}_{\rm det}(t,\tau)\cdot\mathbf{E}_{\rm det}^*(t,\tau)$$
$$= \frac{c\epsilon_0}{2}\left[\mathbf{E}(t)\cdot\mathbf{E}^*(t) + \mathbf{E}(t)\cdot\mathbf{E}^*(t-\tau) + \mathbf{E}(t-\tau)\cdot\mathbf{E}^*(t) + \mathbf{E}(t-\tau)\cdot\mathbf{E}^*(t-\tau)\right]$$
$$= I(t) + I(t-\tau) + \frac{c\epsilon_0}{2}\left[\mathbf{E}(t)\cdot\mathbf{E}^*(t-\tau) + \mathbf{E}(t-\tau)\cdot\mathbf{E}^*(t)\right]$$
$$= I(t) + I(t-\tau) + c\epsilon_0\,{\rm Re}\left[\mathbf{E}(t)\cdot\mathbf{E}^*(t-\tau)\right] \tag{8.3}$$
The function $I(t)$ stands for the intensity of one of the beams arriving at the detector while the opposite path of the interferometer is blocked. Notice that we have retained the dependence on $t$ in $I_{\rm det}(t,\tau)$ in addition to the dependence on the path delay $\tau$. This allows us to accommodate pulses of light that have a time-varying envelope. The rapid oscillations of the light are automatically averaged away in $I(t)$, but not the slowly varying form of the pulse.
The total energy (per area) accumulated at the detector is found by integrating the intensity over time. In other words, we let the detector integrate the energy of the entire pulse before taking a reading. For short laser pulses (sub-nanosecond), the detector automatically integrates the entire energy (per area) of the pulse since the detector cannot keep up with the detailed temporal variations of the pulse envelope. The integration of (8.3) over time yields
$$\int_{-\infty}^{\infty}I_{\rm det}(t,\tau)\,dt = \int_{-\infty}^{\infty}I(t)\,dt + \int_{-\infty}^{\infty}I(t-\tau)\,dt + c\epsilon_0\,{\rm Re}\int_{-\infty}^{\infty}\mathbf{E}(t)\cdot\mathbf{E}^*(t-\tau)\,dt \tag{8.4}$$
The final integral remains unchanged if we take a Fourier transform followed by an inverse Fourier transform:
$$\int_{-\infty}^{\infty}\mathbf{E}(t)\cdot\mathbf{E}^*(t-\tau)\,dt = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}d\omega\,e^{-i\omega\tau}\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}d\tau\,e^{i\omega\tau}\int_{-\infty}^{\infty}\mathbf{E}(t)\cdot\mathbf{E}^*(t-\tau)\,dt\right] \tag{8.5}$$
The reason for this procedure is so that we can take advantage of the autocorrelation theorem (see P 0.30). We can use this theorem to replace the expression in brackets in (8.5):
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}d\tau\,e^{i\omega\tau}\int_{-\infty}^{\infty}\mathbf{E}(t)\cdot\mathbf{E}^*(t-\tau)\,dt = \sqrt{2\pi}\,\mathbf{E}(\omega)\cdot\mathbf{E}^*(\omega) = \sqrt{2\pi}\,\frac{2I(\omega)}{c\epsilon_0} \tag{8.6}$$
Albert Abraham Michelson (1852–1931, United States) Michelson (pronounced Michael sun) was born in Poland, but he grew up in the rough mining towns of California. He joined the navy, and later returned to teach at the naval academy. Michelson was fascinated by the problem of determining the speed of light, and developed several experiments to measure it more carefully. He is probably most famous for his experiment conducted with Edward Morley to detect the motion of the earth through the ether. He won the Nobel prize in 1907 for his contributions to optics.
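We can apply Parseval's theorem (see (7.27)) to the first two integrals on the right-hand side of (8.4):
$$\int_{-\infty}^{\infty}I(t)\,dt = \int_{-\infty}^{\infty}I(t-\tau)\,dt = \int_{-\infty}^{\infty}I(\omega)\,d\omega \tag{8.7}$$
Notice that the middle integral is insensitive to the delay $\tau$ since the integral is performed over all time (i.e. a change of variables $t' = t-\tau$ converts the middle integral into the first). With the aid of (8.6) and (8.7), the accumulated energy (8.4) at the detector becomes
$$\int_{-\infty}^{\infty}I_{\rm det}(t,\tau)\,dt = 2\int_{-\infty}^{\infty}I(\omega)\,d\omega + 2\,{\rm Re}\int_{-\infty}^{\infty}I(\omega)\,e^{-i\omega\tau}\,d\omega = 2\int_{-\infty}^{\infty}I(\omega)\,d\omega\left[1 + {\rm Re}\,\frac{\int_{-\infty}^{\infty}I(\omega)\,e^{-i\omega\tau}\,d\omega}{\int_{-\infty}^{\infty}I(\omega)\,d\omega}\right] \tag{8.8}$$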
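It is convenient to rewrite this in terms of the Degree of Coherence function $\gamma(\tau)$:
$$\int_{-\infty}^{\infty}I_{\rm det}(t,\tau)\,dt = 2\int_{-\infty}^{\infty}I(t)\,dt\left[1 + {\rm Re}\,\gamma(\tau)\right] \tag{8.9}$$
where
$$\gamma(\tau) \equiv \frac{\int_{-\infty}^{\infty}I(\omega)\,e^{-i\omega\tau}\,d\omega}{\int_{-\infty}^{\infty}I(\omega)\,d\omega} \tag{8.10}$$
Notice that in writing (8.9) we have again applied Parseval's theorem (8.7) to part of the equation. In summary, (8.9) describes the accumulated energy (per area) arriving at the detector after the Michelson interferometer. The dependence on the path delay $\tau$ is entirely contained in the function $\gamma(\tau)$.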
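To make (8.10) concrete, the following Python sketch evaluates the degree of coherence numerically for a flat-top (rectangular) spectrum, which is an assumed test case rather than one taken from the text. For a rectangular spectrum of full width $W$ centered on $\omega_0$, direct integration of (8.10) gives $\gamma(\tau) = e^{-i\omega_0\tau}\sin(W\tau/2)/(W\tau/2)$, so the numerical result can be compared against it. The function name `degree_of_coherence` and all parameter values are hypothetical.

```python
import numpy as np

def degree_of_coherence(omega, spectrum, tau):
    """Evaluate (8.10) by a Riemann sum on a uniform frequency grid."""
    kernel = np.exp(-1j * np.outer(tau, omega))    # shape (len(tau), len(omega))
    return (kernel @ spectrum) / np.sum(spectrum)  # the grid spacing cancels in the ratio

# Flat-top spectrum of full width `width` centered on omega0 (assumed values).
omega0, width, n_omega = 10.0, 2.0, 4000
omega = omega0 - width / 2.0 + (np.arange(n_omega) + 0.5) * width / n_omega  # midpoints
spectrum = np.ones(n_omega)

tau = np.linspace(0.0, 10.0, 201)
gamma_tau = degree_of_coherence(omega, spectrum, tau)

# Closed form for comparison; np.sinc(x) is sin(pi x)/(pi x).
expected = np.exp(-1j * omega0 * tau) * np.sinc(width * tau / (2.0 * np.pi))

print(np.max(np.abs(gamma_tau - expected)))
```

Because only the ratio of integrals enters, the spectrum can be supplied in any units, and $\gamma(0) = 1$ regardless of the spectral shape.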
8.3 Temporal Coherence

We could have derived (8.9) using another strategy, which may seem more intuitive than the approach in the previous section. Equation (8.1) gives the intensity at the detector when a single plane wave of frequency $\omega$ goes through the interferometer. Now suppose that a waveform composed of many frequencies is sent through the interferometer. The intensity associated with each frequency acts independently, obeying (8.1) individually. The total energy (per area) accumulated at the detector is then a linear superposition of the spectral intensities of all frequencies present:
$$\int_{-\infty}^{\infty}I_{\rm det}(\omega,\tau)\,d\omega = \int_{-\infty}^{\infty}2I(\omega)\left[1+\cos(\omega\tau)\right]d\omega \tag{8.11}$$
While this procedure may seem obvious, the fact that we can do it is remarkable! Remember that it is usually the fields that we must add together before finding the intensity of the resulting superposition. The formula (8.11) with its superposition of intensities relies on the fact that the different frequencies inside the interferometer when time-averaged (over all time) do not interfere. Certainly, the fields at different frequencies do interfere (or beat in time). However, they constructively interfere as often as they destructively interfere, and over time it is as though the individual frequency components transmit independently. Again, in writing (8.11) we considered the light to be pulsed rather than continuous so that the integrals converge. We can manipulate (8.11) as follows:
$$\int_{-\infty}^{\infty}I_{\rm det}(\omega,\tau)\,d\omega = 2\int_{-\infty}^{\infty}I(\omega)\,d\omega\left[1 + \frac{\int_{-\infty}^{\infty}I(\omega)\cos(\omega\tau)\,d\omega}{\int_{-\infty}^{\infty}I(\omega)\,d\omega}\right] \tag{8.12}$$
This is the same as (8.8) since we can replace $\cos(\omega\tau)$ with ${\rm Re}\,e^{-i\omega\tau}$, and we can apply Parseval's theorem (8.7) to the other integrals. Thus, the above arguments lead to (8.9) and (8.10), in complete agreement with the previous section.
Finally, let us consider the case of a continuous light source for which the integrals in (8.9) diverge. This is the case for starlight or for a continuous wave (CW) laser source. The integral $\int_{-\infty}^{\infty}I(t)\,dt$ diverges since a source that is on forever (or at least for a very long time) emits infinite (or very much) energy. However, note that the integrals on both sides of (8.9) diverge in the same way. We can renormalize (8.9) in this case by replacing the integrals on each side with the average value of the intensity:
$$I_{\rm ave} \equiv \langle I(t)\rangle_t = \frac{1}{T}\int_{-T/2}^{T/2}I(t)\,dt \quad\text{(continuous source)} \tag{8.13}$$
The duration $T$ must be large enough to average over any fluctuations that are present in the light source. The average in (8.13) should not be used on a pulsed light source since the result would depend on the duration $T$ of the temporal window. In the continuous wave (CW) case (e.g. starlight or a CW laser), the signal at the detector (8.9) becomes
$$\langle I_{\rm det}(t,\tau)\rangle_t = 2\langle I(t)\rangle_t\left[1 + {\rm Re}\,\gamma(\tau)\right] \quad\text{(continuous source)} \tag{8.14}$$
Although technically the integrals involved in computing $\gamma(\tau)$ (8.10) also diverge in the case of CW light, the numerator and the denominator diverge in the same way. Therefore, we may renormalize $I(\omega)$ in any way we like to deal with this problem, and this does not affect the final result. Regardless of how large $I(\omega)$ is, and regardless of the units on the measurement (volts or whatever), we can simply plug the instrument reading directly into (8.10). The units in the numerator and denominator cancel so that $\gamma(\tau)$ always remains dimensionless.
A very remarkable aspect of the above result is that the behavior of the light in the Michelson interferometer does not depend on the phase of $\mathbf{E}(\omega)$. It depends only on the amount of light associated with each frequency component through $I(\omega) \equiv \frac{\epsilon_0 c}{2}\mathbf{E}(\omega)\cdot\mathbf{E}^*(\omega)$. When the light at one frequency undergoes constructive interference for a given path difference $\tau$, the light at another frequency might undergo destructive interference. The net effect is given in the degree of coherence function $\gamma(\tau)$, which contains the essential information describing interference. Fig. 8.2 depicts the degree of coherence function as one arm of the interferometer is adjusted through various delays $\tau$. In summary, narrowband light is temporally more coherent than broadband light because there is less interference between different frequencies.
Figure 8.2 ${\rm Re}[\gamma(\tau)]$ (solid) and $|\gamma(\tau)|$ (dashed) for a light pulse having a Gaussian spectrum (7.26).
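The insensitivity of the interferometer signal to spectral phase can be checked with a short simulation. The sketch below is illustrative only. It builds two pulses with the same Gaussian spectral amplitude, one with flat spectral phase and one with an assumed quadratic (chirped) phase, delays each by $\tau$ using NumPy's FFT conventions, and compares the time-integrated energy of the recombined field. The grid sizes, carrier, bandwidth, and chirp strength are all assumed values, and `accumulated_energy` is a hypothetical helper.

```python
import numpy as np

n, dt = 4096, 0.05
omega = 2.0 * np.pi * np.fft.fftfreq(n, d=dt)      # angular-frequency grid of the FFT

omega0, d_omega = 20.0, 2.0                         # assumed carrier and bandwidth
amplitude = np.exp(-0.5 * ((omega - omega0) / d_omega) ** 2)   # |E(omega)|

def accumulated_energy(spectral_phase, tau):
    """Sum over time of |E(t) + E(t - tau)|^2 dt for the given spectral phase."""
    E_omega = amplitude * np.exp(1j * spectral_phase)
    E_delayed = E_omega * np.exp(-1j * omega * tau)   # a delay of tau in NumPy's ifft sign convention
    total_field = np.fft.ifft(E_omega + E_delayed)
    return np.sum(np.abs(total_field) ** 2) * dt

delays = np.array([0.0, 0.1, 0.5, 3.0])
flat_phase = np.zeros(n)
chirped_phase = 0.5 * (omega - omega0) ** 2          # quadratic phase stretches the pulse in time

energy_flat = np.array([accumulated_energy(flat_phase, tau) for tau in delays])
energy_chirped = np.array([accumulated_energy(chirped_phase, tau) for tau in delays])

# The two pulses look different in time even though their spectra match.
peak_flat = np.max(np.abs(np.fft.ifft(amplitude)) ** 2)
peak_chirped = np.max(np.abs(np.fft.ifft(amplitude * np.exp(1j * chirped_phase))) ** 2)

print(energy_flat / energy_flat[0])
print(energy_chirped / energy_chirped[0])
print(peak_chirped / peak_flat)
```

The chirped pulse has a lower peak and a longer duration, yet its interferometer signal matches that of the unchirped pulse at every delay, as the text asserts.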
8.4 Fringe Visibility and Coherence Length

The degree of coherence function $\gamma(\tau)$ is responsible for oscillations in intensity at the detector as the mirror in one of the arms is moved. The real part ${\rm Re}\,\gamma(\tau)$ is analogous to $\cos(\omega\tau)$ in (8.1). For large delays $\tau$, the oscillations tend to die off as different frequencies individually interfere, some constructively, others destructively. For large path differences, the intensity at the detector tends to remain steady as the mirror is moved further. We define the coherence time to be the amount of delay necessary to cause $\gamma(\tau)$ to quit oscillating (i.e. its amplitude approaches zero). A useful (although arbitrary) definition for the coherence time is
$$\tau_c \equiv \int_{-\infty}^{\infty}|\gamma(\tau)|^2\,d\tau = 2\int_{0}^{\infty}|\gamma(\tau)|^2\,d\tau \tag{8.15}$$
Figure 8.3 The output of a Michelson interferometer for a Gaussian spectrum (8.21)
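The coherence length is the distance that light travels in this time:
$$\ell_c \equiv c\,\tau_c \tag{8.16}$$
Another useful concept is fringe visibility. The fringe visibility is defined in the following way:
$$V(\tau) \equiv \frac{I_{\max}-I_{\min}}{I_{\max}+I_{\min}} \quad\text{(continuous)} \tag{8.17}$$
or
$$V(\tau) \equiv \frac{\mathcal{E}_{\max}-\mathcal{E}_{\min}}{\mathcal{E}_{\max}+\mathcal{E}_{\min}} \quad\text{(pulsed)} \tag{8.18}$$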
where $\mathcal{E}_{\max} \equiv \max\int_{-\infty}^{\infty}I_{\rm det}(t,\tau)\,dt$ refers to the accumulated energy (per area) at the detector when the mirror is positioned such that the amount of throughput to the detector is a local maximum (i.e. the left-hand side of (8.9)). $\mathcal{E}_{\min}$ refers to the accumulated energy at the detector when the mirror is positioned such that the amount of throughput to the detector is a local minimum. As the mirror moves a large distance from the equal-path-length position, the oscillations become less pronounced because the values of $\mathcal{E}_{\min}$ and $\mathcal{E}_{\max}$ tend to take on the same value, and the fringe visibility goes to zero. The fringe visibility goes to zero when $\gamma(\tau)$ goes to zero. It is left as an exercise to show that the fringe visibility can be written as
$$V(\tau) = |\gamma(\tau)| \tag{8.19}$$
In the case of a Gaussian spectral distribution (7.26)
$$I(\omega) = I(\omega_0)\,e^{-\left(\frac{\omega-\omega_0}{\Delta\omega}\right)^2} \tag{8.20}$$
the result of (8.10) is
$$\gamma(\tau) = e^{-i\omega_0\tau-\frac{(\Delta\omega)^2\tau^2}{4}} \tag{8.21}$$
Figure 8.2 plots the magnitude and real part of (8.21). From (8.15) the coherence time is
$$\tau_c = \frac{\sqrt{2\pi}}{\Delta\omega} \tag{8.22}$$
Figure 8.3 shows $1 + {\rm Re}\,\gamma(\tau)$, which is proportional to the energy (per area) arriving at the detector. As expected, the fringes die off for a delay interval of $\tau_c$.
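The Gaussian results (8.21) and (8.22) can be verified numerically. The sketch below is a check under assumed parameter values, not part of the original derivation. It evaluates (8.10) for the spectrum (8.20) by direct summation, compares against the closed form (8.21), and then integrates $|\gamma(\tau)|^2$ as in (8.15) to recover the coherence time.

```python
import numpy as np

omega0, d_omega = 50.0, 2.0     # assumed center frequency and spectral width
omega = np.linspace(omega0 - 10.0 * d_omega, omega0 + 10.0 * d_omega, 801)
spectrum = np.exp(-((omega - omega0) / d_omega) ** 2)   # (8.20) with I(omega0) = 1

tau = np.linspace(-5.0, 5.0, 1001)
step_tau = tau[1] - tau[0]

# Degree of coherence (8.10) as a ratio of Riemann sums (the grid spacing cancels).
kernel = np.exp(-1j * np.outer(tau, omega))
gamma_numeric = (kernel @ spectrum) / np.sum(spectrum)

# Closed form (8.21).
gamma_closed_form = np.exp(-1j * omega0 * tau - (d_omega * tau) ** 2 / 4.0)

# Fringe visibility (8.19) and coherence time (8.15).
visibility = np.abs(gamma_numeric)
coherence_time = np.sum(visibility ** 2) * step_tau

print(coherence_time, np.sqrt(2.0 * np.pi) / d_omega)
```

Doubling the assumed spectral width `d_omega` halves the coherence time, which is the inverse relationship between bandwidth and temporal coherence described above.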
8.5 Fourier Spectroscopy

As we have seen in the previous discussion, the signal output from a Michelson interferometer for a pulsed input is given by
$${\rm Sig}(\tau) \propto \int_{-\infty}^{\infty}I_{\rm det}(t,\tau)\,dt = 2\int_{-\infty}^{\infty}I(t)\,dt\left[1 + {\rm Re}\,\gamma(\tau)\right] \tag{8.23}$$
where
$$\gamma(\tau) \equiv \frac{\int_{-\infty}^{\infty}I(\omega)\,e^{-i\omega\tau}\,d\omega}{\int_{-\infty}^{\infty}I(\omega)\,d\omega} \tag{8.24}$$
Typically, the signal comes in the form of a voltage or a current from a sensor. However, the signal can be normalized to the signal level occurring when $\tau$ is large (i.e. fringe visibility goes to zero: $\gamma(\tau) = 0$). In this case, the normalized signal must approach
$$\lim_{\tau\to\infty}\eta\,{\rm Sig}(\tau) = 2\mathcal{E}_0 \tag{8.25}$$
where $\eta$ is the appropriate normalization constant that changes the proportionality (8.23) into an equation, and
$$\mathcal{E}_0 \equiv \int_{-\infty}^{\infty}I(t)\,dt = \int_{-\infty}^{\infty}I(\omega)\,d\omega \tag{8.26}$$
denotes the total energy (per area) that would arrive at the detector from one arm of the interferometer (i.e. if the other arm were blocked).
Given our measurement of ${\rm Sig}(\tau)$, we would like to find $I(\omega)$, or the spectrum of the light. Unfortunately, $I(\omega)$ is buried within the integrals (8.23). However, since the denominator of $\gamma(\tau)$ is constant (equal to $\mathcal{E}_0$) and since the numerator of $\gamma(\tau)$ looks like an inverse Fourier transform of $I(\omega)$, we are able to extract the desired spectrum after some manipulation. This procedure for extracting $I(\omega)$ from an interferometric measurement is known as Fourier spectroscopy.
Figure 8.4 Depiction of $\mathcal{F}\{\eta\,\mathrm{Sig}(\tau)\}/\sqrt{2\pi}$.
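We now describe the procedure for obtaining $I(\omega)$. We can write the properly normalized signal (8.23) as

$$\eta\,\mathrm{Sig}(\tau) = 2\mathcal{E}_0 + 2\,\mathrm{Re}\int_{-\infty}^{\infty} I(\omega)\, e^{-i\omega\tau}\, d\omega \tag{8.27}$$

Next, we take the Fourier transform of this equation:

$$\mathcal{F}\{\eta\,\mathrm{Sig}(\tau)\} = \mathcal{F}\{2\mathcal{E}_0\} + \mathcal{F}\left\{2\,\mathrm{Re}\int_{-\infty}^{\infty} I(\omega)\, e^{-i\omega\tau}\, d\omega\right\} \tag{8.28}$$

The left-hand side is known since it is the measured data, and a computer can be employed to take the Fourier transform of it. The first term on the right-hand side is the Fourier transform of a constant:

$$\mathcal{F}\{2\mathcal{E}_0\} = 2\mathcal{E}_0\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{i\omega\tau}\, d\tau = 2\mathcal{E}_0\sqrt{2\pi}\,\delta(\omega) \tag{8.29}$$

Notice that (8.29) is zero everywhere except where $\omega = 0$, where a spike occurs. This represents the DC component of $\mathcal{F}\{\eta\,\mathrm{Sig}(\tau)\}$.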
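The second term of (8.28) can be written as

$$\begin{aligned}
\mathcal{F}\left\{2\,\mathrm{Re}\int_{-\infty}^{\infty} I(\omega)\, e^{-i\omega\tau}\, d\omega\right\}
&= \mathcal{F}\left\{\int_{-\infty}^{\infty} I(\omega)\, e^{-i\omega\tau}\, d\omega + \int_{-\infty}^{\infty} I(\omega)\, e^{i\omega\tau}\, d\omega\right\} \\
&= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} I(\omega')\, e^{-i\omega'\tau}\, d\omega'\right] e^{i\omega\tau}\, d\tau + \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} I(\omega')\, e^{i\omega'\tau}\, d\omega'\right] e^{i\omega\tau}\, d\tau \\
&= \sqrt{2\pi}\int_{-\infty}^{\infty} I(\omega')\left[\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{i(\omega-\omega')\tau}\, d\tau\right] d\omega' + \sqrt{2\pi}\int_{-\infty}^{\infty} I(\omega')\left[\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{i(\omega+\omega')\tau}\, d\tau\right] d\omega' \\
&= \sqrt{2\pi}\int_{-\infty}^{\infty} I(\omega')\,\delta(\omega'-\omega)\, d\omega' + \sqrt{2\pi}\int_{-\infty}^{\infty} I(\omega')\,\delta(\omega'+\omega)\, d\omega' \\
&= \sqrt{2\pi}\left[I(\omega) + I(-\omega)\right]
\end{aligned} \tag{8.30}$$

With (8.29) and (8.30) we can write (8.28) as

$$\frac{\mathcal{F}\{\eta\,\mathrm{Sig}(\tau)\}}{\sqrt{2\pi}} = 2\mathcal{E}_0\,\delta(\omega) + I(\omega) + I(-\omega) \tag{8.31}$$

The Fourier transform of the measured signal is seen to contain three terms, one of which is the power spectrum that we are after, namely $I(\omega)$. Fortunately, when graphed as a function of $\omega$ (shown in Fig. 8.4), the three terms on the right-hand side typically do not overlap. As a reminder, the measured signal as a function of $\tau$ looks something like that in Fig. 8.3. The oscillation frequency of the fringes lies in the neighborhood of $\omega_0$. To obtain $I(\omega)$ the procedure is clear: record $\mathrm{Sig}(\tau)$; if desired, normalize by its value at large $\tau$; take its Fourier transform; extract the curve at positive frequencies.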
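The procedure can be sketched on synthetic data. The example below is an illustration under stated assumptions, not the authors' code: it builds a normalized interferometer signal of the form (8.27) for a hypothetical two-line Gaussian spectrum (amplitudes, centers, and widths are made up, with frequency in rad/fs and delay in fs), takes a discrete Fourier transform with NumPy's FFT, and reads off the positive-frequency part as in (8.31).

```python
import numpy as np

def gaussian_line(omega, amplitude, center, width):
    """One Gaussian spectral line, same functional form as (8.20)."""
    return amplitude * np.exp(-((omega - center) / width) ** 2)

# Hypothetical spectrum: (amplitude, center [rad/fs], width [rad/fs]).
lines = [(1.0, 2.30, 0.03), (0.6, 2.45, 0.03)]

# Delay grid centered on tau = 0 (fs); the spacing sets the highest frequency resolved.
n_samples = 3200
d_tau = 0.25
tau = (np.arange(n_samples) - n_samples // 2) * d_tau

# Normalized signal (8.27): 2*E0 + 2*Re[ integral of I(omega) exp(-i omega tau) ].
# For Gaussian lines the integral is known in closed form, which is used here.
e0 = sum(a * w * np.sqrt(np.pi) for a, _, w in lines)
signal = np.full_like(tau, 2.0 * e0)
for a, c, w in lines:
    signal += 2.0 * a * w * np.sqrt(np.pi) * np.cos(c * tau) * np.exp(-(w * tau) ** 2 / 4)

# Discrete version of F{eta Sig}/sqrt(2 pi) = (1/2 pi) * integral of Sig exp(i omega tau) d tau.
# ifftshift moves tau = 0 to index 0 so the transform of this even signal is real.
transform = d_tau / (2 * np.pi) * np.fft.fft(np.fft.ifftshift(signal))
omega = 2 * np.pi * np.fft.fftfreq(n_samples, d=d_tau)

# Keep positive frequencies only: this drops the DC spike and the mirror image I(-omega).
positive = omega > 0
recovered = transform.real[positive]
expected = sum(gaussian_line(omega[positive], a, c, w) for a, c, w in lines)
max_error = np.max(np.abs(recovered - expected))

print(f"largest deviation from the input spectrum: {max_error:.2e}")
```

With real data the signal would come from the detector rather than a formula, and the delay range and step would be set by the mirror scan; both choices limit the spectral resolution and the highest frequency that can be recovered.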
8.6 Young's Two-Slit Setup and Spatial Coherence

In close analogy with the Michelson interferometer, which is able to investigate temporal coherence, Young's two-slit experiment can be used to investigate spatial coherence of quasi-monochromatic light. Thomas Young, who lived nearly a century before Michelson, used his two-slit setup for the first conclusive demonstration that light is a wave. Young's two-slit setup and the Michelson interferometer have in common that two beams of light travel different paths and then interfere. In the Michelson interferometer, one path is delayed with respect to the other so that temporal effects can be studied. In Young's two-slit setup, two laterally separate points of the same wave are compared as they are sent through two slits. Depending on the coherence of the wave at the two points, the fringe pattern observed can exhibit good or poor visibility. Just as the Michelson interferometer is sensitive to the spectral content of light, Young's two-slit setup is sensitive to the spatial extent of the light source illuminating the two slits. For example, if light from a distant star (restricted by a filter to a narrow spectral range) is used to illuminate a double-slit setup, the resulting interference pattern appearing on a subsequent screen contains information regarding the angular width of the star. Michelson was the first to use this type of setup to measure the angular width of stars.

Light emerging from a single ideal point source has wave fronts that are spatially uniform in a lateral sense (see Fig. 8.5). Such wave fronts are said to be spatially coherent, even if the temporal coherence is not perfect (i.e. if a range of frequencies is present). When spatially coherent light illuminates a Young's two-slit setup, fringes of maximum visibility are seen at a distant screen, meaning the fringes vary between a maximum intensity and zero. If a larger source of light (with randomly varying phase across its extent) is used to illuminate the two-slit setup (see Fig. 8.6), the wave fronts at the two slits are less correlated, and the visibility of the fringes on the distant screen diminishes because the fringes fluctuate rapidly in time and partially wash out.

We now consider the details of Young's two-slit setup. When both slits are illuminated with spatially coherent light, the resulting pattern on a far-away screen is given by

$$I = 2I_0\left[1 + \cos\left(k(d_2 - d_1) + \phi_2 - \phi_1\right)\right] = 2I_0\left[1 + \cos\left(khy/D + \Delta\phi\right)\right] \tag{8.32}$$
where $\phi_1$ and $\phi_2$ are the phases of the wave front at the two slits, respectively. Notice the close similarity with a Michelson interferometer (see (8.1)). Here the controlling variable is $h$ (the separation of the slits) rather than $\tau$ (the delay introduced by moving a mirror in the Michelson interferometer). To obtain the final expression in (8.32) we have made the approximations

$$d_1(y) = \sqrt{(y - h/2)^2 + D^2} = D\sqrt{1 + \frac{(y - h/2)^2}{D^2}} \cong D\left[1 + \frac{(y - h/2)^2}{2D^2}\right] \tag{8.33}$$

and

$$d_2(y) = \sqrt{(y + h/2)^2 + D^2} = D\sqrt{1 + \frac{(y + h/2)^2}{D^2}} \cong D\left[1 + \frac{(y + h/2)^2}{2D^2}\right] \tag{8.34}$$

Thomas Young (1773–1829, English)
Young was a physician by trade, but studied widely in other fields. His double-slit experiment gave convincing evidence of the wave nature of light. He also did extensive research into color vision. On the side, he translated hieroglyphics and studied many other languages.
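As a quick sanity check on the approximations (8.33) and (8.34), the sketch below compares the exact path difference $d_2 - d_1$ with the approximate value $hy/D$ used in (8.32). The wavelength, slit separation, screen distance, and screen position are assumed illustrative values, and the two helper functions are hypothetical names introduced for the example.

```python
import math

def path_difference_exact(y, h, D):
    """d2 - d1 using the exact square roots in (8.33) and (8.34)."""
    d1 = math.hypot(y - h / 2, D)
    d2 = math.hypot(y + h / 2, D)
    return d2 - d1

def path_difference_approx(y, h, D):
    """d2 - d1 after the expansion: the y^2 and h^2/4 terms cancel, leaving h*y/D."""
    return h * y / D

# Assumed illustrative geometry (SI units).
wavelength = 633e-9   # m
h = 0.5e-3            # slit separation, m
D = 1.5               # slit-to-screen distance, m
y = 4.0e-3            # position on the screen, m

exact = path_difference_exact(y, h, D)
approx = path_difference_approx(y, h, D)
relative_error = abs(exact - approx) / approx
phase_error = 2 * math.pi / wavelength * abs(exact - approx)   # radians

print(f"exact  d2 - d1 = {exact:.6e} m")
print(f"approx h*y/D   = {approx:.6e} m")
print(f"relative error = {relative_error:.2e}, phase error = {phase_error:.2e} rad")
```

For a geometry like this one, with $D$ much larger than both $y$ and $h$, the phase error is a tiny fraction of a radian, so the fringe positions predicted by (8.32) are essentially unaffected.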
Figure 8.5 A point source produces coherent (locked phases) light. When this light traverses two slits and arrives at a screen, it produces a fringe pattern.
Figure 8.6 Light from an extended source is only partially coherent. Fringes are still possible, but they exhibit less contrast.
These approximations are valid as long as $D \gg y$ and $D \gg h$.

We now consider how to modify (8.32) so that it applies to the case when the two slits are illuminated by a host of point sources distributed over a finite lateral extent. This situation is depicted in Fig. 8.6 and it leads to partial spatial coherence when the phase of each emitter is random. Again, spatial coherence is a term used to describe whether the phase of the wave fronts at one slit is correlated with the phase of the wave fronts at the other slit. We will find that a larger source gives less coherent wave fronts at the slits.

To simplify our analysis, let us consider the many point sources to be arranged in one dimension (in the plane of the figure). We restrict the distribution of point sources to vary only in the $y'$ dimension. This ensures that the light has uniform phase along either slit (in and out of the plane of Fig. 8.6). We assume that the light is quasi-monochromatic so that its frequency is approximately $\omega$ with a phase that fluctuates randomly over time intervals much longer than the period of oscillation $2\pi/\omega$. This necessarily implies that there will be some frequency bandwidth, however small.

The light emerging from the $j$th point at $y'_j$ travels by means of two very narrow slits to a point $y$ on a screen. Let $E_1(y'_j)$ and $E_2(y'_j)$ be the fields on the screen at $y$, each originating from the point $y'_j$ and traveling respectively through the two slits. We suppress the vectorial nature of $E_1(y'_j)$ and $E_2(y'_j)$, and we ignore possible complications due to field polarization. The total field contribution at the screen from the $j$th point is obtained by adding $E_1(y'_j)$ and $E_2(y'_j)$. Let us make the assumption that $E_1(y'_j)$ and $E_2(y'_j)$ have the same amplitude $|E(y'_j)|$. Thus, the two fields differ only in their phases according to the respective distances traveled to the screen. This allows us to write the two fields as

$$E_1(y'_j) = |E(y'_j)|\, e^{i\left\{k\left[r_1(y'_j) + d_1(y)\right] - \omega t + \phi(y'_j)\right\}} \tag{8.35}$$

and

$$E_2(y'_j) = |E(y'_j)|\, e^{i\left\{k\left[r_2(y'_j) + d_2(y)\right] - \omega t + \phi(y'_j)\right\}} \tag{8.36}$$
Notice that we have explicitly included an arbitrary phase $\phi(y'_j)$, which is different for each point source.

We now set about finding the cumulative field at $y$ arising from the many points indexed by the subscript $j$. We therefore sum over the index $j$. Again, for simplicity we have assumed that the point sources are distributed along one dimension, in the $y'$-direction. The upcoming results can be generalized to a two-dimensional source where the point sources are distributed also in and out of the plane of Fig. 8.6. However, in this case, the slits should be replaced with two pinholes. The net field on the screen at point $y$ is

$$E_{\rm net}(h) = \sum_j \left[E_1(y'_j) + E_2(y'_j)\right] \tag{8.37}$$
This net field depends not only on $h$, but also on $y$, $R$, $D$, and $k$ as well as on the phase $\phi(y'_j)$ at each point. Nevertheless, in the end we will mainly emphasize the dependence on the slit separation $h$. The intensity of this field is

$$\begin{aligned}
I_{\rm net}(h) &= \frac{\epsilon_0 c}{2}\left|E_{\rm net}(h)\right|^2 \\
&= \frac{\epsilon_0 c}{2}\left[\sum_j E_1(y'_j) + E_2(y'_j)\right]\left[\sum_m E_1(y'_m) + E_2(y'_m)\right]^* \\
&= \frac{\epsilon_0 c}{2}\sum_{j,m}\left[E_1(y'_j)E_1^*(y'_m) + E_2(y'_j)E_2^*(y'_m) + 2\,\mathrm{Re}\,E_1(y'_j)E_2^*(y'_m)\right]
\end{aligned} \tag{8.38}$$
When inserting the field expressions (8.35) and (8.36) into this expression for the intensity at the screen, we get

$$\begin{aligned}
I_{\rm net}(h) = \frac{\epsilon_0 c}{2}\sum_{j,m}\Big\{ &|E(y'_j)||E(y'_m)|\, e^{ik\left[r_1(y'_j) - r_1(y'_m)\right]} e^{i\left[\phi(y'_j) - \phi(y'_m)\right]} \\
+ &|E(y'_j)||E(y'_m)|\, e^{ik\left[r_2(y'_j) - r_2(y'_m)\right]} e^{i\left[\phi(y'_j) - \phi(y'_m)\right]} \\
+ &2\,\mathrm{Re}\,|E(y'_j)||E(y'_m)|\, e^{ik\left[r_1(y'_j) - r_2(y'_m)\right]} e^{ik\left[d_1(y) - d_2(y)\right]} e^{i\left[\phi(y'_j) - \phi(y'_m)\right]}\Big\}
\end{aligned} \tag{8.39}$$

At this juncture we make a critical assumption that the phase of the emission $\phi(y'_j)$ varies in time independently at every point on the source. This assumption is appropriate for the emission from thermal sources such as starlight, a glowing filament (filtered to a narrow frequency range), or spontaneous emission from an excited gas or plasma. The assumption of random phase, however, is inappropriate for coherent sources such as laser light. We comment on this in Appendix 8.B.

A wonderful simplification happens to (8.39) when $\phi(y'_j) - \phi(y'_m)$ varies randomly in time for $j \neq m$ (i.e. when there is no correlation between the two phases). Keep in mind that to the extent that the phases vary in time, the frequency spectrum of the light broadens in competition with our quasi-monochromatic assumption. If we average the intensity over an extended time, then $e^{i\left[\phi(y'_j) - \phi(y'_m)\right]}$ averages to zero unless we have $j = m$, in which case the factor reduces to $e^0$, which is always one. Thus, we have

$$\left\langle e^{i\left[\phi(y'_j) - \phi(y'_m)\right]}\right\rangle_t = \delta_{j,m} \equiv \begin{cases} 1 & \text{if } j = m, \\ 0 & \text{if } j \neq m. \end{cases} \quad \text{(random phase assumption)} \tag{8.40}$$
The function $\delta_{j,m}$ is known as the Kronecker delta function. The time-averaged intensity under the random-phase assumption (8.40) becomes

$$\left\langle I_{\rm net}(h)\right\rangle_t = \sum_j\left\{I(y'_j) + I(y'_j) + 2\,\mathrm{Re}\,I(y'_j)\, e^{ik\left[r_1(y'_j) - r_2(y'_j)\right]} e^{ik\left[d_1(y) - d_2(y)\right]}\right\} \tag{8.41}$$
We may use (8.33) and (8.34) to simplify $d_1(y) - d_2(y) \cong -hy/D$, and similarly, we may simplify $r_1(y'_j) - r_2(y'_j) \cong -y'_j h/R$ with the approximations

$$r_1(y'_j) = \sqrt{(y'_j - h/2)^2 + R^2} \cong R\left[1 + \frac{(y'_j - h/2)^2}{2R^2}\right] \tag{8.42}$$

and

$$r_2(y'_j) = \sqrt{(y'_j + h/2)^2 + R^2} \cong R\left[1 + \frac{(y'_j + h/2)^2}{2R^2}\right] \tag{8.43}$$
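With these simplifications, (8.41) becomes

$$\left\langle I_{\rm net}(h)\right\rangle_t = 2\sum_j I(y'_j) + 2\,\mathrm{Re}\, e^{-i\frac{khy}{D}}\sum_j I(y'_j)\, e^{-i\frac{khy'_j}{R}} \quad \text{(random phase assumption)} \tag{8.44}$$

The only thing left to do is to put this formula into a slightly more familiar form:

$$\left\langle I_{\rm net}(h)\right\rangle_t = 2\sum_j I(y'_j)\left[1 + \mathrm{Re}\,\gamma(h)\right] \tag{8.45}$$

where

$$\gamma(h) \equiv e^{-i\frac{khy}{D}}\,\frac{\sum_j I(y'_j)\, e^{-i\frac{khy'_j}{R}}}{\sum_j I(y'_j)} \tag{8.46}$$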
Students should notice the close similarity to the Michelson interferometer, (8.9) and (8.10). As before, $\gamma(h)$ is known as the degree of coherence, in this case spatial coherence. It controls the fringe pattern seen at the screen. The factor $\exp(-ikhy/D)$ defines the positions of the periodic fringes on the screen. The remainder of (8.46) controls the depth of the fringes as the slit separation $h$ is varied. When the slit separation $h$ increases, the amplitude of $\gamma(h)$ tends to diminish until the intensity at the screen becomes uniform. When the two slits have very small separation (such that $e^{-i\frac{khy'}{R}} \cong 1$ wherever $I(y')$ is significant) then we have $|\gamma(h)| = 1$ and very good fringe visibility results. As the slit separation $h$ increases, the fringe visibility

$$V(h) = |\gamma(h)| \tag{8.47}$$
diminishes, eventually approaching zero (see (8.19)). In analogy to the temporal case (see (8.15)), we can define a slit separation sufficiently large to make the fringes at the screen disappear:

$$h_c \equiv 2\int_0^{\infty} |\gamma(h)|^2\, dh \tag{8.48}$$
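The effect of the random-phase assumption can be illustrated with a small Monte Carlo experiment. This is a sketch under assumptions, not a derivation from the text: a handful of point emitters with made-up positions and intensities each receive an independent random phase in every "snapshot", the two-slit field is summed as in (8.37) using the small-angle path differences, and the snapshot-averaged intensity is compared with the prediction of (8.45) and (8.46). Intensity is measured in units where $\epsilon_0 c/2 = 1$, and the geometry values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed illustrative geometry (SI units).
wavelength = 633e-9
k = 2 * np.pi / wavelength
R, D, h = 1.0, 2.0, 0.5e-3          # source-to-slits, slits-to-screen, slit separation

y_src = np.linspace(-0.2e-3, 0.2e-3, 21)               # emitter positions y'_j
intensity_src = rng.uniform(0.5, 1.5, size=y_src.size)  # emitter intensities I(y'_j)
y_screen = np.linspace(-3e-3, 3e-3, 41)                 # observation points y

# theta[j, n] = k h (y'_j / R + y_n / D): phase difference between the two paths.
theta = k * h * (y_src[:, None] / R + y_screen[None, :] / D)

# Field through slit 1 plus slit 2 for each emitter. Path phases common to both
# slits are absorbed into the random emitter phase phi_j.
two_slit_factor = np.exp(-0.5j * theta) + np.exp(0.5j * theta)
weights = np.sqrt(intensity_src)[:, None] * two_slit_factor

# Each row of phi is one snapshot of independent emitter phases, as in (8.40).
n_snapshots = 40000
phi = rng.uniform(0.0, 2 * np.pi, size=(n_snapshots, y_src.size))
field = np.exp(1j * phi) @ weights                  # shape (n_snapshots, len(y_screen))
simulated = np.mean(np.abs(field) ** 2, axis=0)     # time-averaged intensity

# Prediction from (8.45) and (8.46).
source_sum = np.sum(intensity_src * np.exp(-1j * k * h * y_src / R))
gamma = np.exp(-1j * k * h * y_screen / D) * source_sum / intensity_src.sum()
predicted = 2 * intensity_src.sum() * (1 + gamma.real)

worst = np.max(np.abs(simulated - predicted)) / predicted.max()
print(f"|gamma(h)| = {abs(gamma[0]):.3f}")
print(f"largest simulated-vs-predicted deviation (relative to peak): {worst:.3%}")
```

The residual deviation is statistical and shrinks roughly as one over the square root of the number of snapshots, which mirrors the role of the time average in (8.41).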
We can generalize (8.46) so that it applies to the case of a continuous distribution of light as opposed to a collection of discrete point sources. In Appendix 8.A we show how summations in (8.45) and (8.46) become integrals over the source intensity distribution, and we write

$$\left\langle I_{\rm net}(h)\right\rangle_t = 2\left\langle I_{\rm oneslit}\right\rangle_t\left[1 + \mathrm{Re}\,\gamma(h)\right] \tag{8.49}$$

where

$$\gamma(h) \equiv e^{-i\frac{khy}{D}}\,\frac{\int_{-\infty}^{\infty} I(y')\, e^{-i\frac{khy'}{R}}\, dy'}{\int_{-\infty}^{\infty} I(y')\, dy'} \tag{8.50}$$

Note that $I(y')$ has units of intensity per length in this expression.
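For a continuous source, (8.50) and (8.48) can be evaluated numerically for any intensity profile. The sketch below assumes a hypothetical two-lobed source (two Gaussian patches, loosely like a binary star seen through a filter); the lobe width, lobe separation, wavelength, and distance $R$ are made-up values, and `spatial_coherence` is a helper name introduced only for this example.

```python
import numpy as np

def spatial_coherence(h, y_src, intensity, k, R, y_screen=0.0, D=1.0):
    """Evaluate gamma(h) of (8.50) with a Riemann sum over the source coordinate y'."""
    kernel = np.exp(-1j * k * np.outer(h, y_src) / R)      # shape (len(h), len(y_src))
    source_term = (kernel @ intensity) / intensity.sum()   # grid spacing cancels
    return np.exp(-1j * k * h * y_screen / D) * source_term

# Assumed illustrative values (SI units).
wavelength = 633e-9
k = 2 * np.pi / wavelength
R = 1.0                      # source-to-slits distance
lobe_width = 0.05e-3         # Gaussian width of each lobe
lobe_separation = 0.30e-3    # center-to-center distance between lobes

y_src = np.linspace(-0.45e-3, 0.45e-3, 801)
intensity = (np.exp(-((y_src - lobe_separation / 2) / lobe_width) ** 2)
             + np.exp(-((y_src + lobe_separation / 2) / lobe_width) ** 2))

# Slit separations from zero out to where the visibility is negligible.
h = np.linspace(0.0, 8 * R / (k * lobe_width), 801)
gamma = spatial_coherence(h, y_src, intensity, k, R)
visibility = np.abs(gamma)                                  # fringe visibility (8.47)

# Characteristic slit separation (8.48) by the trapezoid rule.
integrand = visibility ** 2
d_h = h[1] - h[0]
h_c = 2 * d_h * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

print(f"visibility at h = 0: {visibility[0]:.3f}")
print(f"h_c from (8.48): {h_c * 1e3:.3f} mm")
```

The screen position only enters through the overall phase factor in (8.50), so it does not affect the visibility; it is left at its default here. For this two-lobed profile the visibility also oscillates with $h$ at a rate set by the lobe separation, which is the kind of signature used to infer angular structure of a source.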
Appendix 8.A Spatial Coherence with a Continuous Source

In this appendix we examine the coherence of light from a continuous spatial distribution (as opposed to a collection of discrete point sources) and justify (8.50) and (8.47) under the assumption of randomly varying phase at the source. We begin by replacing the summations in (8.39) with integrals over a continuous emission source. As we do this, we must consider the field contributions to be in units of field per length of the extended source. We make the following replacements:

$$\begin{aligned}
\sum_j E_1(y'_j) &\rightarrow \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} E_1(y')\, dy' &\qquad \sum_m E_1(y'_m) &\rightarrow \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} E_1(y'')\, dy'' \\
\sum_j E_2(y'_j) &\rightarrow \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} E_2(y')\, dy' &\qquad \sum_m E_2(y'_m) &\rightarrow \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} E_2(y'')\, dy''
\end{aligned} \tag{8.51}$$

We include the factor $1/\sqrt{2\pi}$ here as part of the definition of the field distributions for later convenience. With the above replacements, (8.39) becomes
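$$\begin{aligned}
I_{\rm net}(h) = \frac{\epsilon_0 c}{2}\Bigg\{ &\frac{1}{2\pi}\int_{-\infty}^{\infty} |E(y')|\, e^{ikr_1(y')} e^{i\phi(y')}\, dy' \int_{-\infty}^{\infty} |E(y'')|\, e^{-ikr_1(y'')} e^{-i\phi(y'')}\, dy'' \\
+ &\frac{1}{2\pi}\int_{-\infty}^{\infty} |E(y')|\, e^{ikr_2(y')} e^{i\phi(y')}\, dy' \int_{-\infty}^{\infty} |E(y'')|\, e^{-ikr_2(y'')} e^{-i\phi(y'')}\, dy'' \\
+ &2\,\mathrm{Re}\,\frac{e^{ik\left[d_1(y) - d_2(y)\right]}}{2\pi}\int_{-\infty}^{\infty} |E(y')|\, e^{ikr_1(y')} e^{i\phi(y')}\, dy' \int_{-\infty}^{\infty} |E(y'')|\, e^{-ikr_2(y'')} e^{-i\phi(y'')}\, dy'' \Bigg\}
\end{aligned} \tag{8.52}$$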
The next step is to make the average over random phases. Rather than deal with a time average of randomly varying phases, we will instead work with a linear superposition of all conceivable phase factors. That is, we will write the phase as $\phi(y'_j) \rightarrow K y'$, where $K$ is a parameter with units of inverse length, which we allow to take on all possible real values with uniform likelihood. The way we modify (8.40) for the continuous case is then

$$\left\langle e^{i\left[\phi(y'_j) - \phi(y'_m)\right]}\right\rangle_t = \delta_{j,m} \quad \rightarrow \quad \int_{-\infty}^{\infty} e^{iK(y' - y'')}\, dK = 2\pi\,\delta(y'' - y') \tag{8.53}$$

Instead of taking the time average, we integrate both sides of (8.52) over all possible values of the phase parameter $K$, whereupon the delta function in (8.53) naturally arises on the right-hand side of the equation.
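When (8.52) is integrated over $K$, the result is

$$\begin{aligned}
\int_{-\infty}^{\infty} I_{\rm net}(h)\, dK = \frac{\epsilon_0 c}{2}\Bigg\{ &\int_{-\infty}^{\infty} |E(y')|\, e^{ikr_1(y')}\, dy' \int_{-\infty}^{\infty} |E(y'')|\, e^{-ikr_1(y'')}\,\delta(y'' - y')\, dy'' \\
+ &\int_{-\infty}^{\infty} |E(y')|\, e^{ikr_2(y')}\, dy' \int_{-\infty}^{\infty} |E(y'')|\, e^{-ikr_2(y'')}\,\delta(y'' - y')\, dy'' \\
+ &2\,\mathrm{Re}\, e^{ik\left[d_1(y) - d_2(y)\right]}\int_{-\infty}^{\infty} |E(y')|\, e^{ikr_1(y')}\, dy' \int_{-\infty}^{\infty} |E(y'')|\, e^{-ikr_2(y'')}\,\delta(y'' - y')\, dy'' \Bigg\}
\end{aligned} \quad \text{(random phase assumption)} \tag{8.54}$$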
It may seem strange at first that the left-hand side of (8.54) has units of intensity per unit length. This is somewhat abstract. However, these units result from the natural way of dealing with the random phases when the source is continuous. As $K$ varies, the phase distribution at the source varies. The integral in (8.54) averages all of these possibilities. The delta functions in (8.54) allow us to perform another stage of integration for each term on the right-hand side. We can also make substitutions from (8.33), (8.34), (8.42) and (8.43). The result is

$$\int_{-\infty}^{\infty} I_{\rm net}(h)\, dK = 2\int_{-\infty}^{\infty} I(y')\, dy' + 2\,\mathrm{Re}\, e^{-i\frac{khy}{D}}\int_{-\infty}^{\infty} I(y')\, e^{-i\frac{khy'}{R}}\, dy' \quad \text{(random phase assumption)} \tag{8.55}$$

where

$$I(y') \equiv \frac{1}{2}\epsilon_0 c\left|E(y')\right|^2 \tag{8.56}$$
Notice that $I(y')$ in the present context has units of intensity per length squared since $E(y')$ has units of field per length. As they should, the units on the two sides of (8.55) match, both having units of intensity per length. (Recall that $K$ has units of per length and $I_{\rm net}(h)$ has usual units of intensity.) We can renormalize these strange units on each side of the equation. We can redefine the left-hand side $\int I_{\rm net}(h)\, dK$ to be the intensity at the screen and the integral on the right-hand side $\int I(y')\, dy'$ to be the intensity at the screen when only one slit is open. Then (8.55) reduces to (8.49) and (8.50).
Appendix 8.B The van Cittert-Zernike Theorem

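In this appendix we avoid making the assumption of randomly varying phase. This would be the case when the source of light is, for example, a laser. By substituting (8.35) and (8.36) into (8.52) we have

$$\begin{aligned}
I_{\rm net}(h) = \frac{\epsilon_0 c}{2}\Bigg\{ &\left|\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} |E(y')|\, e^{i\phi(y') + i\frac{ky'^2}{2R}}\, e^{-i\frac{khy'}{2R}}\, dy'\right|^2 + \left|\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} |E(y')|\, e^{i\phi(y') + i\frac{ky'^2}{2R}}\, e^{i\frac{khy'}{2R}}\, dy'\right|^2 \\
+ &2\,\mathrm{Re}\, e^{-i\frac{khy}{D}}\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} |E(y')|\, e^{i\phi(y') + i\frac{ky'^2}{2R}}\, e^{-i\frac{khy'}{2R}}\, dy'\right]\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} |E(y')|\, e^{i\phi(y') + i\frac{ky'^2}{2R}}\, e^{i\frac{khy'}{2R}}\, dy'\right]^* \Bigg\}
\end{aligned} \tag{8.57}$$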
The three terms on the right-hand side of (8.57) can be understood as follows. The first term is the intensity on the screen when the lower slit is covered. The second term is the intensity on the screen when the upper slit is covered. The last term is the interference term, which modifies the sum of the individual intensities when both slits are uncovered.

Notice the occurrence of Fourier transforms (over position) on the quantities inside of the square brackets. Later, when we study diffraction theory, we will recognize these transforms. The Fourier transforms here determine the strength of fields impinging on the individual slits. We have essentially worked out diffraction theory for this specific case.

The appearance of the strength of the field illuminating each of the slits explains the major difference between the coherent source and the random-phase source. With the random-phase source, the slits are always illuminated with the same strength regardless of the separation. However, with a coherent source, beaming can occur such that the strength (and phase) of the field at each slit depends on its exact position.

A wonderful simplification occurs when the phase of the emitted light has the following distribution:

$$\phi(y') = -\frac{ky'^2}{2R} \quad \text{(converging spherical wave)} \tag{8.58}$$
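Equation (8.58) is not as arbitrary as it may first appear. The particular phase is an approximation to a concave spherical wave front converging to the center between the two slits. This type of wave front is created when a plane wave passes through a lens. With the special phase (8.58), the intensity (8.57) reduces to

$$\begin{aligned}
I_{\rm net}(h) = \frac{\epsilon_0 c}{2}\Bigg\{ &\left|\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} |E(y')|\, e^{-i\frac{khy'}{2R}}\, dy'\right|^2 + \left|\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} |E(y')|\, e^{i\frac{khy'}{2R}}\, dy'\right|^2 \\
+ &2\,\mathrm{Re}\, e^{-i\frac{khy}{D}}\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} |E(y')|\, e^{-i\frac{khy'}{2R}}\, dy'\right]\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} |E(y')|\, e^{i\frac{khy'}{2R}}\, dy'\right]^* \Bigg\}
\end{aligned} \quad \text{(converging spherical wave)} \tag{8.59}$$

There is a close resemblance between the expression

$$\left|E_{\rm slit\ one}(h/2)\right| \propto \left|\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} |E(y')|\, e^{-i\frac{khy'}{2R}}\, dy'\right| \tag{8.60}$$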
and the magnitude of the degree of coherence $V = |\gamma(h)|$ from (8.50). Here $E_{\rm slit\ one}$ denotes the field impinging on the screen that goes through the upper slit positioned at a distance $h/2$ from center. The field strength when the single slit is positioned at $h$ compared to that when it is positioned at zero is

$$\left|\frac{E_{\rm slit\ one}(h)}{E_{\rm slit\ one}(0)}\right| = \frac{\left|\int_{-\infty}^{\infty} |E(y')|\, e^{-i\frac{khy'}{R}}\, dy'\right|}{\int_{-\infty}^{\infty} |E(y')|\, dy'} \quad \text{(converging spherical wave assumption)} \tag{8.61}$$

This looks very much like $|\gamma(h)|$ of (8.50) except that the magnitude of the field appears in (8.61), whereas the intensity appears in (8.50). If we replace the field in (8.61) with one that is proportional to the intensity (i.e. $|E_{\rm new}(y')| \propto I(y') \propto |E_{\rm old}(y')|^2$), then the expression becomes the same as (8.50). This may seem rather contrived, but at least it is cute, and it is known as the van Cittert-Zernike theorem. It says that the spatial coherence of an extended source with randomly varying phase corresponds to the field distribution created by replacing the extended source with a converging spherical wave whose field amplitude distribution is the same as the original intensity distribution.
Exercises
Exercises for 8.3 Temporal Coherence

P8.1 Show that $\mathrm{Re}\,\gamma(\tau)$ defined in (8.10) reduces to $\cos(\omega_0\tau)$ in the case of a plane wave $E(t) = E_0 e^{i(k_0 z - \omega_0 t)}$ being sent through a Michelson interferometer. In other words, the output intensity from the interferometer reduces to $I = 2I_0\left[1 + \cos(\omega_0\tau)\right]$, as you already expect. HINT: Don't be afraid of delta functions. After integration, the left-over delta functions cancel.

P8.2 Light emerging from a dense hot gas has a collisionally broadened power spectrum described by the Lorentzian function

$$I(\omega) = \frac{I(\omega_0)}{1 + \left(\frac{\omega - \omega_0}{\Delta\omega_{\rm FWHM}/2}\right)^2}$$

The light is sent into a Michelson interferometer. Make a graph of the average power arriving to the detector as a function of $\tau$. HINT: See (0.53).

P8.3 (a) Regardless of how the phase of $E(\omega)$ is organized, the oscillation of the energy arriving to the detector as a function of $\tau$ is the same. The spectral phase of the light in P 8.2 is randomly organized. Describe qualitatively how the light probably looks as a function of time.
(b) Now suppose that the phase of the light is somehow neatly organized such that

$$E(\omega) = \frac{i\,E(\omega_0)\, e^{i\frac{\omega}{c}z}}{i + \frac{\omega - \omega_0}{\Delta\omega_{\rm FWHM}/2}}$$

Perform the inverse Fourier transform on the field and find how the intensity of the light looks as a function of time. HINT:

$$\int_{-\infty}^{\infty} \frac{e^{-iax}}{x + \beta}\, dx = \begin{cases} -2\pi i\, e^{ia\beta} & \text{if } a > 0 \\ 0 & \text{if } a < 0 \end{cases} \qquad (\mathrm{Im}\,\beta > 0)$$

The constants $I(\omega_0)$ and $\Delta\omega_{\rm FWHM}$ will appear in the answer.
Exercises for 8.4 Fringe Visibility and Coherence Length

P8.4 (a) Verify (8.19). HINT: Write $\gamma = |\gamma| e^{i\alpha}$ and assume that the oscillations in $\gamma$ that give rise to fringes are due entirely to changes in $\alpha$ and that $|\gamma|$ is a slowly varying function in comparison to the oscillations.
(b) What is the coherence time $\tau_c$ of the light in P 8.2?

P8.5 (a) Show that the fringe visibility of the Gaussian distribution (8.20) (i.e. the magnitude of $\gamma$ in (8.21)) goes from 1 to $e^{-\pi/2} = 0.21$ as the round-trip path in one arm of the instrument is extended by a coherence length.
(b) Find the FWHM bandwidth in wavelength $\Delta\lambda_{\rm FWHM}$ in terms of the coherence length $\ell_c$ and the center wavelength $\lambda_0$ associated with (8.20). HINT: Derive $\Delta\omega_{\rm FWHM} = 2\sqrt{\ln 2}\,\Delta\omega$. To convert to a wavelength difference, use $\omega = \frac{2\pi c}{\lambda} \Rightarrow \Delta\omega \cong -\frac{2\pi c}{\lambda^2}\Delta\lambda$. You can ignore the minus sign; it simply means that wavelength decreases as frequency increases.
Exercises for 8.5 Fourier Spectroscopy

L8.6 (a) Use a scanning Michelson interferometer to measure the wavelength of ultrashort laser pulses produced by a mode-locked Ti:sapphire oscillator.

Figure 8.7

(b) Measure the coherence length of the source by observing the distance over which the visibility diminishes. From your measurement, what is the bandwidth $\Delta\lambda_{\rm FWHM}$ of the source, assuming the Gaussian profile in the previous problem? See P 8.5.
(c) Use a computer to perform a fast Fourier transform (FFT) of the signal output. For the positive frequencies, plot the laser spectrum as a function of $\lambda$ and compare with the results of (a) and (b).
(d) How do the results change if the ultrashort pulses are first stretched in time by traversing a thick piece of glass?
Exercises for 8.6 Young's Two-Slit Setup and Spatial Coherence

P8.7 (a) A point source with wavelength $\lambda = 500$ nm illuminates two parallel slits separated by $h = 1.0$ mm. If the screen is $D = 2$ m away, what is the separation between the diffraction peaks on the screen? Make a sketch.
(b) A thin piece of glass with thickness $d = 0.01$ mm and index $n = 1.5$ is placed in front of one of the slits. By how many fringes does the pattern at the screen move? HINT: This effectively introduces a relative phase $\Delta\phi$ in (8.32). Compare the phase of the light when traversing the glass versus traversing an empty region of the same thickness.

L8.8 (a) Carefully measure the separation of a double slit in the lab ($h \sim 1$ mm separation) by shining a HeNe laser ($\lambda = 633$ nm) through it and measuring the diffraction peak separations on a distant wall (say, 2 m from the slits). HINT: For better accuracy, measure across several fringes and divide.

Figure 8.8

(b) Create an extended light source with a HeNe laser using a time-varying diffuser followed by an adjustable single slit. (The diffuser must rotate rapidly to create random time variation of the phase at each point, as would occur automatically for a natural source such as a star.) Place the double slit at a distance of $R \approx 100$ cm after the first slit. (Take note of the exact value of $R$, as you will need it for the next problem.) Use a lens to image the diffraction pattern that would have appeared on a far-away screen into a video camera. Observe the visibility of the fringes. Adjust the width of the source with the single slit until the visibility of the fringes disappears. After making the source wide enough to cause the fringe pattern to degrade, measure the single slit width $a$ by shining a HeNe laser through it and observing the diffraction pattern on the distant wall. HINT: A single slit of width $a$ produces an intensity pattern described by Eq. (11.45) with $N = 1$ and $\Delta x = a$.
NOTE: It would have been nicer to vary the separation of the two slits to determine the width of a fixed source. However, because it is hard to make an adjustable double slit, we varied the size of the source until the spatial coherence of the light matched the slit separation.

P8.9 (a) Compute $h_c$ for a uniform intensity distribution of width $a$ using (8.48).
(b) Use this formula to check that your measurements in L 8.8 agree with spatial coherence theory. HINT: In your experiment $h_c$ is the double slit separation. Use your measured $R$ and $h$ to calculate what the width of the single slit (i.e. $a$) should have been when the fringes disappeared and compare this calculation to your direct measurement of $a$.
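Solution: (This is only a partial solution)

$$\begin{aligned}
\gamma(h) &= \frac{\int_{-a/2}^{a/2} I_0 \exp\left[-ikh\left(\frac{y'}{R} + \frac{y}{D}\right)\right] dy'}{\int_{-a/2}^{a/2} I_0\, dy'}
= \frac{e^{-ikh\frac{y}{D}}}{a}\int_{-a/2}^{a/2} e^{-ikh\frac{y'}{R}}\, dy'
= \frac{e^{-ikh\frac{y}{D}}}{a}\left[\frac{e^{-ikh\frac{y'}{R}}}{-i\frac{kh}{R}}\right]_{-a/2}^{a/2} \\
&= e^{-ikh\frac{y}{D}}\,\frac{e^{ikh\frac{a/2}{R}} - e^{-ikh\frac{a/2}{R}}}{2i\,kh\frac{a/2}{R}}
= e^{-ikh\frac{y}{D}}\,\mathrm{sinc}\left(\frac{kha}{2R}\right)
\end{aligned}$$

Note that

$$\int_0^{\infty} \frac{\sin^2 \alpha x}{(\alpha x)^2}\, dx = \frac{\pi}{2\alpha}$$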
Review, Chapters 6–8

True and False Questions

R24 T or F: It is always possible to completely eliminate reflections with a single-layer antireflection coating as long as the right thickness is chosen for a given real index.
R25 T or F: For a given incident angle and value of $n$, there is only one single-layer coating thickness $d$ that will minimize reflections.
R26 T or F: When coating each surface of a lens with a single-layer antireflection coating, the thickness of the coating on the exit surface will need to be different from the thickness of the coating on the entry surface.
R27 T or F: In our notation (widely used), $I(t)$ is the Fourier transform of $I(\omega)$.
R28 T or F: The integral of $I(t)$ over all $t$ equals the integral of $I(\omega)$ over all $\omega$.
R29 T or F: The phase velocity of light (the speed of an individual frequency component of the field) never exceeds the speed of light $c$.
R30 T or F: The group velocity of light in a homogeneous material can exceed $c$ if absorption or amplification takes place.
R31 T or F: The group velocity of light never exceeds the phase velocity.
R32 T or F: A Michelson interferometer can be used to measure the spectral intensity of light $I(\omega)$.
R33 T or F: A Michelson interferometer can be used to measure the duration of a short laser pulse and thereby characterize its chirp.
R34 T or F: A Michelson interferometer can be used to measure the wavelength of light.
R35 T or F: A Michelson interferometer can be used to measure the phase of $E(\omega)$.
R36 T or F: The Fourier transform (or inverse Fourier transform if you prefer) of $I(\omega)$ is proportional to the degree of temporal coherence.
R37 T or F: A Michelson interferometer is ideal for measuring the spatial coherence of light.
R38 T or F: Young's two-slit setup is ideal for measuring the temporal coherence of light.
R39 T or F: Vertically polarized light illuminates a Young's double-slit setup and fringes are seen on a distant screen with good visibility. A half wave plate is placed in front of one of the slits so that the polarization for that slit becomes horizontally polarized. Here's the statement: The fringes at the screen will shift position but maintain their good visibility.
Problems

R40 A thin glass plate with index $n = 1.5$ is oriented at Brewster's angle so that p-polarized light with wavelength $\lambda_{\rm vac} = 500$ nm goes through with 100% transmittance.
(a) What is the minimum thickness that will make the reflection of s-polarized light be maximum?
(b) What is the transmittance $T_s^{\rm tot}$ for this thickness assuming s-polarized light?

R41 Consider a Fabry-Perot interferometer. Note: $R_1 = R_2 = R$.
(a) Show that the free spectral range for a Fabry-Perot interferometer is

$$\Delta\lambda_{\rm FSR} = \frac{\lambda^2}{2nd\cos\theta}$$

(b) Show that the fringe width $\Delta\lambda_{\rm FWHM}$ is

$$\frac{\lambda^2}{\pi\sqrt{F}\,nd\cos\theta}$$

where $F \equiv \frac{4R}{(1-R)^2}$.
(c) Derive the reflecting finesse $f = \Delta\lambda_{\rm FSR}/\Delta\lambda_{\rm FWHM}$.

R42 For a Fabry-Perot etalon, let $R = 0.90$, $\lambda_{\rm vac} = 500$ nm, $n = 1$, and $d = 5.0$ mm.
(a) Suppose that a maximum transmittance occurs at the angle $\theta = 0$. What is the nearest angle where the transmittance will be half of the maximum transmittance? You may assume that $\cos\theta \cong 1 - \theta^2/2$.
(b) You desire to use a Fabry-Perot etalon to view the light from a large diffuse source rather than a point source. Draw a diagram depicting where lenses should be placed, indicating relevant distances. Explain briefly how it works.

R43 You need to make an antireflective coating for a glass lens designed to work at normal incidence.
Figure 8.9

The matrix equation relating the incident field to the reflected and transmitted fields (at normal incidence) is

$$\begin{bmatrix} 1 \\ n_0 \end{bmatrix} + \begin{bmatrix} 1 \\ -n_0 \end{bmatrix}\frac{E_0^{\rm (reflected)}}{E_0^{\rm (incident)}} = \begin{bmatrix} \cos k_1\ell & -\frac{i}{n_1}\sin k_1\ell \\ -i n_1 \sin k_1\ell & \cos k_1\ell \end{bmatrix}\begin{bmatrix} 1 \\ n_t \end{bmatrix}\frac{E_t^{\rm (transmitted)}}{E_0^{\rm (incident)}}$$

(a) What is the minimum thickness the coating should have? HINT: It is less work if you can figure this out without referring to the above equation. You may assume $n_1 < n_t$.
(b) Find the index of refraction $n_1$ that will make the reflectivity be zero.

R44 (a) What is the spectral content (i.e., $I(\omega)$) of a square laser pulse

$$E(t) = \begin{cases} E_0\, e^{-i\omega_0 t}, & |t| \le \tau/2 \\ 0, & |t| > \tau/2 \end{cases}$$

Make a sketch of $I(\omega)$, indicating the location of the first zeros.
(b) What is the temporal shape (i.e., $I(t)$) of a light pulse with frequency content

$$E(\omega) = \begin{cases} E_0, & |\omega - \omega_0| \le \Delta\omega/2 \\ 0, & |\omega - \omega_0| > \Delta\omega/2 \end{cases}$$

where in this case $E_0$ has units of E-field per frequency. Make a sketch of $I(t)$, indicating the location of the first zeros.
(c) If $E(\omega)$ is known (any arbitrary function, not the same as above), and the light goes through a material of thickness $\ell$ and index of refraction $n(\omega)$, how would you find the form of the pulse $E(t)$ after passing through the material? Please set up the integral.
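R45 (a) Prove Parseval's theorem:

$$\int_{-\infty}^{\infty} |E(\omega)|^2\, d\omega = \int_{-\infty}^{\infty} |E(t)|^2\, dt.$$

HINT:

$$\delta(t' - t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{i\omega(t' - t)}\, d\omega$$

(b) Explain the physical relevance of Parseval's theorem to light pulses. Suppose that you have a detector that measures the total energy in a pulse of light, say 1 mJ directed onto an area of 1 mm$^2$. Next you measure the spectrum of light and find it to have a width of $\Delta\lambda = 50$ nm, centered at $\lambda_0 = 800$ nm. Assume that the light has a Gaussian frequency profile

$$I(\omega) = I(\omega_0)\, e^{-\left(\frac{\omega - \omega_0}{\Delta\omega}\right)^2}$$

Use as an approximate value $\Delta\omega \cong \frac{2\pi c}{\lambda^2}\Delta\lambda$. Find a value and correct units for $I(\omega_0)$. HINT:

$$\int_{-\infty}^{\infty} e^{-Ax^2 + Bx + C}\, dx = \sqrt{\frac{\pi}{A}}\, e^{B^2/4A + C} \qquad \mathrm{Re}\{A\} > 0$$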
R46 Continuous light entering a Michelson interferometer has a spectrum described by

$$I(\omega) = \begin{cases} I_0, & |\omega - \omega_0| \le \Delta\omega/2 \\ 0, & |\omega - \omega_0| > \Delta\omega/2 \end{cases}$$

The Michelson interferometer uses a 50:50 beam splitter. The emerging light has intensity $\left\langle I_{\rm det}(t,\tau)\right\rangle_t = 2\left\langle I(t)\right\rangle_t\left[1 + \mathrm{Re}\,\gamma(\tau)\right]$, where the degree of coherence is

$$\gamma(\tau) = \frac{\int_{-\infty}^{\infty} I(\omega)\, e^{-i\omega\tau}\, d\omega}{\int_{-\infty}^{\infty} I(\omega)\, d\omega}$$

Find the fringe visibility $V \equiv (I_{\max} - I_{\min})/(I_{\max} + I_{\min})$ as a function of $\tau$ (i.e. the round-trip delay due to moving one of the mirrors).

R47 Light emerging from a point travels by means of two very narrow slits to a point $y$ on a screen. The intensity at the screen arising from a point source at position $y'$ is found to be

$$I_{\rm screen}(y', h) = 2I(y')\left[1 + \cos\left(kh\left(\frac{y}{D} + \frac{y'}{R}\right)\right)\right]$$

where an approximation has restricted us to small angles.

Figure 8.10

(a) Now, suppose that $I(y')$ characterizes emission from a wider source with randomly varying phase across its width. Write down an expression (in integral form) for the resulting intensity at the screen:

$$I_{\rm screen}(h) \equiv \int_{-\infty}^{\infty} I_{\rm screen}(y', h)\, dy'$$

(b) Assume that the source has an emission distribution with the form $I(y') = (I_0/\Delta y')\, e^{-y'^2/\Delta y'^2}$. What is the function $\gamma(h)$ where the intensity is written $I_{\rm screen}(h) = 2\sqrt{\pi}\, I_0\left[1 + \mathrm{Re}\,\gamma(h)\right]$? HINT:

$$\int_{-\infty}^{\infty} e^{-Ax^2 + Bx + C}\, dx = \sqrt{\frac{\pi}{A}}\, e^{B^2/4A + C} \qquad \mathrm{Re}\{A\} > 0.$$

(c) As $h$ varies, the intensity at a point on the screen $y$ oscillates. As $h$ grows wider, the amplitude of oscillations decreases. How wide must the slit separation $h$ become (in terms of $R$, $k$, and $\Delta y'$) to reduce the visibility to

$$V \equiv \frac{I_{\max} - I_{\min}}{I_{\max} + I_{\min}} = \frac{1}{3}$$
Selected Answers
R40: (a) 100 nm. (b) 0.55. R42: (a) 0.074 . R43: (b) 1.24. R45: (b) $3.8 \times 10^{-16}$ J/(cm$^2 \cdot$ s$^{-1}$).
Chapter 9
Light as Rays
9.1 Introduction
So far in our study of optics, we have described light in terms of waves, which satisfy Maxwell's equations. However, as is well known to students, in many situations light can be thought of as rays directed along the flow of energy. A ray picture is useful when one is interested in the macroscopic distribution of light energy, but rays fail to reveal how intensity varies when light is concentrated in small regions of space. Moreover, simple ray theory suggests that a lens can focus light down to a point. However, if a beam of light were concentrated onto a true point, the intensity would be infinite! In this scenario ray theory clearly cannot be used to predict the intensity profile in a focus. In this case, it is necessary to consider waves and diffraction phenomena. Nevertheless, ray theory is useful for predicting where a focus occurs. It is also useful for describing imaging properties of optical systems (e.g. lenses and mirrors).
Beginning in section 9.4 we study the details of ray theory and the imaging properties of optical systems. First, however, we examine the justification for ray theory starting from Maxwell's equations. Section 9.2 gives a derivation of the eikonal equation, which governs the direction of rays in a medium with an index of refraction that varies with position. The word "eikonal" comes from the Greek εἰκών, from which the modern word "icon" derives. The eikonal equation therefore has a descriptive title since it controls the formation of images. Although we will not use the eikonal equation extensively, we will show how it embodies the underlying justification for ray theory. As will be apparent in its derivation, the eikonal equation relies on an approximation that the features of interest in the light distribution are large relative to the wavelength of the light. The eikonal equation describes the flow of energy in an optical medium.
This applies even to complicated situations such as desert mirages where air is heated near the ground and has a different index than the air further from the ground. Rays of light from the sky that initially are directed toward the ground can be bent such that they travel parallel to the ground owing to the inhomogeneous refractive index. If the index of refraction as a function of position is known, the eikonal equation can be used to determine the propagation of such rays. This also applies to practical problems such as the propagation of rays through lenses (where the index also varies with position).
In section 9.3, we deduce Fermat's principle from the eikonal equation. Of course Fermat asserted his principle more than a century before Maxwell assembled his equations, but it is nice to give justification retroactively to Fermat's principle using the modern perspective. In short, Fermat asserted that light travels from point A to point B following a path that takes the minimum time.
In section 9.4, we begin our study of paraxial ray theory, which is used to analyze the propagation of rays through optical systems composed of lenses and/or curved mirrors. The paraxial approximation restricts rays to travel nearly parallel to the axis of such a system. We consider the effects of three different optical elements acting on paraxial rays. The first element is simply the unobstructed propagation of a ray through a distance $d$ in a uniform medium; if the ray is not exactly parallel to the optical axis, then it moves further away from (or closer to) the optical axis as it travels. The second element is a curved spherical mirror, which reflects a ray and changes its angle. The third element, which is similar, is a spherical interface between two materials with differing refractive indices. We demonstrate that the effects of each of these elements on a ray of light can be represented as a $2 \times 2$ matrix. These three basic elements can be combined to construct more complex imaging systems (such as a lens or a series of lenses and curved mirrors). The overall effect of a complex system on a ray can be computed by multiplying together the matrices associated with each of the basic elements.
We discuss the condition for image formation in section 9.6 and make contact with the familiar formula
\[
\frac{1}{f} = \frac{1}{d_o} + \frac{1}{d_i} \tag{9.1}
\]
which describes the location of images produced by curved mirrors or thin lenses. In section 9.7 we introduce the concept of principal planes, which exist for multielement optical systems. If the distance $d_o$ is measured from one principal plane while $d_i$ is measured from a second principal plane, then the thin lens formula (9.1) can be applied even to complicated systems with an appropriate effective focal length $f_{\mathrm{eff}}$. Finally, in section 9.8 we use paraxial ray theory to study the stability of laser cavities. The ray formalism can be used to predict whether a ray, after many round trips in the cavity, remains near the optical axis (trapped and therefore stable) or if it drifts endlessly away from the axis of the cavity on successive round trips. In appendix 9.9 we address deviations from the paraxial ray theory known as aberrations. We also comment on ray-tracing techniques, used for designing optical systems that minimize such aberrations.
9.2 The Eikonal Equation

We begin with the wave equation (2.21) for a medium with a real index of refraction:
\[
\nabla^2 \mathbf{E}(\mathbf{r}, t) - \frac{n^2(\mathbf{r})}{c^2} \frac{\partial^2 \mathbf{E}(\mathbf{r}, t)}{\partial t^2} = 0 \tag{9.2}
\]
Although in chapter 2 we considered solutions to the wave equation in a homogeneous material, the wave equation is also perfectly valid when the index of refraction varies throughout space. Here we allow the medium (i.e. the density) to vary with position. Hence the index $n(\mathbf{r})$ is an arbitrary function of $\mathbf{r}$. In this case, the usual plane-wave solutions no longer satisfy the wave equation. We consider the light to have a single frequency $\omega$. As a trial solution for (9.2), we take
\[
\mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0(\mathbf{r}) \, e^{i[k_{\mathrm{vac}} R(\mathbf{r}) - \omega t]} \tag{9.3}
\]
where
\[
k_{\mathrm{vac}} = \frac{\omega}{c} = \frac{2\pi}{\lambda_{\mathrm{vac}}} \tag{9.4}
\]
Here $R(\mathbf{r})$ is a real scalar function (which depends on position) having the dimension of length. By assuming that $R(\mathbf{r})$ is real, we do not account for absorption or amplification in the medium. Even though the trial solution (9.3) looks somewhat like a plane wave, the function $R(\mathbf{r})$ accommodates wave fronts that can be curved or distorted as depicted in Fig. 9.1. At any given instant $t$, the curved surfaces of constant phase described by $R(\mathbf{r}) = \text{constant}$ can be interpreted as wave fronts of the solution. The wave fronts travel in the direction for which $R(\mathbf{r})$ varies the fastest. This direction is given by $\nabla R(\mathbf{r})$, which lies in the direction perpendicular to surfaces of constant phase.
Figure 9.1 Wave fronts distributed throughout space in the presence of a spatially inhomogeneous refractive index.
Note that if the index is spatially independent (i.e. $n(\mathbf{r}) \to n$), then (9.3) reduces to the usual plane-wave solution of the wave equation. In this case, we have $R(\mathbf{r}) = \mathbf{k} \cdot \mathbf{r} / k_{\mathrm{vac}}$ and the field amplitude becomes constant (i.e. $\mathbf{E}_0(\mathbf{r}) \to \mathbf{E}_0$). The substitution of the trial solution (9.3) into the wave equation (9.2) gives
\[
\nabla^2 \left[ \mathbf{E}_0(\mathbf{r}) \, e^{i[k_{\mathrm{vac}} R(\mathbf{r}) - \omega t]} \right] + \frac{n^2(\mathbf{r}) \, \omega^2}{c^2} \mathbf{E}_0(\mathbf{r}) \, e^{i[k_{\mathrm{vac}} R(\mathbf{r}) - \omega t]} = 0 \tag{9.5}
\]
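We divide each term by $e^{-i\omega t}$ and utilize (9.4) to rewrite the wave equation as
\[
\frac{1}{k_{\mathrm{vac}}^2} \nabla^2 \left[ \mathbf{E}_0(\mathbf{r}) \, e^{i k_{\mathrm{vac}} R(\mathbf{r})} \right] + n^2(\mathbf{r}) \, \mathbf{E}_0(\mathbf{r}) \, e^{i k_{\mathrm{vac}} R(\mathbf{r})} = 0 \tag{9.6}
\]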
Our next task is to evaluate the spatial derivative, which is worked out in the following example.
Example 9.1
Compute the Laplacian needed in (9.6).
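Solution: The gradient of the $x$ component of the field is
\[
\nabla \left[ E_{0x}(\mathbf{r}) \, e^{i k_{\mathrm{vac}} R(\mathbf{r})} \right] = \left[ \nabla E_{0x}(\mathbf{r}) \right] e^{i k_{\mathrm{vac}} R(\mathbf{r})} + i k_{\mathrm{vac}} E_{0x}(\mathbf{r}) \left[ \nabla R(\mathbf{r}) \right] e^{i k_{\mathrm{vac}} R(\mathbf{r})}
\]
The Laplacian of the $x$ component is
\[
\nabla \cdot \nabla \left[ E_{0x}(\mathbf{r}) \, e^{i k_{\mathrm{vac}} R(\mathbf{r})} \right] = \Big\{ \nabla^2 E_{0x}(\mathbf{r}) - k_{\mathrm{vac}}^2 E_{0x}(\mathbf{r}) \left[ \nabla R(\mathbf{r}) \right] \cdot \left[ \nabla R(\mathbf{r}) \right] + i k_{\mathrm{vac}} E_{0x}(\mathbf{r}) \nabla^2 R(\mathbf{r}) + 2 i k_{\mathrm{vac}} \left[ \nabla E_{0x}(\mathbf{r}) \right] \cdot \left[ \nabla R(\mathbf{r}) \right] \Big\} e^{i k_{\mathrm{vac}} R(\mathbf{r})}
\]
Upon combining the result for each vector component of $\mathbf{E}_0(\mathbf{r})$, the required spatial derivative can be written as
\[
\nabla^2 \left[ \mathbf{E}_0(\mathbf{r}) \, e^{i k_{\mathrm{vac}} R(\mathbf{r})} \right] = \Big( \nabla^2 \mathbf{E}_0(\mathbf{r}) - k_{\mathrm{vac}}^2 \mathbf{E}_0(\mathbf{r}) \left[ \nabla R(\mathbf{r}) \right] \cdot \left[ \nabla R(\mathbf{r}) \right] + i k_{\mathrm{vac}} \mathbf{E}_0(\mathbf{r}) \nabla^2 R(\mathbf{r}) + 2 i k_{\mathrm{vac}} \big\{ \hat{\mathbf{x}} \left[ \nabla E_{0x}(\mathbf{r}) \right] \cdot \left[ \nabla R(\mathbf{r}) \right] + \hat{\mathbf{y}} \left[ \nabla E_{0y}(\mathbf{r}) \right] \cdot \left[ \nabla R(\mathbf{r}) \right] + \hat{\mathbf{z}} \left[ \nabla E_{0z}(\mathbf{r}) \right] \cdot \left[ \nabla R(\mathbf{r}) \right] \big\} \Big) e^{i k_{\mathrm{vac}} R(\mathbf{r})}
\]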
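Using the result from Example 9.1 with some additional rearranging, (9.6) becomes
\[
\left[ \nabla R(\mathbf{r}) \cdot \nabla R(\mathbf{r}) - n^2(\mathbf{r}) \right] \mathbf{E}_0(\mathbf{r}) = \frac{\nabla^2 \mathbf{E}_0(\mathbf{r})}{k_{\mathrm{vac}}^2} + \frac{i}{k_{\mathrm{vac}}} \mathbf{E}_0(\mathbf{r}) \nabla^2 R(\mathbf{r}) + \frac{2i}{k_{\mathrm{vac}}} \hat{\mathbf{x}} \, \nabla E_{0x}(\mathbf{r}) \cdot \nabla R(\mathbf{r}) + \frac{2i}{k_{\mathrm{vac}}} \hat{\mathbf{y}} \, \nabla E_{0y}(\mathbf{r}) \cdot \nabla R(\mathbf{r}) + \frac{2i}{k_{\mathrm{vac}}} \hat{\mathbf{z}} \, \nabla E_{0z}(\mathbf{r}) \cdot \nabla R(\mathbf{r}) \tag{9.7}
\]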
At this point we are ready to make an important approximation. We take the limit of a very short wavelength (i.e. $1/k_{\mathrm{vac}} = \lambda_{\mathrm{vac}}/2\pi \to 0$). This means that we lose the effects of diffraction. We also lose surface reflections at abrupt index changes unless specifically considered. This approximation works best in situations where only macroscopic features are of concern. Under the assumption of an infinitesimal wavelength, the entire right-hand side of (9.7) vanishes (thank goodness) and the wave equation imposes
\[
\left[ \nabla R(\mathbf{r}) \right] \cdot \left[ \nabla R(\mathbf{r}) \right] = n^2(\mathbf{r}), \tag{9.8}
\]
Written another way, this equation is
\[
\nabla R(\mathbf{r}) = n(\mathbf{r}) \, \hat{\mathbf{s}}(\mathbf{r}) \tag{9.9}
\]
This latter form is called the eikonal equation, where $\hat{\mathbf{s}}$ is a unit vector pointing in the direction $\nabla R(\mathbf{r})$, the direction normal to wave front surfaces. Under the assumption of an infinitely short wavelength, the Poynting vector is directed along $\hat{\mathbf{s}}$ as demonstrated in P 9.2. In other words, the direction of $\hat{\mathbf{s}}$ specifies the direction of energy flow. The unit vector $\hat{\mathbf{s}}$ at each location in space points perpendicular to the wave fronts and indicates the direction that the waves travel as seen in Fig. 9.1. We refer to a collection of vectors $\hat{\mathbf{s}}$ distributed throughout space as rays.
In retrospect, we might have jumped straight to (9.9) without going through the above derivation. After all, we know that each part of a wave front advances in the direction of its gradient $\nabla R(\mathbf{r})$ (i.e. in the direction that $R(\mathbf{r})$ varies most rapidly). We also know that each part of a wave front defined by $R(\mathbf{r}) = \text{constant}$ travels at speed $c/n(\mathbf{r})$. The slower a given part of the wave front advances, the more rapidly $R(\mathbf{r})$ changes with position $\mathbf{r}$ and the closer the contours of constant phase. It follows that $\nabla R(\mathbf{r})$ must be proportional to $n(\mathbf{r})$ since $\nabla R(\mathbf{r})$ denotes the rate of change in $R(\mathbf{r})$.

9.3 Fermat's Principle

The eikonal equation (9.9) governs the path that rays follow as they traverse a region of space, where the index varies as a function of position. An analysis of the eikonal equation renders Fermat's principle as we now show. We begin by taking the curl of (9.9) to obtain
\[
\nabla \times \left[ n(\mathbf{r}) \, \hat{\mathbf{s}}(\mathbf{r}) \right] = \nabla \times \left[ \nabla R(\mathbf{r}) \right] = 0 \tag{9.10}
\]
(The curl of a gradient is identically zero for any function $R(\mathbf{r})$.) Integration of (9.10) over an open surface of area $A$ results in
\[
\int_A \nabla \times \left[ n(\mathbf{r}) \, \hat{\mathbf{s}}(\mathbf{r}) \right] \cdot d\mathbf{a} = 0 \tag{9.11}
\]
We next apply Stokes' theorem (0.12) to the integral and convert it to a path integral around the perimeter of the area. Then we get
\[
\oint_C n(\mathbf{r}) \, \hat{\mathbf{s}}(\mathbf{r}) \cdot d\boldsymbol{\ell} = 0 \tag{9.12}
\]
The integration of $n \hat{\mathbf{s}} \cdot d\boldsymbol{\ell}$ around a closed loop is always zero. Keep in mind that the proper value for $\hat{\mathbf{s}}(\mathbf{r})$ must be used, and this is determined by the eikonal equation (9.9). Equation (9.12) implies Fermat's principle, but to see this fact requires some subtle arguments. Equation (9.12) implies the following:
\[
\int_A^B n \, \hat{\mathbf{s}} \cdot d\boldsymbol{\ell} \quad \text{is independent of path from A to B.} \tag{9.13}
\]
Figure 9.2 A ray of light leaving point A arriving at B.
Now consider points A and B that lie along a path that is always parallel to $\hat{\mathbf{s}}$ (i.e. perpendicular to the wave fronts as depicted in Fig. 9.2). When integrating along the path parallel to $\hat{\mathbf{s}}$, the cosine in the dot product in (9.13) is always one. If we choose some other path that connects A and B, the cosine associated with the dot product is often less than one. Since in both cases the result of the integral must be the same, the other factors inside the integral must render a larger value to compensate for the cosine term's occasional dip below unity when the path is not parallel to $\hat{\mathbf{s}}$. Thus, if we artificially remove the dot product from the integral (i.e. exclude the cosine factor), the result of the integral is smallest when the path is taken along the direction of $\hat{\mathbf{s}}$.
With the dot product removed from (9.13), the result of the integration agrees with the true result only for the path taken along $\hat{\mathbf{s}}$ (i.e. only for the path that corresponds to the one that light rays actually follow). In mathematical form, this argument can be expressed as
\[
\int_A^B n \, \hat{\mathbf{s}} \cdot d\boldsymbol{\ell} = \min \left\{ \int_A^B n \, d\ell \right\} \tag{9.14}
\]
The integral on the right gives the optical path length ($OPL$) between A and B:
\[
OPL \big|_A^B \equiv \int_A^B n \, d\ell \tag{9.15}
\]
where the $n$ in general can be different for each of the incremental distances $d\ell$. The conclusion is that the true path that light follows between two points (i.e. the one that follows along $\hat{\mathbf{s}}$) is the one with smallest optical path length.
Fermat's principle is usually stated in terms of the time it takes light to travel between points. The travel time $\Delta t$ depends not only on the path taken by the light but also on the velocity of the light $v(\mathbf{r})$, which varies spatially with the refractive index:
\[
\Delta t \big|_A^B = \int_A^B \frac{d\ell}{v(\mathbf{r})} = \int_A^B \frac{d\ell}{c/n(\mathbf{r})} = \frac{OPL \big|_A^B}{c} \tag{9.16}
\]
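As a small numerical illustration of (9.15) and (9.16), the sketch below treats a path as a list of straight segments, each with a constant index, so that the integral reduces to a sum. The function names and the particular path (vacuum followed by glass with an assumed index of 1.5) are hypothetical choices made only for this example.

```python
C_VACUUM = 299_792_458.0  # speed of light in vacuum, m/s

def optical_path_length(segments):
    """Discrete version of (9.15): sum of n * d over straight segments.

    `segments` is a list of (index, length_in_meters) pairs.
    """
    return sum(n * d for n, d in segments)

def travel_time(segments):
    """Travel time from (9.16): the optical path length divided by c."""
    return optical_path_length(segments) / C_VACUUM

# Hypothetical path: 0.20 m of vacuum followed by 0.10 m of glass (n = 1.5).
path = [(1.0, 0.20), (1.5, 0.10)]

geometric_length = sum(d for _, d in path)
opl = optical_path_length(path)

print(f"geometric length = {geometric_length:.3f} m")
print(f"optical path length = {opl:.3f} m")
print(f"travel time = {travel_time(path):.3e} s")
```

The optical path length exceeds the geometric length whenever part of the path lies in a medium with index greater than one, which is the point made in the text that follows.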
Fermat's principle is then described as follows: Consider a source of light at some point A in space. Rays may emanate from point A in many different directions. Now consider another point B in space where the light from the first point is to be observed. Under ordinary circumstances, only one of the many rays leaving point A will pass through the point B. Fermat's principle states that the ray crossing the second point takes the path that requires the least time to travel between the two points. It should be noted that Fermat's principle, as we have written it, does not work for anisotropic media such as crystals where $n$ depends on the direction of a ray as well as on its location (see P 9.4).
To find the correct path for the light ray that leaves point A and crosses point B, we need only minimize the optical path length between the two points. Minimizing the optical path length is equivalent to minimizing the time of travel since it differs from the time of travel only by the constant $c$. The optical path length is not the actual distance that the light travels; it is proportional to the number of wavelengths that fit into that distance (see (2.27)). Thus, as the wavelength shortens due to a higher index of refraction, the optical path length increases. The correct ray traveling from A to B does not necessarily follow a straight line but can follow a complicated curve according to how the index varies.
Example 9.2
Use Fermat's principle to derive Snell's law.
Pierre de Fermat (1601–1665, French) Fermat was a distinguished mathematician. He loved to publish results, but was often quite secretive about the methods used to obtain his results. Fermat was the first to state that the path taken by a beam of light is the one that can be traveled in the least amount of time.
Solution: Consider the many rays of light that leave point A seen in Fig. 9.3. Only one of the rays passes through point B. Within each medium we expect the light to travel in a straight line since the index is uniform. However, at the boundary we must allow for bending since the index changes.
Figure 9.3 Rays of light leaving point A; not all of them will traverse point B.
The optical path length between points A and B (in terms of the unknown coordinate of the point where the ray penetrates the interface) is
\[
OPL = n_i \sqrt{x_i^2 + y_i^2} + n_t \sqrt{x_t^2 + y_t^2} \tag{9.17}
\]
We need to minimize this optical path length to find the correct one according to Fermat's principle. Since points A and B are fixed, we may regard $x_i$ and $x_t$ as constants. The distances $y_i$ and $y_t$ are not constants although the combination $y_{\mathrm{tot}} = y_i + y_t$ is constant. Thus, we may rewrite (9.17) as
\[
OPL(y_i) = n_i \sqrt{x_i^2 + y_i^2} + n_t \sqrt{x_t^2 + (y_{\mathrm{tot}} - y_i)^2} \tag{9.18}
\]
where everything in the right-hand side of the expression is constant except for $y_i$. We now minimize the optical path length by taking the derivative and setting it equal to zero:
\[
\frac{d\,OPL}{dy_i} = n_i \frac{y_i}{\sqrt{x_i^2 + y_i^2}} + n_t \frac{-(y_{\mathrm{tot}} - y_i)}{\sqrt{x_t^2 + (y_{\mathrm{tot}} - y_i)^2}} = 0 \tag{9.20}
\]
Notice that
\[
\sin\theta_i = \frac{y_i}{\sqrt{x_i^2 + y_i^2}} \quad \text{and} \quad \sin\theta_t = \frac{y_t}{\sqrt{x_t^2 + y_t^2}} \tag{9.21}
\]
When these are substituted into (9.20) we obtain
\[
n_i \sin\theta_i = n_t \sin\theta_t \tag{9.22}
\]
which is the familiar Snell's law.
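The minimization in Example 9.2 can also be checked numerically. The sketch below minimizes the optical path length (9.18) over the crossing coordinate $y_i$ with a golden-section search and then compares $n_i \sin\theta_i$ with $n_t \sin\theta_t$. The search routine, the function names, and the sample geometry are assumptions made for illustration; any one-dimensional minimizer would serve equally well.

```python
import math

def opl(y_i, n_i, n_t, x_i, x_t, y_tot):
    """Optical path length (9.18) as a function of the crossing coordinate y_i."""
    return n_i * math.hypot(x_i, y_i) + n_t * math.hypot(x_t, y_tot - y_i)

def minimize_golden(f, lo, hi, tol=1e-10):
    """Golden-section search for the minimum of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    while (b - a) > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d   # minimum lies in [a, d]
        else:
            a = c   # minimum lies in [c, b]
    return 0.5 * (a + b)

# Hypothetical geometry: A is a distance x_i above the interface, B a distance
# x_t below it, and the two points are offset by y_tot along the interface.
n_i, n_t = 1.0, 1.5
x_i, x_t, y_tot = 1.0, 1.0, 1.0

y_i = minimize_golden(lambda y: opl(y, n_i, n_t, x_i, x_t, y_tot), 0.0, y_tot)
y_t = y_tot - y_i

sin_i = y_i / math.hypot(x_i, y_i)   # first part of (9.21)
sin_t = y_t / math.hypot(x_t, y_t)   # second part of (9.21)

print(f"n_i sin(theta_i) = {n_i * sin_i:.6f}")
print(f"n_t sin(theta_t) = {n_t * sin_t:.6f}")
```

The two printed quantities agree to within the tolerance of the search, as (9.22) requires.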
Figure 9.4 Rays of light leaving point A with the same optical path length to B.
An imaging situation occurs when many paths from point A to point B have the same optical path length. An example of this occurs when a lens causes an image to form. In this case all rays leaving point A (on an object) and traveling through the system to point B (on the image) experience equal optical path lengths. This situation is depicted in Fig. 9.4. Note that while the rays traveling through the center of the lens have a shorter geometric path length, they travel through more material so that the optical path length is the same for all rays.
Example 9.3
Use Fermat's principle to derive the equation of curvature for a reflective surface that causes all rays leaving one point to image to another. Do the calculation in two dimensions rather than in three. This configuration is used in laser heads to direct flash lamp energy into the amplifying material. One point represents the end of a long cylindrical laser rod and the other represents the end of a long flash lamp.
Solution: We adopt the convention that the origin is halfway between the points, which are separated by a distance $2a$, as shown in Fig. 9.5.
Figure 9.5
If the points are to image to each other, Fermat's principle requires that the total path length be a constant, say $b$. By inspection of the figure, we obtain an equation describing the curvature of the reflective surface
\[
b = \sqrt{(x + a)^2 + y^2} + \sqrt{(x - a)^2 + y^2} \tag{9.23}
\]
To get (9.23) into a more recognizable form, we isolate the first square root
\[
\sqrt{(x + a)^2 + y^2} = b - \sqrt{(x - a)^2 + y^2} ,
\]
square both sides of the equation
\[
(x + a)^2 + y^2 = b^2 + (x - a)^2 + y^2 - 2b \sqrt{(x - a)^2 + y^2} ,
\]
and then carry out the square of two of the binomial terms
\[
x^2 + a^2 + 2ax + y^2 = b^2 + x^2 + a^2 - 2ax + y^2 - 2b \sqrt{(x - a)^2 + y^2} .
\]
Some nice cancellation occurs, and we gather the remaining non-square-rooted terms on the left
\[
4ax - b^2 = -2b \sqrt{(x - a)^2 + y^2} .
\]
We square both sides of the equation and carry out the square of the remaining binomial term to obtain
\[
16a^2 x^2 - 8ab^2 x + b^4 = 4b^2 \left( x^2 - 2ax + a^2 + y^2 \right) ,
\]
and then cancel and regroup terms to arrive at
\[
\left( 16a^2 - 4b^2 \right) x^2 - 4b^2 y^2 = 4a^2 b^2 - b^4 .
\]
Finally, we divide both sides of the equation by the term on the right to obtain the (hopefully) familiar form of an ellipse
\[
\frac{x^2}{b^2/4} + \frac{y^2}{b^2/4 - a^2} = 1
\]
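The result of Example 9.3 can be checked numerically: every point on the ellipse should give the same total path length $b$ in (9.23). The sketch below samples points on the ellipse and evaluates the sum of the two distances. The values of $a$ and $b$ and the helper names are arbitrary choices for this illustration.

```python
import math

a = 1.0   # half the separation between the two points (assumed value)
b = 3.0   # total path length; must exceed 2a for a real ellipse (assumed value)

semi_x = b / 2.0                          # from x^2 / (b^2/4)
semi_y = math.sqrt(b**2 / 4.0 - a**2)     # from y^2 / (b^2/4 - a^2)

def total_path(x, y):
    """Right-hand side of (9.23): distance to (-a, 0) plus distance to (+a, 0)."""
    return math.hypot(x + a, y) + math.hypot(x - a, y)

# Sample twelve points around the ellipse and record the total path length.
lengths = []
for i in range(12):
    t = 2.0 * math.pi * i / 12
    x, y = semi_x * math.cos(t), semi_y * math.sin(t)
    lengths.append(total_path(x, y))

print("largest deviation from b:", max(abs(L - b) for L in lengths))
```

The deviation is at the level of floating-point rounding, consistent with all rays from one point reaching the other with equal path length.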
9.4 Paraxial Rays and ABCD Matrices

In the remainder of this chapter we develop a formalism for describing the effects of mirrors and lenses on rays of light. Keep in mind that when describing light as a collection of rays rather than as waves, the results can only describe features that are macroscopic compared to a wavelength. The rays of light at each location in space describe approximately the direction of travel of the wave fronts at that location. Since the wavelength of visible light is extraordinarily small compared to the macroscopic features that we perceive in our day-to-day world, the ray approximation is often a very good one. This is the reason that ray optics was developed long before light was understood as a wave.
We consider ray theory within the paraxial approximation, meaning that we restrict our attention to rays that are near and almost parallel to an optical axis of a system, say the $z$-axis. It is within this approximation that the familiar imaging properties of lenses occur. An image occurs when all rays from a point on an object converge to a corresponding point on what is referred to as the image. To the extent that the paraxial approximation is violated, the clarity of an image can suffer, and we say that there are aberrations present. Very often in the field of optical engineering, one is primarily concerned with minimizing aberrations in cases where the paraxial approximation is not strictly followed. This is done so that, for example, a camera can take pictures of subjects that occupy a fairly wide angular field of view, where rays violate the paraxial approximation. Optical systems are typically engineered using the science of ray tracing, which is described briefly in section 9.9.
As we develop paraxial ray theory, we should remember that rays impinging on devices such as lenses or curved mirrors should strike the optical component at near normal incidence.
To quantify this statement, the paraxial approximation is valid to the extent that we have
\[
\sin\theta \cong \theta \tag{9.24}
\]
and similarly
\[
\tan\theta \cong \theta \tag{9.25}
\]
Here, the angle $\theta$ (in radians) represents the angle that a particular ray makes with respect to the optical axis. There is an important mathematical reason for this approximation. The sine is a nonlinear function, but at small angles it is approximately linear and can be represented by its argument. It is this linearity that is crucial to the process of forming images. The linearity also greatly simplifies the formulation since it reduces the problem to linear algebra. Conveniently, we will be able to keep track of imaging effects with a $2 \times 2$ matrix formalism.
Figure 9.6 The behavior of a ray as light traverses a distance $d$.
Consider a ray confined to the $y$–$z$ plane where the optical axis is in the $z$ direction. Let us specify a ray at position $z_1$ by two coordinates: the displacement from the axis $y_1$ and the orientation angle $\theta_1$ (see Fig. 9.6). The ray continues along a straight path as it travels through a uniform medium. This makes it possible to predict the coordinates of the same ray at other positions, say at $z_2$. The connection is straightforward. First, since the ray continues in the same direction, we have
\[
\theta_2 = \theta_1 \tag{9.26}
\]
By referring to Fig. 9.6 we can write $y_2$ in terms of $y_1$ and $\theta_1$:
\[
y_2 = y_1 + d \tan\theta_1 \tag{9.27}
\]
where $d \equiv z_2 - z_1$. Equation (9.27) is nonlinear in $\theta_1$. However, in the paraxial approximation (9.25) it becomes linear, which after all is the point of the approximation. In this approximation the expression for $y_2$ becomes
\[
y_2 = y_1 + d \, \theta_1 \tag{9.28}
\]
Equations (9.26) and (9.28) describe a linear transformation which in matrix notation can be consolidated into the form
\[
\begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} 1 & d \\ 0 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix} \quad \text{(propagation through a distance } d \text{)} \tag{9.29}
\]
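Here, the vectors in this equation specify the essential information about the ray before and after traversing the distance $d$, and the matrix describes the effect of traversing the distance. This type of matrix is called an ABCD matrix. Suppose that the distance $d$ is subdivided into two distances, $a$ and $b$, such that $d = a + b$. If we consider individually the effects of propagation through $a$ and through $b$, we have
\[
\begin{bmatrix} y_{\mathrm{mid}} \\ \theta_{\mathrm{mid}} \end{bmatrix} = \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix}, \qquad
\begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} 1 & b \\ 0 & 1 \end{bmatrix} \begin{bmatrix} y_{\mathrm{mid}} \\ \theta_{\mathrm{mid}} \end{bmatrix} \tag{9.30}
\]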
Figure 9.7 A ray depicted in the act of reflection from a curved surface.
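where the subscript "mid" refers to the ray in the middle position after traversing the distance $a$. If we combine the equations, we get
\[
\begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} 1 & b \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix} \tag{9.31}
\]
which is in complete agreement with (9.29) since the ABCD matrix for the entire displacement is
\[
\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & b \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & a + b \\ 0 & 1 \end{bmatrix} \tag{9.32}
\]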
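The composition in (9.30)–(9.32) is easy to reproduce in code. The sketch below stores a ray as a pair $(y, \theta)$ and an ABCD matrix as nested tuples; the helper names `matmul`, `apply_abcd`, and `propagate` are hypothetical, and the numbers are arbitrary.

```python
def matmul(m1, m2):
    """Product m1 * m2 of two 2x2 matrices stored as ((A, B), (C, D))."""
    (a1, b1), (c1, d1) = m1
    (a2, b2), (c2, d2) = m2
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))

def apply_abcd(m, ray):
    """Apply an ABCD matrix to a ray (y, theta)."""
    (A, B), (C, D) = m
    y, theta = ray
    return (A * y + B * theta, C * y + D * theta)

def propagate(d):
    """Propagation through a distance d, matrix (9.29)."""
    return ((1.0, d), (0.0, 1.0))

a, b = 0.3, 0.7            # two sub-distances (arbitrary values, in meters)
ray_in = (0.01, 0.02)      # y_1 = 1 cm, theta_1 = 0.02 rad

# Step by step, as in (9.30) ...
ray_mid = apply_abcd(propagate(a), ray_in)
ray_out_steps = apply_abcd(propagate(b), ray_mid)

# ... and in one step with the combined matrix, as in (9.31) and (9.32).
combined = matmul(propagate(b), propagate(a))
ray_out_combined = apply_abcd(combined, ray_in)

print("combined matrix:", combined)
print("step by step:   ", ray_out_steps)
print("single matrix:  ", ray_out_combined)
```

Both routes give the same final ray, and the combined matrix has $B = a + b$ as in (9.32).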
9.5 Reflection and Refraction at Curved Surfaces

We next consider the effect of reflection from a spherical surface as depicted in Fig. 9.7. We consider only the act of reflection without considering propagation before or after the reflection takes place. Thus, the incident and reflected rays in the figure are symbolic only of the direction of propagation before and after reflection; they do not indicate any amount of travel. Upon reflection we have
\[
y_2 = y_1 \tag{9.33}
\]
since the ray has no chance to go anywhere. We adopt the widely used convention that, upon reflection, the positive $z$ direction is reoriented so that we consider the rays still to travel in the positive $z$ sense. Notice that in Fig. 9.7, the reflected ray approaches the $z$-axis. In this case $\theta_2$ is a negative angle (as opposed to $\theta_1$ which is drawn as a positive angle) and is equal to
\[
\theta_2 = -(\theta_1 + 2\theta_i) \tag{9.34}
\]
where $\theta_i$ is the angle of incidence with respect to the normal to the spherical mirror surface. By the law of reflection, the reflected ray also occurs at an angle $\theta_i$ referenced to the surface normal. The surface normal points towards the center of curvature, which we assume is on the $z$-axis a distance $R$ away. By convention, the radius of curvature $R$ is a positive number if the mirror surface is concave and a negative number if the mirror surface is convex. We must eliminate $\theta_i$ from (9.34) in favor of $\theta_1$ and $y_1$. By inspection of Fig. 9.7 we can write
\[
\phi \cong \sin\phi = \frac{y_1}{R} \tag{9.35}
\]
where we have applied the paraxial approximation (9.24). (Note that the angles in the figure are exaggerated.) We also have
\[
\phi = \theta_1 + \theta_i \tag{9.36}
\]
and when this is combined with (9.35), we get
\[
\theta_i = \frac{y_1}{R} - \theta_1 \tag{9.37}
\]
With this we are able to put (9.34) into a useful linear form:
\[
\theta_2 = -\frac{2}{R} y_1 + \theta_1 \tag{9.38}
\]
Equations (9.33) and (9.38) describe a linear transformation that can be concisely formulated as
\[
\begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -2/R & 1 \end{bmatrix} \begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix} \quad \text{(concave mirror)} \tag{9.39}
\]
The ABCD matrix in this transformation describes the act of reflection from a concave mirror with radius of curvature $R$. The radius $R$ is negative when the mirror is convex.
The final basic element that we shall consider is a spherical interface between two materials with indices $n_i$ and $n_t$ (see Fig. 9.8). This has an effect similar to that of the curved mirror, which changes the direction of a ray without altering its distance $y_1$ from the optical axis. Please note that here the radius of curvature is considered to be positive for a convex surface (opposite convention from that of the mirror). Again, we are interested only in the act of transmission without any travel before or after the interface. As before, (9.33) applies (i.e. $y_2 = y_1$). To connect $\theta_1$ and $\theta_2$ we must use Snell's law, which in the paraxial approximation is
\[
n_i \theta_i = n_t \theta_t \tag{9.40}
\]
As seen in Fig. 9.8, we have
\[
\theta_i = \theta_1 + \phi \tag{9.41}
\]
Figure 9.8 A ray depicted in the act of transmission at a curved material interface.
and
\[
\theta_t = \theta_2 + \phi \tag{9.42}
\]
As before, (9.35) applies (i.e. $\phi \cong y_1/R$). When this is used in (9.41) and (9.42), Snell's law (9.40) becomes
\[
\theta_2 = \left( \frac{n_i}{n_t} - 1 \right) \frac{y_1}{R} + \frac{n_i}{n_t} \theta_1 \tag{9.43}
\]
The compact matrix form of (9.33) and (9.43) turns out to be
\[
\begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ (n_i/n_t - 1)/R & n_i/n_t \end{bmatrix} \begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix} \quad \text{(from } n_i \text{ to } n_t \text{; interface radius } R \text{)} \tag{9.44}
\]
In summary, we have developed three basic ABCD matrices seen in (9.29), (9.39), and (9.44). All other ABCD matrices that we will use are composites of these three. For example, one can construct the ABCD matrix for a lens by using two matrices like those in (9.44) to represent the entering and exiting surfaces of the lens. A distance matrix (9.29) can be inserted to account for the thickness of the lens. It is left as an exercise to derive the ABCD matrix for such a thick lens (see P 9.6).
The three ABCD matrices discussed can be used for many different composite systems. As another example, consider a ray that propagates through a distance $a$, followed by a reflection from a mirror of radius $R$, and then propagates through a distance $b$. This example is depicted in Fig. 9.9. The vector depicting the final ray in terms of the initial one is computed as follows:
\[
\begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix}
= \begin{bmatrix} 1 & b \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ -2/R & 1 \end{bmatrix}
\begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix}
\begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix}
= \begin{bmatrix} 1 - 2b/R & a + b - 2ab/R \\ -2/R & 1 - 2a/R \end{bmatrix}
\begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix}
\tag{9.45}
\]
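The product in (9.45) can be verified by multiplying the three matrices numerically and comparing with the closed form. In the sketch below the helper names and the values of $a$, $b$, and $R$ are hypothetical; the last lines also multiply the matrices with the two distances swapped, to show that the order of the factors changes the result.

```python
def matmul(m1, m2):
    """Product m1 * m2 of two 2x2 matrices stored as ((A, B), (C, D))."""
    (a1, b1), (c1, d1) = m1
    (a2, b2), (c2, d2) = m2
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))

def propagate(d):
    """Propagation through a distance d, matrix (9.29)."""
    return ((1.0, d), (0.0, 1.0))

def mirror(R):
    """Reflection from a mirror of radius R (R > 0 if concave), matrix (9.39)."""
    return ((1.0, 0.0), (-2.0 / R, 1.0))

a, b, R = 0.5, 0.8, 2.0   # arbitrary sample values

# The first element the ray meets sits on the right of the product.
system = matmul(propagate(b), matmul(mirror(R), propagate(a)))

# Closed form from (9.45).
closed_form = ((1 - 2 * b / R, a + b - 2 * a * b / R),
               (-2 / R, 1 - 2 * a / R))

# Same elements with the two distances interchanged.
swapped = matmul(propagate(a), matmul(mirror(R), propagate(b)))

print("numerical product:", system)
print("closed form (9.45):", closed_form)
print("distances swapped: ", swapped)
```

With these values the $A$ and $D$ elements trade places when the distances are swapped, so the two orderings describe different systems unless $a = b$.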
Figure 9.9 A ray that travels through a distance $a$, reflects from a mirror, and then travels through a distance $b$.
The ordering of the matrices is important. The first effect that the light experiences is the matrix to the right, in the position that first operates on the vector representing the initial ray.
We have continually worked within the $y$–$z$ plane as indicated in Figs. 9.6–9.9. This may have given the impression that it is necessary to work within that plane, or a plane containing the $z$-axis. However, within the paraxial approximation, our ABCD matrices are still valid for rays contained in planes that do not include the optical axis (as long as the rays are nearly parallel to the optical axis). Imagine a ray contained within a plane that is parallel to the $y$–$z$ plane but for which $x > 0$. One might be concerned that when the ray meets, for example, a spherically concave mirror, the radius of curvature in the perspective of the $y$–$z$ dimension might be different for $x > 0$ than for $x = 0$ (at the center of the mirror). This concern is actually quite legitimate and is the source of what is known as spherical aberration. Nevertheless, in the paraxial approximation the intersection with the curved mirror of all planes that are parallel to the optical axis always gives the same curvature. To see why this is so, consider the curvature of the mirror in Fig. 9.7. As we move away from the mirror center (in either the $x$ or $y$-dimension or some combination thereof), the mirror surface deviates to the left by the amount
\[
\delta = R - R \cos\phi \tag{9.46}
\]
In the paraxial approximation, we have $\cos\phi \cong 1 - \phi^2/2$. And since in this approximation we may also write $\phi \cong \sqrt{x^2 + y^2}/R$, (9.46) becomes
\[
\delta \cong \frac{x^2 + y^2}{2R} \tag{9.47}
\]
In the paraxial approximation, we see that the curve of the mirror is parabolic, and therefore separable between the $x$ and $y$ dimensions. That is, the curvature in the $x$-dimension (i.e. $\partial\delta/\partial x = x/R$) is independent of $y$, and the curvature in the $y$-dimension (i.e. $\partial\delta/\partial y = y/R$) is independent of $x$. A similar argument can be made for a spherical interface between two media within the paraxial approximation. This allows us to deal conveniently with rays that have positioning and directional components in both the $x$ and $y$ dimensions. Each dimension can be treated separately without influencing the other. Most importantly, the identical matrices, (9.29), (9.39), and (9.44), are used for either dimension. Figs. 9.6–9.9 therefore represent projections of the actual rays onto the $y$–$z$ plane. To complete the story, one would also need corresponding figures representing the projection of the rays onto the $x$–$z$ plane.
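To get a feel for how good the parabolic form (9.47) is, the sketch below compares it with the exact deviation of a spherical surface, $\delta = R - \sqrt{R^2 - x^2 - y^2}$, which is (9.46) with $\sin\phi = \sqrt{x^2 + y^2}/R$. The function names and sample values are assumptions made for this illustration.

```python
import math

def sag_exact(x, y, R):
    """Exact deviation (9.46) of a sphere of radius R at transverse position (x, y)."""
    return R - math.sqrt(R**2 - x**2 - y**2)

def sag_paraxial(x, y, R):
    """Parabolic (paraxial) approximation (9.47)."""
    return (x**2 + y**2) / (2.0 * R)

R = 1.0
for rho in (0.01, 0.05, 0.2, 0.5):
    exact = sag_exact(rho, 0.0, R)
    approx = sag_paraxial(rho, 0.0, R)
    rel_err = (exact - approx) / exact
    print(f"rho/R = {rho:4.2f}  exact = {exact:.6e}  "
          f"paraxial = {approx:.6e}  relative error = {rel_err:.2e}")

# Separability: the paraxial form is the sum of an x-only and a y-only term.
x, y = 0.03, 0.04
separable_sum = sag_paraxial(x, 0.0, R) + sag_paraxial(0.0, y, R)
print("separable sum equals full paraxial sag:",
      abs(separable_sum - sag_paraxial(x, y, R)) < 1e-12)
```

The relative error grows roughly as the square of the distance from the axis, which is one way to see why rays far from the axis are the ones that contribute to spherical aberration.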
9.6 Image Formation by Mirrors and Lenses

Consider the example shown in Fig. 9.9 where a ray travels through a distance $a$, reflects from a curved mirror, and then travels through a distance $b$. From (9.45) we know that the ABCD matrix for the overall process is
\[
\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 - 2b/R & a + b - 2ab/R \\ -2/R & 1 - 2a/R \end{bmatrix} \tag{9.48}
\]
As is well known, it is possible to form an image with a concave mirror. Suppose that the initial ray is one of many which leaves a point on an object positioned at $d_o = a$ before the mirror. In order for an image to occur at $d_i = b$, it is essential that all rays leaving the original point on the object converge to a single point on the image. That is, we want rays leaving the point $y_1$ on the object (which may take on a range of angles $\theta_1$) all to converge to a single point $y_2$ at the image. In the following equation we need $y_2$ to be independent of $\theta_1$:
\[
\begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix} = \begin{bmatrix} A y_1 + B \theta_1 \\ C y_1 + D \theta_1 \end{bmatrix} \tag{9.49}
\]
The condition for image formation is therefore
\[
B = 0 \quad \text{(condition for image formation)} \tag{9.50}
\]
When this condition is applied to (9.48), we obtain
\[
d_o + d_i - \frac{2 d_o d_i}{R} = 0 \quad \Rightarrow \quad \frac{2}{R} = \frac{1}{d_o} + \frac{1}{d_i} \tag{9.51}
\]
which is the familiar imaging formula for a mirror, in agreement with (9.1). When the object is infinitely far away (i.e. $d_o \to \infty$), the image appears at $d_i \to R/2$. This distance is called the focal length and is denoted by
\[
f = \frac{R}{2} \quad \text{(focal length of a mirror)} \tag{9.52}
\]
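The imaging condition (9.50) can be illustrated numerically. The sketch below picks a hypothetical concave mirror and object distance, solves (9.51) for $d_i$, builds the overall matrix (9.48) from its three factors, and confirms that $B$ vanishes so that rays leaving the object point at different angles arrive at the same height. All names and numbers are illustrative assumptions.

```python
def matmul(m1, m2):
    """Product m1 * m2 of two 2x2 matrices stored as ((A, B), (C, D))."""
    (a1, b1), (c1, d1) = m1
    (a2, b2), (c2, d2) = m2
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))

def propagate(d):
    """Propagation through a distance d, matrix (9.29)."""
    return ((1.0, d), (0.0, 1.0))

def mirror(R):
    """Reflection from a mirror of radius R (R > 0 if concave), matrix (9.39)."""
    return ((1.0, 0.0), (-2.0 / R, 1.0))

def mirror_image_distance(d_o, R):
    """Solve the imaging formula (9.51), 2/R = 1/d_o + 1/d_i, for d_i."""
    return 1.0 / (2.0 / R - 1.0 / d_o)

R = 0.4      # concave mirror, so f = R/2 = 0.2 m (assumed value)
d_o = 0.6    # object distance in meters (assumed value)
d_i = mirror_image_distance(d_o, R)

(A, B), (C, D) = matmul(propagate(d_i), matmul(mirror(R), propagate(d_o)))

# Rays leaving the same object point y_1 at several different angles.
y1 = 0.01
heights = [A * y1 + B * theta1 for theta1 in (-0.05, 0.0, 0.05)]

print(f"d_i = {d_i:.4f} m, B = {B:.2e}")
print("image heights:", heights)
```

Because $B$ is zero to rounding error, the three image heights coincide, which is the meaning of the imaging condition.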
Please note that $d_o$ and $d_i$ can each be either positive (real as depicted in Fig. 9.9) or negative (virtual or behind the mirror). The magnification of the image is found by comparing the size of $y_2$ to $y_1$. From (9.48)–(9.51), the magnification is found to be
$$M \equiv \frac{y_2}{y_1} = A = 1 - \frac{2 d_i}{R} = -\frac{d_i}{d_o} \tag{9.53}$$
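The negative sign indicates that for positive distances $d_o$ and $d_i$ the image is inverted. Another common and very useful example is that of a thin lens, where we ignore the thickness between the two surfaces of the lens. Using the ABCD matrix in (9.44) twice, we find the overall matrix for the thin lens is
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ \frac{1}{R_2}(n - 1) & n \end{bmatrix} \begin{bmatrix} 1 & 0 \\ \frac{1}{R_1}\left(\frac{1}{n} - 1\right) & \frac{1}{n} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -(n - 1)\left(\frac{1}{R_1} - \frac{1}{R_2}\right) & 1 \end{bmatrix} \quad \text{(Thin Lens)} \tag{9.54}$$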
where we have taken the index outside of the lens to be unity while that of the lens material to be $n$. $R_1$ is the radius of curvature for the first surface which is positive if convex, and $R_2$ is the radius of curvature for the second surface which is also positive if convex from the perspective of the rays which encounter it. Notice the close similarity between (9.54) and the matrix in (9.39). The ABCD matrix for either a thin lens or a mirror can be written as
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix} \tag{9.55}$$
where in the case of the thin lens the focal length is given by the lens maker's formula
$$\frac{1}{f} = (n - 1)\left(\frac{1}{R_1} - \frac{1}{R_2}\right) \quad \text{(focal length of thin lens)} \tag{9.56}$$
All of the arguments about image formation given above for the curved mirror work equally well for the thin lens. The only difference is that the focal length (9.56) is used in place of (9.52). That is, if we consider a ray traveling through a distance $d_o$ impinging on a thin lens whose matrix is given by (9.55), and then afterwards traveling a distance $d_i$, the overall ABCD matrix is exactly like that in (9.48):
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 - d_i/f & d_o + d_i - d_o d_i/f \\ -1/f & 1 - d_o/f \end{bmatrix} \tag{9.57}$$
When we use the imaging condition (9.50), the imaging formula (9.1) emerges naturally.
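As an illustration of the thin-lens result, the sketch below multiplies two curved-interface matrices and compares the product with the lens maker's formula. It assumes that the interface matrix (9.44), which is not reproduced in this section, has lower row $\left((n_i/n_t - 1)/R,\; n_i/n_t\right)$; the function names and the lens prescription are hypothetical.

```python
# Sketch: build a thin lens from two interfaces and image an object with it.
# The interface matrix form is an assumption standing in for (9.44).

def matmul(m1, m2):
    """Return the 2x2 product m1 * m2 for matrices stored as ((A, B), (C, D))."""
    (a1, b1), (c1, d1) = m1
    (a2, b2), (c2, d2) = m2
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))

def translation(d):
    return ((1.0, d), (0.0, 1.0))

def interface(n_i, n_t, R):
    """Curved interface from index n_i into index n_t (assumed form of (9.44))."""
    return ((1.0, 0.0), ((n_i / n_t - 1.0) / R, n_i / n_t))

n = 1.5
R1, R2 = 0.10, -0.10   # biconvex: the second surface looks concave to the rays

# The first surface encountered appears on the right.
lens = matmul(interface(n, 1.0, R2), interface(1.0, n, R1))
f = 1.0 / ((n - 1.0) * (1.0 / R1 - 1.0 / R2))   # lens maker's formula (9.56)

d_o = 0.30
d_i = 1.0 / (1.0 / f - 1.0 / d_o)               # thin-lens imaging formula
(A, B), (C, D) = matmul(translation(d_i), matmul(lens, translation(d_o)))
print("f =", f, " d_i =", d_i, " B =", B, " M =", A)
```

For this assumed prescription the product reproduces the matrix (9.55), and the overall matrix has $B = 0$ with $A = -d_i/d_o$, the same behavior found for the mirror.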
9.7 Image Formation by Complex Optical Systems

A complicated series of optical elements (e.g. a sequence of lenses and spaces) can be combined to form a composite imaging system. The matrices for each of the elements are multiplied together (the first element that rays encounter appearing on the right) to form the overall composite ABCD matrix. We can study the imaging properties of a composite ABCD matrix by combining the matrix with the matrices for the distances from an object to the system and from the system to the image formed:
$$\begin{bmatrix} A' & B' \\ C' & D' \end{bmatrix} = \begin{bmatrix} 1 & d_i \\ 0 & 1 \end{bmatrix} \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} 1 & d_o \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} A + d_i C & d_o A + B + d_o d_i C + d_i D \\ C & d_o C + D \end{bmatrix} \tag{9.58}$$
Galileo Galilei (1564–1642, Italian) While Galileo did not invent the telescope, he was one of the few people of his time who knew how to build one. He also constructed a compound microscope. He attempted to measure the speed of light by having his assistant position himself on a distant hill and measuring the time it took for his assistant to uncover a lantern in response to a light signal. He was, of course, unable to determine the speed of light. His conclusion was that light is really fast if not instantaneous.
Imaging occurs according to (9.50) when the upper-right element of the overall matrix in (9.58) vanishes, or
$$d_o A + B + d_o d_i C + d_i D = 0 \quad \text{(general condition for image formation)} \tag{9.59}$$
with magnification
$$M = A + d_i C \tag{9.60}$$
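The general imaging condition can be solved for the image distance, $d_i = -(d_o A + B)/(d_o C + D)$. The sketch below applies this to a hypothetical composite system of two thin lenses separated by a gap; the focal lengths, spacing, and function names are assumptions made only for illustration.

```python
# Sketch: locate the image formed by a composite ABCD system using (9.59)-(9.60).

def matmul(m1, m2):
    """Return the 2x2 product m1 * m2 for matrices stored as ((A, B), (C, D))."""
    (a1, b1), (c1, d1) = m1
    (a2, b2), (c2, d2) = m2
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))

def translation(d):
    return ((1.0, d), (0.0, 1.0))

def thin_lens(f):
    return ((1.0, 0.0), (-1.0 / f, 1.0))

def image_of(system, d_o):
    """Return (d_i, M) for an object a distance d_o in front of the system."""
    (A, B), (C, D) = system
    d_i = -(d_o * A + B) / (d_o * C + D)   # solve (9.59) for d_i
    M = A + d_i * C                        # magnification (9.60)
    return d_i, M

# Assumed system: lens f1 = 0.10, gap 0.05, lens f2 = 0.20 (first element on the right).
system = matmul(thin_lens(0.20), matmul(translation(0.05), thin_lens(0.10)))

d_o = 0.30
d_i, M = image_of(system, d_o)
overall = matmul(translation(d_i), matmul(system, translation(d_o)))
print("d_i =", d_i, " M =", M, " upper-right element =", overall[0][1])
```

Multiplying out the full object-to-image matrix at the end is a consistency check: its upper-right element should vanish and its upper-left element should equal $M$.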
There is a convenient way to simplify this analysis. For every ABCD matrix representing a (potentially) complicated optical system, there exist two principal planes located (in our convention) a distance $p_1$ before entering the system and a distance $p_2$ after exiting the system. When the matrices corresponding to the (appropriately chosen) distances to those planes are appended to the original ABCD matrix of the system, the overall matrix simplifies to one that looks like the matrix for a simple thin lens (9.55). With knowledge of the positions of the principal planes, one can treat the complicated imaging system in the same way that one treats a simple thin lens. The only difference is that $d_o$ is the distance from the object to the first principal plane and $d_i$ is the distance from the second principal plane to the image. (In the case of an actual thin lens, both principal planes are at $p_1 = p_2 = 0$. For a composite system, $p_1$ and $p_2$ can be either positive or negative.)
Figure 9.10 A multi-element system represented as an ABCD matrix for which principal planes always exist.
Next we demonstrate that $p_1$ and $p_2$ can always be selected such that we can write
$$\begin{bmatrix} 1 & p_2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} 1 & p_1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} A + p_2 C & p_1 A + B + p_1 p_2 C + p_2 D \\ C & p_1 C + D \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -1/f_{\rm eff} & 1 \end{bmatrix} \tag{9.61}$$
The final matrix is that of a simple thin lens, and it takes the place of the composite system including the distances to the principal planes. Our task is to find the values of $p_1$ and $p_2$ that make this matrix replacement work. We must also prove that this replacement is always possible for physically realistic values for $A$, $B$, $C$, and $D$. We can straightaway make the definition
$$f_{\rm eff} \equiv -1/C \tag{9.62}$$
We can also solve for $p_1$ and $p_2$ by setting the diagonal elements of the matrix to 1. Explicitly, we get
$$p_1 C + D = 1 \quad\Rightarrow\quad p_1 = \frac{1 - D}{C} \tag{9.63}$$
and
$$A + p_2 C = 1 \quad\Rightarrow\quad p_2 = \frac{1 - A}{C} \tag{9.64}$$
It remains to be shown that the upper right element in (9.61) (i.e. $p_1 A + B + p_1 p_2 C + p_2 D$) automatically goes to zero for our choices of $p_1$ and $p_2$. This may seem unlikely at first, but we can invoke an important symmetry in the matrix to show that it does in fact vanish for our choices of $p_1$ and $p_2$. When (9.63) and (9.64) are substituted into the upper right matrix element of (9.61) we get
$$\begin{aligned} p_1 A + B + p_1 p_2 C + p_2 D &= \frac{1 - D}{C} A + B + \frac{1 - D}{C}\,\frac{1 - A}{C}\, C + \frac{1 - A}{C} D \\ &= \frac{1}{C}\left[1 - AD + BC\right] \\ &= \frac{1}{C}\left[1 - \begin{vmatrix} A & B \\ C & D \end{vmatrix}\right] \end{aligned} \tag{9.65}$$
This equation shows that the upper right element of (9.61) vanishes when the determinant of the original ABCD matrix equals one. Fortunately, this is always the case as long as we begin and end in the same index of refraction. Therefore, we have
$$\begin{vmatrix} A & B \\ C & D \end{vmatrix} = 1 \tag{9.66}$$
Notice that the determinants of the matrices in (9.29), (9.39), and (9.55) are all one, and so ABCD matrices constructed of these will also have determinants equal to one. The determinant of (9.44) is not one. This is because it begins and ends in different indices, but when this matrix is used in succession to form a lens or even a strange conglomerate of successive material interfaces, the resulting matrix will have a determinant equal to one as long as the beginning and ending indices are the same. Table 9.1 is a summary of ABCD matrices of common optical elements. All of the matrices obey (9.66).

Table 9.1 Summary of ABCD matrices for common optical elements.

$$\begin{bmatrix} 1 & d \\ 0 & 1 \end{bmatrix}$$ (Distance within any material, excluding interfaces)

$$\begin{bmatrix} 1 & d/n \\ 0 & 1 \end{bmatrix}$$ (Window, starting and stopping in air)

$$\begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix}$$ (Thin lens or a mirror with $f = R/2$)

$$\begin{bmatrix} 1 + \frac{d}{R_1}\left(\frac{1}{n} - 1\right) & \frac{d}{n} \\ (1 - n)\left(\frac{1}{R_1} - \frac{1}{R_2}\right) + \frac{d}{R_1 R_2}\left(2 - \frac{1}{n} - n\right) & 1 + \frac{d}{R_2}\left(1 - \frac{1}{n}\right) \end{bmatrix}$$ (Thick lens)
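The principal-plane construction of (9.61)–(9.64) can be tried on a concrete system. The sketch below builds a thick lens from two interfaces and a gap, then checks that sandwiching it between the distances $p_1$ and $p_2$ yields a thin-lens matrix. The interface matrix is an assumed form standing in for (9.44), and the lens prescription and all function names are hypothetical.

```python
# Sketch: effective focal length and principal planes of a thick lens.

def matmul(m1, m2):
    """Return the 2x2 product m1 * m2 for matrices stored as ((A, B), (C, D))."""
    (a1, b1), (c1, d1) = m1
    (a2, b2), (c2, d2) = m2
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))

def translation(d):
    return ((1.0, d), (0.0, 1.0))

def interface(n_i, n_t, R):
    """Curved interface from index n_i into n_t (assumed form of (9.44))."""
    return ((1.0, 0.0), ((n_i / n_t - 1.0) / R, n_i / n_t))

def thick_lens(n, R1, R2, d):
    """First surface, then thickness d inside the glass, then second surface."""
    return matmul(interface(n, 1.0, R2),
                  matmul(translation(d), interface(1.0, n, R1)))

def principal_planes(system):
    """Return (f_eff, p1, p2) from (9.62)-(9.64)."""
    (A, B), (C, D) = system
    return -1.0 / C, (1.0 - D) / C, (1.0 - A) / C

lens = thick_lens(n=1.5, R1=0.10, R2=-0.10, d=0.02)   # assumed prescription
(A, B), (C, D) = lens
det = A * D - B * C

f_eff, p1, p2 = principal_planes(lens)
reduced = matmul(translation(p2), matmul(lens, translation(p1)))
print("det =", det, " f_eff =", f_eff, " p1 =", p1, " p2 =", p2)
print("reduced matrix =", reduced)
```

For this assumed lens both $p_1$ and $p_2$ come out slightly negative, which in the convention above places the principal planes inside the glass.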
9.8 Stability of Laser Cavities

As a final example of the usefulness of paraxial ray theory, we apply the ABCD matrix formulation to a laser cavity. The basic elements of a laser cavity include an amplifying medium and mirrors to provide feedback. Presumably, at least one of the end mirrors is partially transmitting so that energy is continuously extracted from the cavity. Here, we dispense with the amplifying medium and concentrate our attention on the optics providing the feedback. As might be expected, the mirrors must be carefully aligned or successive reflections might cause rays to walk continuously away from the optical axis, so that they eventually leave the cavity out the side. If a simple cavity is formed with two flat mirrors that are perfectly aligned parallel to each other, one might suppose that the mirrors would provide ideal feedback. However, all rays except for those that are perfectly aligned to the mirror surface normals eventually wander out of the side of the cavity as illustrated in Fig. 9.11a. Such a cavity is said to be unstable. We would like to do a better job of trapping the light in the cavity. To improve the situation, a cavity can be constructed with concave end mirrors to help confine the beams within the cavity. Even so, one must choose carefully the curvature of the mirrors and their separation $L$. If this is not done
Figure 9.11 (a) A ray bouncing between two parallel flat mirrors. (b) A ray bouncing between two curved mirrors in an unstable configuration. (c) A ray bouncing between two curved mirrors in a stable configuration. (d) Stable cavity utilizing a lens and two flat end mirrors.
correctly, the curved mirrors can overcompensate for the tendency of the rays to wander out of the cavity and thus aggravate the problem. Such an unstable scenario is depicted in Fig. 9.11b. Figure 9.11c depicts a cavity made with curved mirrors where the separation $L$ is chosen appropriately to make the cavity stable. Although a ray, as it makes successive bounces, can strike the end mirrors at a variety of points, the curvature of the mirrors keeps the trajectories contained within a narrow region so that they cannot escape out the sides of the cavity. There are many ways to make a stable laser cavity. For example, a stable cavity can be made using a lens between two flat end mirrors as shown in Fig. 9.11d. Any combination of lenses (perhaps more than one) and curved mirrors can be used to create stable cavity configurations. Ring cavities can also be made to be stable, where in no place do the rays retro-reflect from a mirror but instead circulate through a series of elements like cars going around a racetrack. We now find the conditions that have to be met in order for a cavity to be stable. The ABCD matrix for a round trip in the cavity is useful for this analysis.
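For example, the round-trip ABCD matrix for the cavity shown in Fig. 9.11c is
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & L \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -2/R_2 & 1 \end{bmatrix} \begin{bmatrix} 1 & L \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -2/R_1 & 1 \end{bmatrix} \tag{9.67}$$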
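where we have begun the round trip just after a reflection from the first mirror. The round-trip ABCD matrix for the cavity shown in Fig. 9.11d is
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & 2L_1 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix} \begin{bmatrix} 1 & 2L_2 \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix} \tag{9.68}$$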
where we have begun the round trip just after a transmission through the lens moving to the right. It is somewhat arbitrary where the round trip begins. To determine whether a given configuration of a cavity will be stable, we need to know what a ray does after making many round trips in the cavity. To find the effect of propagation through many round trips, we multiply the round-trip ABCD matrix together $N$ times, where $N$ is the number of round trips that we wish to consider. We can then examine what happens to an arbitrary ray after making $N$ round trips in the cavity as follows:
$$\begin{bmatrix} y_{N+1} \\ \theta_{N+1} \end{bmatrix} = \begin{bmatrix} A & B \\ C & D \end{bmatrix}^{N} \begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix} \tag{9.69}$$
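Before any analytic treatment, (9.69) can simply be applied by brute force. The sketch below forms the round-trip matrix in the order written in (9.67) and follows one ray through many round trips for two hypothetical two-mirror cavities. The mirror radii, separations, starting ray, and function names are all assumptions for illustration. The quantity $(A + D)/2$ printed alongside is the one that enters the stability condition (9.73) derived later in this section; it does not depend on where the round trip is taken to begin.

```python
# Sketch: follow a ray through repeated round trips of a two-mirror cavity.

def matmul(m1, m2):
    """Return the 2x2 product m1 * m2 for matrices stored as ((A, B), (C, D))."""
    (a1, b1), (c1, d1) = m1
    (a2, b2), (c2, d2) = m2
    return ((a1 * a2 + b1 * c2, a1 * b2 + b1 * d2),
            (c1 * a2 + d1 * c2, c1 * b2 + d1 * d2))

def translation(d):
    return ((1.0, d), (0.0, 1.0))

def mirror(R):
    return ((1.0, 0.0), (-2.0 / R, 1.0))

def round_trip(L, R1, R2):
    """Round-trip matrix in the order of (9.67): rightmost factor acts first."""
    m = mirror(R1)
    for element in (translation(L), mirror(R2), translation(L)):
        m = matmul(element, m)   # each later element multiplies from the left
    return m

def half_trace(m):
    return 0.5 * (m[0][0] + m[1][1])

def max_height(m, n_trips, ray=(1.0e-3, 0.0)):
    """Largest |y| reached while applying the round-trip matrix n_trips times."""
    y, theta = ray
    biggest = abs(y)
    for _ in range(n_trips):
        y, theta = m[0][0] * y + m[0][1] * theta, m[1][0] * y + m[1][1] * theta
        biggest = max(biggest, abs(y))
    return biggest

short_cavity = round_trip(L=0.5, R1=1.0, R2=1.0)   # assumed values (meters)
long_cavity = round_trip(L=2.5, R1=1.0, R2=1.0)

for name, m in (("L = 0.5", short_cavity), ("L = 2.5", long_cavity)):
    print(name, " (A+D)/2 =", half_trace(m), " max |y| =", max_height(m, 50))
```

In the shorter cavity the ray height never exceeds its starting value, whereas in the longer one it grows by many orders of magnitude within a few dozen round trips.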
At this point students might be concerned that taking an ABCD matrix to the $N$th power can be a lot of work. (It is already a significant amount of work just to compute the ABCD matrix for a single round trip.) In addition, we are interested in letting $N$ be very large, perhaps even infinity. Students can relax because we have a neat trick to accomplish this daunting task. We use Sylvester's theorem from appendix 0.5, which states that if
$$\begin{vmatrix} A & B \\ C & D \end{vmatrix} = 1 \tag{9.70}$$
then
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{N} = \frac{1}{\sin\theta} \begin{bmatrix} A\sin N\theta - \sin(N-1)\theta & B\sin N\theta \\ C\sin N\theta & D\sin N\theta - \sin(N-1)\theta \end{bmatrix} \tag{9.71}$$
where
$$\cos\theta = \frac{1}{2}(A + D). \tag{9.72}$$
As we have already discussed, (9.70) is satisfied if the refractive index is the same before and after, which is guaranteed for any round trip. We therefore can employ Sylvester's theorem for any $N$ that we might choose, including very large integers. We would like the elements of (9.71) to remain finite as $N$ becomes very large. If this is the case, then we know that a ray remains trapped within the cavity and stays reasonably close to the optical axis. Since $N$ only appears within the argument of a sine function, which is always bounded between $-1$ and $1$ for
real arguments, it might seem that the elements of (9.71) always remain finite as $N$ approaches infinity. However, it turns out that $\theta$ can become imaginary depending on the outcome of (9.72), in which case the sine becomes a hyperbolic sine, which can blow up as $N$ becomes large. In the end, the condition for cavity stability is that a real $\theta$ must exist for (9.72), or in other words we need
$$-1 < \frac{1}{2}(A + D) < 1 \quad \text{(condition for a stable cavity)} \tag{9.73}$$
It is left as an exercise to apply this condition to (9.67) and (9.68) to find the necessary relationships between the various element curvatures and spacing in order to achieve cavity stability.

9.9 Aberrations and Ray Tracing

The paraxial approximation places serious limitations on the performance of optical systems (see (9.24) and (9.25)). To stay within the approximation, all rays traveling in the system should travel very close to the optic axis with very shallow angles with respect to the optical axis. To the extent that this is not the case, the collection of rays associated with a single point on an object may not converge to a single point on the associated image. The resulting distortion or blurring of the image is known as aberration. Common experience with photographic and video equipment suggests that it is possible to image scenes that have a relatively wide angular extent (many tens of degrees), in apparent serious violation of the paraxial approximation. The paraxial approximation is indeed violated in these devices, so they must be designed using more complicated analysis techniques than those we have learned in this chapter. The most common approach is to use a computationally intensive procedure called ray tracing in which $\sin\theta$ and $\tan\theta$ are rendered exactly. The nonlinearity of these functions precludes the possibility of obtaining analytic solutions describing the imaging performance of such optical systems. The typical procedure is to start with a collection of rays from a test point such as shown in Fig. 9.12. Each ray is individually traced through the system using the exact representation of geometric surfaces as well as the exact representation
Figure 9.12 Ray tracing through a simple lens.

Figure 9.13 Chromatic aberration causes lenses to have different focal lengths for different wavelengths. It can be corrected using an achromatic doublet lens.
of Snell's law. On close analysis, the rays typically do not converge to a distinct imaging point. Rather, the rays can be blurred out over a range of points where the image is supposed to occur. Depending on the angular distribution of the rays as well as on the elements in the setup, the spread of rays around the image point can be large or small. The engineer who designs the system must determine whether the amount of aberration is acceptable, given the various constraints of the device. To reduce aberrations below typical tolerance levels, several lenses can be used together. If properly chosen, the lenses (some positive, some negative), separated by specific distances, can result in remarkably low aberration levels over certain ranges of operation for the device. Ray tracing is best done with commercial software designed for this purpose (e.g. Zemax or other professional products). Such software packages are able to develop and optimize designs for specific applications. A nice feature is that the user can specify that the design should employ only standard optical components available from known optics companies. In any case, it is typical to specify that all lenses in the system should have spherical surfaces since these are much less expensive to manufacture. We mention briefly a few types of aberrations that you may encounter. Multiple aberrations can often be observed in a single lens. Chromatic aberration arises from the fact that the index of refraction for glass varies with the wavelength of light. Since the focal length of a lens depends on the index of refraction (see, for example, Eq. (9.56)), the focal length of a lens varies with the wavelength of light. Chromatic aberration can be compensated for by using a pair of lenses made from two types of glass as shown in Fig. 9.13 (the pair is usually cemented together to form a doublet lens). The lens with the
Figure 9.14 (a) Paraxial theory predicts that the light imaged from a point source will converge to a point (i.e. have spherical wave fronts coming to the image point). (b) The image of a point source made by a real lens is an extended and blurred patch of light and the converging wavefronts are only quasi-spherical.
shortest focal length is made of the glass whose index has the lesser dependence on wavelength. By properly choosing the prescription of the two lenses, you can exactly compensate for chromatic aberration at two wavelengths and do a good job for a wide range of others. Achromatic doublets can also be designed to minimize spherical aberration (see below), so they are often a good choice when you need a high quality lens. Monochromatic aberrations arise from the shape of the lens rather than the variation of $n$ with wavelength. Before computers made the widespread use of ray tracing practical, these aberrations had to be analyzed primarily with analytic techniques. The analytic results derived previously in this chapter were based on first-order approximations (e.g. $\sin\theta \cong \theta$). This analysis predicts that a lens can image a point source to an exact image point, which predicts spherically converging wavefronts at the image point as shown in Fig. 9.14(a). You can increase the accuracy of the theory for non-paraxial rays by retaining second-order correction terms in the analysis. With these second-order terms included, the wave fronts converging towards an image point are mostly spherical, but have second-order aberration terms added in (shown conceptually in Fig. 9.14(b)). There are five aberration terms in this second-order analysis, and these represent a convenient basis for discussing aberration.
Figure 9.15 Spherical aberration in a plano-convex lens.

Figure 9.16 Illustration of coma. Rays traveling through the center of the lens are imaged to point a as predicted by paraxial theory. Rays that travel through the lens at radius b in the plane of the figure are imaged to point b. Rays that travel through the lens at radius b, but outside the plane of the figure, are imaged to other points on the circle (in the image plane) containing point b. Rays that travel through the lens at other radii on the lens (e.g. c) also form circles in the image plane, with radius proportional to the square of the radius at which they cross the lens and with the center offset from point a by a distance proportional to the square of that same radius. When light from each of these circles combines on the screen it produces an imaged point with a comet tail.
The first aberration term is known as spherical aberration. This type of aberration results from the fact that rays traveling through a spherical lens at large radii experience a different focal length than those traveling near the axis. For a converging lens, this causes wide-radius rays to focus before the near-axis rays as shown in Fig. 9.15. This problem can be helped by orienting lenses so that the face with the least curvature is pointed towards the side where the light rays have the largest angle. This procedure splits the bending of rays more evenly between the front and back surface of the lens. As mentioned above, you can also cement two lenses made from different types of glass together so that spherical aberrations from one lens are corrected by the other. The aberration term referred to as astigmatism occurs when an off-axis object point is imaged to an off-axis image point. In this case a spherical lens has a different focal length in the horizontal and vertical dimensions. For a focusing lens this causes the two dimensions to focus at different distances, producing a vertical line at one image plane and a horizontal line at another. A lens can also be inherently astigmatic even when viewed on axis if it is football shaped rather than spherical. In this case, the astigmatic aberration can be corrected by inserting a cylindrical lens at the correct orientation (this is a common correction needed in eyeglasses). A third aberration term is referred to as coma. This is observed when off-axis points are imaged and produces a comet shaped tail with its head at the point predicted by paraxial theory. (The term coma refers to the atmosphere of a comet, which is how the aberration got its name.) This aberration is distinct from astigmatism, which is also observed for off-axis points, since coma is observed even when all of the rays are in one plane (see Fig. 9.16).
You have probably seen coma if you've ever played with a magnifying glass in the sun: just tilt the lens slightly and you see a comet-like image rather than a point. The curvature of the field aberration term arises from the fact that spherical
Figure 9.17 Distortion occurs when magnification is not constant across an extended image.
lenses image spherical surfaces to another spherical surface, rather than imaging a plane to a plane. This is not so bad for your eyeball, which has a curved screen, but for things like cameras and movie projectors we would like to image to a flat screen. When a flat screen is used and the curvature of the field aberration is present, the image will be focused well near the center, but become progressively out of focus as you move to the edge of the screen (i.e. the flat screen is further from the curved image surface as you move from the center). The final aberration term is referred to as distortion. This aberration occurs when the magnification of a lens depends on the distance from the center of the screen. If magnification decreases as the distance from the center increases, then barrel distortion is observed. When magnification increases with distance, pincushion distortion is observed (see Fig. 9.17). All lenses will exhibit some combination of the aberrations listed above (i.e. chromatic aberration plus the five second-order aberration terms). In addition to the five named monochromatic aberrations, there are many other higher-order aberrations that also have to be considered. Aberrations can be corrected to a high degree with multiple-element systems (designed using ray-tracing techniques) composed of lenses and irises to eliminate off-axis light. For example, a camera lens with a focal length of 50 mm, one of the simplest lenses in photography, is typically composed of about six individual elements. However, optical systems never completely eliminate all aberration, so designing a system always involves some degree of compromise in choosing which aberrations to minimize and which ones you can live with.
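The idea behind ray tracing, and the origin of spherical aberration, can be seen in a very small example. The sketch below traces rays exactly (using Snell's law without the small-angle approximation) through a hypothetical plano-convex lens whose flat face is turned toward incoming collimated light, so that all of the bending happens at the curved exit surface. The radius, index, ray heights, and function name are assumptions chosen for illustration; this orientation is used only because it is simple to trace.

```python
# Sketch: exact ray trace of collimated light through a plano-convex lens.
import math

def focus_position(h, R=1.0, n=1.5):
    """Axial crossing point (measured from the curved vertex) of a ray at height h.

    The ray enters the flat face at normal incidence, travels parallel to the
    axis inside the glass, and refracts at the spherical exit surface of radius R.
    """
    if h <= 0.0:
        raise ValueError("h must be positive")
    alpha = math.asin(h / R)            # angle between surface normal and axis
    sin_beta = n * math.sin(alpha)      # exact Snell's law, glass into air
    if sin_beta >= 1.0:
        raise ValueError("ray is totally internally reflected")
    beta = math.asin(sin_beta)
    z_surface = -(R - math.sqrt(R * R - h * h))   # sag of the curved surface
    return z_surface + h / math.tan(beta - alpha)

paraxial_focus = 1.0 / (1.5 - 1.0)      # R / (n - 1) for this assumed lens
for h in (0.01, 0.25, 0.50):
    print("h =", h, " crosses axis at z =", focus_position(h),
          " (paraxial:", paraxial_focus, ")")
```

Rays near the axis cross very close to the paraxial focus, while rays at larger heights cross progressively closer to the lens, which is the behavior described above for a converging lens.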
Exercises
Exercises for 9.2 The Eikonal Equation

P9.1 (a) Suppose that a region of air above the desert on a hot day has an index of refraction that varies with height $y$ according to $n(y) = n_0\sqrt{1 + y^2/h^2}$. Show that $R(x, y) = n_0 x \pm n_0 y^2/2h$ is a solution of the eikonal equation (9.9).
(b) Give an expression for $\hat{\mathbf{s}}$ as a function of $y$.
(c) Compute $\hat{\mathbf{s}}$ for $y = h$, $y = h/2$, and $y = h/4$. Represent these vectors graphically and place them sequentially point-to-tail to depict how the light bends as it travels.

P9.2 Prove that under the approximation of very short wavelength, the Poynting vector is directed along $\nabla R(\mathbf{r})$ or $\hat{\mathbf{s}}$. First work through the partial solution provided below (write your work down), then finish as directed.
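Solution: (partial) First, from Faraday's law (1.44) we have
$$\mathbf{B}(\mathbf{r}, t) = -\frac{i}{\omega}\nabla\times\left[\mathbf{E}_0(\mathbf{r})\, e^{i(k_{\rm vac}R(\mathbf{r}) - \omega t)}\right]$$
Applying the identity $\nabla\times(\psi\mathbf{a}) = (\nabla\psi)\times\mathbf{a} + \psi\nabla\times\mathbf{a}$ to this equation, we obtain:
$$\begin{aligned} \mathbf{B}(\mathbf{r}, t) &= -\frac{i}{\omega}\, e^{i(k_{\rm vac}R(\mathbf{r}) - \omega t)}\left[\nabla\times\mathbf{E}_0(\mathbf{r})\right] - \frac{i}{\omega}\, i k_{\rm vac}\, e^{i(k_{\rm vac}R(\mathbf{r}) - \omega t)}\left[\nabla R(\mathbf{r})\times\mathbf{E}_0(\mathbf{r})\right] \\ &= -\frac{i\lambda_{\rm vac}}{2\pi c}\, e^{i[k_{\rm vac}R(\mathbf{r}) - \omega t]}\left[\nabla\times\mathbf{E}_0(\mathbf{r})\right] + \frac{1}{c}\, e^{i[k_{\rm vac}R(\mathbf{r}) - \omega t]}\left[\nabla R(\mathbf{r})\times\mathbf{E}_0(\mathbf{r})\right] \end{aligned}$$
The first term vanishes in the limit of very short wavelength, and we have:
$$\mathbf{B}(\mathbf{r}, t) \rightarrow \frac{1}{c}\left[\nabla R(\mathbf{r})\right]\times\mathbf{E}_0(\mathbf{r})\, e^{i[k_{\rm vac}R(\mathbf{r}) - \omega t]}.$$
Next, from Gauss's law (1.42) and the constitutive relation (2.16) we have
$$\nabla\cdot\left\{\left[1 + \chi(\mathbf{r})\right]\mathbf{E}_0(\mathbf{r})\, e^{i(k_{\rm vac}R(\mathbf{r}) - \omega t)}\right\} = 0$$
Applying the identity $\nabla\cdot(\psi\mathbf{a}) = \mathbf{a}\cdot\nabla\psi + \psi\nabla\cdot\mathbf{a}$ to this expression yields:
$$e^{i(k_{\rm vac}R(\mathbf{r}) - \omega t)}\,\nabla\cdot\left\{\left[1 + \chi(\mathbf{r})\right]\mathbf{E}_0(\mathbf{r})\right\} + i k_{\rm vac}\, e^{i(k_{\rm vac}R(\mathbf{r}) - \omega t)}\left[1 + \chi(\mathbf{r})\right]\left[\nabla R(\mathbf{r})\cdot\mathbf{E}_0(\mathbf{r})\right] = 0$$
Canceling the common exponential term, using $k_{\rm vac} = 2\pi/\lambda_{\rm vac}$, and a little algebra then gives
$$-\frac{i\lambda_{\rm vac}}{2\pi}\,\frac{\nabla\cdot\left\{\left[1 + \chi(\mathbf{r})\right]\mathbf{E}_0(\mathbf{r})\right\}}{1 + \chi(\mathbf{r})} + \nabla R(\mathbf{r})\cdot\mathbf{E}_0(\mathbf{r}) = 0 \tag{9.74}$$
In the limit of very short wavelength, this becomes
$$\nabla R(\mathbf{r})\cdot\mathbf{E}_0(\mathbf{r}) \rightarrow 0 \tag{9.75}$$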
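Finally, compute the time average of the Poynting vector
$$\begin{aligned} \langle\mathbf{S}\rangle_t &= \left\langle\frac{1}{\mu_0}\,\mathrm{Re}\{\mathbf{E}(\mathbf{r}, t)\}\times\mathrm{Re}\{\mathbf{B}(\mathbf{r}, t)\}\right\rangle_t \\ &= \frac{1}{4\mu_0}\left\langle\left[\mathbf{E}(\mathbf{r}, t) + \mathbf{E}^*(\mathbf{r}, t)\right]\times\left[\mathbf{B}(\mathbf{r}, t) + \mathbf{B}^*(\mathbf{r}, t)\right]\right\rangle_t \end{aligned}$$
You will need to employ expressions (9.74) and (9.75), as well as the BAC-CAB rule (see P 0.4).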
Exercises for 9.3 Fermat's Principle

P9.3 Use Fermat's Principle to derive the law of reflection (3.6) for a reflective surface. HINT: Do not consider light that goes directly from A to B; require a single bounce.
Figure 9.18

P9.4 Show that Fermat's Principle fails to give the correct path for an extraordinary ray entering a uniaxial crystal whose optic axis is perpendicular to the surface. HINT: With the index given by (5.41), show that Fermat's principle leads to an answer that neither agrees with the direction of the k-vector (5.46) nor with the direction of the Poynting vector (5.55).
Exercises for 9.5 Reflection and Refraction at Curved Surfaces

P9.5 Derive the ABCD matrix that takes a ray on a round trip through a simple laser cavity consisting of a flat mirror and a concave mirror of radius $R$ separated by a distance $L$. HINT: Start at the flat mirror. Use the matrix in (9.29) to travel a distance $L$. Use the matrix in (9.39) to represent reflection from the curved mirror. Then use the matrix in (9.29) to return to the flat mirror. The matrix for reflection from the flat mirror is the identity matrix (i.e. $R_{\rm flat} \to \infty$).

P9.6 Derive the ABCD matrix for a thick lens made of material $n_2$ surrounded by a liquid of index $n_1$. Let the lens have curvatures $R_1$ and $R_2$ and thickness $d$.

Answer:
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 + \frac{d}{R_1}\left(\frac{n_1}{n_2} - 1\right) & d\,\frac{n_1}{n_2} \\ \left(\frac{n_2}{n_1} - 1\right)\left(\frac{1}{R_2} - \frac{1}{R_1}\right) + \frac{d}{R_1 R_2}\left(2 - \frac{n_1}{n_2} - \frac{n_2}{n_1}\right) & 1 + \frac{d}{R_2}\left(1 - \frac{n_1}{n_2}\right) \end{bmatrix}$$
Exercises for 9.6 Image Formation by Mirrors and Lenses

P9.7 (a) Show that the ABCD matrix for a thick lens (see P 9.6) reduces to that of a thin lens (9.55) when the thickness goes to zero. Take the index outside of the lens to be $n_1 = 1$.
(b) Find the ABCD matrix for a thick window (thickness $d$). Take the index outside of the window to be $n_1 = 1$. HINT: A window is a thick lens with infinite radii of curvature.

P9.8 An object is placed in front of a concave mirror. Find the location of the image $d_i$ and magnification $M$ when $d_o = R$, $d_o = R/2$, $d_o = R/4$, and $d_o = -R/2$ (virtual object). Make a diagram for each situation, depicting rays traveling from a single off-axis point on the object to a corresponding point on the image. You may want to emphasize especially the ray that initially travels parallel to the axis and the ray that initially travels in a direction intersecting the axis at the focal point $R/2$.

P9.9 An object is placed in front of a concave mirror. Find the location of the image $d_i$ and magnification $M$ when $d_o = 2f$, $d_o = f$, $d_o = f/2$, and $d_o = -f$ (virtual object). Make a diagram for each situation, depicting rays traveling from a single off-axis point on the object to a corresponding point on the image. You may want to emphasize especially the ray that initially travels parallel to the axis and the ray that initially travels in a direction intersecting the axis at the focal point $R/2$.
Exercises for 9.7 Image Formation by Complex Optical Systems P9.10 A complicated lens element is represented by an ABCD matrix. An object placed a distance d 1 before the unknown element causes an image to appear a distance d 2 after the unknown element.
Figure 9.19
Suppose that when $d_1 = \ell$, we find that $d_2 = 2\ell$. Also, suppose that when $d_1 = 2\ell$, we find that $d_2 = 3\ell/2$ with magnification $-1/2$. What is the ABCD matrix for the unknown element? HINT: Use the conditions for an image (9.59) and (9.60). If the index of refraction is the same before and after, then (9.66) applies. HINT: First find linear expressions for $A$, $B$, and $C$ in terms of $D$. Then put the results into (9.66).

P9.11 (a) Consider a lens with thickness $d = 5$ cm, $R_1 = 5$ cm, $R_2 = 10$ cm, $n = 1.5$. Compute the ABCD matrix of the lens. HINT: See P 9.6.
(b) Where are the principal planes located and what is the effective focal length $f_{\rm eff}$ for this system?
Figure 9.20 L9.12 Deduce the positions of the principal planes and the effective focal length of a compound lens system. Reference the positions of the principal planes to the outside ends of the metal hardware that encloses the lens assembly.
Figure 9.21

HINT: Obtain three sets of distances to the object and image planes and place the data into (9.59) to create three distinct equations for the unknowns A, B, C, and D. Find A, B, and C in terms of D and place the results into (9.66) to obtain the values for A, B, C, and D. The effective focal length and principal planes can then be found through (9.62)–(9.64).

P9.13 Use a computer program to calculate the ABCD matrix for the compound system shown in Fig. 9.22, known as the Tessar lens. The
details of this lens are as follows (all distances are in the same units, and only the magnitudes of the curvatures are given; you decide the sign): Convex-convex lens 1 (thickness 0.357, $R_1 = 1.628$, $R_2 = 27.57$, $n = 1.6116$) is separated by 0.189 from concave-concave lens 2 (thickness 0.081, $R_1 = 3.457$, $R_2 = 1.582$, $n = 1.6053$), which is separated by 0.325 from plano-concave lens 3 (thickness 0.217, $R_1 = \infty$, $R_2 = 1.920$, $n = 1.5123$), which is directly followed by convex-convex lens 4 (thickness 0.396, $R_1 = 1.920$, $R_2 = 2.400$, $n = 1.6116$).
Figure 9.22
HINT: You can reduce the number of matrices you need to multiply by using the thick lens matrix.
Exercises for 9.8 Stability of Laser Cavities

P9.14 (a) Show that the cavity depicted in Fig. 9.11c is stable if
$$0 < \left(1 - \frac{L}{R_1}\right)\left(1 - \frac{L}{R_2}\right) < 1$$
(b) The two concave mirrors have radii $R_1 = 60$ cm and $R_2 = 100$ cm. Over what range of mirror separation $L$ is it possible to form a stable laser cavity? HINT: There are two different stable ranges with an unstable range between them.

P9.15 Find the stable ranges for $L_1 = L_2 = L$ for the laser cavity depicted in Fig. 9.11d with focal length $f = 50$ cm.

L9.16 Experimentally determine the stability range of a HeNe laser with adjustable end mirrors. Check that this agrees reasonably well with theory. Can you think of reasons for any discrepancy?
Figure 9.23
Chapter 10
Diffraction
10.1 Huygens' Principle
Christian Huygens developed a wave description for light in the 1600s. However, his ideas were largely overlooked at the time because of Sir Isaac Newton's rejection of the wave description in favor of his corpuscular theory. It was more than a century later that Thomas Young performed his famous two-slit experiment, conclusively demonstrating the wave nature of light. Even then, Young's conclusions were not accepted for many years, a notable exception being a young Frenchman, Augustin Fresnel. The two formed a close friendship through correspondence, and it was Fresnel who followed up on Young's conclusions and dedicated his life to a study of light. Fresnel's skill as a mathematician allowed him to transform physical intuition into powerful and concise ideas. Perhaps Fresnel's greatest accomplishment was the adaptation of Huygens' principle into a mathematical formula. Ironically, it was Newton's calculus that made this possible and it settled the debate between the wave and corpuscular theories. Huygens' principle asserts that a wave front can be thought of as many wavelets, which propagate and interfere to form new wave fronts. Diffraction is then understood as the spilling of wavelets around corners. Let us examine the calculus that Fresnel applied to the problem of summing up the contributions from the many wavelets originating in an aperture illuminated by a light field. Each point in the aperture is thought of as a source of a spherical wave. In our modern notation, such a spherical wave can be written as proportional to $e^{ikR}/R$, where $R$ is the distance from the source. As a spherical wave propagates, its strength falls off in proportion to the distance traveled and the phase is related to the distance propagated, similar to the phase of a plane wave. Students should be aware that a spherical wave of the form $e^{ikR}/R$ is not a true solution to Maxwell's equations¹ (see P 10.2). Near $R = 0$, this type of wave
¹ For simplicity, we use the term spherical wave in this book to refer to waves of the type imagined by Huygens (i.e. of the form $e^{ikR}/R$). There is a different family of waves based on spherical harmonics that are also sometimes referred to as spherical waves. These waves have angular as well as radial dependence, and they are solutions to Maxwell's equations. For details see
Figure 10.1 Wave fronts depicted as a series of Huygens wavelets.
Figure 10.2 A wave propagating through an aperture, giving rise to the field at a point downstream.
is in fact a very poor solution to Maxwell's equations. However, if $R$ is much larger than a wavelength, this spherical wave satisfies Maxwell's equations to a good approximation. In fact, under this approximation a spherical wave can actually be written as a superposition of many plane waves. This is the regime in which Fresnel's diffraction formula (derived in this section and the next) is very successful.

The idea is straightforward. Consider an aperture at $z = 0$ illuminated with a light field distribution $E(x', y', z = 0)$ within the aperture. Then for a point lying somewhere after the aperture, say at $(x, y, z = d)$, the net field is given by adding together spherical waves emitted from each point in the aperture. Each spherical wavelet takes on the strength and phase of the field at the point where it originates. Mathematically, this summation takes the form

$$E(x, y, z = d) = -\frac{i}{\lambda} \iint_{\text{aperture}} E(x', y', z = 0)\, \frac{e^{ikR}}{R}\, dx'\, dy' \tag{10.1}$$

where

$$R = \sqrt{(x - x')^2 + (y - y')^2 + d^2} \tag{10.2}$$

is the radius of each wavelet as it individually intersects the point $(x, y, z = d)$. The constant $-i/\lambda$ in front of the integral in (10.1) ensures the right phase and field strength. We will see how these factors arise in section 10.2. It should be noted that (10.1) considers only a single wavelength of light (i.e. one frequency).

The Fresnel diffraction formula, (10.1), is extremely successful. It was developed a half century before Maxwell assembled his equations. In 1887, Gustav Kirchhoff justified Fresnel's diffraction formula in the context of Maxwell's equations. In doing this he clearly showed the approximations implicit in the theory, and showed that the formula needs to be slightly modified to

$$E(x, y, z = d) = -\frac{i}{\lambda} \iint_{\text{aperture}} E(x', y', z = 0)\, \frac{e^{ikR}}{R} \left[ \frac{1 + \cos(\mathbf{r}, \hat{\mathbf{z}})}{2} \right] dx'\, dy' \tag{10.3}$$
pp. 429-432 of Jackson's Classical Electrodynamics, 3rd Ed. (Ref. [2]).

The additional factor in square brackets is known as the obliquity factor ($\cos(\mathbf{r}, \hat{\mathbf{z}})$ indicates the cosine of the angle between $\mathbf{r}$ and $\hat{\mathbf{z}}$). Notice that this factor is approximately equal to one when the point $(x, y, z = d)$ is chosen to be in the far-forward direction, and we usually study fields where this approximation holds. The obliquity factor is equal to zero in the case that the field travels in the backwards direction (i.e. in the $-\hat{\mathbf{z}}$ direction). This fixes a problem with Fresnel's earlier version (10.1) based on Huygens' wavelets, which suggests that light can diffract backwards as easily as forwards. In honor of Kirchhoff's work, the formula is now often called the Fresnel-Kirchhoff diffraction formula.

The details of Kirchhoff's derivation are given in Appendix 10.B. Section 10.2 gives a less rigorous derivation, which resorts to the paraxial approximation of the wave equation. In section 10.3, we discuss Babinet's principle, which is a superposition principle for masks and apertures that create diffraction. In section 10.4, we examine Fresnel's approximation made to his own formula (10.1) and find that it is analogous to the paraxial approximation. In section 10.5, we examine the Fraunhofer approximation, a more extreme approximation that only applies to the field at a very large distance after the aperture. We further examine the diffraction integral (in either the Fresnel or the Fraunhofer approximation) in the case of cylindrical symmetry in section 10.6.
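As a numerical illustration of the wavelet summation in (10.1), the following sketch adds up Huygens wavelets for a point on the axis behind a uniformly illuminated circular aperture and compares the sum with the closed-form result quoted in P 10.6. The function names, the unit field amplitude, and the sample parameters are assumptions made for this illustration, and the obliquity factor of (10.3) is ignored.

```python
import cmath
import math


def on_axis_field(wavelength, radius, d, n_steps=20000):
    """Midpoint-rule sum of Huygens wavelets, Eq. (10.1), for a point on the axis.

    Assumes a unit-amplitude field across a circular aperture, so that
    dx' dy' can be replaced by 2*pi*rho' d(rho').
    """
    k = 2 * math.pi / wavelength
    d_rho = radius / n_steps
    total = 0j
    for j in range(n_steps):
        rho = (j + 0.5) * d_rho          # midpoint of the j-th ring
        R = math.hypot(rho, d)           # wavelet radius, Eq. (10.2) with x = y = 0
        total += cmath.exp(1j * k * R) / R * rho * d_rho
    return -1j / wavelength * 2 * math.pi * total


def on_axis_field_closed_form(wavelength, radius, d):
    """Analytic value of the same integral (unit incident amplitude)."""
    k = 2 * math.pi / wavelength
    return cmath.exp(1j * k * d) - cmath.exp(1j * k * math.hypot(radius, d))


# Hypothetical sample values: 500 nm light, 5 micron aperture radius, 20 micron distance.
numeric = on_axis_field(500e-9, 5e-6, 20e-6)
analytic = on_axis_field_closed_form(500e-9, 5e-6, 20e-6)
print(abs(numeric - analytic))
```

The squared magnitude of either result gives the on-axis intensity relative to the incident intensity, which swings between 0 and 4 as d changes.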
10.2 Scalar Diffraction

Consider a light field with a single frequency $\omega$. The light field can be represented by $\mathbf{E}(\mathbf{r})\, e^{-i\omega t}$, which must obey the wave equation

$$\nabla^2 \mathbf{E}(\mathbf{r})\, e^{-i\omega t} - \frac{n^2}{c^2} \frac{\partial^2}{\partial t^2} \left[ \mathbf{E}(\mathbf{r})\, e^{-i\omega t} \right] = 0 \tag{10.4}$$

Since the temporal part of the field is written explicitly, the time derivative in (10.4) can be performed easily, and the equation reduces to

$$\nabla^2 \mathbf{E}(\mathbf{r}) + k^2 \mathbf{E}(\mathbf{r}) = 0 \tag{10.5}$$

where $k \equiv n\omega/c$ is the magnitude of the usual wave vector. Equation (10.5) is called the Helmholtz equation. It is the wave equation written for the case of a single frequency, where the trivial time dependence has been removed from the equation. To obtain the full wave solution, the factor $e^{-i\omega t}$ is simply appended to the solution of the Helmholtz equation $\mathbf{E}(\mathbf{r})$.

At this point it is convenient to make a significant approximation. We ignore the vectorial nature of (10.5) and consider only the magnitude of $\mathbf{E}(\mathbf{r})$. This is serious! When we use the Fresnel-Kirchhoff diffraction formula we must keep in mind that we have taken this unjustified procedure. The significance of this approximation is discussed in Appendix 10.A. Under the scalar approximation, (10.5) becomes the scalar Helmholtz equation:

$$\nabla^2 E(\mathbf{r}) + k^2 E(\mathbf{r}) = 0 \tag{10.6}$$
This equation of course is consistent with (10.5) in the case of a plane wave. However, we are interested in so-called spherical waves, which satisfy the vector Helmholtz equation (10.5) only approximately. We can get away with this approximation in the case of a spherical wave only when the radius $r$ is large compared to a wavelength (i.e., $kr \gg 1$) and when the angle is restricted to a narrow angle perpendicular to the polarization. This highlights an important limitation of the Fresnel-Kirchhoff diffraction formula (10.1), which is a solution to the scalar Helmholtz equation (10.6), but not to the vector Helmholtz equation (10.5).

As mentioned in Section 10.1, the Fresnel-Kirchhoff diffraction formula (10.1) can be viewed as a superposition of spherical waves. It turns out that spherical waves of the form $E(r) = E_0 r_0 e^{ikr}/r$ are exact solutions to the scalar Helmholtz equation, (10.6), the proof of which is left as an exercise (see P 10.3). It is therefore not surprising that the Fresnel-Kirchhoff formula satisfies the scalar Helmholtz equation (10.6).

The full derivation of the Fresnel-Kirchhoff formula is deferred to Appendix 10.B. In this section, we will justify the diffraction formula within a simplified context. We will assume that the field that propagates through the aperture is highly directional, such that it propagates mainly in the $z$-direction. This motivates us to write the field as $E(x, y, z) = \tilde{E}(x, y, z)\, e^{ikz}$. Upon substitution of this into the scalar Helmholtz equation (10.6), we arrive at

$$\left[ \frac{\partial^2 \tilde{E}}{\partial x^2} + \frac{\partial^2 \tilde{E}}{\partial y^2} + 2ik \frac{\partial \tilde{E}}{\partial z} + \frac{\partial^2 \tilde{E}}{\partial z^2} \right] e^{ikz} = 0 \tag{10.7}$$

At this point we make the paraxial wave approximation, which is $\left| 2k\, \partial \tilde{E}/\partial z \right| \gg \left| \partial^2 \tilde{E}/\partial z^2 \right|$. That is, we assume that the amplitude of the field varies slowly in the $z$-direction such that the wave looks much like a plane wave. We permit the amplitude to change as the wave propagates in the $z$-direction as long as it does so on a scale much longer than a wavelength. This leads to the paraxial wave equation

$$\left( \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + 2ik \frac{\partial}{\partial z} \right) \tilde{E} = 0 \tag{10.8}$$

the solution to which is (see P10.5)

$$\tilde{E}(x, y, z) = -\frac{i}{\lambda z} \iint \tilde{E}(x', y', 0)\, e^{i \frac{k}{2z} \left[ (x - x')^2 + (y - y')^2 \right]}\, dx'\, dy' \tag{10.9}$$

The field is then given by

$$E(x, y, z) = \tilde{E}(x, y, z)\, e^{ikz} = -\frac{i}{\lambda z} \iint \tilde{E}(x', y', 0)\, e^{ik \left[ z + \frac{(x - x')^2 + (y - y')^2}{2z} \right]}\, dx'\, dy' \tag{10.10}$$

This equation agrees with the Fresnel-Kirchhoff formula (10.1) to the extent that $R \cong z$ in the denominator of the integral and

$$R \cong z + \frac{(x - x')^2 + (y - y')^2}{2z}$$

in the exponent. As we shall see in Section 10.4, this is a good approximation.
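The claim that (10.9) solves the paraxial wave equation (10.8) can be spot-checked numerically. The sketch below evaluates the integrand of (10.9) for a single source point at $x' = y' = 0$, with the constant prefactor dropped, and estimates the left-hand side of (10.8) with central differences. The function names, the dimensionless sample point, and the step size are assumptions chosen for this illustration; it is a sanity check rather than a proof.

```python
import cmath


def fresnel_kernel(x, y, z, k):
    """Integrand of Eq. (10.9) for a source at x' = y' = 0 (constant prefactor dropped)."""
    return cmath.exp(1j * k * (x * x + y * y) / (2 * z)) / z


def paraxial_residual(x, y, z, k, h=1e-4):
    """Central-difference estimate of the left side of Eq. (10.8) applied to the kernel."""
    f = fresnel_kernel
    center = f(x, y, z, k)
    d2x = (f(x + h, y, z, k) - 2 * center + f(x - h, y, z, k)) / h**2
    d2y = (f(x, y + h, z, k) - 2 * center + f(x, y - h, z, k)) / h**2
    dz = (f(x, y, z + h, k) - f(x, y, z - h, k)) / (2 * h)
    return d2x + d2y + 2j * k * dz


# Dimensionless sample point (k = 1); the residual should be near zero,
# limited only by the finite-difference step.
print(abs(paraxial_residual(0.3, -0.2, 1.5, 1.0)))
```

Because (10.8) is linear, a superposition of such kernels weighted by the aperture field also satisfies it, which is the content of (10.9).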
10.3 Babinet's Principle

Babinet's principle amounts to a recognition of the linear properties of integration. The principle may be used when a diffraction aperture has a complicated shape so that it is more convenient to break up the diffraction integral (10.3) into several pieces. Students are already used to doing this sort of piecewise approach to integration in other settings. In fact, it is hardly worth giving a name to this approach; perhaps in Babinet's day people were not as comfortable with calculus.

As an example of how to use Babinet's principle, suppose that we have an aperture that consists of a circular obstruction within a square opening as depicted in Fig. 10.3. Thus, the light transmits through the region between the circle and the square. One can evaluate the overall diffraction pattern by first evaluating the diffraction integral for the entire square (ignoring the circular block) and then subtracting the diffraction integral for a circular opening having the shape of the block. This removes the unwanted part of the previous integration and yields the overall result. It is important to add and subtract the integrals (i.e. fields), not their squares (i.e. intensity). Remember that it is the electric fields that obey the primary superposition principle.

As trivial as Babinet's principle may seem to the modern student, the principle can also be used to determine diffraction in the shadows behind small obstructions in a wide stream of light. Keep in mind that the diffraction formula (10.3) was derived for finite apertures or openings in an infinite opaque mask. It therefore may not be obvious that Babinet's principle also applies to an infinitely wide plane
Figure 10.3 Aperture comprised of the region between a circle and a square.
Figure 10.4 A block in a plane wave giving rise to diffraction in the geometric shadow.
wave that is interrupted by finite obstructions. In this case, one simply computes the diffraction of the blocked portions of the field as though these portions were openings in a mask. This result is then subtracted from the uninterrupted field, as depicted in Fig. 10.4.

When Fresnel first presented his diffraction formula to the French Academy of Sciences, a certain judge of scientific papers named Siméon Poisson noticed that the formula predicted that there should be light in the center of the geometric shadow behind a circular obstruction. This seemed so absurd that Fresnel's work was initially disbelieved until the spot was shortly thereafter experimentally confirmed. Needless to say, Fresnel's paper was then awarded first prize, and this spot appearing behind circular blocks has since been known as Poisson's spot.
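The subtraction of fields described above can be illustrated with Poisson's spot. The sketch below takes the closed-form on-axis field behind a circular opening (the result quoted in P 10.6, obtained from (10.1) without the obliquity factor, for a unit-amplitude plane wave) and subtracts it from the uninterrupted plane wave to get the on-axis field behind a circular block of the same size. The function names and sample numbers are assumptions for this illustration only.

```python
import cmath
import math


def opening_field_on_axis(k, radius, d):
    """On-axis field a distance d behind a circular opening (unit incident amplitude)."""
    return cmath.exp(1j * k * d) - cmath.exp(1j * k * math.hypot(radius, d))


def block_field_on_axis(k, radius, d):
    """Babinet's principle: uninterrupted plane wave minus the field of the 'missing' opening."""
    plane_wave = cmath.exp(1j * k * d)
    return plane_wave - opening_field_on_axis(k, radius, d)


k = 2 * math.pi / 500e-9      # hypothetical 500 nm light
radius = 1e-3                 # hypothetical 1 mm block or opening
for d in (0.05, 0.1, 0.5):
    i_open = abs(opening_field_on_axis(k, radius, d)) ** 2
    i_block = abs(block_field_on_axis(k, radius, d)) ** 2
    print(d, i_open, i_block)
```

Note that the fields are subtracted before squaring. Subtracting the intensities instead would not reproduce the bright spot in the center of the shadow.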
10.4 Fresnel Approximation

The Fresnel-Kirchhoff diffraction formula (10.3) is valid as long as $R$ and the size of the aperture are both significantly larger than a wavelength. The formula becomes much simpler if we restrict its use to the far-forward direction so that the obliquity factor $[1 + \cos(\mathbf{r}, \hat{\mathbf{z}})]/2$ is approximately equal to one. Even though the Fresnel-Kirchhoff integral looks simple (i.e. a clear implementation of Huygens' superposition of spherical wavelets $e^{ikR}/R$), it is difficult to evaluate analytically. The integral can be difficult even if the field $E(x', y', z = 0)$ is constant across the aperture.

Fresnel introduced an approximation to his diffraction formula that makes the integration much easier to perform. The approximation is analogous to the paraxial approximation made for rays in chapter 9. Similarly, the Fresnel approximation requires the avoidance of large angles with respect to the $z$-axis. Besides setting the obliquity factor equal to one, Fresnel made the following simplification to the distance $R$ given in (10.2). In the denominator of (10.3) he approximated $R$ by the distance $d$. He thereby removed the dependence on $x'$ and $y'$ so that it can be brought out in front of the integral. This is valid to the
extent that we restrict ourselves to small angles:

$$R \cong d \quad \text{(denominator only; paraxial approximation)} \tag{10.11}$$

This approximation is wholly inappropriate in the exponent of (10.3) since small changes in $R$ can result in dramatic variations in $e^{ikR}$. To approximate $R$ in the exponent, we must proceed with caution. To this end we expand (10.2) under the assumption $d^2 \gg (x - x')^2 + (y - y')^2$. Again, this is consistent with the idea of restricting ourselves to relatively small angles. The expansion of (10.2) is written as

$$R = \sqrt{(x - x')^2 + (y - y')^2 + d^2} = d \sqrt{1 + \frac{(x - x')^2 + (y - y')^2}{d^2}} \cong d \left[ 1 + \frac{(x - x')^2 + (y - y')^2}{2d^2} \right] \quad \text{(paraxial approximation)} \tag{10.12}$$

Substitution of (10.11) and (10.12) into the Fresnel-Kirchhoff diffraction formula (10.3) yields

$$E(x, y, d) \cong -\frac{i\, e^{ikd}\, e^{i \frac{k}{2d} (x^2 + y^2)}}{\lambda d} \iint_{\text{aperture}} E(x', y', 0)\, e^{i \frac{k}{2d} (x'^2 + y'^2)}\, e^{-i \frac{k}{d} (x x' + y y')}\, dx'\, dy' \tag{10.13}$$

This formula is called the Fresnel approximation. It may seem rather complicated, but in terms of being able to perform the integration we are far better off than previously. Notice that the integral can be interpreted as a two-dimensional Fourier transform on $E(x', y', 0)\, e^{i \frac{k}{2d} (x'^2 + y'^2)}$. The Fresnel approximation to the Fresnel-Kirchhoff formula (10.1) renders an expression which is identical to the exact solution of the paraxial wave equation.
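To get a feel for when the expansion (10.12) is adequate in the exponent, one can compute the phase error $k(R_{\text{exact}} - R_{\text{Fresnel}})$ directly. The sketch below is only an illustration under assumed names and sample values. It uses the algebraically equivalent form $-k r_\perp^4 / [2d(R + d)^2]$, with $r_\perp^2 = (x - x')^2 + (y - y')^2$, to avoid subtracting nearly equal floating-point numbers.

```python
import math


def fresnel_phase_error(wavelength, d, r_perp):
    """Phase error k*(R - d - r_perp**2/(2*d)) in radians.

    R is the exact distance of Eq. (10.2) and r_perp**2 = (x-x')**2 + (y-y')**2.
    The returned expression is algebraically identical to the difference above
    but avoids catastrophic cancellation.
    """
    k = 2 * math.pi / wavelength
    R = math.hypot(r_perp, d)
    return -k * r_perp**4 / (2 * d * (R + d) ** 2)


# Hypothetical examples with 500 nm light:
print(fresnel_phase_error(500e-9, 1.0, 1e-3))    # small angle, 1 mm offset at 1 m
print(fresnel_phase_error(500e-9, 1e-5, 1e-5))   # 45 degree angle, 10 micron scale
```

When the result is small compared to one radian, replacing $R$ by (10.12) in the exponent is harmless; at large angles the error reaches many radians and the Fresnel approximation fails.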
10.5 Fraunhofer Approximation

An additional approximation to the diffraction integral was made famous by Joseph von Fraunhofer. The Fraunhofer approximation agrees with the Fresnel approximation in the limiting case when the field is observed at a distance far after the aperture (called the far field). The Fraunhofer approximation also requires small angles (i.e. the paraxial approximation). As the diffraction pattern continuously evolves along the $z$-direction it is described everywhere by the Fresnel approximation. However, it eventually evolves into a final diffraction pattern that maintains itself as it continues to propagate (although it increases its size in proportion to distance). It is this far-away diffraction pattern that is obtained from the Fraunhofer approximation. In many textbooks, the Fraunhofer approximation is presented first because the formula is easier to use. However, since it is a special case of the Fresnel
Figure 10.5 The Fraunhofer Approximation by Sterling Cornaby

approximation, it logically should be discussed afterwards as we are doing here. To obtain the diffraction pattern very far after the aperture, we make the following assumption:

$$e^{i \frac{k}{2d} (x'^2 + y'^2)} \cong 1 \quad \text{(far field)} \tag{10.14}$$

This approximation depends on a comparison of the size of the aperture to the distance $d$ where the diffraction pattern is observed. Thus, we need

$$d \gg \frac{k}{2} \left( \text{aperture radius} \right)^2 \quad \text{(condition for far field)} \tag{10.15}$$

By substituting (10.14) into (10.13), the Fraunhofer approximation yields

$$E(x, y, d) \cong -\frac{i\, e^{ikd}\, e^{i \frac{k}{2d} (x^2 + y^2)}}{\lambda d} \iint_{\text{aperture}} E(x', y', 0)\, e^{-i \frac{k}{d} (x x' + y y')}\, dx'\, dy' \tag{10.16}$$
Joseph von Fraunhofer (1787-1826, German) Fraunhofer was orphaned at a young age and was apprenticed to a glass maker. He was treated harshly, but through the help of the Prince of Bavaria he eventually received a good education. He became expert at making optical devices, and invented the diffraction grating. He was the first to observe absorption lines in the sun's spectrum. Fraunhofer passed away at a young age. This was not uncommon for glass makers of his time because of the heavy metal vapors associated with their trade.
As students will no doubt appreciate, the removal of $e^{i \frac{k}{2d} (x'^2 + y'^2)}$ from the integrand improves our ability to perform the integration. Notice that the integral can now be interpreted as a two-dimensional Fourier transform on the aperture field $E(x', y', 0)$.

Once we are in the Fraunhofer regime, a change in $d$ is not very interesting since it appears in the combination $x/d$ or $y/d$ inside the integral, which in the paraxial approximation indicates a small angle from the axis. At a larger distance $d$, the same angle is achieved with a proportionately larger value of $x$ or $y$. The Fraunhofer diffraction pattern thus preserves itself forever as the field propagates. It grows in size as the distance $d$ increases, but the angular size defined by $x/d$ or $y/d$ remains the same.
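A short numerical sketch may help make the Fraunhofer integral concrete. It evaluates the right-hand side of the far-field condition (10.15) and carries out the one-dimensional version of the integral in (10.16) for a uniformly illuminated slit, which can be compared with the familiar $\sin u / u$ result. The function names, the dimensionless variable $u = kxa/(2d)$ for a slit of width $a$, and the sample numbers are assumptions introduced for this illustration.

```python
import cmath
import math


def far_field_distance(wavelength, aperture_radius):
    """Right side of Eq. (10.15); d must be much greater than this value."""
    k = 2 * math.pi / wavelength
    return k * aperture_radius**2 / 2


def slit_far_field(u, n_steps=2000):
    """Normalized 1D Fraunhofer integral for a uniform slit of width a.

    Uses s = x'/a so that exp(-i*(k/d)*x*x') = exp(-2i*u*s) with u = k*x*a/(2*d).
    """
    ds = 1.0 / n_steps
    total = 0j
    for j in range(n_steps):
        s = -0.5 + (j + 0.5) * ds        # midpoint rule across the slit
        total += cmath.exp(-2j * u * s) * ds
    return total


print(far_field_distance(500e-9, 5e-6))            # metres, for a 5 micron radius
print(slit_far_field(5.0), math.sin(5.0) / 5.0)    # numeric sum vs. sin(u)/u
```

Since the pattern depends on $x$ and $d$ only through $u$, doubling $d$ simply doubles the width of the pattern, in line with the discussion above.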
10.6 Diffraction with Cylindrical Symmetry

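Often the field transmitted by an aperture is cylindrically symmetric. In this case, the field at the aperture can be written as

$$E(x', y', z = 0) = E(\rho', z = 0) \tag{10.17}$$

where $\rho' \equiv \sqrt{x'^2 + y'^2}$. Under cylindrical symmetry, the two-dimensional integration over $x'$ and $y'$ in (10.13) or (10.16) can be reduced to a single-dimensional integral over a cylindrical coordinate $\rho'$. The Fresnel diffraction integral (10.13) in this situation is given by

$$E(\rho, z = d) = -\frac{i\, e^{ikd}\, e^{i \frac{k \rho^2}{2d}}}{\lambda d} \int_0^{2\pi} d\theta' \int_{\text{aperture}} \rho'\, d\rho'\, E(\rho', z = 0)\, e^{i \frac{k \rho'^2}{2d}}\, e^{-i \frac{k}{d} \left[ (\rho \cos\theta)(\rho' \cos\theta') + (\rho \sin\theta)(\rho' \sin\theta') \right]} \tag{10.18}$$

where

$$x \equiv \rho \cos\theta, \quad y \equiv \rho \sin\theta, \quad x' \equiv \rho' \cos\theta', \quad y' \equiv \rho' \sin\theta' \tag{10.19}$$

Notice that in the exponent of (10.18) we have

$$\cos\theta' \cos\theta + \sin\theta' \sin\theta = \cos(\theta' - \theta) \tag{10.20}$$

With this simplification, the diffraction formula (10.18) can be written as

$$E(\rho, z = d) = -\frac{i\, e^{ikd}\, e^{i \frac{k \rho^2}{2d}}}{\lambda d} \int_{\text{aperture}} \rho'\, d\rho'\, E(\rho', z = 0)\, e^{i \frac{k \rho'^2}{2d}} \int_0^{2\pi} d\theta'\, e^{-i \frac{k \rho \rho'}{d} \cos(\theta' - \theta)} \tag{10.21}$$

We are able to perform the integration over $\theta'$ with the help of the formula (0.54):

$$\int_0^{2\pi} e^{-i \frac{k \rho \rho'}{d} \cos(\theta - \theta')}\, d\theta' = 2\pi J_0\!\left( \frac{k \rho \rho'}{d} \right) \tag{10.22}$$

where $J_0$ is called the zero-order Bessel function. Equation (10.21) then reduces to

$$E(\rho, z = d) = -\frac{2\pi i\, e^{ikd}\, e^{i \frac{k \rho^2}{2d}}}{\lambda d} \int_{\text{aperture}} \rho'\, d\rho'\, E(\rho', z = 0)\, e^{i \frac{k \rho'^2}{2d}}\, J_0\!\left( \frac{k \rho \rho'}{d} \right) \tag{10.23}$$

(Fresnel approximation with cylindrical symmetry)

The integral in (10.23) is called a Hankel transform on $E(\rho', z = 0)\, e^{i \frac{k \rho'^2}{2d}}$. In the case of the Fraunhofer approximation, the diffraction integral becomes a Hankel transform on just the field $E(\rho', z = 0)$ since $\exp\!\left( i \frac{k \rho'^2}{2d} \right)$ goes to one. Under cylindrical symmetry, the Fraunhofer approximation is

$$E(\rho, z = d) \cong -\frac{2\pi i\, e^{ikd}\, e^{i \frac{k \rho^2}{2d}}}{\lambda d} \int_{\text{aperture}} \rho'\, d\rho'\, E(\rho', z = 0)\, J_0\!\left( \frac{k \rho \rho'}{d} \right) \tag{10.24}$$

(Fraunhofer approximation with cylindrical symmetry)

Just as fast Fourier transform algorithms aid in the numerical evaluation of diffraction integrals in Cartesian coordinates, fast Hankel transforms exist and can be used with cylindrically symmetric diffraction integrals.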
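As a small illustration of the Hankel-transform form (10.24), the sketch below evaluates the radial integral for a uniformly illuminated circular aperture of radius $a$ by direct summation (not a fast Hankel transform) and compares it with the standard closed form $2J_1(v)/v$, where $v = k\rho a/d$. The Bessel functions are computed from the integral representation $J_n(x) = \frac{1}{\pi}\int_0^\pi \cos(n\tau - x\sin\tau)\, d\tau$ so that only the Python standard library is needed. All names, the normalization to unity on axis, and the step counts are assumptions made for this example.

```python
import math


def bessel_j(order, x, n_steps=64):
    """J_order(x) from (1/pi) * integral_0^pi cos(order*t - x*sin(t)) dt (midpoint rule)."""
    dt = math.pi / n_steps
    total = 0.0
    for j in range(n_steps):
        t = (j + 0.5) * dt
        total += math.cos(order * t - x * math.sin(t))
    return total * dt / math.pi


def circular_aperture_far_field(v, n_steps=400):
    """Radial integral of Eq. (10.24) for a uniform field, normalized to 1 at v = 0.

    With s = rho'/a and v = k*rho*a/d, this is 2 * integral_0^1 s * J0(v*s) ds.
    """
    ds = 1.0 / n_steps
    total = 0.0
    for j in range(n_steps):
        s = (j + 0.5) * ds
        total += s * bessel_j(0, v * s) * ds
    return 2.0 * total


v = 2.0
print(circular_aperture_far_field(v), 2.0 * bessel_j(1, v) / v)
```

The squared magnitude of this quantity is the Airy pattern; its first zero falls near $v \approx 3.83$, the first zero of $J_1$.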
Appendix 10.A Significance of the Scalar Wave Approximation

As was mentioned in Sect. 10.2, the arbitrary replacement of the field vector $\mathbf{E}$ with its scalar amplitude in the Helmholtz equation (10.6) is unjustified. Nevertheless, the solution of the scalar Helmholtz equation is not completely unassociated with the solution to the vector Helmholtz equation. In fact, if $E_{\text{scalar}}(\mathbf{r})$ obeys the scalar Helmholtz equation (10.6), then

$$\mathbf{E}(\mathbf{r}) = \mathbf{r} \times \nabla E_{\text{scalar}}(\mathbf{r}) \tag{10.25}$$

obeys the vector Helmholtz equation (10.5). Consider a spherical wave, which is a solution to the scalar Helmholtz equation:

$$E_{\text{scalar}}(\mathbf{r}) = E_0 r_0\, e^{ikr}/r \tag{10.26}$$

Remarkably, when this expression is placed into (10.25) the result is zero. Although zero is in fact a solution to the vector Helmholtz equation, it is not very interesting. A more interesting solution to the scalar Helmholtz equation is

$$E_{\text{scalar}}(\mathbf{r}) = r_0 E_0 \left( 1 + \frac{i}{kr} \right) \frac{e^{ikr}}{r} \cos\theta \tag{10.27}$$

which is one of an infinite number of solutions that exist. Notice that in the limit of large $r$, this expression looks similar to (10.26), aside from the factor $\cos\theta$. The vector form of this field according to (10.25) is

$$\mathbf{E}(\mathbf{r}) = -\hat{\boldsymbol{\phi}}\, r_0 E_0 \left( 1 + \frac{i}{kr} \right) \frac{e^{ikr}}{r} \sin\theta \tag{10.28}$$

This field looks approximately like the scalar spherical wave solution (10.26) in the limit of large $r$ if the angle is chosen to lie near $\theta = \pi/2$ (spherical coordinates). Since our use of the scalar Helmholtz equation is in connection with this spherical wave under these conditions, the results are close to those obtained from the vector Helmholtz equation.
Appendix 10.B Fresnel-Kirchhoff Diffraction Formula

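To begin our derivation of the Fresnel-Kirchhoff diffraction formula, we employ Green's theorem (proven in Appendix 10.C):

$$\oint_S \left[ U \frac{\partial V}{\partial n} - V \frac{\partial U}{\partial n} \right] da = \int_V \left[ U \nabla^2 V - V \nabla^2 U \right] dv \tag{10.29}$$

The notation $\partial/\partial n$ implies a derivative in the direction normal to the surface. We choose for the functions to be used in this formula

$$V \equiv e^{ikr}/r, \qquad U \equiv E(\mathbf{r}) \tag{10.30}$$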
Figure 10.6 A two-part surface enclosing volume V .
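where $E(\mathbf{r})$ is assumed to satisfy the scalar Helmholtz equation, (10.6). When these functions are used in Green's theorem (10.29), we obtain

$$\oint_S \left[ E \frac{\partial}{\partial n} \frac{e^{ikr}}{r} - \frac{e^{ikr}}{r} \frac{\partial E}{\partial n} \right] da = \int_V \left[ E \nabla^2 \frac{e^{ikr}}{r} - \frac{e^{ikr}}{r} \nabla^2 E \right] dv \tag{10.31}$$

The right-hand side of this equation vanishes (as long as we exclude the point $r = 0$; see P 0.5 and P 0.6) since we have

$$E \nabla^2 \frac{e^{ikr}}{r} - \frac{e^{ikr}}{r} \nabla^2 E = -k^2 E \frac{e^{ikr}}{r} + \frac{e^{ikr}}{r} k^2 E = 0 \tag{10.32}$$

where we have taken advantage of the fact that $E(\mathbf{r})$ and $e^{ikr}/r$ both satisfy (10.6). This is exactly the reason for our judicious choices of the functions $V$ and $U$, since with them we were able to make half of (10.29) disappear. We are left with

$$\oint_S \left[ E \frac{\partial}{\partial n} \frac{e^{ikr}}{r} - \frac{e^{ikr}}{r} \frac{\partial E}{\partial n} \right] da = 0 \tag{10.33}$$

Now consider a volume between a small sphere of radius $\epsilon$ at the origin and an outer surface of whatever shape. The total surface that encloses the volume is comprised of two parts (i.e. $S = S_1 + S_2$ as depicted in Fig. 10.6). When we apply (10.33) to the surface in Fig. 10.6, we have

$$\oint_{S_2} \left[ E \frac{\partial}{\partial n} \frac{e^{ikr}}{r} - \frac{e^{ikr}}{r} \frac{\partial E}{\partial n} \right] da = -\oint_{S_1} \left[ E \frac{\partial}{\partial n} \frac{e^{ikr}}{r} - \frac{e^{ikr}}{r} \frac{\partial E}{\partial n} \right] da \tag{10.34}$$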
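Our motivation for choosing this geometry with multiple surfaces is that eventually we want to find the field at the origin (inside the little sphere) from knowledge of the field on the outside surface. To this end, we assume that $\epsilon$ is small so that $E(\mathbf{r})$ is approximately the same everywhere on the surface $S_1$. Then the integral over $S_1$ becomes

$$\lim_{\epsilon \to 0} \oint_{S_1} \left[ E \frac{\partial}{\partial n} \frac{e^{ikr}}{r} - \frac{e^{ikr}}{r} \frac{\partial E}{\partial n} \right] da = \lim_{r = \epsilon \to 0} \int_0^{2\pi} d\phi \int_0^{\pi} \left[ E \frac{\partial}{\partial r} \left( \frac{e^{ikr}}{r} \right) \frac{\partial r}{\partial n} - \frac{e^{ikr}}{r} \frac{\partial E}{\partial r} \frac{\partial r}{\partial n} \right] r^2 \sin\theta\, d\theta \tag{10.35}$$

where we have used spherical coordinates. Notice that we have employed the chain rule to execute the normal derivative $\partial/\partial n$. Since $\mathbf{r}$ always points opposite to the direction of the surface normal $\hat{\mathbf{n}}$, the normal derivative $\partial r/\partial n$ is always equal to $-1$. (From the definition of the normal derivative we have $\partial r/\partial n \equiv \nabla r \cdot \hat{\mathbf{n}} = \hat{\mathbf{r}} \cdot \hat{\mathbf{n}} = -1$.) We can now perform the integration in (10.35) as well as take the limit as $\epsilon \to 0$ to obtain

$$\begin{aligned} \lim_{\epsilon \to 0} \oint_{S_1} \left[ E \frac{\partial}{\partial n} \frac{e^{ikr}}{r} - \frac{e^{ikr}}{r} \frac{\partial E}{\partial n} \right] da &= -4\pi \lim_{\epsilon \to 0} \left[ r^2 \left( -\frac{e^{ikr}}{r^2} + ik \frac{e^{ikr}}{r} \right) E - r^2 \frac{e^{ikr}}{r} \frac{\partial E}{\partial r} \right]_{r = \epsilon} \\ &= -4\pi \lim_{\epsilon \to 0} \left[ -e^{ik\epsilon} E + ik\epsilon\, e^{ik\epsilon} E - \epsilon\, e^{ik\epsilon} \frac{\partial E}{\partial r} \right]_{r = \epsilon} \\ &= 4\pi E(0) \end{aligned} \tag{10.36}$$

With the aid of (10.36), Green's theorem applied to our specific geometry (10.34) reduces to

$$E(0) = \frac{1}{4\pi} \oint_{S_2} \left[ \frac{e^{ikr}}{r} \frac{\partial E}{\partial n} - E \frac{\partial}{\partial n} \frac{e^{ikr}}{r} \right] da \tag{10.37}$$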
The field $E(0)$ on the left is understood to be the value of the field inside the little sphere at the origin. The field $E$ inside the integral is the value of the field on the surface of integration. Hence, if we know the field everywhere on the outer surface $S_2$, then we can predict the field at the origin. Of course we are free to choose any coordinate system in order to find the field anywhere inside the surface by moving the origin.

Now let us choose a specific surface $S_2$. We choose an infinite mask with a finite aperture connected to a hemisphere of infinite radius $R \to \infty$. In the end, we will actually be interested in light that enters through the mask and propagates to the origin. In our present coordinate system, the vectors $\mathbf{r}$ and $\hat{\mathbf{n}}$ point opposite to the incoming light. We will transform our coordinate system at a later point.

We must evaluate (10.37) on the surface depicted in Fig. 10.7. For the portion of $S_2$ which is on the hemisphere, the integrand tends to zero as $R$ becomes large. To argue this, it is necessary to recognize the fact that at large distances the field must decrease at least as fast as $1/R$. On the mask, we assume, as did Kirchhoff, that both $\partial E/\partial n$ and $E$ are zero. (Later Sommerfeld noticed that these
Figure 10.7 Surface S 2 depicted as a mask and a large hemisphere.
two assumptions actually contradict each other, and he revised Kirchhoff's work to be more accurate. However, the revision in practice makes only a tiny difference as light spills onto the back of the aperture over a distance of only a wavelength. We ignore this and make Kirchhoff's (slightly flawed) assumptions since it saves a lot of work.) Thus, we are left with only the integration over the open aperture:

$$E(0) = \frac{1}{4\pi} \iint_{\text{aperture}} \left[ \frac{e^{ikr}}{r} \frac{\partial E}{\partial n} - E \frac{\partial}{\partial n} \frac{e^{ikr}}{r} \right] da \tag{10.38}$$

We have essentially arrived at the result that we are seeking. The field coming through the aperture is integrated to find the field at the origin, which is located beyond the aperture. Let us manipulate the formula a little further. The second term in the integral of (10.38) can be rewritten as follows:

$$\frac{\partial}{\partial n} \frac{e^{ikr}}{r} = \frac{\partial r}{\partial n} \frac{\partial}{\partial r} \frac{e^{ikr}}{r} = \left( \frac{ik}{r} - \frac{1}{r^2} \right) e^{ikr} \cos(\mathbf{r}, \hat{\mathbf{n}}) \cong \frac{ik\, e^{ikr}}{r} \cos(\mathbf{r}, \hat{\mathbf{n}}) \tag{10.39}$$

where $\partial r/\partial n = \cos(\mathbf{r}, \hat{\mathbf{n}})$ indicates the cosine of the angle between $\mathbf{r}$ and $\hat{\mathbf{n}}$. We have also assumed that the distance $r$ is much larger than a wavelength in order to drop a term. Next, we assume that the field in the plane of the aperture can be written as $E \cong \tilde{E}(x, y)\, e^{ikz}$. This represents a field traveling through the aperture from left to right. Then, we may write the first term in the integral of (10.38) as

$$\frac{\partial E}{\partial n} = \frac{\partial E}{\partial z} \frac{\partial z}{\partial n} = ik \tilde{E}(x, y)\, e^{ikz}\, (-1) = -ikE \tag{10.40}$$

Substituting (10.39) and (10.40) into (10.38) yields

$$E(0) = -\frac{i}{\lambda} \iint_{\text{aperture}} E\, \frac{e^{ikr}}{r} \left[ \frac{1 + \cos(\mathbf{r}, \hat{\mathbf{n}})}{2} \right] da \tag{10.41}$$
Finally, we wish to rearrange our coordinate system to that depicted in Fig. 10.2. In our derivation, it was less cumbersome to place the origin at a point after the aperture. Now that we have completed our mathematics, it is convenient to make a change of coordinate system and move the origin to the plane of the aperture as in Fig. 10.2. Then, we can obtain the field at a point lying somewhere after the aperture by computing

$$E(x, y, z = d) = -\frac{i}{\lambda} \iint_{\text{aperture}} E(x', y', z = 0)\, \frac{e^{ikR}}{R} \left[ \frac{1 + \cos(\mathbf{r}, \hat{\mathbf{z}})}{2} \right] dx'\, dy' \tag{10.42}$$

where

$$R = \sqrt{(x - x')^2 + (y - y')^2 + d^2} \tag{10.43}$$

Equation (10.42) is identical to (10.3); it is the same as (10.41) after applying a coordinate transformation. It is called the Fresnel-Kirchhoff diffraction formula, and it agrees with (10.1) except for the obliquity factor $[1 + \cos(\mathbf{r}, \hat{\mathbf{z}})]/2$.
Appendix 10.C Green's Theorem

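To derive Green's theorem, we begin with the divergence theorem (see (0.11)):

$$\oint_S \mathbf{f} \cdot \hat{\mathbf{n}}\, da = \int_V \nabla \cdot \mathbf{f}\, dv \tag{10.44}$$

The unit vector $\hat{\mathbf{n}}$ always points normal to the surface of volume $V$ over which the integral is taken. Let the vector function $\mathbf{f}$ be $U \nabla V$, where $U$ and $V$ are both analytical functions of the position coordinate $\mathbf{r}$. Then (10.44) becomes

$$\oint_S (U \nabla V) \cdot \hat{\mathbf{n}}\, da = \int_V \nabla \cdot (U \nabla V)\, dv \tag{10.45}$$

We recognize $\nabla V \cdot \hat{\mathbf{n}}$ as the directional derivative of $V$ directed along the surface normal $\hat{\mathbf{n}}$. This is often represented in shorthand notation as

$$\nabla V \cdot \hat{\mathbf{n}} = \frac{\partial V}{\partial n} \tag{10.46}$$

The argument of the integral on the right-hand side of (10.45) can be expanded with the product rule:

$$\nabla \cdot (U \nabla V) = \nabla U \cdot \nabla V + U \nabla^2 V \tag{10.47}$$

With these substitutions, (10.45) becomes

$$\oint_S U \frac{\partial V}{\partial n}\, da = \int_V \left[ \nabla U \cdot \nabla V + U \nabla^2 V \right] dv \tag{10.48}$$

Actually, so far we haven't done much. Equation (10.48) is nothing more than the divergence theorem applied to the vector function $U \nabla V$. Similarly, we can apply the divergence theorem to an alternative vector function given by the combination $V \nabla U$. Thus, we can write an equation similar to (10.48) where $U$ and $V$ are interchanged:

$$\oint_S V \frac{\partial U}{\partial n}\, da = \int_V \left[ \nabla V \cdot \nabla U + V \nabla^2 U \right] dv \tag{10.49}$$

We simply subtract (10.49) from (10.48). The $\nabla U \cdot \nabla V$ terms cancel, and this leads to (10.29), known as Green's theorem.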
Exercises
Exercises for 10.1 Huygens' Principle

P10.1 Huygens' principle is often used to describe diffraction through slits, but it can also be used to describe refraction. Use a drawing program or a ruler and compass to produce a picture similar to Fig. 10.8, which shows the graphical prediction of the refracted angle from Huygens' principle. Verify that the Huygens picture matches the numerical prediction from Snell's law for an incident angle of your choice. Use $n_i = 1$ and $n_t = 2$. HINT: Draw the wavefronts hitting the interface at an angle and treat each point where the wavefronts strike the interface as the source of circular waves propagating into the $n = 2$ material. The wavelength of the circular waves must be exactly half the wavelength of the incident light since $\lambda = \lambda_{\text{vac}}/n$. Use at least four point sources and connect the matching wavefronts by drawing tangent lines as in the figure.

Figure 10.8

P10.2 (a) Show that the function

$$f(r) = \frac{A \cos(kr - \omega t)}{r}$$

is a solution to the wave equation in spherical coordinates with only radial dependence,

$$\frac{1}{r^2} \frac{\partial}{\partial r} \left( r^2 \frac{\partial f}{\partial r} \right) = \frac{1}{v^2} \frac{\partial^2 f}{\partial t^2}$$

Determine what $v$ is, in terms of $k$ and $\omega$.

(b) If the electric field were a scalar field, we might be done there. However, it's a vector field, and moreover it must satisfy Maxwell's equations. We know from experience that it's generally transverse, and since it's traveling radially let's make a guess that it's oscillating in the $\hat{\boldsymbol{\phi}}$ direction:

$$\mathbf{E}(r) = \frac{A \cos(kr - \omega t)}{r}\, \hat{\boldsymbol{\phi}}$$

Show that this choice for $\mathbf{E}$ unfortunately is not consistent with Maxwell's equations. In particular: (i) show that it does satisfy Gauss's law (1.1); (ii) compute the curl of $\mathbf{E}$ and use Faraday's law (1.3) to deduce $\mathbf{B}$; (iii) show that this $\mathbf{B}$ does satisfy Gauss's law for magnetism (1.2); (iv) but show that this $\mathbf{B}$ does not satisfy Ampere's law (1.4).

(c) A somewhat more complicated spherical wave

$$\mathbf{E}(r, \theta) = \hat{\boldsymbol{\phi}}\, \frac{A \sin\theta}{r} \left[ \cos(kr - \omega t) - \frac{1}{kr} \sin(kr - \omega t) \right]$$

does satisfy Maxwell's equations. Describe how this wave behaves as a function of $r$ and $\theta$. What conditions need to be satisfied for this equation to reduce to the spherical wave formula used in the diffraction formulas?
Exercises for 10.2 Scalar Diffraction

P10.3 Show that $E(r) = E_0 r_0 e^{ikr}/r$ is a solution to the scalar Helmholtz equation (10.6). HINT:

$$\nabla^2 \psi = \frac{1}{r^2} \frac{\partial}{\partial r} \left( r^2 \frac{\partial \psi}{\partial r} \right) + \frac{1}{r^2 \sin\theta} \frac{\partial}{\partial \theta} \left( \sin\theta \frac{\partial \psi}{\partial \theta} \right) + \frac{1}{r^2 \sin^2\theta} \frac{\partial^2 \psi}{\partial \phi^2}$$

P10.4 Learn by heart the derivation of the Fresnel-Kirchhoff diffraction formula (outlined in Appendix 10.B). Indicate the percentage of how well you understand the derivation. The points for this problem are proportional to your percentage of understanding. If you write 100%, it means that you can reproduce the derivation after closing your notes.

P10.5 Check that (10.9) is the solution to the paraxial wave equation (10.8).

P10.6 Apply the Fresnel-Kirchhoff diffraction formula (10.1) to a monochromatic plane wave with intensity $I_0$, which goes through a circular aperture of diameter $\ell$. Find the intensity of the light on axis (i.e. $x, y = 0$). HINT: The integral takes on the following form:

$$E(0, 0, d) = -\frac{i}{\lambda} \iint_{\text{aperture}} E(x', y', 0)\, \frac{e^{ik\sqrt{x'^2 + y'^2 + d^2}}}{\sqrt{x'^2 + y'^2 + d^2}}\, dx'\, dy' = -\frac{i E_0}{\lambda} \int_0^{2\pi} d\theta' \int_0^{\ell/2} \frac{e^{ik\sqrt{\rho'^2 + d^2}}}{\sqrt{\rho'^2 + d^2}}\, \rho'\, d\rho'$$

Then you will want to make the following change of variables: $\xi \equiv \sqrt{\rho'^2 + d^2}$. This will make it easier to accomplish the integration.

Answer: $I(0, 0, d) = 2 I_0 \left[ 1 - \cos\left( k\sqrt{(\ell/2)^2 + d^2} - kd \right) \right]$.
Exercises for 10.3 Babinet's Principle

P10.7 Subtract the field found in P 10.6 from a plane wave field $E_0 e^{ikd}$ to obtain the on-axis field behind a circular block. Show that the intensity on axis behind the circular block is constant (i.e. independent of $d$) and is equal to the intensity of the initial plane wave.
L10.8 Why does the on-axis intensity behind a circular opening fluctuate (see P 10.6) whereas the on-axis intensity behind a circular obstruction remains constant (see P 10.7)? Create a collimated laser beam several centimeters wide. Observe the on-axis intensity on a movable screen (e.g. a hand-held card) behind a small circular aperture and behind a small circular obstruction placed in the beam.
Figure 10.9
Exercises for 10.4 Fresnel Approximation

P10.9 Repeat P 10.6 to find the on-axis intensity after a circular aperture in the Fresnel approximation. HINT: You can make a suitable approximation directly to the answer of P 10.6 to obtain the Fresnel approximation. However, you should also perform the integration under the Fresnel approximation for the sake of gaining experience.
Exercises for 10.5 Fraunhofer Approximation

P10.10 (a) Repeat P 10.6 (or P 10.9) to find the on-axis intensity after a circular aperture in the Fraunhofer approximation. HINT: You can make a suitable approximation directly to the answer of P 10.9 to obtain the Fraunhofer approximation. However, you should perform the integration under the Fraunhofer approximation for the sake of gaining experience.

(b) Check how well the Fresnel and Fraunhofer approximations work by graphing the three curves (i.e. from P 10.6, P 10.9, and this problem) on a single plot as a function of $d$. Take $\ell = 10\ \mu\text{m}$ and $\lambda = 500$ nm. To see the result better, use a log scale on the $z$-axis.
Answer:
Figure 10.10
© 2004-2009 Peatross and Ware
P10.11 A single narrow slit has a mask placed over it so the aperture function is not a square pulse but rather a cosine: $E(x', y', 0) = E_0 \cos(\pi x'/L)$ for $-L/2 < x' < L/2$ and $E(x', y', 0) = 0$ otherwise. Calculate the far-field (Fraunhofer) diffraction pattern. Make a plot of intensity as a function of $xkL/2d$; qualitatively compare the pattern to that of a regular single slit.
Chapter 11
Diffraction Applications
11.1 Introduction
In this chapter, we consider a number of practical examples of diffraction. We first examine a Gaussian laser beam. This choice is not arbitrary since most students of optics at some point in their career use laser beams to perform measurements of one kind or another. It is often essential to characterize the laser beam profile and to understand its focusing properties. (Every semester we are contacted by students and faculty from a variety of departments seeking to better understand a laser beam that they are using.) The information presented here will very likely prove valuable to future research activity.

We often think of lasers as collimated beams of light that propagate indefinitely without expanding. However, the laws of diffraction require that every finite beam eventually grow in width. The rate at which a laser beam diffracts depends on its beam waist size. Because laser beams usually have narrow divergence angles and therefore obey the paraxial approximation, we can calculate their behavior via the Fresnel approximation discussed in section 10.4. This is done in section 11.2. In section 11.3, we examine the Gaussian field solution as a practical description of simple laser beams. Section 11.A discusses the ABCD law for Gaussian beams, which is a method of computing the effects of optical elements represented by ABCD matrices on Gaussian laser beams.

In section 11.4, we discuss diffraction theory in systems involving lenses. We will find that the Fraunhofer diffraction pattern discussed in section 10.5 for a far-away screen is imaged to the focus of a lens placed in the stream of light. This has important implications for the resolution of instruments such as telescopes or the human eye, as discussed in section 11.5.

The array theorem is introduced in section 11.6. This theorem is a powerful mathematical tool that enables one to deal conveniently with diffraction from an array of identical apertures. One of the important uses of the array theorem is in determining diffraction from a grating. As discussed in section 11.7, a diffraction grating can be thought of as an array of narrow slit apertures. In section 11.8, we study the workings of a diffraction spectrometer. To find the resolution limitations, one combines the diffraction properties of gratings with the Fourier properties of lenses discussed in section 11.4.
11.2 Diffraction of a Gaussian Field Profile

Consider the diffraction of a Gaussian field profile. At the plane $z = 0$, we describe the field profile with the functional form
$$E(x', y', 0) = E_0 e^{-(x'^2 + y'^2)/w_0^2} \tag{11.1}$$
where $w_0$, called the beam waist, specifies the radius of the beam profile. This beam profile, depicted in Fig. 11.1, is very common for laser beams and is called the zero-order Gaussian mode; more complicated distributions are also possible. To appreciate the meaning of $w_0$, consider the intensity of the field distribution defined in (11.1):
$$I(x', y', 0) = I_0 e^{-2\rho'^2/w_0^2} \tag{11.2}$$
where $\rho'^2 \equiv x'^2 + y'^2$. In (11.2) we see that $w_0$ indicates the radius at which the intensity reduces by the factor $e^{-2} = 0.135$.

We would like to know how this field evolves as it propagates beyond the plane $z = 0$. (We will no longer write $z = d$ as we did in chapter 10; instead we will simply retain the variable $z$.) We compute the field downstream using the Fresnel approximation (10.13):
$$E(x, y, z) = -i\frac{e^{ikz} e^{i\frac{k}{2z}(x^2+y^2)}}{\lambda z} \int_{-\infty}^{\infty} dx' \int_{-\infty}^{\infty} dy'\, E_0 e^{-(x'^2+y'^2)/w_0^2}\, e^{i\frac{k}{2z}(x'^2+y'^2)}\, e^{-i\frac{k}{z}(xx'+yy')} \tag{11.3}$$
Notice that we have treated the aperture as being infinitely large. This is not a problem since the Gaussian profile itself limits the dimension of the emission
Figure 11.1 Diffraction of a Gaussian field profile.

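region to a radius on the scale of $w_0$. Equation (11.3) can be rewritten as
$$E(x, y, z) = -i\frac{E_0 e^{ikz} e^{i\frac{k}{2z}(x^2+y^2)}}{\lambda z} \int_{-\infty}^{\infty} dx'\, e^{-\left(\frac{1}{w_0^2} - i\frac{k}{2z}\right)x'^2 - i\frac{kx}{z}x'} \int_{-\infty}^{\infty} dy'\, e^{-\left(\frac{1}{w_0^2} - i\frac{k}{2z}\right)y'^2 - i\frac{ky}{z}y'} \tag{11.4}$$
The integrals over $x'$ and $y'$ have the identical form and can be done individually with the help of the integral formula (0.52). The algebra is cumbersome, but the integral in the $x'$ dimension becomes
$$\begin{aligned}
\int_{-\infty}^{\infty} dx'\, e^{-\left(\frac{1}{w_0^2} - i\frac{k}{2z}\right)x'^2 - i\frac{kx}{z}x'}
&= \left[\frac{\pi}{\frac{1}{w_0^2} - i\frac{k}{2z}}\right]^{1/2} \exp\left[\frac{\left(-i\frac{kx}{z}\right)^2}{4\left(\frac{1}{w_0^2} - i\frac{k}{2z}\right)}\right] \\
&= \left[\frac{i\lambda z}{1 + i\frac{2z}{kw_0^2}}\right]^{1/2} \exp\left[\frac{-i\frac{kx^2}{2z}}{1 + i\frac{2z}{kw_0^2}}\right] \\
&= \left[\frac{i\lambda z}{\sqrt{1 + \left(\frac{2z}{kw_0^2}\right)^2}\; e^{i\tan^{-1}\frac{2z}{kw_0^2}}}\right]^{1/2} \exp\left[-\frac{kx^2}{2z}\,\frac{\frac{2z}{kw_0^2} + i}{1 + \left(\frac{2z}{kw_0^2}\right)^2}\right]
\end{aligned} \tag{11.5}$$
A similar expression results from the integration on $y'$. When (11.5) and the equivalent expression for the $y$-dimension are used in (11.4), the result is
$$E(x, y, z) = E_0 e^{ikz} e^{i\frac{k}{2z}(x^2+y^2)}\, \frac{e^{-i\tan^{-1}\frac{2z}{kw_0^2}}}{\sqrt{1 + \left(\frac{2z}{kw_0^2}\right)^2}}\, \exp\left[-\frac{k(x^2+y^2)}{2z}\,\frac{\frac{2z}{kw_0^2} + i}{1 + \left(\frac{2z}{kw_0^2}\right)^2}\right] \tag{11.6}$$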
This rather complicated expression for the field distribution is in fact very useful and can be directly interpreted, as discussed in the next section.

Before proceeding, we take a moment to mention that this Fresnel integral can also be performed while utilizing the cylindrical symmetry. A Gaussian field profile is one of few diffraction problems that can be handled conveniently in either the Cartesian or the cylindrical coordinate systems. In cylindrical coordinates, the Fresnel diffraction integral (10.23) takes the form
$$E(\rho, z) = -\frac{2\pi i\, e^{ikz} e^{i\frac{k\rho^2}{2z}}}{\lambda z} \int_0^{\infty} \rho'\, d\rho'\, E_0 e^{-\rho'^2/w_0^2}\, e^{i\frac{k\rho'^2}{2z}}\, J_0\!\left(\frac{k\rho\rho'}{z}\right) \tag{11.7}$$
We can use the integral formula (0.56) to obtain
$$E(\rho, z) = -\frac{2\pi i E_0 e^{ikz} e^{i\frac{k\rho^2}{2z}}}{\lambda z}\; \frac{\exp\left[-\dfrac{(k\rho/z)^2}{4\left(\frac{1}{w_0^2} - i\frac{k}{2z}\right)}\right]}{2\left(\frac{1}{w_0^2} - i\frac{k}{2z}\right)} \tag{11.8}$$
which is identical to (11.6).
11.3 Gaussian Laser Beams

The rather complicated Gaussian field expression (11.6) can be cleaned up through the judicious introduction of new quantities:
$$E(\rho, z) = E_0 \frac{w_0}{w(z)}\, e^{-\rho^2/w^2(z)}\, e^{ikz + i\frac{k\rho^2}{2R(z)} - i\tan^{-1}\frac{z}{z_0}} \tag{11.9}$$
where
$$\rho^2 \equiv x^2 + y^2, \tag{11.10}$$
$$w(z) \equiv w_0 \sqrt{1 + z^2/z_0^2}, \tag{11.11}$$
$$R(z) \equiv z + z_0^2/z, \tag{11.12}$$
$$z_0 \equiv \frac{k w_0^2}{2}. \tag{11.13}$$
This formula describes the lowest-order Gaussian mode, the most common laser beam profile. (Please be aware that some lasers are multimode and exhibit more complicated spatial mode structures, e.g. a high-power YAG laser.)

It turns out that (11.9) works equally well for negative values of $z$. The expression can therefore be used to describe the field of a simple laser beam everywhere (before and after it goes through a focus). In fact, the expression works also near $z = 0$! One might call into question the paraxial approximation for small $z$ since the radius of the beam might be larger than $z$. Nevertheless, at $z = 0$ the diffracted field (11.9) returns the exact expression for the original field profile (11.1) (see P 11.1). (There is good reason for this since the solution (11.9) obeys the scalar Helmholtz equation (10.6) under the paraxial approximation, where the second derivative with respect to $z$ is neglected.) In short, (11.9) may be used with impunity as long as the divergence angle of the beam is not too wide.

To begin our interpretation of (11.9), consider the intensity profile $I \propto E^* E$ as depicted in Fig. 11.2:
$$I(\rho, z) = I_0 \frac{w_0^2}{w^2(z)}\, e^{-\frac{2\rho^2}{w^2(z)}} = \frac{I_0}{1 + z^2/z_0^2}\, e^{-\frac{2\rho^2}{w^2(z)}} \tag{11.14}$$
Figure 11.2 A Gaussian laser field profile in the vicinity of its beam waist.
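The quantities defined in (11.10)-(11.13) are easy to evaluate numerically. The following is a minimal sketch, not part of the original text, that computes the Rayleigh range and beam radius and checks that the compact form (11.9) agrees with the unsimplified result (11.6). The function names and the sample values (a 1 mm waist at 633 nm) are assumptions chosen purely for illustration.

```python
import cmath
import math

def rayleigh_range(w0, wavelength):
    """z0 = k*w0**2/2, eq. (11.13), with k = 2*pi/wavelength."""
    k = 2 * math.pi / wavelength
    return k * w0**2 / 2

def beam_radius(z, w0, wavelength):
    """w(z) = w0*sqrt(1 + z**2/z0**2), eq. (11.11)."""
    z0 = rayleigh_range(w0, wavelength)
    return w0 * math.sqrt(1 + (z / z0) ** 2)

def field_11_9(rho, z, w0, wavelength, E0=1.0):
    """Compact Gaussian-beam field, eq. (11.9). Needs z != 0 since R(z) = z + z0**2/z."""
    k = 2 * math.pi / wavelength
    z0 = rayleigh_range(w0, wavelength)
    w = beam_radius(z, w0, wavelength)
    R = z + z0**2 / z
    phase = k * z + k * rho**2 / (2 * R) - math.atan(z / z0)
    return E0 * (w0 / w) * math.exp(-rho**2 / w**2) * cmath.exp(1j * phase)

def field_11_6(rho, z, w0, wavelength, E0=1.0):
    """Unsimplified Fresnel result, eq. (11.6), written with xi = 2z/(k*w0**2)."""
    k = 2 * math.pi / wavelength
    xi = 2 * z / (k * w0**2)
    quad = k * rho**2 / (2 * z)
    amplitude = E0 / math.sqrt(1 + xi**2)
    exponent = (1j * k * z + 1j * quad - 1j * math.atan(xi)
                - quad * (xi + 1j) / (1 + xi**2))
    return amplitude * cmath.exp(exponent)

if __name__ == "__main__":
    w0, lam = 1.0e-3, 633e-9          # illustrative values only
    z0 = rayleigh_range(w0, lam)
    print("Rayleigh range z0 [m]:", z0)
    print("w(z0)/w0:", beam_radius(z0, w0, lam) / w0)
    print("|E(11.9) - E(11.6)|:",
          abs(field_11_9(0.5e-3, 2.0, w0, lam) - field_11_6(0.5e-3, 2.0, w0, lam)))
```

At $z = z_0$ the radius grows by $\sqrt{2}$ and the on-axis intensity drops by half, which is the sense in which $z_0$ sets the depth of focus.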
By inspection we see that $w(z)$ is the radius of the beam as a function of $z$. At $z = 0$, the beam waist, $w(z = 0)$ reduces to $w_0$, as is seen in (11.11). The parameter $z_0$, known as the Rayleigh range, specifies the distance along the axis from $z = 0$ to the point where the intensity decreases by a factor of 2. Note that $w_0$ and $z_0$ are not independent of each other but are connected through the wavelength according to (11.13). There is a tradeoff: a small beam waist means a short depth of focus. That is, a small $w_0$ means a small $z_0$.

We now return to an examination of the electric field (11.9). As a reminder, to restore the temporal dependence of the field, we simply append $e^{-i\omega t}$ to the solution, as discussed in connection with (10.5). Let us consider the phase terms that appear in (11.9). The factor $\exp\left[ikz + ik\rho^2/2R(z)\right]$ describes the phase of curved wave fronts, where $R(z)$ is the radius of curvature of the wave front at $z$. At $z = 0$, the radius of curvature is infinite (see (11.12)), meaning that the wave front is flat at the laser beam waist. In contrast, at very large values of $z$ we have $R(z) \cong z$ (see (11.12)). In this case, we may write these phase terms as $kz + \frac{k\rho^2}{2R(z)} \cong k\sqrt{z^2 + \rho^2}$. This describes a spherical wave front emanating from the origin out to point $(\rho, z)$. The Fresnel approximation (same as the paraxial approximation) essentially replaces spherical wave fronts with the former parabolic approximation. Near the origin, the wave fronts are flat. Far from the origin, the wave fronts are spherical.

The phase factor $\exp\left[-i\tan^{-1}(z/z_0)\right]$ is perhaps the most mysterious. It is called the Gouy shift and is actually present for any light that goes through a focus, not just laser beams. The Gouy shift is not overly dramatic since the expression $\tan^{-1}(z/z_0)$ ranges from $-\pi/2$ (at $z = -\infty$) to $\pi/2$ (at $z = +\infty$). Nevertheless, when light goes through a focus, it experiences an overall phase shift of $-\pi$.

11.4 Fraunhofer Diffraction Through a Lens

As has been previously discussed, the Fraunhofer approximation applies to diffraction when the propagation distance from an aperture is sufficiently large (see (10.15) and (10.16)). The intensity of the far-field diffraction pattern is
$$I(x, y, z) = \frac{1}{2} c\epsilon_0 \left| \frac{1}{\lambda z} \iint_{\text{aperture}} E(x', y', 0)\, e^{-ik\left(\frac{x}{z}x' + \frac{y}{z}y'\right)}\, dx'\, dy' \right|^2 \tag{11.15}$$
Notice that the dependence of the diffraction on $x$, $y$, and $z$ comes only through the combinations $\theta_x \cong x/z$ and $\theta_y \cong y/z$. Therefore, the diffraction pattern in the Fraunhofer limit is governed by the two angles $\theta_x$ and $\theta_y$, and the pattern preserves itself indefinitely. As the light continues to propagate, the pattern increases in size at a rate proportional to distance traveled so that the angular width is preserved. The situation is depicted in Fig. 11.3.

The Fraunhofer limit corresponds to the ultimate amount of diffraction that light in an optical system experiences. Mathematically, it is obtained via a two-dimensional Fourier transform as seen in (11.15). The Fraunhofer limit is very
Figure 11.3 Diffraction in the far field.
important in a variety of optical instruments (e.g. telescopes, spectrometers), discussed later in this chapter.

Recall that in order to use the Fraunhofer diffraction formula we need to satisfy $z \gg \pi(\text{aperture radius})^2/\lambda$ (see (10.15)). As an example, if an aperture with a 1 cm radius (not necessarily circular) is used with visible light, the light must travel more than a kilometer in order to reach the Fraunhofer limit. It may therefore seem unlikely to reach the Fraunhofer limit in a typical optical system, especially if the aperture or beam size is relatively large. Nevertheless, spectrometers, which typically utilize diffraction gratings many centimeters wide, depend on achieving the Fraunhofer limit within the confines of a manageable instrument box. This is accomplished using imaging techniques, which is the topic addressed in this section.

Consider a lens with focal length $f$ placed in the path of light following an aperture (see Fig. 11.4). Let the lens be placed an arbitrary distance $L$ after the aperture. The lens produces an image of the Fraunhofer pattern at a new location $d_i$ following the lens according to the imaging formula (see (9.51))
$$\frac{1}{f} = \frac{1}{-(z - L)} + \frac{1}{d_i}. \tag{11.16}$$
Keep in mind that the lens interrupts the light before the Fraunhofer pattern has a chance to form, at a distance $z$ after the aperture (or a distance $z - L$ after the location of the lens). This means that the Fraunhofer diffraction pattern may be thought of as a virtual object for the imaging system, which is why the object distance in (11.16) is negative. Since the Fraunhofer diffraction pattern occurs at very large distances (i.e. $z \to \infty$) we see that the image of the Fraunhofer pattern must appear at the focus of the lens:
$$d_i \cong f. \tag{11.17}$$
Thus, a lens makes it very convenient to observe the Fraunhofer diffraction pattern even from relatively large apertures. It is not necessary to let the light propagate for kilometers. We need only observe the pattern at the focus of the lens as shown in Fig. 11.4. Notice that the spacing $L$ between the aperture and the lens is unimportant to this conclusion.
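As a quick numerical illustration of the two points above, the sketch below evaluates the Fraunhofer distance scale for the 1 cm aperture example and solves the imaging formula (11.16) for $d_i$ as $z$ becomes large. It assumes the Fraunhofer condition in the form $z \gg \pi(\text{aperture radius})^2/\lambda$; the function names, the 500 nm wavelength, and the lens values are hypothetical choices for illustration.

```python
import math

def fraunhofer_distance(aperture_radius, wavelength):
    """Distance scale pi*a**2/lambda that z must greatly exceed (assumed form of (10.15))."""
    return math.pi * aperture_radius**2 / wavelength

def image_distance(f, z, L):
    """Solve 1/f = 1/(-(z - L)) + 1/d_i, eq. (11.16), for d_i."""
    return 1.0 / (1.0 / f + 1.0 / (z - L))

if __name__ == "__main__":
    # 1 cm aperture radius with 500 nm light (illustrative wavelength).
    print("Fraunhofer distance scale [m]:", fraunhofer_distance(0.01, 500e-9))
    # A lens with f = 0.5 m placed L = 0.1 m after the aperture (hypothetical values).
    for z in (10.0, 1e3, 1e6):
        print("z =", z, " d_i =", image_distance(0.5, z, 0.1))
```

The distance scale comes out at several hundred meters, so "much greater than" indeed means kilometers, while the image distance converges to $f$ regardless of $L$.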
Figure 11.4 Imaging of the Fraunhofer diffraction pattern to the focus of a lens.
Even though we know that the Fraunhofer diffraction pattern occurs at the focus of a lens, the question remains as to the size of the image. We would like to know how the size of the diffraction pattern compares to what would have occurred on a far-away screen. To find the answer, let us examine the magnification (9.53), which is given by
$$M = -\frac{d_i}{-(z - L)} \tag{11.18}$$
Taking the limit of very large $z$ and employing (11.17), the magnification becomes
$$M \to \frac{f}{z} \tag{11.19}$$
This is a remarkable result. When the lens is inserted, the size of the diffraction pattern decreases by the ratio of the lens focal length $f$ to the original distance $z$ to a far-away screen. Since in the Fraunhofer regime the diffraction pattern is proportional to distance (i.e. size $\propto z$), the image at the focus of the lens scales in proportion to the focal length (i.e. size $\propto f$). This means that the angular width of the pattern is preserved! With the lens in place, we can rewrite (11.15) straightaway as
$$I(x, y, L + f) = \frac{1}{2} c\epsilon_0 \left| \frac{1}{\lambda f} \iint_{\text{aperture}} E(x', y', 0)\, e^{-i\frac{k}{f}(xx' + yy')}\, dx'\, dy' \right|^2 \tag{11.20}$$
which describes the intensity distribution pattern at the focus of the lens.

Although (11.20) correctly describes the intensity, we cannot easily write the electric field since the imaging techniques that we have used do not render the phase information. To obtain an expression for the field, it is necessary to use the Fresnel diffraction formula repeatedly. First, the Fresnel diffraction formula is used to find the field arriving at the lens. The next task is to determine what the lens does to the field as the light passes through it. Finally, the Fresnel diffraction formula is used again to find the field distribution at the focus of the lens. The
Figure 11.5 A thin lens, which modifies the phase of a field passing through.
result of this lengthy analysis gives an intensity pattern in agreement with (11.20). However, it also gives the full expression for the field, including its phase.

Before proceeding we take a moment to understand how a lens modifies the field of light as it passes through. Consider a monochromatic light field that goes through a thin lens with focal length $f$. In traversing the lens, the field undergoes a phase shift. Let us reference this phase shift to that experienced by the light that goes through the center of the lens. In Fig. 11.5, $R_1$ is a positive radius of curvature, and $R_2$ is a negative radius of curvature, according to our previous convention. We take the distances $\ell_1$ and $\ell_2$ to be positive. The light passing through the off-axis portion of the lens experiences less material than the light passing through the center. The difference in optical path length is $(1 - n)(\ell_1 + \ell_2)$ (see discussion connected with (9.16)). This means that the phase of the field passing through the off-axis portion of the lens relative to the phase of the field passing through the center is
$$\Delta\phi = -k(n - 1)(\ell_1 + \ell_2). \tag{11.21}$$
The negative sign indicates a phase advance (i.e. same sign as $-\omega t$) in contrast to a phase delay, which takes a positive sign. Off axis, the phase advances because the light travels through less material and gets ahead of the light traveling through the center of the lens. In (11.21), $k$ represents the wave number in vacuum (i.e. $2\pi/\lambda_{\text{vac}}$). We can find expressions for $\ell_1$ and $\ell_2$ from the equations describing the spherical surfaces of the lens:
$$(R_1 - \ell_1)^2 + x^2 + y^2 = R_1^2, \qquad (R_2 + \ell_2)^2 + x^2 + y^2 = R_2^2 \tag{11.22}$$
In the Fresnel approximation, the light propagation takes place in the paraxial limit. It is therefore appropriate to neglect the terms $\ell_1^2$ and $\ell_2^2$ in comparison with the other terms present. Within this approximation, equations (11.22) can
Figure 11.6 Diffraction from an aperture viewed at the focus of a lens.
be solved, and they render
$$\ell_1 \cong \frac{x^2 + y^2}{2R_1}, \qquad \ell_2 \cong -\frac{x^2 + y^2}{2R_2} \tag{11.23}$$
We are now able to evaluate the phase advance (11.21) in terms of $x$ and $y$. Substitution of (11.23) into (11.21) yields
$$\Delta\phi = -k(n - 1)\left(\frac{1}{R_1} - \frac{1}{R_2}\right)\frac{x^2 + y^2}{2} \tag{11.24}$$
As is noticed right away, (11.24) contains the focal length $f$ of a thin lens according to the lens-maker's formula (9.56). With this identification, the phase introduced by the lens becomes
$$\Delta\phi = -\frac{k}{2f}\left(x^2 + y^2\right) \tag{11.25}$$
In summary, the light traversing a lens experiences a relative phase shift given by
$$E(x, y, z_{\text{after lens}}) = E(x, y, z_{\text{before lens}})\, e^{-i\frac{k}{2f}(x^2 + y^2)} \tag{11.26}$$
Equation (11.26) introduces a wave-front curvature to the field. For example, if a plane wave (i.e. a uniform field $E_0$) passes through the lens, the field emerges with a spherical-like wave front converging towards the focus of the lens.

We now consider the Fresnel diffraction pattern at the focus of a lens inserted a distance $L$ following the aperture (see Fig. 11.6). Assume that the field $E(x', y', 0)$ at the aperture is known. We use the Fresnel approximation to compute the field incident on the lens:
$$E(x'', y'', L) = -i\frac{e^{ikL} e^{i\frac{k}{2L}(x''^2 + y''^2)}}{\lambda L} \iint E(x', y', 0)\, e^{i\frac{k}{2L}(x'^2 + y'^2)}\, e^{-i\frac{k}{L}(x''x' + y''y')}\, dx'\, dy' \tag{11.27}$$
(The double primes keep track of distinct variables in the two diffraction problems that are being put together into a system.) Next, the field gains a phase factor according to (11.26) upon transmitting through the lens. Finally, we use the Fresnel diffraction formula to propagate the distance $f$ from the back of the thin lens:
$$E(x, y, L + f) = -i\frac{e^{ikf} e^{i\frac{k}{2f}(x^2 + y^2)}}{\lambda f} \iint \left[E(x'', y'', L)\, e^{-i\frac{k}{2f}(x''^2 + y''^2)}\right] e^{i\frac{k}{2f}(x''^2 + y''^2)}\, e^{-i\frac{k}{f}(xx'' + yy'')}\, dx''\, dy'' \tag{11.28}$$
As is immediately appreciated by students, the injection of (11.27) into (11.28) makes a rather long formula involving four integrals. Nevertheless, two of the integrals can be performed in advance of choosing the aperture (i.e. those over $x''$ and $y''$). This is accomplished with the help of the integral formula (0.52) (even though in this instance the real part of $A$ is zero). After this cumbersome work, (11.28) becomes
$$E(x, y, L + f) = -i\frac{e^{ik(L + f)}\, e^{i\frac{k}{2f}(x^2 + y^2)}\, e^{-i\frac{kL}{2f^2}(x^2 + y^2)}}{\lambda f} \iint E(x', y', 0)\, e^{-i\frac{k}{f}(xx' + yy')}\, dx'\, dy' \tag{11.29}$$
Notice that at least the integration portion of this formula looks exactly like the Fraunhofer diffraction formula! This happened even though in the preceding discussion we did not at any time specifically make the Fraunhofer approximation. The result (11.29) implies the intensity distribution (11.20) as anticipated. However, the phase of the field is also revealed in (11.29). In general, the field carries a wave front curvature as it passes through the focal plane of the lens. In the special case $L = f$, the two quadratic phase factors cancel and the diffraction formula takes a particularly simple form:
$$E(x, y, L + f)\Big|_{L = f} = -i\frac{e^{2ikf}}{\lambda f} \iint E(x', y', 0)\, e^{-i\frac{k}{f}(xx' + yy')}\, dx'\, dy' \tag{11.30}$$
When the lens is placed at this special distance following the aperture, the Fraunhofer diffraction pattern viewed at the focus of the lens carries a flat wave front.
11.5 Resolution of a Telescope

In the previous section we learned that the Fraunhofer diffraction pattern appears at the focus of a lens. This has important implications for telescopes and other optical instruments such as the human eye. In essence, any optical instrument involving lenses or mirrors has a built-in aperture, limiting the light that enters. For example, the pupil of the eye acts as an aperture that induces a Fraunhofer diffraction pattern to occur at the retina. Cameras have irises which aperture the light, causing a Fraunhofer diffraction pattern at the film plane. If nothing else, the diameter of the lens itself induces diffraction.
Figure 11.7 To resolve distinct images at the focus of a lens, the angular separation must exceed the width of the Fraunhofer diffraction patterns.
Recall that the Fraunhofer pattern represents the ultimate amount of diffraction caused by an aperture, and this just happens to occur at the focus of any lens. Of course, the focus of the lens is just where one needs to look in order to see images of distant objects. This has the effect of blurring out features in the image, limiting the resolution. This illustrates why it is impossible to focus light to a true point.

Suppose you want to image two very distant stars that are close together. An image of each star appears near the focus of the lens. Since the rays traversing the center of the lens from either star are non-deviating (in the thin lens approximation), the angular separation between the two images is the same as the angular separation between the stars. This is seen in Fig. 11.7. A resolution problem occurs when the Fraunhofer diffraction pattern causes each image to blur by more than the angular separation between them. In this case the two images cannot be resolved because they bleed into each other.

The Fraunhofer diffraction pattern from a circular aperture was computed in P 11.6. At the focus of a lens, this pattern is
$$I(\rho, f) = I_0 \left(\frac{\pi \ell^2}{4\lambda f}\right)^2 \left[2\frac{J_1(k\ell\rho/2f)}{k\ell\rho/2f}\right]^2 \tag{11.31}$$
where $f$ is the focal length of the lens and $\ell$ is its diameter. This pattern contains the first-order Bessel function $J_1$, which behaves somewhat like a sine wave as seen in Fig. 11.8. The main differences are that the zero crossings are not exactly periodic and the function slowly diminishes with larger arguments. The first zero crossing (after $x = 0$) occurs at $1.22\pi$.

The intensity pattern described by (11.31) contains the factor $2J_1(x)/x$, where $x$ represents the combination $k\ell\rho/2f$. As noticed in Fig. 11.8, $J_1(x)$ goes to zero at $x = 0$. Thus, we have a zero-divided-by-zero situation when evaluating
Figure 11.8 (a) First-order Bessel function. (b) Square of the Jinc function.
$2J_1(x)/x$ at the origin. This is similar to the sinc function (i.e. $\sin(x)/x$), which approaches one at the origin. In fact, $2J_1(x)/x$ is sometimes called the jinc function because it also approaches one at the origin. The square of the jinc is shown in Fig. 11.8b. This curve is proportional to the intensity described in (11.31). This pattern is sometimes called an Airy pattern after Sir George Biddell Airy (English, 1801-1892) who first described the pattern. As can be seen in Fig. 11.8b, the intensity quickly drops at larger radii.

We now return to the question of whether the images of two nearby stars as depicted in Fig. 11.7 can be distinguished. Since the peak in Fig. 11.8b is the dominant feature in the diffraction pattern, we will say that the two stars are resolved if the angle between them is enough to keep their respective diffraction peaks from seriously overlapping. We will use the criterion suggested by Lord Rayleigh for this purpose. His criterion for well-separated diffraction patterns requires the peak of one pattern to be no closer than the first zero of the other. This situation is shown in Fig. 11.9.

It is straightforward to find the angle that corresponds to this separation of diffraction patterns. Since the width of the diffraction patterns depends on the diameter of the lens as well as on the wavelength of the light, we expect the minimum angle between resolvable objects to depend on these parameters. To find this angle we set the argument of (11.31) equal to $1.22\pi$, the location of the first zero:
$$\frac{k\ell\rho}{2f} = 1.22\pi \tag{11.32}$$
Figure 11.9 The Rayleigh criterion for a circular aperture.
With a little rearranging we have
$$\theta_{\min} \equiv \frac{\rho}{f} = \frac{1.22\lambda}{\ell} \tag{11.33}$$
Here we have associated the ratio $\rho/f$ (i.e. the radius of the diffraction pattern compared to the distance from the lens) with an angle. This angle is a measure of the angular extent of the diffraction pattern. The Rayleigh criterion requires that the diffraction patterns associated with two images be separated by at least this amount before we say that they are resolved. We therefore label the angle as $\theta_{\min}$.
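The factor 1.22 can be checked numerically. The sketch below, which is an illustration rather than part of the original derivation, evaluates $J_1$ from Bessel's integral representation with Simpson's rule, locates its first positive zero by bisection, and then applies the Rayleigh criterion. The function names, the bracketing interval, and the example aperture (a 10 cm lens at 550 nm) are assumptions.

```python
import math

def bessel_j1(x, steps=2000):
    """J1(x) = (1/pi) * integral_0^pi cos(tau - x*sin(tau)) dtau, via Simpson's rule.

    steps must be even.
    """
    h = math.pi / steps
    total = 0.0
    for i in range(steps + 1):
        tau = i * h
        if i == 0 or i == steps:
            weight = 1
        else:
            weight = 4 if i % 2 else 2
        total += weight * math.cos(tau - x * math.sin(tau))
    return total * h / (3 * math.pi)

def first_zero_of_j1(lo=3.0, hi=4.5, tol=1e-10):
    """Bisection for the first positive root of J1, assumed bracketed by [lo, hi]."""
    f_lo = bessel_j1(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = bessel_j1(mid)
        if (f_lo > 0) == (f_mid > 0):
            lo, f_lo = mid, f_mid      # root lies in the upper half
        else:
            hi = mid                   # root lies in the lower half
    return 0.5 * (lo + hi)

def rayleigh_angle(wavelength, diameter):
    """Minimum resolvable angle 1.22*lambda/diameter, eq. (11.33), in radians."""
    return 1.22 * wavelength / diameter

if __name__ == "__main__":
    root = first_zero_of_j1()
    print("first zero of J1:", root, "=", root / math.pi, "* pi")
    print("jinc near the origin:", 2 * bessel_j1(1e-3) / 1e-3)
    print("theta_min for a 10 cm lens at 550 nm [rad]:", rayleigh_angle(550e-9, 0.10))
```

A library routine for Bessel functions would normally be preferable; the integral form is used here only to keep the example self-contained.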
11.6 The Array Theorem

In this section we develop the array theorem, which is used for calculating the Fraunhofer diffraction from an array of $N$ identical apertures. The array theorem is remarkable in itself, but our purpose in studying it is for its application to diffraction gratings, discussed in the next section. Conceptually, a grating may be thought of as a mask with an array of identical slits. This is similar to a Young's double-slit setup, only an arbitrarily large number of slits may be used. As far as the array theorem is concerned, however, the apertures can have any shape, as suggested by Fig. 11.10.

Consider $N$ apertures in a mask, each with the identical field distribution described by
$$E_{\text{aperture}}(x', y', 0) \tag{11.34}$$
Each identical aperture has a unique location on the mask. Let the location of the $n$th aperture be designated by the coordinates $(x'_n, y'_n)$. Since each aperture is
Figure 11.10 Array of identical apertures.
identical, we can conveniently write the total field by summing over the same individual pattern displaced repeatedly according to the locations of the individual apertures:
$$E(x', y', 0) = \sum_{n=1}^{N} E_{\text{aperture}}(x' - x'_n,\, y' - y'_n,\, 0) \tag{11.35}$$
Let us compute the Fraunhofer diffraction pattern produced by the field described by (11.35). However, we want to do this in a general case so that we can delay picking the specific aperture shape until a later time (i.e. (11.34)). Upon inserting (11.35) into the Fraunhofer diffraction formula (10.16) we obtain
$$E(x, y, z) = -i\frac{e^{ikz} e^{i\frac{k}{2z}(x^2 + y^2)}}{\lambda z} \sum_{n=1}^{N} \int_{-\infty}^{\infty} dx' \int_{-\infty}^{\infty} dy'\, E_{\text{aperture}}(x' - x'_n,\, y' - y'_n,\, 0)\, e^{-i\frac{k}{z}(xx' + yy')} \tag{11.36}$$
where we have taken the summation out in front of the integral. To proceed further, let us make the following change of variables:
$$x'' \equiv x' - x'_n, \qquad y'' \equiv y' - y'_n \tag{11.37}$$
With the use of these new variables, (11.36) becomes
$$E(x, y, z) = -i\frac{e^{ikz} e^{i\frac{k}{2z}(x^2 + y^2)}}{\lambda z} \sum_{n=1}^{N} \int_{-\infty}^{\infty} dx'' \int_{-\infty}^{\infty} dy''\, E_{\text{aperture}}(x'', y'', 0)\, e^{-i\frac{k}{z}\left[x(x'' + x'_n) + y(y'' + y'_n)\right]} \tag{11.38}$$
Figure 11.11 Transmission grating.
We next pull a constant factor out of the integrals and we arrive at our final result. With a slight re-arrangement (and with a trivial exchange of $x'$ for $x''$ and $y'$ for $y''$), (11.38) can be rewritten as
$$E(x, y, z) = \left[\sum_{n=1}^{N} e^{-i\frac{k}{z}(xx'_n + yy'_n)}\right] \left[-i\frac{e^{ikz} e^{i\frac{k}{2z}(x^2 + y^2)}}{\lambda z} \int_{-\infty}^{\infty} dx' \int_{-\infty}^{\infty} dy'\, E_{\text{aperture}}(x', y', 0)\, e^{-i\frac{k}{z}(xx' + yy')}\right] \tag{11.39}$$
Equation (11.39) is known as the array theorem. Note that the second factor in brackets is exactly the Fraunhofer diffraction pattern from just one of the identical apertures. When more than one identical aperture is present, we need only evaluate the Fraunhofer diffraction formula for an individual aperture. Then, the single-aperture result is multiplied by the summation in front, which contains the information about the number of apertures and their respective positions.
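The content of (11.39) can be verified numerically in one dimension. The sketch below integrates the Fourier kernel over an array of displaced apertures directly, then compares the result with the phase-factor sum multiplied by the single-aperture integral. Everything here is an illustrative assumption: the common prefactor $-i e^{ikz} e^{i\frac{k}{2z}x^2}/\lambda z$ is omitted from both sides, $\kappa$ stands for $kx/z$, and the aperture shape, positions, and units are arbitrary.

```python
import cmath
import math

def fraunhofer_1d(field, x_min, x_max, kappa, steps=4000):
    """Midpoint-rule estimate of integral field(x') * exp(-i*kappa*x') dx', kappa = k*x/z."""
    dx = (x_max - x_min) / steps
    total = 0j
    for i in range(steps):
        xp = x_min + (i + 0.5) * dx
        total += field(xp) * cmath.exp(-1j * kappa * xp)
    return total * dx

def single_aperture(xp, width=1.0):
    """Hypothetical aperture field: a smooth cosine-squared bump (any shape would do)."""
    return math.cos(math.pi * xp / width) ** 2 if abs(xp) < width / 2 else 0.0

# Hypothetical aperture centers x'_n; they need not be evenly spaced.
positions = [-3.0, -0.5, 2.0]

def array_field(xp):
    """Total field (11.35): the same aperture displaced to each position."""
    return sum(single_aperture(xp - xn) for xn in positions)

kappa = 1.7  # arbitrary observation direction
direct = fraunhofer_1d(array_field, -5.0, 5.0, kappa)
phase_sum = sum(cmath.exp(-1j * kappa * xn) for xn in positions)
via_theorem = phase_sum * fraunhofer_1d(single_aperture, -5.0, 5.0, kappa)

if __name__ == "__main__":
    print("direct integration :", direct)
    print("array theorem      :", via_theorem)
    print("difference         :", abs(direct - via_theorem))
```

The two numbers agree to rounding error for any choice of `positions`, which is the practical payoff of the theorem: the aperture integral is done once, and the array geometry enters only through the sum of phase factors.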
11.7 Diffraction Grating

In this section we will use the array theorem to calculate the diffraction from a grating comprised of an array of equally spaced identical slits. An array of uniformly spaced slits is called a transmission grating (see Fig. 11.11). Reflection gratings are similar, being composed of an array of narrow rectangular mirrors that behave similarly to the slits.
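As was calculated in P 11.5, the Fraunhofer diffraction pattern from a single aperture is given by
$$E_{\text{aperture}}(x, y, z) = -iE_0 \frac{\Delta x\, \Delta y\, e^{ikz}}{\lambda z}\, e^{i\frac{k}{2z}(x^2 + y^2)}\, \mathrm{sinc}\!\left(\frac{\pi \Delta x}{\lambda z} x\right) \mathrm{sinc}\!\left(\frac{\pi \Delta y}{\lambda z} y\right) \tag{11.40}$$
The only part of (11.39) that remains to be evaluated is the summation out in front. Let the apertures be positioned at
$$x'_n = \left(n - \frac{N + 1}{2}\right) h, \qquad y'_n = 0 \tag{11.41}$$
where $N$ is the total number of slits. Then the summation in the array theorem, (11.39), becomes
$$\sum_{n=1}^{N} e^{-i\frac{k}{z}(xx'_n + yy'_n)} = e^{i\frac{khx}{z}\frac{N + 1}{2}} \sum_{n=1}^{N} e^{-i\frac{khx}{z} n} \tag{11.42}$$
This summation is recognized as a geometric sum, which can be performed using formula (0.59). Equation (11.42) then simplifies to
$$\begin{aligned}
\sum_{n=1}^{N} e^{-i\frac{k}{z}(xx'_n + yy'_n)}
&= e^{i\frac{khx}{z}\frac{N + 1}{2}}\, e^{-i\frac{khx}{z}}\, \frac{e^{-iN\frac{khx}{z}} - 1}{e^{-i\frac{khx}{z}} - 1} \\
&= \frac{e^{-iN\frac{khx}{2z}} - e^{iN\frac{khx}{2z}}}{e^{-i\frac{khx}{2z}} - e^{i\frac{khx}{2z}}} \\
&= \frac{\sin\left(N\frac{khx}{2z}\right)}{\sin\left(\frac{khx}{2z}\right)}
\end{aligned} \tag{11.43}$$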
By combining (11.40) and (11.43) we obtain the full Fraunhofer diffraction pattern for a diffraction grating. The expression for the field is
$$E(x, y, z) = \frac{\sin\left(N\frac{khx}{2z}\right)}{\sin\left(\frac{khx}{2z}\right)} \left[-iE_0 \frac{\Delta x\, \Delta y\, e^{ikz}}{\lambda z}\, e^{i\frac{k}{2z}(x^2 + y^2)}\, \mathrm{sinc}\!\left(\frac{\pi \Delta x}{\lambda z} x\right) \mathrm{sinc}\!\left(\frac{\pi \Delta y}{\lambda z} y\right)\right] \tag{11.44}$$
Let's consider a grating with the slits oriented in the $y$-direction, with $\Delta y$ such that the last sinc function in Eq. (11.44) goes to one.1 The intensity pattern in the horizontal direction can then be written in terms of the peak intensity of the diffraction pattern on the screen:
$$I(x) = I_{\text{peak}}\, \mathrm{sinc}^2\!\left(\frac{\pi \Delta x}{\lambda z} x\right) \frac{\sin^2\left(N\frac{\pi h x}{\lambda z}\right)}{N^2 \sin^2\left(\frac{\pi h x}{\lambda z}\right)} \tag{11.45}$$

1 This is mostly the right idea, but is still a bit of a fake. In fact, the field often does not have a uniform phase along the entire slit in the $y$-dimension, so our use of the function $\mathrm{sinc}\left(\pi \Delta y\, y/\lambda z\right)$ was inappropriate to begin with. The energy in a real spectrometer is usually spread out in a diffuse pattern in the $y$-dimension. However, its form in $y$ is of little relevance; the spectral information is carried in the $x$-dimension only.
Note that $\lim_{\alpha \to 0} \frac{\sin N\alpha}{\sin \alpha} = N$, so we have placed $N^2$ in the denominator when introducing our definition of $I_{\text{peak}}$, which represents the intensity on the screen at $x = 0$. In principle, the intensity $I_{\text{peak}}$ is a function of $y$ and depends on the exact details of how the slits are illuminated as a function of $y$, but this is usually not of interest as long as we stay with a given value of $y$ as we scan along $x$. It is left as an exercise to study the functional form of (11.45), especially how the number of slits $N$ influences the behavior. The case of $N = 2$ describes the diffraction pattern for a Young's double-slit experiment. We now have a description of the Young's two-slit pattern in the case that the slits have finite openings of width $\Delta x$ rather than infinitely narrow ones.

A final note: You may wonder why we are interested in Fraunhofer diffraction from a grating. The reason is that we are actually interested in separating different wavelengths by observing their distinct diffraction patterns separated in space. In order to achieve good spatial separation between light of different wavelengths, it is necessary to allow the light to propagate a far distance. Optimal separation (the maximum possible) occurs therefore in the Fraunhofer regime.
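For readers who want to explore (11.45) numerically, the following sketch evaluates the grating intensity while handling the two zero-over-zero limits explicitly. The function names, the threshold values, and the sample grating (20 micron spacing, slit width half the spacing, 500 nm light, screen at 1 m) are assumptions made only for this illustration.

```python
import math

def sinc(u):
    """sin(u)/u with the u -> 0 limit handled explicitly."""
    return 1.0 if abs(u) < 1e-12 else math.sin(u) / u

def multislit_factor(alpha, N):
    """sin^2(N*alpha) / (N^2 sin^2(alpha)); its limit is 1 wherever sin(alpha) = 0."""
    s = math.sin(alpha)
    if abs(s) < 1e-9:
        return 1.0
    return (math.sin(N * alpha) / (N * s)) ** 2

def grating_intensity(x, z, wavelength, h, slit_width, N, I_peak=1.0):
    """Intensity (11.45) on a screen a distance z away, for N slits of spacing h."""
    alpha = math.pi * h * x / (wavelength * z)
    beta = math.pi * slit_width * x / (wavelength * z)
    return I_peak * sinc(beta) ** 2 * multislit_factor(alpha, N)

if __name__ == "__main__":
    lam, z, h = 500e-9, 1.0, 20e-6        # illustrative values
    x_first_order = lam * z / h           # position of the m = 1 peak
    for N in (2, 5, 10, 100):
        # Sample just off the first-order peak to see how it sharpens with N.
        x = x_first_order * 1.002
        print("N =", N, " I/I_peak near first order:",
              grating_intensity(x, z, lam, h, h / 2, N))
```

For $N = 2$ the multi-slit factor reduces to $\cos^2\alpha$, the familiar double-slit fringe pattern, and the single-slit sinc envelope sets the height of each principal peak.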
11.8 Spectrometers
The formula (11.45) can be exploited to make wavelength measurements. This forms the basis of a diffraction grating spectrometer. A spectrometer has relatively poor resolving power compared to a Fabry-Perot interferometer. Nevertheless, a spectrometer is not hampered by the serious limitation imposed by free spectral range. Therefore, it is able to measure a wide range of wavelengths simultaneously. The Fabry-Perot interferometer and the grating spectrometer in this sense are complementary, the one being able to make very precise measurements within a narrow wavelength range and the other being able to characterize wide ranges of wavelengths simultaneously.

To appreciate how a spectrometer works, consider the Fraunhofer diffraction from a grating, described by (11.45). The structure of the diffraction pattern gives rise to peaks. For example, Fig. 11.12a shows the diffraction peaks from a Young's double slit (i.e. $N = 2$). The diffraction pattern is comprised of the typical Young's double-slit pattern multiplied by the diffraction pattern of a single slit, according to the array theorem. (Note that $\sin^2\left(2\frac{\pi h x}{\lambda z}\right)/\sin^2\left(\frac{\pi h x}{\lambda z}\right) = 4\cos^2\left(\frac{\pi h x}{\lambda z}\right)$.)

As the number of slits $N$ is increased, the peaks seen in the Young's double-slit pattern tend to sharpen with additional smaller peaks appearing in between. Figure 11.12b shows the case for $N = 5$. The more significant peaks occur when $\sin(\pi h x/\lambda z)$ in the denominator of (11.45) goes to zero. Keep in mind that the numerator goes to zero at the same places, creating a zero-over-zero situation, so the peaks are not infinitely tall. With larger values of $N$, the peaks can become extremely sharp, and the small secondary peaks in between are smaller in comparison. Fig. 11.12c shows the case of $N = 10$ and Fig. 11.12d shows the case of $N = 100$.
Figure 11.12 Diffraction through various numbers of slits, each with $\Delta x = h/2$ (slit widths half the separation). The dotted line shows the single slit diffraction pattern. (a) Diffraction from a double slit. (b) Diffraction from 5 slits. (c) Diffraction from 10 slits. (d) Diffraction from 100 slits.
When very many slits are used, the diffraction pattern becomes very useful for measuring spectra of light. Keep in mind that the position of the diffraction peaks depends on wavelength (except for the center peak at x = 0). If light of different wavelengths is simultaneously present, then the diffraction peaks associated with different wavelengths appear in different locations. It helps to have very many slits involved (i.e. large N) so that the diffraction peaks are sharply defined. Then closely spaced wavelengths can be more easily distinguished.
Consider the inset in Fig. 11.12d, which gives a close-up view of the first-order diffraction peak for N = 100. The location of this peak on a distant screen varies with the wavelength of the light. How much must the wavelength change to cause the peak to move by half of its width, as marked in the inset of Fig. 11.12d? We will say that this is the minimum separation of wavelengths that still allows the two peaks to be distinguished.
Let us solve for this minimum distinguishable wavelength difference. As mentioned, the main diffraction peaks occur when the denominator of (11.45) [i.e. $\sin^2(\pi h x/\lambda z)$] goes to zero. For wavelength $\lambda_0$, the $m$th peak is therefore located at
$$ \frac{\pi h x}{\lambda_0 z} = m\pi \quad\Rightarrow\quad \lambda_0 = \frac{hx}{mz} \qquad (11.46) $$
The numerator of (11.45), $\sin^2(N\pi h x/\lambda z)$, also goes to zero at this same location, so the expression avoids going to infinity. The first zero to the sides of the main peaks (see Fig. 11.12d) occurs when
$$ N\frac{\pi h x}{\lambda z} = Nm\pi + \pi \quad\Rightarrow\quad \lambda = \frac{Nhx}{(Nm+1)\,z} \qquad (11.47) $$
The wavelength difference that shifts the peak by this amount (from peak center to the adjacent zero) is then
$$ \Delta\lambda \equiv \lambda_0 - \lambda = \frac{hx}{mz} - \frac{Nhx}{(Nm+1)\,z} = \frac{hx}{mz\,(Nm+1)} = \frac{\lambda_0}{Nm+1} \approx \frac{\lambda_0}{Nm} \qquad (11.48) $$
This is the minimum difference in wavelength that we can hope to distinguish if two peaks of different wavelengths lie side by side. As we did for the Fabry-Perot interferometer, we can define the resolving power of the diffraction grating as
$$ RP \equiv \frac{\lambda}{\Delta\lambda} = mN \qquad (11.49) $$
The resolving power is proportional to the number of slits illuminated on the diffraction grating. Using a larger diffraction order m also improves the resolving power.
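As a quick numerical illustration of (11.49), the sketch below evaluates the resolving power and the corresponding minimum wavelength difference. The grating parameters (600 lines per millimeter, 20 mm illuminated, 600 nm light) and the function names are hypothetical, chosen only to give round numbers.

```python
def grating_resolving_power(order, slits_illuminated):
    """Resolving power RP = m N, as in (11.49)."""
    return order * slits_illuminated

def min_wavelength_difference(wavelength, order, slits_illuminated):
    """Smallest distinguishable wavelength difference, lambda / (m N)."""
    return wavelength / grating_resolving_power(order, slits_illuminated)

# Hypothetical grating: 600 lines/mm with 20 mm illuminated, used in first order
lines_per_mm = 600
illuminated_width_mm = 20
n_slits = lines_per_mm * illuminated_width_mm

rp = grating_resolving_power(1, n_slits)
d_lambda_nm = min_wavelength_difference(600.0, 1, n_slits)  # wavelength in nm
print(f"N = {n_slits}, RP = {rp}, minimum wavelength difference = {d_lambda_nm} nm")
```

Only the number of illuminated slits and the order enter; the grating period affects the result only through how many slits fit inside the beam.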
Appendix 11.A ABCD Law for Gaussian Beams

In this section we discuss and justify the ABCD law for Gaussian beams. The law enables one to predict the parameters of a Gaussian beam that exits from an optical system, given the parameters of an input Gaussian beam. To make the prediction, one needs only the ABCD matrix for the optical system, taken as a whole. The system may be arbitrarily complex with many optical components.
Figure 11.13 Gaussian laser beam traversing an optical system described by an ABCD matrix. The dark lines represent the incoming and exiting beams. The gray line represents where the exiting beam appears to have been.
At first, it may seem unlikely that such a prediction should be possible, since ABCD matrices were introduced to describe the propagation of rays, whereas Gaussian beams are governed by the laws of diffraction. As an example of this dichotomy, consider a collimated Gaussian beam that traverses a converging lens. By ray theory, one expects the Gaussian beam to focus near the focal point of the lens. However, a collimated beam by definition is already in the act of going through focus. In the absence of the lens, there is a tendency for the beam to grow via diffraction, especially if the beam waist is small. This tendency competes with the focusing effect of the lens, and a new beam waist can occur at a wide range of locations, depending on the exact outcome of this competition.
A Gaussian beam is characterized by its Rayleigh range $z_0$. From this, the beam waist radius $w_0$ may be extracted via (11.13), assuming the wavelength is known. Suppose that a Gaussian beam encounters an optical system at position $z$, referenced to the position of the beam's waist as shown in Fig. 11.13. The beam exiting from the system, in general, has a new Rayleigh range $z_0'$. The waist of the new beam also occurs at a different location. Let $z'$ denote the location of the exit of the optical system, referenced to the location of the waist of the new beam. If the exiting beam diverges as in Fig. 11.13, then it emerges from a virtual beam waist located before the exit point of the system. In this case, $z'$ is taken to be positive. On the other hand, if the emerging beam converges to an actual waist, then $z'$ is taken to be negative since the exit point of the system occurs before the focus.
The ABCD law is embodied in the following relationship:
$$ z' - i z_0' = \frac{A\,(z - i z_0) + B}{C\,(z - i z_0) + D} \qquad (11.50) $$
where A, B, C, and D are the matrix elements of the optical system. The imaginary number $i \equiv \sqrt{-1}$ imbues the law with complex arithmetic. It makes two equations from one, since the real and imaginary parts of (11.50) must separately be equal.
We now prove the ABCD law. We begin by showing that the law holds for two specific ABCD matrices. First, consider the matrix for propagation through a distance d:
$$ \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & d \\ 0 & 1 \end{bmatrix} \qquad (11.51) $$
We know that simple propagation has minimal effect on a beam. The Rayleigh range is unchanged, so we expect that the ABCD law should give $z_0' = z_0$. The propagation through a distance d modifies the beam position by $z' = z + d$. We now check that the ABCD law agrees with these results by inserting (11.51) into (11.50):
$$ z' - i z_0' = \frac{1\,(z - i z_0) + d}{0\,(z - i z_0) + 1} = z + d - i z_0 \quad \text{(propagation through distance } d\text{)} \qquad (11.52) $$
Thus, the law holds in this case. Next we consider the ABCD matrix of a thin lens (or a curved mirror):
$$ \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix} \qquad (11.53) $$
A beam that traverses a thin lens undergoes the phase shift $-k\rho^2/2f$, according to (11.26). This modifies the original phase of the wave front $k\rho^2/2R(z)$, seen in (11.9). The phase of the exiting beam is therefore
$$ \frac{k\rho^2}{2R(z')} = \frac{k\rho^2}{2R(z)} - \frac{k\rho^2}{2f} \qquad (11.54) $$
where we do not keep track of unimportant overall phases such as $kz$ or $kz'$. With (11.12) this relationship reduces to
$$ \frac{1}{R(z')} = \frac{1}{R(z)} - \frac{1}{f} \quad\Rightarrow\quad \frac{1}{z' + z_0'^2/z'} = \frac{1}{z + z_0^2/z} - \frac{1}{f} \qquad (11.55) $$
In addition to this relationship, the local radius of the beam given by (11.11) cannot change while traversing the thin lens. Therefore,
$$ w'(z') = w(z) \quad\Rightarrow\quad z_0'\left(1 + \frac{z'^2}{z_0'^2}\right) = z_0\left(1 + \frac{z^2}{z_0^2}\right) \qquad (11.56) $$
On the other hand, the ABCD law for the thin lens gives
$$ z' - i z_0' = \frac{1\,(z - i z_0) + 0}{-(1/f)\,(z - i z_0) + 1} \quad \text{(traversing a thin lens with focal length } f\text{)} \qquad (11.57) $$
It is left as an exercise (see P 11.18) to show that (11.57) is consistent with (11.55) and (11.56).
So far we have shown that the ABCD law works for two specific examples, namely propagation through a distance d and transmission through a thin lens with focal length f. From these elements we can derive more complicated systems. However, the ABCD matrix for a thick lens cannot be constructed from just these two elements. We can, though, construct the matrix for a thick lens if we sandwich a thick window (as opposed to empty space) between two thin lenses. The proof that the matrix for a thick window obeys the ABCD law is left as an exercise (see P 11.21). With these relatively few elements, essentially any optical system can be constructed, provided that the beam propagation begins and ends in the same index of refraction.
To complete our proof of the general ABCD law, we need only show that when it is applied to the compound element
$$ \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} A_2 & B_2 \\ C_2 & D_2 \end{bmatrix} \begin{bmatrix} A_1 & B_1 \\ C_1 & D_1 \end{bmatrix} = \begin{bmatrix} A_2 A_1 + B_2 C_1 & A_2 B_1 + B_2 D_1 \\ C_2 A_1 + D_2 C_1 & C_2 B_1 + D_2 D_1 \end{bmatrix} \qquad (11.58) $$
it gives the same answer as when the law is applied sequentially, first on $\begin{bmatrix} A_1 & B_1 \\ C_1 & D_1 \end{bmatrix}$ and then on $\begin{bmatrix} A_2 & B_2 \\ C_2 & D_2 \end{bmatrix}$. Explicitly, we have
$$ \begin{aligned} z'' - i z_0'' &= \frac{A_2\,(z' - i z_0') + B_2}{C_2\,(z' - i z_0') + D_2} = \frac{A_2 \dfrac{A_1 (z - i z_0) + B_1}{C_1 (z - i z_0) + D_1} + B_2}{C_2 \dfrac{A_1 (z - i z_0) + B_1}{C_1 (z - i z_0) + D_1} + D_2} \\ &= \frac{A_2\left[A_1 (z - i z_0) + B_1\right] + B_2\left[C_1 (z - i z_0) + D_1\right]}{C_2\left[A_1 (z - i z_0) + B_1\right] + D_2\left[C_1 (z - i z_0) + D_1\right]} \\ &= \frac{(A_2 A_1 + B_2 C_1)(z - i z_0) + (A_2 B_1 + B_2 D_1)}{(C_2 A_1 + D_2 C_1)(z - i z_0) + (C_2 B_1 + D_2 D_1)} \\ &= \frac{A\,(z - i z_0) + B}{C\,(z - i z_0) + D} \end{aligned} \qquad (11.59) $$
Thus, we can construct any ABCD matrix that we wish from matrices that are known to obey the ABCD law. The resulting matrix also obeys the ABCD law.
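The composition argument can also be checked numerically. The sketch below applies (11.50) with Python's built-in complex numbers and compares a lens followed by free propagation, taken one element at a time, against the single compound matrix. The function names and the numerical values are illustrative assumptions, not part of the text.

```python
def abcd_gaussian(z, z0, matrix):
    """Apply the ABCD law (11.50) to a Gaussian beam.

    z, z0  -- entrance position relative to the waist, and Rayleigh range
    matrix -- ((A, B), (C, D)) for the whole optical system
    Returns (z_new, z0_new) for the exiting beam.
    """
    (a, b), (c, d) = matrix
    q = complex(z, -z0)                # the combination z - i z0
    q_new = (a * q + b) / (c * q + d)  # right-hand side of (11.50)
    return q_new.real, -q_new.imag

def compose(second, first):
    """Matrix product second * first (the beam meets `first` first)."""
    (a2, b2), (c2, d2) = second
    (a1, b1), (c1, d1) = first
    return ((a2 * a1 + b2 * c1, a2 * b1 + b2 * d1),
            (c2 * a1 + d2 * c1, c2 * b1 + d2 * d1))

def propagation(d):
    """Matrix (11.51) for propagation through a distance d."""
    return ((1.0, d), (0.0, 1.0))

def thin_lens(f):
    """Matrix (11.53) for a thin lens of focal length f."""
    return ((1.0, 0.0), (-1.0 / f, 1.0))

# Illustrative numbers in arbitrary but consistent length units
z, z0 = 0.3, 2.0
lens, gap = thin_lens(0.5), propagation(0.25)

step_by_step = abcd_gaussian(*abcd_gaussian(z, z0, lens), gap)
compound = abcd_gaussian(z, z0, compose(gap, lens))
print(step_by_step, compound)
```

The two printed pairs should agree to rounding error, which is the content of (11.59). Calling `abcd_gaussian(0.0, z0, thin_lens(f))` gives the collimated-beam case, where a negative returned position indicates a real waist beyond the lens.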
Exercises
Exercises for 11.3 Gaussian Laser Beams
P11.1 (a) Confirm that (11.9) reduces to (11.1) when z = 0. (b) Take the limit $z \gg z_0$ to find the field far from the laser focus. (c) Define the ratio of z to the (far-away) beam diameter as the f-number: $f\# \equiv \lim_{z\to\infty} z/2w(z)$. Write the beam waist $w_0$ in terms of the f-number and the wavelength.
Figure 11.14
NOTE: You now have a convenient way to predict the size of a laser focus by measuring the cone angle of the beam. However, in an experimental setting you may be very surprised at how badly a beam focuses compared to the theoretical prediction (due to aberrations, etc.). It is always good practice to actually measure your focus if its size is important to the experiment.
P11.2 Use the Fraunhofer integral formula (either (10.16) or (10.24)) to determine the far-field pattern of a Gaussian laser focus (11.1). HINT: The answer should agree with P 11.1 part (b).
L11.3 Consider the following setup where a diverging laser beam is collimated using an uncoated lens. A double reflection from both surfaces of the lens (known as a ghost) comes out in the forward direction, focusing after a short distance. Use a CCD camera to study this focused beam. The collimated beam serves as a reference to reveal the phase of the focused beam through interference. Because the weak ghost beam concentrates near its focus, the two beams can have similar intensities for optimal interference effects. The ghost beam $E_1(\rho, z)$ is described by (11.9), where the origin is at the focus. Let the collimated beam be approximated as a plane wave $E_2 e^{ikz + i\phi}$, where $\phi$ is the relative phase between the two beams. The net intensity is then $I_t(\rho, z) \propto \left|E_1(\rho, z) + E_2 e^{ikz + i\phi}\right|^2$, or
$$ I_t(\rho, z) = I_2 + I_1(\rho, z) + 2\sqrt{I_2\, I_1(\rho, z)}\, \cos\left[\frac{k\rho^2}{2R(z)} - \tan^{-1}\frac{z}{z_0} - \phi\right] $$
where $I_1(\rho, z)$ is given by (11.14). We now have a formula that retains both $R(z)$ and the Gouy shift $\tan^{-1} z/z_0$, which are not present in the intensity distribution of a single beam (see (11.14)).
Figure 11.15
(a) Determine the f-number for the ghost beam (see P 11.1 part (c)). Use this measurement to predict a value for $w_0$. HINT: You know that at the lens, the focusing beam is the same size as the collimated beam.
(b) Measure the actual spot size $w_0$ at the focus. How does it compare to the prediction? HINT: Before measuring the spot size, make a minor adjustment to the tilt of the lens. This controls the relative phase between the two beams, which you will set to $\phi = \pi/2$ so that at the focus the cosine term vanishes and the two beams don't interfere. This is accomplished if the center of the interference pattern is as dark as possible either far before or far after the focus. In this case, the intensities of the individual beams simply add at the focus (only at z = 0), with the small profile of the focused beam sitting on top of the wide profile of the collimated beam.
(c) Observe the effect of the Gouy shift. Since $\tan^{-1} z/z_0$ varies over a range of $\pi$, you should see that the ring pattern inverts before and after the focus. The bright rings exchange with the dark ones.
(d) Predict the Rayleigh range $z_0$ and check that the radius of curvature $R(z) \equiv z + z_0^2/z$ agrees with measurement. HINT: As you look at different radii $\rho$, the only interference term that varies is $k\rho^2/2R(z)$. If you count N fringes out to a radius $\rho$, then $k\rho^2/2R(z)$ has varied by $2\pi N$. You can then compute $R(z)$ and compare it to the prediction. You should see pictures like the following:
Figure 11.16
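Exercises for 11.4 Fraunhofer Diffraction Through a Lens
P11.4 Fill in the steps leading to (11.29) from (11.28). Show that the intensity distribution (11.20) is consistent with (11.29).
P11.5 Calculate the Fraunhofer diffraction field and intensity patterns for a rectangular aperture (dimensions $\Delta x$ by $\Delta y$) illuminated by a plane wave $E_0$.
HINT: Use (10.16):
$$ E(x, y, z) = -i\,\frac{E_0\, e^{ikz}\, e^{i\frac{k}{2z}(x^2 + y^2)}}{\lambda z} \int_{-\Delta x/2}^{\Delta x/2} dx'\, e^{-i\frac{kx}{z}x'} \int_{-\Delta y/2}^{\Delta y/2} dy'\, e^{-i\frac{ky}{z}y'} $$
Answer: $I(x, y, z) = I_0\, \dfrac{\Delta x^2\, \Delta y^2}{\lambda^2 z^2}\, \mathrm{sinc}^2\!\left(\dfrac{\pi \Delta x}{\lambda z}\, x\right) \mathrm{sinc}^2\!\left(\dfrac{\pi \Delta y}{\lambda z}\, y\right)$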
P11.6 Calculate the Fraunhofer diffraction intensity pattern for a circular aperture (diameter $\ell$) illuminated by a plane wave $E_0$. HINT: Use (10.24) and (0.55).
Answer: $I(\rho, z) = I_0 \left(\dfrac{\pi \ell^2}{4\lambda z}\right)^2 \left[\dfrac{2 J_1(k\ell\rho/2z)}{k\ell\rho/2z}\right]^2$. The function $2J_1(x)/x$ (sometimes called the jinc function) looks similar to the sinc function except that its first zero is at $x = 1.22\pi$ rather than at $\pi$. Note that $\lim_{x \to 0} 2J_1(x)/x = 1$.
L11.7 Set up a collimated plane wave in the laboratory using a HeNe laser ($\lambda = 633$ nm) and appropriate lenses. (a) Choose a rectangular aperture ($\Delta x$ by $\Delta y$) and place it in the plane wave. Observe the Fraunhofer diffraction on a very far away screen (i.e., where $z \gg \frac{k}{2}(\text{aperture radius})^2$ is satisfied). Check that the location of the zeros agrees with the result from P 11.5. (b) Place a lens in the beam after the aperture. Use a CCD camera to observe the Fraunhofer diffraction profile at the focus of the lens. Check that the location of the zeros agrees with the result from P 11.5, replacing z with f. (c) Repeat parts (a) and (b) using a circular aperture with diameter $\ell$. Check the position of the first zero.
Exercises for 11.5 Resolution of a Telescope
P11.8 (a) What minimum telescope diameter would be required to distinguish a Jupiter-like planet (orbital radius $8 \times 10^8$ km) from its star if they are 10 light-years away? Take the wavelength to be $\lambda = 500$ nm. NOTE: The unequal brightness is the biggest technical challenge. (b) On the night of April 18, 1775, a signal was sent from the Old North Church steeple to Paul Revere, who was 1.8 miles away: "One if by land, two if by sea." If in the dark, Paul's pupils had 4 mm diameters, what is the minimum possible separation between the two lanterns that would allow him to correctly interpret the signal? Assume that the predominant wavelength of the lanterns was 580 nm. HINT: In the eye, the index of refraction is about 1.33 so the wavelength is shorter. This leads to a smaller diffraction pattern on the retina. However, in accordance with Snell's law, two rays separated by an angle $\theta$ outside of the eye are separated by an angle $\theta/1.33$ inside the eye. The two rays then hit the retina closer together. As far as resolution is concerned, the two effects exactly compensate.
L11.9 Simulate two stars with laser beams ($\lambda = 633$ nm). Align them nearly parallel with a small lateral displacement. Send the beams down a long corridor until diffraction causes both beams to grow into one another so that it is no longer apparent that they are from two distinct sources. Use a lens to image the two sources onto a CCD camera. The camera should be placed close to the focal plane of the lens. Use a variable iris near the lens to create different pupil openings.
Figure 11.17
Experimentally determine the pupil diameter that just allows you to resolve the two sources according to the Rayleigh criterion. Check your measurement against theoretical prediction. HINT: The angular separation between the two sources is obtained by dividing propagation distance into the lateral separation of the beams.
P11.10 (a) A monochromatic plane wave with intensity $I_0$ and wavelength $\lambda$ is incident on a circular aperture of diameter $\ell$ followed by a lens of focal length f. Write the intensity distribution at a distance f behind the lens. (b) You wish to spatially filter the beam such that, when it emerges from the focus, it varies smoothly without diffraction rings or hard edges. A pinhole is placed at the focus, which transmits only the central portion of the Airy pattern (inside of the first zero). Calculate the intensity pattern at a distance f after the pinhole using the approximation given in the hint below.
Figure 11.18
HINT: A reasonably good approximation of the transmitted field is that of a Gaussian $E(\rho, 0) = E_f\, e^{-\rho^2/w_0^2}$, where $E_f$ is the magnitude of the field at the center of the focus found in part (a), and the width is $w_0 = 2\lambda f\#/\pi$ with $f\# \equiv f/\ell$. The figure below shows how well the Gaussian approximation fits the actual curve. We have assumed that the first aperture is a distance f before the lens so that at the focus after the lens the wave front is flat at the pinhole. To avoid integration, you may want to use the result of P 11.2 or P 11.1(b) to get the Fraunhofer limit of the Gaussian profile.
Figure 11.19
Exercises for 11.6 The Array Theorem
P11.11 Find the diffraction pattern created by an array of nine circles, each with radius a, which are centered at the following (x, y) coordinates: $(-b, b)$, $(0, b)$, $(b, b)$, $(-b, 0)$, $(0, 0)$, $(b, 0)$, $(-b, -b)$, $(0, -b)$, $(b, -b)$ (a is less than b). Make a plot of the result for the situation where (in some choice of units) a = 1, b = 5a, and k/d = 1. View the plot at different zoom levels to see the finer detail.
P11.12 A diffraction screen with apertures as arranged in Fig. 11.20 is illuminated with a plane wave. All the circles are of radius b, and the square, which is centered in the upper circle, is of side a. Light comes through all the circles and the square, but in the labeled shaded regions the light coming through is shifted to be 180° out of phase with light coming through the other regions. (Thus the square is 180° out of phase with the rest of the upper circle, and the left circle is 180° out of phase with the right circle.) The distances c and L as indicated below are also given.
Figure 11.20
(a) Find the far-field (Fraunhofer) diffraction pattern for the upper aperture alone, that is, the circle-square combination. (b) Find the far-field (Fraunhofer) diffraction pattern for the lower two apertures alone (omitting the upper square-circle combination). (c) Find the diffraction pattern for all the apertures together.
Exercises for 11.7 Diffraction Grating
P11.13 Consider Fraunhofer diffraction from a grating of N slits having widths $\Delta x$ and equal separations h. Make plots (label relevant points and scaling) of the intensity pattern for N = 1, N = 2, N = 5, and N = 1000 in the case where $h = 2\Delta x$, $\Delta x = 5\ \mu$m, and $\lambda = 500$ nm. Let the Fraunhofer diffraction be observed at the focus of a lens with focal length f = 100 cm. Do you expect $I_{\rm peak}$ to be the same value for all of these cases?
P11.14 For the case of N = 1000 in P 11.13, you wish to position a narrow slit at the focus of the lens so that it transmits only the first-order diffraction peak (i.e. at $khx/2f = \pi$). (a) How wide should the slit be if it is to be half the separation between the first intensity zeros to either side of the peak? (b) What small change in wavelength (away from $\lambda = 500$ nm) will cause the intensity peak to shift by the width of the slit found in part (a)?
P11.15 (a) A plane wave is incident on a screen of $N^2$ uniformly spaced identical rectangular apertures of dimension $\Delta x$ by $\Delta y$ (see figure below). Their positions are described by $x_n = h\left(n - \frac{N+1}{2}\right)$ and $y_m = s\left(m - \frac{N+1}{2}\right)$. Find the far-field (Fraunhofer) pattern of the light transmitted by the grid.
(b) You are looking at a distant sodium street lamp (somewhat monochromatic) through a curtain made from a fine mesh fabric with crossed threads. Make a sketch of what you expect to see (how the lamp will look to you). HINT: Remember that the lens of your eye causes the Fraunhofer diffraction of the mesh to appear at the retina.
Figure 11.21
Exercises for 11.8 Spectrometers
L11.16 (a) Use a HeNe laser to determine the period h of a reflective grating. (b) Give an estimate of the blaze angle on the grating. HINT: Assume that the blaze angle is optimized for first-order diffraction of the HeNe laser (on one side). The blaze angle enables a mirror-like reflection of the diffracted light on each groove.
Figure 11.22
(c) You have two mirrors of focal length 75 cm and the reflective grating in the lab. You also have two very narrow adjustable slits and the ability to tune the angle of the grating. Sketch how to use these items to make a monochromator (scans through one wavelength at a time). If the beam that hits the grating is 5 cm wide, what do you expect the ultimate resolving power of the monochromator to be in the wavelength range of 500 nm? Do not worry about aberrations such as astigmatism from using the mirrors off axis.
Figure 11.23
L11.17 Study the Jarrell Ash monochromator. Use a tungsten lamp as a source and observe how the instrument works by taking the entire top off. Do not breathe or touch when you do this. In the dark, trace the light inside of the instrument with a white plastic card and observe what happens when you change the wavelength setting. Place the top back on when you are done. (a) Predict the best theoretical resolving power that this instrument can achieve, assuming 1200 lines per millimeter. (b) What should the width $\Delta x$ of the entrance and exit slits be to obtain this resolving power? Assume $\lambda = 500$ nm. HINT: Set $\Delta x$ to be the distance between the peak and the first zero of the diffraction pattern at the exit slit for monochromatic light.
Exercises for 11.A ABCD Law for Gaussian Beams
P11.18 Find the solutions to (11.57) (i.e. find $z'$ and $z_0'$ in terms of $z$ and $z_0$). Show that the results are in agreement with (11.55) and (11.56).
P11.19 Assuming a collimated beam (i.e. z = 0 and beam waist $w_0$), find the location $L = -z'$ and size $w_0'$ of the resulting focus when the beam goes through a thin lens with focal length f.
L11.20 Place a lens in a HeNe laser beam soon after the exit mirror of the cavity. Characterize the focus of the resulting laser beam, and compare the results with the expressions derived in P 11.19.
P11.21 Prove the ABCD law for a beam propagating through a thick window of material with matrix
$$ \begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & d/n \\ 0 & 1 \end{bmatrix} $$
Review, Chapters 9-11
True and False Questions
R48 T or F: The eikonal equation and Fermat's principle depend on the assumption that the wavelength is relatively small compared to features of interest.
R49 T or F: The eikonal equation and Fermat's principle depend on the assumption that the index of refraction varies only gradually.
R50 T or F: The eikonal equation and Fermat's principle depend on the assumption that the angles involved must not be too big.
R51 T or F: The eikonal equation and Fermat's principle depend on the assumption that the polarization is important to the problem.
R52 T or F: Spherical aberration can be important even when the paraxial approximation works well.
R53 T or F: Chromatic aberration (the fact that refractive index depends on frequency) is an example of the violation of the paraxial approximation.
R54 T or F: The Fresnel approximation falls within the paraxial approximation.
R55 T or F: The imaging relation $1/f = 1/d_o + 1/d_i$ relies on the paraxial ray approximation.
R56 T or F: Spherical waves of the form $e^{ikR}/R$ are exact solutions to Maxwell's equations.
R57 T or F: Spherical waves can be used to understand diffraction from apertures that are relatively large compared to $\lambda$.
R58 T or F: Fresnel was the first to conceive of spherical waves.
R59 T or F: Spherical waves were accepted by Poisson immediately without experimental proof.
R60 T or F: The array theorem is useful for deriving the Fresnel diffraction from a grating.
R61 T or F: A diffraction grating with a period h smaller than a wavelength is ideal for making a spectrometer.
R62 T or F: The blaze on a reflection grating can improve the amount of energy in a desired order of diffraction.
R63 T or F: The resolving power of a spectrometer used in a particular diffraction order depends only on the number of lines illuminated (not wavelength or grating period).
R64 T or F: The central peak of the Fraunhofer diffraction from two narrow slits separated by spacing h has the same width as the central diffraction peak from a single slit with width $\Delta x = h$.
R65 T or F: The central peak of the Fraunhofer diffraction from a circular aperture of diameter $\ell$ has the same width as the central diffraction peak from a single slit with width $\Delta x = \ell$.
R66 T or F: The Fraunhofer diffraction pattern appearing at the focus of a lens varies in angular width, depending on the focal length of the lens used.
R67 T or F: Fraunhofer diffraction can be viewed as a spatial Fourier transform (or inverse transform if you prefer) on the field at the aperture.
Problems
R68 (a) Derive Snell's law using Fermat's principle. (b) Derive the law of reflection using Fermat's principle.
R69 (a) Consider a ray of light emitted from an object, which travels a distance $d_o$ before traversing a lens of focal length f and then traveling a distance $d_i$.
Figure 11.24
Write a vector equation relating $\begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix}$ to $\begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix}$. Be sure to simplify the equation so that only one ABCD matrix is involved.
HINT: $\begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix}$, $\begin{bmatrix} 1 & d \\ 0 & 1 \end{bmatrix}$
(b) Explain the requirement on the ABCD matrix in part (a) that ensures that an image appears for the distances chosen. From this requirement, extract a familiar constraint on $d_o$ and $d_i$. Also, make a reasonable definition for magnification M in terms of $y_1$ and $y_2$, then substitute to find M in terms of $d_o$ and $d_i$.
(c) A telescope is formed with two thin lenses separated by the sum of their focal lengths $f_1$ and $f_2$. Rays from a given far-away point all strike the first lens with essentially the same angle $\theta_1$. Angular magnification $M_\theta$ quantifies the telescope's purpose of enlarging the apparent angle between points in the field of view.
Figure 11.25
Give a sensible definition for angular magnification in terms of $\theta_1$ and $\theta_2$. Use the ABCD-matrix formulation to derive the angular magnification of the telescope in terms of $f_1$ and $f_2$.
R70 (a) Show that a system represented by a matrix $\begin{bmatrix} A & B \\ C & D \end{bmatrix}$ (beginning and ending in the same index of refraction) can be made to look like the matrix for a thin lens if the beginning and ending positions along the z-axis are referenced from two principal planes, located distances $p_1$ and $p_2$ before and after the system. HINT: $\begin{vmatrix} A & B \\ C & D \end{vmatrix} = 1$.
(b) Where are the principal planes located and what is the effective focal length for two identical thin lenses with focal lengths f that are separated by a distance d = f?
Figure 11.26
R71 Derive the on-axis intensity (i.e. x, y = 0) of a Gaussian laser beam if you know that at z = 0 the electric field of the beam is $E(\rho, z = 0) = E_0\, e^{-\rho^2/w_0^2}$.
Fresnel: $E(x, y, d) = -\dfrac{i\, e^{ikd}\, e^{i\frac{k}{2d}(x^2 + y^2)}}{\lambda d} \displaystyle\iint E(x', y', 0)\, e^{i\frac{k}{2d}(x'^2 + y'^2)}\, e^{-i\frac{k}{d}(xx' + yy')}\, dx'\, dy'$
$\displaystyle\int_{-\infty}^{\infty} e^{-Ax^2 + Bx + C}\, dx = \sqrt{\frac{\pi}{A}}\; e^{\frac{B^2}{4A} + C}$
R72 (a) You decide to construct a simple laser cavity with a flat mirror and another mirror with concave curvature of R = 100 cm. What is the longest possible stable cavity that you can make? HINT: Sylvester's theorem is
$$ \begin{bmatrix} A & B \\ C & D \end{bmatrix}^N = \frac{1}{\sin\theta} \begin{bmatrix} A\sin N\theta - \sin(N-1)\theta & B\sin N\theta \\ C\sin N\theta & D\sin N\theta - \sin(N-1)\theta \end{bmatrix} $$
where $\cos\theta = \frac{1}{2}(A + D)$.
(b) The amplifier is YLF crystal, which lases at $\lambda = 1054$ nm. You decide to make the cavity 10 cm shorter than the longest possible (i.e. found in part (a)). What is the value of $w_0$, and where is the beam waist located inside the cavity (the place we assign to z = 0)? HINT: One can interpret the parameter $R(z)$ as the radius of curvature of the wave front. For a mode to exist in a laser cavity, the radius of curvature of each of the end mirrors must match the radius of curvature of the beam at that location.
$$ E(\rho, z) = E_0\, \frac{w_0}{w(z)}\, e^{-\frac{\rho^2}{w^2(z)}}\, e^{ikz + i\frac{k\rho^2}{2R(z)}}\, e^{-i\tan^{-1}\frac{z}{z_0}} $$
$\rho^2 \equiv x^2 + y^2$, $\quad w(z) \equiv w_0\sqrt{1 + z^2/z_0^2}$, $\quad R(z) \equiv z + z_0^2/z$, $\quad z_0 \equiv \dfrac{k w_0^2}{2}$
R73 (a) Compute the Fraunhofer diffraction intensity pattern for a uniformly illuminated circular aperture with diameter $\ell$. HINT:
$$ E(x, y, d) = -\frac{i\, e^{ikd}\, e^{i\frac{k}{2d}(x^2 + y^2)}}{\lambda d} \iint E(x', y', 0)\, e^{-i\frac{k}{d}(xx' + yy')}\, dx'\, dy' $$
$J_0(\alpha) = \dfrac{1}{2\pi} \displaystyle\int_0^{2\pi} e^{\pm i\alpha\cos(\theta - \theta')}\, d\theta'$, $\quad \displaystyle\int_0^a J_0(bx)\, x\, dx = \frac{a}{b} J_1(ab)$, $\quad J_1(1.22\pi) = 0$, $\quad \lim_{x \to 0} \dfrac{2J_1(x)}{x} = 1$
(b) The first lens of a telescope has a diameter of 30 cm, which is the only place where light is clipped. You wish to use the telescope to examine two stars in a binary system. The stars are approximately 25 light-years away. How far apart need the stars be (in the perpendicular sense) for you to distinguish them in the visible range of $\lambda = 500$ nm? Compare with the radius of Earth's orbit, $1.5 \times 10^8$ km.
R74 (a) Derive the Fraunhofer diffraction pattern for the field from a uniformly illuminated single slit of width $\Delta x$. (Don't worry about the y-dimension.)
(b) Find the Fraunhofer intensity pattern for a grating of N slits of width $\Delta x$ positioned on the mask at $x_n = h\left(n - \frac{N+1}{2}\right)$ so that the spacing between all slits is h.
HINT: The array theorem says that the diffraction pattern is $\sum_{n=1}^{N} e^{-i\frac{k}{d} x x_n}$ times the diffraction pattern of a single slit. You will need
$$ \sum_{n=1}^{N} r^n = r\, \frac{r^N - 1}{r - 1} $$
(c) Consider Fraunhofer diffraction from the grating in part (b). The grating is 5.0 cm wide and is uniformly illuminated. For best resolution in a monochromator with a 50 cm focal length, what should the width of the exit slit be? Assume a wavelength of $\lambda = 500$ nm.
Selected Answers
R72: (a) 100 cm (b) 0.32 mm.
R73: (b) $4.8 \times 10^8$ km.
R74: (c) 5 μm.
Chapter 12
Interferograms and Holography

12.1 Introduction
In chapter 7, we studied a Michelson interferometer in an idealized sense: 1) The light entering the instrument was considered to be a plane wave. 2) The retroreflecting mirrors were considered to be aligned perpendicular to the beams impinging on them. 3) All reflective surfaces were taken to be perfectly flat. If any of these conditions is relaxed, the result is an interference or fringe pattern in the beam emerging from the interferometer. A recorded fringe pattern (on a CCD or photographic film) is called an interferogram. In section 12.2, we shall examine typical fringe patterns that can be produced in an interferometer. Such patterns are very useful for testing the prescription and quality of optical components. Some examples of how to do this are addressed in section 12.3.
The technique of holography was conceived of by Dennis Gabor in the late 1940s. In optical holography, light interference patterns (or fringe patterns) are recorded and then later used to diffract light, much like gratings diffract light.¹ The recorded fringe pattern, when used for the purpose of diffracting light, is called a hologram. When the light diffracts from the hologram, it can mimic the light field originally used to generate the previously recorded fringe pattern. This is true even for very complex fields generated when light is scattered from arbitrary three-dimensional objects. When the light field is re-created through diffraction by the fringe pattern, an observer perceives the presence of the original object. The image looks three-dimensional since the holographic fringes reconstruct the original light pattern simultaneously for a wide range of viewing angles. Holograms are studied in sections 12.4 and 12.5.
12.2 Interferograms
Consider the Michelson interferometer seen in Fig. 12.1. Suppose that the beamsplitter divides the fields evenly, so that the overall output intensity is given by (8.1):
$$ I_{\rm det} = 2 I_0 \left[1 + \cos(\omega\tau)\right] \qquad (12.1) $$
where $\tau$ is the roundtrip delay time of one path relative to the other. This equation is based on the idealized case, where the amplitude and phase of the two beams are uniform and perfectly aligned to each other following the beamsplitter. The entire beam blinks on and off as the delay path is varied.
Figure 12.1 Michelson interferometer.
¹ In fact, a grating can be considered to be a hologram, and holographic techniques are often employed to produce gratings.
What happens if one of the retroreflecting mirrors is misaligned by a small angle $\theta$? The fringe patterns seen in Fig. 12.2 (b)-(d) are the result. By the law of reflection, the beam returning from the misaligned mirror deviates from the ideal path by an angle $2\theta$. This puts a relative phase term of
$$ \phi = kx\sin(2\theta_x) + ky\sin(2\theta_y) \qquad (12.2) $$
on the misaligned beam (in addition to $\omega\tau$). Here $\theta_x$ represents the tilt of the mirror in the x-dimension and $\theta_y$ represents the amount of tilt in the y-dimension. When the two plane waves join, the resulting intensity pattern is
$$ I_{\rm det} = 2 I_0 \left[1 + \cos(\omega\tau + \phi)\right] \qquad (12.3) $$
Of course, the phase term $\phi$ depends on the local position within the beam through x and y. Regions of uniform phase, called fringes (in this case individual stripes), blink on and off together as the delay $\tau$ is varied. As the delay is varied, the fringes seem to move across the detector, owing to the fact that the phase of the blinking varies smoothly across the beam. The fringes emerge from one edge of the beam and disappear at the other.
Figure 12.2 Fringe patterns for a Michelson interferometer: (a) Perfectly aligned beams. (b) Horizontally misaligned beams. (c) Vertically misaligned beams. (d) Both vertically and horizontally misaligned beams. (e) Diverging beam with unequal paths. (f) Diverging beam with unequal paths and horizontal misalignment.

Another interesting situation arises when the beams in a Michelson interferometer are diverging. A fringe pattern of concentric circles will be seen at the detector when the two beam paths are unequal (see Fig. 12.2 (e)). The radius of curvature for the beam traveling the longer path is increased by the added amount of delay $d = c\tau$. Thus, if beam 1 has radius of curvature $R_1$ when returning to the beam splitter, then beam 2 will have radius $R_2 = R_1 + d$ upon return (assuming flat mirrors). The relative phase between the two beams is

$$ \phi = \frac{k\rho^2}{2R_1} - \frac{k\rho^2}{2R_2} \tag{12.4} $$

where $\rho$ is the radial distance from the beam axis, and the intensity pattern at the detector is given as before by (12.3).

12.3 Testing Optical Components
A Michelson interferometer is ideal for testing the quality of optical surfaces. If any of the flat surfaces (including the beam splitter) in the interferometer are distorted, the fringe pattern readily reveals it. Fig. 12.3 shows an example of a fringe pattern when one of the mirrors in the interferometer has an arbitrary deformity in the surface figure. A new fringe stripe occurs for every half wavelength that the surface varies. (The round trip turns a half wavelength into a whole wavelength.) This makes it possible to determine the flatness of a surface with very high precision.
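The half-wavelength rule stated above can be turned into a small conversion between fringe shift and surface height. The sketch below is only an illustration; the function names are hypothetical, and the tenth-wave sample value is an assumed test case.

```python
def height_from_fringe_shift(fringe_shift, wavelength):
    """Surface height variation implied by a fringe shift (one fringe per half wavelength)."""
    return fringe_shift * wavelength / 2.0

def fringe_shift_from_height(height, wavelength):
    """Fringe shift produced by a given surface height variation."""
    return 2.0 * height / wavelength

# Assumed example: a surface that deviates by one tenth of a 633 nm wavelength.
wavelength = 633e-9
deviation = wavelength / 10.0
print("fringe shift:", fringe_shift_from_height(deviation, wavelength))
```

The factor of two in both functions is the round trip: the light crosses the surface error once on the way in and once on the way out.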
Figure 12.3 (a) Fringe pattern arising from an arbitrarily distorted mirror in a perfectly aligned interferometer with plane wave beams. (b) Fringe pattern from the same mirror as (a) when the mirror is tilted (still plane wave beams). The distortion due to surface variation is still easily seen.
Of course, in order to test a given surface in an interferometer, the quality of the other surfaces must first be ensured. A typical industry standard for research-grade optics is to specify the surface flatness to within one tenth of an optical wavelength (633 nm HeNe laser). This means that the interferometer should reveal no more than one fifth of a fringe variation across the substrate. The fringe pattern tells the technician how the surface should continue to be polished in order to achieve the desired surface flatness.

When testing a surface, it is not necessary to remove all tilt from the alignment in order to see fringe effects due to surface variations. In fact, it is sometimes helpful to observe the effects of a distorted surface figure as deviations in a regular striped fringe pattern.

Other types of optical surfaces and optical components besides flat mirrors can also be tested with an interferometer. Fig. 12.4 shows how a lens can be tested using a convex mirror to compensate for the focusing action of the lens. With appropriate spacing, the lens-mirror combination can act like a flat surface. Distortions in the lens figure are revealed in the fringe pattern. In this case, the surfaces of the lens are tested together, and variations in optical path length are observed.

In order to record fringes, say with a CCD camera, it is often convenient to image a larger beam onto a relatively small active area of the detector. The imaging objective should be adjusted to produce an image of the test optic on the detector screen. The diameter of the objective lens needs to accommodate the whole beam.
Figure 12.4 Twyman-Green setup for testing lenses.
12.4 Generating Holograms

Dennis Gabor (1900-1979, Hungarian) Gabor was educated and worked in Germany. However, when Hitler came to power, he left and eventually went to England. While there Gabor invented holography in the late 1940s, but it would not become practical until the invention of the laser.

Consider a coherent monochromatic beam of light that is split in half by a beamsplitter, similar to that in a Michelson interferometer. Let one beam, called the reference beam, proceed directly to a recording film, and let the other beam scatter from an arbitrary object back towards the same film. The two beams interfere at the recording film. It may be advantageous to split the beam initially into unequal intensities such that the light scattered from the object has an intensity similar to the reference beam at the film. The purpose of the film is to record the interference pattern. It is important that the coherence length of the light be much longer than the difference in path length starting from the beam splitter and ending at the film. In addition, during exposure of the film, it is important that the whole setup be stable against vibrations on the scale of a wavelength, since such vibrations cause the fringes to wash out.

For simplicity, we neglect the vector nature of the electric field, assuming that the scattering from the object for the most part preserves polarization and that the angle between the two beams incident on the film is modest (so that the electric fields of the two beams are close to parallel). To the extent that the light scattered from the object contains the polarization component orthogonal to that of the reference beam, it provides a uniform (unwanted) background exposure to the film on top of which the fringe pattern is recorded. In general terms, we may write the electric field arriving at the film as

$$ E_{\rm film}(\mathbf{r})\, e^{-i\omega t} = E_{\rm object}(\mathbf{r})\, e^{-i\omega t} + E_{\rm ref}(\mathbf{r})\, e^{-i\omega t} \tag{12.5} $$

Here, the coordinate $\mathbf{r}$ indicates locations on the film surface, which may have arbitrary shape. The field $E_{\rm object}(\mathbf{r})$, which is scattered from the object, is in general very complicated. The field $E_{\rm ref}(\mathbf{r})$ may be equally complicated, but typically it is convenient if it has a simple form such as a plane wave, since this beam must be re-created later in order to view the hologram.

Figure 12.5 Exposure of holographic film.

The intensity of the field (12.5) is given by

$$ I_{\rm film}(\mathbf{r}) = \frac{1}{2} c\epsilon_0 \left| E_{\rm object}(\mathbf{r}) + E_{\rm ref}(\mathbf{r}) \right|^2 = \frac{1}{2} c\epsilon_0 \left[ \left| E_{\rm object}(\mathbf{r}) \right|^2 + \left| E_{\rm ref}(\mathbf{r}) \right|^2 + E_{\rm ref}^*(\mathbf{r})\, E_{\rm object}(\mathbf{r}) + E_{\rm ref}(\mathbf{r})\, E_{\rm object}^*(\mathbf{r}) \right] \tag{12.6} $$

For typical photographic film, the exposure of the film is proportional to the intensity of the light hitting it. This is known as the linear response regime. That is, after the film is developed, the transmittance $T$ of the light through the film is proportional to the intensity of the light that exposed it, $I_{\rm film}$. However, for low exposure levels, or for film specifically designed for holography, the transmission of the light through the film can be proportional to the square of the intensity of the light that exposes the film. Thus, after the film is exposed to the fringe pattern and developed, the film acquires a spatially varying transmission function according to

$$ T(\mathbf{r}) \propto I_{\rm film}^2(\mathbf{r}) \tag{12.7} $$

Since the field transmission $t$ is the square root of the intensity transmittance $T$, a field that is later incident on the film has its amplitude modified by

$$ E_{\rm transmitted}(\mathbf{r}) = t(\mathbf{r})\, E_{\rm incident}(\mathbf{r}) \propto I_{\rm film}(\mathbf{r})\, E_{\rm incident}(\mathbf{r}) \tag{12.8} $$

as it emerges from the other side of the film.

12.5 Holographic Wavefront Reconstruction
To see a holographic image, we re-illuminate the film (previously exposed and developed) with the original reference beam. That is, we send in

$$ E_{\rm incident}(\mathbf{r}) = E_{\rm ref}(\mathbf{r}) \tag{12.9} $$

and view the light that is transmitted. According to (12.6) and (12.8), the transmitted field is proportional to

$$ E_{\rm transmitted}(\mathbf{r}) \propto I_{\rm film}(\mathbf{r})\, E_{\rm ref}(\mathbf{r}) = \left[ \left| E_{\rm object}(\mathbf{r}) \right|^2 + \left| E_{\rm ref}(\mathbf{r}) \right|^2 \right] E_{\rm ref}(\mathbf{r}) + \left| E_{\rm ref}(\mathbf{r}) \right|^2 E_{\rm object}(\mathbf{r}) + E_{\rm ref}^2(\mathbf{r})\, E_{\rm object}^*(\mathbf{r}) \tag{12.10} $$

Although this expression looks fairly complicated, each of the three terms has a direct interpretation. The first term is just the reference beam $E_{\rm ref}(\mathbf{r})$ with an amplitude modified by the transmission through the film. It is the residual undeflected beam, similar to the zero-order diffraction peak for a transmission grating.

The second term is interpreted as a reconstruction of the light field originally scattered from the object, $E_{\rm object}(\mathbf{r})$. Its amplitude is modified by the intensity of the reference beam, but if the reference beam is uniform across the film, this hardly matters. An observer looking into the film sees a wavefront identical to the one produced by the original object. Thus, the observer sees a virtual image at the location of the original object. Since the wavefront of the original object has genuinely been recreated, the image looks three-dimensional, because the observer is free to view from different perspectives.

Figure 12.6 Holographic reconstruction of wavefront through diffraction from fringes on film. Compare with Fig. 12.2.

The final term in (12.10) is proportional to the complex conjugate of the original field from the object. It also contains twice the phase of the reference beam, which we can overlook if the reference beam is uniform on the film. In this case, the complex conjugate of the object field actually converges to a real image of the original object. This image is located on the observer's side of the film, but it is often of less interest since the image is inside out. An ideal screen for viewing the real image would be an item shaped identically to the original object, which of course defeats the purpose of the hologram! To the extent that the film is not flat or to the extent that the reference beam is not a plane wave, the phase of $E_{\rm ref}^2(\mathbf{r})$ severely distorts the image. The virtual image never suffers from this problem.

As an example, consider a hologram made from a point object, as depicted in Fig. 12.7. Presumably, the point object is illuminated sufficiently brightly so as to make the scattered light have an intensity similar to the reference beam at the film. Let the reference plane wave strike the film at normal incidence. Then the reference field will have constant amplitude and phase across it; call it $E_{\rm ref}$. The field from the point object can be treated as a spherical wave:

$$ E_{\rm object}(\rho) = E_{\rm ref} \frac{L}{\sqrt{L^2 + \rho^2}}\, e^{ik\sqrt{L^2 + \rho^2}} \quad \text{(point source example)} \tag{12.11} $$

Here $\rho$ represents the radial distance from the center of the film to some other point on the film, and $L$ is the distance from the point object to the film. We have taken the amplitude of the object field to match $E_{\rm ref}$ in the center of the film.

Figure 12.7 Exposure to holographic film by a point source and a reference plane wave. The holographic fringe pattern for a point object and a plane wave reference beam exposing a flat film is shown on the right.

After the film is exposed, developed, and re-illuminated by the reference beam, the field emerging from the right-hand side of the film, according to (12.10), becomes

$$ E_{\rm transmitted}(\rho) \propto \left[ \frac{\left| E_{\rm ref} \right|^2 L^2}{L^2 + \rho^2} + \left| E_{\rm ref} \right|^2 \right] E_{\rm ref} + \left| E_{\rm ref} \right|^2 \frac{E_{\rm ref} L}{\sqrt{L^2 + \rho^2}}\, e^{ik\sqrt{L^2 + \rho^2}} + E_{\rm ref}^2 \frac{E_{\rm ref}^* L}{\sqrt{L^2 + \rho^2}}\, e^{-ik\sqrt{L^2 + \rho^2}} \quad \text{(point source example)} \tag{12.12} $$

We see the three distinct waves that emerge from the holographic film. The first term in (12.12) is merely the plane wave reference beam passing straight through the film (with some variation in amplitude), which is depicted in Fig. 12.8 (a). The second term in (12.12) has the identical form as the field from the original object (aside from an overall amplitude factor). It describes an outward-expanding spherical wave, which gives rise to a virtual image at the location of the original point object, as depicted in Fig. 12.8 (b). The final term in (12.12) corresponds to a converging spherical wave, which focuses to a point at a distance $L$ from the observer's side of the screen (depicted in Fig. 12.8 (c)).
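To make the bookkeeping in (12.10) and (12.12) concrete, the following Python sketch evaluates the three transmitted terms for the point-source example and checks that they add up to the film intensity times the reference field. It is a minimal sketch under stated assumptions: the constant prefactor $c\epsilon_0/2$ is dropped, the reference field is set to 1 by default, and the function names and sample distances are our own.

```python
import cmath
import math

def object_field(rho, L, k, e_ref=1.0):
    """Spherical wave (12.11) from a point object a distance L from the film."""
    r = math.hypot(L, rho)
    return e_ref * (L / r) * cmath.exp(1j * k * r)

def film_intensity(rho, L, k, e_ref=1.0):
    """|E_object + E_ref|^2 as in (12.6), with the constant prefactor dropped."""
    return abs(object_field(rho, L, k, e_ref) + e_ref) ** 2

def transmitted_terms(rho, L, k, e_ref=1.0):
    """The three terms of (12.10): undeflected beam, virtual image, real image."""
    e_obj = object_field(rho, L, k, e_ref)
    undeflected = (abs(e_obj) ** 2 + abs(e_ref) ** 2) * e_ref
    virtual_image = abs(e_ref) ** 2 * e_obj
    real_image = e_ref ** 2 * e_obj.conjugate()
    return undeflected, virtual_image, real_image

# Assumed sample values: 633 nm light, point object about 32 cm from the film.
k = 2.0 * math.pi / 633e-9
L = 0.3165
rho = 1.0e-3
terms = transmitted_terms(rho, L, k)
print("sum of the three terms:", sum(terms))
print("I_film * E_ref:        ", film_intensity(rho, L, k) * 1.0)
```

The second and third terms differ only by complex conjugation of the object field, which is what separates the diverging (virtual image) wave from the converging (real image) wave.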
Figure 12.8 Reference beam incident on previously exposed holographic film. (a) Part of the beam goes through undeflected. (b) Part of the beam takes on the field profile of the original object. (c) Part of the beam converges to a real image of the original object.
Exercises

Exercises for 12.4 Generating Holograms

P12.1 An ideal Michelson interferometer that uses flat mirrors is perfectly aligned to a wide collimated laser beam. Suppose that one of the mirrors is then misaligned by 0.1°. What is the spacing between adjacent fringes on the screen if the wavelength is $\lambda = 633$ nm? What would happen if the angle of the input beam (before the beamsplitter) was tilted by 0.1°?

P12.2 An ideal Michelson interferometer uses flat mirrors perfectly aligned to an expanding beam that diverges from a point 50 cm before the beamsplitter. Suppose that one mirror is 10 cm away from the beam splitter, and the other is 11 cm. Suppose also that the center of the resulting bull's-eye fringe pattern is dark. If a screen is positioned 10 cm after the beam splitter, what is the radial distance to the next dark fringe on the screen if the wavelength is $\lambda = 633$ nm?

L12.3 Set up an interferometer and observe distortions to a mirror substrate when the setscrew is overtightened.

P12.4 Consider a diffraction grating as a simple hologram. Let the light from the object be a plane wave (object placed at infinity) directed onto a flat film at angle $\theta$. Let the reference beam strike the film at normal incidence, and take the wavelength to be $\lambda$.
(a) What is the period of the fringes?
(b) Show that when re-illuminated by the reference beam, the three terms in (12.10) give rise to zero-order and first-order diffraction to either side of center.
(c) Check that it matches predictions in the previous section.

P12.5 Consider the holographic pattern produced by the point object described in section 12.5.
(a) Show that the phase of the real image in (12.12) may be approximated as $\phi = -k\rho^2/2L$, aside from a spatially independent overall phase. Compare with (11.25) and comment.
(b) This hologram is similar to a Fresnel zone plate, used to focus extreme ultraviolet light or x-rays, for which it is difficult to make a lens. Graph the field transmission for the hologram as a function of $\rho$ and superimpose a similar graph for a best-fit mask that has regions of either 100% or 0% transmission. Use $\lambda = 633$ nm and $L = (5 \times 10^5 - 1/4)\lambda$ (this places the point source about 32 cm before the screen). See Fig. 12.9.

Figure 12.9 Field transmission for a point-source hologram (left) and a Fresnel zone plate (middle), and a plot of both as a function of radius (right).

L12.6 Make a hologram.
Chapter 13
Blackbody Radiation
13.1 Introduction
Hot objects glow. In 1860, Kirchhoff proposed that the radiation emitted by hot objects as a function of frequency is approximately the same for all materials. (An important exception is atomic vapors, which have relatively few discrete spectral lines. However, Kirchhoff's assumption holds quite well for most solids, which are sufficiently complex.) The notion that all materials behave similarly led to the concept of an ideal blackbody radiator. Most materials have a certain shininess that causes light to reflect or scatter in addition to being absorbed and reemitted. However, light that falls upon an ideal blackbody is absorbed perfectly before the possibility of reemission, hence the name blackbody.

The distribution of frequencies emitted by a blackbody radiator is related to its temperature. The key concept of a blackbody radiator is that the light surrounding it is in thermal equilibrium with the radiator. If some of the light escapes to the environment, the object inevitably must cool as it continually moves towards a new thermal equilibrium.

The Sun is a good example of a blackbody radiator. The light emitted from the Sun is associated with its surface temperature. Any light that arrives at the Sun from outer space is virtually 100% absorbed, however little light that might be. Mostly, light escapes to the much colder surrounding space, and the temperature of the Sun's surface is maintained by the fusion process within.

Experimentally, a near perfect blackbody radiator can be constructed from a hollow object. As the object is heated, the light present inside the internal cavity can only come from the walls. Also, any radiation in the interior cavity is eventually absorbed (before being potentially reemitted), if not on the first bounce then on subsequent bounces. In this case, the walls of the cavity and light field are in thermal equilibrium. A small hole can be drilled through the wall into the interior to observe the radiation there without significantly disturbing the system. A glowing tungsten filament also makes a reasonably good example of a blackbody radiator. However, if not formed into a cavity, one must take surface reflections into account because the emissivity is less than unity.
Gustav Kirchhoff (1824-1887, German) Kirchhoff studied the spectra emitted by various objects. He coined the term blackbody radiation. He understood that an excited gas gives off a discrete spectrum, and that an unexcited gas surrounding a blackbody emitter produces dark lines in the blackbody spectrum.

In this chapter, we develop a theoretical understanding of blackbody radiation and provide some historical perspective. One of the earliest properties deduced about blackbody radiation is known as the Stefan-Boltzmann law, derived from thermodynamic ideas in 1879, long before blackbody radiation was fully understood. This law says that the total intensity $I$ of radiation (including all frequencies) that flows outward from a blackbody radiator is given by

$$ I = e\sigma T^4 \tag{13.1} $$

where $\sigma$ is called the Stefan-Boltzmann constant and $T$ is the absolute temperature (in Kelvin) of the blackbody. The value of the Stefan-Boltzmann constant is $\sigma = 5.6696 \times 10^{-8}\ {\rm W/m^2 \cdot K^4}$. The dimensionless parameter $e$, called the emissivity, is equal to one for an ideal blackbody surface. However, it is less than one for actual materials because of surface reflections. For example, the emissivity of tungsten is approximately $e = 0.4$.

It is sometimes useful to express intensity in terms of the energy density of the light field $u_{\rm field}$ (given by (2.51) in units of energy per volume). This connection between outward-going intensity and energy density of the field is given by

$$ I = \frac{c\, u_{\rm field}}{4} \quad \Leftrightarrow \quad u_{\rm field} = e\, \frac{4\sigma T^4}{c} \tag{13.2} $$

since the energy travels at speed $c$ equally in all directions (for example, inside a cavity within a solid object). A factor of 1/2 occurs because only half of the energy travels away from rather than towards any given surface (e.g. the wall of the cavity). The remaining factor of 1/2 occurs because the energy that flows outward through a given surface is directionally distributed over a hemisphere, as opposed to flowing only in the direction of the surface normal $\hat{\mathbf{n}}$. The average over the hemisphere is carried out as follows:

$$ \frac{\int_0^{2\pi} d\phi \int_0^{\pi/2} \hat{\mathbf{r}} \cdot \hat{\mathbf{n}}\, \sin\theta\, d\theta}{\int_0^{2\pi} d\phi \int_0^{\pi/2} \sin\theta\, d\theta} = \frac{\int_0^{\pi/2} \cos\theta\, \sin\theta\, d\theta}{\int_0^{\pi/2} \sin\theta\, d\theta} = \frac{1}{2} \tag{13.3} $$
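As a quick numerical check of (13.1) through (13.3), the sketch below evaluates the hemisphere average with a simple midpoint rule and confirms that the intensity and energy density are related by the factor $c/4$. The function names, the rounded value of $c$, and the sample filament temperature of 3000 K are assumptions made for illustration only.

```python
import math

SIGMA = 5.6696e-8   # Stefan-Boltzmann constant as quoted in the text, W/(m^2 K^4)
C = 2.998e8         # speed of light in m/s (assumed rounded value)

def radiated_intensity(temperature, emissivity=1.0):
    """Total outward intensity (13.1)."""
    return emissivity * SIGMA * temperature ** 4

def field_energy_density(temperature, emissivity=1.0):
    """Energy density of the field from (13.2)."""
    return 4.0 * emissivity * SIGMA * temperature ** 4 / C

def hemisphere_average_of_cosine(steps=10000):
    """Midpoint-rule estimate of the ratio in (13.3); the phi integrals cancel."""
    d_theta = (math.pi / 2.0) / steps
    numerator = 0.0
    denominator = 0.0
    for i in range(steps):
        theta = (i + 0.5) * d_theta
        numerator += math.cos(theta) * math.sin(theta) * d_theta
        denominator += math.sin(theta) * d_theta
    return numerator / denominator

# Assumed example: tungsten (e = 0.4) at a hypothetical temperature of 3000 K.
print("hemisphere average:", hemisphere_average_of_cosine())
print("intensity in W/m^2:", radiated_intensity(3000.0, emissivity=0.4))
print("c*u/4 in W/m^2:    ", C * field_energy_density(3000.0, emissivity=0.4) / 4.0)
```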
The thermodynamic derivation of the Stefan-Boltzmann law is given in appendix 13.A. Although (13.1) describes the total intensity of the light that leaves a blackbody surface, it does not describe what frequencies make up the radiation field. This frequency distribution was not fully described for another two decades when Max Planck developed his famous formula. Planck first arrived at the blackbody radiation formula empirically in an effort to match experimental data. He then attempted to explain it, which marks the birth of quantum mechanics. Even Planck was uncomfortable with and perhaps disbelieved the assumptions that his formula implied, but he deserves credit for recognizing and articulating those assumptions. In section 13.3, we study how Planck's blackbody radiation formula implies the existence of electromagnetic quanta, which we now call photons.
In section 13.2 we first examine the failure of classical ideas to explain blackbody radiation (even though this failure was only appreciated years after Planck developed his formula). Section 13.4 gives an analysis of blackbody radiation developed by Einstein where he introduced the concept of stimulated and spontaneous emission. In this sense, Einstein can be thought of as the father of light amplification by stimulated emission of radiation (LASER).
13.2 Failure of the Equipartition Principle

In the latter part of the 1800s as spectrographic technology improved, experimenters acquired considerable data on the spectra of blackbody radiation. Experimentalists were able to make detailed maps of the intensity per frequency associated with blackbody radiation over a fairly wide wavelength range. The results appeared to be independent of the material as long as the object was black and rough, and this suggested general underlying physical reasons for the behavior. The intensity per frequency depended only on temperature and when integrated over all frequencies agreed with the Stefan-Boltzmann law (13.1).

In 1900, Rayleigh (and later Jeans in 1905) attempted to explain the blackbody spectral distribution (intensity per frequency) as a function of temperature by applying the equipartition theorem to the problem. Recall that the equipartition theorem states that the energy in a system is, on average, distributed equally among all degrees of freedom in the system. For example, a system composed of oscillators (say, electrons attached to springs representing the response of the material on the walls of a blackbody radiator) has an energy of $k_B T/2$ for each degree of freedom, where $k_B = 1.38 \times 10^{-23}$ J/K is Boltzmann's constant.

Rayleigh and Jeans supposed that each unique mode of the electromagnetic field should carry energy $k_B T$ just as each mechanical spring in thermal equilibrium carries energy $k_B T$ ($k_B T/2$ as kinetic and $k_B T/2$ as potential energy). The problem then reduces to that of finding the number of unique modes for the radiation at each frequency. They anticipated that requiring each mode of electromagnetic energy to hold energy $k_B T$ should reveal the spectral shape of blackbody radiation.
A given frequency is associated with a specific wave number $k = \sqrt{k_x^2 + k_y^2 + k_z^2}$. Notice that there are many ways (i.e. combinations of $k_x$, $k_y$, and $k_z$) to come up with the same wave number $k = 2\pi\nu/c$ (corresponding to a single frequency $\nu$). To count these ways properly, we can let our experience with Fourier series guide us. Consider a box with each side of length $L$. The Fourier theorem (0.33) states that the total field inside the box (no matter how complicated the distribution) can always be represented as a superposition of sine (and cosine) waves. The total field in the box can therefore be written as

$$ {\rm Re} \left\{ \sum_{n=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} \sum_{\ell=-\infty}^{\infty} E_{n,m,\ell}\, e^{i(n k_0 x + m k_0 y + \ell k_0 z)} \right\} \tag{13.4} $$

where each component of the wave number in any of the three dimensions is always an integer times

$$ k_0 = 2\pi/L \tag{13.5} $$

Figure 13.1 The volume of a thin spherical shell in $n$, $m$, $\ell$ space.
We must keep in mind that (13.4) does not account for the two distinct polarizations for each wave. To find the total number of modes associated with a given frequency, we should double the number of terms in (13.4) that have that frequency. It is important to note that we have not artificially made any restrictions by considering the box of size $L$ since we may later take the limit $L \to \infty$ so that our box represents the entire universe. In fact, $L$ naturally disappears from our calculation as we consider the density of modes.

We can think of a given wave number $k$ as specifying the equation of a sphere in a coordinate system with axes labeled $n$, $m$, and $\ell$:

$$ n^2 + m^2 + \ell^2 = \left( \frac{k}{k_0} \right)^2 \tag{13.6} $$

We need to know how many more ways there are to choose $n$, $m$, and $\ell$ when the wave number $k/k_0$ is replaced by $(k + dk)/k_0$. The answer is the difference in the volume of the two spheres as shown in Fig. 13.1:

$$ \#\ \text{modes in}\ (k, k + dk) = 4\pi \left( \frac{k}{k_0} \right)^2 \frac{dk}{k_0} = \frac{4\pi k^2\, dk}{k_0^3} \tag{13.7} $$
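The shell-volume estimate (13.7) can be compared against a brute-force count of integer triples. The sketch below works in the dimensionless radius $q = k/k_0$; the function names and the sample shell between $q = 30$ and $q = 31$ are assumptions chosen only to keep the count fast. Because the shell is not infinitesimally thin, the estimate is evaluated at the middle of the shell, and the two numbers should agree only approximately.

```python
import math

def count_modes_in_shell(q_low, q_high):
    """Count integer triples (n, m, l) with q_low <= sqrt(n^2 + m^2 + l^2) < q_high."""
    limit = math.ceil(q_high)
    low_squared = q_low ** 2
    high_squared = q_high ** 2
    count = 0
    for n in range(-limit, limit + 1):
        for m in range(-limit, limit + 1):
            for l in range(-limit, limit + 1):
                radius_squared = n * n + m * m + l * l
                if low_squared <= radius_squared < high_squared:
                    count += 1
    return count

def shell_estimate(q, dq):
    """Thin-shell estimate (13.7) written in terms of q = k/k0."""
    return 4.0 * math.pi * q ** 2 * dq

# Assumed sample shell; polarization (the extra factor of 2) is not included here.
q_low, q_high = 30.0, 31.0
counted = count_modes_in_shell(q_low, q_high)
estimated = shell_estimate(0.5 * (q_low + q_high), q_high - q_low)
print("counted:", counted, "estimated:", round(estimated))
```

Negative integers are included in the count, in keeping with the remark in the text that both propagation directions are counted.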
This represents the number of ways to come up with a wave number between $k$ and $k + dk$. Again, this is the number of terms in (13.4) with a wave number between $k$ and $k + dk$. Recall that $n$, $m$, and $\ell$ are integers. Notice that we have included the possibility of negative integers. This automatically takes into account the fact that for each mode (defined by a set $n$, $m$, and $\ell$) the field may travel in the forwards or the backwards direction.

Since according to the Rayleigh-Jeans assumption each mode carries energy of $k_B T$, the energy density (energy per volume) associated with a specified range of wave numbers $dk$ is $k_B T/L^3$ times (13.7), the number of modes within that range. Thus, the total energy density in the field for all wave numbers is

$$ u_{\rm field} = \int_0^\infty \frac{k_B T}{L^3}\, \frac{2 \times 4\pi k^2}{k_0^3}\, dk = k_B T \int_0^\infty \frac{k^2}{\pi^2}\, dk \tag{13.8} $$

where the extra factor of 2 accounts for two independent polarizations for each mode. The dependence on $L$ has disappeared from (13.8).

We can see that (13.8) disagrees drastically with the Stefan-Boltzmann law (13.2), since (13.8) is proportional to temperature rather than to its fourth power. In addition, the integral in (13.8) is seen to diverge, meaning that regardless of the temperature, the light carries infinite energy density! This has since been named the ultraviolet catastrophe since the divergence occurs on the short wavelength end of the spectrum. This is a clear failure of classical physics to explain blackbody radiation. Nevertheless, Rayleigh emphasized the fact that his formula worked well for the longer wavelengths and he did not necessarily want to abandon classical physics. Such dramatic changes take time.

It is instructive to make the change of variables $k = 2\pi\nu/c$ in the integral to write

$$ u_{\rm field} = k_B T \int_0^\infty \frac{8\pi\nu^2}{c^3}\, d\nu \tag{13.9} $$

The important factor $8\pi\nu^2/c^3$ can now be understood to be the number of modes per frequency (per unit volume). Then (13.9) is rewritten as

$$ u_{\rm field} = \int_0^\infty \rho(\nu)\, d\nu \tag{13.10} $$

where

$$ \rho_{\rm Rayleigh\text{-}Jeans}(\nu) = k_B T\, \frac{8\pi\nu^2}{c^3} \tag{13.11} $$

describes (incorrectly) the spectral energy density of the radiation field associated with blackbody radiation.

13.3 Planck's Formula
Max Planck (1858-1947, German) Planck's work on thermodynamics led him to study the equilibrium between hot objects and electromagnetic radiation, which led to his introduction of the energy quantum in 1900. While he won the Nobel prize in 1918 for this contribution, he had serious reservations about the course that quantum mechanics theory took. He rejected the Copenhagen interpretation of quantum mechanics.

In the late 1800s Wien considered various physical and mathematical constraints on the spectrum of blackbody radiation and tried to find a function to fit the experimental data. The form for the energy distribution of blackbody radiation that Wien proposed was

$$ \rho_{\rm Wien}(\nu) = \frac{8\pi h \nu^3}{c^3}\, e^{-h\nu/k_B T} \tag{13.12} $$

It is important to note that the constant $h$ had not yet been introduced by Planck. The actual way that Wien wrote his distribution was $\rho_{\rm Wien}(\nu) = a\nu^3 e^{-b\nu/T}$, where $a$ and $b$ were parameters used to fit the data.

Wien's formula did a good job of fitting experimental data. However, in 1900 Lummer and Pringsheim reported experimental data that deviated from the Wien distribution at long wavelengths (infrared). Max Planck was privy to this information and later that year came up with a revised version of Wien's formula that fit the data beautifully everywhere:

$$ \rho_{\rm Planck}(\nu) = \frac{8\pi h \nu^3}{c^3 \left( e^{h\nu/k_B T} - 1 \right)} \tag{13.13} $$

where $h = 6.626 \times 10^{-34}$ J·s is an experimentally determined constant. As seen in Fig. 13.2, the Rayleigh-Jeans curve, (13.11), and the Wien curve, (13.12), both fit Planck's distribution function asymptotically on opposite ends. The Wien distribution does a good job nearly everywhere. However, at long wavelengths it was off by just enough for the experimentalists to notice that something was wrong.

Figure 13.2 Energy density per frequency according to Planck, Wien, and Rayleigh-Jeans.

At this point, it may seem fair to ask, what did Planck do that was so great? After all, he simply guessed a function that was only a slight modification of Wien's distribution. And he knew the answer from the back of the book, namely Lummer's and Pringsheim's well done experimental results. (At the time, Planck was unaware of the work by Rayleigh.) What Planck did that was so great was to interpret the meaning of his new formula. His interpretation was what he called an act of desperation. While Planck was able to explain the implications of his formula, he did not assert that the implications were necessarily right; in fact, he presented them somewhat apologetically. It was several years later that the young Einstein published his paper explaining the photoelectric effect in terms of the implications of Planck's formula. Planck's insight was an enormous step towards understanding the quantum nature of light. The full theory of quantum electrodynamics would not be developed until nearly three decades later. Students should appreciate that the very people who developed quantum mechanics were also bothered by its confrontation with deep-seated intuition. If quantum mechanics bothers you, you should feel yourself in good company!

Planck found that he could derive his formula only if he made the following strange assumption: A given mode of the electromagnetic field is not able to carry an arbitrary amount of energy (for example, $k_B T$, which varies continuously as the temperature varies). Rather, the field can only carry discrete amounts of energy separated by spacing $h\nu$. Under this assumption, the probability $P_n$ that a mode of the field is excited to the $n$th level is proportional to the Boltzmann statistical weighting factor $e^{-nh\nu/k_B T}$. We can normalize this factor by dividing by the sum of all such factors to obtain the probability of having energy $nh\nu$ in a particular mode:

$$ P_n = \frac{e^{-nh\nu/k_B T}}{\sum_{m=0}^{\infty} e^{-mh\nu/k_B T}} = e^{-nh\nu/k_B T} \left[ 1 - e^{-h\nu/k_B T} \right] \tag{13.14} $$
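Then, the energy in each mode of the field is expected to be

$$ \sum_{n=0}^{\infty} h\nu\, n P_n = h\nu \left[ 1 - e^{-h\nu/k_B T} \right] \sum_{n=0}^{\infty} n\, e^{-nh\nu/k_B T} = h\nu \left[ e^{-h\nu/k_B T} - 1 \right] \frac{\partial}{\partial (h\nu/k_B T)} \sum_{n=0}^{\infty} e^{-nh\nu/k_B T} = \frac{h\nu}{e^{h\nu/k_B T} - 1} \tag{13.15} $$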
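The sum in (13.15) can be checked numerically against its closed form. The sketch below is a minimal illustration; the function names, the truncation of the sum at 2000 levels, and the sample frequency and temperature are our own assumptions. It also shows that at low frequency the expected energy approaches the classical value $k_B T$.

```python
import math

H = 6.626e-34    # Planck's constant in J s, as quoted in the text
KB = 1.38e-23    # Boltzmann's constant in J/K, as quoted in the text

def level_probability(n, nu, temperature):
    """Probability (13.14) that a mode of frequency nu holds energy n*h*nu."""
    x = H * nu / (KB * temperature)
    return math.exp(-n * x) * (1.0 - math.exp(-x))

def mean_mode_energy_by_sum(nu, temperature, n_max=2000):
    """Left-hand side of (13.15), truncated after n_max levels."""
    return sum(n * H * nu * level_probability(n, nu, temperature) for n in range(n_max))

def mean_mode_energy(nu, temperature):
    """Closed form on the right-hand side of (13.15)."""
    return H * nu / math.expm1(H * nu / (KB * temperature))

# Assumed sample values: visible light (5e14 Hz) and a 5800 K radiator.
nu, temperature = 5.0e14, 5800.0
print("sum:        ", mean_mode_energy_by_sum(nu, temperature))
print("closed form:", mean_mode_energy(nu, temperature))
print("low-frequency ratio to kT:", mean_mode_energy(1.0e9, 300.0) / (KB * 300.0))
```

The truncation is harmless here because the Boltzmann factors fall off rapidly when $h\nu$ is several times $k_B T$; at much lower frequencies many more terms would be needed.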
Equation (13.15) is interpreted as the expectation of the energy (associated with an individual frequency) based on probabilities consistent with thermal equilibrium. Finally, we multiply this expected energy by the mode density $8\pi\nu^2/c^3$, obtained in the derivation of the Rayleigh-Jeans formula. In other words, we substitute (13.15) for $k_B T$ in (13.11) to obtain the Planck distribution (13.13).

It is interesting that we are now able to derive the constant in the Stefan-Boltzmann law (13.2) in terms of Planck's constant $h$ (see P 13.3). The Stefan-Boltzmann law is obtained by integrating the spectral density function (13.13) over all frequencies to obtain the total field energy density, which is in thermal equilibrium with the blackbody radiator:

$$ u_{\rm field} = \int_0^\infty \rho_{\rm Planck}(\nu)\, d\nu = \frac{4}{c}\, \frac{2\pi^5 k_B^4}{15 c^2 h^3}\, T^4 = \frac{4\sigma}{c}\, T^4 \tag{13.16} $$

The Stefan-Boltzmann constant is thus calculated in terms of Planck's constant. However, Planck's constant was not introduced until about two decades after the Stefan-Boltzmann law was developed. Thus, one may say that the Stefan-Boltzmann constant pins down Planck's constant.
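The relationships among (13.11), (13.12), (13.13), and (13.16) can be explored numerically. The sketch below defines the three spectral densities, integrates the Planck density with a simple midpoint rule, and evaluates the Stefan-Boltzmann constant from $h$, $k_B$, and $c$. The function names, the rounded constants, the integration cutoff at $h\nu/k_B T = 50$, and the sample temperature are all assumptions made for illustration.

```python
import math

H = 6.626e-34    # Planck's constant in J s, as quoted in the text
KB = 1.38e-23    # Boltzmann's constant in J/K, as quoted in the text
C = 2.998e8      # speed of light in m/s (assumed rounded value)

def rho_rayleigh_jeans(nu, temperature):
    """Rayleigh-Jeans spectral energy density (13.11)."""
    return KB * temperature * 8.0 * math.pi * nu ** 2 / C ** 3

def rho_wien(nu, temperature):
    """Wien spectral energy density (13.12)."""
    return 8.0 * math.pi * H * nu ** 3 / C ** 3 * math.exp(-H * nu / (KB * temperature))

def rho_planck(nu, temperature):
    """Planck spectral energy density (13.13)."""
    return 8.0 * math.pi * H * nu ** 3 / (C ** 3 * math.expm1(H * nu / (KB * temperature)))

def energy_density_numeric(temperature, steps=20000, x_max=50.0):
    """Midpoint-rule integral of rho_planck, stepping in x = h*nu/(kB*T)."""
    d_x = x_max / steps
    scale = KB * temperature / H          # converts x to frequency
    total = 0.0
    for i in range(steps):
        nu = (i + 0.5) * d_x * scale
        total += rho_planck(nu, temperature) * scale * d_x
    return total

def stefan_boltzmann_constant():
    """Constant appearing in (13.16), expressed through h, kB, and c."""
    return 2.0 * math.pi ** 5 * KB ** 4 / (15.0 * C ** 2 * H ** 3)

temperature = 5000.0   # assumed sample temperature in kelvin
sigma = stefan_boltzmann_constant()
print("sigma from h, kB, c:", sigma)
print("numeric u_field:    ", energy_density_numeric(temperature))
print("4*sigma*T^4/c:      ", 4.0 * sigma * temperature ** 4 / C)
```

With the rounded constants used here, the computed value of $\sigma$ lands within a fraction of a percent of the value quoted with (13.1); the small difference reflects the rounding of $k_B$.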
13.4 Einstein's A and B Coefficients

Albert Einstein (1879-1955, German) Einstein is without a doubt the most famous scientist in history, and he made significant contributions to the field of optics. Einstein took Planck's notion of energy quanta and used them to explain the photoelectric effect. In addition, he developed a description that predicted the possibility of lasers years before quantum theory was fully developed.

More than a decade after Planck introduced his formula, and after Bohr had proposed that electrons occupy discrete energy states in atoms, Einstein reexamined blackbody radiation in terms of Bohr's new idea. If the material of a blackbody radiator interacts with a mode of the field with frequency $\nu$, then electrons in the material must make transitions between two energy levels with energy separation $h\nu$. Since the radiation of a blackbody is in thermal equilibrium with the material, Einstein postulated that the field stimulates electron transitions between the states. In addition, he postulated that some transitions must occur spontaneously. (If the possibility of spontaneous transitions is not included, then there can be no way for a field mode to receive energy if none is present to begin with.) Einstein wrote down rate equations for populations of the two levels $N_1$ and $N_2$ associated with the transition $h\nu$:

$$ \dot{N}_1 = A_{21} N_2 - B_{12}\, \rho(\nu)\, N_1 + B_{21}\, \rho(\nu)\, N_2, \qquad \dot{N}_2 = -A_{21} N_2 + B_{12}\, \rho(\nu)\, N_1 - B_{21}\, \rho(\nu)\, N_2 \tag{13.17} $$

The coefficient $A_{21}$ is the rate of spontaneous emission from state 2 to state 1, $B_{12}\, \rho(\nu)$ is the rate of stimulated absorption from state 1 to state 2, and $B_{21}\, \rho(\nu)$ is the rate of stimulated emission from state 2 to state 1.

In thermal equilibrium, the rate equations (13.17) are both equal to zero (i.e., $\dot{N}_1 = \dot{N}_2 = 0$) since the relative populations of each level must remain constant. We can then solve for the spectral density $\rho(\nu)$ at the given frequency. Either expression in (13.17) yields

$$ \rho(\nu) = \frac{A_{21}}{\dfrac{N_1}{N_2} B_{12} - B_{21}} \tag{13.18} $$
In thermal equilibrium, the spectral density must match the Planck spectral density formula (13.13). In making the comparison, we should first rewrite the ratio $N_1/N_2$ of the populations in the two levels using the Boltzmann probability factor:

$$\frac{N_1}{N_2} = \frac{e^{-E_1/k_B T}}{e^{-E_2/k_B T}} = e^{(E_2 - E_1)/k_B T} = e^{h\nu/k_B T} \tag{13.19}$$

Then when equating (13.18) to the Planck blackbody spectral density (13.13) we get

$$\frac{A_{21}}{e^{h\nu/k_B T} B_{12} - B_{21}} = \frac{8\pi h \nu^3}{c^3 \left(e^{h\nu/k_B T} - 1\right)} \tag{13.20}$$

From this expression we deduce that

$$B_{12} = B_{21} \tag{13.21}$$

and

$$A_{21} = \frac{8\pi h \nu^3}{c^3} B_{21} \tag{13.22}$$
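The following sketch checks this deduction numerically: substituting (13.19), (13.21), and (13.22) into (13.18) should reproduce the Planck spectral density, here taken to be $\rho(\nu) = 8\pi h\nu^3 / [c^3 (e^{h\nu/k_B T} - 1)]$ as it appears on the right side of (13.20). The value chosen for $B_{21}$ is arbitrary (it cancels), and the frequency and temperature are illustrative assumptions.

```python
import math

K_B = 1.380e-23   # Boltzmann's constant, J/K
H = 6.626e-34     # Planck's constant, J s
C = 2.9979e8      # speed of light in vacuum, m/s


def planck_spectral_density(nu, temp):
    """Planck energy density per unit frequency (right side of 13.20)."""
    return 8.0 * math.pi * H * nu**3 / (C**3 * math.expm1(H * nu / (K_B * temp)))


def einstein_spectral_density(nu, temp, b21=1.0):
    """Equation (13.18) with (13.19), (13.21), and (13.22) substituted."""
    b12 = b21                                       # (13.21)
    a21 = 8.0 * math.pi * H * nu**3 / C**3 * b21    # (13.22)
    ratio = math.exp(H * nu / (K_B * temp))         # N1/N2 from (13.19)
    return a21 / (ratio * b12 - b21)


# Illustrative inputs: a visible-light frequency and a solar-like temperature.
nu, temp = 5.0e14, 5750.0
planck_value = planck_spectral_density(nu, temp)
einstein_value = einstein_spectral_density(nu, temp, b21=3.7)
print(planck_value, einstein_value)
```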
We see from (13.21) that the rate of stimulated absorption is the same as the rate of stimulated emission. In addition, if one knows the rate of stimulated emission between a pair of states, it follows from (13.22) that one also knows the rate of spontaneous emission. This is remarkable because to derive $A_{21}$ directly, one needs to use the full theory of quantum electrodynamics (the complete photon description). However, to obtain $B_{21}$, it is actually only necessary to use the semiclassical theory, where the light is treated classically and the energy levels in the material are treated quantum-mechanically using the Schrödinger equation. The usual semiclassical theory cannot explain spontaneous emission, but it can explain stimulated emission, and the rate of spontaneous emission can then be obtained indirectly through (13.22).

It should be mentioned that (13.21) and (13.22) assume that the energy levels 1 and 2 are non-degenerate. Some modifications must be made in the case of degenerate levels, but the procedure is similar.

In writing the rate equations (13.17), Einstein predicted the possibility of creating lasers fifty years in advance of their development. These rate equations are still valid even if the light is not in thermal equilibrium with the material. The equations suggest that if the population in the upper state 2 can be made artificially large, then amplification will result via the stimulated transition. The rate equations also show that a population inversion (more population in the upper state than in the lower one) cannot be achieved by pumping the material with the same frequency of light that one hopes to amplify. This is because the stimulated absorption rate is balanced by the stimulated emission rate. The material-dependent parameters $A_{21}$ and $B_{12} = B_{21}$ are called the Einstein A and B coefficients.
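The claim about population inversion can be made concrete with a minimal sketch. Setting (13.17) to zero with $B_{12} = B_{21} = B$ gives the steady-state ratio $N_2/N_1 = B\rho/(A_{21} + B\rho)$, which stays below one no matter how strong the pumping field is. The coefficient values below are hypothetical and dimensionless.

```python
def steady_state_ratio(rho, a21, b):
    """N2/N1 obtained by setting (13.17) to zero with B12 = B21 = b."""
    return b * rho / (a21 + b * rho)


A21, B = 1.0, 0.5   # hypothetical, dimensionless values
pump_levels = (0.1, 1.0, 10.0, 1e3, 1e6)
ratios = [steady_state_ratio(rho, A21, B) for rho in pump_levels]
for rho, ratio in zip(pump_levels, ratios):
    print(f"rho = {rho:>9g}   N2/N1 = {ratio:.6f}")
```

The ratio approaches one as the field grows but never exceeds it, so the two populations can at best be equalized by pumping on the same transition.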
Appendix 13.A Thermodynamic Derivation of the Stefan-Boltzmann Law

In this appendix, we derive the Stefan-Boltzmann law. This derivation is included for historical interest and may be a little difficult to follow. The derivation relies on the 1st and 2nd laws of thermodynamics. Consider a container whose walls are all at the same temperature and in thermal equilibrium with the radiation field inside, according to the properties of an ideal blackbody radiator. Notice that the units of energy density $u_\mathrm{field}$ (energy per volume) are equivalent to force per area, or in other words pressure. The radiation exerts a pressure of

$$P = u_\mathrm{field}/3 \tag{13.23}$$

on each wall of the box. This can be derived from the fact that radiation of energy $\Delta E$ imparts a momentum

$$\Delta p = \frac{2\,\Delta E}{c}\cos\theta \tag{13.24}$$

when it is absorbed and reemitted from a wall at an angle $\theta$. The fact that light carries momentum was understood well before the development of the theory of relativity and the photon description of light. The total pressure (force per area averaged over all angles) on a wall averages to be

$$P = \frac{\displaystyle\int_0^{\pi/2} \frac{\Delta p}{\Delta t}\,\frac{1}{A}\,\sin\theta\,d\theta}{\displaystyle\int_0^{\pi/2} \sin\theta\,d\theta} \tag{13.25}$$
where $A$ is the area of the wall and $\Delta E = u_\mathrm{field} A L$ is the total energy in the box, which makes a round trip during the interval $\Delta t = 2L/(c\cos\theta)$. $L$ is the length of the box in the direction perpendicular to the surface. Upon performing the integration in (13.25), the simple result (13.23) is obtained.

Figure 13.3 Field inside a blackbody radiator.

To derive the Stefan-Boltzmann law, consider entropy, which is defined in differential form by the quantity

$$dS = \frac{dQ}{T} \tag{13.26}$$

where $dQ$ is the injection of heat (or energy) into the radiation field in the box and $T$ is the temperature at which that injection takes place. We would like to write $dQ$ in terms of $u_\mathrm{field}$, $V$, and $T$. Then we may invoke the fact that $S$ is a state variable, which implies

$$\frac{\partial^2 S}{\partial T\,\partial V} = \frac{\partial^2 S}{\partial V\,\partial T} \tag{13.27}$$

This is a mathematical statement of the fact that $S$ is fully defined if the internal energy, temperature, and volume of a system are specified. In other words, $S$ does not depend on the past temperature and volume history of a system, but is completely parameterized by the present state of the system. To obtain $dQ$ in the form that we need, we can use the 1st law of thermodynamics, which is a statement of energy conservation:

$$\begin{aligned} dQ &= dU + P\,dV = d(u_\mathrm{field} V) + P\,dV \\ &= V\,du_\mathrm{field} + u_\mathrm{field}\,dV + \frac{1}{3} u_\mathrm{field}\,dV \\ &= V \frac{du_\mathrm{field}}{dT}\,dT + \frac{4}{3} u_\mathrm{field}\,dV \end{aligned} \tag{13.28}$$

Notice that we have used energy density times volume to obtain the total energy $U$ in the radiation field in the box. We have also used (13.23) to obtain the work accomplished by pressure as the volume changes. A change in internal energy $dU = d(u_\mathrm{field} V)$ can take place by the injection of heat $dQ$ or by doing work $dW = P\,dV$ as the volume increases. We can use (13.28) to rewrite (13.26):

$$dS = \frac{V}{T}\frac{du_\mathrm{field}}{dT}\,dT + \frac{4 u_\mathrm{field}}{3T}\,dV \tag{13.29}$$

When we differentiate (13.29) with respect to temperature or volume we get

$$\frac{\partial S}{\partial T} = \frac{V}{T}\frac{du_\mathrm{field}}{dT}, \qquad \frac{\partial S}{\partial V} = \frac{4 u_\mathrm{field}}{3T} \tag{13.30}$$
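Returning briefly to the pressure result that feeds into (13.28): the angular average in (13.25) can be checked numerically. With $\Delta p$ from (13.24), $\Delta E = u_\mathrm{field} A L$, and $\Delta t = 2L/(c\cos\theta)$, the integrand $(\Delta p/\Delta t)/A$ reduces to $u_\mathrm{field}\cos^2\theta$, so the ratio $P/u_\mathrm{field}$ should come out to 1/3. The sketch below uses a plain midpoint rule; the function name and sample count are illustrative choices.

```python
import math


def pressure_over_energy_density(samples=100_000):
    """Midpoint-rule estimate of P / u_field from the average in (13.25)."""
    d_theta = (math.pi / 2) / samples
    numerator = 0.0
    denominator = 0.0
    for i in range(samples):
        theta = (i + 0.5) * d_theta
        # (dp/dt)/A divided by u_field is cos^2(theta); the weight is sin(theta).
        numerator += math.cos(theta) ** 2 * math.sin(theta) * d_theta
        denominator += math.sin(theta) * d_theta
    return numerator / denominator


pressure_ratio = pressure_over_energy_density()
print(pressure_ratio)
```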
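We are now able to evaluate the partial derivatives in (13.27), which give

$$\begin{aligned} \frac{\partial^2 S}{\partial T\,\partial V} &= \frac{4}{3}\frac{\partial}{\partial T}\left(\frac{u_\mathrm{field}}{T}\right) = \frac{4}{3}\frac{1}{T}\frac{\partial u_\mathrm{field}}{\partial T} - \frac{4}{3}\frac{u_\mathrm{field}}{T^2} \\ \frac{\partial^2 S}{\partial V\,\partial T} &= \frac{1}{T}\frac{du_\mathrm{field}}{dT} \end{aligned} \tag{13.31}$$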
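Finally, (13.27) becomes a differential equation relating the internal energy of the system to the temperature:

$$\frac{4}{3}\frac{1}{T}\frac{\partial u_\mathrm{field}}{\partial T} - \frac{4}{3}\frac{u_\mathrm{field}}{T^2} = \frac{1}{T}\frac{du_\mathrm{field}}{dT} \quad\Rightarrow\quad \frac{\partial u_\mathrm{field}}{\partial T} = \frac{4 u_\mathrm{field}}{T} \tag{13.32}$$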
The solution to this differential equation is (13.2), where $4\sigma/c$ is a constant to be determined experimentally (or derived from the Planck blackbody formula as was done in (13.16)).
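A quick way to see the $T^4$ scaling is to integrate $du_\mathrm{field}/dT = 4u_\mathrm{field}/T$ numerically and compare energy densities at two temperatures. The sketch below uses a hand-written fourth-order Runge-Kutta step; the starting value and temperatures are arbitrary illustrative choices, and only the ratio is meaningful.

```python
def integrate_energy_density(u0, t0, t1, steps=10_000):
    """RK4 integration of du/dT = 4u/T, the right-hand result of (13.32)."""
    def slope(temp, u):
        return 4.0 * u / temp

    step = (t1 - t0) / steps
    temp, u = t0, u0
    for _ in range(steps):
        k1 = slope(temp, u)
        k2 = slope(temp + step / 2, u + step * k1 / 2)
        k3 = slope(temp + step / 2, u + step * k2 / 2)
        k4 = slope(temp + step, u + step * k3)
        u += step * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        temp += step
    return u


# Doubling the temperature should multiply the energy density by 2**4.
u_start = 1.0
u_ratio = integrate_energy_density(u_start, 300.0, 600.0) / u_start
print(u_ratio)
```

The integration fixes only the power law; the proportionality constant $4\sigma/c$ still has to come from experiment or from the Planck formula, as the text notes.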
Exercises
Exercises for 13.1 Introduction

P13.1 The Sun has a radius of $R_S = 6.96 \times 10^8$ m. What is the total power that it radiates, given a surface temperature of 5750 K?

P13.2 A 1 cm-radius spherical ball of polished gold hangs suspended inside an evacuated chamber that is at room temperature (20 °C). There is no pathway for thermal conduction to the chamber wall.

(a) If the gold is at a temperature of 100 °C, what is the initial rate of temperature loss in °C/s? The emissivity for polished gold is $e = 0.02$. The specific heat of gold is 129 J/kg·°C and its density is 19.3 g/cm³. HINT: $Q = mc\,\Delta T$ and Power $= Q/\Delta t$.

(b) What is the initial rate of temperature loss if the ball is coated with flat black paint, which has emissivity $e = 0.95$? HINT: You should consider the energy flowing both ways.
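Exercises for 13.3 Planck's Formula

P13.3 Derive (or try to derive) the Stefan-Boltzmann law by integrating the

(a) Rayleigh-Jeans energy density

$$u_\mathrm{field} = \int_0^\infty \rho_\text{Rayleigh-Jeans}(\nu)\,d\nu$$

Please comment.

(b) Wien energy density

$$u_\mathrm{field} = \int_0^\infty \rho_\text{Wien}(\nu)\,d\nu$$

Please evaluate $\sigma$.

HINT: $\displaystyle\int_0^\infty x^3 e^{-ax}\,dx = \frac{6}{a^4}$.

(c) Planck energy density

$$u_\mathrm{field} = \int_0^\infty \rho_\text{Planck}(\nu)\,d\nu$$

Please evaluate $\sigma$. Compare results of (b) and (c).

HINT: $\displaystyle\int_0^\infty \frac{x^3\,dx}{e^{ax} - 1} = \frac{\pi^4}{15 a^4}$.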
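The two integral identities quoted in the hints can be spot-checked numerically before they are used. The sketch below is not a solution to the exercise; it only confirms the hints with a plain midpoint rule, truncating the upper limit where the integrands are negligible. The helper name, the cutoff, and the choice $a = 1$ are assumptions made for illustration.

```python
import math


def midpoint_integral(func, lower, upper, samples=200_000):
    """Plain midpoint-rule quadrature on [lower, upper]."""
    width = (upper - lower) / samples
    return sum(func(lower + (i + 0.5) * width) for i in range(samples)) * width


a = 1.0
cutoff = 60.0 / a   # both integrands are vanishingly small beyond this point

wien_hint = midpoint_integral(lambda x: x**3 * math.exp(-a * x), 0.0, cutoff)
planck_hint = midpoint_integral(lambda x: x**3 / math.expm1(a * x), 0.0, cutoff)

print(wien_hint, 6.0 / a**4)
print(planck_hint, math.pi**4 / (15.0 * a**4))
```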
P13.4 (a) Derive Wien's displacement law

$$\lambda_\mathrm{max} = \frac{0.00290\ \mathrm{m \cdot K}}{T}$$

which gives the strongest wavelength present in the blackbody spectral distribution. HINT: Transform the integral to wavelength instead of frequency:

$$u_\mathrm{field} = \int_0^\infty \rho_\text{Planck}(\nu)\,d\nu \quad\Rightarrow\quad u_\mathrm{field} = \int_0^\infty \rho_\text{Planck}(\lambda)\,d\lambda$$

Then find what $\lambda$ corresponds to the maximum of $\rho_\text{Planck}(\lambda)$. You may like to know that the solution to the transcendental equation $(5 - x)e^x = 5$ is $x = 4.965$.

(b) What is the strongest wavelength emitted by the Sun, which has a surface temperature of 5750 K (see P13.1)?

(c) Is $\lambda_\mathrm{max}$ the same as $c/\nu_\mathrm{max}$, where $\nu_\mathrm{max}$ corresponds to the peak of $\rho_\text{Planck}(\nu)$? Why would we be interested mainly in $\lambda_\mathrm{max}$?
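The quoted root of the transcendental equation can be confirmed with a few lines of bisection. The sketch below also forms the combination $hc/(x k_B)$, under the assumption (left for the reader to justify in part (a)) that the root corresponds to $x = hc/(\lambda_\mathrm{max} k_B T)$. The bracket and tolerance are illustrative choices, and the constants are the rounded values from the book's table.

```python
import math

K_B = 1.380e-23   # Boltzmann's constant, J/K
H = 6.626e-34     # Planck's constant, J s
C = 2.9979e8      # speed of light in vacuum, m/s


def wien_root(lower=1.0, upper=10.0, tolerance=1e-12):
    """Bisection for the nonzero root of (5 - x) e^x = 5."""
    def f(x):
        return (5.0 - x) * math.exp(x) - 5.0

    while upper - lower > tolerance:
        middle = 0.5 * (lower + upper)
        if f(lower) * f(middle) <= 0.0:
            upper = middle   # sign change lies in the lower half
        else:
            lower = middle   # sign change lies in the upper half
    return 0.5 * (lower + upper)


x_root = wien_root()
# Assumes x = h c / (lambda_max k_B T), so lambda_max * T = h c / (x k_B).
wien_constant = H * C / (x_root * K_B)
print(x_root, wien_constant)
```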
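Physical Constants

| Constant | Symbol | Value |
|---|---|---|
| Permittivity | $\epsilon_0$ | $8.854 \times 10^{-12}$ C²/N·m² |
| Permeability | $\mu_0$ | $4\pi \times 10^{-7}$ T·m/A (same as kg·m/C²) |
| Speed of light in vacuum | $c$ | $2.9979 \times 10^{8}$ m/s |
| Charge of an electron | $q_e$ | $1.602 \times 10^{-19}$ C |
| Mass of an electron | $m_e$ | $9.108 \times 10^{-31}$ kg |
| Boltzmann's constant | $k_B$ | $1.380 \times 10^{-23}$ J/K |
| Planck's constant | $h$ | $6.626 \times 10^{-34}$ J·s |
| | $\hbar$ | $1.054 \times 10^{-34}$ J·s |
| Stefan-Boltzmann constant | $\sigma$ | $5.670 \times 10^{-8}$ W/m²·K⁴ |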
