Physics of Light and Optics
Preface
This curriculum was originally developed for a senior-level optics course in the Department of Physics and Astronomy at Brigham Young University. Topics are addressed from a physics perspective and include the propagation of light in matter, reflection and transmission at boundaries, polarization effects, dispersion, coherence, ray optics and imaging, diffraction, and the quantum nature of light. Students using this book should be familiar with differentiation, integration, and standard trigonometric and algebraic manipulation. A brief review of complex numbers, vector calculus, and Fourier transforms is provided in Chapter 0, but it is helpful if students already have some experience with these concepts.

While the authors retain the copyright, we have made this book available free of charge at optics.byu.edu. This is our contribution toward a future world with free textbooks! The web site also provides a link to purchase bound copies of the book for the cost of printing. A collection of electronic material related to the text is available at the same site, including videos of students performing the lab assignments found in the book.

The development of optics has a rich history. We have included historical sketches for a selection of the pioneers in the field to help students appreciate some of this historical context. These sketches are not intended to be authoritative, the information for most individuals having been gleaned primarily from Wikipedia.

The authors may be contacted at [email protected]. We enjoy hearing reports from those using the book and welcome constructive feedback. We occasionally revise the text. The title page indicates the date of the last revision.

We would like to thank all those who have helped improve this material. We especially thank John Colton, Bret Hess, and Harold Stokes for their careful review and extensive suggestions.
This curriculum benefitted from a CCLI grant from the National Science Foundation Division of Undergraduate Education (DUE-9952773).
Contents
Preface
Table of Contents

0 Mathematical Tools
0.1 Vector Calculus
0.2 Complex Numbers
0.3 Linear Algebra
0.4 Fourier Theory
Appendix 0.A Table of Integrals and Sums
Exercises

1 Electromagnetic Phenomena
1.1 Gauss' Law
1.2 Gauss' Law for Magnetic Fields
1.3 Faraday's Law
1.4 Ampere's Law
1.5 Maxwell's Adjustment to Ampere's Law
1.6 Polarization of Materials
1.7 The Wave Equation
Exercises

2 Plane Waves and Refractive Index
2.1 Plane Wave Solutions to the Wave Equation
2.2 Index of Refraction
2.3 The Lorentz Model of Dielectrics
2.4 Index of Refraction of a Conductor
2.5 Poynting's Theorem
2.6 Irradiance of a Plane Wave
Appendix 2.A Radiometry, Photometry, and Color
Appendix 2.B Clausius-Mossotti Relation
Appendix 2.C Energy Density of Electric Fields
Appendix 2.D Energy Density of Magnetic Fields
Exercises

3 Reflection and Refraction
3.1 Refraction at an Interface
3.2 The Fresnel Coefficients
3.3 Reflectance and Transmittance
3.4 Brewster's Angle
3.5 Total Internal Reflection
3.6 Reflections from Metal
Appendix 3.A Boundary Conditions For Fields at an Interface
Exercises

4 Multiple Parallel Interfaces
4.1 Double-Interface Problem Solved Using Fresnel Coefficients
4.2 Two-Interface Transmittance at Sub Critical Angles
4.3 Beyond Critical Angle: Tunneling of Evanescent Waves
4.4 Fabry-Perot
4.5 Setup of a Fabry-Perot Instrument
4.6 Distinguishing Nearby Wavelengths in a Fabry-Perot Instrument
4.7 Multilayer Coatings
4.8 Repeated Multilayer Stacks
Exercises

5 Propagation in Anisotropic Media
5.1 Constitutive Relation in Crystals
5.2 Plane Wave Propagation in Crystals
5.3 Biaxial and Uniaxial Crystals
5.4 Refraction at a Uniaxial Crystal Surface
5.5 Poynting Vector in a Uniaxial Crystal
Appendix 5.A Symmetry of Susceptibility Tensor
Appendix 5.B Rotation of Coordinates
Appendix 5.C Electric Field in Crystals
Appendix 5.D Huygens' Elliptical Construct for a Uniaxial Crystal
Exercises

Review, Chapters 1-5

6 Polarization of Light
6.1 Linear, Circular, and Elliptical Polarization
6.2 Jones Vectors for Representing Polarization
6.3 Elliptically Polarized Light
6.4 Linear Polarizers and Jones Matrices
6.5 Jones Matrix for Polarizers at Arbitrary Angles
6.6 Jones Matrices for Wave Plates
6.7 Polarization Effects of Reflection and Transmission
Appendix 6.A Ellipsometry
Appendix 6.B Partially Polarized Light
Exercises

7 Superposition of Quasi-Parallel Plane Waves
7.1 Intensity of Superimposed Plane Waves
7.2 Group vs. Phase Velocity: Sum of Two Plane Waves
7.3 Frequency Spectrum of Light
7.4 Packet Propagation and Group Delay
7.5 Quadratic Dispersion
7.6 Generalized Context for Group Delay
Appendix 7.A Pulse Chirping in a Grating Pair
Appendix 7.B Causality and Exchange of Energy with the Medium
Appendix 7.C Kramers-Kronig Relations
Exercises

8 Coherence Theory
8.1 Michelson Interferometer
8.2 Coherence Time and Fringe Visibility
8.3 Temporal Coherence of Continuous Sources
8.4 Fourier Spectroscopy
8.5 Young's Two-Slit Setup and Spatial Coherence
Appendix 8.A Spatial Coherence for a Continuous Source
Appendix 8.B Van Cittert-Zernike Theorem
Exercises

Review, Chapters 6-8

9 Light as Rays
9.1 The Eikonal Equation
9.2 Fermat's Principle
9.3 Paraxial Rays and ABCD Matrices
9.4 Reflection and Refraction at Curved Surfaces
9.5 ABCD Matrices for Combined Optical Elements
9.6 Image Formation
9.7 Principal Planes for Complex Optical Systems
9.8 Stability of Laser Cavities
Appendix 9.A Aberrations and Ray Tracing
Exercises

10 Diffraction
10.1 Huygens' Principle as Formulated by Fresnel
10.2 Scalar Diffraction Theory
10.3 Fresnel Approximation
10.4 Fraunhofer Approximation
10.5 Diffraction with Cylindrical Symmetry
Appendix 10.A Fresnel-Kirchhoff Diffraction Formula
Appendix 10.B Green's Theorem
Exercises

11 Diffraction Applications
11.1 Fraunhofer Diffraction Through a Lens
11.2 Resolution of a Telescope
11.3 The Array Theorem
11.4 Diffraction Grating
11.5 Spectrometers
11.6 Diffraction of a Gaussian Field Profile
11.7 Gaussian Laser Beams
Appendix 11.A ABCD Law for Gaussian Beams
Exercises

12 Interferograms and Holography
12.1 Interferograms
12.2 Testing Optical Components
12.3 Generating Holograms
12.4 Holographic Wavefront Reconstruction
Exercises

Review, Chapters 9-12

13 Blackbody Radiation
13.1 Stefan-Boltzmann Law
13.2 Failure of the Equipartition Principle
13.3 Planck's Formula
13.4 Einstein's A and B Coefficients
Appendix 13.A Thermodynamic Derivation of the Stefan-Boltzmann Law
Appendix 13.B Boltzmann Factor
Exercises

Index
Physical Constants
Chapter 0
Mathematical Tools
Our study of optics begins with Maxwell's equations in Chapter 1. Before we start, look over this chapter to make sure you are comfortable with the mathematical tools we'll be using. The vector calculus material in Section 0.1 will be used beginning in Chapter 1, so you should review it now. In Section 0.2 we review complex numbers. You have probably had some exposure to complex numbers, but if you are like many students, you haven't yet fully appreciated their usefulness. Please be warned that your life will be much easier if you know the material in Section 0.2 by heart. Complex notation is pervasive throughout the book, beginning in Chapter 2. You may safely procrastinate reviewing Sections 0.3 and 0.4 until they come up in the book. The linear algebra refresher in Section 0.3 is useful for Chapter 4, where we analyze multilayer coatings, and again in Chapter 6, where we discuss polarization. Section 0.4 provides an introduction to Fourier theory. Fourier transforms are used extensively in optics, and you should study Section 0.4 carefully before tackling Chapter 7.
[...] encouraged Descartes to become a lawyer. Descartes graduated with a degree in law from the University of Poitiers in 1616. In 1619, he had a series of dreams that led him to believe that he should instead pursue science. Descartes became one of the greatest mathematicians, physicists, and philosophers of all time. He is credited with inventing the Cartesian coordinate system, which is named after him. For the first time, geometric shapes could be expressed as algebraic equations. (Wikipedia)
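The distance between a position $\mathbf{r}$ and a reference point $\mathbf{r}_0$ is

$$|\mathbf{r}-\mathbf{r}_0| = \sqrt{(x-x_0)^2 + (y-y_0)^2 + (z-z_0)^2} \tag{0.1}$$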
Example 0.1
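Compute the electric field at $\mathbf{r} = 2\hat{\mathbf{x}} + 2\hat{\mathbf{y}} + 2\hat{\mathbf{z}}$ due to a positive point charge $q$ positioned at $\mathbf{r}_0 = 1\hat{\mathbf{x}} + 1\hat{\mathbf{y}} + 2\hat{\mathbf{z}}$.

Solution: As mentioned above, the field is given by

$$\mathbf{E}(\mathbf{r}) = \frac{q\,(\mathbf{r}-\mathbf{r}_0)}{4\pi\epsilon_0\,|\mathbf{r}-\mathbf{r}_0|^3}$$

We have

$$\mathbf{r}-\mathbf{r}_0 = (2-1)\hat{\mathbf{x}} + (2-1)\hat{\mathbf{y}} + (2-2)\hat{\mathbf{z}} = 1\hat{\mathbf{x}} + 1\hat{\mathbf{y}}$$

and

$$|\mathbf{r}-\mathbf{r}_0| = \sqrt{(1)^2 + (1)^2} = \sqrt{2}$$

The electric field is then

$$\mathbf{E} = \frac{q\,(1\hat{\mathbf{x}} + 1\hat{\mathbf{y}})}{4\pi\epsilon_0\,(\sqrt{2})^3}$$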
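The arithmetic of Example 0.1 can be spot-checked with a short script. This is only a sketch: the function name `scaled_point_charge_field` is a hypothetical helper, and the field is returned in units of $q/(4\pi\epsilon_0)$ so that no numerical value of the charge has to be assumed.

```python
import math

def scaled_point_charge_field(r, r0):
    """Return (r - r0) / |r - r0|**3, i.e. E in units of q / (4*pi*eps0).

    Hypothetical helper for illustration; r and r0 are 3-component sequences.
    """
    d = [a - b for a, b in zip(r, r0)]          # r - r0
    dist = math.sqrt(sum(c * c for c in d))     # |r - r0|, as in (0.1)
    return [c / dist**3 for c in d]

# Positions taken from Example 0.1
r = (2.0, 2.0, 2.0)
r0 = (1.0, 1.0, 2.0)
E_scaled = scaled_point_charge_field(r, r0)
print(E_scaled)  # x and y components equal 1/(sqrt(2))**3; z component is zero
```

Multiplying the result by $q/(4\pi\epsilon_0)$ recovers the field in physical units.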
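In addition to position, the electric and magnetic fields almost always depend on time in optics problems. For example, a common time-dependent field is $\mathbf{E}(\mathbf{r},t) = \mathbf{E}_0\cos(\mathbf{k}\cdot\mathbf{r} - \omega t)$. The dot product $\mathbf{k}\cdot\mathbf{r}$ is an example of vector multiplication, and signifies the following operation:

$$\begin{aligned}
\mathbf{k}\cdot\mathbf{r} &= \left(k_x\hat{\mathbf{x}} + k_y\hat{\mathbf{y}} + k_z\hat{\mathbf{z}}\right)\cdot\left(x\hat{\mathbf{x}} + y\hat{\mathbf{y}} + z\hat{\mathbf{z}}\right) \\
&= k_x x + k_y y + k_z z \\
&= |\mathbf{k}||\mathbf{r}|\cos\phi
\end{aligned} \tag{0.2}$$

where $\phi$ is the angle between the vectors $\mathbf{k}$ and $\mathbf{r}$.

Proof of the final line of (0.2)

Consider the plane that contains the two vectors $\mathbf{k}$ and $\mathbf{r}$. Call it the $xy$-plane. In this coordinate system, the two vectors can be written as $\mathbf{k} = k\cos\theta\,\hat{\mathbf{x}} + k\sin\theta\,\hat{\mathbf{y}}$ and $\mathbf{r} = r\cos\alpha\,\hat{\mathbf{x}} + r\sin\alpha\,\hat{\mathbf{y}}$, where $\theta$ and $\alpha$ are the respective angles that the two vectors make with the $x$-axis. The dot product gives $\mathbf{k}\cdot\mathbf{r} = kr(\cos\theta\cos\alpha + \sin\theta\sin\alpha)$. This simplifies to $\mathbf{k}\cdot\mathbf{r} = kr\cos\phi$ (see (0.13)), where $\phi \equiv \theta - \alpha$ is the angle between the vectors. Thus, the dot product between two vectors is the product of the magnitudes of each vector times the cosine of the angle between them.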
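Another type of vector multiplication is the cross product, which is accomplished in the following manner:¹

$$\mathbf{E}\times\mathbf{B} = \begin{vmatrix} \hat{\mathbf{x}} & \hat{\mathbf{y}} & \hat{\mathbf{z}} \\ E_x & E_y & E_z \\ B_x & B_y & B_z \end{vmatrix} = \left(E_y B_z - E_z B_y\right)\hat{\mathbf{x}} - \left(E_x B_z - E_z B_x\right)\hat{\mathbf{y}} + \left(E_x B_y - E_y B_x\right)\hat{\mathbf{z}} \tag{0.3}$$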
1 The use of the determinant to generate the cross product is merely a fortuitous device for remembering its form.
Note that the cross product results in a vector, whereas the dot product mentioned above results in a scalar (i.e. a number with appropriate units). The resultant vector is always perpendicular to the two vectors that are cross multiplied. If the fingers on your right hand curl from the first vector towards the second, your thumb will point in the direction of the result. The magnitude of the result equals the product of the magnitudes of the constituent vectors times the sine of the angle between them.
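The properties of the dot and cross products stated above can be checked numerically. The following is a minimal sketch with hypothetical helper names (`dot`, `cross`, `norm`) and arbitrary sample vectors; it implements (0.2) and (0.3) component by component.

```python
import math

def dot(a, b):
    """Dot product, second line of (0.2)."""
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    """Cross product, written out as in (0.3)."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, -(ax * bz - az * bx), ax * by - ay * bx)

def norm(a):
    return math.sqrt(dot(a, a))

# Arbitrary sample vectors (assumed values, for illustration only)
E = (1.0, 2.0, 3.0)
B = (-2.0, 0.5, 4.0)

C = cross(E, B)
angle = math.acos(dot(E, B) / (norm(E) * norm(B)))  # angle between E and B

print(C)                       # the cross product is a vector
print(dot(E, C), dot(B, C))    # both vanish: C is perpendicular to E and B
print(norm(C), norm(E) * norm(B) * math.sin(angle))  # equal magnitudes
```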
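We will use several multidimensional derivatives in our study of optics, namely the gradient, the divergence, and the curl.² In Cartesian coordinates, the gradient of a scalar function is given by

$$\nabla f(x,y,z) = \frac{\partial f}{\partial x}\hat{\mathbf{x}} + \frac{\partial f}{\partial y}\hat{\mathbf{y}} + \frac{\partial f}{\partial z}\hat{\mathbf{z}} \tag{0.4}$$

The divergence, which applies to vector functions, is given by

$$\nabla\cdot\mathbf{E} = \frac{\partial E_x}{\partial x} + \frac{\partial E_y}{\partial y} + \frac{\partial E_z}{\partial z} \tag{0.5}$$

and the curl, which also applies to vector functions, is given by

$$\nabla\times\mathbf{E} = \left(\frac{\partial E_z}{\partial y} - \frac{\partial E_y}{\partial z}\right)\hat{\mathbf{x}} - \left(\frac{\partial E_z}{\partial x} - \frac{\partial E_x}{\partial z}\right)\hat{\mathbf{y}} + \left(\frac{\partial E_y}{\partial x} - \frac{\partial E_x}{\partial y}\right)\hat{\mathbf{z}} \tag{0.6}$$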
Example 0.2
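Derive the gradient (0.4) in cylindrical coordinates defined by the transformations $x = \rho\cos\phi$ and $y = \rho\sin\phi$. (The coordinate $z$ remains unchanged.)

Solution: By inspection of Fig. 0.2, the Cartesian unit vectors may be expressed as

$$\hat{\mathbf{x}} = \cos\phi\,\hat{\boldsymbol{\rho}} - \sin\phi\,\hat{\boldsymbol{\phi}} \quad\text{and}\quad \hat{\mathbf{y}} = \sin\phi\,\hat{\boldsymbol{\rho}} + \cos\phi\,\hat{\boldsymbol{\phi}}$$

In accordance with the rules of calculus, the needed partial derivatives expressed in terms of the new variables are

$$\frac{\partial}{\partial x} = \frac{\partial\rho}{\partial x}\frac{\partial}{\partial\rho} + \frac{\partial\phi}{\partial x}\frac{\partial}{\partial\phi} \quad\text{and}\quad \frac{\partial}{\partial y} = \frac{\partial\rho}{\partial y}\frac{\partial}{\partial\rho} + \frac{\partial\phi}{\partial y}\frac{\partial}{\partial\phi}$$

Meanwhile, the inverted form of the coordinate transformation is $\rho = \sqrt{x^2+y^2}$ and $\phi = \tan^{-1}(y/x)$, from which we obtain the following derivatives:

$$\frac{\partial\rho}{\partial x} = \frac{x}{\sqrt{x^2+y^2}} = \cos\phi, \qquad \frac{\partial\rho}{\partial y} = \frac{y}{\sqrt{x^2+y^2}} = \sin\phi$$

$$\frac{\partial\phi}{\partial x} = -\frac{y}{x^2+y^2} = -\frac{\sin\phi}{\rho}, \qquad \frac{\partial\phi}{\partial y} = \frac{x}{x^2+y^2} = \frac{\cos\phi}{\rho}$$

Putting this all together, we arrive at

$$\begin{aligned}
\nabla f &= \frac{\partial f}{\partial x}\hat{\mathbf{x}} + \frac{\partial f}{\partial y}\hat{\mathbf{y}} + \frac{\partial f}{\partial z}\hat{\mathbf{z}} \\
&= \left(\cos\phi\frac{\partial f}{\partial\rho} - \frac{\sin\phi}{\rho}\frac{\partial f}{\partial\phi}\right)\left(\cos\phi\,\hat{\boldsymbol{\rho}} - \sin\phi\,\hat{\boldsymbol{\phi}}\right) + \left(\sin\phi\frac{\partial f}{\partial\rho} + \frac{\cos\phi}{\rho}\frac{\partial f}{\partial\phi}\right)\left(\sin\phi\,\hat{\boldsymbol{\rho}} + \cos\phi\,\hat{\boldsymbol{\phi}}\right) + \frac{\partial f}{\partial z}\hat{\mathbf{z}} \\
&= \frac{\partial f}{\partial\rho}\hat{\boldsymbol{\rho}} + \frac{1}{\rho}\frac{\partial f}{\partial\phi}\hat{\boldsymbol{\phi}} + \frac{\partial f}{\partial z}\hat{\mathbf{z}}
\end{aligned}$$

Figure 0.2 The unit vectors $\hat{\mathbf{x}}$ and $\hat{\mathbf{y}}$ may be expressed in terms of components along $\hat{\boldsymbol{\phi}}$ and $\hat{\boldsymbol{\rho}}$ in cylindrical coordinates.

2 See M. R. Spiegel, Schaum's Outline of Advanced Mathematics for Engineers and Scientists, pp.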
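The result of Example 0.2 can be tested numerically. The sketch below is an illustration under stated assumptions: the test function $f = x^2 y + z$, the evaluation point, and the helper `partial` (a central finite difference) are all arbitrary choices, not part of the text. It evaluates the gradient once in Cartesian form (0.4) and once in the cylindrical form, then converts the latter back to Cartesian components using the unit-vector relations from the example.

```python
import math

def f_cart(x, y, z):
    """Arbitrary smooth test function (assumed for illustration)."""
    return x * x * y + z

def f_cyl(rho, phi, z):
    """The same function expressed through x = rho cos(phi), y = rho sin(phi)."""
    return f_cart(rho * math.cos(phi), rho * math.sin(phi), z)

def partial(func, args, index, h=1e-5):
    """Central finite-difference estimate of d(func)/d(args[index])."""
    up, down = list(args), list(args)
    up[index] += h
    down[index] -= h
    return (func(*up) - func(*down)) / (2.0 * h)

rho, phi, z = 1.3, 0.7, 0.4
x, y = rho * math.cos(phi), rho * math.sin(phi)

# Cartesian gradient, equation (0.4)
grad_cartesian = tuple(partial(f_cart, (x, y, z), i) for i in range(3))

# Cylindrical components: df/drho, (1/rho) df/dphi, df/dz
g_rho = partial(f_cyl, (rho, phi, z), 0)
g_phi = partial(f_cyl, (rho, phi, z), 1) / rho
g_z = partial(f_cyl, (rho, phi, z), 2)

# Convert back using rho-hat = cos(phi) x-hat + sin(phi) y-hat and
# phi-hat = -sin(phi) x-hat + cos(phi) y-hat
grad_from_cylindrical = (
    math.cos(phi) * g_rho - math.sin(phi) * g_phi,
    math.sin(phi) * g_rho + math.cos(phi) * g_phi,
    g_z,
)

print(grad_cartesian)
print(grad_from_cylindrical)  # agrees with the line above to finite-difference accuracy
```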
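We will sometimes need a multidimensional second derivative called the Laplacian. When applied to a scalar function, it is defined as the divergence of a gradient:

$$\nabla^2 f(x,y,z) \equiv \nabla\cdot\left[\nabla f(x,y,z)\right] \tag{0.7}$$

In Cartesian coordinates, this reduces to

$$\nabla^2 f(x,y,z) = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} \tag{0.8}$$

Since the Laplacian applied to a scalar gives a result that is also a scalar, in Cartesian coordinates we deal with vector functions by applying the Laplacian to the scalar function attached to each unit vector:

$$\nabla^2\mathbf{E} = \left(\frac{\partial^2 E_x}{\partial x^2} + \frac{\partial^2 E_x}{\partial y^2} + \frac{\partial^2 E_x}{\partial z^2}\right)\hat{\mathbf{x}} + \left(\frac{\partial^2 E_y}{\partial x^2} + \frac{\partial^2 E_y}{\partial y^2} + \frac{\partial^2 E_y}{\partial z^2}\right)\hat{\mathbf{y}} + \left(\frac{\partial^2 E_z}{\partial x^2} + \frac{\partial^2 E_z}{\partial y^2} + \frac{\partial^2 E_z}{\partial z^2}\right)\hat{\mathbf{z}} \tag{0.9}$$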
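This is possible because each unit vector is a constant in Cartesian coordinates. The various multidimensional derivatives take on more complicated forms in non-Cartesian coordinates such as cylindrical or spherical. You can derive the Laplacian for these other coordinate systems by changing variables and rewriting the unit vectors starting from the above Cartesian expression. (See Problem 0.10.) Regardless of the coordinate system, the Laplacian for a vector function can be obtained from first derivatives through

$$\nabla^2\mathbf{E} \equiv \nabla(\nabla\cdot\mathbf{E}) - \nabla\times(\nabla\times\mathbf{E}) \tag{0.10}$$

Verification of (0.10) in Cartesian coordinates

From (0.6), we have

$$\nabla\times\mathbf{E} = \left(\frac{\partial E_z}{\partial y} - \frac{\partial E_y}{\partial z}\right)\hat{\mathbf{x}} - \left(\frac{\partial E_z}{\partial x} - \frac{\partial E_x}{\partial z}\right)\hat{\mathbf{y}} + \left(\frac{\partial E_y}{\partial x} - \frac{\partial E_x}{\partial y}\right)\hat{\mathbf{z}}$$

and

$$\begin{aligned}
\nabla\times(\nabla\times\mathbf{E}) &= \begin{vmatrix} \hat{\mathbf{x}} & \hat{\mathbf{y}} & \hat{\mathbf{z}} \\ \partial/\partial x & \partial/\partial y & \partial/\partial z \\ \left(\frac{\partial E_z}{\partial y} - \frac{\partial E_y}{\partial z}\right) & -\left(\frac{\partial E_z}{\partial x} - \frac{\partial E_x}{\partial z}\right) & \left(\frac{\partial E_y}{\partial x} - \frac{\partial E_x}{\partial y}\right) \end{vmatrix} \\
&= \left[\frac{\partial}{\partial y}\left(\frac{\partial E_y}{\partial x} - \frac{\partial E_x}{\partial y}\right) + \frac{\partial}{\partial z}\left(\frac{\partial E_z}{\partial x} - \frac{\partial E_x}{\partial z}\right)\right]\hat{\mathbf{x}} \\
&\quad - \left[\frac{\partial}{\partial x}\left(\frac{\partial E_y}{\partial x} - \frac{\partial E_x}{\partial y}\right) - \frac{\partial}{\partial z}\left(\frac{\partial E_z}{\partial y} - \frac{\partial E_y}{\partial z}\right)\right]\hat{\mathbf{y}} \\
&\quad + \left[-\frac{\partial}{\partial x}\left(\frac{\partial E_z}{\partial x} - \frac{\partial E_x}{\partial z}\right) - \frac{\partial}{\partial y}\left(\frac{\partial E_z}{\partial y} - \frac{\partial E_y}{\partial z}\right)\right]\hat{\mathbf{z}}
\end{aligned}$$

After adding and subtracting $\partial^2 E_x/\partial x^2$, $\partial^2 E_y/\partial y^2$, and $\partial^2 E_z/\partial z^2$ in the respective components and rearranging, we get

$$\begin{aligned}
\nabla\times(\nabla\times\mathbf{E}) &= \left(\frac{\partial^2 E_x}{\partial x^2} + \frac{\partial^2 E_y}{\partial x\,\partial y} + \frac{\partial^2 E_z}{\partial x\,\partial z}\right)\hat{\mathbf{x}} + \left(\frac{\partial^2 E_x}{\partial x\,\partial y} + \frac{\partial^2 E_y}{\partial y^2} + \frac{\partial^2 E_z}{\partial y\,\partial z}\right)\hat{\mathbf{y}} + \left(\frac{\partial^2 E_x}{\partial x\,\partial z} + \frac{\partial^2 E_y}{\partial y\,\partial z} + \frac{\partial^2 E_z}{\partial z^2}\right)\hat{\mathbf{z}} \\
&\quad - \left(\frac{\partial^2 E_x}{\partial x^2} + \frac{\partial^2 E_x}{\partial y^2} + \frac{\partial^2 E_x}{\partial z^2}\right)\hat{\mathbf{x}} - \left(\frac{\partial^2 E_y}{\partial x^2} + \frac{\partial^2 E_y}{\partial y^2} + \frac{\partial^2 E_y}{\partial z^2}\right)\hat{\mathbf{y}} - \left(\frac{\partial^2 E_z}{\partial x^2} + \frac{\partial^2 E_z}{\partial y^2} + \frac{\partial^2 E_z}{\partial z^2}\right)\hat{\mathbf{z}} \\
&= \nabla(\nabla\cdot\mathbf{E}) - \nabla^2\mathbf{E}
\end{aligned}$$

where on the final line we invoked (0.4), (0.5), and (0.8).

We will also encounter several integral theorems³ involving vector functions. The divergence theorem for a vector function $\mathbf{F}$ is

$$\oint_S \mathbf{F}\cdot\hat{\mathbf{n}}\,da = \int_V \nabla\cdot\mathbf{F}\,dv \tag{0.11}$$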
3 For succinct treatments of the divergence theorem and Stokes' theorem, see M. R. Spiegel, Schaum's Outline of Advanced Mathematics for Engineers and Scientists, p. 154 (New York: McGraw-Hill 1971).
The integration on the left-hand side is over the closed surface $S$, which contains the volume $V$ associated with the integration on the right-hand side. The unit vector $\hat{\mathbf{n}}$ points outward, normal to the surface. The divergence theorem is especially useful in connection with Gauss' law, where the left-hand side is interpreted as the number of field lines exiting a closed surface.

Example 0.3
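Check the divergence theorem (0.11) for the vector function $\mathbf{F}(x,y,z) = y^2\hat{\mathbf{x}} + xy\,\hat{\mathbf{y}} + x^2 z\,\hat{\mathbf{z}}$. Take as the volume a cube contained by the six planes $|x| = 1$, $|y| = 1$, and $|z| = 1$.

Solution: First, we evaluate the left side of (0.11) for the function:

$$\begin{aligned}
\oint_S \mathbf{F}\cdot\hat{\mathbf{n}}\,da &= \int_{-1}^{1}\!\!\int_{-1}^{1} dx\,dy\,\left(x^2 z\right)_{z=1} - \int_{-1}^{1}\!\!\int_{-1}^{1} dx\,dy\,\left(x^2 z\right)_{z=-1} \\
&\quad + \int_{-1}^{1}\!\!\int_{-1}^{1} dx\,dz\,\left(xy\right)_{y=1} - \int_{-1}^{1}\!\!\int_{-1}^{1} dx\,dz\,\left(xy\right)_{y=-1} \\
&\quad + \int_{-1}^{1}\!\!\int_{-1}^{1} dy\,dz\,\left(y^2\right)_{x=1} - \int_{-1}^{1}\!\!\int_{-1}^{1} dy\,dz\,\left(y^2\right)_{x=-1} \\
&= 2\int_{-1}^{1}\!\!\int_{-1}^{1} dx\,dy\,x^2 + 2\int_{-1}^{1}\!\!\int_{-1}^{1} dx\,dz\,x = 4\left.\frac{x^3}{3}\right|_{-1}^{1} + 4\left.\frac{x^2}{2}\right|_{-1}^{1} = \frac{8}{3}
\end{aligned}$$

Now we evaluate the right side of (0.11):

$$\int_V \nabla\cdot\mathbf{F}\,dv = \int_{-1}^{1}\!\!\int_{-1}^{1}\!\!\int_{-1}^{1} dx\,dy\,dz\,\left[x + x^2\right] = 4\int_{-1}^{1} dx\,\left[x + x^2\right] = 4\left[\frac{x^2}{2} + \frac{x^3}{3}\right]_{-1}^{1} = \frac{8}{3}$$

Figure 0.3 The function $\mathbf{F}$ (red arrows) plotted for several points on the surface $S$.

Another important theorem is Stokes' theorem:

$$\int_S \left(\nabla\times\mathbf{F}\right)\cdot\hat{\mathbf{n}}\,da = \oint_C \mathbf{F}\cdot d\boldsymbol{\ell} \tag{0.12}$$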
The integration on the left-hand side is over an open surface $S$ (not enclosing a volume). The integration on the right-hand side is around the edge of the surface. Again, $\hat{\mathbf{n}}$ is a unit vector that always points normal to the surface. The vector $d\boldsymbol{\ell}$ points along the curve $C$ that bounds the surface $S$. If the fingers of your right hand point in the direction of integration around $C$, then your thumb points in the direction of $\hat{\mathbf{n}}$. Stokes' theorem is especially useful in connection with Ampere's law and Faraday's law. The right-hand side is an integration of a field around a loop.
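Returning to the divergence theorem, the check in Example 0.3 can also be carried out numerically. The sketch below is illustrative only: it assumes a simple midpoint rule with a hypothetical grid size `n`, and the helper names are not from the text. Both sides of (0.11) should approach $8/3$ as the grid is refined.

```python
def F(x, y, z):
    """Vector function from Example 0.3: (y^2, x y, x^2 z)."""
    return (y * y, x * y, x * x * z)

def div_F(x, y, z):
    """Divergence of F worked out by hand: 0 + x + x^2."""
    return x + x * x

def midpoints(n):
    """Midpoints of n equal cells on [-1, 1], plus the cell width."""
    h = 2.0 / n
    return [-1.0 + (i + 0.5) * h for i in range(n)], h

def surface_flux(n=40):
    """Left side of (0.11): outward flux through the six faces of the cube."""
    pts, h = midpoints(n)
    total = 0.0
    for u in pts:
        for v in pts:
            total += (F(1.0, u, v)[0] - F(-1.0, u, v)[0]) * h * h  # x = +1 and x = -1
            total += (F(u, 1.0, v)[1] - F(u, -1.0, v)[1]) * h * h  # y = +1 and y = -1
            total += (F(u, v, 1.0)[2] - F(u, v, -1.0)[2]) * h * h  # z = +1 and z = -1
    return total

def volume_integral(n=40):
    """Right side of (0.11): integral of div F over the cube."""
    pts, h = midpoints(n)
    return sum(div_F(x, y, z) for x in pts for y in pts for z in pts) * h**3

flux = surface_flux()
volume = volume_integral()
print(flux, volume, 8.0 / 3.0)  # the first two approximate 8/3
```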
The sine function is intrinsically present in the form $A\cos(\alpha+\beta)$ via the identity

$$\cos(\alpha+\beta) = \cos\alpha\cos\beta - \sin\alpha\sin\beta \tag{0.13}$$

This is a good formula to commit to memory, as well as the frequently used identity

$$\sin(\alpha+\beta) = \sin\alpha\cos\beta + \sin\beta\cos\alpha \tag{0.14}$$

With a basic familiarity with trigonometry, we can approach many optical problems including those involving the addition of multiple waves. However, the manipulation of trigonometric functions via identities such as (0.13) and (0.14) can be cumbersome and tedious. Fortunately, complex-number notation offers an equivalent approach with far less busy work. The modest investment needed to become comfortable with complex notation is definitely worth it; optics problems can become cumbersome enough even with the most efficient methods!

The convenience of complex-number notation has its origins in Euler's formula:

$$e^{i\phi} = \cos\phi + i\sin\phi \tag{0.15}$$

where $i \equiv \sqrt{-1}$ is an imaginary number. By inverting Euler's formula (0.15) we can obtain the following representation of the cosine and sine functions:

$$\cos\phi = \frac{e^{i\phi} + e^{-i\phi}}{2}, \qquad \sin\phi = \frac{e^{i\phi} - e^{-i\phi}}{2i} \tag{0.16}$$
Equation (0.16) shows how ordinary sines and cosines are intimately related to hyperbolic cosines and hyperbolic sines. If $\phi$ happens to be imaginary such that $\phi = i\gamma$ where $\gamma$ is real, then we have

$$\sin i\gamma = \frac{e^{-\gamma} - e^{\gamma}}{2i} = i\sinh\gamma, \qquad \cos i\gamma = \frac{e^{-\gamma} + e^{\gamma}}{2} = \cosh\gamma \tag{0.17}$$

Proof of Euler's formula

We can prove Euler's formula using a Taylor's series expansion:

$$f(x) = f(x_0) + \frac{1}{1!}(x-x_0)\left.\frac{df}{dx}\right|_{x=x_0} + \frac{1}{2!}(x-x_0)^2\left.\frac{d^2 f}{dx^2}\right|_{x=x_0} + \cdots \tag{0.18}$$

By expanding each function appearing in (0.15) in a Taylor's series about the origin we obtain

$$\begin{aligned}
\cos\phi &= 1 - \frac{\phi^2}{2!} + \frac{\phi^4}{4!} - \cdots \\
i\sin\phi &= i\phi - i\frac{\phi^3}{3!} + i\frac{\phi^5}{5!} - \cdots \\
e^{i\phi} &= 1 + i\phi - \frac{\phi^2}{2!} - i\frac{\phi^3}{3!} + \frac{\phi^4}{4!} + i\frac{\phi^5}{5!} - \cdots
\end{aligned} \tag{0.19}$$

The last line of (0.19) is seen to be the sum of the first two lines, from which Euler's formula directly follows.

[...]rina Gsell, were the parents of 13 children, many of whom died in childhood. (Wikipedia)

The identity $e^{i\pi} + 1 = 0$ has been voted by modern fans of mathematics (including Richard Feynman) as "the Most Beautiful Mathematical Formula Ever" for its single uses of addition, multiplication, exponentiation, equality, and the constants 0, 1, $e$, $i$, and $\pi$.
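The Taylor-series argument can be illustrated numerically. The sketch below sums the series for $e^{i\phi}$ term by term and compares it with $\cos\phi + i\sin\phi$; the function name `exp_i_taylor`, the number of terms, and the sample angle are arbitrary choices made for illustration.

```python
import cmath
import math

def exp_i_taylor(phi, terms=30):
    """Partial sum of the Taylor series of exp(i*phi) about the origin."""
    total = 0j
    term = 1 + 0j                      # the n = 0 term, (i*phi)**0 / 0!
    for n in range(terms):
        total += term
        term *= 1j * phi / (n + 1)     # next term: (i*phi)**(n+1) / (n+1)!
    return total

phi = 0.75                             # arbitrary sample angle in radians
series_value = exp_i_taylor(phi)
euler_value = complex(math.cos(phi), math.sin(phi))

print(series_value, euler_value)       # the two agree to rounding error
print(cmath.exp(1j * math.pi) + 1)     # essentially zero
```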
Example 0.4
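Prove (0.13) and (0.14) as well as $\cos^2\alpha + \sin^2\alpha = 1$ by taking advantage of (0.16).

Solution: We start with (0.13). By direct application of (0.16) and some rearranging, we have

$$\begin{aligned}
\cos\alpha\cos\beta - \sin\alpha\sin\beta &= \frac{e^{i\alpha} + e^{-i\alpha}}{2}\,\frac{e^{i\beta} + e^{-i\beta}}{2} - \frac{e^{i\alpha} - e^{-i\alpha}}{2i}\,\frac{e^{i\beta} - e^{-i\beta}}{2i} \\
&= \frac{e^{i(\alpha+\beta)} + e^{i(\alpha-\beta)} + e^{-i(\alpha-\beta)} + e^{-i(\alpha+\beta)}}{4} + \frac{e^{i(\alpha+\beta)} - e^{i(\alpha-\beta)} - e^{-i(\alpha-\beta)} + e^{-i(\alpha+\beta)}}{4} \\
&= \frac{e^{i(\alpha+\beta)} + e^{-i(\alpha+\beta)}}{2} = \cos(\alpha+\beta)
\end{aligned}$$

We can prove (0.14) using the same technique:

$$\begin{aligned}
\sin\alpha\cos\beta + \sin\beta\cos\alpha &= \frac{e^{i\alpha} - e^{-i\alpha}}{2i}\,\frac{e^{i\beta} + e^{-i\beta}}{2} + \frac{e^{i\beta} - e^{-i\beta}}{2i}\,\frac{e^{i\alpha} + e^{-i\alpha}}{2} \\
&= \frac{e^{i(\alpha+\beta)} + e^{i(\alpha-\beta)} - e^{-i(\alpha-\beta)} - e^{-i(\alpha+\beta)}}{4i} + \frac{e^{i(\alpha+\beta)} - e^{i(\alpha-\beta)} + e^{-i(\alpha-\beta)} - e^{-i(\alpha+\beta)}}{4i} \\
&= \frac{e^{i(\alpha+\beta)} - e^{-i(\alpha+\beta)}}{2i} = \sin(\alpha+\beta)
\end{aligned}$$

Finally, we have

$$\cos^2\alpha + \sin^2\alpha = \left(\frac{e^{i\alpha} + e^{-i\alpha}}{2}\right)^2 + \left(\frac{e^{i\alpha} - e^{-i\alpha}}{2i}\right)^2 = \frac{e^{2i\alpha} + 2 + e^{-2i\alpha}}{4} - \frac{e^{2i\alpha} - 2 + e^{-2i\alpha}}{4} = 1$$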
As was mentioned previously, we will often be interested in waves of the form $A\cos(\alpha+\beta)$. We can use complex notation to represent this wave simply by writing

$$A\cos(\alpha+\beta) = \mathrm{Re}\left\{\tilde{A}e^{i\alpha}\right\} \tag{0.20}$$

where the phase $\beta$ is conveniently contained within the complex factor $\tilde{A} \equiv Ae^{i\beta}$. The operation $\mathrm{Re}\{\}$ means to retain only the real part of the argument without regard for the imaginary part. As an example, we have $\mathrm{Re}\{1+2i\} = 1$. The formula (0.20) follows directly from Euler's equation (0.15).

It is common (even conventional) to omit the explicit writing of $\mathrm{Re}\{\}$. Thus, physicists participate in a conspiracy that $\tilde{A}e^{i\alpha}$ actually means $A\cos(\alpha+\beta)$. This laziness is permissible because it is possible to perform linear operations on $\mathrm{Re}\{f\}$ such as addition, differentiation, or integration while procrastinating the taking of the real part until the end:

$$\begin{aligned}
\mathrm{Re}\{f\} + \mathrm{Re}\{g\} &= \mathrm{Re}\{f + g\} \\
\frac{d}{dx}\mathrm{Re}\{f\} &= \mathrm{Re}\left\{\frac{df}{dx}\right\} \\
\int \mathrm{Re}\{f\}\,dx &= \mathrm{Re}\left\{\int f\,dx\right\}
\end{aligned} \tag{0.21}$$

As an example, note that $\mathrm{Re}\{1+2i\} + \mathrm{Re}\{3+4i\} = \mathrm{Re}\{(1+2i) + (3+4i)\} = 4$. However, we must be careful when performing other operations such as multiplication. In this case, it is essential to take the real parts before performing the operation. Notice that

$$\mathrm{Re}\{f\}\times\mathrm{Re}\{g\} \neq \mathrm{Re}\{f\times g\} \tag{0.22}$$
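The payoff of (0.20) and (0.21) is that two cosine waves with the same argument $\alpha$ can be added by adding their complex amplitudes. The sketch below illustrates this; the helper `add_waves` and the sample amplitudes and phases are assumptions chosen for the example.

```python
import cmath
import math

def add_waves(A1, beta1, A2, beta2):
    """Amplitude and phase of A1*cos(alpha + beta1) + A2*cos(alpha + beta2).

    The two complex amplitudes A*exp(i*beta) are summed, as permitted by (0.21).
    """
    total = A1 * cmath.exp(1j * beta1) + A2 * cmath.exp(1j * beta2)
    return abs(total), cmath.phase(total)

# Arbitrary sample waves (assumed values)
A1, beta1 = 1.0, 0.3
A2, beta2 = 2.0, 1.1
A, beta = add_waves(A1, beta1, A2, beta2)

alpha = 0.6  # any value of the shared argument
direct = A1 * math.cos(alpha + beta1) + A2 * math.cos(alpha + beta2)
via_complex = A * math.cos(alpha + beta)
print(direct, via_complex)  # identical up to rounding

# Multiplication does not commute with taking the real part, as in (0.22)
f, g = 1 + 2j, 3 + 4j
print(f.real * g.real, (f * g).real)
```

The single pair `(A, beta)` describes the summed wave for every value of `alpha`, which is what makes the complex representation convenient.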
As an example, we see $\mathrm{Re}\{1+2i\}\times\mathrm{Re}\{3+4i\} = 3$, but $\mathrm{Re}\{(1+2i)(3+4i)\} = -5$.

When dealing with complex numbers it is often advantageous to transform between a Cartesian representation and a polar representation. With the aid of Euler's formula, it is possible to transform any complex number $a + ib$ into the form $\rho e^{i\phi}$, where $a$, $b$, $\rho$, and $\phi$ are real. From (0.15), the required connection between $(\rho, \phi)$ and $(a, b)$ is

$$\rho e^{i\phi} = \rho\cos\phi + i\rho\sin\phi = a + ib \tag{0.23}$$
The real and imaginary parts of this equation must separately be equal. Thus, we have

$$a = \rho\cos\phi, \qquad b = \rho\sin\phi \tag{0.24}$$

These equations can be inverted to yield

$$\rho = \sqrt{a^2 + b^2}, \qquad \phi = \tan^{-1}\frac{b}{a} \quad (a > 0) \tag{0.25}$$

When $a < 0$, we must adjust $\phi$ by $\pi$ since the arctangent has a range only from $-\pi/2$ to $\pi/2$. The transformations in (0.24) and (0.25) have a clear geometrical interpretation in the complex plane, and this makes it easier to remember them. They are just the usual connections between Cartesian and polar coordinates. As seen in Fig. 0.4, $\rho$ is the hypotenuse of a right triangle having legs with lengths $a$ and $b$, and $\phi$ is the angle that the hypotenuse makes with the $x$-axis. Again, you should be careful when $a$ is negative since the arctangent is defined in quadrants I and IV. An easy way to deal with the situation of a negative $a$ is to factor the minus sign out before proceeding (i.e. $a + ib = -(-a - ib)$). Then the transformation is made on $-a - ib$ where $-a$ is positive. The overall minus sign out in front is just carried along unaffected and can be factored back in at the end. Notice that $-e^{i\phi}$ is the same as $e^{i(\phi\pm\pi)}$.

Figure 0.4 A number in the complex plane can be represented either by Cartesian or polar representation. (The figure labels quadrants I through IV.)

Example 0.5
Write $-3 + 4i$ in polar format.

Solution: We must be careful with the negative real part since it indicates a quadrant (in this case II) outside of the domain of the inverse tangent (quadrants I and IV). Best to factor the negative out and deal with it separately.

$$-3 + 4i = -(3 - 4i) = -\sqrt{3^2 + (-4)^2}\;e^{i\tan^{-1}\frac{(-4)}{3}} = e^{i\pi}\,5\,e^{-i\tan^{-1}\frac{4}{3}} = 5e^{i\left(\pi - \tan^{-1}\frac{4}{3}\right)}$$
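The conversion in Example 0.5 can be reproduced in a few lines. The function `to_polar` below is a hypothetical helper that follows (0.25) literally, adding $\pi$ when the real part is negative; the standard-library routine `cmath.rect` is used only to convert back as a cross-check.

```python
import cmath
import math

def to_polar(z):
    """Return (rho, phi) for z = a + ib using (0.25), adjusting phi by pi when a < 0."""
    a, b = z.real, z.imag
    rho = math.hypot(a, b)
    if a > 0:
        phi = math.atan(b / a)
    elif a < 0:
        phi = math.atan(b / a) + math.pi   # quadrant II or III
    else:
        phi = math.copysign(math.pi / 2, b)  # purely imaginary number
    return rho, phi

rho, phi = to_polar(-3 + 4j)
print(rho, phi)                      # rho is 5; phi equals pi - atan(4/3)
print(cmath.rect(rho, phi))          # converts back to approximately -3 + 4i
```

For numbers in quadrant III this helper returns an angle between $\pi$ and $3\pi/2$, which differs from the principal value returned by `cmath.phase` by $2\pi$ and therefore represents the same complex number.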
Finally, we consider the concept of a complex conjugate. The conjugate of a complex number $z = a + ib$ is denoted with an asterisk and amounts to changing the sign on the imaginary part of the number:

$$z^* = (a + ib)^* \equiv a - ib \tag{0.26}$$

The complex conjugate is useful when computing the absolute value of a complex number:

$$|z| = \sqrt{z^* z} = \sqrt{(a - ib)(a + ib)} = \sqrt{a^2 + b^2} = \rho \tag{0.27}$$

Note that the absolute value of a complex number is the same as its magnitude $\rho$ as defined in (0.25). The complex conjugate is also useful for eliminating complex numbers from the denominator of expressions:

$$\frac{a + ib}{c + id} = \frac{(a + ib)}{(c + id)}\frac{(c - id)}{(c - id)} = \frac{ac + bd + i(bc - ad)}{c^2 + d^2} \tag{0.28}$$

No matter how complicated an expression, the complex conjugate is calculated by inserting a minus sign in front of all occurrences of $i$ in the expression, and placing an asterisk on all complex variables in the expression. For example, the complex conjugate of $\rho e^{i\phi}$ is $\rho e^{-i\phi}$ assuming $\rho$ and $\phi$ are real, as can be seen from Euler's formula (0.15). As another example consider

$$\left[E_0\exp\{i(Kz - \omega t)\}\right]^* = E_0^*\exp\{-i(K^* z - \omega t)\} \tag{0.29}$$

assuming $z$, $\omega$, and $t$ are real, but $E_0$ and $K$ are complex. A common way of obtaining the real part of an expression is by adding the complex conjugate and dividing the result by 2:

$$\mathrm{Re}\{z\} = \frac{1}{2}\left(z + z^*\right) \tag{0.30}$$
Notice that the expression for $\cos\phi$ in (0.16) is an example of this formula. Sometimes when a lengthy expression is added to its own complex conjugate, we let "C.C." represent the complex conjugate in order to avoid writing the expression twice.

In optics we sometimes encounter a complex angle, such as $Kz$ in (0.29). The imaginary part of $K$ governs exponential decay (or growth) when a light wave propagates in an absorptive (or amplifying) medium. Similarly, when we compute the transmission angle for light incident upon a surface beyond the critical angle for total internal reflection, we encounter the arcsine of a number greater than one in an effort to satisfy Snell's law. Even though such an angle does not exist in the physical sense, a complex value for the angle can be found, which satisfies (0.16) and describes evanescent waves.
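A complex transmission angle of this kind can be computed directly. The sketch below assumes Snell's law in the form $n_i\sin\theta_i = n_t\sin\theta_t$ with sample indices 1.5 and 1.0 and a 60 degree incidence angle (all assumed values, beyond the critical angle); `transmitted_angle` is a hypothetical helper. The sign of the imaginary part returned by `cmath.asin` is a branch convention, and choosing the physically appropriate sign is outside the scope of this illustration.

```python
import cmath
import math

def transmitted_angle(n_i, n_t, theta_i):
    """Solve n_i sin(theta_i) = n_t sin(theta_t) for a possibly complex theta_t."""
    return cmath.asin(n_i * math.sin(theta_i) / n_t)

n_i, n_t = 1.5, 1.0                    # assumed indices (glass to air)
theta_i = math.radians(60.0)           # beyond the critical angle asin(1/1.5)
sin_theta_t = n_i * math.sin(theta_i) / n_t
theta_t = transmitted_angle(n_i, n_t, theta_i)

print(sin_theta_t)                     # greater than one
print(theta_t)                         # real part pi/2, nonzero imaginary part
print(cmath.sin(theta_t))              # reproduces sin_theta_t
```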
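Consider a set of linear equations of the form

$$\begin{aligned} Ax + By &= F \\ Cx + Dy &= G \end{aligned} \tag{0.31}$$

where $x$ and $y$ are variables. A set of linear equations such as (0.31) can be expressed using matrix notation as

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} Ax + By \\ Cx + Dy \end{pmatrix} = \begin{pmatrix} F \\ G \end{pmatrix} \tag{0.32}$$

As seen above, the $2\times 2$ matrix multiplied onto the two-dimensional column vector results in a two-dimensional vector. The elements of rows are multiplied onto elements of the column and summed to create each new element in the result. A matrix can also be multiplied onto another matrix (rows multiplying columns, resulting in a matrix). The order of multiplication is important; matrix multiplication is not commutative. To solve a matrix equation such as (0.32), we multiply both sides by an inverse matrix, which gives

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1}\begin{pmatrix} A & B \\ C & D \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1}\begin{pmatrix} F \\ G \end{pmatrix} \tag{0.33}$$

The inverse matrix times the original matrix is, by definition,

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1}\begin{pmatrix} A & B \\ C & D \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \tag{0.34}$$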
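where the right-hand side is called the identity matrix. You can easily check that the identity matrix leaves unchanged anything that it multiplies, and so (0.33) simplifies to

$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1}\begin{pmatrix} F \\ G \end{pmatrix}$$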
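Once the inverse matrix is found, the matrix multiplication on the right can be performed and the answers for $x$ and $y$ obtained as the upper and lower elements of the result. The inverse of a $2\times 2$ matrix is given by

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1} = \frac{1}{\begin{vmatrix} A & B \\ C & D \end{vmatrix}}\begin{pmatrix} D & -B \\ -C & A \end{pmatrix} \tag{0.35}$$

where

$$\begin{vmatrix} A & B \\ C & D \end{vmatrix} \equiv AD - CB$$

is called the determinant. We can check that (0.35) is correct by direct substitution:

$$\begin{aligned}
\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{-1}\begin{pmatrix} A & B \\ C & D \end{pmatrix} &= \frac{1}{AD - BC}\begin{pmatrix} D & -B \\ -C & A \end{pmatrix}\begin{pmatrix} A & B \\ C & D \end{pmatrix} \\
&= \frac{1}{AD - BC}\begin{pmatrix} AD - BC & 0 \\ 0 & AD - BC \end{pmatrix} \\
&= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\end{aligned} \tag{0.36}$$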
The above review of linear algebra is very basic. In contrast, we next discuss Sylvester's theorem, which you probably have not previously encountered. Sylvester's theorem is useful when multiplying the same $2\times 2$ matrix (with a determinant of unity) together many times (i.e. raising the matrix to a power). This situation occurs when modeling periodic multilayer mirror coatings or when considering light rays trapped in a laser cavity as they reflect many times.

Sylvester's Theorem:⁴ If the determinant of a $2\times 2$ matrix is one (i.e. $AD - BC = 1$), then

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{N} = \frac{1}{\sin\theta}\begin{pmatrix} A\sin N\theta - \sin(N-1)\theta & B\sin N\theta \\ C\sin N\theta & D\sin N\theta - \sin(N-1)\theta \end{pmatrix} \tag{0.37}$$

where

$$\cos\theta = \frac{1}{2}(A + D) \tag{0.38}$$

4 The theorem presented here is a specific case. See A. A. Tovar and L. W. Casperson, "Generalized Sylvester theorems for periodic applications in matrix optics," J. Opt. Soc. Am. A 12, 578-590 (1995).
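Proof of Sylvester's theorem by induction

When $N = 1$, the equation is seen to be correct by direct substitution. Next we assume that the theorem holds for arbitrary $N$, and we check to see if it holds for $N + 1$:

$$\begin{aligned}
\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{N+1} &= \begin{pmatrix} A & B \\ C & D \end{pmatrix}\frac{1}{\sin\theta}\begin{pmatrix} A\sin N\theta - \sin(N-1)\theta & B\sin N\theta \\ C\sin N\theta & D\sin N\theta - \sin(N-1)\theta \end{pmatrix} \\
&= \frac{1}{\sin\theta}\begin{pmatrix} \left(A^2 + BC\right)\sin N\theta - A\sin(N-1)\theta & (AB + BD)\sin N\theta - B\sin(N-1)\theta \\ (AC + CD)\sin N\theta - C\sin(N-1)\theta & \left(D^2 + BC\right)\sin N\theta - D\sin(N-1)\theta \end{pmatrix}
\end{aligned}$$

Now we inject the condition $AD - BC = 1$ into the diagonal elements and obtain

$$\frac{1}{\sin\theta}\begin{pmatrix} \left(A^2 + AD - 1\right)\sin N\theta - A\sin(N-1)\theta & B\left[(A + D)\sin N\theta - \sin(N-1)\theta\right] \\ C\left[(A + D)\sin N\theta - \sin(N-1)\theta\right] & \left(D^2 + AD - 1\right)\sin N\theta - D\sin(N-1)\theta \end{pmatrix}$$

and then

$$\frac{1}{\sin\theta}\begin{pmatrix} A\left[(A + D)\sin N\theta - \sin(N-1)\theta\right] - \sin N\theta & B\left[(A + D)\sin N\theta - \sin(N-1)\theta\right] \\ C\left[(A + D)\sin N\theta - \sin(N-1)\theta\right] & D\left[(A + D)\sin N\theta - \sin(N-1)\theta\right] - \sin N\theta \end{pmatrix}$$

In each matrix element, the expression

$$(A + D)\sin N\theta = 2\cos\theta\sin N\theta = \sin(N+1)\theta + \sin(N-1)\theta \tag{0.39}$$

occurs, which we have rearranged using $\cos\theta = \frac{1}{2}(A + D)$ while twice invoking (0.14). The result is

$$\begin{pmatrix} A & B \\ C & D \end{pmatrix}^{N+1} = \frac{1}{\sin\theta}\begin{pmatrix} A\sin(N+1)\theta - \sin N\theta & B\sin(N+1)\theta \\ C\sin(N+1)\theta & D\sin(N+1)\theta - \sin N\theta \end{pmatrix}$$

which is (0.37) with $N$ replaced by $N + 1$. This completes the proof.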
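Sylvester's theorem is easy to test numerically by comparing (0.37) with brute-force repeated multiplication. The sketch below uses hypothetical helper names and an arbitrary sample matrix with unit determinant; it assumes $|A + D| < 2$ so that $\theta$ in (0.38) is real and `math.acos` applies.

```python
import math

def matmul(M, N):
    """Product of two 2x2 matrices stored as ((a, b), (c, d))."""
    (a, b), (c, d) = M
    (e, f), (g, h) = N
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))

def power_by_multiplication(M, n):
    """Raise M to the n-th power (n >= 1) by repeated multiplication."""
    result = M
    for _ in range(n - 1):
        result = matmul(result, M)
    return result

def power_by_sylvester(M, n):
    """Raise M to the n-th power using (0.37) and (0.38); requires det M = 1."""
    (A, B), (C, D) = M
    theta = math.acos((A + D) / 2.0)      # assumes |A + D| < 2
    s, sn, sn1 = math.sin(theta), math.sin(n * theta), math.sin((n - 1) * theta)
    return (((A * sn - sn1) / s, B * sn / s), (C * sn / s, (D * sn - sn1) / s))

# Arbitrary sample matrix with AD - BC = 0.5*1.0 - (-0.5)*1.0 = 1
M = ((0.5, -0.5), (1.0, 1.0))
N = 7
direct = power_by_multiplication(M, N)
sylvester = power_by_sylvester(M, N)
print(direct)
print(sylvester)  # matches the line above to rounding error
```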
We begin with a derivation of the Fourier integral theorem. As asserted by Fourier, a periodic function can be represented in terms of sines and cosines in the following manner:

$$f(t) = \sum_{n=0}^{\infty} a_n\cos(n\Delta\omega t) + b_n\sin(n\Delta\omega t) \tag{0.40}$$
This is called a Fourier expansion. It is similar in idea to a Taylor's series (0.18), which rewrites a function as a polynomial. In both cases, the goal is to represent one function in terms of a linear combination of other functions (requiring a complete basis set). In a Taylor's series the basis functions are polynomials and in a Fourier expansion the basis functions are sines and cosines with various frequencies (multiples of a fundamental frequency). By inspection, we see that all terms in (0.40) repeat with a maximum period of $2\pi/\Delta\omega$. In other words, a Fourier series is good for functions where $f(t) = f(t + 2\pi/\Delta\omega)$. The expansion (0.40) is useful even if $f(t)$ is complex, requiring $a_n$ and $b_n$ to be complex. Using (0.16), we can rewrite the sines and cosines in the expansion (0.40) as
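$$f(t) = \sum_{n=0}^{\infty}\left[a_n\frac{e^{in\Delta\omega t} + e^{-in\Delta\omega t}}{2} + b_n\frac{e^{in\Delta\omega t} - e^{-in\Delta\omega t}}{2i}\right] = a_0 + \sum_{n=1}^{\infty}\frac{a_n - ib_n}{2}e^{in\Delta\omega t} + \sum_{n=1}^{\infty}\frac{a_n + ib_n}{2}e^{-in\Delta\omega t} \tag{0.41}$$

or more simply as

$$f(t) = \sum_{n=-\infty}^{\infty} c_n e^{-in\Delta\omega t} \tag{0.42}$$

where

$$c_{n<0} \equiv \frac{a_{-n} - ib_{-n}}{2}, \qquad c_{n>0} \equiv \frac{a_n + ib_n}{2}, \qquad c_0 \equiv a_0 \tag{0.43}$$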
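Notice that if $c_{-n} = c_n^*$ for all $n$, then $f(t)$ is real (i.e. real $a_n$ and $b_n$); otherwise $f(t)$ is complex. The real parts of the $c_n$ coefficients are connected with the cosine terms in (0.40), and the imaginary parts of the $c_n$ coefficients are connected with the sine terms in (0.40). Given a known function $f(t)$, we can compute the various coefficients $c_n$. There is a trick for figuring out how to do this. We multiply both sides of (0.42) by $e^{im\Delta\omega t}$, where $m$ is an integer, and integrate over one period $2\pi/\Delta\omega$:

$$\int_{-\pi/\Delta\omega}^{\pi/\Delta\omega} f(t)e^{im\Delta\omega t}\,dt = \sum_{n=-\infty}^{\infty} c_n\int_{-\pi/\Delta\omega}^{\pi/\Delta\omega} e^{i(m-n)\Delta\omega t}\,dt = \sum_{n=-\infty}^{\infty} c_n\left[\frac{e^{i(m-n)\Delta\omega t}}{i(m-n)\Delta\omega}\right]_{-\pi/\Delta\omega}^{\pi/\Delta\omega} = \sum_{n=-\infty}^{\infty}\frac{2\pi c_n}{\Delta\omega}\,\frac{e^{i\pi(m-n)} - e^{-i\pi(m-n)}}{2i\pi(m-n)} = \sum_{n=-\infty}^{\infty}\frac{2\pi c_n}{\Delta\omega}\,\frac{\sin[(m-n)\pi]}{(m-n)\pi} \tag{0.44}$$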
The function $\sin[(m-n)\pi]/[(m-n)\pi]$ is equal to zero for all $n \neq m$, and it is equal to one when $n = m$ (to see this, use L'Hôpital's rule on the zero-over-zero situation, or just go back and re-perform the above integral for $n = m$). Thus, only one term contributes to the summation in (0.44). We now have

$$c_m = \frac{\Delta\omega}{2\pi}\int_{-\pi/\Delta\omega}^{\pi/\Delta\omega} f(t)e^{im\Delta\omega t}\,dt \tag{0.45}$$
from which the coefficients $c_n$ can be computed, given a function $f(t)$. (Note that $m$ is a dummy index so we can change it back to $n$ if we like.) This completes the circle. If we know the function $f(t)$, we can find the coefficients $c_n$ via (0.45), and, if we know the coefficients $c_n$, we can generate the function $f(t)$ via (0.42). If we are feeling a bit silly, we might combine these into a single identity:

$$f(t) = \sum_{n=-\infty}^{\infty}\left[\frac{\Delta\omega}{2\pi}\int_{-\pi/\Delta\omega}^{\pi/\Delta\omega} f(t')e^{in\Delta\omega t'}\,dt'\right]e^{-in\Delta\omega t} \tag{0.46}$$
We start with a function $f(t)$ followed by a lot of computation and obtain the function back again! (This is not quite as foolish as it first appears, as we will discuss later.) As mentioned above, Fourier expansions represent functions $f(t)$ that are periodic over the interval $2\pi/\Delta\omega$. This is disappointing since many optical waveforms do not repeat (e.g. a single short laser pulse). Nevertheless, we can represent a function $f(t)$ that is not periodic if we let the period $2\pi/\Delta\omega$ become infinitely long. In other words, we can accommodate non-periodic functions if we take the limit as $\Delta\omega$ goes to zero so that the spacing of terms in the series becomes very fine. Applying this limit to (0.46) we obtain

$$f(t) = \frac{1}{\sqrt{2\pi}}\lim_{\Delta\omega\to 0}\sum_{n=-\infty}^{\infty} e^{-in\Delta\omega t}\left[\frac{1}{\sqrt{2\pi}}\int_{-\pi/\Delta\omega}^{\pi/\Delta\omega} f(t')e^{in\Delta\omega t'}\,dt'\right]\Delta\omega \tag{0.47}$$
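The round trip between (0.45) and (0.42) can be tried numerically. The sketch below is only an illustration under assumed values: the test function, the fundamental frequency `DW`, and the helper names are hypothetical. It uses a midpoint-rule sum over one period, which is essentially exact for a finite trigonometric sum like the one chosen here.

```python
import cmath
import math

DW = 2.0  # assumed fundamental angular frequency (Delta omega)

def f(t):
    """Test function with a0 = 3, a1 = 2, b2 = 1 in the notation of (0.40)."""
    return 3 + 2*math.cos(DW*t) + math.sin(2*DW*t)

def coeff(m, samples=2000):
    """Approximate c_m from (0.45) with a midpoint-rule sum over one period."""
    period = 2*math.pi/DW
    dt = period/samples
    total = 0j
    for k in range(samples):
        t = -period/2 + (k + 0.5)*dt
        total += f(t)*cmath.exp(1j*m*DW*t)*dt
    return total*DW/(2*math.pi)

def rebuild(t, nmax=3):
    """Reassemble f(t) from the coefficients via (0.42), truncated at |n| <= nmax."""
    return sum(coeff(n)*cmath.exp(-1j*n*DW*t) for n in range(-nmax, nmax + 1))
```

For this test function, (0.43) predicts $c_0 = 3$, $c_{\pm 1} = 1$, $c_2 = i/2$, and $c_{-2} = -i/2$, so $c_{-n} = c_n^*$ as expected for a real $f(t)$.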
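At this point, a brief review of the definition of an integral is helpful to better understand the next step that we shall administer to (0.47):

$$\int_a^b g(\omega)\,d\omega \equiv \lim_{\Delta\omega\to 0}\sum_{n=0}^{\frac{b-a}{\Delta\omega}} g(a + n\Delta\omega)\,\Delta\omega = \lim_{\Delta\omega\to 0}\sum_{n=-\frac{b-a}{2\Delta\omega}}^{\frac{b-a}{2\Delta\omega}} g\!\left(\frac{a+b}{2} + n\Delta\omega\right)\Delta\omega \tag{0.48}$$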
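The final expression has been manipulated so that the index ranges through both negative and positive numbers. If we set $a = -b$ and take the limit $b \to \infty$, then the above expression becomes

$$\int_{-\infty}^{\infty} g(\omega)\,d\omega = \lim_{\Delta\omega\to 0}\sum_{n=-\infty}^{\infty} g(n\Delta\omega)\,\Delta\omega \tag{0.49}$$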
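Now, (0.47) has the same form as (0.49) if $g(n\Delta\omega)$ represents everything in the square brackets of (0.47). The result is the Fourier integral theorem:

$$f(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-i\omega t}\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(t')e^{i\omega t'}\,dt'\right]d\omega \tag{0.50}$$

The piece in brackets is called the Fourier transform, and the rest of the operation is called the inverse Fourier transform. The Fourier integral theorem (0.50) is often written with the following (potentially confusing) notation:

$$f(\omega) \equiv \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(t)e^{i\omega t}\,dt, \qquad f(t) \equiv \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(\omega)e^{-i\omega t}\,d\omega \tag{0.51}$$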
The transform and inverse transform are also sometimes written as $f(\omega) \equiv \mathcal{F}\{f(t)\}$ and $f(t) \equiv \mathcal{F}^{-1}\{f(\omega)\}$. Note that the functions $f(t)$ and $f(\omega)$ are entirely different, even taking on different units (e.g. the latter having extra units of per frequency). The two functions are distinguished by their arguments, which also have different units (e.g. time vs. frequency). Nevertheless, it is customary to use the same letter to denote either function since they form a transform pair.

You should be aware that it is arbitrary which of the expressions in (0.51) is called the transform and which is called the inverse transform. In other words, the signs in the exponents of (0.51) may be interchanged (and this convention varies in published works!). Also, the factor $2\pi$ may be placed on either the transform or the inverse transform, or divided equally between the two as has been done here.

Example 0.6
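Compute the Fourier transform of $E(t) = E_0 e^{-t^2/2T^2}e^{-i\omega_0 t}$ followed by the inverse Fourier transform.

Solution: According to (0.51), the Fourier transform is

$$E(\omega) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} E_0 e^{-t^2/2T^2}e^{-i\omega_0 t}e^{i\omega t}\,dt = \frac{E_0}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-t^2/2T^2 + i(\omega-\omega_0)t}\,dt$$

The integration can be performed with the help of (0.55), which yields

$$E(\omega) = \frac{E_0}{\sqrt{2\pi}}\sqrt{\frac{\pi}{1/2T^2}}\;e^{-\frac{(\omega-\omega_0)^2}{4(1/2T^2)}} = TE_0 e^{-T^2(\omega-\omega_0)^2/2}$$

Similarly, the inverse Fourier transform is

$$E(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} TE_0 e^{-T^2(\omega-\omega_0)^2/2}e^{-i\omega t}\,d\omega = \frac{TE_0}{\sqrt{2\pi}}e^{-\frac{T^2}{2}\omega_0^2}\int_{-\infty}^{\infty} e^{-\frac{T^2}{2}\omega^2 + (T^2\omega_0 - it)\omega}\,d\omega = \frac{TE_0}{\sqrt{2\pi}}e^{-\frac{T^2}{2}\omega_0^2}\sqrt{\frac{\pi}{T^2/2}}\;e^{\frac{(T^2\omega_0 - it)^2}{4(T^2/2)}} = E_0 e^{-t^2/2T^2 - i\omega_0 t}$$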
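The closed-form transform in Example 0.6 can be compared against a brute-force numerical integral. The sketch below assumes illustrative values for $E_0$, $T$ and $\omega_0$ and truncates the integral at $\pm 12T$, where the Gaussian is negligible; the function names are hypothetical.

```python
import cmath
import math

E0, T, W0 = 2.0, 1.5, 3.0   # assumed illustrative parameters

def pulse(t):
    """E(t) = E0 exp(-t^2 / 2T^2) exp(-i w0 t)."""
    return E0*math.exp(-t**2/(2*T**2))*cmath.exp(-1j*W0*t)

def transform_numeric(w, half_width=12.0, steps=6000):
    """(1/sqrt(2 pi)) * integral of E(t) exp(i w t) dt, midpoint rule on [-12T, 12T]."""
    dt = 2*half_width*T/steps
    total = 0j
    for k in range(steps):
        t = -half_width*T + (k + 0.5)*dt
        total += pulse(t)*cmath.exp(1j*w*t)
    return total*dt/math.sqrt(2*math.pi)

def transform_analytic(w):
    """Result of Example 0.6: T E0 exp(-T^2 (w - w0)^2 / 2)."""
    return T*E0*math.exp(-T**2*(w - W0)**2/2)
```

The sign convention follows (0.51), with $e^{+i\omega t}$ in the forward transform; libraries that use the opposite sign would return the spectrum mirrored about $\omega = 0$.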
As was previously mentioned, it would seem rather pointless to perform a Fourier transform on the function $f(t)$ followed by an inverse Fourier transform, just to end up with $f(t)$ again. Instead, we will typically apply a frequency-dependent effect on $f(\omega)$ before performing the inverse Fourier transform. In this case, the final function will be different from $f(t)$. Keep in mind that $f(\omega)$ is the continuous analog of the discrete coefficients $c_n$ (or the $a_n$ and $b_n$). The real part of $f(\omega)$ indicates the amplitudes of the cosine waves necessary to construct the function $f(t)$. The imaginary part of $f(\omega)$ indicates the amplitudes of the sine waves necessary to construct the function $f(t)$.

Finally, we comment on the Dirac delta function,⁶ which is defined indirectly through

$$f(t) \equiv \int_{-\infty}^{\infty}\delta(t' - t)\,f(t')\,dt' \tag{0.52}$$
6 See G. B. Arfken and H. J. Weber, Mathematical Methods for Physicists 6th ed., Sect. 1.15 (San
The delta function $\delta(t' - t)$ is zero everywhere except at $t' = t$ where it is infinite in such a way as to make the integral take on the value of the function $f(t)$. (You can think of $\delta(t' - t)\,dt'$ as an infinitely tall and infinitely thin rectangle centered at $t' = t$ with an area unity.) The integral only pays attention to the value of $f(t')$ at the point $t' = t$. A remarkable attribute of the delta function can be seen from the Fourier integral theorem. After rearranging the order of integration, the Fourier integral theorem (0.50) can be written as

$$f(t) = \int_{-\infty}^{\infty} f(t')\left[\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{i\omega(t'-t)}\,d\omega\right]dt' \tag{0.53}$$

A comparison of (0.52) and (0.53) shows that you may write the delta function as a uniform superposition of all frequency components:

$$\delta(t' - t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{i\omega(t'-t)}\,d\omega \tag{0.54}$$
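Example 0.7

Use (0.54) to prove Parseval's relation:⁷

$$\int_{-\infty}^{\infty}|f(\omega)|^2\,d\omega = \int_{-\infty}^{\infty}|f(t)|^2\,dt$$

Solution:

$$\int_{-\infty}^{\infty}|f(\omega)|^2\,d\omega = \int_{-\infty}^{\infty} f(\omega)f^*(\omega)\,d\omega = \int_{-\infty}^{\infty}\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f(t)e^{i\omega t}\,dt\right]\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} f^*(t')e^{-i\omega t'}\,dt'\right]d\omega$$

$$= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(t)f^*(t')\left[\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{i\omega(t-t')}\,d\omega\right]dt'\,dt = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(t)f^*(t')\,\delta(t'-t)\,dt'\,dt = \int_{-\infty}^{\infty} f(t)f^*(t)\,dt = \int_{-\infty}^{\infty}|f(t)|^2\,dt$$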
7 For a more general version of the relation, see G. B. Arfken and H. J. Weber, Mathematical
Methods for Physicists 6th ed., Sect. 15.5 (San Diego: Elsevier Academic Press 2005).
$$\int_{-\infty}^{\infty} e^{-ax^2 + bx + c}\,dx = \sqrt{\frac{\pi}{a}}\,e^{\frac{b^2}{4a} + c} \qquad (\mathrm{Re}\{a\} > 0) \tag{0.55}$$
0 2
|b | |ab | e i ax dx = e 2 2 1 + x /b 2
(b > 0)
(0.56)
$$\int_0^{2\pi} e^{\pm ia\cos(\theta - \theta')}\,d\theta = 2\pi J_0(a) \tag{0.57}$$

$$\int_0^a J_0(bx)\,x\,dx = \frac{a}{b}J_1(ab) \tag{0.58}$$
e
0
ax 2
(0.59)
(0.60)
dy
3/2 y2 + c
dx
x2 c
sin(ax ) sin(bx ) d x =
$$\sum_{n=0}^{N} r^n = \frac{1 - r^{N+1}}{1 - r}$$

$$\sum_{n=1}^{N} r^n = \frac{r(1 - r^N)}{1 - r}$$

$$\sum_{n=0}^{\infty} r^n = \frac{1}{1 - r} \qquad (r < 1)$$
Exercises
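Exercises for 0.1 Vector Calculus

P0.1 Let $\mathbf{r} = (\hat{\mathbf{x}} + 2\hat{\mathbf{y}} - 3\hat{\mathbf{z}})$ m and $\mathbf{r}_0 = (-\hat{\mathbf{x}} + 3\hat{\mathbf{y}} + 2\hat{\mathbf{z}})$ m. (a) Find the magnitude of $\mathbf{r}$. (b) Find $\mathbf{r} - \mathbf{r}_0$. (c) Find the angle between $\mathbf{r}$ and $\mathbf{r}_0$.

Answer: (a) $r = \sqrt{14}$ m; (c) 94°.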
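P0.2 Use the dot product (0.2) to show that the cross product $\mathbf{E}\times\mathbf{B}$ is perpendicular to $\mathbf{E}$ and to $\mathbf{B}$.

P0.3 Verify the BAC-CAB rule: $\mathbf{A}\times(\mathbf{B}\times\mathbf{C}) = \mathbf{B}(\mathbf{A}\cdot\mathbf{C}) - \mathbf{C}(\mathbf{A}\cdot\mathbf{B})$.

P0.4 Prove the following identity:

$$\nabla_{\mathbf{r}}\frac{1}{|\mathbf{r}-\mathbf{r}'|} = -\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}$$

P0.6 Verify $\nabla\cdot(\nabla\times\mathbf{f}) = 0$ for any vector function $\mathbf{f}$.

P0.7 Verify $\nabla\times(\mathbf{f}\times\mathbf{g}) = \mathbf{f}(\nabla\cdot\mathbf{g}) - \mathbf{g}(\nabla\cdot\mathbf{f}) + (\mathbf{g}\cdot\nabla)\mathbf{f} - (\mathbf{f}\cdot\nabla)\mathbf{g}$.

P0.8 Verify $\nabla\cdot(\mathbf{f}\times\mathbf{g}) = \mathbf{g}\cdot(\nabla\times\mathbf{f}) - \mathbf{f}\cdot(\nabla\times\mathbf{g})$.

P0.9 Verify $\nabla\cdot(g\mathbf{f}) = \mathbf{f}\cdot\nabla g + g\nabla\cdot\mathbf{f}$ and $\nabla\times(g\mathbf{f}) = (\nabla g)\times\mathbf{f} + g\nabla\times\mathbf{f}$.

P0.10 Show that the Laplacian in cylindrical coordinates can be written as

$$\nabla^2 = \frac{1}{\rho}\frac{\partial}{\partial\rho}\left(\rho\frac{\partial}{\partial\rho}\right) + \frac{1}{\rho^2}\frac{\partial^2}{\partial\phi^2} + \frac{\partial^2}{\partial z^2}$$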
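Solution (partial): Continuing with the chain-rule approach of Example 0.2, we have

$$\frac{\partial^2 f}{\partial x^2} = \frac{\partial^2 f}{\partial\rho^2}\left(\frac{\partial\rho}{\partial x}\right)^2 + 2\frac{\partial^2 f}{\partial\rho\,\partial\phi}\frac{\partial\rho}{\partial x}\frac{\partial\phi}{\partial x} + \frac{\partial^2 f}{\partial\phi^2}\left(\frac{\partial\phi}{\partial x}\right)^2 + \frac{\partial f}{\partial\rho}\frac{\partial^2\rho}{\partial x^2} + \frac{\partial f}{\partial\phi}\frac{\partial^2\phi}{\partial x^2}$$

with the analogous expression for $\partial^2 f/\partial y^2$, and

$$\nabla^2 f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2} = \frac{\partial^2 f}{\partial\rho^2}\left[\left(\frac{\partial\rho}{\partial x}\right)^2 + \left(\frac{\partial\rho}{\partial y}\right)^2\right] + 2\frac{\partial^2 f}{\partial\rho\,\partial\phi}\left[\frac{\partial\rho}{\partial x}\frac{\partial\phi}{\partial x} + \frac{\partial\rho}{\partial y}\frac{\partial\phi}{\partial y}\right] + \frac{\partial^2 f}{\partial\phi^2}\left[\left(\frac{\partial\phi}{\partial x}\right)^2 + \left(\frac{\partial\phi}{\partial y}\right)^2\right] + \frac{\partial f}{\partial\rho}\left[\frac{\partial^2\rho}{\partial x^2} + \frac{\partial^2\rho}{\partial y^2}\right] + \frac{\partial f}{\partial\phi}\left[\frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial y^2}\right] + \frac{\partial^2 f}{\partial z^2}$$

The needed first derivatives are given in Example 0.2. The needed second derivatives are

$$\frac{\partial^2\rho}{\partial x^2} = \frac{1}{\sqrt{x^2+y^2}} - \frac{x^2}{(x^2+y^2)^{3/2}} = \frac{\sin^2\phi}{\rho}, \qquad \frac{\partial^2\phi}{\partial x^2} = \frac{2xy}{(x^2+y^2)^2} = \frac{2\sin\phi\cos\phi}{\rho^2}$$

$$\frac{\partial^2\rho}{\partial y^2} = \frac{1}{\sqrt{x^2+y^2}} - \frac{y^2}{(x^2+y^2)^{3/2}} = \frac{\cos^2\phi}{\rho}, \qquad \frac{\partial^2\phi}{\partial y^2} = -\frac{2xy}{(x^2+y^2)^2} = -\frac{2\sin\phi\cos\phi}{\rho^2}$$

Finish the derivation by substituting these derivatives into the above expression.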
P0.11 Verify Stokes' theorem (0.12) for the function given in Example 0.3. Take the surface to be a square in the xy-plane contained by $|x| = 1$ and $|y| = 1$, as illustrated in Fig. 0.6.

P0.12 Verify the following vector integral theorem for the same volume used in Example 0.3, but with $\mathbf{F} = y^2x\,\hat{\mathbf{x}} + xy\,\hat{\mathbf{z}}$ and $\mathbf{G} = x^2\,\hat{\mathbf{x}}$:

$$\int_V\left[\mathbf{F}(\nabla\cdot\mathbf{G}) + (\mathbf{G}\cdot\nabla)\mathbf{F}\right]dv = \oint_S \mathbf{F}\,(\mathbf{G}\cdot\hat{\mathbf{n}})\,da$$
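P0.13 Use the divergence theorem to show that the function in P0.5 is $4\pi$ times the three-dimensional delta function

$$\delta^3(\mathbf{r}'-\mathbf{r}) \equiv \delta(x'-x)\,\delta(y'-y)\,\delta(z'-z)$$

which has the property that

$$\int_V\delta^3(\mathbf{r}'-\mathbf{r})\,dv = \begin{cases}1 & \text{if } V \text{ contains } \mathbf{r}' \\ 0 & \text{otherwise}\end{cases}$$

Solution: We have by the divergence theorem

$$\oint_S\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\cdot\hat{\mathbf{n}}\,da = \int_V\nabla_{\mathbf{r}}\cdot\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\,dv$$

From P0.5, the argument in the integral on the right-hand side is zero except at $\mathbf{r} = \mathbf{r}'$. Therefore, if the volume $V$ does not contain the point $\mathbf{r} = \mathbf{r}'$, then the result of both integrals must be zero. Let us construct a volume between an arbitrary surface $S_1$ containing $\mathbf{r} = \mathbf{r}'$ and $S_2$, the surface of a tiny sphere centered on $\mathbf{r} = \mathbf{r}'$. Since the point $\mathbf{r} = \mathbf{r}'$ is excluded by the tiny sphere, the result of either integral in the divergence theorem is still zero. However, we have on the tiny sphere (of radius $r$, with the normal pointing out of the enclosed volume and hence toward $\mathbf{r}'$)

$$\oint_{S_2}\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\cdot\hat{\mathbf{n}}\,da = -\int_0^{2\pi}\!\!\int_0^{\pi}\frac{1}{r^2}\,r^2\sin\theta\,d\theta\,d\phi = -4\pi$$

Therefore, for the outer surface $S_1$ (containing $\mathbf{r} = \mathbf{r}'$) we must have the equal and opposite result:

$$\oint_{S_1}\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\cdot\hat{\mathbf{n}}\,da = 4\pi$$

This implies

$$\int_V\nabla_{\mathbf{r}}\cdot\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\,dv = \begin{cases}4\pi & \text{if } V \text{ contains } \mathbf{r}' \\ 0 & \text{otherwise}\end{cases}$$

The integrand exhibits the same characteristics as the delta function. Therefore, $\nabla_{\mathbf{r}}\cdot\dfrac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3} = 4\pi\delta^3(\mathbf{r}-\mathbf{r}')$. The delta function is defined in (0.52).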
Exercises for 0.2 Complex Numbers P0.14 Using only a calculators arithmetic and trigonometric functions, compute z 1 z 2 and z 1 /z 2 in both rectangular and polar form for z 1 = 1 i and z 2 = 3 + 4i .
P0.15 Show that

$$\frac{a - ib}{a + ib} = e^{-2i\tan^{-1}(b/a)}$$

regardless of the sign of $a$, assuming $a$ and $b$ are real.

P0.16 Invert (0.15) to get both formulas in (0.16). HINT: You can get a second equation by considering Euler's equation with a negative angle $-\phi$.

P0.17 Show $\mathrm{Re}\{A\}\times\mathrm{Re}\{B\} = (AB + A^*B)/4 + \text{C.C.}$

P0.18 If $E_0 = |E_0|e^{i\alpha_E}$ and $B_0 = |B_0|e^{i\alpha_B}$, and if $k$, $z$, $\omega$, and $t$ are all real, prove

$$\mathrm{Re}\left\{E_0e^{i(kz-\omega t)}\right\}\mathrm{Re}\left\{B_0e^{i(kz-\omega t)}\right\} = \frac{1}{4}\left(E_0^*B_0 + E_0B_0^*\right) + \frac{1}{2}|E_0||B_0|\cos\left[2(kz-\omega t) + \alpha_E + \alpha_B\right]$$

P0.19 (a) If $\sin\phi = 2$, show that $\cos\phi = i\sqrt{3}$. HINT: Use $\sin^2\phi + \cos^2\phi = 1$. (b) Show that the angle $\phi$ in (a) is $\pi/2 - i\ln(2 + \sqrt{3})$.

P0.20 Write $A\cos(\omega t) + 2A\sin(\omega t + \pi/4)$ as a simple phase-shifted cosine wave (i.e. find the amplitude and phase of the resultant cosine wave).
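Exercises for 0.4 Fourier Theory

P0.21 Prove that Fourier Transforms have the property of linear superposition:

$$\mathcal{F}\{ag(t) + bh(t)\} = ag(\omega) + bh(\omega)$$

where $g(\omega) \equiv \mathcal{F}\{g(t)\}$ and $h(\omega) \equiv \mathcal{F}\{h(t)\}$.

P0.22 P0.23 P0.24 Prove $\mathcal{F}\{g(at)\} = \dfrac{1}{|a|}\,g\!\left(\dfrac{\omega}{a}\right)$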
+e
0 )2 4/T 2
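P0.25 Take the inverse Fourier transform of the result in P0.24. Check that it returns exactly the original function.

P0.26 The following operation is referred to as the convolution of the functions $g(t)$ and $h(t)$:

$$g(t)\otimes h(t) \equiv \int_{-\infty}^{\infty} g(t)\,h(\tau - t)\,dt$$

A convolution measures the overlap of $g(t)$ and a reversed $h(t)$ as a function of the offset $\tau$. The result is a function of $\tau$.

(a) Prove the convolution theorem:

$$\mathcal{F}\{g(t)\otimes h(t)\} = \sqrt{2\pi}\,g(\omega)\,h(\omega)$$

(b) Prove the related form of the convolution theorem:

$$\mathcal{F}\{g(t)\,h(t)\} = \frac{1}{\sqrt{2\pi}}\,g(\omega)\otimes h(\omega)$$

Solution: Part (a)

$$\mathcal{F}\left\{\int_{-\infty}^{\infty} g(t)\,h(\tau - t)\,dt\right\} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty} g(t)\,h(\tau - t)\,dt\right]e^{i\omega\tau}\,d\tau \qquad (\text{Let } \tau = t' + t)$$

$$= \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(t)\,h(t')\,e^{i\omega(t'+t)}\,dt\,dt' = \sqrt{2\pi}\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} g(t)e^{i\omega t}\,dt\right]\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} h(t')e^{i\omega t'}\,dt'\right] = \sqrt{2\pi}\,g(\omega)\,h(\omega)$$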
P0.27
2 |h ()|2
P0.28
(a) Compute the Fourier transform of a Gaussian function, $g(t) = e^{-t^2/2T^2}$. Do the integral by hand using the table in Appendix 0.A. (b) Compute the Fourier transform of a sine function, $h(t) = \sin\omega_0 t$. Do the integral by hand using $\sin(x) = (e^{ix} - e^{-ix})/2i$, combined with the integral formula (0.54). (c) Use your results to parts (a) and (b) and a convolution theorem from P0.26(b) to evaluate the Fourier transform of $f(t) = e^{-t^2/2T^2}\sin\omega_0 t$. (The answer should be similar to P0.24.) (d) Plot $f(t)$ and the imaginary part of its Fourier transform for the parameters $\omega_0 = 1$ and $T = 8$.
Chapter 1
Electromagnetic Phenomena
In 1861, James Maxwell assembled the various known relationships of electricity and magnetism into a concise¹ set of equations:²

$$\nabla\cdot\mathbf{E} = \frac{\rho}{\epsilon_0} \qquad \text{(Gauss's Law)} \tag{1.1}$$

$$\nabla\cdot\mathbf{B} = 0 \qquad \text{(Gauss's Law for magnetism)} \tag{1.2}$$

$$\nabla\times\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t} \qquad \text{(Faraday's Law)} \tag{1.3}$$

$$\nabla\times\frac{\mathbf{B}}{\mu_0} = \epsilon_0\frac{\partial\mathbf{E}}{\partial t} + \mathbf{J} \qquad \text{(Ampere's Law revised by Maxwell)} \tag{1.4}$$

Here $\mathbf{E}$ and $\mathbf{B}$ represent electric and magnetic fields, respectively. The charge density $\rho$ describes the charge per volume distributed through space.³ The current density $\mathbf{J}$ describes the motion of charge density (in units of $\rho$ times velocity). The constant $\epsilon_0$ is called the permittivity, and the constant $\mu_0$ is called the permeability. Taken together, these are known as Maxwell's equations.

After introducing a key revision of Ampere's law, Maxwell realized that together these equations comprise a complete self-consistent theory of electromagnetic phenomena. Moreover, the equations imply the existence of electromagnetic waves, which travel at the speed of light. Since the speed of light had been measured before Maxwell's time, it was immediately apparent (as was already suspected) that light is a high-frequency manifestation of the same phenomena that govern the influence of currents and charges upon each other. Previously, optics had been considered a topic quite separate from electricity and magnetism. Once the connection was made, it became clear that Maxwell's equations form the theoretical foundations of optics, and this is where we begin our study of light.

¹ In Maxwell's original notation, this set of equations was hardly concise, written without the convenience of modern vector notation or $\nabla$. His formulation wouldn't fit easily on a T-shirt!
² See J. D. Jackson, Classical Electrodynamics, 3rd ed., p. 1 (New York: John Wiley, 1999) or the back cover of D. J. Griffiths, Introduction to Electrodynamics, 3rd ed. (New Jersey: Prentice-Hall, 1999).
³ Later in the book we use $\rho$ for the radius in cylindrical coordinates, not to be confused with charge density.
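This three-dimensional integral gives the net electric field produced by the charge density $\rho$ distributed throughout the volume $V$. Gauss's law (1.1), the first of Maxwell's equations, follows directly from (1.7) with some mathematical manipulation. No new physical phenomenon is introduced in this process.⁵ Taking the divergence of (1.7) gives

$$\nabla\cdot\mathbf{E}(\mathbf{r}) = \frac{1}{4\pi\epsilon_0}\int_V\rho(\mathbf{r}')\,\nabla_{\mathbf{r}}\cdot\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\,dv' \tag{1.8}$$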
The subscript on $\nabla_{\mathbf{r}}$ indicates that it operates on $\mathbf{r}$ while treating $\mathbf{r}'$, the dummy variable of integration, as a constant. The integrand contains a remarkable mathematical property that can be exploited, even without specifying the form of the charge distribution $\rho(\mathbf{r}')$. In modern mathematical language, the vector expression in the integral is a three-dimensional delta function (see (0.52)):⁶

$$\nabla_{\mathbf{r}}\cdot\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3} \equiv 4\pi\delta^3(\mathbf{r}'-\mathbf{r}) \equiv 4\pi\,\delta(x'-x)\,\delta(y'-y)\,\delta(z'-z) \tag{1.9}$$

A derivation of this formula is addressed in problem P0.13. The delta function allows the integral in (1.8) to be performed, and the relation becomes simply

$$\nabla\cdot\mathbf{E}(\mathbf{r}) = \frac{\rho(\mathbf{r})}{\epsilon_0}$$

The (perhaps more familiar) integral form of Gauss's law can be obtained by integrating (1.1) over a volume $V$ and applying the divergence theorem (0.11) to the left-hand side:

$$\oint_S\mathbf{E}(\mathbf{r})\cdot\hat{\mathbf{n}}\,da = \frac{1}{\epsilon_0}\int_V\rho(\mathbf{r})\,dv \tag{1.10}$$

⁴ Here $dv'$ stands for $dx'\,dy'\,dz'$ and $\mathbf{r}' = x'\hat{\mathbf{x}} + y'\hat{\mathbf{y}} + z'\hat{\mathbf{z}}$ (in Cartesian coordinates).
⁵ Actually, Coulomb's law applies only to static charge configurations, and in that sense it is incomplete since it implies an instantaneous response of the field to a reconfiguration of the charge. The generalized version of Coulomb's law, one of Jefimenko's equations, incorporates the fact that electromagnetic news travels at the speed of light. See D. J. Griffiths, Introduction to Electrodynamics, 3rd ed., Sect. 10.2.2 (New Jersey: Prentice-Hall, 1999). Ironically, Gauss's law, which can be derived from Coulomb's law, holds perfectly whether the charges remain still or are in motion.
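This form of Gauss's law shows that the total electric field flux extruding through a closed surface $S$ (i.e. the integral on the left side) is proportional to the net charge contained within it (i.e. within volume $V$ contained by $S$).

Figure 1.3 Gauss's law in integral form relates the flux of the electric field through a surface to the charge contained inside that surface.

Example 1.1

Suppose we have an electric field given by $\mathbf{E} = (x^2y^3\,\hat{\mathbf{x}} + z^4\,\hat{\mathbf{y}})\cos\omega t$. Use Gauss's law (1.1) to find the charge density $\rho(x, y, z, t)$.

Solution:

$$\rho = \epsilon_0\nabla\cdot\mathbf{E} = \epsilon_0\left(\hat{\mathbf{x}}\frac{\partial}{\partial x} + \hat{\mathbf{y}}\frac{\partial}{\partial y} + \hat{\mathbf{z}}\frac{\partial}{\partial z}\right)\cdot\left(x^2y^3\,\hat{\mathbf{x}} + z^4\,\hat{\mathbf{y}}\right)\cos\omega t = 2\epsilon_0xy^3\cos\omega t$$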
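The result of Example 1.1 can be spot-checked with a finite-difference divergence. This is only a sketch: the value of $\omega$, the step size, and the function names are assumptions chosen for illustration.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity in F/m
OMEGA = 2.0               # assumed angular frequency for the test

def E(x, y, z, t):
    """Field of Example 1.1: (x^2 y^3 x_hat + z^4 y_hat) cos(omega t)."""
    c = math.cos(OMEGA*t)
    return (x**2*y**3*c, z**4*c, 0.0)

def divergence(field, x, y, z, t, h=1e-5):
    """Central-difference estimate of the divergence of a vector field."""
    dfx = (field(x + h, y, z, t)[0] - field(x - h, y, z, t)[0])/(2*h)
    dfy = (field(x, y + h, z, t)[1] - field(x, y - h, z, t)[1])/(2*h)
    dfz = (field(x, y, z + h, t)[2] - field(x, y, z - h, t)[2])/(2*h)
    return dfx + dfy + dfz

def rho_numeric(x, y, z, t):
    """Charge density from Gauss's law (1.1): rho = eps0 * div E."""
    return EPS0*divergence(E, x, y, z, t)

def rho_analytic(x, y, z, t):
    """Closed-form answer of Example 1.1."""
    return 2*EPS0*x*y**3*math.cos(OMEGA*t)
```

Only the $x$-component contributes, since $z^4$ does not depend on $y$.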
man) was born in Braunschweig, Germany to a poor family. Gauss was a child prodigy, and he made his first significant advances to mathematics as a teenager. In grade school, he purportedly was asked to add all integers from 1 to 100, which he did in seconds to the astonishment of his teacher. (Presumably, Friedrich immediately realized that the numbers form fifty pairs equal to 101.) Gauss made important advances in number theory and differential geometry. He developed the law discussed here as one of Maxwell's equations in 1835, but it was not published until 1867, after Gauss's death. Ironically, Maxwell was already using Gauss's law by that time. (Wikipedia)

⁶ For a derivation of Gauss's law from Coulomb's law that does not rely directly on the Dirac delta function, see J. D. Jackson, Classical Electrodynamics 3rd ed., pp. 27-29 (New York: John Wiley, 1999).
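where

$$\mathbf{B}(\mathbf{r}) = \frac{\mu_0}{4\pi}\int_V\mathbf{J}(\mathbf{r}')\times\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\,dv' \tag{1.12}$$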
The latter equation is known as the Biot-Savart law. The permeability $\mu_0$ dictates the strength of the magnetic field, given the current distribution. As with Coulomb's law, we can apply mathematics to the Biot-Savart law to obtain another of Maxwell's equations. Nevertheless, the essential physics is already inherent in the Biot-Savart law.⁷ Using the result from P0.4, we can rewrite (1.12) as⁸

$$\mathbf{B}(\mathbf{r}) = -\frac{\mu_0}{4\pi}\int_V\mathbf{J}(\mathbf{r}')\times\nabla_{\mathbf{r}}\frac{1}{|\mathbf{r}-\mathbf{r}'|}\,dv' = \frac{\mu_0}{4\pi}\nabla\times\int_V\frac{\mathbf{J}(\mathbf{r}')}{|\mathbf{r}-\mathbf{r}'|}\,dv' \tag{1.13}$$
Since the divergence of a curl is identically zero (see P0.6), we get straight away the second of Maxwell's equations (1.2), $\nabla\cdot\mathbf{B} = 0$, which is known as Gauss's law for magnetic fields. (Two equations down; two to go.) The similarity between $\nabla\cdot\mathbf{B} = 0$ and $\nabla\cdot\mathbf{E} = \rho/\epsilon_0$, Gauss's law for electric fields, is immediately apparent. In integral form, Gauss's law for magnetic fields looks the same as (1.10), only with zero on the right-hand side. If one were to imagine the existence of magnetic monopoles (i.e. isolated north or south "charges"), then the right-hand side would not be zero. The law implies that the total magnetic flux extruding through any closed surface balances, with as many field lines pointing inwards as pointing outwards.
Example 1.2

The field surrounding a magnetic dipole is given by

$$\mathbf{B} = \frac{3xz\,\hat{\mathbf{x}} + 3yz\,\hat{\mathbf{y}} + \left(3z^2 - r^2\right)\hat{\mathbf{z}}}{r^5}$$

where $r \equiv \sqrt{x^2 + y^2 + z^2}$. Show that this field satisfies Gauss's law for magnetic fields (1.2).

Solution:

$$\nabla\cdot\mathbf{B} = 3\frac{\partial}{\partial x}\left(\frac{xz}{r^5}\right) + 3\frac{\partial}{\partial y}\left(\frac{yz}{r^5}\right) + \frac{\partial}{\partial z}\left(\frac{3z^2}{r^5} - \frac{1}{r^3}\right) = \frac{12z}{r^5} - \frac{15z\left(x^2+y^2+z^2\right)}{r^7} + \frac{3z}{r^5} = \frac{12z}{r^5} - \frac{15z}{r^5} + \frac{3z}{r^5} = 0$$

⁷ Like Coulomb's law, the Biot-Savart law is incomplete since it also implies an instantaneous response of the magnetic field to a reconfiguration of the currents. The generalized version of the Biot-Savart law, another of Jefimenko's equations, incorporates the fact that electromagnetic news travels at the speed of light. Ironically, Gauss's law for magnetic fields and Maxwell's version of Ampere's law, derived from the Biot-Savart law, hold perfectly whether the currents are steady or vary in time. The Jefimenko equations, analogs of Coulomb and Biot-Savart, also embody Faraday's law, the only one of Maxwell's equations that cannot be derived from the usual forms of Coulomb's law and the Biot-Savart law. See D. J. Griffiths, Introduction to Electrodynamics, 3rd ed., Sect. 10.2.2 (New Jersey: Prentice-Hall, 1999).
⁸ Note that $\nabla_{\mathbf{r}}$ ignores the variable of integration $\mathbf{r}'$.
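The cancellation in Example 1.2 can also be seen numerically. The sketch below evaluates a central-difference divergence of the dipole field at an arbitrarily chosen point away from the origin; the sample point, step size, and function names are illustrative assumptions.

```python
import math

def B(x, y, z):
    """Dipole field of Example 1.2 (overall constant set to one)."""
    r2 = x*x + y*y + z*z
    r5 = r2**2.5
    return (3*x*z/r5, 3*y*z/r5, (3*z*z - r2)/r5)

def divergence(field, x, y, z, h=1e-4):
    """Central-difference estimate of the divergence; returns the three terms too."""
    dfx = (field(x + h, y, z)[0] - field(x - h, y, z)[0])/(2*h)
    dfy = (field(x, y + h, z)[1] - field(x, y - h, z)[1])/(2*h)
    dfz = (field(x, y, z + h)[2] - field(x, y, z - h)[2])/(2*h)
    return dfx + dfy + dfz, (dfx, dfy, dfz)

total, terms = divergence(B, 0.8, -0.5, 1.1)
```

The individual terms in `terms` are not small; only their sum vanishes, to within the accuracy of the finite-difference estimate.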
to a very basic education, and so he was mostly self taught and never did acquire much skill in mathematics. As a teenager, he obtained a seven-year apprenticeship with a book binder, during which time he read many books, including books on science and electricity. Given his background, Faraday's entry into the scientific community was very gradual, from servant to assistant and eventually to director of the laboratory at the Royal Institution. Faraday is perhaps best known for his work that established the law of induction and for the discovery that magnetic fields can interact with light, known as the Faraday effect. He also made many advances to chemistry during his career including figuring out how to liquefy several gases. Faraday was a deeply religious man, serving as a Deacon in his church. (Wikipedia)
(1.14)
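The right side describes a change in the magnetic flux through a surface and the left side describes the voltage around the loop containing the surface. We apply Stokes' theorem (0.12) to the left-hand side of Faraday's law and obtain

$$\int_S(\nabla\times\mathbf{E})\cdot\hat{\mathbf{n}}\,da = -\frac{\partial}{\partial t}\int_S\mathbf{B}\cdot\hat{\mathbf{n}}\,da \qquad\text{or}\qquad \int_S\left(\nabla\times\mathbf{E} + \frac{\partial\mathbf{B}}{\partial t}\right)\cdot\hat{\mathbf{n}}\,da = 0 \tag{1.15}$$

Since this holds for any surface $S$, the integrand must vanish, giving $\nabla\times\mathbf{E} = -\partial\mathbf{B}/\partial t$, which is the differential form of Faraday's law (1.3) (three of Maxwell's equations down; one to go).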
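Example 1.3

For the electric field given in Example 1.1, $\mathbf{E} = (x^2y^3\,\hat{\mathbf{x}} + z^4\,\hat{\mathbf{y}})\cos\omega t$, use Faraday's law (1.3) to find $\mathbf{B}(x, y, z, t)$.

Solution:

$$\frac{\partial\mathbf{B}}{\partial t} = -\nabla\times\mathbf{E} = -\cos\omega t\begin{vmatrix}\hat{\mathbf{x}} & \hat{\mathbf{y}} & \hat{\mathbf{z}} \\ \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \\ x^2y^3 & z^4 & 0\end{vmatrix}$$

$$= -\cos\omega t\left[\hat{\mathbf{x}}\left(\frac{\partial(0)}{\partial y} - \frac{\partial z^4}{\partial z}\right) - \hat{\mathbf{y}}\left(\frac{\partial(0)}{\partial x} - \frac{\partial(x^2y^3)}{\partial z}\right) + \hat{\mathbf{z}}\left(\frac{\partial z^4}{\partial x} - \frac{\partial(x^2y^3)}{\partial y}\right)\right] = \left(4z^3\,\hat{\mathbf{x}} + 3x^2y^2\,\hat{\mathbf{z}}\right)\cos\omega t$$

Integrating in time, we get

$$\mathbf{B} = \left(4z^3\,\hat{\mathbf{x}} + 3x^2y^2\,\hat{\mathbf{z}}\right)\frac{\sin\omega t}{\omega}$$

plus possibly a constant field.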
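Faraday's law (1.3) can be checked for the fields of Examples 1.1 and 1.3 by estimating $\nabla\times\mathbf{E} + \partial\mathbf{B}/\partial t$ with central differences. The sketch below assumes a value for $\omega$ and a sample point; the helper names are hypothetical.

```python
import math

OMEGA = 2.0   # assumed angular frequency for the test

def E(x, y, z, t):
    """Electric field of Example 1.1."""
    c = math.cos(OMEGA*t)
    return (x**2*y**3*c, z**4*c, 0.0)

def B(x, y, z, t):
    """Magnetic field found in Example 1.3 (constant of integration set to zero)."""
    s = math.sin(OMEGA*t)/OMEGA
    return (4*z**3*s, 0.0, 3*x**2*y**2*s)

def partial(field, comp, axis, p, h=1e-5):
    """Central difference of component `comp` along coordinate `axis` (0..3 = x, y, z, t)."""
    hi, lo = list(p), list(p)
    hi[axis] += h
    lo[axis] -= h
    return (field(*hi)[comp] - field(*lo)[comp])/(2*h)

def curl(field, p):
    """Finite-difference curl of a vector field at the point p = (x, y, z, t)."""
    return (partial(field, 2, 1, p) - partial(field, 1, 2, p),
            partial(field, 0, 2, p) - partial(field, 2, 0, p),
            partial(field, 1, 0, p) - partial(field, 0, 1, p))

def faraday_residual(p):
    """curl E + dB/dt, which should vanish if (1.3) is satisfied."""
    c = curl(E, p)
    return tuple(c[i] + partial(B, i, 3, p) for i in range(3))

residual = faraday_residual((0.7, 1.3, -0.4, 0.25))
```

A residual near zero in all three components indicates that the pair of fields is mutually consistent at the chosen point.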
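$$\nabla\times\mathbf{B}(\mathbf{r}) = \frac{\mu_0}{4\pi}\int_V\nabla_{\mathbf{r}}\times\left[\mathbf{J}(\mathbf{r}')\times\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\right]dv' \tag{1.16}$$

We next apply the differential vector rule from P0.7 while noting that $\mathbf{J}(\mathbf{r}')$ does not depend on $\mathbf{r}$ so that only two terms survive. The curl of $\mathbf{B}(\mathbf{r})$ then becomes

$$\nabla\times\mathbf{B}(\mathbf{r}) = \frac{\mu_0}{4\pi}\int_V\left(\mathbf{J}(\mathbf{r}')\left[\nabla_{\mathbf{r}}\cdot\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\right] - \left[\mathbf{J}(\mathbf{r}')\cdot\nabla_{\mathbf{r}}\right]\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\right)dv' \tag{1.17}$$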
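According to (1.9), the first term in the integral is $4\pi\mathbf{J}(\mathbf{r}')\,\delta^3(\mathbf{r}'-\mathbf{r})$, which is easily integrated. To make progress on the second term, we observe that the gradient can be changed to operate on the primed variables without affecting the final result (i.e. $\nabla_{\mathbf{r}} \to -\nabla_{\mathbf{r}'}$). In addition, we take advantage of a vector integral theorem (see P0.12) to arrive at

$$\nabla\times\mathbf{B}(\mathbf{r}) = \mu_0\mathbf{J}(\mathbf{r}) - \frac{\mu_0}{4\pi}\int_V\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\left[\nabla_{\mathbf{r}'}\cdot\mathbf{J}(\mathbf{r}')\right]dv' + \frac{\mu_0}{4\pi}\oint_S\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\left[\mathbf{J}(\mathbf{r}')\cdot\hat{\mathbf{n}}\right]da' \tag{1.18}$$

The last term in (1.18) vanishes if we assume that the current density $\mathbf{J}$ is completely contained within the volume $V$ so that it is zero at the surface $S$. Thus, the expression for the curl of $\mathbf{B}(\mathbf{r})$ reduces to

$$\nabla\times\mathbf{B}(\mathbf{r}) = \mu_0\mathbf{J}(\mathbf{r}) - \frac{\mu_0}{4\pi}\int_V\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\left[\nabla_{\mathbf{r}'}\cdot\mathbf{J}(\mathbf{r}')\right]dv' \tag{1.19}$$

The latter term in (1.19) vanishes if

$$\nabla\cdot\mathbf{J} \cong 0 \qquad \text{(steady-state approximation)} \tag{1.20}$$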
only applies to quasi steady-state situations. The physical interpretation of Ampere's law is more apparent in integral form. We integrate both sides of (1.21) over an open surface $S$, bounded by contour $C$, and apply Stokes' theorem (0.12) to the left-hand side:

$$\oint_C\mathbf{B}(\mathbf{r})\cdot d\boldsymbol{\ell} = \mu_0\int_S\mathbf{J}(\mathbf{r})\cdot\hat{\mathbf{n}}\,da \equiv \mu_0 I \tag{1.22}$$

This law says that the line integral of $\mathbf{B}$ around a closed loop $C$ is proportional to the total current flowing through the loop (see Fig. 1.5). The units of $\mathbf{J}$ are current per area, so the surface integral containing $\mathbf{J}$ yields the current $I$ in units of charge per time.
This is called the continuity equation for charge and current densities. Simply stated, if there is net current flowing into a volume there ought to be charge piling up inside. For the steady-state situation inherently considered by Ampere, the current into and out of a volume is balanced so that $\partial\rho/\partial t = 0$.

Derivation of the Continuity Equation

Consider a volume of space enclosed by a surface $S$ through which current is flowing. The total current exiting the volume is

$$I = \oint_S\mathbf{J}\cdot\hat{\mathbf{n}}\,da \tag{1.24}$$

where $\hat{\mathbf{n}}$ is the outward normal to the surface. The units on this equation are that of current, or charge per time, leaving the volume. Since we have considered a closed surface $S$, the net current leaving the enclosed volume $V$ must be the same as the rate at which charge within the volume vanishes:

$$I = -\frac{\partial}{\partial t}\int_V\rho\,dv \tag{1.25}$$

Upon equating these two expressions for current, as well as applying the divergence theorem (0.11) to the former, we get

$$\int_V\nabla\cdot\mathbf{J}\,dv = -\int_V\frac{\partial\rho}{\partial t}\,dv \qquad\text{or}\qquad \int_V\left(\nabla\cdot\mathbf{J} + \frac{\partial\rho}{\partial t}\right)dv = 0 \tag{1.26}$$

Since this holds for an arbitrary volume $V$, the integrand must vanish, which is the continuity equation (1.23), $\nabla\cdot\mathbf{J} = -\partial\rho/\partial t$.
Maxwell's main contribution (aside from organizing other people's formulas⁹ and recognizing them as a complete set of coupled differential equations, a big deal) was the injection of the continuity equation (1.23) into the derivation of Ampere's law (1.19). This yields

$$\nabla\times\mathbf{B} = \mu_0\mathbf{J} + \frac{\mu_0}{4\pi}\frac{\partial}{\partial t}\int_V\rho(\mathbf{r}')\frac{\mathbf{r}-\mathbf{r}'}{|\mathbf{r}-\mathbf{r}'|^3}\,dv' \tag{1.27}$$

Substituting Coulomb's law (1.7) for the integral gives

$$\nabla\times\frac{\mathbf{B}}{\mu_0} = \mathbf{J} + \epsilon_0\frac{\partial\mathbf{E}}{\partial t}$$

the last of Maxwell's equations (1.4). This revised Ampere's law includes the additional term $\epsilon_0\,\partial\mathbf{E}/\partial t$, which is known as the displacement current (density). The displacement current exists even in the absence of any actual charge density $\rho$.¹⁰ It indicates that a changing electric field behaves like a current in the sense that it produces magnetic fields. The similarity between Faraday's law and the corrected Ampere's law (1.4) is apparent. No doubt this played a part in motivating Maxwell's work.

In summary, in the previous section we saw that the basic physics in Ampere's law is present in the Biot-Savart law. Infusing it with charge conservation (1.23) yields the corrected form of Ampere's law.

⁹ Although Gauss developed his law in 1835, it was not published until after his death in 1867, well after Maxwell published his laws of electromagnetism, so in practice Maxwell accomplished much more than merely fixing Ampere's law.
¹⁰ Based on (1.27), one might think that the displacement current $\epsilon_0\,\partial\mathbf{E}/\partial t$ ought to be zero in a region of space with no charge density $\rho$. However, in (1.27) $\rho$ appears in a volume integral over a region of space sufficiently large (consistent with a previous supposition) to include any charges responsible for the field $\mathbf{E}$; presumably, all fields arise from sources.
Example 1.4

(a) Use Gauss's law to find the electric field in a gap that interrupts a current-carrying wire, as shown in Fig. 1.6. (b) Find the strength of the magnetic field on contour $C$ using Ampere's law applied to surface $S_1$. (c) Show that the displacement current in the gap leads to the identical magnetic field when using surface $S_2$.

Solution: (a) We'll assume that the cross-sectional area of the wire $A$ is much wider than the gap separation. Then the electric field in the gap will be uniform, and the integral on the left-hand side of (1.10) reduces to $EA$ since there is essentially no field other than in the gap. If the accumulated charge on the "plate" is $Q$, then the right-hand side of (1.10) integrates to $Q/\epsilon_0$, and the electric field turns out to be $E = Q/(\epsilon_0A)$.

(b) Let the contour $C$ be a circle at radius $r$. The magnetic field points around the circumference with constant strength. The left-hand side of (1.22) becomes $2\pi rB$ while the right-hand side is

$$\mu_0\int_{S_1}\mathbf{J}\cdot\hat{\mathbf{n}}\,da = \mu_0I = \mu_0\frac{\partial Q}{\partial t}$$

(c) If instead we use the displacement current $\epsilon_0\,\partial\mathbf{E}/\partial t$ in place of $\mathbf{J}$ in the right-hand side of (1.22), we get for that piece

$$\mu_0\int_{S_2}\epsilon_0\frac{\partial\mathbf{E}}{\partial t}\cdot\hat{\mathbf{n}}\,da = \mu_0\epsilon_0\frac{\partial E}{\partial t}A = \mu_0\frac{\partial Q}{\partial t}$$

which is the same as in part (b).
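Example 1.5

For the electric field $\mathbf{E} = (x^2y^3\,\hat{\mathbf{x}} + z^4\,\hat{\mathbf{y}})\cos\omega t$ (see Example 1.1) and the associated magnetic field $\mathbf{B} = (4z^3\,\hat{\mathbf{x}} + 3x^2y^2\,\hat{\mathbf{z}})\,\dfrac{\sin\omega t}{\omega}$ (see Example 1.3), find the current density $\mathbf{J}(x, y, z, t)$.

Solution:

$$\mathbf{J} = \frac{\nabla\times\mathbf{B}}{\mu_0} - \epsilon_0\frac{\partial\mathbf{E}}{\partial t} = \frac{\sin\omega t}{\mu_0\omega}\begin{vmatrix}\hat{\mathbf{x}} & \hat{\mathbf{y}} & \hat{\mathbf{z}} \\ \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \\ 4z^3 & 0 & 3x^2y^2\end{vmatrix} + \epsilon_0\omega\left(x^2y^3\,\hat{\mathbf{x}} + z^4\,\hat{\mathbf{y}}\right)\sin\omega t$$

$$= \frac{\sin\omega t}{\mu_0\omega}\left[6x^2y\,\hat{\mathbf{x}} + \left(12z^2 - 6xy^2\right)\hat{\mathbf{y}}\right] + \epsilon_0\omega\left(x^2y^3\,\hat{\mathbf{x}} + z^4\,\hat{\mathbf{y}}\right)\sin\omega t$$

$$= \left[\left(\epsilon_0\omega x^2y^3 + \frac{6x^2y}{\mu_0\omega}\right)\hat{\mathbf{x}} + \left(\epsilon_0\omega z^4 + \frac{12z^2 - 6xy^2}{\mu_0\omega}\right)\hat{\mathbf{y}}\right]\sin\omega t$$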
$$\mathbf{J}_p = \frac{\partial\mathbf{P}}{\partial t} \tag{1.28}$$

We thus write the total current in an optical medium (ignoring magnetic effects) as

$$\mathbf{J} = \mathbf{J}_{\text{free}} + \frac{\partial\mathbf{P}}{\partial t} \tag{1.29}$$

Now let's turn our attention to the charge density $\rho$. We seldom consider the propagation of electromagnetic waveforms through electrically charged materials. We therefore will write $\rho_{\text{free}} = 0$. One might be tempted in this case to set the overall charge density to zero, but this would be wrong. The polarization of a neutral material, described by $\mathbf{P}$, can vary spatially, leading to local concentrations of positive or negative charges. We let $\rho_p$ denote the charge density created by variations in the polarization $\mathbf{P}(\mathbf{r})$. To determine an expression for $\rho_p$, we write the continuity equation (1.23) as applied to the currents and charges associated with this polarization:

$$\nabla\cdot\mathbf{J}_p = -\frac{\partial\rho_p}{\partial t} \tag{1.30}$$

Substituting (1.28) into this expression gives

$$\rho_p = -\nabla\cdot\mathbf{P} \tag{1.31}$$

To better appreciate local charge buildup due to variation in the medium polarization, consider the divergence theorem (0.11) applied to $\mathbf{P}(\mathbf{r})$:

$$\oint_S\mathbf{P}(\mathbf{r})\cdot\hat{\mathbf{n}}\,da = \int_V\nabla\cdot\mathbf{P}(\mathbf{r})\,dv \tag{1.32}$$
The left-hand side of (1.32) is a surface integral, which after integrating gives units of charge. Physically, it is the sum of the charges touching the inside of surface $S$ (multiplied by a minus since by convention dipole vectors point from the negatively charged end of a molecule to the positively charged end). When $\nabla\cdot\mathbf{P}$ is zero, there are equal numbers of positive and negative charges touching $S$ from within, as depicted in Fig. 1.7. When $\nabla\cdot\mathbf{P}$ is not zero, the positive and negative charges touching $S$ are not balanced, as depicted in Fig. 1.8. Essentially, excess charge ends up within the volume because the non-uniform alignment of dipoles causes them to be cut preferentially at the surface.¹¹ Since we will ignore free charges (for optical media), we write the charge density according to (1.31) as

$$\rho = -\nabla\cdot\mathbf{P} \tag{1.33}$$

In summary, in electrically neutral non-magnetic media, Maxwell's equations (in terms of the medium polarization $\mathbf{P}$) are¹²

$$\nabla\cdot\mathbf{E} = -\frac{\nabla\cdot\mathbf{P}}{\epsilon_0} \qquad \text{(Gauss's law)} \tag{1.34}$$

$$\nabla\cdot\mathbf{B} = 0 \qquad \text{(Gauss's law for magnetism)} \tag{1.35}$$

$$\nabla\times\mathbf{E} = -\frac{\partial\mathbf{B}}{\partial t} \qquad \text{(Faraday's law)} \tag{1.36}$$

$$\nabla\times\frac{\mathbf{B}}{\mu_0} = \epsilon_0\frac{\partial\mathbf{E}}{\partial t} + \frac{\partial\mathbf{P}}{\partial t} + \mathbf{J}_{\text{free}} \qquad \text{(Ampere's law; fixed by Maxwell)} \tag{1.37}$$
At first glance, Maxwell's equations might not immediately suggest (to the inexperienced eye) that waves are solutions. However, we can manipulate the equations (first-order differential equations that couple $\mathbf{E}$ to $\mathbf{B}$) into the familiar wave equation (decoupled second-order differential equations for either $\mathbf{E}$ or $\mathbf{B}$). You should become familiar with this derivation. In what follows, we will derive the wave equation for $\mathbf{E}$. The derivation of the wave equation for $\mathbf{B}$ is very similar (see problem P1.6).

Taking the curl of Faraday's law (1.3) and using Ampere's law (1.4) to eliminate $\nabla\times\mathbf{B}$ gives

$$\nabla\times(\nabla\times\mathbf{E}) + \mu_0\epsilon_0\frac{\partial^2\mathbf{E}}{\partial t^2} = -\mu_0\frac{\partial\mathbf{J}}{\partial t} \tag{1.39}$$

Next we apply the differential vector identity (0.10), $\nabla\times(\nabla\times\mathbf{E}) = \nabla(\nabla\cdot\mathbf{E}) - \nabla^2\mathbf{E}$, and use Gauss's law (1.1) to replace the term $\nabla\cdot\mathbf{E}$, which brings us to

$$\nabla^2\mathbf{E} - \mu_0\epsilon_0\frac{\partial^2\mathbf{E}}{\partial t^2} = \mu_0\frac{\partial\mathbf{J}}{\partial t} + \frac{\nabla\rho}{\epsilon_0} \tag{1.40}$$

Substitution from (1.29) and (1.33) gives the more-useful-for-optics form

$$\nabla^2\mathbf{E} - \mu_0\epsilon_0\frac{\partial^2\mathbf{E}}{\partial t^2} = \mu_0\frac{\partial\mathbf{J}_{\text{free}}}{\partial t} + \mu_0\frac{\partial^2\mathbf{P}}{\partial t^2} - \frac{1}{\epsilon_0}\nabla(\nabla\cdot\mathbf{P}) \tag{1.41}$$

The left-hand side of (1.41) is the familiar wave equation. However, the right-hand side contains a number of source terms, which arise when various currents and/or polarizations are present. The first term on the right-hand side of (1.41) describes currents of free charges, which are important for determining the reflection of light from a metallic surface or for determining the propagation of light in a plasma. The second term on the right-hand side describes dipole oscillations, which behave similarly to currents. The final term on the right-hand side of (1.41) is important in anisotropic media such as crystals. In this case, the polarization $\mathbf{P}$ responds to the electric field along a direction not necessarily parallel to $\mathbf{E}$, due to the influence of the crystal lattice (addressed in chapter 5).

In summary, when light propagates in a material, at least one of the terms on the right-hand side of (1.41) will be nonzero. As an example, in glass, $\mathbf{J}_{\text{free}} = 0$ and $\nabla\cdot\mathbf{P} = 0$, but $\partial^2\mathbf{P}/\partial t^2 \neq 0$ since the medium polarization responds to the light field, giving rise to refractive index (discussed in chapter 2).
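Example 1.6

Show that the electric field

$$\mathbf{E} = \left(x^2y^3\,\hat{\mathbf{x}} + z^4\,\hat{\mathbf{y}}\right)\cos\omega t$$

and the associated charge density (see Example 1.1)

$$\rho = 2\epsilon_0xy^3\cos\omega t$$

together with the associated current density (see Example 1.5)

$$\mathbf{J} = \left[\left(\epsilon_0\omega x^2y^3 + \frac{6x^2y}{\mu_0\omega}\right)\hat{\mathbf{x}} + \left(\epsilon_0\omega z^4 + \frac{12z^2 - 6xy^2}{\mu_0\omega}\right)\hat{\mathbf{y}}\right]\sin\omega t$$

satisfy the wave equation (1.40).

Solution: We have

$$\nabla^2\mathbf{E} - \mu_0\epsilon_0\frac{\partial^2\mathbf{E}}{\partial t^2} = \left[\left(2y^3 + 6x^2y\right)\hat{\mathbf{x}} + 12z^2\,\hat{\mathbf{y}}\right]\cos\omega t + \mu_0\epsilon_0\omega^2\left(x^2y^3\,\hat{\mathbf{x}} + z^4\,\hat{\mathbf{y}}\right)\cos\omega t$$

$$= \left[\left(\mu_0\epsilon_0\omega^2x^2y^3 + 6x^2y + 2y^3\right)\hat{\mathbf{x}} + \left(\mu_0\epsilon_0\omega^2z^4 + 12z^2\right)\hat{\mathbf{y}}\right]\cos\omega t$$

Similarly,

$$\mu_0\frac{\partial\mathbf{J}}{\partial t} + \frac{\nabla\rho}{\epsilon_0} = \left[\left(\mu_0\epsilon_0\omega^2x^2y^3 + 6x^2y\right)\hat{\mathbf{x}} + \left(\mu_0\epsilon_0\omega^2z^4 + 12z^2 - 6xy^2\right)\hat{\mathbf{y}}\right]\cos\omega t + \left(2y^3\,\hat{\mathbf{x}} + 6xy^2\,\hat{\mathbf{y}}\right)\cos\omega t$$

$$= \left[\left(\mu_0\epsilon_0\omega^2x^2y^3 + 6x^2y + 2y^3\right)\hat{\mathbf{x}} + \left(\mu_0\epsilon_0\omega^2z^4 + 12z^2\right)\hat{\mathbf{y}}\right]\cos\omega t$$

The two expressions are equivalent, and the wave equation is satisfied.¹³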
The magnetic eld B satises a similar wave equation, decoupled from E (see P1.6). However, the two waves are not independent. The elds for E and B must be chosen to be consistent with each other through Maxwells equations. After solving the wave equation (1.41) for E, one can obtain the consistent B from E via Faradays law (1.36). In vacuum all of the terms on the right-hand side in (1.41) are zero, in which case the wave equation reduces to 2 E 0 2 E =0 0 t 2 (vacuum) (1.42)
Solutions to this equation can take on every imaginable functional shape (specified at a given instant, with the evolution thereafter being controlled by (1.42)). Moreover, since the differential equation is linear, any number of solutions can be added together to create other valid solutions. Consider the subclass of solutions that propagate in a particular direction. These waveforms preserve shape while traveling with speed

$$c \equiv \frac{1}{\sqrt{\epsilon_0\mu_0}} \tag{1.43}$$

In this case, $\mathbf{E}$ depends on the argument $\hat{\mathbf{u}}\cdot\mathbf{r} - ct$, where $\hat{\mathbf{u}}$ is a unit vector specifying the direction of propagation. The shape is preserved since features occurring at a given position recur downstream at a distance $c\Delta t$ after a time $\Delta t$. By checking this solution in (1.42), one confirms that the speed of propagation is $c$ (see P1.8). As mentioned previously, one may add together any combination of solutions (even with differing directions of propagation) to form other valid solutions.

13 The expressions in Example 1.6 hardly look like waves. The (quite unlikely) current and charge distributions, which fill all space, would have to be artificially induced rather than arise naturally in response to a field disturbance on a medium.
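The claim that any shape depending only on $\hat{\mathbf{u}}\cdot\mathbf{r} - ct$ solves the vacuum wave equation can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: propagation is taken along $z$, the Gaussian `pulse` shape, the sample point, and the step size `h` are arbitrary choices not found in the text, and the derivatives are estimated with central differences rather than computed analytically.

```python
import math

EPS0 = 8.8541878128e-12      # vacuum permittivity (F/m)
MU0 = 4.0e-7 * math.pi       # vacuum permeability (H/m), conventional value
C = 1.0 / math.sqrt(EPS0 * MU0)   # propagation speed from (1.43)


def pulse(s):
    """Waveform shape; the Gaussian is an arbitrary illustrative choice."""
    return math.exp(-s * s)


def field(z, t):
    """One field component of a waveform traveling along +z (depends on z - c t)."""
    return pulse(z - C * t)


def wave_equation_residual(z, t, h=1.0e-3):
    """Central-difference estimate of d2E/dz2 - (1/c^2) d2E/dt2 at (z, t)."""
    tau = h / C   # time step chosen so that c * tau matches the spatial step
    d2_dz2 = (field(z + h, t) - 2.0 * field(z, t) + field(z - h, t)) / h**2
    d2_dt2 = (field(z, t + tau) - 2.0 * field(z, t) + field(z, t - tau)) / tau**2
    return d2_dz2 - d2_dt2 / C**2


z0, t0, dt = 0.7, 1.0e-9, 2.0e-9
residual = wave_equation_residual(z0, t0)
# Shape preservation: the feature at z0 recurs a distance c*dt downstream.
shift_error = field(z0 + C * dt, t0 + dt) - field(z0, t0)
print(f"c = {C:.6e} m/s, residual = {residual:.2e}, shift error = {shift_error:.2e}")
```

Both the residual and the shift error should come out at the level of floating-point noise; replacing `pulse` with any other smooth function should give the same outcome.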
Exercises
Exercises for 1.1 Gauss' Law

P1.1 Consider an infinitely long hollow cylinder with inner radius $a$ and outer radius $b$ as shown in Fig. 1.9. Assume that the cylinder has a charge density $\rho = k/s^2$ for $a < s < b$ and no charge elsewhere, where $s$ is the radial distance from the axis of the cylinder. Use Gauss' Law in integral form to find the electric field produced by this charge for each of the three regions: $s < a$, $a < s < b$, and $s > b$. HINT: For each region first draw an appropriate Gaussian surface and integrate the charge density over the volume to figure out the enclosed charge. Then use Gauss' law in integral form and the symmetry of the problem to solve for the electric field.

Figure 1.9 A charged cylinder with charge located between $a$ and $b$.
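Exercises for 1.3 Faraday's Law

P1.2 Suppose that an electric field is given by $\mathbf{E}(\mathbf{r},t) = \mathbf{E}_0\cos(\mathbf{k}\cdot\mathbf{r} - \omega t + \phi)$, where $\mathbf{k}\perp\mathbf{E}_0$ and $\phi$ is a constant phase. Show that

$$\mathbf{B}(\mathbf{r},t) = \frac{\mathbf{k}\times\mathbf{E}_0}{\omega}\cos(\mathbf{k}\cdot\mathbf{r} - \omega t + \phi)$$

is consistent with (1.3).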
Exercises for 1.4 Ampere's Law

P1.3 A conducting cylinder with the same geometry as P1.1 carries a current density $\mathbf{J} = (k/s)\,\hat{\mathbf{z}}$ along the axis of the cylinder for $a < s < b$, where $s$ is the radial distance from the axis of the cylinder. Using Ampere's Law in integral form, find the magnetic field due to this current. Find the field for each of the three regions: $s < a$, $a < s < b$, and $s > b$. HINT: For each region first draw an appropriate Amperian loop and integrate the current density over the surface to figure out how much current passes through the loop. Then use Ampere's law in integral form and the symmetry of the problem to solve for the magnetic field.
Exercises for 1.6 Polarization of Materials

P1.4 Memorize Maxwell's equations (1.1)-(1.4) together with (1.29) and (1.33). Be prepared to reproduce them from memory on an exam, and write them on your homework from memory to indicate completion. Also very briefly summarize the physical principles described by each of Maxwell's equations, and the assumptions that go into writing (1.29) and (1.33).

P1.5 Check that the $\mathbf{E}$ and $\mathbf{B}$ fields in P1.2 satisfy the rest of Maxwell's equations (1.1), (1.2), and (1.4). What are the implications for $\mathbf{J}$ and $\rho$?
Exercises for 1.7 The Wave Equation

P1.6 Derive the wave equation for the magnetic field $\mathbf{B}$ in vacuum (i.e. $\mathbf{J} = 0$ and $\rho = 0$).

P1.7 Show that the magnetic field in P1.2 is consistent with the wave equation derived in P1.6.

P1.8 Verify that $\mathbf{E}(\hat{\mathbf{u}}\cdot\mathbf{r} - ct)$ satisfies the vacuum wave equation (1.42), where $\mathbf{E}$ has an arbitrary functional form.

P1.9 (a) Show that $\mathbf{E}(\mathbf{r},t) = \mathbf{E}_0\cos\left[k(\hat{\mathbf{u}}\cdot\mathbf{r} - ct) + \phi\right]$ is a solution to the vacuum wave equation (1.42), where $\hat{\mathbf{u}}$ is an arbitrary unit vector and $k$ is a constant with units of inverse length.
(b) Show that each wave front forms a plane, which is why such solutions are often called plane waves. HINT: A wavefront is a surface in space where the argument of the cosine (i.e. the phase of the wave) has a constant value. Set the cosine argument to an arbitrary constant and see what positions are associated with that phase.
(c) Determine the speed $v = \Delta r/\Delta t$ that a wave front moves in the $\hat{\mathbf{u}}$ direction. HINT: Set the cosine argument to a constant, solve for $r$, and differentiate $r$ with respect to $t$.
(d) By analysis, determine the wavelength $\lambda$ in terms of $k$. HINT: Find the distance between identical wave fronts by changing the cosine argument by $2\pi$ at a given instant in time.
(e) Use (1.34) to show that $\mathbf{E}_0$ and $\hat{\mathbf{u}}$ must be perpendicular to each other in vacuum.

L1.10 Measure the speed of light using a rotating mirror. Provide an estimate of the experimental uncertainty in your answer (not the percentage error from the known value). (video) Figure 1.10 shows a simplified geometry for the optical path for light in this experiment. Laser light from A reflects from a rotating mirror at B towards C. The light returns to B, where the mirror has rotated, sending the light to point D. Notice that a mirror rotation of $\theta$ deflects the beam by $2\theta$.

Figure 1.10 (labels: laser at A, rotating mirror at B, delay path to C, screen at D).
Figure (rotating-mirror setup). Labels: laser, retro-reflecting collimation telescope, rotating mirror, long corridor. The front of the laser can serve as a screen for the returning light.

P1.11 Ole Roemer made the first successful measurement of the speed of light in 1676 by observing the orbital period of Io, a moon of Jupiter with a period of 42.5 hours. When Earth is moving toward Jupiter, the period is measured to be shorter than 42.5 hours because light indicating the end of the moon's orbit travels less distance than light indicating the beginning. When Earth is moving away from Jupiter, the situation is reversed, and the period is measured to be longer than 42.5 hours.
(a) If you were to measure the time for 40 observed orbits of Io when Earth is moving directly toward Jupiter and then several months later measure the time for 40 observed orbits when Earth is moving directly away from Jupiter, what would you expect the difference between these two measurements to be? Take the Earth's orbital radius to be $1.5\times10^{11}$ m. To simplify the geometry, just assume that Earth moves directly toward or away from Jupiter over the entire 40 orbits (see Fig. 1.12).
(b) Roemer did the experiment described in part (a), and experimentally measured a 22 minute difference. What speed of light would one deduce from that value?
Roemer was a man of many interests. In addition to measuring the speed of light, he created a temperature scale which with slight modification became the Fahrenheit scale, introduced a system of standard weights and measures, and was heavily involved in civic affairs (city planning, etc.). Scientists initially became interested in Io's orbit because its eclipse (when it went behind Jupiter) was an event that could be seen from many places on earth. By comparing accurate measurements of the local time when Io was eclipsed by Jupiter at two remote places on earth, scientists in the 1600s were able to determine the longitude difference between the two places.
P1.12 In an isotropic medium (i.e. $\nabla\cdot\mathbf{P} = 0$), the polarization can often be written as a function of the electric field: $\mathbf{P} = \epsilon_0\chi(E)\,\mathbf{E}$, where $\chi(E) = \chi_1 + \chi_2 E + \chi_3 E^2 + \cdots$. The higher-order coefficients in the expansion (i.e. $\chi_2$, $\chi_3$, ...) are typically small, so only the first term is important at low intensities. The field of nonlinear optics deals with intense light-matter interactions, where the higher-order terms of the expansion become important. This can lead to phenomena such as harmonic generation. Starting with Maxwell's equations, derive the wave equation for nonlinear optics in an isotropic medium:

$$\nabla^2\mathbf{E} - \mu_0\epsilon_0(1+\chi_1)\frac{\partial^2\mathbf{E}}{\partial t^2} = \mu_0\epsilon_0\frac{\partial^2\left[\left(\chi_2 E + \chi_3 E^2 + \cdots\right)\mathbf{E}\right]}{\partial t^2} + \mu_0\frac{\partial\mathbf{J}}{\partial t}$$

We retain the possibility of current here since, for example, in a gas some of the molecules might ionize in the presence of a strong field, giving rise to currents.
Chapter 2
We are interested in solutions to (2.1) that have the functional form (see P1.9)

$$\mathbf{E}(\mathbf{r},t) = \mathbf{E}_0\cos(\mathbf{k}\cdot\mathbf{r} - \omega t + \phi) \tag{2.2}$$

Here $\phi$ represents an arbitrary (constant) phase term. The vector $\mathbf{k}$, called the wave vector, may be written as

$$\mathbf{k} \equiv k\hat{\mathbf{u}} = \frac{2\pi}{\lambda_{\text{vac}}}\hat{\mathbf{u}} \quad\text{(vacuum)} \tag{2.3}$$

where $k$ has units of inverse length, $\hat{\mathbf{u}}$ is a unit vector defining the direction of propagation, and $\lambda_{\text{vac}}$ is the length by which $\mathbf{r}$ must vary (in the direction of $\hat{\mathbf{u}}$) to cause the cosine to go through a complete cycle. This distance is known as the (vacuum) wavelength. The frequency of oscillation is related to the wavelength via

$$\omega = \frac{2\pi c}{\lambda_{\text{vac}}} \quad\text{(vacuum)} \tag{2.4}$$

The frequency $\omega$ has units of radians per second. Frequency is also often expressed as $\nu \equiv \omega/2\pi$ in units of inverse seconds or Hz. Notice that $k$ and $\omega$ cannot be chosen independently; the wave equation requires them to be related through the dispersion relation

$$k = \frac{\omega}{c} \quad\text{(vacuum)} \tag{2.5}$$

Typical values for $\lambda_{\text{vac}}$ are given in Fig. 2.1. Sometimes the spatial period of the wave is expressed as $1/\lambda_{\text{vac}}$, in units of cm$^{-1}$, called the wave number.

Figure 2.1 (axes: frequency (Hz) and wavelength (m); bands: AM radio, FM radio, radar, microwave, infrared, visible, ultraviolet, X-rays, gamma rays).

A magnetic wave accompanies any electric wave, and it obeys a similar wave equation (see P1.6). The magnetic wave corresponding to (2.2) is

$$\mathbf{B}(\mathbf{r},t) = \mathbf{B}_0\cos(\mathbf{k}\cdot\mathbf{r} - \omega t + \phi) \tag{2.6}$$

It is important to note that $\mathbf{B}_0$, $\mathbf{k}$, $\omega$, and $\phi$ are not independently chosen in (2.6). In order to satisfy Faraday's law (1.3), the arguments of the cosine in (2.2) and (2.6) must be identical. Therefore, in vacuum the electric and magnetic fields travel in phase. In addition, Faraday's law requires (see P1.2)

$$\mathbf{B}_0 = \frac{\mathbf{k}\times\mathbf{E}_0}{\omega} \tag{2.7}$$
The above cross product means that $\mathbf{B}_0$ is perpendicular to both $\mathbf{E}_0$ and $\mathbf{k}$. Meanwhile, Gauss' law $\nabla\cdot\mathbf{E} = 0$ forces $\mathbf{k}$ to be perpendicular to $\mathbf{E}_0$. It follows that the magnitudes of the fields are related through $B_0 = kE_0/\omega$ or $B_0 = E_0/c$, in view of (2.5). The influence of the magnetic field only becomes important (in comparison to the electric field) for charged particles moving near the speed of light. This typically takes place only for extremely intense lasers ($> 10^{18}$ W/cm$^2$, see P2.12) where the electric field is sufficiently strong to cause electrons to oscillate with velocities near the speed of light. We will be interested in optics problems that take place at far less intensity, where the effects of the magnetic field can typically be safely ignored. Throughout the remainder of this book, we will focus our attention mainly on the electric field with the understanding that we can at any time deduce the (less important) magnetic field from the electric field via Faraday's law.

Figure 2.2 depicts the electric field (2.2) and the associated magnetic field (2.6) like transverse waves on a string. However, they are actually large planar sheets of uniform field strengths (difficult to draw) that move in the direction of $\mathbf{k}$. The name plane wave is given since a constant argument in (2.2) at any moment describes a plane, which is perpendicular to $\mathbf{k}$. A plane wave fills all space and may be thought of as a series of infinite sheets, each with a different uniform field strength, moving in the $\mathbf{k}$ direction.

At this point, we rewrite our plane wave solution using complex number notation. Although this change in notation will not make the task at hand any easier (and may even appear to complicate things), we introduce it here in preparation for later sections, where it will save considerable labor. (For a review of complex notation, see section 0.2.) Using complex notation we rewrite (2.2) as

$$\mathbf{E}(\mathbf{r},t) = \text{Re}\left\{\tilde{\mathbf{E}}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}\right\} \tag{2.8}$$

where we have hidden the phase term $\phi$ inside of $\tilde{\mathbf{E}}_0$ as follows:¹

$$\tilde{\mathbf{E}}_0 \equiv \mathbf{E}_0 e^{i\phi} \tag{2.9}$$
Figure 2.2 Depiction of electric and magnetic fields associated with a plane wave.
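The vacuum relations (2.3)-(2.7) can be tried out on concrete numbers. The following is a minimal sketch, assuming an illustrative wavelength of 500 nm, propagation along $z$, and a 100 V/m field along $x$; none of these values come from the text, and the helper functions `cross` and `dot` are defined here only for the example.

```python
import math

C = 2.99792458e8   # speed of light in vacuum (m/s)


def cross(a, b):
    """Cross product of two 3-component tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    """Dot product of two 3-component tuples."""
    return sum(x * y for x, y in zip(a, b))


lambda_vac = 500e-9                          # assumed vacuum wavelength (m)
k = 2.0 * math.pi / lambda_vac               # magnitude of the wave vector, (2.3)
omega = 2.0 * math.pi * C / lambda_vac       # angular frequency, (2.4)
nu = omega / (2.0 * math.pi)                 # frequency in Hz
wave_number_cm = 1.0 / (lambda_vac * 100.0)  # 1/lambda_vac in cm^-1

u_hat = (0.0, 0.0, 1.0)                      # assumed propagation direction
E0 = (100.0, 0.0, 0.0)                       # assumed field amplitude (V/m)
k_vec = tuple(k * comp for comp in u_hat)
B0 = tuple(comp / omega for comp in cross(k_vec, E0))   # (2.7)

print(f"k = {k:.4e} rad/m, omega = {omega:.4e} rad/s, nu = {nu:.4e} Hz")
print(f"wave number = {wave_number_cm:.1f} cm^-1")
print(f"B0 = {B0} T, E0/c = {100.0 / C:.4e} T")
```

The computed $\mathbf{B}_0$ points along $y$, perpendicular to both $\mathbf{E}_0$ and $\mathbf{k}$, and its magnitude equals $E_0/c$, which is tiny in SI units compared with the electric field amplitude.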
The next step we take is to become intentionally sloppy. Physicists throughout the world have conspired to avoid writing $\text{Re}\{\ \}$ in an effort (or lack thereof if you prefer) to make expressions less cluttered. Nevertheless, only the real part of the field is physically relevant even though expressions and calculations contain both real and imaginary terms. This sloppy notation is okay since the real and imaginary parts of complex numbers never intermingle when adding, subtracting, differentiating, or integrating. We can delay taking the real part of the expression until the end of the calculation. Also, when hiding a phase inside of the field amplitude as in (2.8), we drop the tilde (might as well since we are already being sloppy); we will automatically assume that the field amplitude is complex and contains phase information. Putting this all together, our plane wave solution in complex notation is written simply as

$$\mathbf{E}(\mathbf{r},t) = \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \tag{2.10}$$
It is possible to construct any electromagnetic disturbance from a linear superposition of such waves, which we will do in chapter 7.
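The bookkeeping behind the complex notation can be checked with a few lines of Python. This is only a sketch with arbitrary illustrative numbers (the amplitude, phase, $k$, $\omega$, and sample point are assumptions) for a single field component in one dimension. It confirms that the real part of the complex field reproduces the cosine form, that differentiation can be done before taking the real part, and that multiplication is the operation where the real part must be taken first.

```python
import cmath
import math

E0_AMP, PHI = 2.0, 0.6       # real amplitude and phase (arbitrary values)
K, OMEGA = 3.0, 5.0          # wave number and angular frequency (arbitrary units)
E0_TILDE = E0_AMP * cmath.exp(1j * PHI)   # complex amplitude, as in (2.9)


def real_field(z, t):
    """Cosine form of the plane wave, one component of (2.2)."""
    return E0_AMP * math.cos(K * z - OMEGA * t + PHI)


def complex_field(z, t):
    """Complex form of the same wave, one component of (2.10)."""
    return E0_TILDE * cmath.exp(1j * (K * z - OMEGA * t))


z, t = 0.4, 1.3

# Taking the real part recovers the cosine form.
diff_linear = abs(complex_field(z, t).real - real_field(z, t))

# A time derivative of the complex field is a multiplication by -i*omega;
# its real part matches the derivative of the cosine form.
ddt_complex = (-1j * OMEGA * complex_field(z, t)).real
ddt_real = OMEGA * E0_AMP * math.sin(K * z - OMEGA * t + PHI)

# Multiplication mixes real and imaginary parts: Re{a*b} is not Re{a}*Re{b}.
a, b = complex_field(z, t), complex_field(z + 0.2, t)
product_mismatch = abs((a * b).real - a.real * b.real)

print(diff_linear, abs(ddt_complex - ddt_real), product_mismatch)
```

The first two printed numbers are at the level of rounding error, while the third is of order one, which is why products of fields (as in energy calculations) require the real parts to be taken before multiplying.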
1 We have assumed that each vector component of the field propagates with the same phase. To be more general, one could write $\tilde{\mathbf{E}}_0 \equiv \hat{\mathbf{x}}E_{0x}e^{i\phi_x} + \hat{\mathbf{y}}E_{0y}e^{i\phi_y} + \hat{\mathbf{z}}E_{0z}e^{i\phi_z}$.
Example 2.1
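Verify that the complex plane wave (2.10) is a solution to the wave equation (2.1).

Solution: The first term gives

$$\nabla^2\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} = \mathbf{E}_0\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + \frac{\partial^2}{\partial z^2}\right)e^{i(k_x x + k_y y + k_z z - \omega t)} = -\mathbf{E}_0\left(k_x^2 + k_y^2 + k_z^2\right)e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} = -k^2\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \tag{2.11}$$

The second time derivative similarly brings down a factor $(-i\omega)^2 = -\omega^2$. Upon insertion into (2.1) we obtain the vacuum dispersion relation (2.5), which specifies the connection between the wavenumber $k$ and the frequency $\omega$, emphasizing that $k$ and $\omega$ cannot be chosen independently.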
Since we are considering sinusoidal waves, we consider solutions of the form

$$\mathbf{E} = \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}, \qquad \mathbf{P} = \mathbf{P}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \tag{2.14}$$

By writing this, we are making the (reasonable) assumption that if an electric field stimulates a medium at frequency $\omega$, then the polarization in the medium also oscillates at frequency $\omega$. This assumption is typically rather good except for extreme electric fields, which can generate frequency harmonics through nonlinear effects (see P1.12). Recall that by our prior agreement, the complex amplitudes of $\mathbf{E}_0$ and $\mathbf{P}_0$ carry phase information. Thus, while $\mathbf{E}$ and $\mathbf{P}$ in (2.14) oscillate at the same frequency, they can be out of phase with respect to each other. This phase discrepancy is most pronounced for materials that absorb energy at the plane wave frequency. Substitution of the trial solutions (2.14) into (2.13) yields

$$-k^2\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \mu_0\epsilon_0\omega^2\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} = -\mu_0\omega^2\mathbf{P}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \tag{2.15}$$

2 Isotropic means the material behaves the same for propagation in any direction. Many crystals are not isotropic, as we'll see in Chapter 5.
3 Homogeneous means the material is everywhere the same throughout space.
4 This follows for a wave of the form (2.14) if $\mathbf{P}$ and $\mathbf{k}$ are perpendicular.
To go further, we need to make an explicit connection between $\mathbf{E}_0$ and $\mathbf{P}_0$ (external to Maxwell's equations). In a linear medium, the polarization amplitude is proportional to the strength of the applied electric field:

$$\mathbf{P}_0(\omega) = \epsilon_0\chi(\omega)\,\mathbf{E}_0(\omega) \tag{2.16}$$

This is known as a constitutive relation. We have introduced a dimensionless proportionality factor $\chi(\omega)$ called the susceptibility, which depends on the frequency of the field. We account for the possibility that $\mathbf{E}$ and $\mathbf{P}$ oscillate out of phase by allowing $\chi(\omega)$ to be a complex number. By inserting (2.16) into (2.15) and canceling the field terms, we obtain the dispersion relation in dielectrics:

$$k^2 = \mu_0\epsilon_0\left[1 + \chi(\omega)\right]\omega^2 \quad\text{or}\quad k = \frac{\omega}{c}\sqrt{1 + \chi(\omega)} \tag{2.17}$$

where we have used $c \equiv 1/\sqrt{\epsilon_0\mu_0}$. In general, $\chi(\omega)$ is a complex number, which leads to a complex index of refraction, defined by⁵

$$\mathcal{N}(\omega) \equiv n(\omega) + i\kappa(\omega) = \sqrt{1 + \chi(\omega)} \tag{2.18}$$
where $n$ and $\kappa$ are respectively the real and imaginary parts of the index. (Note that $\kappa$ is not $k$.) According to (2.17), the magnitude of the wave vector is also complex according to

$$k = \frac{\mathcal{N}\omega}{c} = \frac{(n + i\kappa)\,\omega}{c} \tag{2.19}$$

The use of complex index of refraction only makes sense in the context of complex representation of plane waves. The complex index $\mathcal{N}$ takes into account absorption as well as the usual oscillatory behavior of the wave. We see this by explicitly placing (2.19) into (2.14):

$$\mathbf{E}(\mathbf{r},t) = \mathbf{E}_0 e^{-\text{Im}\{\mathbf{k}\}\cdot\mathbf{r}}e^{i(\text{Re}\{\mathbf{k}\}\cdot\mathbf{r} - \omega t)} = \mathbf{E}_0 e^{-\frac{\kappa\omega}{c}\hat{\mathbf{u}}\cdot\mathbf{r}}e^{i\left(\frac{n\omega}{c}\hat{\mathbf{u}}\cdot\mathbf{r} - \omega t\right)} \tag{2.20}$$

As before, $\hat{\mathbf{u}}$ is a real unit vector specifying the direction of $\mathbf{k}$. Again, when looking at (2.20), by special agreement in advance, we should just think of the real part, namely⁶

$$\mathbf{E}(\mathbf{r},t) = \mathbf{E}_0 e^{-\frac{\kappa\omega}{c}\hat{\mathbf{u}}\cdot\mathbf{r}}\cos\left(\frac{n\omega}{c}\hat{\mathbf{u}}\cdot\mathbf{r} - \omega t + \phi\right) \tag{2.21}$$

where an overall phase $\phi$ was formerly held in the complex vector $\tilde{\mathbf{E}}_0$. (The tilde had been suppressed.) Figure 2.3 shows a graph of (2.21). The imaginary part of the index $\kappa$ causes the wave to decay as it travels. The real part of the index $n$ is associated with the oscillations of the wave. By inspection of the cosine argument in (2.21), we see that the speed of the (diminishing) sinusoidal wave fronts is

$$v_{\text{phase}}(\omega) = c/n(\omega) \tag{2.22}$$

5 $\epsilon_0\mathbf{E} + \mathbf{P} \equiv \epsilon\mathbf{E}$. See M. Born and E. Wolf, Principles of Optics, 7th ed., p. 3 (Cambridge University Press, 1999). The permittivity $\epsilon$ encapsulates the constitutive relation that connects $\mathbf{P}$ with $\mathbf{E}$. In a linear medium we have $\epsilon \equiv \epsilon_0(1 + \chi)$, so that the index of refraction is given by $\mathcal{N} = \sqrt{\epsilon/\epsilon_0}$.
6 For the sake of simplicity in writing (2.21) we assume linearly polarized light. That is, all vector components of $\mathbf{E}_0$ have the same complex phase $\phi$. We will consider other possibilities, such as circularly polarized light, in chapter 6.
It is apparent that $n(\omega)$ is the ratio of the speed of the light in vacuum to the speed of the wave in the material. In a dielectric, the vacuum relations (2.3) and (2.4) are modified to read

$$\text{Re}\{\mathbf{k}\} \equiv \frac{2\pi}{\lambda}\hat{\mathbf{u}} \tag{2.23}$$

where

$$\lambda \equiv \lambda_{\text{vac}}/n \tag{2.24}$$

While the frequency $\omega$ is the same, whether in a material or in vacuum, the wavelength $\lambda$ varies with the real part of the index $n$.

Figure 2.3 Electric field of a decaying plane wave. For convenience in plotting, the direction of propagation is chosen to be in the $z$ direction (i.e. $\hat{\mathbf{u}} = \hat{\mathbf{z}}$).

Example 2.2

When $n = 1.5$, $\kappa = 0.1$, and $\nu = 5\times10^{14}$ Hz, find (a) the wavelength inside the material, and (b) the propagation distance over which the amplitude of the wave diminishes by the factor $e^{-1}$ (called the skin depth).

Solution:
(a) $\lambda = \dfrac{\lambda_{\text{vac}}}{n} = \dfrac{c}{n\nu} \approx 400$ nm
(b) $e^{-\frac{\kappa\omega}{c}z} = e^{-1} \;\Rightarrow\; z = \dfrac{c}{\kappa\omega} = \dfrac{c}{2\pi\nu\kappa} \approx 0.95\ \mu\text{m}$
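The numbers in Example 2.2 can be reproduced directly from (2.24) and from the decay factor in (2.21). The snippet below is a minimal sketch that uses the example's values $n = 1.5$, $\kappa = 0.1$, and $\nu = 5\times10^{14}$ Hz; the variable names are chosen here for illustration.

```python
import math

C = 2.99792458e8             # speed of light in vacuum (m/s)

n, kappa = 1.5, 0.1          # real and imaginary parts of the index
nu = 5.0e14                  # frequency in Hz
omega = 2.0 * math.pi * nu   # angular frequency (rad/s)

lambda_vac = C / nu          # vacuum wavelength
lambda_mat = lambda_vac / n  # wavelength inside the material, (2.24)

# Amplitude falls as exp(-kappa*omega*z/c); it reaches exp(-1) at z = c/(kappa*omega).
skin_depth = C / (kappa * omega)

print(f"wavelength in material = {lambda_mat * 1e9:.1f} nm")
print(f"skin depth             = {skin_depth * 1e9:.1f} nm")
```

With these inputs the amplitude decays to $e^{-1}$ in a little over two wavelengths of the light inside the material.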
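The real parts and the imaginary parts in the above equation are separately equal:

$$n^2 - \kappa^2 = 1 + \text{Re}\{\chi\} \quad\text{and}\quad 2n\kappa = \text{Im}\{\chi\} \tag{2.26}$$

The second of these gives $\kappa = \text{Im}\{\chi\}/2n$. When this is substituted into the first equation of (2.26) we get a quadratic in $n^2$:

$$n^4 - \left(1 + \text{Re}\{\chi\}\right)n^2 - \frac{\left(\text{Im}\{\chi\}\right)^2}{4} = 0 \tag{2.28}$$

The positive⁷ real root to this equation is

$$n = \sqrt{\frac{\left(1 + \text{Re}\{\chi\}\right) + \sqrt{\left(1 + \text{Re}\{\chi\}\right)^2 + \left(\text{Im}\{\chi\}\right)^2}}{2}} \tag{2.29}$$

When absorption is small we can neglect the imaginary part of $\chi(\omega)$, and (2.29) reduces to

$$n(\omega) = \sqrt{1 + \chi(\omega)} \quad\text{(negligible absorption)} \tag{2.30}$$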
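As a cross-check of (2.26)-(2.29), the closed-form root can be compared with a direct complex square root of $1 + \chi$. The sketch below uses an arbitrary illustrative susceptibility (an assumption, not a material value) and a hypothetical helper named `index_from_susceptibility`; it assumes $\text{Im}\{\chi\} \ge 0$ and $n > 0$.

```python
import cmath
import math


def index_from_susceptibility(chi):
    """Return (n, kappa) from a complex susceptibility using (2.29) and (2.26)."""
    re_part = 1.0 + chi.real
    im_part = chi.imag
    n = math.sqrt((re_part + math.hypot(re_part, im_part)) / 2.0)   # (2.29)
    kappa = im_part / (2.0 * n)                                     # from 2*n*kappa = Im{chi}
    return n, kappa


chi = 1.25 + 0.3j                       # illustrative value only
n, kappa = index_from_susceptibility(chi)
N_direct = cmath.sqrt(1.0 + chi)        # principal square root of (2.18)

print(f"n = {n:.6f}, kappa = {kappa:.6f}")
print(f"direct sqrt: {N_direct.real:.6f} + {N_direct.imag:.6f}i")
print(f"n^2 - kappa^2 = {n**2 - kappa**2:.6f}, 2 n kappa = {2 * n * kappa:.6f}")
```

The two routes agree because the principal square root of a number with non-negative imaginary part has non-negative real and imaginary parts, which corresponds to choosing the positive root in (2.29).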
At the time of Lorentz, atoms were thought to be clouds of positive charge wherein point-like electrons sat at rest unless stimulated by an applied electric field. In our modern quantum-mechanical viewpoint, $\mathbf{r}_{\text{micro}}$ corresponds to an average displacement of the electronic cloud, which surrounds the nucleus (see Fig. 2.4). The displacement $\mathbf{r}_{\text{micro}}$ of the electron charge in an individual atom depends on the local strength of the applied electric field $\mathbf{E}$ at the position of the atom. Since the diameter of the electronic cloud is tiny compared to a wavelength of (visible) light, we may consider the electric field to be uniform across any individual atom.

Figure 2.4 (panels: unperturbed; in an electric field).

The Lorentz model uses Newton's equation of motion to describe an electron displacement from equilibrium within an atom. In accordance with the classical laws of motion, the electron mass $m_e$ times its acceleration is equal to the sum of the forces on the electron:

$$m_e\ddot{\mathbf{r}}_{\text{micro}} = q_e\mathbf{E} - m_e\gamma\dot{\mathbf{r}}_{\text{micro}} - k_{\text{Hooke}}\mathbf{r}_{\text{micro}} \tag{2.32}$$

The electric field pulls on the electron with force $q_e\mathbf{E}$.⁸ A drag force (or friction) $-m_e\gamma\dot{\mathbf{r}}_{\text{micro}}$ opposes the electron motion and accounts for absorption of energy. Without this term, it is only possible to describe optical index at frequencies away from where absorption takes place. Finally, $-k_{\text{Hooke}}\mathbf{r}_{\text{micro}}$ is a force accounting for the fact that the electron is bound to the nucleus. This restoring force can be thought of as an effective spring that pulls the displaced electron back towards equilibrium with a force proportional to the amount of displacement, so this term is essentially the familiar Hooke's law. With some rearranging, (2.32) can be written as

$$\ddot{\mathbf{r}}_{\text{micro}} + \gamma\dot{\mathbf{r}}_{\text{micro}} + \omega_0^2\mathbf{r}_{\text{micro}} = \frac{q_e}{m_e}\mathbf{E} \tag{2.33}$$

where $\omega_0 \equiv \sqrt{k_{\text{Hooke}}/m_e}$ is the natural oscillation frequency (or resonant frequency) associated with the electron mass and the spring constant.

There is a subtle problem with our analysis, which we will continue to neglect in this section, but which should be mentioned. The field $\mathbf{E}$ in (2.32) is the net field, which is influenced by the presence of all of the dipoles. The actual field that a dipole feels, however, does not include its own field. That is, we should remove from $\mathbf{E}$ the field produced by each dipole in its own vicinity. This significantly modifies the result if the density of the material is sufficiently high. This effect is described by the Clausius-Mossotti formula, which is treated in appendix 2.B.

In accordance with our examination of a single sinusoidal wave, we insert (2.14) into (2.33) and obtain

$$\ddot{\mathbf{r}}_{\text{micro}} + \gamma\dot{\mathbf{r}}_{\text{micro}} + \omega_0^2\mathbf{r}_{\text{micro}} = \frac{q_e}{m_e}\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \tag{2.34}$$

Note that within a given atom the excursions of $\mathbf{r}_{\text{micro}}$ are so small that $\mathbf{k}\cdot\mathbf{r}$ remains essentially constant, since $\mathbf{k}\cdot\mathbf{r}$ varies with displacements on the scale of an optical wavelength, which is huge compared to the size of an atom. The inhomogeneous solution to (2.34) is (see P2.1)

$$\mathbf{r}_{\text{micro}} = \left(\frac{q_e}{m_e}\right)\frac{\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}}{\omega_0^2 - i\omega\gamma - \omega^2} \tag{2.35}$$

8 The electron also experiences a force due to the magnetic field of the light, $\mathbf{F} = q_e\mathbf{v}_{\text{micro}}\times\mathbf{B}$,
The electron position $\mathbf{r}_{\text{micro}}$ oscillates (not surprisingly) with the same frequency $\omega$ as the driving electric field. This solution illustrates the convenience of complex notation. The imaginary part in the denominator implies that the electron oscillates with a phase different from the electric field oscillations; the damping term $\gamma$ (the imaginary part in the denominator) causes the two to be out of phase somewhat. The complex algebra in (2.35) accomplishes quite easily what would otherwise be cumbersome (i.e. working out a trigonometric phase).

We are now able to write the polarization in terms of the electric field. By substituting (2.35) into (2.31) and rearranging, we obtain

$$\mathbf{P} = \epsilon_0\left(\frac{\omega_p^2}{\omega_0^2 - i\omega\gamma - \omega^2}\right)\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \tag{2.36}$$

where the plasma frequency $\omega_p$ is

$$\omega_p \equiv \sqrt{\frac{N q_e^2}{\epsilon_0 m_e}} \tag{2.37}$$

A comparison with the constitutive relation (2.16) shows that the susceptibility is

$$\chi(\omega) = \frac{\omega_p^2}{\omega_0^2 - i\omega\gamma - \omega^2} \tag{2.38}$$

The index of refraction is then found by substituting the susceptibility (2.38) into (2.18). The real and imaginary parts of the index are solved by equating separately the real and imaginary parts of (2.18), namely

$$(n + i\kappa)^2 = 1 + \chi(\omega) = 1 + \frac{\omega_p^2}{\omega_0^2 - i\omega\gamma - \omega^2} \tag{2.39}$$

A graph of $n$ and $\kappa$ is given in Fig. 2.5.

Figure 2.5 Real and imaginary parts of the index for a single Lorentz oscillator dielectric with $\omega_p = 10\gamma$.

Most materials actually have more than one species of active electron, and different active electrons behave differently. The generalization of (2.39) in this case is

$$(n + i\kappa)^2 = 1 + \chi(\omega) = 1 + \sum_j\frac{f_j\,\omega_{pj}^2}{\omega_{0j}^2 - i\omega\gamma_j - \omega^2} \tag{2.40}$$

where $f_j$ is the aptly named oscillator strength for the $j$th species of active electron. Each species also has its own plasma frequency $\omega_{pj}$, natural frequency $\omega_{0j}$, and damping coefficient $\gamma_j$.
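The single-oscillator result (2.39) is easy to evaluate numerically. The sketch below works in units where $\gamma = 1$ and takes $\omega_p = 10\gamma$ with an assumed resonance at $\omega_0 = 100\gamma$; the resonance value and the function name `lorentz_index` are illustrative assumptions, not values taken from the text. The physical root of $(n + i\kappa)^2$ is chosen with the principal complex square root, which gives $\kappa \ge 0$ for $\omega > 0$.

```python
import cmath


def lorentz_index(omega, omega_p, omega_0, gamma):
    """Return (n, kappa) for a single Lorentz oscillator, following (2.38)-(2.39)."""
    chi = omega_p**2 / (omega_0**2 - 1j * omega * gamma - omega**2)   # (2.38)
    N = cmath.sqrt(1.0 + chi)                                         # (2.18)
    return N.real, N.imag


GAMMA = 1.0        # damping rate (sets the frequency unit)
OMEGA_P = 10.0     # plasma frequency in units of gamma
OMEGA_0 = 100.0    # assumed resonance frequency in units of gamma

n_below, kappa_below = lorentz_index(50.0, OMEGA_P, OMEGA_0, GAMMA)
n_res, kappa_res = lorentz_index(100.0, OMEGA_P, OMEGA_0, GAMMA)
n_above, kappa_above = lorentz_index(150.0, OMEGA_P, OMEGA_0, GAMMA)

for label, n, kappa in (("below", n_below, kappa_below),
                        ("at resonance", n_res, kappa_res),
                        ("above", n_above, kappa_above)):
    print(f"{label:>13}: n = {n:.5f}, kappa = {kappa:.5f}")
```

With these assumed numbers, absorption ($\kappa$) is concentrated near the resonance, while $n$ is slightly above 1 below resonance and slightly below 1 above it. Summing several such terms with weights $f_j$ inside `chi` would give the multi-species form (2.40).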
9 In a plasma, charges move freely so that both the Hooke restoring force and the dragging term
Lorentz introduced this model well before the development of quantum mechanics. Even though the model pays no attention to quantum physics, it works surprisingly well for describing frequency-dependent optical indices and absorption of light. As it turns out, the Schrödinger equation applied to two levels in an atom reduces in mathematical form to the Lorentz model in the limit of low-intensity light. Quantum mechanics also explains the oscillator strength, which before the development of quantum mechanics had to be inserted ad hoc to make the model agree with experiments. The friction term $\gamma$ turns out not to be associated with something internal to atoms but rather with collisions between atoms, which on average give rise to the same behavior.
$$(n + i\kappa)^2 = 1 - \frac{\omega_p^2}{i\gamma\omega + \omega^2} \tag{2.41}$$

Figure 2.6 Real and imaginary parts of the index for conductor with $\omega_p = 50\gamma$.
This underscores the fact that $\partial\mathbf{P}/\partial t$ is a current very much like $\mathbf{J}_{\text{free}}$. When we remove the restoring force $k_{\text{Hooke}} = m_e\omega_0^2$ from the atomic model, the electrons effectively become free, and it is not surprising that they exactly mimic the behavior of a free current $\mathbf{J}_{\text{free}}$. A graph of $n$ and $\kappa$ in the conductor model is given in Fig. 2.6. Below, we provide the derivation for (2.41) in the context of $\mathbf{J}_{\text{free}}$ rather than as a limiting case of the dielectric model.¹⁰
We assume that the current is made up of individual electrons traveling with velocity $\mathbf{v}_{\text{micro}}$:

$$\mathbf{J}_{\text{free}} = N q_e\mathbf{v}_{\text{micro}} \tag{2.43}$$

As before, $N$ is the number density of free electrons (in units of number per volume). Recall that current density $\mathbf{J}_{\text{free}}$ has units of charge times velocity per volume (or current per cross sectional area), so (2.43) may be thought of as a definition of current density in a fundamental sense. Again, the electrons satisfy Newton's equation of motion, similar to (2.32) except without a restoring force:

$$m_e\ddot{\mathbf{r}}_{\text{micro}} = q_e\mathbf{E} - m_e\gamma\dot{\mathbf{r}}_{\text{micro}} \tag{2.44}$$

10 G. Burns, Solid State Physics, Sect. 9-5 (Orlando: Academic Press, 1985).
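For a sinusoidal electric field $\mathbf{E} = \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}$, the solution to this equation is

$$\dot{\mathbf{r}}_{\text{micro}} \equiv \mathbf{v}_{\text{micro}} = \frac{q_e}{m_e}\frac{\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}}{\gamma - i\omega} \tag{2.45}$$

where again we assume that the electron oscillation excursions described by $\mathbf{r}_{\text{micro}}$ are small compared to the wavelength so that $\mathbf{r}$ can be treated as a constant in (2.44). The current density (2.43) in terms of the electric field is then

$$\mathbf{J}_{\text{free}} = \left(\frac{N q_e^2}{m_e}\right)\frac{\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}}{\gamma - i\omega} \tag{2.46}$$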
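We substitute this together with the electric field into the wave equation (2.42) and get

$$-k^2\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \frac{\omega^2}{c^2}\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} = -i\omega\mu_0\left(\frac{N q_e^2}{m_e}\right)\frac{\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}}{\gamma - i\omega} \tag{2.47}$$

This simplifies down to the dispersion relation

$$k^2 = \frac{\omega^2}{c^2}\left(1 - \frac{\omega_p^2}{i\gamma\omega + \omega^2}\right) \tag{2.48}$$

which agrees with (2.41). We have made the substitution $\omega_p^2 = N q_e^2/\epsilon_0 m_e$ in accordance with (2.37). As usual, $k^2 = \frac{\omega^2}{c^2}(1 + \chi) = \frac{\omega^2}{c^2}(n + i\kappa)^2$, so (2.48) determines the susceptibility and the index.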
Note that in the low-frequency limit (i.e. $\omega \ll \gamma$), the current density (2.46) reduces to Ohm's law $\mathbf{J} = \sigma\mathbf{E}$, where $\sigma = N q_e^2/m_e\gamma$ is the DC conductivity. In the high-frequency limit (i.e. $\omega \gg \gamma$), the behavior changes over to that of a free plasma, where collisions, which are responsible for resistance, become less important since the excursions of the electrons during oscillations become very small. This formula captures the general behavior of metals, but actual values of the index vary from this somewhat (see P2.6). In either the conductor or dielectric model, the damping term removes energy from electron oscillations. The damping term gives rise to an imaginary part of the index, which causes an exponential attenuation of the plane wave as it propagates.
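The two limits described above can be illustrated numerically with the conductor model. In the sketch below the electron density and damping rate are assumed, illustrative values of roughly the magnitude found in a good metal; they are not taken from the text, and the function names are hypothetical.

```python
import cmath

Q_E = 1.602176634e-19     # elementary charge (C)
M_E = 9.1093837015e-31    # electron mass (kg)
EPS0 = 8.8541878128e-12   # vacuum permittivity (F/m)


def drude_conductivity(omega, density, gamma):
    """Complex ratio J/E from (2.46)."""
    return density * Q_E**2 / (M_E * (gamma - 1j * omega))


def drude_index(omega, density, gamma):
    """Return (n, kappa) from (2.41), using the plasma frequency (2.37)."""
    omega_p_sq = density * Q_E**2 / (EPS0 * M_E)
    N = cmath.sqrt(1.0 - omega_p_sq / (1j * gamma * omega + omega**2))
    return N.real, N.imag


DENSITY = 8.5e28   # free electrons per m^3 (assumed, illustrative)
GAMMA = 4.0e13     # damping rate in 1/s (assumed, illustrative)

sigma_dc = DENSITY * Q_E**2 / (M_E * GAMMA)              # Ohm's-law limit
sigma_low = drude_conductivity(1.0e6, DENSITY, GAMMA)    # omega << gamma

n_vis, kappa_vis = drude_index(3.0e15, DENSITY, GAMMA)   # below the plasma frequency
n_high, kappa_high = drude_index(1.0e17, DENSITY, GAMMA) # far above it

print(f"sigma_dc = {sigma_dc:.3e} S/m, low-frequency J/E = {sigma_low.real:.3e} S/m")
print(f"omega = 3e15 rad/s: n = {n_vis:.4f}, kappa = {kappa_vis:.4f}")
print(f"omega = 1e17 rad/s: n = {n_high:.4f}, kappa = {kappa_high:.2e}")
```

For these assumed values the low-frequency conductivity is essentially real and equal to $\sigma$, the index at optical frequencies is dominated by $\kappa$ (strong attenuation and reflection), and far above the plasma frequency the index approaches 1 with very little absorption.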
Figure 2.7 The electrons in a conductor can easily move in response to the applied field.
rather than the field amplitude directly. In this section we examine the connection between propagating electromagnetic fields (such as the plane waves discussed in this chapter) and the energy transported by such fields. In the late 1800s John Poynting developed (from Maxwell's equations) the theoretical foundation that describes light energy transport. You should appreciate and remember the ideas involved, especially the definition and meaning of the Poynting vector, even if you forget the specifics of its derivation.
The first two terms can be simplified using the vector identity P0.8. The next two terms are the time derivatives of $\epsilon_0 E^2/2$ and $B^2/2\mu_0$, respectively. The relation (2.49) then becomes

$$\nabla\cdot\left(\mathbf{E}\times\frac{\mathbf{B}}{\mu_0}\right) + \frac{\partial}{\partial t}\left(\frac{\epsilon_0 E^2}{2} + \frac{B^2}{2\mu_0}\right) = -\mathbf{E}\cdot\mathbf{J} \tag{2.50}$$

This is Poynting's theorem. Each term in this equation has units of power per volume. It is conventional to write Poynting's theorem as

$$\nabla\cdot\mathbf{S} + \frac{\partial u_{\text{field}}}{\partial t} = -\frac{\partial u_{\text{medium}}}{\partial t} \tag{2.51}$$

where

$$\mathbf{S} \equiv \mathbf{E}\times\frac{\mathbf{B}}{\mu_0} \tag{2.52}$$

is called the Poynting vector, which has units of power per area, called irradiance. The expression

$$u_{\text{field}} \equiv \frac{\epsilon_0 E^2}{2} + \frac{B^2}{2\mu_0} \tag{2.53}$$

is the energy per volume stored in the electric and magnetic fields. Derivations of the electric field energy density and the magnetic field energy density are given in Appendices 2.C and 2.D. (See (2.79) and (2.86).) The derivative

$$\frac{\partial u_{\text{medium}}}{\partial t} \equiv \mathbf{E}\cdot\mathbf{J} \tag{2.54}$$

describes the power per volume delivered to the medium from the field. Equation (2.54) is reminiscent of the familiar circuit power law, Power = Voltage × Current. Power is delivered when a charged particle traverses a distance while experiencing a force. This happens when currents flow in the presence of electric fields.

Poynting's theorem is essentially a statement of the conservation of energy, where $\mathbf{S}$ describes the flow of energy. To appreciate this, consider Poynting's theorem (2.51) integrated over a volume $V$ (enclosed by surface $S$). If we also apply the divergence theorem (0.11) to the term involving $\mathbf{S}$ we obtain

$$\oint_S\mathbf{S}\cdot\hat{\mathbf{n}}\,da = -\frac{\partial}{\partial t}\int_V\left(u_{\text{field}} + u_{\text{medium}}\right)dv \tag{2.55}$$

11 See D. J. Griffiths, Introduction to Electrodynamics, 3rd ed., Sect. 8.1.2 (New Jersey: Prentice-Hall, 1999).
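Notice that the volume integral over energy densities $u_{\text{field}}$ and $u_{\text{medium}}$ gives the total energy stored in $V$, whether in the form of electromagnetic field energy density or as energy density that has been given to the medium. The integration of the Poynting vector over the surface gives the net Poynting vector flux directed outward. Equation (2.55) indicates that the outward Poynting vector flux matches the rate that total energy disappears from the interior of $V$. Conversely, if the Poynting vector is directed inward (negative), then the net inward flux matches the rate that energy increases within $V$. The vector $\mathbf{S}$ defines the flow of energy through space. Its units of power per area are just what is needed to describe the brightness of light impinging on a surface.

Example 2.3

(a) Find the Poynting vector $\mathbf{S}$ and energy density $u_{\text{field}}$ for the plane wave field $\mathbf{E} = \hat{\mathbf{x}}E_0\cos(kz - \omega t)$ traveling in vacuum. (b) Check that $\mathbf{S}$ and $u_{\text{field}}$ satisfy Poynting's theorem.

Solution: The associated magnetic field is (see P1.2)

$$\mathbf{B} = \frac{k\hat{\mathbf{z}}\times\hat{\mathbf{x}}E_0}{\omega}\cos(kz - \omega t) = \hat{\mathbf{y}}\frac{E_0}{c}\cos(kz - \omega t)$$

(a) The Poynting vector is

$$\mathbf{S} = \mathbf{E}\times\frac{\mathbf{B}}{\mu_0} = \hat{\mathbf{z}}\,c\,\epsilon_0 E_0^2\cos^2(kz - \omega t)$$

where we have used $1/\mu_0 c = \epsilon_0 c$. The energy density is

$$u_{\text{field}} = \frac{\epsilon_0 E^2}{2} + \frac{B^2}{2\mu_0} = \frac{\epsilon_0 E_0^2}{2}\cos^2(kz - \omega t) + \frac{k^2 E_0^2}{2\mu_0\omega^2}\cos^2(kz - \omega t) = \epsilon_0 E_0^2\cos^2(kz - \omega t)$$

Notice that $S = c\,u_{\text{field}}$. The energy density traveling at speed $c$ gives rise to the power per area passing a surface (perpendicular to $z$).

(b) We have

$$\nabla\cdot\mathbf{S} = c\,\epsilon_0 E_0^2\frac{\partial}{\partial z}\cos^2(kz - \omega t) = -2kc\,\epsilon_0 E_0^2\cos(kz - \omega t)\sin(kz - \omega t)$$

whereas

$$\frac{\partial u_{\text{field}}}{\partial t} = \epsilon_0 E_0^2\frac{\partial}{\partial t}\cos^2(kz - \omega t) = 2\omega\,\epsilon_0 E_0^2\cos(kz - \omega t)\sin(kz - \omega t)$$

Poynting's theorem (2.50) is satisfied since $\omega = kc$ (and $\mathbf{J} = 0$ in vacuum). It is common to replace the rapidly oscillating function $\cos^2(kz - \omega t)$ with its time average 1/2, but this would have inhibited our ability to take the above derivatives.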
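The result of Example 2.3 can also be verified numerically. The following sketch assumes an illustrative amplitude of 1 V/m and a wavelength of 1 µm (neither from the text) and estimates the derivatives by central differences at one arbitrarily chosen point; it checks both $S = c\,u_{\text{field}}$ and the vacuum form of Poynting's theorem, $\nabla\cdot\mathbf{S} + \partial u_{\text{field}}/\partial t = 0$.

```python
import math

EPS0 = 8.8541878128e-12            # vacuum permittivity (F/m)
MU0 = 4.0e-7 * math.pi             # vacuum permeability (H/m), conventional value
C = 1.0 / math.sqrt(EPS0 * MU0)    # speed of light from (1.43)

E0 = 1.0                           # assumed field amplitude (V/m)
WAVELENGTH = 1.0e-6                # assumed vacuum wavelength (m)
K = 2.0 * math.pi / WAVELENGTH
OMEGA = K * C                      # vacuum dispersion relation (2.5)


def e_field(z, t):
    """x component of E for the plane wave of Example 2.3."""
    return E0 * math.cos(K * z - OMEGA * t)


def poynting_z(z, t):
    """z component of S = E x B / mu0, with B = E/c along y."""
    e = e_field(z, t)
    return e * (e / C) / MU0


def energy_density(z, t):
    """u_field from (2.53)."""
    e = e_field(z, t)
    b = e / C
    return 0.5 * EPS0 * e * e + 0.5 * b * b / MU0


def poynting_residual(z, t, h=1.0e-9):
    """Central-difference estimate of dS_z/dz + du/dt, plus dS_z/dz for scale."""
    tau = h / C
    div_s = (poynting_z(z + h, t) - poynting_z(z - h, t)) / (2.0 * h)
    du_dt = (energy_density(z, t + tau) - energy_density(z, t - tau)) / (2.0 * tau)
    return div_s + du_dt, div_s


z0, t0 = 1.0e-7, 0.0
residual, div_s = poynting_residual(z0, t0)
ratio = poynting_z(z0, t0) / energy_density(z0, t0)

print(f"S/u = {ratio:.6e} m/s (c = {C:.6e} m/s)")
print(f"div S = {div_s:.4e} W/m^3, residual = {residual:.2e} W/m^3")
```

The residual is many orders of magnitude smaller than either term on its own, reflecting the cancellation that follows from $\omega = kc$.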
When $\mathbf{k}$ is complex, $\mathbf{B}$ is out of phase with $\mathbf{E}$, and this occurs when absorption takes place. When there is no absorption, then $\mathbf{k}$ is real, and $\mathbf{B}$ and $\mathbf{E}$ carry the same complex phase.

Before computing the Poynting vector (2.52), which involves multiplication, we must remember our unspoken agreement that only the real parts of the fields are relevant. We necessarily remove the imaginary parts before multiplying (see (0.22)). To obtain the real parts of the fields, we add their respective complex conjugates and divide the result by 2 (see (0.30)). The real field associated with the plane-wave electric field is

$$\mathbf{E}(\mathbf{r},t) = \frac{1}{2}\left[\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \mathbf{E}_0^* e^{-i(\mathbf{k}^*\cdot\mathbf{r} - \omega t)}\right] \tag{2.57}$$

and the real field associated with (2.56) is

$$\mathbf{B}(\mathbf{r},t) = \frac{1}{2}\left[\frac{\mathbf{k}\times\mathbf{E}_0}{\omega}e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \frac{\mathbf{k}^*\times\mathbf{E}_0^*}{\omega}e^{-i(\mathbf{k}^*\cdot\mathbf{r} - \omega t)}\right] \tag{2.58}$$

We have merely exercised our previous (conspiratorial) agreement that only the real parts of (2.39) and (2.56) are to be retained. Now we are ready to calculate the Poynting vector. The algebra is a little messy in general, so we restrict the analysis to the case of an isotropic medium for the sake of simplicity.
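$$\mathbf{S} \equiv \mathbf{E}\times\frac{\mathbf{B}}{\mu_0} = \frac{1}{2}\left[\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \mathbf{E}_0^* e^{-i(\mathbf{k}^*\cdot\mathbf{r} - \omega t)}\right]\times\frac{1}{2\mu_0}\left[\frac{\mathbf{k}\times\mathbf{E}_0}{\omega}e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \frac{\mathbf{k}^*\times\mathbf{E}_0^*}{\omega}e^{-i(\mathbf{k}^*\cdot\mathbf{r} - \omega t)}\right]$$

$$= \frac{1}{4\mu_0}\left[\frac{\mathbf{E}_0\times(\mathbf{k}\times\mathbf{E}_0)}{\omega}e^{2i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \frac{\mathbf{E}_0^*\times(\mathbf{k}\times\mathbf{E}_0)}{\omega}e^{i(\mathbf{k} - \mathbf{k}^*)\cdot\mathbf{r}} + \frac{\mathbf{E}_0\times(\mathbf{k}^*\times\mathbf{E}_0^*)}{\omega}e^{i(\mathbf{k} - \mathbf{k}^*)\cdot\mathbf{r}} + \frac{\mathbf{E}_0^*\times(\mathbf{k}^*\times\mathbf{E}_0^*)}{\omega}e^{-2i(\mathbf{k}^*\cdot\mathbf{r} - \omega t)}\right]$$

$$= \frac{1}{4\mu_0}\left[\frac{k}{\omega}\mathbf{E}_0\times(\hat{\mathbf{u}}\times\mathbf{E}_0)\,e^{2i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \frac{k}{\omega}\mathbf{E}_0^*\times(\hat{\mathbf{u}}\times\mathbf{E}_0)\,e^{-2\frac{\kappa\omega}{c}\hat{\mathbf{u}}\cdot\mathbf{r}} + \text{C.C.}\right] \tag{2.59}$$

The letters C.C. stand for the complex conjugate of what precedes in the square brackets. The direction of $\mathbf{k}$ is specified with the real unit vector $\hat{\mathbf{u}}$. We have also used (2.19) to rewrite $i(\mathbf{k} - \mathbf{k}^*)$ as $-2(\kappa\omega/c)\hat{\mathbf{u}}$.

The assumption of an isotropic medium (not a crystal) means that $\nabla\cdot\mathbf{E}(\mathbf{r},t) = 0$ and therefore $\hat{\mathbf{u}}\cdot\mathbf{E}_0 = 0$. We can use this fact together with the BAC-CAB rule P0.3 to reduce the above expression to

$$\mathbf{S} = \frac{\hat{\mathbf{u}}}{4\mu_0}\left[\frac{k}{\omega}\left(\mathbf{E}_0\cdot\mathbf{E}_0\right)e^{2i(\mathbf{k}\cdot\mathbf{r} - \omega t)} + \frac{k}{\omega}\left(\mathbf{E}_0\cdot\mathbf{E}_0^*\right)e^{-2\frac{\kappa\omega}{c}\hat{\mathbf{u}}\cdot\mathbf{r}} + \text{C.C.}\right] \tag{2.60}$$

The final expression shows that (in an isotropic medium) the flow of energy is in the direction of $\hat{\mathbf{u}}$ (or $\mathbf{k}$). This agrees with our intuition that energy flows in the direction that the wave propagates.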
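Very often, we are interested in the time-average of the Poynting vector, denoted by $\langle\mathbf{S}\rangle_t$. There are no electronics that can keep up with the rapid oscillation of visible light (i.e. faster than $10^{14}$ Hz). Therefore, what is always measured is the time-averaged absorption of energy. Under time averaging, the first term in (2.60) vanishes since it rapidly oscillates positive and negative. The time-averaged Poynting vector (including the term C.C.) becomes

$$\langle\mathbf{S}\rangle_t = \frac{\hat{\mathbf{u}}}{4\mu_0}\,\frac{k+k^*}{\omega}\,(\mathbf{E}_0\cdot\mathbf{E}_0^*)\, e^{-2\frac{\kappa\omega}{c}\hat{\mathbf{u}}\cdot\mathbf{r}} = \hat{\mathbf{u}}\,\frac{n\epsilon_0 c}{2}\left(|E_{0x}|^2 + |E_{0y}|^2 + |E_{0z}|^2\right) e^{-2\frac{\kappa\omega}{c}\hat{\mathbf{u}}\cdot\mathbf{r}} \tag{2.61}$$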
We have used (2.19) to rewrite $k + k^*$ as $2(n\omega/c)$. We have also used (1.43) to rewrite $1/(\mu_0 c)$ as $\epsilon_0 c$. The expression (2.61) is formally called the irradiance (with the direction $\hat{\mathbf{u}}$ included). However, we often speak of the intensity of a field I, which amounts to the same thing, but without regard for the direction $\hat{\mathbf{u}}$. The definition of intensity is thus less specific, and it can be applied, for example, to standing waves where the net irradiance is technically zero (i.e. counter-propagating plane waves with zero net energy flow). Nevertheless, atoms in standing waves feel the oscillating field. In general, the intensity is written as
$$I \equiv \frac{n\epsilon_0 c}{2}\,\mathbf{E}_0\cdot\mathbf{E}_0^* = \frac{n\epsilon_0 c}{2}\left(|E_{0x}|^2 + |E_{0y}|^2 + |E_{0z}|^2\right) \tag{2.62}$$

where in this case we have ignored absorption (i.e. $\kappa \cong 0$). Alternatively, we could consider $|E_{0x}|^2$, $|E_{0y}|^2$, and $|E_{0z}|^2$ to include the factor $\exp(-2(\kappa\omega/c)\hat{\mathbf{u}}\cdot\mathbf{r})$ so that they correspond to the local electric field. Equation (2.62) agrees with S in Example 2.3 where n = 1 and $\mathbf{E}_0 = \hat{\mathbf{x}}E_0$ is real; the cosine squared averages to 1/2.

Table 2.1 Radiometric quantities and units.
Radiant Power (of a source): Electromagnetic energy emitted per time. Units: W = J/s
Radiant Solid-Angle Intensity (of a source): Radiant power per steradian emitted from a pointlike source (4π steradians in a sphere). Units: W/sr
Radiance or Brightness (of a source): Radiant solid-angle intensity per unit projected area of an extended source. The projected area foreshortens by cos θ, where θ is the observation angle relative to the surface normal. Units: W/(sr cm²)
Radiant Emittance or Exitance (from a source): Radiant power emitted per unit surface area of an extended source (the Poynting flux leaving). Units: W/cm²
Irradiance (to a receiver), often called intensity: Electromagnetic power delivered per area to a receiver (the Poynting flux arriving). Units: W/cm²
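Equation (2.62) is easy to evaluate numerically. The following is a minimal sketch, assuming SI units and no absorption; the function names `intensity` and `field_amplitude` are illustrative choices, not notation from the text. The field components may be complex, so a circularly polarized wave is handled the same way as a linearly polarized one.

```python
import math

EPSILON_0 = 8.8541878128e-12  # vacuum permittivity (F/m)
C = 299_792_458.0             # vacuum speed of light (m/s)


def intensity(e0_components, n=1.0):
    """Intensity (W/m^2) from (2.62): I = (n eps0 c / 2)(|E0x|^2 + |E0y|^2 + |E0z|^2).

    e0_components: iterable of (possibly complex) field amplitudes in V/m.
    n: real refractive index (absorption ignored).
    """
    return 0.5 * n * EPSILON_0 * C * sum(abs(e) ** 2 for e in e0_components)


def field_amplitude(i, n=1.0):
    """Invert (2.62): magnitude of E0 (V/m) that gives intensity i (W/m^2)."""
    return math.sqrt(2.0 * i / (n * EPSILON_0 * C))


# Linearly polarized wave in vacuum with E0 = 1e4 V/m along x.
i_linear = intensity([1.0e4, 0.0, 0.0])

# Circular polarization with the same total |E0|^2 carries the same intensity.
a = 1.0e4 / math.sqrt(2.0)
i_circular = intensity([a, 1j * a, 0.0])

print(f"I (linear)   = {i_linear:.4e} W/m^2")
print(f"I (circular) = {i_circular:.4e} W/m^2")
```

For a field amplitude of 10⁴ V/m in vacuum this gives roughly 1.3 × 10⁵ W/m², that is, about 13 W/cm².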
Figure 2.8 The response of a standard human eye, plotted versus wavelength (nm), under relatively bright conditions (photopic) and in dim conditions (scotopic).
Photometric units, which may seem a little obscure, were first defined in terms of an actual candle with prescribed dimensions made from whale tallow. The basic unit of luminous power is called the lumen, defined to be (1/683) W of light with wavelength λ_vac = 555 nm, the peak of the eye's bright-light response. More radiant power is required to achieve the same number of lumens for wavelengths away from the center of the eye's spectral response. Photometric units are often used to characterize room lighting as well as photographic, projection, and display equipment. For example, both a 60 W incandescent bulb and a 13 W compact fluorescent bulb emit a little more than 800 lumens of light. The difference in photometric output versus radiometric output reflects the fact that most of the energy radiated from an incandescent bulb is emitted in the infrared, where our eyes are not sensitive. Table 2.2 gives the names of the various photometric quantities, which parallel the entries for radiometric quantities in Table 2.1. We include a variety of units that are sometimes encountered.

Cones come in three varieties, each of which is sensitive to light in different wavelength bands. Figure 2.9 plots the normalized sensitivity curves¹³ for short (S), medium (M), and long (L) wavelength cones. Because your brain gets separate signals from each type of cone, this system gives you the ability to measure basic information about the spectral content of light. We interpret this spectral information as the color of the light. When the three types of cones are stimulated equally the light appears white, and when they are stimulated differently the light appears colored. Light with different spectral distributions can produce the exact same color sensation, so our perception of color only gives very general information about the spectral content of light.

For example, light coming from a television has a different spectral composition than the light incident on the camera that recorded the image, but both can produce the same color sensation. This ambiguity can lead to a potentially dangerous situation in the lab because lasers from 670 nm to 800 nm all appear the same color. (They all stimulate the L and M cones in essentially the same ratio.) However, your eye's response falls off quickly in the near-infrared, so a dangerous 800 nm high-intensity beam can appear about the same brightness as an innocuous 670 nm laser pointer.

Because we have three types of cones, our perception of color can be well-represented using a three-dimensional vector space referred to as a color space.¹⁴ A color space can be defined in terms of three basis light sources
Table 2.2 Photometric quantities and units.
Luminous Power (of a source): Visible light energy emitted per time from a source. Units: lumens (lm); lm = (1/683) W at 555 nm
Luminous Solid-Angle Intensity (of a source): Luminous power per steradian emitted from a pointlike source. Units: candelas (cd); cd = lm/sr
Luminance (of a source): Luminous solid-angle intensity per projected area of an extended source. (The projected area foreshortens by cos θ, where θ is the observation angle relative to the surface normal.) Units: cd/cm² = stilb, cd/m² = nit; 1 lambert ≈ 3183 nit and 1 footlambert ≈ 3.4 nit
Luminous Emittance or Exitance (from a source): Luminous power emitted per unit surface area of an extended source. Units: lm/cm²
Illuminance (to a receiver): Incident luminous power delivered per area to a receiver. Units: lm/m² = lux, lm/cm² = phot, lm/ft² = footcandle

[Figure 2.9: normalized sensitivity curves for the S, M, and L cones versus wavelength, 400 nm to 800 nm.]

13 A. Stockman, L. Sharpe, and C. Fach, The spectral sensitivity of the human short-wavelength cones, Vision Research, 39, 2901-2927 (1999); A. Stockman and L. Sharpe, Spectral sensitivities of the middle- and long-wavelength sensitive cones derived from measurements in observers of known genotype, Vision Research, 40, 1711-1737 (2000).
14 The methods we use to represent color are very much tied to human physiology. Other species have photoreceptors that sense different wavelength ranges or do not sense color at all. For instance, Papilio butterflies have six types of cone-like photoreceptors and certain types of shrimp have twelve. Reptiles have four-color vision for visible light, and pit vipers (a subgroup of snakes) have an additional set of eyes that look like pits on the front of their face. These pits are essentially pinhole cameras sensitive to infrared light, and give these reptiles crude night-vision capabilities. (Not surprisingly, pit vipers hunt most actively at night time.) On the other hand, some insects can perceive markings on flowers that are only visible in the ultraviolet. Each of these species would
referred to as primaries. Different colors (i.e. the vectors in the color space) are created by mixing the primary light in different ratios. If we had three primaries that separately stimulated each type of cone (S, M, and L), we could recreate any color sensation exactly by mixing those primaries. However, by inspecting Fig. 2.9 you can see that this ideal set of primaries cannot be found because of the overlap between the S, M, and L curves. Any light that will stimulate one type of cone will also stimulate another. This overlap makes it impossible to display every possible color with three primaries. (It is nevertheless possible to quantify all colors with three primaries, even if the primaries can't display the colors; we'll see how shortly.) The range of colors that can be displayed with a given set of primaries is referred to as the gamut of that color space. As your experience with computers suggests, we are able to engineer devices with a very broad gamut, but there are always colors that cannot be displayed.

The CIE1931 RGB¹⁵ color space is a very commonly encountered color space based on a series of experiments performed by W. David Wright and John Guild in 1931. In these experiments, test subjects were asked to match the color of a monochromatic test light source by mixing monochromatic primaries at 700 nm (R), 546.1 nm (G), and 435.8 nm (B). The relative amount of R, G, and B light required to match the color at each test wavelength was recorded as the color matching functions $\bar{r}(\lambda)$, $\bar{g}(\lambda)$, and $\bar{b}(\lambda)$, shown in Fig. 2.10. Note that the color matching functions sometimes go negative. This is most noticeable for $\bar{r}(\lambda)$, but all three have negative values. These negative values indicate that the test color was outside the gamut of the primaries (i.e. the color of the test source could not be matched by adding primaries).

In these cases, the observers matched the test light as closely as possible by mixing primaries, and then they added some of the primary light to the test light until the colors matched. The amount of primary light that had to be added to the test light was recorded as a negative number. In this way they were able to quantify the color, even though it couldn't be displayed using their primaries.

It turns out that the eye responds essentially linearly with respect to color perception. That is, if an observer perceives one light source to have components (R₁, G₁, B₁) and another light source to have components (R₂, G₂, B₂), a mixture of the two lights will have components (R₁ + R₂, G₁ + G₂, B₁ + B₂). This linearity allows us to calculate the color components of an arbitrary light source with spectrum I(λ) by integrating the spectrum against the color matching functions:

$$R = \int I(\lambda)\,\bar{r}\,d\lambda \qquad G = \int I(\lambda)\,\bar{g}\,d\lambda \qquad B = \int I(\lambda)\,\bar{b}\,d\lambda \tag{2.63}$$
If R, G, or B turns out to be negative for a given I(λ), then that color of light falls outside the gamut of these particular primaries. However, the negative coordinates still provide a valid abstract representation of that color.
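The integrals in (2.63) become simple sums once the spectrum and the color matching functions are tabulated on a common wavelength grid. The sketch below uses the trapezoid rule. The function names and, in particular, the three "matching functions" are made-up placeholders chosen only to keep the arithmetic transparent; they are not the CIE1931 data, which would have to be loaded from a tabulated source.

```python
def trapezoid(x, y):
    """Trapezoid-rule estimate of the integral of y(x) over the sampled grid x."""
    return sum(0.5 * (y[i] + y[i + 1]) * (x[i + 1] - x[i]) for i in range(len(x) - 1))


def color_components(wavelengths, spectrum, matching_functions):
    """Evaluate (2.63): integrate I(lambda) against each color matching function.

    matching_functions maps a label ("R", "G", "B") to samples on the same grid.
    A negative result flags a color outside the gamut of the primaries.
    """
    return {
        label: trapezoid(wavelengths, [s * m for s, m in zip(spectrum, cmf)])
        for label, cmf in matching_functions.items()
    }


# PLACEHOLDER data (hypothetical, not the CIE1931 curves).
wavelengths_nm = [400.0, 500.0, 600.0, 700.0]
toy_cmfs = {
    "R": [0.0, 0.0, 1.0, 1.0],
    "G": [0.0, 1.0, 1.0, 0.0],
    "B": [1.0, 1.0, 0.0, 0.0],
}

flat = [1.0, 1.0, 1.0, 1.0]     # spectrally flat source
bluish = [2.0, 1.0, 0.0, 0.0]   # source weighted toward short wavelengths

rgb_flat = color_components(wavelengths_nm, flat, toy_cmfs)
rgb_bluish = color_components(wavelengths_nm, bluish, toy_cmfs)

# Linearity: the components of a mixture are the sums of the components.
mixture = [a + b for a, b in zip(flat, bluish)]
rgb_mixture = color_components(wavelengths_nm, mixture, toy_cmfs)
print(rgb_flat, rgb_bluish, rgb_mixture)
```

Because the integral is linear in I(λ), the components of the mixture equal the sums of the components of the two sources, which is the property the paragraph above relies on.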
find the color spaces we use to record and recreate color sensations very inaccurate.
15 This is not the RGB space you have probably used on a computer; that space is referred to as sRGB. CIE is an abbreviation for the French Commission Internationale de l'Éclairage, an international commission that defines lighting and color standards.
The RGB color space is an additive color model, where the primaries are added together to produce color and the absence of light gives black. Subtractive color models produce color using a background that reflects all visible light equally so that it appears white (e.g. a piece of paper or canvas) and then placing absorbing pigments over the background to remove portions of the reflected spectrum.

Some color spaces use four basis vectors. For example, color printers use the subtractive CMYK color space (Cyan, Magenta, Yellow, and Black), and some television manufacturers add a fourth type of primary (usually yellow) to their display. The fourth basis vector increases the range of colors that can be displayed by these systems (i.e. it increases the gamut). However, the fourth basis vector makes the color space overdetermined and only helps in displaying colors; we can abstractly represent all colors using just three coordinates (in an appropriately chosen basis).

Example 2.4
The CIE1931 XYZ color space is derived from the CIE1931 RGB space by the transformation

$$\begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = \frac{1}{0.17697}\begin{pmatrix} 0.49 & 0.31 & 0.20 \\ 0.17697 & 0.81240 & 0.01063 \\ 0.00 & 0.01 & 0.99 \end{pmatrix}\begin{pmatrix} R \\ G \\ B \end{pmatrix} \tag{2.64}$$

where X, Y, and Z are the color coordinates in the new basis. The matrix elements in (2.64) were carefully chosen to give this color space some desirable properties: none of the new coordinates (X, Y, or Z) are ever negative; the Y coordinate gives the photometric brightness of the light and the X and Z coordinates describe the color part (i.e. the chromaticity) of the light; and the coordinates (1/3, 1/3, 1/3) give the color white. The XYZ coordinates do not represent new primaries, but rather linear combinations of the original primaries. Find the representation in the CIE1931 RGB basis for each of the basis vectors in the XYZ space.

Solution: We first invert the transformation matrix to find

$$\begin{pmatrix} R \\ G \\ B \end{pmatrix} = \begin{pmatrix} 0.4185 & -0.1587 & -0.08283 \\ -0.09117 & 0.2524 & 0.01571 \\ 0.0009209 & -0.002550 & 0.1786 \end{pmatrix}\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}$$

Reading off the columns of this matrix, we see that X = 0.4185R − 0.09117G + 0.0009209B, Y = −0.1587R + 0.2524G − 0.002550B, and Z = −0.08283R + 0.01571G + 0.1786B. Because the XYZ primaries contain negative amounts of the physical RGB primaries, the XYZ basis is not physically realizable. However, it is extensively used because it can abstractly represent all colors using a triplet of positive numbers.
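The matrix inversion in Example 2.4 can be checked with a few lines of code. This is a minimal sketch that uses the adjugate formula for a 3×3 inverse so that no external library is needed; the helper name `inverse_3x3` is an illustrative choice.

```python
def inverse_3x3(m):
    """Invert a 3x3 matrix (list of rows) using the adjugate divided by the determinant."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adjugate = [
        [(e * i - f * h), -(b * i - c * h), (b * f - c * e)],
        [-(d * i - f * g), (a * i - c * g), -(a * f - c * d)],
        [(d * h - e * g), -(a * h - b * g), (a * e - b * d)],
    ]
    return [[entry / det for entry in row] for row in adjugate]


# Transformation (2.64), including the overall factor 1/0.17697.
SCALE = 1.0 / 0.17697
rgb_to_xyz = [
    [SCALE * 0.49, SCALE * 0.31, SCALE * 0.20],
    [SCALE * 0.17697, SCALE * 0.81240, SCALE * 0.01063],
    [SCALE * 0.00, SCALE * 0.01, SCALE * 0.99],
]

xyz_to_rgb = inverse_3x3(rgb_to_xyz)

# Column j of xyz_to_rgb holds the RGB representation of the j-th XYZ basis vector.
for j, name in enumerate("XYZ"):
    r, g, b = (xyz_to_rgb[row][j] for row in range(3))
    print(f"{name} = {r:+.4g} R {g:+.4g} G {b:+.4g} B")
```

The printed coefficients reproduce the entries quoted in the solution to four significant figures, including the negative amounts of the physical primaries.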
is the macroscopic field in the medium, which includes a contribution from all of the dipoles. To avoid double-counting the dipole's own field, we should replace E with

$$\mathbf{E}_{\rm actual} \equiv \mathbf{E} - \mathbf{E}_{\rm dipole} \tag{2.65}$$

and write

$$q_e \mathbf{r}_{\rm micro} = \alpha\,\mathbf{E}_{\rm actual} \tag{2.66}$$

That is, we ought not to allow the dipole's own field to act on itself as we previously (inadvertently) did. Here $\mathbf{E}_{\rm dipole}$ is the average field that a dipole contributes to its quota of space in the material. Since N is the number of dipoles per volume, each dipole occupies a volume 1/N. As will be shown below, the average field due to a dipole¹⁶ centered in such a volume (symmetrically chosen) is

$$\mathbf{E}_{\rm dipole} = -\frac{N q_e \mathbf{r}_{\rm micro}}{3\epsilon_0} \tag{2.67}$$

Substitution of (2.67) and (2.66) into (2.65) yields

$$\mathbf{E}_{\rm actual} = \mathbf{E} + \frac{N\alpha}{3\epsilon_0}\mathbf{E}_{\rm actual} \quad\Rightarrow\quad \mathbf{E}_{\rm actual} = \frac{\mathbf{E}}{1 - \dfrac{N\alpha}{3\epsilon_0}} \tag{2.68}$$

Then (2.66) becomes

$$q_e \mathbf{r}_{\rm micro} = \frac{\alpha\,\mathbf{E}}{1 - \dfrac{N\alpha}{3\epsilon_0}} \tag{2.69}$$

Now according to (2.16) the susceptibility is defined via $\mathbf{P} = \epsilon_0\chi\mathbf{E}$, where E is the macroscopic field. Also, the polarization is always based on the combined behavior of all of the dipoles, $\mathbf{P} = N q_e \mathbf{r}_{\rm micro}$ (see (2.31)). Therefore, the susceptibility is

$$\chi(\omega) = \frac{N\alpha(\omega)/\epsilon_0}{1 - \dfrac{N\alpha(\omega)}{3\epsilon_0}} \tag{2.70}$$

This is known as the Clausius-Mossotti relation. In section 2.3, we only included the numerator of (2.70). The extra term in the denominator becomes important when N is sufficiently large, which is the case for liquid or solid densities. Since we neglect absorption, from (2.25) we have $\chi = n^2 - 1$, and we may write

$$n^2 - 1 = \frac{N\alpha/\epsilon_0}{1 - N\alpha/(3\epsilon_0)} \tag{2.71}$$

Equivalently,¹⁷

$$\frac{N\alpha}{\epsilon_0} = 3\,\frac{n^2 - 1}{n^2 + 2}$$

16 In principle, the detailed fields of nearby dipoles should also be considered rather than representing their influence with the macroscopic field. However, if they are symmetrically distributed the result is the same. See J. D. Jackson, Classical Electrodynamics, 3rd ed., Sect. 4.5 (New York: John Wiley, 1999).
17 This form of the Clausius-Mossotti relation, in terms of the refractive index, was renamed the Lorentz-Lorenz formula, but probably undeservedly so, since it is essentially the same formula.
Example 2.5

Xenon vapor at STP (density $4.46\times10^{-5}$ mol/cm³) has index n = 1.000702 measured at wavelength 589 nm. Use (a) the Clausius-Mossotti relation (2.70) and (b) the uncorrected formula (i.e. numerator only) to predict the index for liquid xenon with density $2.00\times10^{-2}$ mol/cm³. Compare with the measured value of n = 1.332.¹⁸

Solution: At the low density, we may safely neglect the correction in the denominator of (2.71) and simply write $N_{\rm atm}\alpha/\epsilon_0 = 1.000702^2 - 1 = 1.404\times10^{-3}$. The liquid density $N_{\rm liquid}$ is $2.00\times10^{-2}/4.46\times10^{-5} = 449$ times greater. Therefore, $N_{\rm liquid}\alpha/\epsilon_0 = 449\times1.404\times10^{-3} = 0.630$.

(a) According to Clausius-Mossotti (2.71), the index is

$$n = \sqrt{1 + \frac{0.630}{1 - 0.630/3}} = 1.341$$

(b) On the other hand, without the correction in the denominator, we get

$$n = \sqrt{1 + 0.630} = 1.277$$
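The arithmetic of Example 2.5 can be reproduced in a few lines. This is a minimal sketch; the function names are illustrative, and the variable `x` stands for the dimensionless combination Nα/ε₀ that appears in (2.71).

```python
import math


def index_clausius_mossotti(x):
    """Refractive index from (2.71), with x = N*alpha/eps0 and absorption neglected."""
    return math.sqrt(1.0 + x / (1.0 - x / 3.0))


def index_uncorrected(x):
    """Refractive index keeping only the numerator of (2.70): n^2 - 1 = x."""
    return math.sqrt(1.0 + x)


n_vapor = 1.000702        # measured index of xenon vapor at STP
density_vapor = 4.46e-5   # mol/cm^3
density_liquid = 2.00e-2  # mol/cm^3

# At vapor density the denominator correction is negligible, so x is about n^2 - 1.
x_vapor = n_vapor**2 - 1.0
# N*alpha/eps0 scales linearly with the number density.
x_liquid = x_vapor * density_liquid / density_vapor

n_corrected = index_clausius_mossotti(x_liquid)
n_plain = index_uncorrected(x_liquid)

print(f"N alpha / eps0 (liquid) = {x_liquid:.3f}")
print(f"(a) Clausius-Mossotti:   n = {n_corrected:.3f}")
print(f"(b) uncorrected:         n = {n_plain:.3f}")
```

The corrected value lands much closer to the measured n = 1.332 than the uncorrected one, which is the point of the example.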
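We wish to compute the average field within a cubic volume $V = L^3$ that symmetrically encompasses the dipole.¹⁹ We take the volume dimension L to be large compared to the dipole dimension d. Integrating the field over this volume yields

$$\begin{aligned}
\int\mathbf{E}\,dv &= \frac{q_e}{4\pi\epsilon_0}\int_{-L/2}^{L/2}dx\int_{-L/2}^{L/2}dy\int_{-L/2}^{L/2}dz\left[\frac{x\hat{\mathbf{x}} + y\hat{\mathbf{y}} + (z-d/2)\hat{\mathbf{z}}}{\left[x^2+y^2+(z-d/2)^2\right]^{3/2}} - \frac{x\hat{\mathbf{x}} + y\hat{\mathbf{y}} + (z+d/2)\hat{\mathbf{z}}}{\left[x^2+y^2+(z+d/2)^2\right]^{3/2}}\right] \\
&= -\hat{\mathbf{z}}\,\frac{q_e}{2\pi\epsilon_0}\int_{-L/2}^{L/2}dx\int_{-L/2}^{L/2}dy\left[\frac{1}{\sqrt{x^2+y^2+(L-d)^2/4}} - \frac{1}{\sqrt{x^2+y^2+(L+d)^2/4}}\right]
\end{aligned}$$

The terms multiplying $\hat{\mathbf{x}}$ and $\hat{\mathbf{y}}$ vanish since they involve odd functions integrated over even limits on either x or y, respectively. On the remaining term, the integration on z has been executed. Before integrating the remaining expression over x and y, we make the following approximation based on $L \gg d$:

$$\frac{1}{\sqrt{x^2+y^2+(L\mp d)^2/4}} \cong \frac{1}{\sqrt{x^2+y^2+L^2/4}}\;\frac{1}{\sqrt{1 \mp \dfrac{Ld/2}{x^2+y^2+L^2/4}}} \cong \frac{1}{\sqrt{x^2+y^2+L^2/4}}\left[1 \pm \frac{Ld/4}{x^2+y^2+L^2/4}\right]$$

which will make integration considerably easier.²⁰ Then integration over the y dimension brings us to²¹

$$\int\mathbf{E}\,dv = -\hat{\mathbf{z}}\,\frac{q_e d}{4\pi\epsilon_0}\int_{-L/2}^{L/2}dx\int_{-L/2}^{L/2}\frac{L\,dy}{\left(x^2+y^2+L^2/4\right)^{3/2}} = -\hat{\mathbf{z}}\,\frac{q_e d}{4\pi\epsilon_0}\int_{-L/2}^{L/2}\frac{L^2\,dx}{\left(x^2+L^2/4\right)\sqrt{x^2+L^2/2}}$$

The final integral is the same as twice the integral from 0 to L/2. Then, with x > 0, we can employ the variable change

$$s = x^2 + L^2/4 \quad\Rightarrow\quad 2\,dx = \frac{ds}{\sqrt{s - L^2/4}}$$

and obtain

$$\int\mathbf{E}\,dv = -\hat{\mathbf{z}}\,\frac{q_e d}{4\pi\epsilon_0}\int_{L^2/4}^{L^2/2}\frac{L^2\,ds}{s\sqrt{s^2 - L^4/16}} = -\hat{\mathbf{z}}\,\frac{q_e d}{4\pi\epsilon_0}\,\frac{4\pi}{3}$$

Reinstalling $\mathbf{r}_{\rm micro} = \hat{\mathbf{z}}d$ and dividing by the volume 1/N, allotted to individual dipoles, brings us to the anticipated result (2.67).

of Xenon Liquid and Vapour, J. Phys. B: At. Mol. Phys. 1, 449-457 (1968).
19 Authors often obtain the same result using a spherical volume with the (usually unmentioned) conceptual awkwardness that spheres cannot be closely packed to form a macroscopic medium without introducing voids.
20 For $r \gg d$, the dipole field is

$$\mathbf{E} = \frac{q_e}{4\pi\epsilon_0 r^3}\left[\frac{\mathbf{r} - \hat{\mathbf{z}}d/2}{\left[1 - \hat{\mathbf{z}}\cdot\hat{\mathbf{r}}\,\dfrac{d}{r} + \dfrac{d^2}{4r^2}\right]^{3/2}} - \frac{\mathbf{r} + \hat{\mathbf{z}}d/2}{\left[1 + \hat{\mathbf{z}}\cdot\hat{\mathbf{r}}\,\dfrac{d}{r} + \dfrac{d^2}{4r^2}\right]^{3/2}}\right] \cong \frac{q_e d}{4\pi\epsilon_0 r^3}\left[3\hat{\mathbf{r}}\,(\hat{\mathbf{z}}\cdot\hat{\mathbf{r}}) - \hat{\mathbf{z}}\right]$$

where we have used $\left[1 \mp \hat{\mathbf{z}}\cdot\hat{\mathbf{r}}\,d/r\right]^{-3/2} \cong 1 \pm \dfrac{3d}{2r}\hat{\mathbf{z}}\cdot\hat{\mathbf{r}}$. This dipole-field expression, while useful for describing the field surrounding the dipole, contains no information about the fields internal to the dipole. Note that we integrate z through the origin, which would violate the above assumption $r \gg d$. Alternatively, the influence of the internal fields on our integral could be accomplished using a delta function as is done in J. D. Jackson, Classical Electrodynamics, 3rd ed., p. 149 (New York: John Wiley, 1999).
21 Two useful integral formulas are (0.61) and (0.61).
22 J. R. Reitz, F. J. Milford, and R. W. Christy, Foundations of Electromagnetic Theory 3rd ed., Sect. 6-3 (Reading, Massachusetts: Addison-Wesley, 1979).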
charge, or volts) describes the potential energy that a charge would experience if placed at any given point in the field. The electric field and the potential are connected through

$$\mathbf{E}(\mathbf{r}) = -\nabla\phi(\mathbf{r}) \tag{2.73}$$

The energy U necessary to assemble a distribution of charges (owing to attraction or repulsion) can be written in terms of a summation over all of the charges (or charge density ρ(r)) located within the potential:

$$U = \frac{1}{2}\int_V \phi(\mathbf{r})\,\rho(\mathbf{r})\,dv \tag{2.74}$$

We consider the potential to arise from the charges themselves. The factor 1/2 is necessary to avoid double counting. To appreciate this factor consider just two point charges: we only need to count the energy due to one charge in the presence of the other's potential to obtain the energy required to bring the charges together. A substitution of (1.1) for ρ(r) into (2.74) gives

$$U = \frac{\epsilon_0}{2}\int_V \phi(\mathbf{r})\,\nabla\cdot\mathbf{E}(\mathbf{r})\,dv \tag{2.75}$$

Using the product rule $\nabla\cdot(\phi\mathbf{E}) = \phi\,\nabla\cdot\mathbf{E} + \mathbf{E}\cdot\nabla\phi$, this becomes

$$U = \frac{\epsilon_0}{2}\int_V \nabla\cdot\left[\phi(\mathbf{r})\,\mathbf{E}(\mathbf{r})\right]dv - \frac{\epsilon_0}{2}\int_V \mathbf{E}(\mathbf{r})\cdot\nabla\phi(\mathbf{r})\,dv \tag{2.76}$$

An application of the divergence theorem (0.11) on the first integral and a substitution of (2.73) into the second integral yields

$$U = \frac{\epsilon_0}{2}\oint_S \phi(\mathbf{r})\,\mathbf{E}(\mathbf{r})\cdot\hat{\mathbf{n}}\,da + \frac{\epsilon_0}{2}\int_V \mathbf{E}(\mathbf{r})\cdot\mathbf{E}(\mathbf{r})\,dv \tag{2.77}$$

We can consider the volume V (enclosed by S) to be as large as we like, say a sphere of radius R, so that all charges are contained well within it. Then the surface integral over S vanishes as $R\to\infty$ since $\phi\sim1/R$ and $E\sim1/R^2$, whereas $da\sim R^2$. Then the total energy is expressed solely in terms of the electric field:

$$U = \int_{\text{All Space}} u_E(\mathbf{r})\,dv \tag{2.78}$$

where

$$u_E(\mathbf{r}) \equiv \frac{\epsilon_0 E^2}{2} \tag{2.79}$$
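The energy U necessary to assemble a distribution of currents can be written in terms of a summation over all of the currents (or current density J(r)) located within the vector potential field:

$$U = \frac{1}{2}\int_V \mathbf{J}(\mathbf{r})\cdot\mathbf{A}(\mathbf{r})\,dv \tag{2.81}$$

As in (2.74), the factor 1/2 is necessary to avoid double counting the influence of the currents on each other. Under the assumption of steady currents (no variations in time), we may substitute Ampere's law (1.21) into (2.81), which yields

$$U = \frac{1}{2\mu_0}\int_V \left[\nabla\times\mathbf{B}(\mathbf{r})\right]\cdot\mathbf{A}(\mathbf{r})\,dv \tag{2.82}$$

Next we employ the vector identity P0.8 from which the previous expression becomes

$$U = \frac{1}{2\mu_0}\int_V \mathbf{B}(\mathbf{r})\cdot\left[\nabla\times\mathbf{A}(\mathbf{r})\right]dv - \frac{1}{2\mu_0}\int_V \nabla\cdot\left[\mathbf{A}(\mathbf{r})\times\mathbf{B}(\mathbf{r})\right]dv \tag{2.83}$$

Upon substituting (2.80) into the first integral and applying the divergence theorem (0.11) on the second integral, this expression for total energy becomes

$$U = \frac{1}{2\mu_0}\int_V \mathbf{B}(\mathbf{r})\cdot\mathbf{B}(\mathbf{r})\,dv - \frac{1}{2\mu_0}\oint_S \left[\mathbf{A}(\mathbf{r})\times\mathbf{B}(\mathbf{r})\right]\cdot\hat{\mathbf{n}}\,da \tag{2.84}$$

As was done in connection with (2.77), if we choose a large enough volume (a sphere with radius $R\to\infty$), the surface integral vanishes since $A\sim1/R$ and $B\sim1/R^2$, whereas $da\sim R^2$. The total energy (2.84) then reduces to

$$U = \int_{\text{All Space}} u_B(\mathbf{r})\,dv \tag{2.85}$$

where

$$u_B(\mathbf{r}) \equiv \frac{B^2}{2\mu_0} \tag{2.86}$$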
23 J. R. Reitz, F. J. Milford, and R. W. Christy, Foundations of Electromagnetic Theory 3rd ed., Sect.
Exercises
Exercises for 2.3 The Lorentz Model of Dielectrics

P2.1 Verify that (2.35) is a solution to (2.34).

P2.2 Derive the Sellmeier equation

$$n^2 = 1 + \frac{A\,\lambda_{\rm vac}^2}{\lambda_{\rm vac}^2 - \lambda_{0,\rm vac}^2}$$

from (2.39) for a gas with negligible absorption (i.e. $\gamma \cong 0$, valid far from resonance $\omega_0$), where $\lambda_{0,\rm vac}$ corresponds to frequency $\omega_0$ and A is a constant. Many materials (e.g. glass, air) have strong resonances in the ultraviolet. In such materials, do you expect the index of refraction for blue light to be greater than that for red light? Make a sketch of n as a function of wavelength for visible light down to the ultraviolet (where $\lambda_{0,\rm vac}$ is located).

P2.3 In the Lorentz model, take $N = 10^{28}\ \text{m}^{-3}$ for the density of bound electrons in an insulator (note that N is number per volume, not just number), and a single transition at $\omega_0 = 6\times10^{15}$ rad/sec (in the UV), and damping $\gamma = \omega_0/5$ (quite broad). Assume $E_0$ is $10^4$ V/m. For three frequencies $\omega = \omega_0 - 2\gamma$, $\omega = \omega_0$, and $\omega = \omega_0 + 2\gamma$ find the magnitude and phase (relative to the phase of $\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r}-\omega t)}$) of the following quantities. Give correct SI units with each quantity. You don't need to worry about vector directions.
(a) The charge displacement amplitude $\mathbf{r}_{\rm micro}$ (2.35)
(b) The polarization P(ω)
(c) The susceptibility χ(ω). What would the susceptibility be for twice the E-field strength as before?
For the following no phase is needed:
(d) Find n and κ at the three frequencies. You will have to solve for the real and imaginary parts of $(n + i\kappa)^2 = 1 + \chi(\omega)$.
(e) Find the three speeds of light in terms of c. Find the three wavelengths λ.
(f) Find how far light penetrates into the material before only 1/e of the amplitude of E remains. Find how far light penetrates into the material before only 1/e of the intensity I remains.

P2.4 (a) Use a computer graphing program and the Lorentz model to plot n and κ as a function of frequency ω for a dielectric (i.e. obtain graphs such as the ones in Fig. 2.5). Use these parameters to keep things simple: $\omega_p = 1$, $\omega_0 = 10$, and $\gamma = 1$; plot your function from ω = 0 to ω = 20.
(b) Plot n and κ as a function of frequency for a material that has three resonant frequencies: $\omega_{0\,1} = 10$, $\gamma_1 = 1$, $f_1 = 0.5$; $\omega_{0\,2} = 15$, $\gamma_2 = 1$, $f_2 = 0.25$; and $\omega_{0\,3} = 25$, $\gamma_3 = 3$, $f_3 = 0.25$. Use $\omega_p = 1$ for all three resonances, and plot the results from ω = 0 to ω = 30. Comment on your plots.

Exercises for 2.4 Index of Refraction of a Conductor

P2.5 For silver, the complex refractive index is characterized by n = 0.13 and κ = 4.0.²⁴ Find the distance that light travels inside of silver before the field is reduced by a factor of 1/e. Assume a wavelength of $\lambda_{\rm vac} = 633$ nm. What is the speed of the wave crests in the silver (written as a number times c)? Are you surprised?

P2.6 Use (2.27), (2.29), and (2.48) to estimate the index of silver at λ = 633 nm. The density of free electrons in silver is $N = 5.86\times10^{28}\ \text{m}^{-3}$ and the DC conductivity is $\sigma = 6.62\times10^{7}\ \text{C}^2/(\text{J}\cdot\text{m}\cdot\text{s})$.²⁵ Compare with the actual index given in P2.5.
Answer: n + iκ = 0.02 + i4.50

P2.7 The dielectric model and the conductor model give identical results for n in the case of a low-density plasma where there is no restoring force (i.e. $\omega_0 = 0$) and no dragging term (i.e. γ = 0). Use this to model the ionosphere (the uppermost part of the atmosphere that is ionized by solar radiation to form a low-density plasma).
(a) If the index of refraction of the ionosphere is n = 0.9 for an FM station at ν = ω/2π = 100 MHz, calculate the number of free electrons per cubic meter.
(b) What is the complex refractive index of the ionosphere for radio waves at 1160 kHz (KSL radio station)? Is this frequency above or below the plasma frequency? Assume the same density of free electrons as in part (a).
For your information, AM radio reflects better than FM radio from the ionosphere (like visible light from a metal mirror). At night, the lower layer of the ionosphere goes away so that AM radio waves reflect from a higher layer.

P2.8 Use a computer to plot n and κ as a function of frequency for a conductor (obtain plots such as the ones in Fig. 2.6). To keep things simple, let $\gamma = 0.02\,\omega_p$ and plot your function from $\omega = 0.6\,\omega_p$ to $\omega = 2\,\omega_p$.

24 Handbook of Optical Constants of Solids, Edited by E. D. Palik (Elsevier, 1997).
25 G. Burns, Solid State Physics, p. 194 (Orlando: Academic Press, 1985).
Exercises for 2.6 Irradiance of a Plane Wave

P2.9 In the case of a linearly-polarized plane wave, where the phase of each vector component of $\mathbf{E}_0$ is the same, re-derive (2.61) directly from the real field (2.21). For simplicity, you may ignore absorption (i.e. κ = 0).
HINT: The time-average of $\cos^2(\mathbf{k}\cdot\mathbf{r} - \omega t + \phi)$ is 1/2.

P2.10 (a) Find the intensity (in W/cm²) produced by a short laser pulse (linearly polarized) with duration $\Delta t = 2.5\times10^{-14}$ s and energy E = 100 mJ, focused in vacuum to a round spot with radius r = 5 μm.
(b) What is the peak electric field (in V/Å)? HINT: The SI units of electric field are N/C = V/m.
(c) What is the peak magnetic field (in T = kg/(s·C))?

P2.11 (a) What is the intensity (in W/cm²) on the retina when looking directly at the sun? Assume that the eye's pupil has a radius $r_{\rm pupil} = 1$ mm. Take the Sun's irradiance at the earth's surface to be 1.4 kW/m², and neglect refractive index (i.e. set n = 1). HINT: The Earth-Sun distance is $d_o = 1.5\times10^{8}$ km and the pupil-retina distance is $d_i = 22$ mm. The radius of the Sun $r_{\rm Sun} = 7.0\times10^{5}$ km is de-magnified on the retina according to the ratio $d_i/d_o$.
(b) What is the intensity at the retina when looking directly into a 1 mW HeNe laser? Assume that the smallest radius of the laser beam is $r_{\rm waist} = 0.5$ mm positioned $d_o = 2$ m in front of the eye, and that the entire beam enters the pupil. Compare with part (a).

P2.12 Show that the magnetic field of an intense laser with λ = 1 μm becomes important for a free electron oscillating in the field at intensities above $10^{18}$ W/cm². This marks the transition to relativistic physics. Nevertheless, for convenience, use classical physics in making the estimate.
HINT: At lower intensities, the oscillating electric field dominates, so the electron motion can be thought of as arising solely from the electric field. Use this motion to calculate the magnetic force on the moving electron, and compare it to the electric force. The forces become comparable at $10^{18}$ W/cm².
Exercises for 2.A Radiometry, Photometry, and Color

P2.13 The CIE1931 RGB color matching functions $\bar{r}(\lambda)$, $\bar{g}(\lambda)$, and $\bar{b}(\lambda)$ can be transformed using (2.64) to obtain color matching functions for the XYZ basis: $\bar{x}(\lambda)$, $\bar{y}(\lambda)$, and $\bar{z}(\lambda)$, plotted in Fig. 2.12. As with the RGB color matching functions, the XYZ color matching functions can be used to calculate the color coordinates in the XYZ basis for an arbitrary spectrum:

$$X = \int I(\lambda)\,\bar{x}\,d\lambda \qquad Y = \int I(\lambda)\,\bar{y}\,d\lambda \qquad Z = \int I(\lambda)\,\bar{z}\,d\lambda \tag{2.87}$$

(The function $\bar{y}(\lambda)$ was chosen to be exactly the photopic response curve (shown in Fig. 2.8), so that Y describes the photometric brightness of the light.)

Obtain a copy of the XYZ color matching functions from www.cvrl.org and calculate the XYZ color coordinates for the spectrum $I(\lambda) = I_0\,e^{-(\lambda - 500\ \text{nm})^2}$

Figure 2.12 Color matching functions for the CIE XYZ color space, plotted versus wavelength (nm) from 400 nm to 700 nm.

P2.14 The color space you've probably encountered most is sRGB, used to represent color on computer displays. The sRGB coordinates are related to the XYZ coordinates by the transformation

$$\begin{pmatrix} R \\ G \\ B \end{pmatrix} = \begin{pmatrix} 3.2406 & -1.5372 & -0.4986 \\ -0.9689 & 1.8758 & 0.0415 \\ 0.0557 & -0.2040 & 1.0570 \end{pmatrix}\begin{pmatrix} X \\ Y \\ Z \end{pmatrix}$$

where the XYZ coordinates need to be scaled to values similar to those accepted by the sRGB device (commonly 0 to 255) and then the sRGB coordinates R, G, and B need to be scaled or clipped to fit in the appropriate range. (This scaling and clipping result from the fact that your monitor cannot display arbitrarily bright light.) Obtain a copy of the XYZ color matching functions from www.cvrl.org and use it to calculate the sRGB components for monochromatic light from $\lambda_0 = 400$ nm to $\lambda_0 = 700$ nm in 1 nm intervals. Make a plot of the individual sRGB values and also use the coordinates to display a rainbow.
HINT: Matlab has all the functions you need to display the rainbow.
Chapter 3
material on the right as depicted in Fig. 3.1. When a plane wave traveling in the direction $\mathbf{k}_i$ is incident on the boundary from the left, it gives rise to a reflected plane wave traveling in the direction $\mathbf{k}_r$ and a transmitted plane wave traveling in the direction $\mathbf{k}_t$. The incident and reflected waves exist only to the left of the material interface, and the transmitted wave exists only to the right of the interface. The angles $\theta_i$, $\theta_r$, and $\theta_t$ give the angles that each respective wave vector ($\mathbf{k}_i$, $\mathbf{k}_r$, and $\mathbf{k}_t$) makes with the normal to the interface.

For simplicity, we'll assume that both of the materials are isotropic here. (Chapter 5 discusses refraction for anisotropic materials.) In this case, $\mathbf{k}_i$, $\mathbf{k}_r$, and $\mathbf{k}_t$ all lie in a single plane, referred to as the plane of incidence (i.e. the plane represented by the surface of this page). We are free to orient our coordinate system in many different ways (and every textbook seems to do it differently!).² We choose the y-z plane to be the plane of incidence, with the z-direction normal to the interface and the x-axis pointing into the page.

The electric field vector for each plane wave is confined to a plane perpendicular to its wave vector. We are free to decompose the field vector into arbitrary components as long as they are perpendicular to the wave vector. It is customary to choose one of the electric field vector components to be that which lies within the plane of incidence. We call this p-polarized light, where p stands for parallel to the plane of incidence. The remaining electric field vector component is directed normal to the plane of incidence and is called s-polarized light. The s stands for senkrecht, a German word meaning perpendicular.

Using this system, we can decompose the electric field vector $\mathbf{E}_i$ into its p-polarized component $E_i^{(p)}$ and its s-polarized component $E_i^{(s)}$, as depicted in Fig. 3.1. The s component $E_i^{(s)}$ is represented by the tail of an arrow pointing into the page, or the x-direction in our convention. The other fields $\mathbf{E}_r$ and $\mathbf{E}_t$ are similarly split into s and p components as indicated in Fig. 3.1. All field components are considered to be positive when they point in the direction of their respective arrows.³ Note that the s-polarized components are parallel for all three plane waves, whereas the p-polarized components are not (except at normal incidence) because each plane wave travels in a different direction. By inspection of Fig. 3.1, we can write the various wave vectors in terms of the $\hat{\mathbf{y}}$ and $\hat{\mathbf{z}}$ unit vectors:

$$\begin{aligned}
\mathbf{k}_i &= k_i\left(\hat{\mathbf{y}}\sin\theta_i + \hat{\mathbf{z}}\cos\theta_i\right) \\
\mathbf{k}_r &= k_r\left(\hat{\mathbf{y}}\sin\theta_r - \hat{\mathbf{z}}\cos\theta_r\right) \\
\mathbf{k}_t &= k_t\left(\hat{\mathbf{y}}\sin\theta_t + \hat{\mathbf{z}}\cos\theta_t\right)
\end{aligned} \tag{3.1}$$

Also by inspection of Fig. 3.1 (following the conventions for the electric fields indicated by the arrows), we can write the incident, reflected, and transmitted fields:

$$\begin{aligned}
\mathbf{E}_i &= \left[E_i^{(p)}\left(\hat{\mathbf{y}}\cos\theta_i - \hat{\mathbf{z}}\sin\theta_i\right) + \hat{\mathbf{x}}E_i^{(s)}\right]e^{i\left[k_i(y\sin\theta_i + z\cos\theta_i) - \omega_i t\right]} \\
\mathbf{E}_r &= \left[E_r^{(p)}\left(\hat{\mathbf{y}}\cos\theta_r + \hat{\mathbf{z}}\sin\theta_r\right) + \hat{\mathbf{x}}E_r^{(s)}\right]e^{i\left[k_r(y\sin\theta_r - z\cos\theta_r) - \omega_r t\right]} \\
\mathbf{E}_t &= \left[E_t^{(p)}\left(\hat{\mathbf{y}}\cos\theta_t - \hat{\mathbf{z}}\sin\theta_t\right) + \hat{\mathbf{x}}E_t^{(s)}\right]e^{i\left[k_t(y\sin\theta_t + z\cos\theta_t) - \omega_t t\right]}
\end{aligned} \tag{3.2}$$

Figure 3.1 Incident, reflected, and transmitted plane wave fields at a material interface.

2 For example, our convention is different than that used by E. Hecht, Optics, 3rd ed., Sect. 4.6.2
and reflected fields are parallel for the s-component but anti-parallel for the p-component.
Each eld has the form (2.8), and we have utilized the k-vectors (3.1) in the exponents of (3.2). Now we are ready to connect the elds on one side of the interface to the elds on the other side. This is done using boundary conditions. As explained in appendix 3.A, Maxwells equations require that the component of E that are parallel to the interface must be the same on either side of the boundary. In and y components are parallel to the interface, and our coordinate system, the x and y components z = 0 denes the interface. This means that at z = 0 the x of the combined incident and reected elds must equal the corresponding components of the transmitted eld:
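$$
\left[E_i^{(p)}\hat{\mathbf{y}}\cos\theta_i + \hat{\mathbf{x}}E_i^{(s)}\right]e^{i\left(k_i y\sin\theta_i - \omega_i t\right)} + \left[E_r^{(p)}\hat{\mathbf{y}}\cos\theta_r + \hat{\mathbf{x}}E_r^{(s)}\right]e^{i\left(k_r y\sin\theta_r - \omega_r t\right)} = \left[E_t^{(p)}\hat{\mathbf{y}}\cos\theta_t + \hat{\mathbf{x}}E_t^{(s)}\right]e^{i\left(k_t y\sin\theta_t - \omega_t t\right)}
\tag{3.3}
$$

Figure 3.2 Animation of s- and p-polarized fields incident on an interface as the angle of incidence is varied.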
Since this equation must hold for all conceivable values of $t$ and $y$, we are compelled to set all the phase factors in the complex exponentials equal to each other. The time portion of the phase factors requires the frequency of all waves to be the same:

$$\omega_i = \omega_r = \omega_t \equiv \omega \tag{3.4}$$

(We could have guessed that all frequencies would be the same; otherwise wave fronts would be annihilated or created at the interface.) Similarly, equating the spatial terms in the exponents of (3.3) requires

$$k_i\sin\theta_i = k_r\sin\theta_r = k_t\sin\theta_t \tag{3.5}$$

Now recall from (2.19) the relations $k_i = k_r = n_i\omega/c$ and $k_t = n_t\omega/c$. With these relations, (3.5) yields the law of reflection

$$\theta_r = \theta_i \tag{3.6}$$

as well as Snell's law

$$n_i\sin\theta_i = n_t\sin\theta_t \tag{3.7}$$

The three angles $\theta_i$, $\theta_r$, and $\theta_t$ are not independent. The reflected angle matches the incident angle, and the transmitted angle obeys Snell's law. The phenomenon of refraction refers to the fact that $\theta_i$ and $\theta_t$ are different. Because the exponents are all identical, (3.3) reduces to two relatively simple equations (one for each dimension, $\hat{\mathbf{x}}$ and $\hat{\mathbf{y}}$):

$$E_i^{(s)} + E_r^{(s)} = E_t^{(s)} \tag{3.8}$$

and

$$\left(E_i^{(p)} + E_r^{(p)}\right)\cos\theta_i = E_t^{(p)}\cos\theta_t \tag{3.9}$$

an improved method for measuring the circumference of the earth. He is most famous for his rediscovery of the law of refraction in 1621. (The law was known (in table form) to the ancient Greek mathematician Ptolemy, to Persian engineer Ibn Sahl (900s), and to Polish philosopher Witelo (1200s).) Snell authored several books, including one on trigonometry, published a year after his death. (Wikipedia)
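We have derived these equations from the boundary condition (3.52) on the parallel component of the electric field. This set of equations has four unknowns ($E_r^{(p)}$, $E_r^{(s)}$, $E_t^{(p)}$, and $E_t^{(s)}$), assuming that we pick the incident fields. We require two further equations to solve the system. These are obtained using the separate boundary condition on the parallel component of magnetic fields given in (3.56) (also discussed in Appendix 3.A). From Faraday's law (1.3), we have for a plane wave (see (2.56))

$$\mathbf{B} = \frac{\mathbf{k}\times\mathbf{E}}{\omega} = \frac{n}{c}\,\hat{\mathbf{u}}\times\mathbf{E} \tag{3.10}$$

where $\hat{\mathbf{u}} \equiv \mathbf{k}/k$ is a unit vector in the direction of $\mathbf{k}$. We have also utilized (2.19) for a real index. This expression is useful for writing $\mathbf{B}_i$, $\mathbf{B}_r$, and $\mathbf{B}_t$ in terms of the electric field components that we have already introduced. When injecting (3.1) and (3.2) into (3.10), the incident, reflected, and transmitted magnetic fields turn out to be

$$
\begin{aligned}
\mathbf{B}_i &= \frac{n_i}{c}\left[-\hat{\mathbf{x}}E_i^{(p)} + E_i^{(s)}\left(-\hat{\mathbf{z}}\sin\theta_i + \hat{\mathbf{y}}\cos\theta_i\right)\right]e^{i\left[k_i\left(y\sin\theta_i + z\cos\theta_i\right) - \omega_i t\right]}\\
\mathbf{B}_r &= \frac{n_r}{c}\left[\hat{\mathbf{x}}E_r^{(p)} + E_r^{(s)}\left(-\hat{\mathbf{z}}\sin\theta_r - \hat{\mathbf{y}}\cos\theta_r\right)\right]e^{i\left[k_r\left(y\sin\theta_r - z\cos\theta_r\right) - \omega_r t\right]}\\
\mathbf{B}_t &= \frac{n_t}{c}\left[-\hat{\mathbf{x}}E_t^{(p)} + E_t^{(s)}\left(-\hat{\mathbf{z}}\sin\theta_t + \hat{\mathbf{y}}\cos\theta_t\right)\right]e^{i\left[k_t\left(y\sin\theta_t + z\cos\theta_t\right) - \omega_t t\right]}
\end{aligned}
\tag{3.11}
$$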
Next, we apply the boundary condition (3.56), namely that the components of $\mathbf{B}$ parallel to the interface (i.e. in the $\hat{\mathbf{x}}$ and $\hat{\mathbf{y}}$ dimensions) are the same4 on either side of the plane $z = 0$. Since we already know that the exponents are all equal and that $\theta_r = \theta_i$ and $n_i = n_r$, the boundary condition gives

$$
\frac{n_i}{c}\left[-\hat{\mathbf{x}}E_i^{(p)} + E_i^{(s)}\hat{\mathbf{y}}\cos\theta_i\right] + \frac{n_i}{c}\left[\hat{\mathbf{x}}E_r^{(p)} - E_r^{(s)}\hat{\mathbf{y}}\cos\theta_i\right] = \frac{n_t}{c}\left[-\hat{\mathbf{x}}E_t^{(p)} + E_t^{(s)}\hat{\mathbf{y}}\cos\theta_t\right]
\tag{3.12}
$$

As before, (3.12) reduces to two relatively simple equations (one for the $\hat{\mathbf{x}}$ dimension and one for the $\hat{\mathbf{y}}$ dimension):

$$n_i\left(E_i^{(s)} - E_r^{(s)}\right)\cos\theta_i = n_t E_t^{(s)}\cos\theta_t \tag{3.13}$$

and

$$n_i\left(E_i^{(p)} - E_r^{(p)}\right) = n_t E_t^{(p)} \tag{3.14}$$

These two equations together with (3.8) and (3.9) allow us to solve for the reflected fields $\mathbf{E}_r$ and transmitted fields $\mathbf{E}_t$ for the s and p polarization components. However, (3.8), (3.9), (3.13), and (3.14) are not yet in their most convenient form.

4 We assume the permeability $\mu_0$ is the same everywhere; that is, there are no magnetic effects.
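Using Snell's law (3.7) to eliminate the indices from (3.13), we obtain

$$E_i^{(s)} - E_r^{(s)} = \frac{\sin\theta_i\cos\theta_t}{\sin\theta_t\cos\theta_i}E_t^{(s)} \tag{3.15}$$

If we add (3.15) to (3.8), we get

$$2E_i^{(s)} = \left[1 + \frac{\sin\theta_i\cos\theta_t}{\sin\theta_t\cos\theta_i}\right]E_t^{(s)} \tag{3.16}$$

and after dividing by $E_i^{(s)}$ and doing a little algebra, it turns into

$$\frac{E_t^{(s)}}{E_i^{(s)}} = \frac{2\sin\theta_t\cos\theta_i}{\sin\theta_t\cos\theta_i + \sin\theta_i\cos\theta_t}$$

To get the ratio of reflected to incident, we subtract (3.15) from (3.8) to obtain

$$2E_r^{(s)} = \left[1 - \frac{\sin\theta_i\cos\theta_t}{\sin\theta_t\cos\theta_i}\right]E_t^{(s)} \tag{3.17}$$

Dividing (3.17) by (3.16) then gives

$$\frac{E_r^{(s)}}{E_i^{(s)}} = \frac{\sin\theta_t\cos\theta_i - \sin\theta_i\cos\theta_t}{\sin\theta_t\cos\theta_i + \sin\theta_i\cos\theta_t}$$

print until years later. Fresnel made huge advances in the understanding of reflection, diffraction, polarization, and birefringence. In 1824 Fresnel wrote to Thomas Young, "All the compliments that I have received from Arago, Laplace and Biot never gave me so much pleasure as the discovery of a theoretic truth, or the confirmation of a calculation by experiment." Augustin Fresnel is a hero of one of the authors of this textbook. (Wikipedia)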
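The ratios of the reflected and transmitted field components to the incident field components are specified by the following coefficients, called the Fresnel coefficients:

$$r_s \equiv \frac{E_r^{(s)}}{E_i^{(s)}} = \frac{\sin\theta_t\cos\theta_i - \sin\theta_i\cos\theta_t}{\sin\theta_t\cos\theta_i + \sin\theta_i\cos\theta_t} = -\frac{\sin\left(\theta_i - \theta_t\right)}{\sin\left(\theta_i + \theta_t\right)} = \frac{n_i\cos\theta_i - n_t\cos\theta_t}{n_i\cos\theta_i + n_t\cos\theta_t} \tag{3.18}$$

$$t_s \equiv \frac{E_t^{(s)}}{E_i^{(s)}} = \frac{2\sin\theta_t\cos\theta_i}{\sin\theta_t\cos\theta_i + \sin\theta_i\cos\theta_t} = \frac{2\sin\theta_t\cos\theta_i}{\sin\left(\theta_i + \theta_t\right)} = \frac{2n_i\cos\theta_i}{n_i\cos\theta_i + n_t\cos\theta_t} \tag{3.19}$$

$$r_p \equiv \frac{E_r^{(p)}}{E_i^{(p)}} = \frac{\cos\theta_t\sin\theta_t - \cos\theta_i\sin\theta_i}{\cos\theta_t\sin\theta_t + \cos\theta_i\sin\theta_i} = -\frac{\tan\left(\theta_i - \theta_t\right)}{\tan\left(\theta_i + \theta_t\right)} = \frac{n_i\cos\theta_t - n_t\cos\theta_i}{n_i\cos\theta_t + n_t\cos\theta_i} \tag{3.20}$$

$$t_p \equiv \frac{E_t^{(p)}}{E_i^{(p)}} = \frac{2\cos\theta_i\sin\theta_t}{\cos\theta_t\sin\theta_t + \cos\theta_i\sin\theta_i} = \frac{2\cos\theta_i\sin\theta_t}{\sin\left(\theta_i + \theta_t\right)\cos\left(\theta_i - \theta_t\right)} = \frac{2n_i\cos\theta_i}{n_i\cos\theta_t + n_t\cos\theta_i} \tag{3.21}$$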
All of the above forms of the Fresnel coefficients are potentially useful, depending on the problem at hand. Remember that the angles in the coefficients are not independently chosen, but are subject to Snell's law (3.7). (The right-most expression for each coefficient is obtained from the first form using Snell's law.) The Fresnel coefficients pin down the electric field amplitudes on the two sides of the boundary. They also keep track of phase shifts at a boundary. In Fig. 3.3 we have plotted the Fresnel coefficients for the case of an air-glass interface. Notice that the reflection coefficients are sometimes negative in this plot, which corresponds to a phase shift of $\pi$ upon reflection (remember $e^{i\pi} = -1$). Later we will see that when absorbing materials are encountered, more complicated phase shifts can arise due to the complex index of refraction.

Figure 3.3 The Fresnel coefficients plotted versus $\theta_i$ for the case of an air-glass interface with $n_i = 1$ and $n_t = 1.5$.
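As a numerical companion to (3.18)-(3.21), here is a minimal Python sketch (not part of the original text) that evaluates the right-most, index-based form of each coefficient. The function name `fresnel_coefficients` and its argument names are illustrative choices. The sketch assumes real indices and uses `cmath.sqrt` so that $\cos\theta_t$ simply becomes imaginary when the incident angle exceeds the critical angle.

```python
import cmath
import math


def fresnel_coefficients(n_i, n_t, theta_i):
    """Return (r_s, t_s, r_p, t_p) from the index forms of (3.18)-(3.21).

    theta_i is the incident angle in radians; n_i and n_t are assumed real.
    """
    cos_i = math.cos(theta_i)
    sin_t = n_i * math.sin(theta_i) / n_t      # Snell's law (3.7)
    cos_t = cmath.sqrt(1 - sin_t ** 2)         # imaginary beyond the critical angle
    r_s = (n_i * cos_i - n_t * cos_t) / (n_i * cos_i + n_t * cos_t)
    t_s = 2 * n_i * cos_i / (n_i * cos_i + n_t * cos_t)
    r_p = (n_i * cos_t - n_t * cos_i) / (n_i * cos_t + n_t * cos_i)
    t_p = 2 * n_i * cos_i / (n_i * cos_t + n_t * cos_i)
    return r_s, t_s, r_p, t_p


# Air-to-glass interface at a few incident angles (degrees).
for degrees in (0, 30, 60, 80):
    coefficients = fresnel_coefficients(1.0, 1.5, math.radians(degrees))
    print(degrees, [round(c.real, 4) for c in coefficients])
```

At normal incidence both reflection coefficients come out negative for air to glass, which is the $\pi$ phase shift mentioned above. The boundary conditions (3.8) and (3.9) provide a quick consistency check: $1 + r_s = t_s$ and $(1 + r_p)\cos\theta_i = t_p\cos\theta_t$.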
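$$R_s \equiv \frac{I_r^{(s)}}{I_i^{(s)}} = \frac{\left|E_r^{(s)}\right|^2}{\left|E_i^{(s)}\right|^2} = \left|r_s\right|^2 \quad\text{and}\quad R_p \equiv \frac{I_r^{(p)}}{I_i^{(p)}} = \frac{\left|E_r^{(p)}\right|^2}{\left|E_i^{(p)}\right|^2} = \left|r_p\right|^2 \tag{3.22}$$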
These expressions are applied individually to each polarization component (s or p). The intensity reflected for each of these orthogonal polarizations is additive because the two electric fields are orthogonal and cannot interfere with each other. The total reflected intensity is therefore

$$I_r^{(\text{total})} = I_r^{(s)} + I_r^{(p)} = R_s I_i^{(s)} + R_p I_i^{(p)} \tag{3.23}$$

where the incident intensity is

$$I_i^{(\text{total})} = I_i^{(s)} + I_i^{(p)} = \frac{1}{2}n_i\epsilon_0 c\left[\left|E_i^{(s)}\right|^2 + \left|E_i^{(p)}\right|^2\right] \tag{3.24}$$
Since intensity is power per area, we can rewrite (3.23) in terms of incident and reflected power:

$$P_r^{(\text{total})} = P_r^{(s)} + P_r^{(p)} = R_s P_i^{(s)} + R_p P_i^{(p)} \tag{3.25}$$

Using this expression and requiring that energy be conserved (i.e. $P_i^{(\text{total})} = P_r^{(\text{total})} + P_t^{(\text{total})}$), we find that the portion of the power that transmits is

$$P_t^{(\text{total})} = \left(P_i^{(s)} + P_i^{(p)}\right) - \left(P_r^{(s)} + P_r^{(p)}\right) = \left(1 - R_s\right)P_i^{(s)} + \left(1 - R_p\right)P_i^{(p)} \tag{3.26}$$

From this expression we see that the transmittance (i.e. the fraction of the light that transmits) for either polarization is

$$T_s \equiv 1 - R_s \quad\text{and}\quad T_p \equiv 1 - R_p \tag{3.27}$$
Figure 3.4 shows typical reflectance and transmittance values for an air-glass interface. You might be surprised at first to learn that

$$T_s \neq \left|t_s\right|^2 \quad\text{and}\quad T_p \neq \left|t_p\right|^2 \tag{3.28}$$

Figure 3.4 The reflectance and transmittance plotted versus $\theta_i$ for the case of an air-glass interface with $n_i = 1$ and $n_t = 1.5$.
However, recall that the transmitted intensity (in terms of the transmitted fields) depends also on the refractive index. The Fresnel coefficients $t_s$ and $t_p$ relate the bare electric fields to each other, whereas the transmitted intensity (similar to (3.24)) is

$$I_t^{(\text{total})} = I_t^{(s)} + I_t^{(p)} = \frac{1}{2}n_t\epsilon_0 c\left[\left|E_t^{(s)}\right|^2 + \left|E_t^{(p)}\right|^2\right] \tag{3.29}$$

Therefore, we expect $T_s$ and $T_p$ to depend on the ratio of the refractive indices $n_t$ and $n_i$ as well as on the squares of $t_s$ and $t_p$. There is another more subtle reason for the inequalities in (3.28). Consider a lateral strip of light associated with a plane wave incident upon the material interface in Fig. 3.5. Upon refraction into the second medium, the strip is seen to change its width by the factor $\cos\theta_t/\cos\theta_i$. This is a geometrical effect, owing to the change in propagation direction at the interface. The change in direction alters the intensity (power per area) but not the power. In computing the transmittance, we must remove this geometrical effect from the ratio of the intensities, which leads to the following transmittance coefficients:

$$T_s = \frac{n_t\cos\theta_t}{n_i\cos\theta_i}\left|t_s\right|^2 \qquad T_p = \frac{n_t\cos\theta_t}{n_i\cos\theta_i}\left|t_p\right|^2 \tag{3.30}$$
Note that (3.30) is valid only if a real angle $\theta_t$ exists; it does not hold when the incident angle exceeds the critical angle for total internal reflection, discussed in section 3.5. In that situation, we must stick with (3.27).
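The statement that (3.30) is consistent with energy conservation can be checked numerically. The following is a minimal Python sketch, assuming real indices and an incident angle below any critical angle; the function name `reflectance_and_transmittance` is a hypothetical helper, not something defined in the text. It evaluates $R$ from (3.22) and $T$ from (3.30) independently and prints their sums.

```python
import math


def reflectance_and_transmittance(n_i, n_t, theta_i):
    """Return (R_s, T_s, R_p, T_p) from (3.22) and (3.30); theta_t must be real."""
    cos_i = math.cos(theta_i)
    sin_t = n_i * math.sin(theta_i) / n_t          # Snell's law (3.7)
    if abs(sin_t) > 1:
        raise ValueError("beyond the critical angle: use (3.27), not (3.30)")
    cos_t = math.sqrt(1 - sin_t ** 2)
    r_s = (n_i * cos_i - n_t * cos_t) / (n_i * cos_i + n_t * cos_t)
    t_s = 2 * n_i * cos_i / (n_i * cos_i + n_t * cos_t)
    r_p = (n_i * cos_t - n_t * cos_i) / (n_i * cos_t + n_t * cos_i)
    t_p = 2 * n_i * cos_i / (n_i * cos_t + n_t * cos_i)
    # Index ratio times the strip-width factor cos(theta_t)/cos(theta_i).
    geometry = n_t * cos_t / (n_i * cos_i)
    return r_s ** 2, geometry * t_s ** 2, r_p ** 2, geometry * t_p ** 2


for degrees in (0, 30, 60, 85):
    R_s, T_s, R_p, T_p = reflectance_and_transmittance(1.0, 1.5, math.radians(degrees))
    print(degrees, R_s + T_s, R_p + T_p)   # both sums should equal 1 to rounding error
```

Dropping the `geometry` factor (i.e. using $|t|^2$ alone) makes the sums differ from one, which is the content of (3.28).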
Example 3.2

Show analytically for p-polarized light that $R_p + T_p = 1$, where $R_p$ is given by (3.22) and $T_p$ is given by (3.30).

Solution: From (3.20) we have

$$R_p = \left|\frac{\cos\theta_t\sin\theta_t - \cos\theta_i\sin\theta_i}{\cos\theta_t\sin\theta_t + \cos\theta_i\sin\theta_i}\right|^2 = \frac{\cos^2\theta_t\sin^2\theta_t - 2\cos\theta_i\sin\theta_i\cos\theta_t\sin\theta_t + \cos^2\theta_i\sin^2\theta_i}{\left(\cos\theta_t\sin\theta_t + \cos\theta_i\sin\theta_i\right)^2}$$

From (3.21) and (3.30) we have

$$T_p = \frac{n_t\cos\theta_t}{n_i\cos\theta_i}\left[\frac{2\cos\theta_i\sin\theta_t}{\cos\theta_t\sin\theta_t + \cos\theta_i\sin\theta_i}\right]^2 = \frac{\sin\theta_i\cos\theta_t}{\sin\theta_t\cos\theta_i}\,\frac{4\cos^2\theta_i\sin^2\theta_t}{\left(\cos\theta_t\sin\theta_t + \cos\theta_i\sin\theta_i\right)^2} = \frac{4\cos\theta_i\sin\theta_t\sin\theta_i\cos\theta_t}{\left(\cos\theta_t\sin\theta_t + \cos\theta_i\sin\theta_i\right)^2}$$

Then

$$R_p + T_p = \frac{\cos^2\theta_t\sin^2\theta_t + 2\cos\theta_i\sin\theta_i\cos\theta_t\sin\theta_t + \cos^2\theta_i\sin^2\theta_i}{\left(\cos\theta_t\sin\theta_t + \cos\theta_i\sin\theta_i\right)^2} = \frac{\left(\cos\theta_t\sin\theta_t + \cos\theta_i\sin\theta_i\right)^2}{\left(\cos\theta_t\sin\theta_t + \cos\theta_i\sin\theta_i\right)^2} = 1$$

David Brewster (1781-1868, Scottish) was born in Jedburgh, Scotland. His father was a teacher and wanted David to become a clergyman. At age twelve, David went to the University of Edinburgh for that purpose, but his inclination for natural science soon became apparent. He became licensed to preach, but his interests in science distracted him from that profession, and he spent much of his time studying diffraction. Taking an empirical approach, Brewster independently discovered many of the same things usually credited to Fresnel. He even made a dioptric apparatus for lighthouses before Fresnel developed his. Brewster became somewhat famous in his day for the development of the kaleidoscope and stereoscope for enjoyment by the general public. Brewster was a prolific science writer and editor throughout his life. Among his works is an important biography of Isaac Newton. He was knighted for his accomplishments in 1831. (Wikipedia)
By inspecting Fig. 3.1, we see that this condition occurs when the reflected and transmitted wave vectors, $\mathbf{k}_r$ and $\mathbf{k}_t$, are perpendicular to each other. If we insert (3.31) into Snell's law (3.7), we can solve for the incident angle $\theta_i$ that gives rise to this special circumstance:

$$n_i\sin\theta_i = n_t\sin\left(\frac{\pi}{2} - \theta_i\right) = n_t\cos\theta_i \tag{3.32}$$

The angle that satisfies this equation, in terms of the refractive indices, is readily found to be

$$\theta_B = \tan^{-1}\frac{n_t}{n_i} \tag{3.33}$$

We have replaced the specific $\theta_i$ with $\theta_B$ in honor of Sir David Brewster, who first discovered the phenomenon. The angle $\theta_B$ is called Brewster's angle. At Brewster's angle, no p-polarized light reflects (see L 3.4). Physically, the p-polarized light cannot reflect because $\mathbf{k}_r$ and $\mathbf{k}_t$ are perpendicular. A reflection would require the microscopic dipoles at the surface of the second material to radiate along their axes, which they cannot do. Maxwell's equations "know" about this, and so everything is nicely consistent.
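To see (3.33) at work, the short Python sketch below computes Brewster's angle and confirms that $r_p$ from (3.20) vanishes there and that the reflected and transmitted directions are perpendicular. The function names and the sample index $n_t = 1.54$ are illustrative assumptions, and real indices are assumed throughout.

```python
import math


def brewster_angle(n_i, n_t):
    """Brewster's angle (3.33) in radians."""
    return math.atan2(n_t, n_i)


def r_p(n_i, n_t, theta_i):
    """Index form of the p-polarized reflection coefficient (3.20), real angles only."""
    cos_i = math.cos(theta_i)
    cos_t = math.sqrt(1 - (n_i * math.sin(theta_i) / n_t) ** 2)
    return (n_i * cos_t - n_t * cos_i) / (n_i * cos_t + n_t * cos_i)


theta_B = brewster_angle(1.0, 1.54)
theta_t = math.asin(math.sin(theta_B) / 1.54)        # Snell's law (3.7)
print(math.degrees(theta_B))                          # Brewster's angle in degrees
print(r_p(1.0, 1.54, theta_B))                        # zero to rounding error
print(math.degrees(theta_B + theta_t))                # 90 degrees: k_r is perpendicular to k_t
```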
Figure 3.6 The intensity radiation pattern of an oscillating dipole as a function of angle. Note that the dipole does not radiate along the axis of oscillation, giving rise to Brewster's angle for reflection.
$$\sin\theta_t = \frac{n_i}{n_t}\sin\theta_i \tag{3.36}$$

and

$$\cos\theta_t = i\sqrt{\frac{n_i^2}{n_t^2}\sin^2\theta_i - 1} \qquad \left(\theta_i > \theta_c\right) \tag{3.37}$$

(see P0.19). In this case, $\theta_t$ is a complex number. However, we do not assign geometrical significance to it in terms of any direction. Actually, we don't even need to know the value for $\theta_t$; we need only the values for $\sin\theta_t$ and $\cos\theta_t$, as specified in (3.36) and (3.37). Even though $\sin\theta_t$ is greater than one and $\cos\theta_t$ is imaginary, we can use their values to compute $r_s$, $r_p$, $t_s$, and $t_p$. (Complex notation is wonderful!)
5 M. Born and E. Wolf, Principles of Optics, 7th ed., Sect. 1.5.4 (Cambridge University Press, 1999).
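Upon substitution of (3.36) and (3.37) into the Fresnel reflection coefficients (3.18) and (3.20) we obtain

$$r_s = \frac{\dfrac{n_i}{n_t}\cos\theta_i - i\sqrt{\dfrac{n_i^2}{n_t^2}\sin^2\theta_i - 1}}{\dfrac{n_i}{n_t}\cos\theta_i + i\sqrt{\dfrac{n_i^2}{n_t^2}\sin^2\theta_i - 1}} \qquad \left(\theta_i > \theta_c\right) \tag{3.38}$$

and

$$r_p = -\frac{\cos\theta_i - i\dfrac{n_i}{n_t}\sqrt{\dfrac{n_i^2}{n_t^2}\sin^2\theta_i - 1}}{\cos\theta_i + i\dfrac{n_i}{n_t}\sqrt{\dfrac{n_i^2}{n_t^2}\sin^2\theta_i - 1}} \qquad \left(\theta_i > \theta_c\right) \tag{3.39}$$

These Fresnel coefficients can be manipulated (see P3.9) into the forms

$$r_s = \exp\left\{-2i\tan^{-1}\left[\frac{n_t}{n_i\cos\theta_i}\sqrt{\frac{n_i^2}{n_t^2}\sin^2\theta_i - 1}\right]\right\} \qquad \left(\theta_i > \theta_c\right) \tag{3.40}$$

and

$$r_p = -\exp\left\{-2i\tan^{-1}\left[\frac{n_i}{n_t\cos\theta_i}\sqrt{\frac{n_i^2}{n_t^2}\sin^2\theta_i - 1}\right]\right\} \qquad \left(\theta_i > \theta_c\right) \tag{3.41}$$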
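The equivalence between the index forms of the reflection coefficients and the exponential forms (3.40) and (3.41) can be spot-checked numerically. The Python sketch below is one way to do so, under the assumption of real indices with $n_i > n_t$ and an incident angle beyond the critical angle; the function name `tir_reflection` is an illustrative choice.

```python
import cmath
import math


def tir_reflection(n_i, n_t, theta_i):
    """Return (r_s, r_p, r_s_closed, r_p_closed) for theta_i beyond the critical angle.

    r_s and r_p use the index forms of (3.18) and (3.20) with cos(theta_t) from (3.37);
    the closed forms are (3.40) and (3.41).
    """
    cos_i = math.cos(theta_i)
    # math.sqrt raises ValueError below the critical angle, where (3.37) does not apply.
    root = math.sqrt((n_i / n_t) ** 2 * math.sin(theta_i) ** 2 - 1)
    cos_t = 1j * root                                            # (3.37)
    r_s = (n_i * cos_i - n_t * cos_t) / (n_i * cos_i + n_t * cos_t)
    r_p = (n_i * cos_t - n_t * cos_i) / (n_i * cos_t + n_t * cos_i)
    r_s_closed = cmath.exp(-2j * math.atan(n_t * root / (n_i * cos_i)))
    r_p_closed = -cmath.exp(-2j * math.atan(n_i * root / (n_t * cos_i)))
    return r_s, r_p, r_s_closed, r_p_closed


# Glass-to-air reflection at 60 degrees (critical angle is about 41.8 degrees).
r_s, r_p, r_s_closed, r_p_closed = tir_reflection(1.5, 1.0, math.radians(60))
print(abs(r_s), abs(r_p))                      # both magnitudes are 1
print(cmath.phase(r_s), cmath.phase(r_p))      # but the phases differ
```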
Figure 3.7 Animation of light waves incident on an interface both below and beyond the critical angle.

Figure 3.8 A wave experiencing total internal reflection creates an evanescent wave that propagates parallel to the interface. (The reflected wave is not shown.)
Each coefficient has a different phase (note $n_i/n_t$ vs. $n_t/n_i$ in the expressions), which means that the s- and p-polarized fields experience different phase shifts upon reflection. Nevertheless, we definitely have $|r_s| = 1$ and $|r_p| = 1$. We rightly conclude that 100% of the light reflects. The transmittance is zero as dictated by (3.27). We emphasize that one should not employ (3.29) or (3.30) in the case of total internal reflection, as the imaginary $\theta_t$ makes the geometric factor in this equation invalid. Even with zero transmittance, the boundary conditions from Maxwell's equations (as worked out in Appendix 3.A) require that the fields be non-zero on the transmitted side of the boundary, meaning $t_s \neq 0$ and $t_p \neq 0$. While this situation may seem like a contradiction at first, it is an accurate description of what actually happens. The coefficients $t_s$ and $t_p$ characterize evanescent waves that exist on the transmitted side of the interface. The evanescent wave travels parallel to the interface so that no energy is conveyed away from the interface deeper into the medium on the transmission side. To compute the explicit form of the evanescent wave,6 we plug (3.36) and (3.37) into the transmitted field (3.2):
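$$
\begin{aligned}
\mathbf{E}_t &= \left[E_t^{(p)}\left(\hat{\mathbf{y}}\cos\theta_t - \hat{\mathbf{z}}\sin\theta_t\right) + \hat{\mathbf{x}}E_t^{(s)}\right]e^{i\left[k_t\left(y\sin\theta_t + z\cos\theta_t\right) - \omega t\right]}\\
&= \left[t_p E_i^{(p)}\left(\hat{\mathbf{y}}\,i\sqrt{\frac{n_i^2}{n_t^2}\sin^2\theta_i - 1} - \hat{\mathbf{z}}\frac{n_i}{n_t}\sin\theta_i\right) + \hat{\mathbf{x}}\,t_s E_i^{(s)}\right]e^{-k_t z\sqrt{\frac{n_i^2}{n_t^2}\sin^2\theta_i - 1}}\;e^{i\left[k_t y\frac{n_i}{n_t}\sin\theta_i - \omega t\right]}
\end{aligned}
\tag{3.42}
$$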
6 G. R. Fowles, Introduction to Modern Optics, 2nd ed., Sect 2.9 (New York: Dover, 1975).
Figure 3.8 plots the evanescent wave described by (3.42) along with the associated incident wave. The phase of the evanescent wave indicates that it propagates parallel to the boundary (in the $y$-dimension). Its strength decays exponentially away from the boundary (in the $z$-dimension). We leave the calculation of $t_s$ and $t_p$ as an exercise (P3.10).
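The exponential decay in (3.42) sets a characteristic depth: the field amplitude falls to $1/e$ of its surface value at $z = 1/\left(k_t\sqrt{(n_i^2/n_t^2)\sin^2\theta_i - 1}\right)$, with $k_t = 2\pi n_t/\lambda_{\text{vac}}$. The Python sketch below evaluates this depth; the function name `evanescent_depth` and the water-to-air sample values are illustrative assumptions rather than values taken from the text.

```python
import math


def evanescent_depth(n_i, n_t, theta_i, lambda_vac):
    """Distance from the interface at which the evanescent amplitude in (3.42) drops to 1/e.

    theta_i is in radians and must exceed the critical angle; lambda_vac sets the length unit.
    """
    s = (n_i / n_t) * math.sin(theta_i)           # sin(theta_t) from (3.36), greater than 1
    if s <= 1:
        raise ValueError("no evanescent wave: theta_i does not exceed the critical angle")
    k_t = 2 * math.pi * n_t / lambda_vac          # wave number on the transmitted side
    return 1 / (k_t * math.sqrt(s ** 2 - 1))


# Water-to-air internal reflection at 60 degrees with 633 nm light.
depth = evanescent_depth(1.33, 1.0, math.radians(60), 633e-9)
print(depth)   # a fraction of a wavelength, in meters
```

The depth grows without bound as the incident angle approaches the critical angle from above, since the square root in the denominator goes to zero there.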
Figure 3.9 The reflectances (top) with associated phases (bottom) for silver, which has index $n = 0.13$ and $\kappa = 4.05$. Note the minimum of $R_p$ corresponding to a kind of Brewster's angle.
7 See M. Born and E. Wolf, Principles of Optics, 7th ed., Sect. 14.2 (Cambridge University Press, 1999).
These expressions are tedious to evaluate. When evaluating the expressions, it is usually desirable to put them into the form

$$r_s = \left|r_s\right|e^{i\phi_s} \tag{3.47}$$

and

$$r_p = \left|r_p\right|e^{i\phi_p} \tag{3.48}$$

We refrain from putting (3.45) and (3.46) into this form using the general expressions; we would get a big mess. It is a good idea to let your calculator or a computer do it after a specific value for $\mathcal{N} \equiv n + i\kappa$ is chosen. An important point to notice is that the phases upon reflection can be very different for s- and p-polarization components (i.e. $\phi_p$ and $\phi_s$ can be very different). This is true in general, even when the reflectivity is high (i.e. $|r_s|$ and $|r_p|$ on the order of unity). Brewster's angle exists also for surfaces with complex refractive index. However, in general the expressions (3.46) and (3.48) do not go to zero at any incident angle $\theta_i$. Rather, the reflection of p-polarized light can go through a minimum at some angle $\theta_i$, which we refer to as Brewster's angle (see Fig. 3.9). This minimum is best found numerically since the general expression for $r_p$ in terms of $n$ and $\kappa$ and as a function of $\theta_i$ can be unwieldy.
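Letting a computer do this is straightforward with complex arithmetic. The following Python sketch is one possible approach, not the authors' own procedure: it assumes the incident medium is vacuum ($n_i = 1$), applies the index forms of (3.18) and (3.20) with the complex index $\mathcal{N} = n + i\kappa$ in place of $n_t$, and takes the principal branch of the complex square root for $\cos\theta_t$. The function name `metal_reflection` and the 0.1 degree scan step are illustrative choices; the silver values are those quoted in Fig. 3.9.

```python
import cmath
import math


def metal_reflection(N, theta_i, n_i=1.0):
    """Return (r_s, r_p) for reflection from a medium with complex index N = n + i*kappa."""
    cos_i = math.cos(theta_i)
    cos_t = cmath.sqrt(1 - (n_i * math.sin(theta_i) / N) ** 2)   # complex "cosine"
    r_s = (n_i * cos_i - N * cos_t) / (n_i * cos_i + N * cos_t)
    r_p = (n_i * cos_t - N * cos_i) / (n_i * cos_t + N * cos_i)
    return r_s, r_p


N_silver = complex(0.13, 4.05)

# Scan incident angles from 0 to 89.9 degrees and locate the minimum of R_p.
angles = [0.1 * j for j in range(900)]
R_p_values = [abs(metal_reflection(N_silver, math.radians(a))[1]) ** 2 for a in angles]
R_p_min = min(R_p_values)
angle_at_min = angles[R_p_values.index(R_p_min)]

# Polar form (3.47) and (3.48) at that angle.
r_s, r_p = metal_reflection(N_silver, math.radians(angle_at_min))
print(angle_at_min, R_p_min)
print(abs(r_s), cmath.phase(r_s))
print(abs(r_p), cmath.phase(r_p))
```

A grid scan is crude but adequate for a shallow minimum such as this one; a finer step or a one-dimensional minimizer could be substituted if more precision is needed.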
Figure 3.10 Interface of two materials.
applied to the rectangular contour depicted in Fig. 3.10. We perform the path integration on the left-hand side around the loop as follows:

$$\oint \mathbf{E}\cdot d\boldsymbol{\ell} = E_{1\parallel}d - E_{1\perp}\ell_1 - E_{2\perp}\ell_2 - E_{2\parallel}d + E_{2\perp}\ell_2 + E_{1\perp}\ell_1 = \left(E_{1\parallel} - E_{2\parallel}\right)d \tag{3.50}$$

Here, $E_{1\parallel}$ refers to the component of the electric field in the material with index $n_1$ that is parallel to the interface. $E_{1\perp}$ refers to the component of the electric field in the material with index $n_1$ which is perpendicular to the interface. Similarly, $E_{2\parallel}$ and $E_{2\perp}$ are the parallel and perpendicular components of the electric field in the material with index $n_2$. We have assumed that the rectangle is small enough that the fields are uniform within the half rectangle on either side of the boundary. Next, we shrink the loop down until it has zero surface area by letting the lengths $\ell_1$ and $\ell_2$ go to zero. In this situation, the right-hand side of Faraday's law (3.49) goes to zero

$$-\frac{\partial}{\partial t}\int_S \mathbf{B}\cdot\hat{\mathbf{n}}\,da \rightarrow 0 \tag{3.51}$$
and we are left with

$$E_{1\parallel} = E_{2\parallel} \tag{3.52}$$

This simple relation is a general boundary condition, which is met at any material interface. The component of the electric field that lies in the plane of the interface must be the same on both sides of the interface. We now derive a similar boundary condition for the magnetic field using the integral form of Ampere's law:8

$$\oint_C \mathbf{B}\cdot d\boldsymbol{\ell} = \mu_0\int_S\left(\mathbf{J} + \epsilon_0\frac{\partial\mathbf{E}}{\partial t}\right)\cdot\hat{\mathbf{n}}\,da \tag{3.53}$$

As before, we are able to perform the path integration on the left-hand side for the geometry depicted in the figure, which gives

$$\oint \mathbf{B}\cdot d\boldsymbol{\ell} = B_{1\parallel}d - B_{1\perp}\ell_1 - B_{2\perp}\ell_2 - B_{2\parallel}d + B_{2\perp}\ell_2 + B_{1\perp}\ell_1 = \left(B_{1\parallel} - B_{2\parallel}\right)d \tag{3.54}$$

The notation for parallel and perpendicular components on either side of the interface is similar to that used in (3.50). Again, we can shrink the loop down until it has zero surface area by letting the lengths $\ell_1$ and $\ell_2$ go to zero. In this situation, the right-hand side of (3.53) goes to zero (ignoring the possibility of surface currents):

$$\mu_0\int_S\left(\mathbf{J} + \epsilon_0\frac{\partial\mathbf{E}}{\partial t}\right)\cdot\hat{\mathbf{n}}\,da \rightarrow 0 \tag{3.55}$$

and we are left with

$$B_{1\parallel} = B_{2\parallel} \tag{3.56}$$

This is a general boundary condition that must be satisfied at the material interface.
8 This form can be obtained from (1.4) by integration over the surface S in Fig. 3.10 and applying
Exercises

Exercises for 3.2 The Fresnel Coefficients

P3.1 Derive the Fresnel coefficients (3.20) and (3.21) for p-polarized light.

P3.2 Verify that each of the alternative forms given in (3.18)-(3.21) are equivalent (given Snell's law). Show that at normal incidence (i.e. $\theta_i = \theta_t = 0$) the Fresnel coefficients reduce to

$$\lim_{\theta_i\to 0} r_s = \lim_{\theta_i\to 0} r_p = -\frac{n_t - n_i}{n_t + n_i} \quad\text{and}\quad \lim_{\theta_i\to 0} t_s = \lim_{\theta_i\to 0} t_p = \frac{2n_i}{n_t + n_i}$$

P3.3 Undoubtedly the most important interface in optics is when air meets glass. Use a computer to make the following plots for this interface as a function of the incident angle. Use $n_i = 1$ for air and $n_t = 1.6$ for glass. Explicitly label Brewster's angle on all of the applicable graphs.
(a) $r_p$ and $t_p$ (plot together on same graph)
(b) $R_p$ and $T_p$ (plot together on same graph)
(c) $r_s$ and $t_s$ (plot together on same graph)
(d) $R_s$ and $T_s$ (plot together on same graph)
Exercises for 3.3 Reflectance and Transmittance

L3.4 (a) In the laboratory, measure the reflectance for both s and p polarized light from a flat glass surface at about ten points. You can normalize the detector by placing it in the incident beam of light before the glass surface. Especially watch for Brewster's angle (described in section 3.4). Figure 3.11 illustrates the experimental setup. (video)

Figure 3.11 Experimental setup for lab 3.4 (labels: Laser; Polarizer; High sensitivity detector; Slide detector with the beam).

(b) Use a computer to calculate the theoretical air-to-glass reflectance as a function of incident angle (i.e. plot $R_s$ and $R_p$ as a function of $\theta_i$). Take the index of refraction for glass to be $n_t = 1.54$ and the index for air to be one. Plot this theoretical calculation as a smooth line on a graph. Plot your experimental data from (a) as points on this same graph (not points connected by lines).

P3.5 A pentaprism is a five-sided reflecting prism used to deviate a beam of light by 90° without inverting an image (see Fig. 3.12). Pentaprisms are used in the viewfinders of SLR cameras.
(a) What prism angle is required for a normal-incidence beam from the left to exit the bottom surface at normal incidence?
(b) If all interfaces of the pentaprism are uncoated glass with index $n = 1.5$, what fraction of the intensity would get through this system for a normal incidence beam? Compute for p-polarized light, and include transmission through the first and final surfaces as well as reflection at the two interior surfaces.
NOTE: The transmission you calculate will be very poor. The reflecting surfaces on pentaprisms are usually treated with a high-reflection coating and the transmitting surfaces are treated with anti-reflection coatings.

P3.6 Show analytically for s-polarized light that $R_s + T_s = 1$, where $R_s$ is given by (3.22) and $T_s$ is given by (3.30).
Figure 3.12
Exercises for 3.4 Brewster's Angle

P3.7 Find Brewster's angle for glass $n = 1.5$.

Exercises for 3.5 Total Internal Reflection

P3.8 Diamonds have an index of refraction of $n = 2.42$, which allows total internal reflection to occur at relatively shallow angles of incidence. Gem cutters choose facet angles that ensure most of the light entering the top of the diamond will reflect back out to give the stone its expensive sparkle. One such cut, the "Eulitz Brilliant" cut, is shown in Fig. 3.13.
(a) What is the critical angle for diamond?
(b) One way to spot fake diamonds is by noticing reduced brilliance in the sparkle. What fraction of p-polarized light (intensity) would make it from point A to point B in the diagram for a diamond? If a piece of fused quartz ($n = 1.46$) was cut in the Eulitz Brilliant shape, what fraction of p-polarized light (intensity) would make it from point A to point B in the diagram?
(c) What is the phase shift due to reflection for s-polarized light at the first internal reflection depicted in the figure (incident angle 40.5°) in diamond? What is the phase shift in fused quartz?

P3.9 Derive (3.40) and (3.41) and show that $R_s = 1$ and $R_p = 1$. HINT: See problem P0.15.

P3.10 Compute $t_s$ and $t_p$ in the case of total internal reflection. Put your answer in polar form (i.e. $t = |t|e^{i\phi}$).

P3.11 Use a computer to plot the air-to-water transmittance as a function of incident angle (i.e. plot (3.27) as a function of $\theta_i$). Also plot the water-to-air transmittance on a separate graph. Plot both $T_s$ and $T_p$ on each graph. The index of refraction for water is $n = 1.33$. Take the index of air to be one.

P3.12 Light ($\lambda_{\text{vac}} = 500$ nm) reflects internally from a glass surface ($n = 1.5$) surrounded by air. The incident angle is $\theta_i = 45°$. An evanescent wave travels parallel to the surface on the air side. At what distance from the surface is the amplitude of the evanescent wave $1/e$ of its value at the surface?

Exercises for 3.6 Reflections from Metal

P3.13 Using a computer, plot $|r_s|$, $|r_p|$ versus $\theta_i$ for silver ($n = 0.13$ and $\kappa = 4.05$). Make a separate plot of the phases $\phi_s$ and $\phi_p$ from (3.47) and (3.48). Clearly label each plot, and comment on how the phase shifts are different from those experienced when reflecting from glass.9

P3.14 Find Brewster's angle for silver ($n = 0.13$ and $\kappa = 4.0$) by calculating $R_p$ and finding its minimum. You will want to use a computer program to do this (Matlab, Maple, Mathematica, etc.).

P3.15 The complex index for silver is given by $n = 0.13$ and $\kappa = 4.0$. Find $r_s$ and $r_p$ when reflecting from vacuum ($n = 1$, $\kappa = 0$) at $\theta_i = 80°$ and put them into the forms (3.47) and (3.48).

9 Are you surprised that the real part of the index can be less than one?
Chapter 4
if there are many mirrors in an optical system. Dielectric multilayer coatings also have the advantage of being more durable and less prone to damage from high-intensity lasers.
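middle region.1 As of yet, we do not know the amplitudes or phases of the net forward and net backward traveling plane waves in the middle layer. We denote them by $E_{1\rightarrow}^{(s)}$ and $E_{1\leftarrow}^{(s)}$ or by $E_{1\rightarrow}^{(p)}$ and $E_{1\leftarrow}^{(p)}$, separated into their s and p components as usual. Similarly, $E_{0\leftarrow}^{(s)}$ and $E_{0\leftarrow}^{(p)}$ as well as $E_{2\rightarrow}^{(s)}$ and $E_{2\rightarrow}^{(p)}$ are understood to include light that leaks through the boundaries from the middle region. Thus, we need only concern ourselves with the five plane waves depicted in Fig. 4.1. The various plane-wave fields are connected to each other at the boundaries via the single-boundary Fresnel coefficients (3.18)-(3.21). At the first surface we define

$$
\begin{aligned}
r_s^{0\rightarrow 1} &\equiv \frac{\sin\theta_1\cos\theta_0 - \sin\theta_0\cos\theta_1}{\sin\theta_1\cos\theta_0 + \sin\theta_0\cos\theta_1} &\qquad r_p^{0\rightarrow 1} &\equiv \frac{\cos\theta_1\sin\theta_1 - \cos\theta_0\sin\theta_0}{\cos\theta_1\sin\theta_1 + \cos\theta_0\sin\theta_0}\\
t_s^{0\rightarrow 1} &\equiv \frac{2\sin\theta_1\cos\theta_0}{\sin\theta_1\cos\theta_0 + \sin\theta_0\cos\theta_1} &\qquad t_p^{0\rightarrow 1} &\equiv \frac{2\cos\theta_0\sin\theta_1}{\cos\theta_1\sin\theta_1 + \cos\theta_0\sin\theta_0}
\end{aligned}
\tag{4.1}
$$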
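The notation $0\rightarrow 1$ indicates the first surface from the perspective of starting on the incident side and propagating towards the middle layer. The Fresnel coefficients for the backward traveling light approaching the first interface from within the middle layer are given by

$$
\begin{aligned}
r_s^{1\rightarrow 0} &= -r_s^{0\rightarrow 1} &\qquad r_p^{1\rightarrow 0} &= -r_p^{0\rightarrow 1}\\
t_s^{1\rightarrow 0} &\equiv \frac{2\sin\theta_0\cos\theta_1}{\sin\theta_0\cos\theta_1 + \sin\theta_1\cos\theta_0} &\qquad t_p^{1\rightarrow 0} &\equiv \frac{2\cos\theta_1\sin\theta_0}{\cos\theta_0\sin\theta_0 + \cos\theta_1\sin\theta_1}
\end{aligned}
\tag{4.2}
$$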
where $1\rightarrow 0$ again indicates connections at the first interface, but from the perspective of beginning inside the middle layer. Finally, the single-boundary coefficients for light approaching the second interface are

$$
\begin{aligned}
r_s^{1\rightarrow 2} &\equiv \frac{\sin\theta_2\cos\theta_1 - \sin\theta_1\cos\theta_2}{\sin\theta_2\cos\theta_1 + \sin\theta_1\cos\theta_2} &\qquad r_p^{1\rightarrow 2} &\equiv \frac{\cos\theta_2\sin\theta_2 - \cos\theta_1\sin\theta_1}{\cos\theta_2\sin\theta_2 + \cos\theta_1\sin\theta_1}\\
t_s^{1\rightarrow 2} &\equiv \frac{2\sin\theta_2\cos\theta_1}{\sin\theta_2\cos\theta_1 + \sin\theta_1\cos\theta_2} &\qquad t_p^{1\rightarrow 2} &\equiv \frac{2\cos\theta_1\sin\theta_2}{\cos\theta_2\sin\theta_2 + \cos\theta_1\sin\theta_1}
\end{aligned}
\tag{4.3}
$$

In a similar fashion, the notation $1\rightarrow 2$ indicates connections made at the second interface from the perspective of beginning in the middle layer. To solve for the connections between the five fields depicted in Fig. 4.1, we will need four equations for either s or p polarization (taking the incident field as a given). To simplify things, we will consider s-polarized light in the upcoming analysis. The equations for p-polarized light look exactly the same; just replace the subscript s with p. Through the remainder of this section and the next, we will continue to economize by writing the equations only for s-polarized light with the understanding that they apply equally well to p-polarized light.
1 The sum of parallel plane waves j j
The forward-traveling wave in the middle region arises from both a transmission of the incident wave and a reflection of the backward-traveling wave in the middle region at the first interface. Using the Fresnel coefficients, we can write $E_{1\rightarrow}^{(s)}$ as the sum of fields arising from $E_{0\rightarrow}^{(s)}$ and $E_{1\leftarrow}^{(s)}$ as follows:

$$E_{1\rightarrow}^{(s)} = t_s^{0\rightarrow 1}E_{0\rightarrow}^{(s)} + r_s^{1\rightarrow 0}E_{1\leftarrow}^{(s)} \tag{4.4}$$

The factors $t_s^{0\rightarrow 1}$ and $r_s^{1\rightarrow 0}$ are the single-boundary Fresnel coefficients selected appropriately from (4.1) and (4.2). Similarly, the overall reflected field $E_{0\leftarrow}^{(s)}$ is given by the reflection of the incident field and the transmission of the backward-traveling field in the middle region according to

$$E_{0\leftarrow}^{(s)} = r_s^{0\rightarrow 1}E_{0\rightarrow}^{(s)} + t_s^{1\rightarrow 0}E_{1\leftarrow}^{(s)} \tag{4.5}$$
Two connections done; two to go. Before we continue, we need to specify an origin so that we can calculate phase shifts associated with propagation in the middle region. Propagation was not an issue in the single-boundary problem studied back in chapter 3. However, in the double-boundary problem, the thickness of the middle region dictates phase variations that strongly influence the result. We take the origin to be located on the first interface, as shown in Fig. 4.1. Since all fields in (4.4) and (4.5) are evaluated at the origin $(y, z) = (0, 0)$, there are no phase factors needed. We will connect the plane-wave fields across the second interface at the point $\mathbf{r} = \hat{\mathbf{z}}d$. The appropriate phase-adjusted2 field at $(y, z) = (0, d)$ is $E_{1\rightarrow}^{(s)}e^{i\mathbf{k}_{1\rightarrow}\cdot\mathbf{r}} = E_{1\rightarrow}^{(s)}e^{ik_1 d\cos\theta_1}$, since $E_{1\rightarrow}^{(s)}$ is the field at the origin $(y, z) = (0, 0)$. The transmitted field in the final medium arises only from the forward-traveling field in the middle region, and at our selected point it is

$$E_{2\rightarrow}^{(s)} = t_s^{1\rightarrow 2}E_{1\rightarrow}^{(s)}e^{ik_1 d\cos\theta_1} \tag{4.6}$$

Note that $E_{2\rightarrow}^{(s)}$ stands for the transmitted field at the point $(y, z) = (0, d)$; its local phase can be built into its definition, so there is no need to write an explicit phase. The backward-traveling plane wave in the middle region arises from the reflection of the forward-traveling plane wave in that region:

$$E_{1\leftarrow}^{(s)}e^{-ik_1 d\cos\theta_1} = r_s^{1\rightarrow 2}E_{1\rightarrow}^{(s)}e^{ik_1 d\cos\theta_1} \tag{4.7}$$

Like before, $E_{1\leftarrow}^{(s)}$ is referenced to the origin $(y, z) = (0, 0)$. Therefore, the factor $e^{i\mathbf{k}_{1\leftarrow}\cdot\mathbf{r}} = e^{-ik_1 d\cos\theta_1}$ is needed at $(y, z) = (0, d)$. The relations (4.4)-(4.7) permit us to find overall transmission and reflection coefficients for the two-interface problem.
Example 4.1

Derive the transmission coefficient that connects the final transmitted field to the incident field for the double-interface problem according to $t_s^{\text{tot}} \equiv E_{2\rightarrow}^{(s)}/E_{0\rightarrow}^{(s)}$.
2 In the middle region, k
1
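Solution: From (4.6) we may write

$$E_{1\rightarrow}^{(s)} = \frac{E_{2\rightarrow}^{(s)}}{t_s^{1\rightarrow 2}}e^{-ik_1 d\cos\theta_1} \tag{4.8}$$

Substitution of this into (4.7) gives

$$E_{1\leftarrow}^{(s)} = \frac{r_s^{1\rightarrow 2}}{t_s^{1\rightarrow 2}}E_{2\rightarrow}^{(s)}e^{ik_1 d\cos\theta_1} \tag{4.9}$$

Next, substituting both (4.8) and (4.9) into (4.4) yields the connection we seek between the incident and transmitted fields:

$$\frac{E_{2\rightarrow}^{(s)}}{t_s^{1\rightarrow 2}}e^{-ik_1 d\cos\theta_1} = t_s^{0\rightarrow 1}E_{0\rightarrow}^{(s)} + r_s^{1\rightarrow 0}\frac{r_s^{1\rightarrow 2}}{t_s^{1\rightarrow 2}}E_{2\rightarrow}^{(s)}e^{ik_1 d\cos\theta_1} \tag{4.10}$$

After rearranging, we arrive at

$$t_s^{\text{tot}} \equiv \frac{E_{2\rightarrow}^{(s)}}{E_{0\rightarrow}^{(s)}} = \frac{t_s^{0\rightarrow 1}e^{ik_1 d\cos\theta_1}t_s^{1\rightarrow 2}}{1 - r_s^{1\rightarrow 0}r_s^{1\rightarrow 2}e^{2ik_1 d\cos\theta_1}} \tag{4.11}$$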
The coefficient $t_s^{\text{tot}}$ derived in Example 4.1 connects the amplitude and phase of the incident field to the amplitude and phase of the transmitted field in a manner similar to the single-boundary Fresnel coefficients. The numerator of (4.11) reminds us of the physics of the situation: the field transmits through the first interface, acquires a phase due to propagating through the middle layer, and transmits through the second interface. The denominator of (4.11) modifies the result to account for feedback from multiple reflections in the middle region.3 The overall reflection coefficient is found to be (see P4.1)

$$r_s^{\text{tot}} \equiv \frac{E_{0\leftarrow}^{(s)}}{E_{0\rightarrow}^{(s)}} = r_s^{0\rightarrow 1} + \frac{t_s^{0\rightarrow 1}e^{ik_1 d\cos\theta_1}r_s^{1\rightarrow 2}e^{ik_1 d\cos\theta_1}t_s^{1\rightarrow 0}}{1 - r_s^{1\rightarrow 0}r_s^{1\rightarrow 2}e^{i2k_1 d\cos\theta_1}} \tag{4.12}$$

Again the equation reminds us of the basic physics, and we did not completely simplify the expression to make this more apparent. There is an initial reflection from the first interface. That light is joined by light that transmits through the first interface (looking at only the numerator of the second term), propagates through the middle layer, reflects from the second interface, propagates back through the middle layer, and transmits back through the first interface. The denominator of the second term accounts for the effects of multiple-reflection feedback.
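The double-interface coefficients (4.11) and (4.12) are easy to evaluate numerically. The Python sketch below does so for s-polarized light, under stated assumptions: real indices, the index forms of the single-boundary coefficients (equivalent to (4.1)-(4.3) through Snell's law), and a real transmitted angle $\theta_2$ so that the transmittance can be written as in (3.30). The function names and sample layer values are illustrative only.

```python
import cmath
import math


def single_boundary_s(n_a, n_b, cos_a, cos_b):
    """Single-boundary s-polarized (r, t) going from medium a into medium b."""
    r = (n_a * cos_a - n_b * cos_b) / (n_a * cos_a + n_b * cos_b)
    t = 2 * n_a * cos_a / (n_a * cos_a + n_b * cos_b)
    return r, t


def double_interface_s(n0, n1, n2, theta0, d, lambda_vac):
    """Return (r_tot, t_tot, R, T) for a layer of thickness d, using (4.11) and (4.12)."""
    sin0, cos0 = math.sin(theta0), math.cos(theta0)
    cos1 = cmath.sqrt(1 - (n0 * sin0 / n1) ** 2)     # Snell's law in each region
    cos2 = cmath.sqrt(1 - (n0 * sin0 / n2) ** 2)
    r01, t01 = single_boundary_s(n0, n1, cos0, cos1)
    r10, t10 = single_boundary_s(n1, n0, cos1, cos0)
    r12, t12 = single_boundary_s(n1, n2, cos1, cos2)
    phase = cmath.exp(1j * 2 * math.pi * n1 * d * cos1 / lambda_vac)   # e^{i k1 d cos(theta1)}
    feedback = 1 - r10 * r12 * phase ** 2                               # multiple-reflection denominator
    t_tot = t01 * phase * t12 / feedback                                # (4.11)
    r_tot = r01 + t01 * phase * r12 * phase * t10 / feedback            # (4.12)
    R = abs(r_tot) ** 2
    T = (n2 * cos2.real) / (n0 * cos0) * abs(t_tot) ** 2                # valid for real theta2
    return r_tot, t_tot, R, T


# A 100 nm layer (n = 1.38) on glass (n = 1.5), light incident from air at 30 degrees.
r_tot, t_tot, R, T = double_interface_s(1.0, 1.38, 1.5, math.radians(30), 100e-9, 633e-9)
print(R, T, R + T)   # R + T equals 1 for lossless media
```

Setting `d = 0` collapses the layer, and `t_tot` then reduces to the single-boundary coefficient between the outer two media, which is a useful sanity check on the signs in (4.2).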
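a simpler form than the reflection coefficient (4.12), it will be easier to calculate the total transmittance $T_s^{\text{tot}}$ and obtain the reflectance, if desired, from the relationship

$$T_s^{\text{tot}} + R_s^{\text{tot}} = 1 \tag{4.13}$$

When the transmitted angle $\theta_2$ is real, we may write the fraction of the transmitted power as in (3.30) (p can be switched for s):

$$T_s^{\text{tot}} = \frac{n_2\cos\theta_2}{n_0\cos\theta_0}\left|t_s^{\text{tot}}\right|^2 = \frac{n_2\cos\theta_2}{n_0\cos\theta_0}\,\frac{\left|t_s^{0\rightarrow 1}\right|^2\left|t_s^{1\rightarrow 2}\right|^2}{\left|e^{-ik_1 d\cos\theta_1} - r_s^{1\rightarrow 0}r_s^{1\rightarrow 2}e^{ik_1 d\cos\theta_1}\right|^2} \qquad \left(\theta_2\ \text{real}\right) \tag{4.14}$$

(Before squaring, we multiplied the top and bottom of (4.11) by $e^{-ik_1 d\cos\theta_1}$ to make the denominator more symmetric for later convenience.) Equation (4.14) remains valid even if the angle $\theta_1$ is complex. Thus, it can be applied to the case of evanescent waves tunneling through a gap where $\theta_0$ lies beyond the critical angle for total internal reflection from the middle layer. This will be studied further in section 4.3. When there are no evanescent waves in any of the regions (i.e. $\theta_0$ and $\theta_1$ both do not exceed the critical angle) we can simplify (4.14) into the following useful form (see P4.3):4

$$T_s^{\text{tot}} = \frac{T_s^{\max}}{1 + F_s\sin^2\left(\Phi_s/2\right)} \qquad \left(\theta_1\ \text{and}\ \theta_2\ \text{real}\right) \tag{4.15}$$

where

$$T_s^{\max} \equiv \frac{T_s^{0\rightarrow 1}T_s^{1\rightarrow 2}}{\left(1 - \sqrt{R_s^{1\rightarrow 0}R_s^{1\rightarrow 2}}\right)^2} \tag{4.16}$$

$$\Phi_s \equiv 2k_1 d\cos\theta_1 + \delta_{r_s^{1\rightarrow 0}} + \delta_{r_s^{1\rightarrow 2}} \tag{4.17}$$

and

$$F_s \equiv \frac{4\sqrt{R_s^{1\rightarrow 0}R_s^{1\rightarrow 2}}}{\left(1 - \sqrt{R_s^{1\rightarrow 0}R_s^{1\rightarrow 2}}\right)^2} \tag{4.18}$$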
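The equivalence of (4.14) and (4.15) can be confirmed numerically before working through the algebra of P4.3. The Python sketch below is a minimal check for s-polarized light, assuming real indices and real angles in all three regions; the function names and the sample layer values are illustrative. The reflection phases are obtained with `cmath.phase`, which returns $0$ for a positive real coefficient and $\pi$ for a negative one.

```python
import cmath
import math


def single_boundary_s(n_a, n_b, cos_a, cos_b):
    """Single-boundary s-polarized (r, t) going from medium a into medium b."""
    r = (n_a * cos_a - n_b * cos_b) / (n_a * cos_a + n_b * cos_b)
    t = 2 * n_a * cos_a / (n_a * cos_a + n_b * cos_b)
    return r, t


def transmittance_two_ways(n0, n1, n2, theta0, d, lambda_vac):
    """Return (T_direct, T_airy): (4.14) evaluated directly, and the form (4.15)-(4.18)."""
    sin0, cos0 = math.sin(theta0), math.cos(theta0)
    cos1 = math.sqrt(1 - (n0 * sin0 / n1) ** 2)      # real angles assumed throughout
    cos2 = math.sqrt(1 - (n0 * sin0 / n2) ** 2)
    r01, t01 = single_boundary_s(n0, n1, cos0, cos1)
    r10, t10 = single_boundary_s(n1, n0, cos1, cos0)
    r12, t12 = single_boundary_s(n1, n2, cos1, cos2)
    k1d = 2 * math.pi * n1 * d * cos1 / lambda_vac   # k1 d cos(theta1)

    # Direct evaluation of (4.14) via (4.11).
    t_tot = t01 * cmath.exp(1j * k1d) * t12 / (1 - r10 * r12 * cmath.exp(2j * k1d))
    T_direct = n2 * cos2 / (n0 * cos0) * abs(t_tot) ** 2

    # Evaluation through (4.15)-(4.18).
    T01 = n1 * cos1 / (n0 * cos0) * t01 ** 2         # single-interface transmittances (3.30)
    T12 = n2 * cos2 / (n1 * cos1) * t12 ** 2
    root = abs(r10 * r12)                            # sqrt(R^{1->0} R^{1->2})
    T_max = T01 * T12 / (1 - root) ** 2              # (4.16)
    F = 4 * root / (1 - root) ** 2                   # (4.18)
    Phi = 2 * k1d + cmath.phase(r10) + cmath.phase(r12)   # (4.17)
    T_airy = T_max / (1 + F * math.sin(Phi / 2) ** 2)     # (4.15)
    return T_direct, T_airy


T_direct, T_airy = transmittance_two_ways(1.0, 2.32, 1.5, math.radians(45), 80e-9, 633e-9)
print(T_direct, T_airy)   # the two values agree to rounding error
```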
The quantity T smax is the maximum possible transmittance of power through the two surfaces. The single-interface transmittances (T s0 1 and T s1 2 ) and reectances 1 0 1 2 (R s and R s ) are calculated from the single-interface Fresnel coefcients in the usual way as described in chapter 3. The numerator of T smax represents the combined transmittances for the two interfaces without considering feedback due to multiple reections. The denominator enhances this value to account for reinforcing feedback in the middle layer. The phase delay experienced by the plane wave in the middle region is described by . The term 2k 1 d cos 1 represents the phase delay acquired during round-trip propagation in the middle region. The terms r s1 0 and r s1 2 account
4 M. Born and E. Wolf, Principles of Optics, 7th ed., Sect. 7.6.1 (Cambridge University Press, 1999).
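for possible phase shifts upon reflection from each interface. They are defined indirectly from the single-boundary Fresnel reflection coefficients:

$$r_s^{1\to0} = \left|r_s^{1\to0}\right| e^{i\delta_{r_s^{1\to0}}} \quad\text{and}\quad r_s^{1\to2} = \left|r_s^{1\to2}\right| e^{i\delta_{r_s^{1\to2}}} \qquad (4.19)$$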
If all the indices in the double-boundary system are real, then $\delta_{r_s^{1\to0}}$ and $\delta_{r_s^{1\to2}}$ can only be zero or $\pi$ (i.e. the coefficients can only be positive or negative real numbers). $F_s$ is called the coefficient of finesse (not to be confused with reflecting finesse defined in section 4.6), which determines how strongly the transmittance is influenced when $\Phi_s$ is varied (for example, through varying $d$ or the wavelength $\lambda_{\text{vac}}$).

Example 4.2
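Consider a beam splitter designed for $s$-polarized light incident on a substrate of glass ($n = 1.5$) at 45° as shown in Fig. 4.2. A thin coating of zinc sulfide ($n = 2.32$) is applied to the front of the glass to cause about half of the light to reflect. A magnesium fluoride ($n = 1.38$) coating is applied to the back surface of the glass to minimize reflections at that surface.$^5$ Each coating constitutes a separate double-interface problem. The front coating is deferred to problem P4.5. In this example, find the highest transmittance possible through the antireflection film at the back of the beam splitter and the smallest possible $d_2$ that accomplishes this for light with wavelength $\lambda_{\text{vac}} = 633$ nm.

[Figure 4.2 labels: Anti-reflection coating; Glass; 46%; 54%]

Solution: For the back coating, we have $n_0 = 1.5$, $n_1 = 1.38$, and $n_2 = 1$. We can find $\theta_0$ and $\theta_1$ from $\theta_2 = 45°$ using Snell's law:

$$n_1\sin\theta_1 = \sin\theta_2 \;\Rightarrow\; \theta_1 = \sin^{-1}\left(\frac{\sin 45°}{1.38}\right) = 30.82°$$

$$n_0\sin\theta_0 = \sin\theta_2 \;\Rightarrow\; \theta_0 = \sin^{-1}\left(\frac{\sin 45°}{1.5}\right) = 28.13°$$

Next we calculate the single-boundary Fresnel coefficients:

$$r_s^{1\to2} = -\frac{\sin(\theta_1 - \theta_2)}{\sin(\theta_1 + \theta_2)} = -\frac{\sin(30.82° - 45°)}{\sin(30.82° + 45°)} = 0.253$$

$$r_s^{1\to0} = -\frac{\sin(\theta_1 - \theta_0)}{\sin(\theta_1 + \theta_0)} = -\frac{\sin(30.82° - 28.13°)}{\sin(30.82° + 28.13°)} = -0.0549$$

These coefficients give the phase shifts on reflection:

$$\delta_{r_s^{1\to0}} = \pi, \qquad \delta_{r_s^{1\to2}} = 0$$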
5 We ignore possible feedback between the front and rear coatings. Since the antireflection films are usually imperfect, beam splitter substrates are often slightly wedged so that unwanted reflections from the second surface travel in a different direction.
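From these coefficients we obtain the single-interface reflectances and transmittances:

$$R_s^{1\to0} = \left|r_s^{1\to0}\right|^2 = 0.0030, \qquad T_s^{0\to1} = T_s^{1\to0} = 1 - R_s^{1\to0} = 1 - 0.0030 = 0.997$$

$$R_s^{1\to2} = \left|r_s^{1\to2}\right|^2 = 0.0640, \qquad T_s^{1\to2} = 1 - R_s^{1\to2} = 1 - 0.0640 = 0.936$$

Then (4.18) and (4.16) give

$$F_s = \frac{4\sqrt{R_s^{1\to0}R_s^{1\to2}}}{\left(1 - \sqrt{R_s^{1\to0}R_s^{1\to2}}\right)^2} = 0.0570$$

$$T_s^{\max} = \frac{T_s^{0\to1}\,T_s^{1\to2}}{\left(1 - \sqrt{R_s^{1\to0}R_s^{1\to2}}\right)^2} = 0.960$$

The maximum transmittance occurs when the sine is zero. In that case, $T_s^{\text{tot}} = 0.960$, meaning that 96% of the light is transmitted. We find the thickness by setting the argument of the sine to $\pi$, that is, $\Phi_s = 2\pi$:

$$2k_1 d_2\cos\theta_1 + \pi = 2\pi$$

Since $k_1 = 2\pi n_1/\lambda_{\text{vac}}$, we have

$$d_2 = \frac{\lambda_{\text{vac}}}{4n_1\cos\theta_1} = \frac{633\ \text{nm}}{4(1.38)\cos 30.82°} = 134\ \text{nm}$$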
Without the coating (i.e. $d_2 = 0$), the transmittance through the back surface would be 0.908, so the coating does give an improvement.
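The numbers in Example 4.2 can be checked numerically. The following is a minimal Python sketch, assuming NumPy is available, real indices, and angles below the critical angle; the function names `fresnel_rs` and `two_interface_T_s` are illustrative choices, not notation from the text. It evaluates (4.15)-(4.18) for the back coating.

```python
import numpy as np

def fresnel_rs(theta_i, theta_t):
    """Single-boundary s-polarized reflection coefficient (real angles only)."""
    return -np.sin(theta_i - theta_t) / np.sin(theta_i + theta_t)

def two_interface_T_s(n0, n1, n2, theta2, d, lam_vac):
    """Return (T_tot, T_max, F, theta1) using (4.15)-(4.18); real indices assumed."""
    theta1 = np.arcsin(n2 * np.sin(theta2) / n1)   # Snell's law
    theta0 = np.arcsin(n2 * np.sin(theta2) / n0)
    r10 = fresnel_rs(theta1, theta0)
    r12 = fresnel_rs(theta1, theta2)
    R10, R12 = r10**2, r12**2
    T01, T12 = 1 - R10, 1 - R12                    # lossless interfaces
    root = np.sqrt(R10 * R12)
    T_max = T01 * T12 / (1 - root)**2              # (4.16)
    F = 4 * root / (1 - root)**2                   # (4.18)
    k1 = 2 * np.pi * n1 / lam_vac
    # np.angle gives 0 for a positive and pi for a negative real coefficient
    Phi = 2 * k1 * d * np.cos(theta1) + np.angle(r10) + np.angle(r12)  # (4.17)
    T_tot = T_max / (1 + F * np.sin(Phi / 2)**2)   # (4.15)
    return T_tot, T_max, F, theta1

n0, n1, n2, lam = 1.5, 1.38, 1.0, 633e-9
theta2 = np.radians(45)
T_bare, T_max, F, theta1 = two_interface_T_s(n0, n1, n2, theta2, 0.0, lam)
d_opt = lam / (4 * n1 * np.cos(theta1))            # thickness giving Phi = 2*pi
T_coated = two_interface_T_s(n0, n1, n2, theta2, d_opt, lam)[0]
print(T_max, F, T_bare, d_opt * 1e9, T_coated)
```

Setting `d = 0` reproduces the uncoated surface, which is a convenient sanity check on the phase bookkeeping in (4.17).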
transmitted wave propagating at an angle $\theta_2$. This behavior is sometimes referred to as tunneling. We do not need to deal directly with the complex angle $\theta_1$. Rather, we just need $\sin\theta_1$ and $\cos\theta_1$ in order to calculate the single-boundary Fresnel coefficients. From Snell's law we have

$$\sin\theta_1 = \frac{n_0}{n_1}\sin\theta_0 = \frac{n_2}{n_1}\sin\theta_2 \qquad (4.20)$$

and for the middle layer we write

$$\cos\theta_1 = i\sqrt{\sin^2\theta_1 - 1} \qquad (4.21)$$

Note that beyond the critical angle, $\sin\theta_1$ is greater than one. We illustrate how to apply (4.14) via a specific example:
Example 4.3
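Calculate the transmittance of $p$-polarized light through the region between two closely spaced 45° right prisms, as shown in Fig. 4.4, as a function of $\lambda_{\text{vac}}$ and the prism spacing $d$. Take the index of refraction of the prisms to be $n = 1.5$ surrounded by index $n = 1$, and use $\theta_0 = \theta_2 = 45°$. Neglect possible reflections from the exterior surfaces of the prisms.

Figure 4.4 Frustrated total internal reflection in two prisms.

Solution: From (4.20) and (4.21) we have

$$\sin\theta_1 = 1.5\sin 45° = 1.061 \quad\text{and}\quad \cos\theta_1 = i\sqrt{1.061^2 - 1} = i\,0.3536$$

We must compute various expressions involving Fresnel coefficients that appear in (4.14):

$$\left|t_p^{0\to1}\right|^2 = \left|\frac{2\cos\theta_0\sin\theta_1}{\cos\theta_1\sin\theta_1 + \cos\theta_0\sin\theta_0}\right|^2 = \left|\frac{2\left(\tfrac{1}{\sqrt2}\right)(1.061)}{(i\,0.3536)(1.061) + \left(\tfrac{1}{\sqrt2}\right)\left(\tfrac{1}{\sqrt2}\right)}\right|^2 = 5.76$$

$$\left|t_p^{1\to2}\right|^2 = \left|\frac{2\cos\theta_1\sin\theta_2}{\cos\theta_2\sin\theta_2 + \cos\theta_1\sin\theta_1}\right|^2 = \left|\frac{2(i\,0.3536)\left(\tfrac{1}{\sqrt2}\right)}{\left(\tfrac{1}{\sqrt2}\right)\left(\tfrac{1}{\sqrt2}\right) + (i\,0.3536)(1.061)}\right|^2 = 0.640$$

$$r_p^{1\to0} = -\frac{\cos\theta_1\sin\theta_1 - \cos\theta_0\sin\theta_0}{\cos\theta_1\sin\theta_1 + \cos\theta_0\sin\theta_0} = -\frac{(i\,0.3536)(1.061) - \left(\tfrac{1}{\sqrt2}\right)\left(\tfrac{1}{\sqrt2}\right)}{(i\,0.3536)(1.061) + \left(\tfrac{1}{\sqrt2}\right)\left(\tfrac{1}{\sqrt2}\right)} = e^{-i\,1.287}$$

For the last step in the $r_p^{1\to0}$ calculation, see problem P0.15. Also note that $r_p^{1\to2} = r_p^{1\to0} = -r_p^{0\to1}$ since $n_0 = n_2$. We also need

$$k_1 d\cos\theta_1 = \frac{2\pi}{\lambda_{\text{vac}}}\,d\,(i\,0.3536) = i\,2.22\,\frac{d}{\lambda_{\text{vac}}}$$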
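We are now ready to compute the total transmittance (4.14). The prefactor $n_2\cos\theta_2/(n_0\cos\theta_0)$ equals one since $\theta_0 = \theta_2$ and $n_0 = n_2$, and we have

$$\begin{aligned}
T_p^{\text{tot}} &= \frac{\left|t_p^{0\to1}\right|^2\left|t_p^{1\to2}\right|^2}{\left|e^{-ik_1 d\cos\theta_1} - r_p^{1\to0}\,r_p^{1\to2}\,e^{ik_1 d\cos\theta_1}\right|^2} \\
&= \frac{(5.76)(0.640)}{\left|e^{2.22\,d/\lambda_{\text{vac}}} - e^{-i\,1.287}e^{-i\,1.287}e^{-2.22\,d/\lambda_{\text{vac}}}\right|^2} \\
&= \frac{3.69}{\left(e^{2.22\,d/\lambda_{\text{vac}}} - e^{-2.22\,d/\lambda_{\text{vac}} - i\,2.574}\right)\left(e^{2.22\,d/\lambda_{\text{vac}}} - e^{-2.22\,d/\lambda_{\text{vac}} + i\,2.574}\right)} \\
&= \frac{3.69}{e^{4.44\,d/\lambda_{\text{vac}}} + e^{-4.44\,d/\lambda_{\text{vac}}} - 2\cos(2.574)} \\
&= \frac{3.69}{e^{4.44\,d/\lambda_{\text{vac}}} + e^{-4.44\,d/\lambda_{\text{vac}}} + 1.69}
\end{aligned} \qquad (4.22)$$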
Figure 4.5 shows a plot of the transmittance (4.22) calculated in Example 4.3. Notice that the transmittance is 100% when the two prisms are brought together, as expected ($T_p^{\text{tot}}(d/\lambda_{\text{vac}} = 0) = 1$). When the prisms are about a wavelength apart, the transmittance is significantly reduced, and as the distance gets large compared to a wavelength, the transmittance quickly goes to zero ($T_p^{\text{tot}}(d/\lambda_{\text{vac}} \gg 1) \to 0$).
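The behavior described for Example 4.3 can be reproduced directly from (4.14), (4.20), and (4.21) without rounding the intermediate Fresnel coefficients. The sketch below is one way to do this in Python, assuming NumPy; the function name `ftir_transmittance_p` and its default arguments are illustrative, and it assumes identical prisms with incidence beyond the critical angle.

```python
import numpy as np

def ftir_transmittance_p(d_over_lam, n_prism=1.5, n_gap=1.0, theta0=np.pi / 4):
    """p-polarized transmittance through a gap between identical prisms.

    d_over_lam is the gap width divided by the vacuum wavelength.
    Assumes theta0 exceeds the critical angle, so cos(theta1) is imaginary.
    """
    s0, c0 = np.sin(theta0), np.cos(theta0)
    s1 = n_prism * s0 / n_gap                   # (4.20)
    c1 = 1j * np.sqrt(s1**2 - 1)                # (4.21)
    s2, c2 = s0, c0                             # identical prisms: theta2 = theta0
    t01 = 2 * c0 * s1 / (c1 * s1 + c0 * s0)
    t12 = 2 * c1 * s2 / (c2 * s2 + c1 * s1)
    r10 = (c0 * s0 - c1 * s1) / (c0 * s0 + c1 * s1)
    r12 = (c2 * s2 - c1 * s1) / (c2 * s2 + c1 * s1)
    phase = 2 * np.pi * n_gap * np.asarray(d_over_lam) * c1   # k1 d cos(theta1)
    denom = np.exp(-1j * phase) - r10 * r12 * np.exp(1j * phase)
    # Prefactor n2 cos(theta2) / (n0 cos(theta0)) is one for identical prisms
    return np.abs(t01 * t12)**2 / np.abs(denom)**2

x = np.array([0.0, 0.5, 1.0, 2.0])
print(ftir_transmittance_p(x))
```

Evaluating at zero separation returns unity, and the values fall off roughly exponentially with $d/\lambda_{\text{vac}}$, in line with the closed form (4.22).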
4.4 Fabry-Perot
In the 1890s, Charles Fabry realized that a double interface could be used to distinguish wavelengths of light that are very close together. He and a talented experimentalist colleague, Alfred Perot, constructed an instrument and began to use it to make measurements on various spectral sources. The Fabry-Perot instrument$^6$ consists of two identical (parallel) surfaces separated by spacing $d$. We can use our analysis in section 4.2 to describe this instrument. For simplicity, we choose the refractive index before the initial surface and after the final surface to be the same (i.e. $n_0 = n_2$). We assume that the transmission angles are such that total internal reflection is avoided. The transmission through the device depends on the exact spacing between the two surfaces, the reflectivity of the surfaces, as well as on the wavelength of the light. If the spacing $d$ separating the two parallel surfaces is adjustable (scanned), the instrument is called a Fabry-Perot interferometer. If the spacing is fixed while the angle of the incident light is varied, the instrument is called a Fabry-Perot etalon. An etalon can therefore be as simple as a piece of glass with parallel
6 M. Born and E. Wolf, Principles of Optics, 7th ed., Sect. 7.6.2 (Cambridge University Press, 1999).
surfaces. Sometimes, a thin optical membrane called a pellicle is used as an etalon (occasionally inserted into laser cavities to discriminate against certain wavelengths). However, to achieve sharp discrimination between closely-spaced wavelengths, a large spacing $d$ is desirable. As we previously derived (4.15), the transmittance through a double boundary is

$$T^{\text{tot}} = \frac{T^{\max}}{1 + F\sin^2\left(\Phi/2\right)} \qquad (4.23)$$

In the case of identical interfaces, the transmittance and reflectance coefficients are the same at each surface (i.e. $T = T^{0\to1} = T^{1\to2}$ and $R = R^{1\to0} = R^{1\to2}$). In this case, the maximum transmittance and the finesse coefficient simplify to

$$T^{\max} = \frac{T^2}{(1 - R)^2} \qquad (4.24)$$

and

$$F = \frac{4R}{(1 - R)^2} \qquad (4.25)$$

Jean-Baptiste Alfred Perot (1863-1925, French) was born in Metz, France. He attended the Ecole Polytechnique and then the University of Paris, where he earned a doctorate in 1888. He became a professor in Marseille in 1894, where he began his collaboration with Fabry. Perot contributed his considerable talent of instrument fabrication to the endeavor. Perot spent much of his later career making precision astronomical and solar measurements. See J. F. Mulligan, "Who were Fabry and Perot?," Am. J. Phys. 66, 797-802 (1998).

In principle, these equations should be evaluated for either $s$- or $p$-polarized light. However, a Fabry-Perot interferometer or etalon is usually operated near normal incidence so that there is little difference between the two polarizations. When using a Fabry-Perot instrument, one observes the transmittance $T^{\text{tot}}$ as the parameter $\Phi$ is varied. The parameter $\Phi$ can be varied by altering $d$, $\theta_1$, or $\lambda_{\text{vac}}$ as prescribed by

$$\Phi = \frac{4\pi n_1 d}{\lambda_{\text{vac}}}\cos\theta_1 + \delta_r \qquad (4.26)$$

To increase the sensitivity of the instrument, it is desirable to have the transmittance $T^{\text{tot}}$ vary strongly when $\Phi$ is varied. By inspection of (4.23), we see that $T^{\text{tot}}$ varies most strongly if the finesse coefficient $F$ is large. We achieve a large finesse coefficient by increasing the reflectance $R$. The basic setup of a Fabry-Perot instrument is shown in Fig. 4.6. In order to achieve a relatively high reflectivity $R$ (and therefore large $F$), special coatings can be applied to the surfaces, for example, a thin layer of silver to achieve a partial reflection, say 90%. Typically, two glass substrates are separated by distance $d$, with the coated surfaces facing each other as shown in the figure. The substrates are aligned so that the interior surfaces are parallel to each other. It is typical for each substrate to be slightly wedge-shaped so that unwanted reflections from the outer surfaces do not interfere with the double boundary situation between the two plates. Technically, each coating constitutes its own double-boundary problem (or multiple-boundary as the case may be). We can ignore this detail and simply think of the overall setup as a single two-interface problem. Regardless of the details of the coatings, we can say that each coating has a certain reflectance $R$ and transmittance $T$. However, as light goes through a coating, it can also be attenuated because of absorption. In this case, we have

$$R + T + A = 1 \qquad (4.27)$$
Figure 4.6 Typical Fabry-Perot setup (labels: incident light, Ag coatings). If the spacing $d$ is variable, it is called an interferometer; otherwise, it is called an etalon.
where $A$ represents the amount of light absorbed at a coating. The attenuation $A$ reduces the amount of light that makes it through the instrument, but it does not impact the nature of the interferences within the instrument. The total transmittance $T^{\text{tot}}$ (4.23) through an ideal Fabry-Perot instrument is depicted in Fig. 4.7 as a function of $\Phi$. The various curves correspond to different values of $F$. Typical values of $\Phi$ can be extremely large. For example, suppose that the instrument is used at near-normal incidence (i.e. $\cos\theta_1 \cong 1$) with a wavelength of $\lambda_{\text{vac}} = 500$ nm and an interface separation of $d_0 = 1$ cm. From (4.26) the value of $\Phi$ (ignoring the constant phase terms $\delta_r$) is approximately

$$\Phi_0 = \frac{4\pi(1\ \text{cm})}{500\ \text{nm}} = 80{,}000\pi$$
Figure 4.7 Transmittance as the phase $\Phi$ is varied. The different curves correspond to different values of the finesse coefficient. $\Phi_0$ represents a large multiple of $2\pi$.
As we vary $d$, $\lambda_{\text{vac}}$, or $\theta_1$ by small amounts, we can easily cause $\Phi$ to change by $2\pi$ as depicted in Fig. 4.7. The figure shows small changes in $\Phi$ above a value $\Phi_0$, which represents a large multiple of $2\pi$. The reflection phase $\delta_r$ in (4.26) depends on the exact nature of the coatings in the Fabry-Perot instrument. However, we do not need to know the value of $\delta_r$ (which depends on both the complex index of the coating material and its thickness). Whatever the value of $\delta_r$, we only care that it is constant. Experimentally, we can always compensate for $\delta_r$ by tweaking the spacing $d$, whose exact value is likely not controlled in the first place. Note that the required tweak on the spacing need only be a fraction of a wavelength, which is typically tiny compared to the overall spacing $d$.
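To get a feel for the sizes involved, the phase (4.26) and the transmittance (4.23) can be evaluated numerically. This is a minimal Python sketch assuming NumPy; the function names `round_trip_phase` and `airy_transmittance` are illustrative, and `delta_r` defaults to zero because only its constancy matters.

```python
import numpy as np

def round_trip_phase(d, lam_vac, n1=1.0, theta1=0.0, delta_r=0.0):
    """Phase Phi of (4.26) for spacing d and vacuum wavelength lam_vac (same units)."""
    return 4 * np.pi * n1 * d * np.cos(theta1) / lam_vac + delta_r

def airy_transmittance(phi, F, T_max=1.0):
    """Total transmittance (4.23) for finesse coefficient F."""
    return T_max / (1.0 + F * np.sin(phi / 2.0)**2)

phi0 = round_trip_phase(d=1e-2, lam_vac=500e-9)
print(phi0 / np.pi)                       # number of half-cycles in Phi_0

# One full period of Phi above Phi_0, for several finesse coefficients
dphi = np.linspace(0.0, 2 * np.pi, 9)
for F in (1, 10, 100):
    print(F, np.round(airy_transmittance(phi0 + dphi, F), 3))
```

Larger values of `F` leave the peak height unchanged while pushing the transmittance between peaks toward zero, which is the sharpening shown in Fig. 4.7.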
[Figure labels: Collimated Light; Angle Adjustment; Interferometer; Aperture; Oscilloscope (Trig, Sig)]

Figure 4.9 Transmittance as the separation $d$ is varied ($F = 100$). $d_0$ represents a large distance for which $\Phi$ is a multiple of $2\pi$.
plate separation. After all, to see fringes, we just need to cause $\Phi$ in (4.23) to vary in some way. According to (4.26), we can do that as easily by varying $\theta_1$ as we can by varying $d$. One way to obtain a range of angles is to observe light from a point source, as depicted in Fig. 4.10. Different portions of the beam go through the device at different angles. When aligned straight on, the transmitted light forms a bull's-eye pattern on a screen. In Fig. 4.11 we graph the transmittance $T^{\text{tot}}$ (4.23) as a function of angle (holding $\lambda_{\text{vac}} = 500$ nm and $d = 1$ cm fixed). Since $\cos\theta_1$ is not a linear function, the spacing of the peaks varies with angle. As $\theta_1$ increases from zero, the cosine steadily decreases, causing $\Phi$ to decrease. Each time $\Phi$ decreases by $2\pi$ we get a new peak. Not surprisingly, only a modest change in angle is necessary to cause the transmittance to vary from maximum to minimum, or vice versa. The bull's-eye pattern in Fig. 4.10 can be understood as the curve in Fig. 4.11 rotated about a circle. Depending on the exact spacing between the plates, the radii (or angles) where the fringes occur can be different. For example, the center spot could be dark.

Spectroscopic samples often are not compact point-like sources. Rather, they are extended diffuse sources. The point-source setup shown in Fig. 4.10 won't work for extended sources unless all of the light at the sample is blocked except for a tiny point. This is impractical if there remains insufficient illumination at the final screen for observation. In order to preserve as much light as possible, we can sandwich the etalon between two lenses. We place the diffuse source at the focal plane of the first lens. We place the screen at the focal plane of the second lens. This causes an image of the source to appear on the screen.$^7$ Each point of the diffuse source is mapped to a corresponding point on the screen. Moreover, the light associated with any particular point of the source travels as a unique collimated beam in the region between the lenses. Each collimated beam traverses the etalon with a unique angle. Thus, light associated with each emission point traverses the etalon with higher or lower transmittance, according to the differing angles. The result is that a bull's-eye pattern becomes superimposed on the image of the diffuse source. The lens and retina of your eye can be used for the final lens and screen.
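The angular fringe pattern can be sketched numerically from (4.23) and (4.26). The Python example below assumes NumPy, ignores the constant phase $\delta_r$, and assumes the spacing gives a bright fringe at $\theta_1 = 0$; the function names are illustrative. Setting the decrease in $\Phi$ equal to $2\pi m$ gives $\cos\theta_1 = 1 - m\lambda_{\text{vac}}/(2n_1 d)$ for the $m$th bright ring.

```python
import numpy as np

def etalon_transmittance(theta1, d, lam_vac, F, n1=1.0, T_max=1.0):
    """Transmittance (4.23) versus internal angle theta1, with delta_r ignored."""
    phi = 4 * np.pi * n1 * d * np.cos(theta1) / lam_vac
    return T_max / (1 + F * np.sin(phi / 2)**2)

def bright_ring_angle(m, d, lam_vac, n1=1.0):
    """Angle of the m-th bright ring, assuming a bright fringe at theta1 = 0."""
    return np.arccos(1 - m * lam_vac / (2 * n1 * d))

d, lam, F = 1e-2, 500e-9, 10
ring_angles = np.array([bright_ring_angle(m, d, lam) for m in range(4)])
print(np.degrees(ring_angles))                       # ring positions in degrees
print(etalon_transmittance(ring_angles, d, lam, F))  # each should be a peak
```

The ring angles grow like $\sqrt{m}$ for small angles, so successive rings crowd together, consistent with the uneven peak spacing noted above.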
Figure 4.10 A diverging monochromatic beam traversing a Fabry-Perot etalon (labels: point source, etalon, angle adjustment, screen). (The angle of divergence is exaggerated.)

Figure 4.11 Transmittance through a Fabry-Perot etalon ($F = 10$) as the angle $\theta_1$ is varied. It is assumed that the distance $d$ is chosen such that $\Phi$ is a multiple of $2\pi$ when the angle is zero.

[Figure labels: Diffuse Source; Lens; Etalon; Lens; Screen]
Consider a Fabry-Perot interferometer where the transmittance through the instrument is plotted as a function of surface separation $d$. Let the spacing $d_0$ correspond to the case when $\Phi$ is a multiple of $2\pi$ for the wavelength $\lambda_{\text{vac}} = \lambda_0$. Next, suppose we adjust the wavelength of the light from $\lambda_{\text{vac}} = \lambda_0$ to $\lambda_{\text{vac}} = \lambda_0 + \Delta\lambda$ while observing the transmittance. As we do this, the value of $\Phi$ changes. Figure 4.13 shows what happens as we scan the spacing $d$ of the interferometer in the neighborhood of $d_0$. The dashed line corresponds to a different wavelength. As the wavelength changes, the plate separation at which a particular fringe occurs also changes. We now derive the connection between a change in wavelength and the amount that $\Phi$ changes, which gives rise to the fringe shift seen in Fig. 4.13. At the wavelength $\lambda_0$, we have

$$\Phi_0 = \frac{4\pi n_1 d_0\cos\theta_1}{\lambda_0} + \delta_r \qquad (4.28)$$

which we previously supposed is an integer times $2\pi$. At a new wavelength (all else remaining the same) we have

$$\Phi = \frac{4\pi n_1 d_0\cos\theta_1}{\lambda_0 + \Delta\lambda} + \delta_r \qquad (4.29)$$
Figure 4.13 Transmittance as the spacing $d$ is varied for two different wavelengths ($F = 100$). The solid line plots the transmittance of light with a wavelength of $\lambda_0$, and the dashed line plots the transmittance of a wavelength shorter than $\lambda_0$. Note that the fringes shift positions for different wavelengths.
The change in wavelength $\Delta\lambda$ is usually very small compared to $\lambda_0$, so we can represent the denominator with the first two terms of a Taylor-series expansion:

$$\frac{1}{\lambda_0 + \Delta\lambda} = \frac{1}{\lambda_0\left(1 + \Delta\lambda/\lambda_0\right)} \cong \frac{1 - \Delta\lambda/\lambda_0}{\lambda_0} \qquad (4.30)$$

Then the difference between $\Phi_0$ and $\Phi$ can be rewritten as

$$\Delta\Phi \equiv \Phi_0 - \Phi = \frac{4\pi n_1 d_0\cos\theta_1}{\lambda_0^2}\,\Delta\lambda \qquad (4.31)$$

If the change in wavelength is enough to cause $\Delta\Phi = 2\pi$, the fringes in Fig. 4.13 shift through a whole period, and the picture looks the same. This brings up an important limitation of the instrument. If the fringes shift by too much, we might become confused as to what exactly has changed, owing to the periodic nature of the fringes. If two wavelengths aren't sufficiently close, the fringes of one wavelength may be shifted past several fringes of the other wavelength, and we will not be able to tell by how much they differ. This introduces the concept of free spectral range, which is the wavelength change $\Delta\lambda_{\text{FSR}}$ that causes the fringes to shift through one period. We find this by setting (4.31) equal to $2\pi$. After rearranging, we get

$$\Delta\lambda_{\text{FSR}} = \frac{\lambda_{\text{vac}}^2}{2n_1 d_0\cos\theta_1} \qquad (4.32)$$
The free spectral range tends to be extremely narrow; a Fabry-Perot instrument is not well suited for measuring wavelength ranges wider than this. In summary, the
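free spectral range is the largest change in wavelength permissible while avoiding confusion. To convert this wavelength difference $\Delta\lambda_{\text{FSR}}$ into a corresponding frequency difference, one differentiates $\nu = c/\lambda_{\text{vac}}$ to get

$$\left|\Delta\nu_{\text{FSR}}\right| = \frac{c\,\Delta\lambda_{\text{FSR}}}{\lambda_{\text{vac}}^2} \qquad (4.33)$$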
Example 4.4
A Fabry-Perot interferometer has plate spacing $d_0 = 1$ cm and index $n_1 = 1$. If it is used in the neighborhood of $\lambda_{\text{vac}} = 500$ nm, find the free spectral range of the instrument.

Solution: From (4.32), the free spectral range is

$$\Delta\lambda_{\text{FSR}} = \frac{\lambda_{\text{vac}}^2}{2n_1 d_0\cos\theta_1} = \frac{(500\ \text{nm})^2}{2(1)(1\ \text{cm})\cos 0°} = 0.0125\ \text{nm}$$
This means that we should not use the instrument to distinguish wavelengths that are separated by more than this small amount.
We next consider the smallest change in wavelength that can be noticed, or resolved, with a Fabry-Perot instrument. For example, if two very nearby wavelengths are sent through the instrument simultaneously, we can distinguish them only if the separation between their corresponding fringe peaks is at least as large as the width of individual peaks. This situation of two barely resolvable fringe peaks is illustrated in Fig. 4.14 for a diverging beam traversing an etalon. We will look for the wavelength change that causes a peak to shift by its own width. We define the width of a peak by its full width at half maximum (FWHM). Again, let $\Phi_0$ be a multiple of $2\pi$ so that a peak in transmittance occurs when $\Phi = \Phi_0$. In this case, we have from (4.23) that

$$T^{\text{tot}} = \frac{T^{\max}}{1 + F\sin^2\left(\dfrac{\Phi_0}{2}\right)} = T^{\max} \qquad (4.34)$$

since $\sin(\Phi_0/2) = 0$. If $\Phi$ varies from $\Phi_0$ to $\Phi_0 \pm \Phi_{\text{FWHM}}/2$, then, by definition, the transmittance drops to one half. Therefore, we may write

$$T^{\text{tot}} = \frac{T^{\max}}{1 + F\sin^2\left(\dfrac{\Phi_0 \pm \Phi_{\text{FWHM}}/2}{2}\right)} = \frac{T^{\max}}{2} \qquad (4.35)$$

In solving (4.35) for $\Phi_{\text{FWHM}}$, we see that this equation requires

$$F\sin^2\left(\frac{\Phi_{\text{FWHM}}}{4}\right) = 1 \qquad (4.36)$$

Figure 4.14 Transmittance of a diverging beam through a Fabry-Perot etalon. Two nearby wavelengths are sent through the instrument simultaneously, (top) barely resolved and (bottom) easily resolved.
where we have taken advantage of the fact that $\Phi_0$ is a multiple of $2\pi$. Next, we suppose that $\Phi_{\text{FWHM}}$ is rather small so that we may represent the sine by its argument. This approximation is okay if the finesse coefficient $F$ is rather large (say, 100). With this approximation, (4.36) simplifies to

$$\Phi_{\text{FWHM}} \cong \frac{4}{\sqrt{F}} \qquad (4.37)$$

The ratio of the period between peaks, $2\pi$, to the width $\Phi_{\text{FWHM}}$ of individual peaks is called the reflecting finesse (or just finesse):

$$f \equiv \frac{2\pi}{\Phi_{\text{FWHM}}} = \frac{\pi\sqrt{F}}{2} \qquad (4.38)$$

This parameter is often used to characterize the performance of a Fabry-Perot instrument. Note that a higher finesse $f$ implies sharper fringes in comparison to the fringe spacing. The free spectral range $\Delta\lambda_{\text{FSR}}$ compared to the minimum resolvable wavelength change $\Delta\lambda_{\text{FWHM}}$ is the same ratio $f$. Therefore, we have

$$\Delta\lambda_{\text{FWHM}} = \frac{\Delta\lambda_{\text{FSR}}}{f} = \frac{\lambda_{\text{vac}}^2}{\pi n_1 d_0\cos\theta_1\sqrt{F}} \qquad (4.39)$$

As a final note, the ratio of $\lambda_0$ to $\Delta\lambda_{\min}$, where $\Delta\lambda_{\min}$ is the minimum change of wavelength that the instrument can distinguish in the neighborhood of $\lambda_0$, is called the resolving power. For a Fabry-Perot instrument it is

$$RP \equiv \frac{\lambda_0}{\Delta\lambda_{\text{FWHM}}} \qquad (4.40)$$
Fabry-Perot instruments tend to have very high resolving powers since they respond to very small differences in wavelength.
Example 4.5
If the Fabry-Perot interferometer in Example 4.4 has reflectivity $R = 0.85$, find the finesse, the minimum distinguishable wavelength separation, and the resolving power.

Solution: From (4.25), the finesse coefficient is

$$F = \frac{4R}{(1 - R)^2} = \frac{4(0.85)}{(1 - 0.85)^2} = 151$$

and by (4.38) the finesse is

$$f = \frac{\pi\sqrt{F}}{2} = \frac{\pi\sqrt{151}}{2} = 19.3$$

The minimum resolvable wavelength change is then

$$\Delta\lambda_{\text{FWHM}} = \frac{\Delta\lambda_{\text{FSR}}}{f} = \frac{0.0125\ \text{nm}}{19} = 0.00065\ \text{nm} \qquad (4.41)$$

The instrument can distinguish two wavelengths separated by this tiny amount, which gives an impressive resolving power of

$$RP = \frac{\lambda_{\text{vac}}}{\Delta\lambda_{\text{FWHM}}} = \frac{500\ \text{nm}}{0.00065\ \text{nm}} = 772{,}000$$

For comparison, the resolving power of a typical grating spectrometer is much less (a few thousand). However, a grating spectrometer has the advantage that it can simultaneously observe wavelengths over hundreds of nanometers, whereas the Fabry-Perot instrument is confined to the extremely narrow free spectral range.
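The chain of quantities in Examples 4.4 and 4.5 is easy to script. The following Python sketch uses only the standard library; the function name `fabry_perot_figures` and the dictionary keys are illustrative choices. It evaluates (4.32), (4.25), (4.38), (4.39), and (4.40) in turn, with lengths in meters.

```python
import math

def fabry_perot_figures(lam_vac, d0, R, n1=1.0, theta1=0.0):
    """Free spectral range, finesse, fringe width, and resolving power."""
    fsr = lam_vac**2 / (2 * n1 * d0 * math.cos(theta1))   # (4.32)
    F = 4 * R / (1 - R)**2                                # (4.25)
    finesse = math.pi * math.sqrt(F) / 2                  # (4.38)
    fwhm = fsr / finesse                                  # (4.39)
    resolving_power = lam_vac / fwhm                      # (4.40)
    return {"fsr": fsr, "F": F, "finesse": finesse,
            "fwhm": fwhm, "resolving_power": resolving_power}

result = fabry_perot_figures(lam_vac=500e-9, d0=1e-2, R=0.85)
for name, value in result.items():
    print(f"{name}: {value:.4g}")
```

Because the script carries full precision through each step, its fringe width and resolving power differ slightly in the last digit from values obtained with the rounded finesse of 19.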
In each layer, only two plane waves exist, each of which is composed of light arising from the many possible bounces from various layer interfaces. The arrows pointing right indicate plane wave fields in individual layers that travel roughly in the forward (incident) direction, and the arrows pointing left indicate plane wave fields that travel roughly in the backward (reflected) direction. In the final region, there is only one plane wave traveling with a forward direction ($E^{(p)}_{\rightarrow N+1}$), which gives the overall transmitted field. As we have studied in chapter 3 (see (3.9) and (3.13)), the boundary conditions for the parallel components of the $\mathbf{E}$ field and for the parallel components of the $\mathbf{B}$ field lead respectively to

$$\cos\theta_0\left(E^{(p)}_{\rightarrow 0} + E^{(p)}_{\leftarrow 0}\right) = \cos\theta_1\left(E^{(p)}_{\rightarrow 1} + E^{(p)}_{\leftarrow 1}\right) \qquad (4.43)$$

and

$$n_0\left(E^{(p)}_{\rightarrow 0} - E^{(p)}_{\leftarrow 0}\right) = n_1\left(E^{(p)}_{\rightarrow 1} - E^{(p)}_{\leftarrow 1}\right) \qquad (4.44)$$
Similar equations give the field connection for $s$-polarized light (see (3.8) and (3.14)). We have applied these boundary conditions at the first interface only. Of course there are many more interfaces in the multilayer. For the connection between the $j$th layer and the next, we may similarly write

$$\cos\theta_j\left(E^{(p)}_{\rightarrow j}\,e^{ik_j\ell_j\cos\theta_j} + E^{(p)}_{\leftarrow j}\,e^{-ik_j\ell_j\cos\theta_j}\right) = \cos\theta_{j+1}\left(E^{(p)}_{\rightarrow j+1} + E^{(p)}_{\leftarrow j+1}\right) \qquad (4.45)$$

and

$$n_j\left(E^{(p)}_{\rightarrow j}\,e^{ik_j\ell_j\cos\theta_j} - E^{(p)}_{\leftarrow j}\,e^{-ik_j\ell_j\cos\theta_j}\right) = n_{j+1}\left(E^{(p)}_{\rightarrow j+1} - E^{(p)}_{\leftarrow j+1}\right) \qquad (4.46)$$

Here we have set the origin within each layer at the left surface. Then when making the connection with the subsequent layer at the right surface, we must specifically take into account the phase $\mathbf{k}_j\cdot\ell_j\hat{\mathbf{z}} = k_j\ell_j\cos\theta_j$. This corresponds to the phase acquired by the plane wave field in traversing the layer with thickness $\ell_j$. The right-hand sides of (4.45) and (4.46) need no phase adjustment since the $(j+1)$th field is evaluated on the left side of its layer.
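At the final interface ($j = N$), the connection takes the form

$$\cos\theta_N\left(E^{(p)}_{\rightarrow N}\,e^{ik_N\ell_N\cos\theta_N} + E^{(p)}_{\leftarrow N}\,e^{-ik_N\ell_N\cos\theta_N}\right) = \cos\theta_{N+1}\,E^{(p)}_{\rightarrow N+1} \qquad (4.47)$$

and

$$n_N\left(E^{(p)}_{\rightarrow N}\,e^{ik_N\ell_N\cos\theta_N} - E^{(p)}_{\leftarrow N}\,e^{-ik_N\ell_N\cos\theta_N}\right) = n_{N+1}\,E^{(p)}_{\rightarrow N+1} \qquad (4.48)$$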
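since there is no backward-traveling field in the final medium. At this point we are ready to solve (4.43)-(4.48). We would like to eliminate all fields besides $E^{(p)}_{\rightarrow 0}$, $E^{(p)}_{\leftarrow 0}$, and $E^{(p)}_{\rightarrow N+1}$. Then we will be able to find the overall reflectance and transmittance of the multilayer coating. In solving (4.43)-(4.48), we must proceed with care, or the algebra can quickly get out of hand. Fortunately, you have probably had training in linear algebra, and this is a case where that training pays off. We first write a general matrix equation that summarizes the mathematics in (4.43)-(4.48), as follows:

$$\begin{bmatrix} \cos\theta_j\,e^{i\beta_j} & \cos\theta_j\,e^{-i\beta_j} \\ n_j\,e^{i\beta_j} & -n_j\,e^{-i\beta_j} \end{bmatrix}\begin{bmatrix} E^{(p)}_{\rightarrow j} \\ E^{(p)}_{\leftarrow j} \end{bmatrix} = \begin{bmatrix} \cos\theta_{j+1} & \cos\theta_{j+1} \\ n_{j+1} & -n_{j+1} \end{bmatrix}\begin{bmatrix} E^{(p)}_{\rightarrow j+1} \\ E^{(p)}_{\leftarrow j+1} \end{bmatrix} \qquad (4.49)$$

where

$$\beta_j \equiv \begin{cases} 0 & j = 0 \\ k_j\ell_j\cos\theta_j & 1 \le j \le N \end{cases} \qquad (4.50)$$

and

$$E^{(p)}_{\leftarrow N+1} \equiv 0 \qquad (4.51)$$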
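(It would be good to take a moment to convince yourself that this set of matrix equations properly represents (4.43)-(4.48) before proceeding.) We rewrite (4.49) as

$$\begin{bmatrix} E^{(p)}_{\rightarrow j} \\ E^{(p)}_{\leftarrow j} \end{bmatrix} = \begin{bmatrix} \cos\theta_j\,e^{i\beta_j} & \cos\theta_j\,e^{-i\beta_j} \\ n_j\,e^{i\beta_j} & -n_j\,e^{-i\beta_j} \end{bmatrix}^{-1}\begin{bmatrix} \cos\theta_{j+1} & \cos\theta_{j+1} \\ n_{j+1} & -n_{j+1} \end{bmatrix}\begin{bmatrix} E^{(p)}_{\rightarrow j+1} \\ E^{(p)}_{\leftarrow j+1} \end{bmatrix} \qquad (4.52)$$

Keep in mind that (4.52) represents a distinct matrix equation for each different $j$. We can substitute the $j = 1$ equation into the $j = 0$ equation to get

$$\begin{bmatrix} E^{(p)}_{\rightarrow 0} \\ E^{(p)}_{\leftarrow 0} \end{bmatrix} = \begin{bmatrix} \cos\theta_0 & \cos\theta_0 \\ n_0 & -n_0 \end{bmatrix}^{-1} M_1^{(p)}\begin{bmatrix} \cos\theta_2 & \cos\theta_2 \\ n_2 & -n_2 \end{bmatrix}\begin{bmatrix} E^{(p)}_{\rightarrow 2} \\ E^{(p)}_{\leftarrow 2} \end{bmatrix} \qquad (4.53)$$

where we have grouped the matrices related to the $j = 1$ layer together via

$$M_1^{(p)} \equiv \begin{bmatrix} \cos\theta_1 & \cos\theta_1 \\ n_1 & -n_1 \end{bmatrix}\begin{bmatrix} \cos\theta_1\,e^{i\beta_1} & \cos\theta_1\,e^{-i\beta_1} \\ n_1\,e^{i\beta_1} & -n_1\,e^{-i\beta_1} \end{bmatrix}^{-1} \qquad (4.54)$$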
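We can continue to substitute into this equation progressively higher order equations (i.e. for $j = 2$, $j = 3$, ...) until we reach the $j = N$ layer. All together this will give

$$\begin{bmatrix} E^{(p)}_{\rightarrow 0} \\ E^{(p)}_{\leftarrow 0} \end{bmatrix} = \begin{bmatrix} \cos\theta_0 & \cos\theta_0 \\ n_0 & -n_0 \end{bmatrix}^{-1}\left(\prod_{j=1}^{N} M_j^{(p)}\right)\begin{bmatrix} \cos\theta_{N+1} & \cos\theta_{N+1} \\ n_{N+1} & -n_{N+1} \end{bmatrix}\begin{bmatrix} E^{(p)}_{\rightarrow N+1} \\ 0 \end{bmatrix} \qquad (4.55)$$

where the matrices related to the $j$th layer are grouped together according to

$$\begin{aligned} M_j^{(p)} &\equiv \begin{bmatrix} \cos\theta_j & \cos\theta_j \\ n_j & -n_j \end{bmatrix}\begin{bmatrix} \cos\theta_j\,e^{i\beta_j} & \cos\theta_j\,e^{-i\beta_j} \\ n_j\,e^{i\beta_j} & -n_j\,e^{-i\beta_j} \end{bmatrix}^{-1} \\ &= \begin{bmatrix} \cos\beta_j & -i\sin\beta_j\cos\theta_j/n_j \\ -i\,n_j\sin\beta_j/\cos\theta_j & \cos\beta_j \end{bmatrix} \end{aligned} \qquad (4.56)$$

The matrix inversion in the first line was performed using (0.35). The symbol $\prod$ signifies the product of the matrices with the lowest subscripts on the left:

$$\prod_{j=1}^{N} M_j^{(p)} \equiv M_1^{(p)} M_2^{(p)} \cdots M_N^{(p)} \qquad (4.57)$$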
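As a finishing touch, we divide (4.55) by the incident field $E^{(p)}_{\rightarrow 0}$ as well as perform the matrix inversion on the right-hand side to obtain

$$\begin{bmatrix} 1 \\ E^{(p)}_{\leftarrow 0}/E^{(p)}_{\rightarrow 0} \end{bmatrix} = A^{(p)}\begin{bmatrix} E^{(p)}_{\rightarrow N+1}/E^{(p)}_{\rightarrow 0} \\ 0 \end{bmatrix} \qquad (4.58)$$

where

$$A^{(p)} \equiv \begin{bmatrix} a_{11}^{(p)} & a_{12}^{(p)} \\ a_{21}^{(p)} & a_{22}^{(p)} \end{bmatrix} = \frac{1}{2n_0\cos\theta_0}\begin{bmatrix} n_0 & \cos\theta_0 \\ n_0 & -\cos\theta_0 \end{bmatrix}\left(\prod_{j=1}^{N} M_j^{(p)}\right)\begin{bmatrix} \cos\theta_{N+1} & 0 \\ n_{N+1} & 0 \end{bmatrix} \qquad (4.59)$$

In the final matrix in (4.59) we have replaced the entries in the right column with zeros. This is permissible since it operates on a column vector with zero in the bottom component. Equation (4.58) represents two equations, which must be solved simultaneously to find the ratios $E^{(p)}_{\leftarrow 0}/E^{(p)}_{\rightarrow 0}$ and $E^{(p)}_{\rightarrow N+1}/E^{(p)}_{\rightarrow 0}$. Once the matrix $A^{(p)}$ is computed, this is a relatively simple task:

$$t_p \equiv \frac{E^{(p)}_{\rightarrow N+1}}{E^{(p)}_{\rightarrow 0}} = \frac{1}{a_{11}^{(p)}} \quad \text{(Multilayer)} \qquad (4.60)$$

$$r_p \equiv \frac{E^{(p)}_{\leftarrow 0}}{E^{(p)}_{\rightarrow 0}} = \frac{a_{21}^{(p)}}{a_{11}^{(p)}} \quad \text{(Multilayer)} \qquad (4.61)$$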
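The convenience of this notation lies in the fact that we can deal with an arbitrary number of layers $N$ with varying thickness and index. The essential information for each layer is contained succinctly in its respective $2\times2$ characteristic matrix $M$. To find the overall effect of the many layers, we need only multiply the matrices for each layer together to find $A$, from which we compute the reflection and transmission coefficients for the whole system. The derivation for $s$-polarized light is similar to the above derivation for $p$-polarized light. The equation corresponding to (4.58) for $s$-polarized light turns out to be

$$\begin{bmatrix} 1 \\ E^{(s)}_{\leftarrow 0}/E^{(s)}_{\rightarrow 0} \end{bmatrix} = A^{(s)}\begin{bmatrix} E^{(s)}_{\rightarrow N+1}/E^{(s)}_{\rightarrow 0} \\ 0 \end{bmatrix} \qquad (4.62)$$

where

$$A^{(s)} \equiv \begin{bmatrix} a_{11}^{(s)} & a_{12}^{(s)} \\ a_{21}^{(s)} & a_{22}^{(s)} \end{bmatrix} = \frac{1}{2n_0\cos\theta_0}\begin{bmatrix} n_0\cos\theta_0 & 1 \\ n_0\cos\theta_0 & -1 \end{bmatrix}\left(\prod_{j=1}^{N} M_j^{(s)}\right)\begin{bmatrix} 1 & 0 \\ n_{N+1}\cos\theta_{N+1} & 0 \end{bmatrix} \qquad (4.63)$$

and

$$M_j^{(s)} = \begin{bmatrix} \cos\beta_j & -i\sin\beta_j/(n_j\cos\theta_j) \\ -i\,n_j\cos\theta_j\sin\beta_j & \cos\beta_j \end{bmatrix} \qquad (4.64)$$

The transmission and reflection coefficients are found (as before) from

$$t_s = \frac{1}{a_{11}^{(s)}} \quad \text{(Multilayer)} \qquad (4.65)$$

$$r_s = \frac{a_{21}^{(s)}}{a_{11}^{(s)}} \quad \text{(Multilayer)} \qquad (4.66)$$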
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{N} = \frac{1}{\sin\theta}\begin{bmatrix} A\sin N\theta - \sin(N-1)\theta & B\sin N\theta \\ C\sin N\theta & D\sin N\theta - \sin(N-1)\theta \end{bmatrix} \qquad (4.67)$$

where

$$\cos\theta \equiv \frac{1}{2}(A + D). \qquad (4.68)$$

This formula relies on the condition $AD - BC = 1$, which is true for matrices of the form (4.56) and (4.64) or any product of them. Here, $A$, $B$, $C$, and $D$ represent the elements of a matrix composed of a block of matrices corresponding to a repeated pattern within the stack. In general, high-reflection coatings are designed with alternating high and low refractive indices. For high reflectivity, each layer should have a quarter-wave thickness. Since the layers alternate high and low indices, at every other
Figure 4.16 A repeated multilayer structure with alternating high and low indices where each layer is a quarter wavelength in thickness. This structure can achieve very high reflectance.
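boundary there is a phase shift of $\pi$ upon reflection from the interface. Hence, the quarter wavelength spacing is appropriate to give constructive interference in the reflected direction.

Example 4.6

Derive the reflection and transmission coefficients for $p$-polarized light interacting with a high reflector constructed using a $\lambda/4$ stack.

Solution: For a $\lambda/4$ stack we need $\beta_j = \pi/2$, which requires the thickness

$$\ell_j = \frac{\lambda_{\text{vac}}}{4n_j\cos\theta_j}$$

The characteristic matrix (4.56) of each layer then reduces to

$$M_j^{(p)} = \begin{bmatrix} 0 & -i\cos\theta_j/n_j \\ -i\,n_j/\cos\theta_j & 0 \end{bmatrix}$$

The matrices for a high and a low refractive index layer are multiplied together in the usual manner. Each layer pair takes the form

$$\begin{bmatrix} 0 & -\dfrac{i\cos\theta_H}{n_H} \\ -\dfrac{i\,n_H}{\cos\theta_H} & 0 \end{bmatrix}\begin{bmatrix} 0 & -\dfrac{i\cos\theta_L}{n_L} \\ -\dfrac{i\,n_L}{\cos\theta_L} & 0 \end{bmatrix} = \begin{bmatrix} -\dfrac{n_L\cos\theta_H}{n_H\cos\theta_L} & 0 \\ 0 & -\dfrac{n_H\cos\theta_L}{n_L\cos\theta_H} \end{bmatrix}$$

For $q$ identical layer pairs ($N = 2q$ layers), the product of matrices is

$$\prod_{j=1}^{N} M_j^{(p)} = \begin{bmatrix} -\dfrac{n_L\cos\theta_H}{n_H\cos\theta_L} & 0 \\ 0 & -\dfrac{n_H\cos\theta_L}{n_L\cos\theta_H} \end{bmatrix}^{q} = \begin{bmatrix} \left(-\dfrac{n_L\cos\theta_H}{n_H\cos\theta_L}\right)^{q} & 0 \\ 0 & \left(-\dfrac{n_H\cos\theta_L}{n_L\cos\theta_H}\right)^{q} \end{bmatrix}$$

Substituting this into (4.59), we obtain

$$A^{(p)} = \frac{1}{2}\begin{bmatrix} \left(-\dfrac{n_L\cos\theta_H}{n_H\cos\theta_L}\right)^{q}\dfrac{\cos\theta_{N+1}}{\cos\theta_0} + \left(-\dfrac{n_H\cos\theta_L}{n_L\cos\theta_H}\right)^{q}\dfrac{n_{N+1}}{n_0} & 0 \\ \left(-\dfrac{n_L\cos\theta_H}{n_H\cos\theta_L}\right)^{q}\dfrac{\cos\theta_{N+1}}{\cos\theta_0} - \left(-\dfrac{n_H\cos\theta_L}{n_L\cos\theta_H}\right)^{q}\dfrac{n_{N+1}}{n_0} & 0 \end{bmatrix} \qquad (4.69)$$

With $A^{(p)}$ in hand, we can now calculate the transmission coefficient from (4.60)

$$t_p = \frac{2}{\left(-\dfrac{n_L\cos\theta_H}{n_H\cos\theta_L}\right)^{q}\dfrac{\cos\theta_{N+1}}{\cos\theta_0} + \left(-\dfrac{n_H\cos\theta_L}{n_L\cos\theta_H}\right)^{q}\dfrac{n_{N+1}}{n_0}}$$

and similarly the reflection coefficient from (4.61)

$$r_p = \frac{\left(-\dfrac{n_L\cos\theta_H}{n_H\cos\theta_L}\right)^{q}\dfrac{\cos\theta_{N+1}}{\cos\theta_0} - \left(-\dfrac{n_H\cos\theta_L}{n_L\cos\theta_H}\right)^{q}\dfrac{n_{N+1}}{n_0}}{\left(-\dfrac{n_L\cos\theta_H}{n_H\cos\theta_L}\right)^{q}\dfrac{\cos\theta_{N+1}}{\cos\theta_0} + \left(-\dfrac{n_H\cos\theta_L}{n_L\cos\theta_H}\right)^{q}\dfrac{n_{N+1}}{n_0}}$$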
The quarter-wave multilayer considered in Example 4.6 can achieve extraordinarily high reflectivity. In the limit of $q \to \infty$, we have $t_p \to 0$ and $r_p \to -1$ (see Fig. 4.17), giving 100% reflection with a $\pi$ phase shift.
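The characteristic-matrix recipe lends itself to a short numerical check. Below is a minimal Python sketch, assuming NumPy, that builds the layer matrices in the closed form of (4.56), assembles $A^{(p)}$ as in (4.59), and extracts $t_p$ and $r_p$ from (4.60) and (4.61). The function names and the test case (a quarter-wave stack at normal incidence in air on glass) are illustrative choices rather than the text's own worked numbers.

```python
import numpy as np

def layer_matrix_p(n, cos_theta, beta):
    """Characteristic matrix of one layer for p-polarized light, as in (4.56)."""
    return np.array([[np.cos(beta), -1j * np.sin(beta) * cos_theta / n],
                     [-1j * n * np.sin(beta) / cos_theta, np.cos(beta)]])

def multilayer_p(n, theta0, thicknesses, lam_vac):
    """Return (t_p, r_p) for indices n = [n0, n1, ..., nN, nN+1]
    and layer thicknesses [l1, ..., lN]."""
    n = np.asarray(n, dtype=complex)
    sin_t = n[0] * np.sin(theta0) / n            # Snell's law in every region
    cos_t = np.sqrt(1 - sin_t**2)
    M = np.eye(2, dtype=complex)
    for j, ell in enumerate(thicknesses, start=1):
        beta = 2 * np.pi * n[j] * ell * cos_t[j] / lam_vac   # k_j l_j cos(theta_j)
        M = M @ layer_matrix_p(n[j], cos_t[j], beta)         # lowest index on the left
    left = np.array([[n[0], cos_t[0]],
                     [n[0], -cos_t[0]]]) / (2 * n[0] * cos_t[0])
    right = np.array([[cos_t[-1], 0],
                      [n[-1], 0]])
    A = left @ M @ right                                     # (4.59)
    return 1 / A[0, 0], A[1, 0] / A[0, 0]                    # (4.60), (4.61)

# Quarter-wave stack at normal incidence: q high/low pairs in air on glass
nH, nL, n_sub, lam, q = 2.32, 1.38, 1.5, 633e-9, 4
indices = [1.0] + [nH, nL] * q + [n_sub]
thick = [lam / (4 * nH), lam / (4 * nL)] * q
t_p, r_p = multilayer_p(indices, 0.0, thick, lam)
print(abs(r_p)**2, n_sub * abs(t_p)**2)          # reflectance and transmittance
```

At normal incidence the cosines are all one, so the result can be compared directly against the closed form of Example 4.6 with $\left(-n_L/n_H\right)^q$ and $\left(-n_H/n_L\right)^q$.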
Figure 4.17 The transmission and reflection coefficients for a quarter-wave stack as $q$ is varied ($n_L = 1.38$ and $n_H = 2.32$).
Exercises
Exercises for 4.1 Double-Interface Problem Solved Using Fresnel Coefficients

P4.1 Use (4.4)-(4.7) to derive $r_s^{\text{tot}}$ given in (4.12).

P4.2 Consider a 1 micron thick coating of dielectric material ($n = 2$) on a piece of glass ($n = 1.5$). Use a computer to plot the magnitude of the overall Fresnel coefficient (4.11) from air into the glass at normal incidence. Plot as a function of wavelength for wavelengths between 200 nm and 800 nm (assume the index remains constant over this range).
Exercises for 4.2 Two-Interface Transmittance at Sub Critical Angles

P4.3 Verify that in the case that $\theta_1$ and $\theta_2$ are real, (4.14) simplifies to (4.15).

P4.4 A light wave impinges at normal incidence on a thin glass plate with index $n$ and thickness $d$.

(a) Show that the transmittance through the plate as a function of wavelength is

$$T^{\text{tot}} = \frac{1}{1 + \dfrac{\left(n^2 - 1\right)^2}{4n^2}\sin^2\left(\dfrac{2\pi n d}{\lambda_{\text{vac}}}\right)}$$

HINT: Find

$$r^{1\to2} = r^{1\to0} = -r^{0\to1} = \frac{n - 1}{n + 1}$$

and then use

$$T^{0\to1} = 1 - R^{0\to1}, \qquad T^{1\to2} = 1 - R^{1\to2}$$

(b) If $n = 1.5$, what is the maximum and minimum transmittance through the plate?

(c) If the plate thickness is $d = 150\ \mu$m, what wavelengths transmit with maximum efficiency? Express your answer as a formula involving an integer $j$.

P4.5 Show that the maximum reflectance possible from the front coating in Example 4.2 is 46%. Find the smallest possible $d_1$ that accomplishes this for light with wavelength $\lambda_{\text{vac}} = 633$ nm.
Exercises for 4.3 Beyond Critical Angle: Tunneling of Evanescent Waves

P4.6 Re-compute (4.22) in the case of $s$-polarized light. Write the result in the same form as the last expression in (4.22).

L4.7 Consider $s$-polarized microwaves ($\lambda_{\text{vac}} = 3$ cm) encountering an air gap separating two paraffin wax prisms ($n = 1.5$). The 45° right-angle prisms are arranged with the geometry shown in Fig. 4.4. The presence of the second prism frustrates the total internal reflection.

(a) Use a computer to plot the transmittance through the gap (i.e. the result of P4.6) as a function of separation $d$ (normal to gap surface). Neglect reflections from other surfaces of the prisms.

(b) Measure the transmittance of the microwaves through the prisms as a function of spacing $d$ (normal to the surface) and superimpose the results on the graph of part (a). Figure 4.18 shows a plot of typical data taken with this setup. (video)

Figure 4.19 (labels: Microwave Source, Paraffin Lens, Paraffin Prisms, Paraffin Lens, Microwave Detector)

Figure 4.18 Theoretical vs. measured microwave transmission through wax prisms as a function of separation (cm). Mismatch is presumably due to imperfections in microwave collimation and/or extraneous reflections.
Exercises for 4.6 Distinguishing Nearby Wavelengths in a Fabry-Perot Instrument

P4.8 A Fabry-Perot interferometer has silver-coated plates each with reflectance $R = 0.9$, transmittance $T = 0.05$, and absorbance $A = 0.05$. The plate separation is $d = 0.5$ cm with interior index $n_1 = 1$. Suppose that the wavelength being observed near normal incidence is 587 nm.

(a) What is the maximum and minimum transmittance through the interferometer?

(b) What are the free spectral range $\Delta\lambda_{\text{FSR}}$ and the fringe width $\Delta\lambda_{\text{FWHM}}$?

(c) What is the resolving power?

P4.9 Generate a plot like Fig. 4.11(a), showing the fringes you get in a Fabry-Perot etalon when $\theta_1$ is varied. Let $T_{\max} = 1$, $F = 10$, $\lambda = 500$ nm, $d = 1$ cm, and $n_1 = 1$.

(a) Plot $T$ vs. $\theta_1$ over the angular range used in Fig. 4.11(a).

(c) Suppose $d$ was slightly different, say 1.00001 cm. Make a plot of $T$ vs. $\theta_1$ for this situation.
P4.10 Consider the configuration depicted in Fig. 4.10, where the center of the diverging light beam ($\lambda_{\rm vac} = 633$ nm) approaches the plates at normal incidence. Suppose that the spacing of the plates (near $d = 0.5$ cm) is just right to cause a bright fringe to occur at the center. Let $n_1 = 1$. Find the angle for the $m$th circular bright fringe surrounding the central spot (the 0th fringe corresponding to the center). HINT: $\cos\theta \cong 1 - \theta^2/2$. The answer has the form $a\sqrt{m}$; find the value of $a$.
L4.11 Characterize a Fabry-Perot etalon in the laboratory using a HeNe laser ($\lambda_{\rm vac} = 633$ nm). Assume that the bandwidth $\Delta\lambda_{\rm HeNe}$ of the HeNe laser is very narrow compared to the fringe width of the etalon $\Delta\lambda_{\rm FWHM}$. Assume two identical reflective surfaces separated by 5.00 mm. Deduce the free spectral range $\Delta\lambda_{\rm FSR}$, the fringe width $\Delta\lambda_{\rm FWHM}$, the resolving power, and the reflecting finesse (small $f$). (video)
Figure 4.20 (Diagram labels: Laser, Diverging Lens, Filter, Fabry-Perot Etalon, CCD Camera.)
L4.12 Use the same Fabry-Perot etalon to observe the Zeeman splitting of the yellow line $\lambda = 587.4$ nm emitted by a krypton lamp when a magnetic field is applied. As the line splits and moves through half of the free spectral range, the peak of the decreasing wavelength and the peak of the increasing wavelength meet on the screen. When this happens, by how much has each wavelength shifted? (video)
Figure 4.21 (Diagram labels: Filter, Fabry-Perot Etalon, CCD Camera.)
Exercises for 4.7 Multilayer Coatings
P4.13 (a) Write (4.43) through (4.48) for $s$-polarized light. (b) From these equations, derive (4.62)-(4.64).
P4.14 Show that (4.65) for a single layer (i.e. two interfaces) is equivalent to (4.11). WARNING: This is more work than it may appear at first.
Exercises for 4.8 Repeated Multilayer Stacks
P4.15 (a) What should be the thickness of the high and the low index layers in a periodic high-reflector mirror? Let the light be $p$-polarized and strike the mirror surface at 45°. Take the indices of the layers to be $n_H = 2.32$ and $n_L = 1.38$, deposited on a glass substrate with index $n = 1.5$. Let the wavelength be $\lambda_{\rm vac} = 633$ nm. (b) Find the reflectance $R$ with 1, 2, 4, and 8 periods in the high-low stack.
P4.16 Find the high-reflector matrix for $s$-polarized light that corresponds to (4.69).
P4.17 Design an anti-reflection coating for use in air (assume the index of air is 1): (a) Show that for normal incidence and $\lambda/4$ films (thickness $= \frac{1}{4}$ the wavelength of light inside the material), the reflectance of a single-layer ($n_1$) coating on glass is
$$R = \left(\frac{n_g - n_1^2}{n_g + n_1^2}\right)^2$$
(b) Show that for a two-coating setup (air-$n_1$-$n_2$-glass; $n_1$ and $n_2$ are each a $\lambda/4$ film),
$$R = \left(\frac{n_2^2 - n_g n_1^2}{n_2^2 + n_g n_1^2}\right)^2$$
(c) If $n_g = 1.5$, and you have a choice of these common coating materials: ZnS ($n = 2.32$), CeF ($n = 1.63$) and MgF ($n = 1.38$), find the combination that gives you the lowest $R$ for part (b). (Be sure to specify which material is $n_1$ and which is $n_2$.) What $R$ does this combination give?
P4.18 Consider a two-coating anti-reflection optic (each coating set for $\lambda/4$, as in problem P4.17) using $n_1 = 1.6$ and $n_2 = 2.1$ applied to a glass substrate $n_g = 1.5$ at normal incidence. Suppose the coating thicknesses are optimized for $\lambda = 550$ nm (in the middle of the visible range) and ignore possible variations of the indices with $\lambda$. Use the matrix techniques and a computer to plot $R(\lambda_{\rm air})$ for 400 to 700 nm (visible range). Do this for a single bilayer (one layer of each coating), two bilayers, four bilayers, and 25 bilayers.
Chapter 5
NaCl) are highly symmetric and respond to electric fields the same in any direction.
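However, at low intensities the response of materials is still linear (or proportional) to the strength of the electric field. The linear constitutive relation which connects $\mathbf{P}$ to $\mathbf{E}$ in a crystal can be expressed in its most general form as
$$\begin{bmatrix} P_x \\ P_y \\ P_z \end{bmatrix} = \epsilon_0 \begin{bmatrix} \chi_{xx} & \chi_{xy} & \chi_{xz} \\ \chi_{yx} & \chi_{yy} & \chi_{yz} \\ \chi_{zx} & \chi_{zy} & \chi_{zz} \end{bmatrix} \begin{bmatrix} E_x \\ E_y \\ E_z \end{bmatrix} \tag{5.1}$$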
Figure 5.1 A physical model of an electron bound in a crystal lattice with the coordinate system specially chosen along the principal axes so that the susceptibility tensor takes on a simple form.
The matrix in (5.1) is called the susceptibility tensor. To visualize the behavior of electrons in such a material, we imagine each electron bound as though by tiny springs with different strengths in different dimensions to represent the anisotropy (see Fig. 5.1). When an external electric field is applied, the electron experiences a force that moves it from its equilibrium position. The springs (actually the electric force from ions bound in the crystal lattice) exert a restoring force, but the restoring force is not equal in all directions; the electron tends to move more along the dimension of the weaker spring. The displaced electron creates a microscopic dipole, but the asymmetric restoring force causes $\mathbf{P}$ to be in a direction different than $\mathbf{E}$ as depicted in Fig. 5.2. To understand the geometrical interpretation of the many coefficients $\chi_{ij}$, assume, for example, that the electric field is directed along the $x$-axis (i.e. $E_y = E_z = 0$) as depicted in Fig. 5.2. In this case, the three equations encapsulated in (5.1) reduce to
$$P_x = \epsilon_0 \chi_{xx} E_x, \qquad P_y = \epsilon_0 \chi_{yx} E_x, \qquad P_z = \epsilon_0 \chi_{zx} E_x$$
Figure 5.2 The applied field $\mathbf{E}$ and the induced polarization $\mathbf{P}$ in general are not parallel in a crystal lattice.
Notice that the coefficient $\chi_{xx}$ connects the strength of $\mathbf{P}$ in the $\hat{\mathbf{x}}$ direction with the strength of $\mathbf{E}$ in that same direction, just as in the isotropic case. The other two coefficients ($\chi_{yx}$ and $\chi_{zx}$) describe the amount of polarization $\mathbf{P}$ produced in the $\hat{\mathbf{y}}$ and $\hat{\mathbf{z}}$ directions by the electric field component in the $x$-dimension. Likewise, the other coefficients with mixed subscripts in (5.1) describe the contribution to $\mathbf{P}$ in one dimension made by an electric field component in another dimension. As you might imagine, working with nine susceptibility coefficients can get complicated. Fortunately, we can greatly reduce the complexity of the description by a judicious choice of coordinate system. In Appendix 5.A we explain how conservation of energy requires that the susceptibility tensor (5.1) for typical non-absorbing crystals be real and symmetric (i.e. $\chi_{ij} = \chi_{ji}$).2 Appendix 5.B shows that, given a real symmetric tensor, it is always possible to choose a coordinate system for which off-diagonal elements vanish. This is true even if the lattice planes in the crystal are not mutually orthogonal (e.g. rhombus, hexagonal, etc.). We will imagine that this rotation of coordinates
2 By "typical" we mean that the crystal does not exhibit optical activity. Optically active crystals have a complex susceptibility tensor, even when no absorption takes place. Conservation of energy in this more general case requires that the susceptibility tensor be Hermitian ($\chi_{ij} = \chi_{ji}^*$).
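has been accomplished. In other words, we can let the crystal itself dictate the orientation of the coordinate system, aligned to the principal axes of the crystal for which the off-diagonal elements of (5.1) are zero. With the coordinate system aligned to the principal axes, the constitutive relation for a non-absorbing crystal simplifies to
$$\begin{bmatrix} P_x \\ P_y \\ P_z \end{bmatrix} = \epsilon_0 \begin{bmatrix} \chi_x & 0 & 0 \\ 0 & \chi_y & 0 \\ 0 & 0 & \chi_z \end{bmatrix} \begin{bmatrix} E_x \\ E_y \\ E_z \end{bmatrix} \tag{5.2}$$
or without the matrix notation (since it no longer offers much convenience)
$$\mathbf{P} = \hat{\mathbf{x}}\,\epsilon_0 \chi_x E_x + \hat{\mathbf{y}}\,\epsilon_0 \chi_y E_y + \hat{\mathbf{z}}\,\epsilon_0 \chi_z E_z \tag{5.3}$$
By assumption, $\chi_x$, $\chi_y$, and $\chi_z$ are all real. (We have dropped the double subscript; $\chi_x$ stands for $\chi_{xx}$, etc.)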
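We consider trial plane-wave solutions for the fields:
$$\mathbf{E} = \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}, \qquad \mathbf{B} = \mathbf{B}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}, \qquad \mathbf{P} = \mathbf{P}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \tag{5.4}$$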
where restrictions on $\mathbf{E}_0$, $\mathbf{B}_0$, $\mathbf{P}_0$, and $\mathbf{k}$ are yet to be determined. As usual, the phase of each wave is included in the amplitudes $\mathbf{E}_0$, $\mathbf{B}_0$, and $\mathbf{P}_0$, whereas $\mathbf{k}$ is real in accordance with our assumption of no absorption. We can make a quick observation about the behavior of these fields by applying Maxwell's equations directly. Gauss's law for electric fields requires
$$\nabla\cdot(\epsilon_0\mathbf{E} + \mathbf{P}) = i\mathbf{k}\cdot(\epsilon_0\mathbf{E} + \mathbf{P}) = 0 \tag{5.5}$$
and Gauss's law for magnetism gives
$$\nabla\cdot\mathbf{B} = i\mathbf{k}\cdot\mathbf{B} = 0 \tag{5.6}$$
We immediately notice the following peculiarity: From its definition, the Poynting vector $\mathbf{S} \equiv \mathbf{E}\times\mathbf{B}/\mu_0$ is perpendicular to both $\mathbf{E}$ and $\mathbf{B}$, and by (5.6) the k-vector is perpendicular to $\mathbf{B}$. However, by (5.5) the k-vector is not necessarily perpendicular to $\mathbf{E}$, since in general $\mathbf{k}\cdot\mathbf{E} \neq 0$ if $\mathbf{P}$ points in a direction other than $\mathbf{E}$. Therefore, $\mathbf{k}$ and $\mathbf{S}$ are not necessarily parallel in a crystal. In other words, the flow of energy and the direction of the phase-front propagation can be different in anisotropic media.
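Our main goal here is to relate the k-vector to the susceptibility parameters $\chi_x$, $\chi_y$, and $\chi_z$. To do this, we plug our trial plane-wave fields into the wave equation (1.41). Under the assumption $\mathbf{J}_{\rm free} = 0$, we have
$$\nabla^2\mathbf{E} - \mu_0\epsilon_0\frac{\partial^2\mathbf{E}}{\partial t^2} = \mu_0\frac{\partial^2\mathbf{P}}{\partial t^2} + \nabla(\nabla\cdot\mathbf{E}) \tag{5.7}$$
Substituting the trial solutions (5.4) into (5.7) gives
$$k^2\mathbf{E} - \omega^2\mu_0(\epsilon_0\mathbf{E} + \mathbf{P}) = \mathbf{k}(\mathbf{k}\cdot\mathbf{E}) \tag{5.8}$$
Inserting the constitutive relation (5.3) then yields
$$k^2\mathbf{E} - \omega^2\mu_0\epsilon_0\left[(1+\chi_x)E_x\hat{\mathbf{x}} + (1+\chi_y)E_y\hat{\mathbf{y}} + (1+\chi_z)E_z\hat{\mathbf{z}}\right] = \mathbf{k}(\mathbf{k}\cdot\mathbf{E}) \tag{5.9}$$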
This relationship is unwieldy because of the mix of electric field components that appear in the expression. This was not a problem when we investigated isotropic materials for which the k-vector is perpendicular to $\mathbf{E}$, making the right-hand side of the equations zero. However, there is a trick for dealing with this. Relation (5.9) actually contains three equations, one for each dimension. Explicitly, these equations are
$$\left[k^2 - \frac{\omega^2}{c^2}(1+\chi_x)\right]E_x = k_x(\mathbf{k}\cdot\mathbf{E}) \tag{5.10}$$
$$\left[k^2 - \frac{\omega^2}{c^2}(1+\chi_y)\right]E_y = k_y(\mathbf{k}\cdot\mathbf{E}) \tag{5.11}$$
and
$$\left[k^2 - \frac{\omega^2}{c^2}(1+\chi_z)\right]E_z = k_z(\mathbf{k}\cdot\mathbf{E}) \tag{5.12}$$
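We have replaced the constants $\mu_0\epsilon_0$ with $1/c^2$ in accordance with (1.43). We multiply (5.10)-(5.12) respectively by $k_x$, $k_y$, and $k_z$ and also move the factor in square brackets in each equation to the denominator on the right-hand side. Then by adding the three equations together we get
$$\frac{k_x^2(\mathbf{k}\cdot\mathbf{E})}{k^2 - \frac{\omega^2}{c^2}(1+\chi_x)} + \frac{k_y^2(\mathbf{k}\cdot\mathbf{E})}{k^2 - \frac{\omega^2}{c^2}(1+\chi_y)} + \frac{k_z^2(\mathbf{k}\cdot\mathbf{E})}{k^2 - \frac{\omega^2}{c^2}(1+\chi_z)} = k_xE_x + k_yE_y + k_zE_z = (\mathbf{k}\cdot\mathbf{E}) \tag{5.13}$$
Now $\mathbf{k}\cdot\mathbf{E}$ appears in every term and can be divided away. This gives the dispersion relation (unencumbered by field components):
$$\frac{k_x^2}{k^2c^2/\omega^2 - (1+\chi_x)} + \frac{k_y^2}{k^2c^2/\omega^2 - (1+\chi_y)} + \frac{k_z^2}{k^2c^2/\omega^2 - (1+\chi_z)} = \frac{\omega^2}{c^2} \tag{5.14}$$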
The dispersion relation (5.14) allows us to find a suitable $k$, given values for $\omega$, $\chi_x$, $\chi_y$, and $\chi_z$. Actually, it only restricts the magnitude of $\mathbf{k}$; we must still decide on a direction for the wave to travel (i.e. we must choose the ratios between $k_x$, $k_y$, and $k_z$). To remind ourselves of this fact, we introduce a unit vector that points in the direction of $\mathbf{k}$:
$$\mathbf{k} = k_x\hat{\mathbf{x}} + k_y\hat{\mathbf{y}} + k_z\hat{\mathbf{z}} = k\left(u_x\hat{\mathbf{x}} + u_y\hat{\mathbf{y}} + u_z\hat{\mathbf{z}}\right) = k\hat{\mathbf{u}} \tag{5.15}$$
With this unit vector inserted, the dispersion relation (5.14) for plane waves in a crystal becomes
$$\frac{u_x^2}{k^2c^2/\omega^2 - (1+\chi_x)} + \frac{u_y^2}{k^2c^2/\omega^2 - (1+\chi_y)} + \frac{u_z^2}{k^2c^2/\omega^2 - (1+\chi_z)} = \frac{\omega^2}{k^2c^2} \tag{5.16}$$
We may define refractive index as the ratio of the speed of light in vacuum $c$ to the speed of phase propagation in a material $\omega/k$ (see P1.9). The relation introduced for isotropic media (i.e. (2.19) for real index) remains appropriate. That is
$$n = \frac{kc}{\omega} \tag{5.17}$$
This familiar relationship between $k$ and $\omega$, in the case of a crystal, depends on the direction of propagation in accordance with (5.16). Inspired by (2.30), we will find it helpful to introduce several refractive-index parameters:
$$n_x \equiv \sqrt{1+\chi_x}, \qquad n_y \equiv \sqrt{1+\chi_y}, \qquad n_z \equiv \sqrt{1+\chi_z} \tag{5.18}$$
With these definitions (5.17)-(5.18), the dispersion relation (5.16) becomes
$$\frac{u_x^2}{n^2 - n_x^2} + \frac{u_y^2}{n^2 - n_y^2} + \frac{u_z^2}{n^2 - n_z^2} = \frac{1}{n^2} \tag{5.19}$$
This is called Fresnel's equation (not to be confused with the Fresnel coefficients studied in chapter 3). The relationship contains the yet unknown index $n$ that varies with the direction of the k-vector (i.e. the direction of the unit vector $\hat{\mathbf{u}}$). After multiplying through by all of the denominators (and after a fortuitous cancelation owing to $u_x^2 + u_y^2 + u_z^2 = 1$), Fresnel's equation (5.19) can be rewritten as a quadratic in $n^2$. The two solutions are
$$n^2 = \frac{B \pm \sqrt{B^2 - 4AC}}{2A} \tag{5.20}$$
where
$$A \equiv u_x^2n_x^2 + u_y^2n_y^2 + u_z^2n_z^2$$
$$B \equiv u_x^2n_x^2\left(n_y^2 + n_z^2\right) + u_y^2n_y^2\left(n_x^2 + n_z^2\right) + u_z^2n_z^2\left(n_x^2 + n_y^2\right)$$
$$C \equiv n_x^2n_y^2n_z^2$$
The upper and lower signs ($+$ and $-$) in (5.20) give two positive solutions for $n^2$. The positive square root of these solutions yields two physical values for $n$. It turns out that each of the two values for $n$ is associated with a polarization direction of the electric field, given a propagation direction $\mathbf{k}$. A broader analysis carried out in appendix 5.C renders the orientation of the electric fields, whereas here we only show how to find the two values of $n$. We refer to the two indices as the slow and fast index, since the waves associated with each propagate at speed $v = c/n$. In the special cases of propagation along one of the principal axes of the crystal, the index $n$ takes on two of the values $n_x$, $n_y$, or $n_z$, depending on which are orthogonal to the direction of propagation.
Example 5.1
Calculate the two possible values for the index of refraction when $\mathbf{k}$ is in the $\hat{\mathbf{z}}$ direction (in the crystal principal frame).
Solution: With $u_z = 1$ and $u_x = u_y = 0$ we have
$$A = n_z^2; \qquad B = n_z^2\left(n_x^2 + n_y^2\right); \qquad C = n_x^2n_y^2n_z^2$$
so that
$$B^2 - 4AC = n_z^4\left(n_x^2 - n_y^2\right)^2$$
Inserting these expressions into (5.20), we find the two values for the index: $n = n_x,\ n_y$. The index $n_x$ is experienced by light whose electric field points in the $x$-dimension, and the index $n_y$ is experienced by light whose electric field points in the $y$-dimension (see appendix 5.C).
Before moving on, let us briefly summarize what has been accomplished so far. Given values for $\chi_x$, $\chi_y$, and $\chi_z$ associated with light in a crystal at a given frequency, you can define the indices $n_x$, $n_y$, and $n_z$, according to (5.18). Next, a direction for the k-vector is chosen (i.e. $u_x$, $u_y$, and $u_z$). This direction generally has two values for the index of refraction associated with it, found using Fresnel's equation (5.20). Each index is associated with a specific polarization direction for the electric field as outlined in appendix 5.C. Every propagation direction $\hat{\mathbf{u}}$ has its own natural set of polarization components for the electric field. The two polarization components travel at different speeds, even though the frequency is the same. This is known as birefringence.
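The procedure summarized above can be sketched numerically. The following Python snippet is a minimal illustration, not part of the original text: the function name `fresnel_indices` and its argument layout are our own choices, and it simply evaluates the quadratic form (5.20) with the coefficients $A$, $B$, and $C$ defined there.

```python
import math


def fresnel_indices(u, nx, ny, nz):
    """Return (n_fast, n_slow) from Fresnel's equation in the quadratic form (5.20).

    u          -- propagation direction (ux, uy, uz) in the principal-axis frame
    nx, ny, nz -- principal indices defined in (5.18)
    """
    # Normalize so that ux^2 + uy^2 + uz^2 = 1, as the derivation assumes.
    norm = math.sqrt(sum(c * c for c in u))
    ux2, uy2, uz2 = ((c / norm) ** 2 for c in u)
    nx2, ny2, nz2 = nx * nx, ny * ny, nz * nz

    A = ux2 * nx2 + uy2 * ny2 + uz2 * nz2
    B = (ux2 * nx2 * (ny2 + nz2)
         + uy2 * ny2 * (nx2 + nz2)
         + uz2 * nz2 * (nx2 + ny2))
    C = nx2 * ny2 * nz2

    # Guard against a tiny negative discriminant caused by round-off
    # (the discriminant vanishes along an optic axis).
    root = math.sqrt(max(B * B - 4.0 * A * C, 0.0))
    n_fast = math.sqrt((B - root) / (2.0 * A))
    n_slow = math.sqrt((B + root) / (2.0 * A))
    return n_fast, n_slow


if __name__ == "__main__":
    # Propagation along z should reproduce Example 5.1: the indices are nx and ny.
    print(fresnel_indices((0.0, 0.0, 1.0), 2.22, 2.35, 2.41))
```

For propagation along a principal axis the function returns the two principal indices orthogonal to that axis, which is a convenient sanity check before trying a general direction.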
Figure 5.3 Spherical coordinates.
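In spherical coordinates (see Fig. 5.3), the unit vector $\hat{\mathbf{u}}$ is written as
$$\hat{\mathbf{u}} = \hat{\mathbf{x}}\sin\theta\cos\phi + \hat{\mathbf{y}}\sin\theta\sin\phi + \hat{\mathbf{z}}\cos\theta \tag{5.24}$$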
While finding the direction of the optic axes in a biaxial crystal is not too bad, an expression for the two indices of refraction is messy. The smaller value is commonly referred to as the fast index and the larger value the slow index. Figure 5.4 shows the two refractive indices (i.e. the solutions to Fresnel's equation (5.20)) for a biaxial crystal plotted with color shading on the surface of a sphere. Each point on the sphere represents a different $\theta$ and $\phi$. The two optic axes are apparent in the plot of the difference between $n_{\rm slow}$ and $n_{\rm fast}$. When propagating in these directions, either polarization experiences the same index. For the remainder of this chapter, we will focus on the simpler case of uniaxial crystals. In uniaxial crystals two of the coefficients $\chi_x$, $\chi_y$, and $\chi_z$ are the same. In this case, there is only one optic axis for the crystal (hence the name uniaxial). By convention, in uniaxial crystals we label the dimension that has the unique susceptibility as the $z$-axis (i.e. $\chi_x = \chi_y \neq \chi_z$). This makes the $z$-axis the optic axis. The unique index of refraction is called the extraordinary index
$$n_z \equiv n_e \tag{5.26}$$
and the other index is called the ordinary index
$$n_x = n_y \equiv n_o \tag{5.27}$$
Figure 5.4 The fast and slow refractive indices (and their difference) as a function of direction for potassium niobate (KNbO3) at $\lambda = 500$ nm ($n_x = 2.22$, $n_y = 2.35$, and $n_z = 2.41$).
These names were coined by Huygens, one of the early scientists to study light in crystals (see appendix 5.D). A uniaxial crystal with $n_e > n_o$ is referred to as a positive crystal, and one with $n_e < n_o$ is referred to as a negative crystal. To calculate the index of refraction for a wave propagating in a uniaxial crystal, we use definitions (5.26) and (5.27) along with the spherical representation of $\hat{\mathbf{u}}$ (5.24) in Fresnel's equation (5.20) to find the following two values for $n$ (see P5.4):
$$n = n_o \tag{5.28}$$
and
$$n = n_e(\theta) \equiv \frac{n_on_e}{\sqrt{n_o^2\sin^2\theta + n_e^2\cos^2\theta}} \tag{5.29}$$
The index $n_e(\theta)$ in (5.29) is also commonly referred to as the extraordinary index along with the constant $n_e = n_z$. While this has the potential for some confusion, the practice is so common that we will perpetuate it here. We will write $n_e(\theta)$ when the angle-dependent quantity specified by (5.29) is required, and write $n_e$ in formulas where the constant (5.26) is called for (as in the right-hand side of (5.29)). Notice that $n_e(\theta)$ depends only on $\theta$ (the polar angle measured from the optic axis $\hat{\mathbf{z}}$) and not $\phi$ (the azimuthal angle). Figure 5.5 shows the two refractive indices (5.28) and (5.29) as a function of $\theta$ and $\phi$. Since $n_e(\theta)$ has no $\phi$ dependence and $n_o$ is constant, the variation is much simpler than for the biaxial case. As outlined in appendix 5.C, the index $n_o$ corresponds to an electric field component that points perpendicular to the plane containing $\hat{\mathbf{u}}$ and $\hat{\mathbf{z}}$ (e.g. if $\hat{\mathbf{u}}$ is in the $x$-$z$ plane, $n_o$ is associated with light polarized in the $y$-dimension). On the other hand, the index $n_e(\theta)$ corresponds to field polarization that lies within the plane containing $\hat{\mathbf{u}}$ and $\hat{\mathbf{z}}$. In this case, the polarization component is directed partially along the optic axis (i.e. it has a $z$-component). That is why (5.29) gives for the refractive index a mixture of $n_o$ and $n_e$. If $\theta = 0$, then the k-vector is directed exactly along the optic axis, and $n_e(\theta)$ reduces to $n_o$ so that both polarization components experience the same index $n_o$.
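To make the angular dependence concrete, here is a small Python sketch that evaluates $n_e(\theta)$, i.e. $n_on_e/\sqrt{n_o^2\sin^2\theta + n_e^2\cos^2\theta}$ from (5.29). The function name `extraordinary_index` is our own; the numerical values are the BBO indices quoted in the caption of Fig. 5.5, used here only as sample input.

```python
import math


def extraordinary_index(theta, n_o, n_e):
    """Angle-dependent extraordinary index n_e(theta) of (5.29); theta in radians."""
    return n_o * n_e / math.sqrt((n_o * math.sin(theta)) ** 2
                                 + (n_e * math.cos(theta)) ** 2)


if __name__ == "__main__":
    n_o, n_e = 1.68, 1.56  # BBO values quoted in the caption of Fig. 5.5
    for degrees in (0, 30, 60, 90):
        theta = math.radians(degrees)
        print(degrees, extraordinary_index(theta, n_o, n_e))
```

The limits behave as described in the text: the function returns $n_o$ at $\theta = 0$ and $n_e$ at $\theta = \pi/2$, with intermediate values in between.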
Figure 5.5 The extraordinary and ordinary refractive indices (and their difference) as a function of direction for beta barium borate (BBO) at $\lambda = 500$ nm ($n_o = 1.68$ and $n_e = 1.56$).
If we assume that the index outside of the crystal is one, Snell's law for the ordinary polarization is
$$\sin\theta_i = n_o\sin\theta_t \quad \text{(ordinary polarized light)} \tag{5.30}$$
where $n_o$ is the ordinary index inside the crystal. The extraordinary polarized light also obeys Snell's law, but now the index of refraction in the crystal depends on the direction of propagation inside the crystal relative to the optic axis. Snell's law for the extraordinary polarization is
$$\sin\theta_i = n_e(\theta)\sin\theta_t \quad \text{(extraordinary polarized light)} \tag{5.31}$$
where $\theta$ is the angle between the optic axis inside the crystal and the direction of propagation in the crystal (given by $\theta_t$ in the plane of incidence). When the optic axis is at an arbitrary angle with respect to the surface, the relationship between $\theta$ and $\theta_t$ is cumbersome. We will examine Snell's law only for the specific case when the optic axis is perpendicular to the crystal surface, for which $\theta_t = \theta$.
Example 5.2
Derive Snell's law for a uniaxial crystal with optic axis perpendicular to the surface.
Solution: Refer to Fig. 5.6. With the optic axis perpendicular to the surface, if the light hits the crystal surface straight on, the index of refraction is $n_o$, regardless of the orientation of polarization since $\theta = 0$. When the light strikes the surface at an angle, $s$-polarized light continues to experience the index $n_o$, while $p$-polarized light experiences the extraordinary index $n_e(\theta)$.3 When we insert (5.29) into Snell's law (5.31) with $\theta = \theta_t$, the expression can be inverted to find the transmitted angle $\theta_t$ in terms of $\theta_i$ (see P5.5):
$$\tan\theta_t = \frac{n_e\sin\theta_i}{n_o\sqrt{n_e^2 - \sin^2\theta_i}} \tag{5.32}$$
Figure 5.6 Propagation of light in a uniaxial crystal with its optic axis perpendicular to the surface.
As strange as this formula may appear, it is Snell's law, but with an angularly dependent index.
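As a numerical check on this example, the sketch below (Python; function names are our own, and the index values are merely sample inputs) computes $\theta_t$ from the closed form $\tan\theta_t = n_e\sin\theta_i/\bigl(n_o\sqrt{n_e^2 - \sin^2\theta_i}\bigr)$ and confirms that it satisfies Snell's law (5.31) with the angle-dependent index (5.29). It assumes the optic axis is perpendicular to the surface, so that $\theta = \theta_t$.

```python
import math


def extraordinary_index(theta, n_o, n_e):
    """Angle-dependent extraordinary index n_e(theta) of (5.29)."""
    return n_o * n_e / math.sqrt((n_o * math.sin(theta)) ** 2
                                 + (n_e * math.cos(theta)) ** 2)


def transmitted_angle(theta_i, n_o, n_e):
    """k-vector angle theta_t for extraordinary light, from the closed form (5.32).

    Valid only when the optic axis is normal to the surface (theta = theta_t).
    """
    s = math.sin(theta_i)
    return math.atan(n_e * s / (n_o * math.sqrt(n_e ** 2 - s ** 2)))


def snell_residual(theta_i, n_o, n_e):
    """sin(theta_i) - n_e(theta_t) sin(theta_t); should vanish if (5.32) is right."""
    theta_t = transmitted_angle(theta_i, n_o, n_e)
    return math.sin(theta_i) - extraordinary_index(theta_t, n_o, n_e) * math.sin(theta_t)


if __name__ == "__main__":
    n_o, n_e = 1.68, 1.56  # sample values (BBO, from the caption of Fig. 5.5)
    for degrees in (10, 30, 60, 85):
        print(degrees, snell_residual(math.radians(degrees), n_o, n_e))
```

The residuals come out at the level of floating-point round-off, which is the numerical counterpart of the statement that the formula is just Snell's law with an angularly dependent index.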
giving rise to two different images. This phenomenon is one of the more commonly observed manifestations of birefringence. Since the Poynting vector dictates the direction of energy flow, it is the direction of $\mathbf{S}$ that determines the separation of the double image seen when looking through a birefringent crystal. Snell's law dictates the connection between the directions of the incident and transmitted k-vectors. The Poynting vector $\mathbf{S}$ for purely ordinary polarized light points in the same direction as the k-vector, so the direction of energy flow for ordinary polarized light also obeys Snell's law. However, for extraordinary polarized light, the Poynting vector $\mathbf{S}$ is not parallel to $\mathbf{k}$ (recall the discussion in connection with (5.5) and (5.6)). Thus, the energy flow associated with extraordinary polarized light does not obey Snell's law. When Christiaan Huygens saw this in the 1600s, he exclaimed "how extraordinary!" Huygens' method for describing the phenomenon is outlined in appendix 5.D. To analyze this situation, it is necessary to derive an expression for extraordinary polarized light similar to Snell's law, but which applies to $\mathbf{S}$ rather than to $\mathbf{k}$. This describes the direction that the energy associated with extraordinary rays takes upon entering the crystal. To calculate the direction that the extraordinary polarized $\mathbf{S}$ takes upon entering a crystal, we first calculate the direction of $\mathbf{k}$ inside the crystal using Snell's law (5.31). Then we use the expression (5.62) for $\mathbf{E}$ along with $\mathbf{B} = (\mathbf{k}\times\mathbf{E})/\omega$ to evaluate $\mathbf{S} = \mathbf{E}\times\mathbf{B}/\mu_0$. In general, this process is best done numerically, since Snell's law (5.31) for extraordinary polarized light usually does not have simple analytic solutions.
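Example 5.3
Find a relationship between the direction of the Poynting vector in a uniaxial crystal and the angle of incidence in the special case where the optic axis is perpendicular to the surface.
Solution: To find the direction of energy flow, we must calculate $\mathbf{S} = \mathbf{E}\times\mathbf{B}/\mu_0$. We will need to know $\mathbf{E}$ associated with $n_e(\theta)$. We can obtain $\mathbf{E}$ from the procedures outlined in appendix 5.C. Equivalently, we can obtain it from the constitutive relation (5.3) with the definitions (5.18):
$$\epsilon_0\mathbf{E} + \mathbf{P} = \epsilon_0\left[(1+\chi_x)E_x\hat{\mathbf{x}} + (1+\chi_y)E_y\hat{\mathbf{y}} + (1+\chi_z)E_z\hat{\mathbf{z}}\right] = \epsilon_0\left[n_o^2E_x\hat{\mathbf{x}} + n_o^2E_y\hat{\mathbf{y}} + n_e^2E_z\hat{\mathbf{z}}\right] \tag{5.33}$$
Let the k-vector lie in the $y$-$z$ plane. We may write it as $\mathbf{k} = k\left(\hat{\mathbf{y}}\sin\theta_t + \hat{\mathbf{z}}\cos\theta_t\right)$. Then the ordinary component of the field points in the $x$-direction, while the extraordinary component lies in the $y$-$z$ plane. Gauss's law (5.5), applied to (5.33), requires
$$\mathbf{k}\cdot(\epsilon_0\mathbf{E} + \mathbf{P}) = k\left(\hat{\mathbf{y}}\sin\theta_t + \hat{\mathbf{z}}\cos\theta_t\right)\cdot\epsilon_0\left[n_o^2E_x\hat{\mathbf{x}} + n_o^2E_y\hat{\mathbf{y}} + n_e^2E_z\hat{\mathbf{z}}\right] = \epsilon_0k\left(n_o^2E_y\sin\theta_t + n_e^2E_z\cos\theta_t\right) = 0 \tag{5.34}$$
Therefore, the $y$ and $z$ components of the extraordinary field are related through
$$E_z = -\frac{n_o^2E_y}{n_e^2}\tan\theta_t \tag{5.35}$$
We may then write the extraordinary polarized electric field as
$$\mathbf{E} = E_y\left(\hat{\mathbf{y}} - \hat{\mathbf{z}}\frac{n_o^2}{n_e^2}\tan\theta_t\right)e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \quad \text{(extraordinary polarized)} \tag{5.36}$$
The associated magnetic field (see (2.56)) is
$$\mathbf{B} = \frac{\mathbf{k}\times\mathbf{E}}{\omega} = \frac{kE_y}{\omega}\left(\hat{\mathbf{y}}\sin\theta_t + \hat{\mathbf{z}}\cos\theta_t\right)\times\left(\hat{\mathbf{y}} - \hat{\mathbf{z}}\frac{n_o^2}{n_e^2}\tan\theta_t\right)e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} = -\hat{\mathbf{x}}\frac{kE_y}{\omega}\left(\frac{n_o^2}{n_e^2}\sin\theta_t\tan\theta_t + \cos\theta_t\right)e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)} \quad \text{(extraordinary polarized)} \tag{5.37}$$
The time-averaged Poynting vector then becomes
$$\langle\mathbf{S}\rangle_t = \left\langle \mathrm{Re}\{\mathbf{E}\}\times\frac{\mathrm{Re}\{\mathbf{B}\}}{\mu_0}\right\rangle_t = \frac{k|E_y|^2}{2\mu_0\omega}\left(\frac{n_o^2}{n_e^2}\sin\theta_t\tan\theta_t + \cos\theta_t\right)\left(\hat{\mathbf{y}}\frac{n_o^2}{n_e^2}\tan\theta_t + \hat{\mathbf{z}}\right) \tag{5.38}$$
Let us label the direction of the Poynting vector with the angle $\theta_S$. By definition, the tangent of this angle is the ratio of the two vector components of $\mathbf{S}$:
$$\tan\theta_S \equiv \frac{S_y}{S_z} = \frac{n_o^2}{n_e^2}\tan\theta_t \tag{5.39}$$
While the k-vector is characterized by the angle $\theta_t$, the Poynting vector is characterized by the angle $\theta_S$. Combining (5.32) and (5.39), we can connect $\theta_S$ to the incident angle $\theta_i$:
$$\tan\theta_S = \frac{n_o\sin\theta_i}{n_e\sqrt{n_e^2 - \sin^2\theta_i}} \tag{5.40}$$
As we noted in the last example, we have the case where ordinary polarized light is $s$-polarized light, and extraordinary polarized light is $p$-polarized light due to our specific choice of orientation for the optic axis in this section. In general, the $s$- and $p$-polarized portions of the incident light can each give rise to both extraordinary and ordinary rays.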
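The result of this example can be used to estimate the double-image separation mentioned earlier. The Python sketch below is illustrative only: the function names and the slab `thickness` parameter are hypothetical, it assumes the optic axis is perpendicular to the surface, and it takes the energy-flow angle of the extraordinary ray as $\tan\theta_S = n_o\sin\theta_i/\bigl(n_e\sqrt{n_e^2 - \sin^2\theta_i}\bigr)$, while the ordinary ray follows ordinary Snell refraction, $\sin\theta_i = n_o\sin\theta_t$.

```python
import math


def ordinary_ray_angle(theta_i, n_o):
    """Refraction angle of the ordinary ray from Snell's law (5.30)."""
    return math.asin(math.sin(theta_i) / n_o)


def extraordinary_energy_angle(theta_i, n_o, n_e):
    """Poynting-vector angle theta_S of the extraordinary ray, from (5.40)."""
    s = math.sin(theta_i)
    return math.atan(n_o * s / (n_e * math.sqrt(n_e ** 2 - s ** 2)))


def image_separation(theta_i, n_o, n_e, thickness):
    """Lateral offset between the two rays at the exit face of a slab.

    The offset is measured along the surface, in the same units as `thickness`.
    """
    return thickness * (math.tan(extraordinary_energy_angle(theta_i, n_o, n_e))
                        - math.tan(ordinary_ray_angle(theta_i, n_o)))


if __name__ == "__main__":
    n_o, n_e = 1.68, 1.56  # sample values (BBO, from the caption of Fig. 5.5)
    theta_i = math.radians(45.0)
    print("theta_S (deg):", math.degrees(extraordinary_energy_angle(theta_i, n_o, n_e)))
    print("offset for a 10 mm slab (mm):", image_separation(theta_i, n_o, n_e, 10.0))
```

As expected, the separation vanishes at normal incidence and whenever $n_o = n_e$, since in both cases the two rays coincide.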
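$$\begin{bmatrix} x \\ y \\ z \end{bmatrix} = \frac{\epsilon_0}{Nq_e^2}\begin{bmatrix} \chi_{xx} & \chi_{xy} & \chi_{xz} \\ \chi_{yx} & \chi_{yy} & \chi_{yz} \\ \chi_{zx} & \chi_{zy} & \chi_{zz} \end{bmatrix}\begin{bmatrix} F_x \\ F_y \\ F_z \end{bmatrix} \tag{5.41}$$
The column vector on the left represents the components of the displacement $\mathbf{r}$. We next invert (5.41) to find the force of the electric field on an electron as a function of its displacement4
$$\begin{bmatrix} F_x \\ F_y \\ F_z \end{bmatrix} = \begin{bmatrix} k_{xx} & k_{xy} & k_{xz} \\ k_{yx} & k_{yy} & k_{yz} \\ k_{zx} & k_{zy} & k_{zz} \end{bmatrix}\begin{bmatrix} x \\ y \\ z \end{bmatrix} \tag{5.42}$$
where
$$\begin{bmatrix} k_{xx} & k_{xy} & k_{xz} \\ k_{yx} & k_{yy} & k_{yz} \\ k_{zx} & k_{zy} & k_{zz} \end{bmatrix} \equiv \frac{Nq_e^2}{\epsilon_0}\begin{bmatrix} \chi_{xx} & \chi_{xy} & \chi_{xz} \\ \chi_{yx} & \chi_{yy} & \chi_{yz} \\ \chi_{zx} & \chi_{zy} & \chi_{zz} \end{bmatrix}^{-1} \tag{5.43}$$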
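The total work done on an electron in moving it to its displaced position is given by
$$W = \int_{\rm path}\mathbf{F}(\mathbf{r}')\cdot d\mathbf{r}' \tag{5.44}$$
While there are many possible paths for getting the electron to any specific displacement (each path specified by a different history of the electric field), the work done along any of these paths must be the same if the system is conservative (i.e. no absorption). For example, for a final displacement of $\mathbf{r} = x\hat{\mathbf{x}} + y\hat{\mathbf{y}}$ we could have the following two paths. Path 1 runs from $(0,0,0)$ along the $x$-axis to $(x,0,0)$ and then parallel to the $y$-axis to $(x,y,0)$. Path 2 runs from $(0,0,0)$ along the $y$-axis to $(0,y,0)$ and then parallel to the $x$-axis to $(x,y,0)$.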
4 This inversion assumes the field changes slowly so the forces on the electron are always essentially balanced. This is not true for optical fields, but the proof gives the right flavor for why conservation of energy results in the symmetry. A more formal proof that doesn't make this assumption can be found in Principles of Optics, 7th Ed., Born and Wolf, pp. 790-791.
We can use (5.42) in (5.44) to calculate the total work done on the electron along path 1:
$$W = \int_0^x F_x(x', y'=0, z'=0)\,dx' + \int_0^y F_y(x'=x, y', z'=0)\,dy' = \int_0^x k_{xx}x'\,dx' + \int_0^y\left(k_{yx}x + k_{yy}y'\right)dy' = \frac{k_{xx}}{2}x^2 + k_{yx}xy + \frac{k_{yy}}{2}y^2$$
Along path 2, the total work is
$$W = \int_0^y F_y(x'=0, y', z'=0)\,dy' + \int_0^x F_x(x', y'=y, z'=0)\,dx' = \int_0^y k_{yy}y'\,dy' + \int_0^x\left(k_{xx}x' + k_{xy}y\right)dx' = \frac{k_{yy}}{2}y^2 + k_{xy}xy + \frac{k_{xx}}{2}x^2$$
Since the work must be the same for these two paths, we clearly have $k_{xy} = k_{yx}$. Similar arguments for other pairs of dimensions ensure that the matrix of $k$ coefficients is symmetric. From linear algebra, we learn that if the inverse of a matrix is symmetric then the matrix itself is also symmetric. When we combine this result with the definition (5.43), we see that the assumption of no absorption requires the susceptibility matrix to be symmetric.
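$$\mathbf{P} = \epsilon_0\boldsymbol{\chi}\mathbf{E} \tag{5.45}$$
where
$$\boldsymbol{\chi} \equiv \begin{bmatrix} \chi_{xx} & \chi_{xy} & \chi_{xz} \\ \chi_{xy} & \chi_{yy} & \chi_{yz} \\ \chi_{xz} & \chi_{yz} & \chi_{zz} \end{bmatrix} \tag{5.46}$$
Our task is to find a new coordinate system $x'$, $y'$, and $z'$ for which the susceptibility tensor is diagonal. That is, we want to choose $x'$, $y'$, and $z'$ such that
$$\mathbf{P}' = \epsilon_0\boldsymbol{\chi}'\mathbf{E}' \tag{5.47}$$
where
$$\mathbf{E}' \equiv \begin{bmatrix} E'_{x'} \\ E'_{y'} \\ E'_{z'} \end{bmatrix}, \qquad \mathbf{P}' \equiv \begin{bmatrix} P'_{x'} \\ P'_{y'} \\ P'_{z'} \end{bmatrix}, \qquad \boldsymbol{\chi}' \equiv \begin{bmatrix} \chi'_{x'x'} & 0 & 0 \\ 0 & \chi'_{y'y'} & 0 \\ 0 & 0 & \chi'_{z'z'} \end{bmatrix} \tag{5.48}$$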
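To arrive at the new coordinate system, we are free to make pure rotation transformations. In a manner similar to (6.29), a rotation through an angle $\gamma$ about the $z$-axis, followed by a rotation through an angle $\beta$ about the resulting $y$-axis, and finally a rotation through an angle $\alpha$ about the new $x$-axis, can be written as
$$\mathbf{R} \equiv \begin{bmatrix} R_{11} & R_{12} & R_{13} \\ R_{21} & R_{22} & R_{23} \\ R_{31} & R_{32} & R_{33} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & \sin\alpha \\ 0 & -\sin\alpha & \cos\alpha \end{bmatrix}\begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix}\begin{bmatrix} \cos\gamma & \sin\gamma & 0 \\ -\sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
$$= \begin{bmatrix} \cos\beta\cos\gamma & \cos\beta\sin\gamma & \sin\beta \\ -\cos\alpha\sin\gamma - \sin\alpha\sin\beta\cos\gamma & \cos\alpha\cos\gamma - \sin\alpha\sin\beta\sin\gamma & \sin\alpha\cos\beta \\ \sin\alpha\sin\gamma - \cos\alpha\sin\beta\cos\gamma & -\sin\alpha\cos\gamma - \cos\alpha\sin\beta\sin\gamma & \cos\alpha\cos\beta \end{bmatrix} \tag{5.49}$$
The matrix $\mathbf{R}$ produces an arbitrary rotation of coordinates in three dimensions. Specifically, we can write:
$$\mathbf{E}' = \mathbf{R}\mathbf{E}, \qquad \mathbf{P}' = \mathbf{R}\mathbf{P} \tag{5.50}$$
These transformations can be inverted to give
$$\mathbf{E} = \mathbf{R}^{-1}\mathbf{E}', \qquad \mathbf{P} = \mathbf{R}^{-1}\mathbf{P}' \tag{5.51}$$
where
$$\mathbf{R}^{-1} = \begin{bmatrix} \cos\beta\cos\gamma & -\cos\alpha\sin\gamma - \sin\alpha\sin\beta\cos\gamma & \sin\alpha\sin\gamma - \cos\alpha\sin\beta\cos\gamma \\ \cos\beta\sin\gamma & \cos\alpha\cos\gamma - \sin\alpha\sin\beta\sin\gamma & -\sin\alpha\cos\gamma - \cos\alpha\sin\beta\sin\gamma \\ \sin\beta & \sin\alpha\cos\beta & \cos\alpha\cos\beta \end{bmatrix} = \begin{bmatrix} R_{11} & R_{21} & R_{31} \\ R_{12} & R_{22} & R_{32} \\ R_{13} & R_{23} & R_{33} \end{bmatrix} = \mathbf{R}^T \tag{5.52}$$
Note that the inverse of the rotation matrix is the same as its transpose, an important feature that we exploit in what follows. Upon inserting (5.51) into (5.45) we have
$$\mathbf{R}^{-1}\mathbf{P}' = \epsilon_0\boldsymbol{\chi}\mathbf{R}^{-1}\mathbf{E}' \tag{5.53}$$
or
$$\mathbf{P}' = \epsilon_0\mathbf{R}\boldsymbol{\chi}\mathbf{R}^{-1}\mathbf{E}' \tag{5.54}$$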
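From this equation we see that the new susceptibility tensor we seek for (5.47) is
$$\boldsymbol{\chi}' \equiv \mathbf{R}\boldsymbol{\chi}\mathbf{R}^{-1} = \begin{bmatrix} R_{11} & R_{12} & R_{13} \\ R_{21} & R_{22} & R_{23} \\ R_{31} & R_{32} & R_{33} \end{bmatrix}\begin{bmatrix} \chi_{xx} & \chi_{xy} & \chi_{xz} \\ \chi_{xy} & \chi_{yy} & \chi_{yz} \\ \chi_{xz} & \chi_{yz} & \chi_{zz} \end{bmatrix}\begin{bmatrix} R_{11} & R_{21} & R_{31} \\ R_{12} & R_{22} & R_{32} \\ R_{13} & R_{23} & R_{33} \end{bmatrix} = \begin{bmatrix} \chi'_{x'x'} & \chi'_{x'y'} & \chi'_{x'z'} \\ \chi'_{x'y'} & \chi'_{y'y'} & \chi'_{y'z'} \\ \chi'_{x'z'} & \chi'_{y'z'} & \chi'_{z'z'} \end{bmatrix} \tag{5.55}$$
We have expressly indicated that the off-diagonal terms of $\boldsymbol{\chi}'$ are symmetric (i.e. $\chi'_{ij} = \chi'_{ji}$). This can be verified by performing the multiplication in (5.55). It is a consequence of $\boldsymbol{\chi}$ being symmetric and $\mathbf{R}^{-1}$ being equal to $\mathbf{R}^T$. The three off-diagonal elements of $\boldsymbol{\chi}'$ (appearing both above and below the diagonal) are found by performing the matrix multiplication in the second line of (5.55). The specific expressions for these three elements are not particularly enlightening. The important point is that we can make all three of them equal to zero since we have three degrees of freedom in the angles $\alpha$, $\beta$, and $\gamma$. Although we do not expressly solve for the angles, we have demonstrated that it is always possible to set
$$\chi'_{x'y'} = 0, \qquad \chi'_{x'z'} = 0, \qquad \chi'_{y'z'} = 0 \tag{5.56}$$
This justifies (5.3).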
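The existence of such a rotation can also be illustrated numerically. The sketch below is not the appendix's method: rather than solving for the three rotation angles, it uses NumPy's symmetric eigen-solver `numpy.linalg.eigh`, whose orthonormal eigenvectors form the rows of a rotation matrix $\mathbf{R}$ for which $\mathbf{R}\boldsymbol{\chi}\mathbf{R}^T$ is diagonal. The tensor entries are made-up numbers chosen only for illustration.

```python
import numpy as np

# Hypothetical real, symmetric susceptibility tensor (illustrative values only).
chi = np.array([[1.20, 0.10, 0.05],
                [0.10, 1.50, 0.20],
                [0.05, 0.20, 1.90]])

# eigh is intended for symmetric (Hermitian) matrices; it returns real
# eigenvalues and orthonormal eigenvectors stored as columns.
eigenvalues, eigenvectors = np.linalg.eigh(chi)

# Use the eigenvectors as the rows of R, so that E' = R E as in (5.50).
R = eigenvectors.T.copy()

# An orthogonal matrix may have determinant -1 (a rotation plus a reflection).
# Flipping the sign of one row makes it a pure rotation without affecting
# the diagonal form of R chi R^T.
if np.linalg.det(R) < 0:
    R[0, :] *= -1.0

chi_prime = R @ chi @ R.T  # R^{-1} = R^T for a rotation matrix

if __name__ == "__main__":
    np.set_printoptions(precision=6, suppress=True)
    print("chi' =\n", chi_prime)
    print("principal susceptibilities:", eigenvalues)
```

The diagonal entries of `chi_prime` play the role of $\chi_x$, $\chi_y$, and $\chi_z$ in (5.3), and the rows of `R` give the principal axes expressed in the original coordinates.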
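$$\begin{bmatrix} \frac{\omega^2}{c^2}(1+\chi_x) - k_y^2 - k_z^2 & k_xk_y & k_xk_z \\ k_xk_y & \frac{\omega^2}{c^2}(1+\chi_y) - k_x^2 - k_z^2 & k_yk_z \\ k_xk_z & k_yk_z & \frac{\omega^2}{c^2}(1+\chi_z) - k_x^2 - k_y^2 \end{bmatrix}\begin{bmatrix} E_x \\ E_y \\ E_z \end{bmatrix} = 0 \tag{5.57}$$
where we have used $k_x^2 + k_y^2 + k_z^2 = k^2$. We can divide every element by $k^2$ and employ the definitions (5.15), (5.17), and (5.18) to make this matrix equation look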
5 A. Yariv and P. Yeh, Optical Waves in Crystals, Sect. 4.2 (New York: Wiley, 1984).
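slightly nicer:
$$\begin{bmatrix} \frac{n_x^2}{n^2} - u_y^2 - u_z^2 & u_xu_y & u_xu_z \\ u_xu_y & \frac{n_y^2}{n^2} - u_x^2 - u_z^2 & u_yu_z \\ u_xu_z & u_yu_z & \frac{n_z^2}{n^2} - u_x^2 - u_y^2 \end{bmatrix}\begin{bmatrix} E_x \\ E_y \\ E_z \end{bmatrix} = 0 \tag{5.58}$$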
For (5.58) to have a non-trivial solution (i.e. non-zero fields), the determinant of the matrix must be zero. Imposing this requirement is an equivalent way to derive Fresnel's equation (5.19) for $n$. Given a direction for $\hat{\mathbf{u}}$ and a value for $n$ (from Fresnel's equation), we can use (5.58) to determine the direction of the electric field associated with that index. It is left as an exercise to show that when all three components are nonzero (i.e. $u_x \neq 0$, $u_y \neq 0$, and $u_z \neq 0$), the appropriate field direction for a value of $n$ is given by
$$\begin{bmatrix} E_x \\ E_y \\ E_z \end{bmatrix} \propto \begin{bmatrix} \dfrac{u_x}{n^2 - n_x^2} \\ \dfrac{u_y}{n^2 - n_y^2} \\ \dfrac{u_z}{n^2 - n_z^2} \end{bmatrix} \tag{5.59}$$
This is a proportionality rather than an equation because Maxwell's equations only specify the direction of $\mathbf{E}$; we are free to choose the amplitude. Because Fresnel's equation gives two values for $n$, (5.59) specifies two distinct polarization components associated with each propagation direction $\hat{\mathbf{u}}$. These polarization components form a natural basis for describing light propagation in a crystal. When light is composed of a mixture of these two polarizations, the two polarization components experience different indices of refraction. If any of the components of $\hat{\mathbf{u}}$ (i.e. $u_x$, $u_y$, or $u_z$) is precisely zero, the corresponding entry in (5.59) yields a zero-over-zero situation. This happens when at least one of the dimensions in (5.58) becomes decoupled from the others. In these cases, you can go back and re-solve (5.58) for the polarization directions as in the following example.
Example 5.4
Determine the directions of the two polarization components associated with light propagating in the $\hat{\mathbf{u}} = \hat{\mathbf{z}}$ direction. (Compare with Example 5.1.)
Solution: In this case we have $u_x = u_y = 0$, so as noted above, we have to go back to (5.58) and re-solve. In our case, the set of equations becomes
$$\begin{bmatrix} \frac{n_x^2}{n^2} - 1 & 0 & 0 \\ 0 & \frac{n_y^2}{n^2} - 1 & 0 \\ 0 & 0 & \frac{n_z^2}{n^2} \end{bmatrix}\begin{bmatrix} E_x \\ E_y \\ E_z \end{bmatrix} = 0 \tag{5.60}$$
Notice that all three dimensions are decoupled in this system (i.e. there are no off-diagonal terms). In Example 5.1 we found that the two values of $n$ associated with $\hat{\mathbf{u}} = \hat{\mathbf{z}}$ are $n_x$ and $n_y$. If we use $n = n_x$ in our set of equations, we have
$$\begin{bmatrix} 0 & 0 & 0 \\ 0 & \frac{n_y^2}{n_x^2} - 1 & 0 \\ 0 & 0 & \frac{n_z^2}{n_x^2} \end{bmatrix}\begin{bmatrix} E_x \\ E_y \\ E_z \end{bmatrix} = 0$$
Assuming $n_x$ and $n_y$ are unique so that $n_y/n_x \neq 1$, these equations require $E_y = E_z = 0$ but allow $E_x$ to be non-zero. This proves our earlier assertion that the index $n_x$ is associated with light polarized in the $x$-dimension in the special case of $\hat{\mathbf{u}} = \hat{\mathbf{z}}$. Similarly, when $n_y$ is inserted into (5.60), we find that it is associated with light polarized in the $y$-dimension.
We can use (5.59) to study the behavior of polarization direction as the direction of propagation varies. Figure 5.7 shows plots of the polarization direction (i.e. normalized $E_x$, $E_y$, and $E_z$) in potassium niobate as the propagation direction is varied. The plot is created by inserting the spherical representation of $\hat{\mathbf{u}}$ (5.24) into Fresnel's equation (5.20) for a chosen sign of the $\pm$, and then inserting the resulting $n$ into (5.59) to find the associated electric field. As we saw in Example 5.4, at $\theta = 0$ the light associated with the slow index is polarized along the $y$-axis and the light associated with the fast index is polarized along the $x$-axis. In Fig. 5.7(c) we have plotted the angle between the two polarization components. At $\theta = 0$, the two polarization components are 90° apart, as you might expect. However, notice that in other propagation directions the two linear polarization components are not precisely perpendicular. Even so, the two polarization components of $\mathbf{E}$ are orthogonal in a mathematical sense,6 so that they still comprise a useful basis for decomposing the light field.
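The recipe described above can be sketched in a few lines of Python. This is an illustrative sketch, not the code used for Fig. 5.7: the helper names are our own, the direction $(\theta, \phi)$ is an arbitrary choice with all three components of $\hat{\mathbf{u}}$ nonzero, and $\hat{\mathbf{u}}$ is assumed to have the standard spherical form $(\sin\theta\cos\phi, \sin\theta\sin\phi, \cos\theta)$. It also checks that the corresponding displacement vectors $\mathbf{D} = \epsilon_0\mathbf{E} + \mathbf{P}$, whose components are proportional to $n_i^2E_i$ in the principal frame, are mutually perpendicular and perpendicular to $\hat{\mathbf{u}}$.

```python
import math


def fresnel_n_squared(u, n_axes):
    """Both roots n^2 of Fresnel's equation, via the quadratic form (5.20)."""
    ux2, uy2, uz2 = (c * c for c in u)
    nx2, ny2, nz2 = (n * n for n in n_axes)
    A = ux2 * nx2 + uy2 * ny2 + uz2 * nz2
    B = (ux2 * nx2 * (ny2 + nz2) + uy2 * ny2 * (nx2 + nz2)
         + uz2 * nz2 * (nx2 + ny2))
    C = nx2 * ny2 * nz2
    root = math.sqrt(max(B * B - 4.0 * A * C, 0.0))
    return (B - root) / (2.0 * A), (B + root) / (2.0 * A)


def field_direction(u, n_axes, n2):
    """Unit vector along E from (5.59); requires all components of u nonzero."""
    E = [ui / (n2 - ni * ni) for ui, ni in zip(u, n_axes)]
    norm = math.sqrt(sum(e * e for e in E))
    return [e / norm for e in E]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Assumed inputs: indices from the caption of Fig. 5.7; the direction is arbitrary.
n_axes = (2.22, 2.34, 2.41)
theta, phi = math.pi / 3, math.pi / 4
u = (math.sin(theta) * math.cos(phi),
     math.sin(theta) * math.sin(phi),
     math.cos(theta))

n2_fast, n2_slow = fresnel_n_squared(u, n_axes)
E_fast = field_direction(u, n_axes, n2_fast)
E_slow = field_direction(u, n_axes, n2_slow)

# D = eps0 E + P has components proportional to n_i^2 E_i in the principal frame.
D_fast = [ni * ni * e for ni, e in zip(n_axes, E_fast)]
D_slow = [ni * ni * e for ni, e in zip(n_axes, E_slow)]

angle_between_E = math.degrees(math.acos(dot(E_fast, E_slow)))
print("angle between E directions (deg):", angle_between_E)
print("D_fast . D_slow:", dot(D_fast, D_slow))
```

Scanning $\theta$ in a loop and recording the normalized components of `E_fast` and `E_slow` would produce curves of the kind described for Fig. 5.7, provided directions where a component of $\hat{\mathbf{u}}$ vanishes are handled separately.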
(5.61)
This field component is associated with the ordinary wave because just as in an isotropic medium such as glass, the index of refraction for light with this polarization does not vary with $\theta$. The polarization component associated with $n_e(\theta)$ is
6 The two components of the electric displacement vector $\mathbf{D} = \epsilon_0\mathbf{E} + \mathbf{P}$ remain perpendicular.
Figure 5.7 Polarization direction associated with the two values of $n$ in potassium niobate (KNbO3) at $\lambda = 500$ nm ($n_x = 2.22$, $n_y = 2.34$, and $n_z = 2.41$) and $\phi = \pi/4$. Frame (c) shows the angle between the two polarization components.
sin cos
(5.62)
Notice that this polarization component is partially directed along the optic axis (i.e. it has a $z$-component), and it is not perpendicular to $\mathbf{k}$ since $\hat{\mathbf{u}} \cdot \mathbf{E}_e(\hat{\mathbf{u}}) \neq 0$ (see P5.10). It is, however, perpendicular to the ordinary polarization component, since $\mathbf{E}_e \cdot \mathbf{E}_o = 0$. Notice that when $\theta = 0$, (5.29) reduces to $n = n_o$ so that both indices are the same. On the other hand, if $\theta = \pi/2$ then (5.29) reduces to $n = n_e$.
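The two limits quoted for (5.29) are easy to confirm numerically. The sketch below assumes (5.29) has the form $n_e(\theta) = n_o n_e/\sqrt{n_o^2\sin^2\theta + n_e^2\cos^2\theta}$, which is the form given in problem R28, and borrows the quartz indices from P5.6 purely as sample values. The function name is hypothetical.

```python
import numpy as np

def n_extraordinary(theta, n_o, n_e):
    """Angle-dependent extraordinary index (assumed form of (5.29))."""
    return n_o * n_e / np.sqrt(n_o**2 * np.sin(theta)**2
                               + n_e**2 * np.cos(theta)**2)

n_o, n_e = 1.54424, 1.55335  # quartz values quoted in P5.6

n_along_axis = n_extraordinary(0.0, n_o, n_e)        # propagation along the optic axis
n_perp_axis = n_extraordinary(np.pi / 2, n_o, n_e)   # propagation perpendicular to it

print(n_along_axis, n_perp_axis)
```

Between these limits the index varies monotonically from $n_o$ to $n_e$.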
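by the hypotenuse of the right triangle seen in Fig. 5.8. Let the point where the wave front touches the ellipse be denoted by $(y, z) = (z\tan\theta_S, z)$. The slope (rise over run) of the line that connects these two points is then
$$\frac{dz}{dy} = -\frac{z}{ct/\sin\theta_i - z\tan\theta_S} \tag{5.65}$$
At the point where the wave front touches the ellipse (i.e., $(y, z) = (z\tan\theta_S, z)$), the slope of the curve for the ellipse is
$$\frac{dz}{dy} = -\frac{y\,n_e^2}{n_o\,ct\sqrt{1 - \dfrac{y^2}{(ct/n_e)^2}}} = -\frac{y\,n_e^2}{z\,n_o^2} = -\frac{n_e^2}{n_o^2}\tan\theta_S \tag{5.66}$$
We would like these two slopes to be the same. We therefore set them equal to each other:
$$-\frac{n_e^2}{n_o^2}\tan\theta_S = -\frac{z}{ct/\sin\theta_i - z\tan\theta_S} \quad\Rightarrow\quad \frac{n_e^2}{n_o^2}\,\frac{ct}{z}\,\frac{\tan\theta_S}{\sin\theta_i} = \frac{n_e^2}{n_o^2}\tan^2\theta_S + 1 \tag{5.67}$$
Evaluating the equation for the ellipse at the point $(y, z) = (z\tan\theta_S, z)$ gives
$$\frac{ct}{z} = n_o\sqrt{\frac{n_e^2}{n_o^2}\tan^2\theta_S + 1} \tag{5.68}$$
Substituting this into (5.67) yields
$$\frac{n_e^2}{n_o}\,\frac{\tan\theta_S}{\sin\theta_i} = \sqrt{\frac{n_e^2}{n_o^2}\tan^2\theta_S + 1} \quad\Rightarrow\quad \frac{n_e^4\tan^2\theta_S}{n_o^2\sin^2\theta_i} = \frac{n_e^2}{n_o^2}\tan^2\theta_S + 1 \tag{5.69}$$
$$\tan^2\theta_S = \frac{n_o^2\sin^2\theta_i}{n_e^2\left(n_e^2 - \sin^2\theta_i\right)} \quad\Rightarrow\quad \tan\theta_S = \frac{n_o\sin\theta_i}{n_e\sqrt{n_e^2 - \sin^2\theta_i}} \tag{5.70}$$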
This agrees with (5.40) as anticipated. Again, Huygens' approach obtained the correct direction of the Poynting vector associated with the extraordinary wave.
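The tangent construction can be checked numerically. The sketch below is an illustration under stated assumptions: the ellipse has semi-axes $ct/n_e$ along $y$ and $ct/n_o$ along $z$, the wave front meets the $y$-axis at $ct/\sin\theta_i$, and the closed form being tested is $\tan\theta_S = n_o\sin\theta_i/(n_e\sqrt{n_e^2-\sin^2\theta_i})$. It sets $ct = 1$ and finds, by simple bisection, the value of $\tan\theta_S$ at which the wave-front slope equals the ellipse slope. The function names, the quartz indices, and the $40^\circ$ incidence angle are hypothetical choices.

```python
import math

def tan_theta_s_closed_form(theta_i, n_o, n_e):
    """Closed-form result assumed for the end of the derivation."""
    s = math.sin(theta_i)
    return n_o * s / (n_e * math.sqrt(n_e**2 - s**2))

def slope_mismatch(T, theta_i, n_o, n_e):
    """Wave-front slope minus ellipse slope at trial T = tan(theta_S), with ct = 1."""
    z = 1.0 / math.sqrt(n_e**2 * T**2 + n_o**2)        # touching point lies on the ellipse
    slope_line = -z / (1.0 / math.sin(theta_i) - z * T)
    slope_ellipse = -(n_e**2 / n_o**2) * T
    return slope_line - slope_ellipse

def tan_theta_s_numeric(theta_i, n_o, n_e, lo=0.0, hi=10.0, steps=200):
    """Bisection on the slope mismatch; assumes a sign change on [lo, hi]."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if slope_mismatch(lo, theta_i, n_o, n_e) * slope_mismatch(mid, theta_i, n_o, n_e) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

n_o, n_e = 1.54424, 1.55335      # sample uniaxial indices (quartz)
theta_i = math.radians(40.0)     # arbitrary incidence angle

T_numeric = tan_theta_s_numeric(theta_i, n_o, n_e)
T_closed = tan_theta_s_closed_form(theta_i, n_o, n_e)
print(T_numeric, T_closed)
```

The two values agree to rounding error for the sample inputs, which is a useful sanity check on the algebra.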
Exercises
Exercises for 5.2 Plane Wave Propagation in Crystals
P5.1 Solve Fresnel's equation (5.19) to find the two values of $n$ associated with a given $\hat{\mathbf{u}}$. Show that both solutions yield a positive index of refraction.
HINT: Show that (5.19) can be manipulated into the form
$$\begin{aligned}
0 ={}& \left[\left(u_x^2 + u_y^2 + u_z^2\right) - 1\right] n^6 \\
&+ \left[\left(n_x^2 + n_y^2 + n_z^2\right) - u_x^2\left(n_y^2 + n_z^2\right) - u_y^2\left(n_x^2 + n_z^2\right) - u_z^2\left(n_x^2 + n_y^2\right)\right] n^4 \\
&- \left[\left(n_x^2 n_y^2 + n_x^2 n_z^2 + n_y^2 n_z^2\right) - u_x^2 n_y^2 n_z^2 - u_y^2 n_x^2 n_z^2 - u_z^2 n_x^2 n_y^2\right] n^2 + n_x^2 n_y^2 n_z^2
\end{aligned}$$
The coefficient of $n^6$ is identically zero since by definition we have $u_x^2 + u_y^2 + u_z^2 = 1$.
P5.2 Suppose you have a crystal with $n_x = 1.5$, $n_y = 1.6$, and $n_z = 2.0$. Use Fresnel's equation to determine what the two indices of refraction are for a k-vector in the crystal along the $\hat{\mathbf{u}} = (\hat{\mathbf{x}} + 2\hat{\mathbf{y}} + 3\hat{\mathbf{z}})/\sqrt{14}$ direction.
Exercises for 5.3 Biaxial and Uniaxial Crystals
P5.3 Given that the optic axes are in the $x$-$z$ plane, show that the direction of the optic axes are given by (5.25).
HINT: The two indices are the same when $B^2 - 4AC = 0$. You will want to use polar coordinates for the direction unit vector, as in (5.24). Set $\phi = 0$ so you are in the $x$-$z$ plane. Use $\sin^2\theta + \cos^2\theta = 1$ to get an equation that only has cosine terms and solve for $\cos^2\theta$.
P5.4 Use definitions (5.26) and (5.27) along with the spherical representation of $\hat{\mathbf{u}}$ (5.24) in Fresnel's equation (5.20) to calculate the two values for the index in a uniaxial crystal (i.e. (5.28) and (5.29)).
HINT: First show that
$$\begin{aligned}
A &= n_o^2\sin^2\theta + n_e^2\cos^2\theta \\
B &= n_o^2 n_e^2 + n_o^4\sin^2\theta + n_e^2 n_o^2\cos^2\theta \\
C &= n_o^4 n_e^2
\end{aligned}$$
and then use these expressions to evaluate Fresnel's equation.
P5.5 Derive (5.32).
P5.6
Suppose you have a quartz plate (a uniaxial crystal) with its optic axis oriented perpendicular to the surfaces. The indices of refraction for quartz are $n_o = 1.54424$ and $n_e = 1.55335$. A plane wave with wavelength $\lambda_{\rm vac} = 633$ nm passes through the plate. After emerging from the crystal, there is a phase difference $\Delta$ between the two polarization components of the plane wave, and this phase difference depends on incident angle $\theta_i$. Use a computer to plot $\Delta$ as a function of incident angle from zero to $90^\circ$ for a plate with thickness $d = 0.96$ mm.
HINT: For $s$-polarized light, show that the number of wavelengths that fit in the plate is
$$\frac{d}{(\lambda_{\rm vac}/n_o)\cos\theta_t^{(s)}}$$
For $p$-polarized light, show that the number of wavelengths that fit in the plate and the extra leg $\delta$ outside of the plate (see Fig. 5.9) is
$$\frac{d}{(\lambda_{\rm vac}/n_p)\cos\theta_t^{(p)}} + \frac{\delta}{\lambda_{\rm vac}}, \qquad \text{where} \quad \delta = d\left[\tan\theta_t^{(s)} - \tan\theta_t^{(p)}\right]\sin\theta_i$$
and $n_p$ is given by (5.29). Find the difference between these expressions and multiply by $2\pi$ to find $\Delta$.
L5.7 In the laboratory, send a HeNe laser ($\lambda_{\rm vac} = 633$ nm) through two crossed polarizers, oriented at $45^\circ$ and $135^\circ$. Place the quartz plate described in P5.6 between the polarizers on a rotation stage. Now equal amounts of $s$- and $p$-polarized light strike the crystal as it is rotated from normal incidence. (video)
[Figure: Laser, Polarizer, quartz plate, Polarizer, Screen showing dim spots and bright spots; plot axis: Phase Difference]
If the phase shift between the two paths discussed in P5.6 is an odd integer times $\pi$, the polarization direction of the light transmitted through the crystal is rotated by $90^\circ$, and the maximum transmission through the second polarizer results. (In this configuration, the crystal acts as a half wave plate, which we discuss in Chapter 6.) If the phase shift is an even integer times $\pi$, then the polarization is rotated by $180^\circ$ and minimum transmission through the second polarizer results. Plot these measured maximum and minimum points on your computer-generated graph of the previous problem.
Exercises for 5.C Electric Field in Crystals P5.8 Show that (5.59) is a solution to (5.58).
P5.9
Show that the field polarization component associated with $n = n_o$ in a uniaxial crystal is directed perpendicular to the plane containing $\hat{\mathbf{u}}$ and $\hat{\mathbf{z}}$ by substituting this value for $n$ into (5.58) and determining what combination of field components are allowable.
HINT: Use (5.24) to represent $\hat{\mathbf{u}}$ with $\phi = 0$ (the index is the same for all $\phi$, so you may as well use one that makes calculation easy). When you substitute into (5.58) you will find that $E_y$ can be any value because of the location of zeros in the matrix. To get a requirement on $E_x$ and $E_z$, collapse the matrix equation down to a $2 \times 2$ system. For non-trivial solutions to exist (i.e. $E_x \neq 0$ or $E_z \neq 0$), the determinant of the matrix must be zero. Show that this is only the case if $n_o = n_e$ (i.e. the crystal is isotropic).
P5.10
Show that the electric field for extraordinary polarized light $\mathbf{E}_e(\hat{\mathbf{u}})$ in a uniaxial crystal is not perpendicular to $\mathbf{k}$ (i.e. $\hat{\mathbf{u}}$), but that it is perpendicular to the ordinary polarization component $\mathbf{E}_o(\hat{\mathbf{u}})$.
Review, Chapters 1–5
To prepare for an exam, you should understand the following questions and problems thoroughly enough to be able to work them without referring back to previous chapters.
True and False Questions
R1 T or F: The optical index of any material (not vacuum) varies with frequency.
R2 T or F: The frequency of light can change as it enters a crystal (consider low intensity, with no nonlinear effects).
R3 T or F: The entire expression $\mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}$ associated with a light field (both the real part and the imaginary parts) is physically relevant.
R4 T or F: The real part of the refractive index cannot be less than one.
R5 T or F: $s$-polarized light and $p$-polarized light experience the same phase shift upon reflection from a material with complex index.
R6 T or F: When light is incident upon a material interface at Brewster's angle, only one polarization can transmit.
R7 T or F: When light is incident upon a material interface at Brewster's angle one of the polarizations stimulates dipoles in the material to oscillate with orientation along the direction of the reflected k-vector.
R8 T or F: The critical angle for total internal reflection exists on both sides of a material interface.
R9 T or F: From any given location above a (smooth flat) surface of water, it is possible to see objects positioned anywhere under the water.
R10 T or F: From any given location beneath a (smooth flat) surface of water, it is possible to see objects positioned anywhere above the water.
R11 T or F: An evanescent wave travels parallel to an interface surface on the transmitted side.
R12 T or F: When $p$-polarized light enters a material at Brewster's angle, the intensity of the transmitted beam is the same as the intensity of the incident beam.
R13 T or F: For incident angles beyond the critical angle for total internal reflection, the Fresnel coefficients $t_s$ and $t_p$ are both zero.
R14 T or F: It is always possible to completely eliminate reflections using a single-layer antireflection coating if you are free to choose the coating thickness but not its index.
R15 T or F: For a given incident angle and value of $n$, there is only one single-layer coating thickness $d$ that will minimize reflections.
R16 T or F: When coating each surface of a lens with a single-layer antireflection coating, the thickness of the coating on the exit surface will need to be different from the thickness of the coating on the entry surface.
R17 T or F: As light enters a crystal, the Poynting vector always obeys Snell's law.
R18 T or F: As light enters a crystal, the k-vector does not obey Snell's law for the extraordinary wave.
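Problems
R19 (a) Write down Maxwell's equations.
(b) Derive the wave equation for $\mathbf{E}$ under the assumptions that $\mathbf{J}_{\rm free} = 0$ and $\mathbf{P} = \epsilon_0 \chi \mathbf{E}$. Note: $\nabla \times (\nabla \times \mathbf{f}) = \nabla(\nabla \cdot \mathbf{f}) - \nabla^2 \mathbf{f}$.
(c) Show by direct substitution that $\mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0 e^{i(\mathbf{k}\cdot\mathbf{r} - \omega t)}$ is a solution to the wave equation. Find the resulting connection between $k$ and $\omega$. Give appropriate definitions for $c$ and $n$, assuming that $\chi$ is real.
(d) If $\mathbf{k} = k\hat{\mathbf{z}}$ and $\mathbf{E}_0 = E_0\hat{\mathbf{x}}$, find the associated B-field.
(e) The Poynting vector is $\mathbf{S} = \mathbf{E} \times \mathbf{B}/\mu_0$, where the fields are real. Derive an expression for $I \equiv \langle S \rangle_t$.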
[Figure for R20: $z$-axis; $x$-axis directed into page]
R20
Consider an interface between two isotropic media where the incident field is defined by $\mathbf{E}_i = E_i^{(p)} \ldots$
(a) By inspection of the figure, write down similar expressions for the reflected and transmitted fields (i.e. $\mathbf{E}_r$ and $\mathbf{E}_t$).
(b) Find an expression relating $\mathbf{E}_i$, $\mathbf{E}_r$, and $\mathbf{E}_t$ using the boundary condition at the interface. From this expression obtain the law of reflection and Snell's law.
(c) The boundary condition requiring that the tangential component of $\mathbf{B}$ must be continuous leads to
$$n_i\left(E_i^{(p)} - E_r^{(p)}\right) = n_t E_t^{(p)}$$
Use this and the results from part (b) to derive
$$r_p \equiv \frac{E_r^{(p)}}{E_i^{(p)}} = -\frac{\tan(\theta_i - \theta_t)}{\tan(\theta_i + \theta_t)}$$
You may use the identity
$$\frac{\sin\theta_i\cos\theta_i - \sin\theta_t\cos\theta_t}{\sin\theta_i\cos\theta_i + \sin\theta_t\cos\theta_t} = \frac{\tan(\theta_i - \theta_t)}{\tan(\theta_i + \theta_t)}$$
R21 The Fresnel equations are
$$r_s \equiv \frac{E_r^{(s)}}{E_i^{(s)}} = \frac{\sin\theta_t\cos\theta_i - \sin\theta_i\cos\theta_t}{\sin\theta_t\cos\theta_i + \sin\theta_i\cos\theta_t}$$
$$t_s \equiv \frac{E_t^{(s)}}{E_i^{(s)}} = \frac{2\sin\theta_t\cos\theta_i}{\sin\theta_t\cos\theta_i + \sin\theta_i\cos\theta_t}$$
$$r_p \equiv \frac{E_r^{(p)}}{E_i^{(p)}} = \frac{\cos\theta_t\sin\theta_t - \cos\theta_i\sin\theta_i}{\cos\theta_t\sin\theta_t + \cos\theta_i\sin\theta_i}$$
$$t_p \equiv \frac{E_t^{(p)}}{E_i^{(p)}} = \frac{2\cos\theta_i\sin\theta_t}{\cos\theta_t\sin\theta_t + \cos\theta_i\sin\theta_i}$$
(a) Find what each of these equations reduces to when $\theta_i = 0$. Give your answer in terms of $n_i$ and $n_t$.
(b) What percent of light (intensity) reflects from a glass surface ($n = 1.5$) when light enters from air ($n = 1$) at normal incidence?
(c) What percent of light reflects from a glass surface when light exits into air at normal incidence?
R22 Light goes through a glass prism with optical index $n = 1.55$. The light enters at Brewster's angle and exits at normal incidence as shown in Fig. 5.13.
(a) Derive and calculate Brewster's angle $\theta_B$. You may use the results of R20 (c).
Figure 5.13
(b) Calculate $\phi$.
(c) What percent of the light (power) goes all the way through the prism if it is $p$-polarized? You may use the Fresnel coefficients given in R21.
(d) What percent for $s$-polarized light?
R23 A $45^\circ$-$90^\circ$-$45^\circ$ prism is a good device for reflecting a beam of light parallel to the initial beam (see Fig. 5.14). The exiting beam will be parallel to the entering beam even when the incoming beam is not normal to the front surface (although it needs to be in the plane of the drawing).
(a) How large an angle $\theta$ can be tolerated before there is no longer total internal reflection at both interior surfaces? Assume $n = 1$ outside of the prism and $n = 1.5$ inside.
(b) If the light enters and leaves the prism at normal incidence, what will the difference in phase be between the $s$- and $p$-polarizations? You may use the Fresnel coefficients given in R21.
Figure 5.14
R24 A thin glass plate with index $n = 1.5$ is oriented at Brewster's angle so that $p$-polarized light with wavelength $\lambda_{\rm vac} = 500$ nm goes through with 100% transmittance.
(a) What is the minimum thickness that will make the reflection of $s$-polarized light be maximum?
(b) What is the total transmittance $T_s^{\rm tot}$ for this thickness assuming $s$-polarized light?
R25 Consider a Fabry-Perot interferometer. Note: $R_1 = R_2 = R$.
(a) Show that the free spectral range for a Fabry-Perot interferometer is
$$\Delta\lambda_{\rm FSR} = \frac{\lambda^2}{2nd\cos\theta}$$
(b) Show that the fringe width is
$$\Delta\lambda_{\rm FWHM} = \frac{\lambda^2}{\pi\sqrt{F}\,nd\cos\theta}, \qquad \text{where} \quad F \equiv \frac{4R}{(1-R)^2}.$$
(c) Derive the reflecting finesse $f = \Delta\lambda_{\rm FSR}/\Delta\lambda_{\rm FWHM}$.
R26 For a Fabry-Perot etalon, let $R = 0.90$, $\lambda_{\rm vac} = 500$ nm, $n = 1$, and $d = 5.0$ mm.
(a) Suppose that a maximum transmittance occurs at the angle $\theta = 0$. What is the nearest angle where the transmittance will be half of the maximum transmittance? You may assume that $\cos\theta \cong 1 - \theta^2/2$.
(b) You desire to use a Fabry-Perot etalon to view the light from a large diffuse source rather than a point source. Draw a diagram depicting where lenses should be placed, indicating relevant distances. Explain briefly how it works.
R27 You need to make an antireflective coating for a glass lens designed to work at normal incidence. The matrix equation relating the incident field to the reflected and transmitted fields (at normal incidence) is
$$\begin{pmatrix} 1 \\ n_0 \end{pmatrix} + \begin{pmatrix} 1 \\ -n_0 \end{pmatrix}\frac{E_0^{\leftarrow}}{E_0^{\rightarrow}} = \begin{pmatrix} \cos k_1\ell & -\dfrac{i}{n_1}\sin k_1\ell \\ -i n_1 \sin k_1\ell & \cos k_1\ell \end{pmatrix}\begin{pmatrix} 1 \\ n_2 \end{pmatrix}\frac{E_2^{\rightarrow}}{E_0^{\rightarrow}}$$
(a) What is the minimum thickness the coating should have? HINT: It is less work if you can figure this out without referring to the above equation. You may assume $n_1 < n_2$.
(b) Find the index of refraction $n_1$ that will make the reflectivity be zero.
R28 Second harmonic generation (the conversion of light with frequency $\omega$ into light with frequency $2\omega$) can occur when very intense laser light travels in a material. For good harmonic production, the laser light and the second harmonic light need to travel at the same speed in the material. In other words, both frequencies need to have the same index of refraction so that harmonic light produced down stream joins in phase with the harmonic light produced up stream, referred to as phase matching. This ensures a coherent building of the second harmonic field rather than destructive cancellations. Unfortunately, the index of refraction is almost never the same for different frequencies in a given material, owing to dispersion. However, we can achieve phase matching in some crystals where one frequency propagates as an ordinary wave and the other propagates as an extraordinary wave. We cause the two indices to be precisely the same by tuning the angle of the crystal. Consider a ruby laser propagating and generating the second harmonic in a uniaxial KDP crystal (potassium dihydrogen phosphate). The indices of refraction are given by $n_o$ and
$$\frac{n_o n_e}{\sqrt{n_o^2\sin^2\theta + n_e^2\cos^2\theta}}$$
Figure 5.15
where $\theta$ is the angle made with the optic axis. At the frequency of a ruby laser, KDP has indices $n_o(\omega) = 1.505$ and $n_e(\omega) = 1.465$. At the frequency of the second harmonic, the indices are $n_o(2\omega) = 1.534$ and $n_e(2\omega) = 1.487$.
Show that phase matching can be achieved if the laser is polarized so that it experiences only the ordinary index and the second harmonic light is polarized perpendicular to that. At what angle does this phase matching occur?
Selected Answers
R21: (b) 4%, (c) 4%. R22: (b) $33^\circ$, (c) 95%, (d) 79%. R23: (a) $4.8^\circ$, (b) $74^\circ$. R24: (a) 100 nm, (b) 0.55. R26: (a) $0.074^\circ$. R27: (b) 1.24. R28: $51.12^\circ$.
Chapter 6
Polarization of Light
When the direction of the electric field of light oscillates in a regular, predictable fashion, we say that the light is polarized. Polarization describes the direction of the oscillating electric field, a distinct concept from dipoles per volume in a material $\mathbf{P}$, also called polarization. In this chapter, we develop a formalism for describing polarized light and the effect of devices that modify polarization. If the electric field oscillates in a plane, we say that it is linearly polarized. The electric field can also spiral around while a plane wave propagates, and this is called elliptical polarization. There is a convenient way for keeping track of polarization using a two-dimensional Jones vector.
Many devices can affect polarization such as polarizers and wave plates. Their effects on a light field can be represented by $2 \times 2$ Jones matrices that operate on the Jones vector representing the light. A Jones matrix can describe, for example, a linear polarizer oriented at an arbitrary angle with respect to the coordinate system. Likewise, a Jones matrix can describe the manner in which a wave plate introduces a relative phase between two components of the electric field. A wave plate can be used to convert, for example, linearly polarized light into circularly polarized light.
In this chapter, we will also see how reflection and transmission at a material interface influences field polarization. The Fresnel coefficients studied in chapters 3 and 4 can be conveniently incorporated into the $2 \times 2$ matrix formulation for handling polarization. As we saw previously, the amount of light reflected from a surface depends on the type of polarization, $s$ or $p$. In addition, upon reflection, $s$-polarized light can acquire a phase lag or phase advance relative to $p$-polarized light. This is especially true at metal surfaces, which have complex indices of refraction.
Ellipsometry, outlined in appendix 6.A, is the science of characterizing optical properties of materials through an examination of these effects. Throughout this chapter, we consider light to have well characterized polarization. However, most common sources of light (e.g. sunlight or a light bulb) have an electric-field direction that varies rapidly and randomly. Such sources are commonly referred to as unpolarized. It is common to have a mixture of unpolarized and polarized light, called partially polarized light. The Jones vector formalism used in this chapter is inappropriate for describing the unpolarized portions of the light. In appendix 6.B we describe a more general formalism for dealing with light having an arbitrary degree of polarization.
The wave vector $\mathbf{k}$ specifies the direction of propagation. We neglect absorption so that the refractive index is real and $k = n\omega/c = 2\pi n/\lambda_{\rm vac}$ (see (2.19)–(2.24)). In an isotropic medium we know that $\mathbf{k}$ and $\mathbf{E}_0$ are perpendicular, but even after the direction of $\mathbf{k}$ is specified, we are still free to have $\mathbf{E}_0$ point anywhere in the two dimensions perpendicular to $\mathbf{k}$. If we orient our coordinate system with the $z$-axis in the direction of $\mathbf{k}$, we can write (6.1) as
$$\mathbf{E}(z, t) = \left(E_x\hat{\mathbf{x}} + E_y\hat{\mathbf{y}}\right)e^{i(kz - \omega t)} \tag{6.2}$$
As always, only the real part of (6.2) is physically relevant. The complex amplitudes of $E_x$ and $E_y$ keep track of the phase of the oscillating field components. In general the complex phases of $E_x$ and $E_y$ can differ, so that the wave in one of the dimensions lags or leads the wave in the other dimension. The relationship between $E_x$ and $E_y$ describes the polarization of the light. For example, if $E_y$ is zero, the plane wave is said to be linearly polarized along the $x$-dimension. Linearly polarized light can have any orientation in the $x$–$y$ plane, and it occurs whenever $E_x$ and $E_y$ have the same complex phase (or a phase differing by an integer times $\pi$). For our purposes, we will take the $x$-dimension to be horizontal and the $y$-dimension to be vertical unless otherwise noted. As an example, suppose $E_y = iE_x$, where $E_x$ is real. The $y$-component of the field is then out of phase with the $x$-component by the factor $i = e^{i\pi/2}$. Taking the real part of the field (6.2) we get
$$\begin{aligned}
\mathbf{E}(z, t) &= \mathrm{Re}\left[E_x e^{i(kz - \omega t)}\right]\hat{\mathbf{x}} + \mathrm{Re}\left[e^{i\pi/2}E_x e^{i(kz - \omega t)}\right]\hat{\mathbf{y}} \\
&= E_x\cos(kz - \omega t)\,\hat{\mathbf{x}} + E_x\cos(kz - \omega t + \pi/2)\,\hat{\mathbf{y}}
\end{aligned} \tag{6.3}$$
Figure 6.2 The combination of two orthogonally polarized plane waves that are out of phase results in elliptically polarized light. Here we have left circularly polarized light created as specified by (6.3).
In this example, the field in the $y$-dimension lags behind the field in the $x$-dimension by a quarter cycle. That is, the behavior seen in the $x$-dimension happens in the $y$-dimension a quarter cycle later. The field never goes to zero simultaneously in both dimensions. In fact, in this example the strength of the electric field is constant, and it rotates in a circular pattern in the $x$–$y$ dimensions. For this reason, this type of field is called circularly polarized. Figure 6.2 graphically shows the two linear polarized pieces in (6.3) adding to make circularly polarized light.
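The claims in this example (constant field strength and steady rotation) can be checked with a few lines of code. This is a minimal sketch that evaluates the real part of (6.2) at $z = 0$ for $E_y = iE_x$; the unit amplitude, the unit period, and the variable names are arbitrary illustrative choices.

```python
import numpy as np

omega = 2.0 * np.pi                               # angular frequency for a period of 1
t = np.linspace(0.0, 1.0, 16, endpoint=False)     # one full cycle, sampled at z = 0

Ex0 = 1.0            # real x amplitude
Ey0 = 1j * Ex0       # y amplitude with the extra factor i = exp(i*pi/2)

phase = np.exp(-1j * omega * t)                   # exp[i(kz - omega t)] at z = 0
Ex = np.real(Ex0 * phase)                         # equals cos(omega t)
Ey = np.real(Ey0 * phase)                         # equals sin(omega t): a quarter cycle late

magnitude = np.hypot(Ex, Ey)                      # field strength at each instant
angle = np.unwrap(np.arctan2(Ey, Ex))             # direction of the field in the x-y plane

print(magnitude)
print(np.degrees(angle))
```

The magnitude stays fixed while the direction angle advances uniformly. With $x$ to the right and $y$ up, the $z$-axis points toward the viewer, so the increasing angle is counterclockwise as seen looking back toward the source and clockwise when looking along $\mathbf{k}$.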
If we view a circularly polarized light field throughout space at a frozen instant in time (as in Fig. 6.2), the electric field vector spirals as we move along the $z$-dimension. If the sense of the spiral (with time frozen) matches that of a common wood screw oriented along the $z$-axis, the polarization is called right handed. (It makes no difference whether the screw is flipped end for end.) If instead the field spirals in the opposite sense, then the polarization is called left handed. The field shown in Fig. 6.2 is an example of left-handed circularly polarized light. An equivalent way to view the handedness convention is to imagine the light impinging on a screen as a function of time. The field of a right-handed circularly polarized wave rotates counterclockwise at the screen, when looking along the $\mathbf{k}$ direction (towards the front side of the screen). The field rotates clockwise for a left-handed circularly polarized wave. Linearly polarized light can become circularly or, in general, elliptically polarized after reflection from a metal surface if the incident light has both $s$- and $p$-polarized components. A good experimentalist working with light needs to know this. Reflections from multilayer dielectric mirrors can also exhibit these phase shifts.
(6.4)
(6.5)
spent most of his professional career at Polaroid Corporation in Cambridge MA, until his retirement in 1982. He is well-known for a series of papers on polarization published during the period 1941-1956. He also contributed greatly to the development of infrared detectors. He was an avid train enthusiast, and even wrote papers on railway engineering. See J. Opt. Soc. Am. 63, 519-522 (1972). Also see SPIE oemagazine, p. 52 (Aug. 2004).
(6.6)
$$A \equiv \frac{|E_x|}{\sqrt{|E_x|^2 + |E_y|^2}} \tag{6.7}$$
$$B \equiv \frac{|E_y|}{\sqrt{|E_x|^2 + |E_y|^2}} \tag{6.8}$$
(6.9)
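Linearly polarized along $x$: $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$
Linearly polarized along $y$: $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$
Linearly polarized at angle $\alpha$ (measured from the $x$-axis): $\begin{pmatrix} \cos\alpha \\ \sin\alpha \end{pmatrix}$
Right circularly polarized: $\dfrac{1}{\sqrt{2}}\begin{pmatrix} 1 \\ -i \end{pmatrix}$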
Please notice that $A$ and $B$ are real non-negative dimensionless numbers that satisfy $A^2 + B^2 = 1$. If $E_y$ is zero, then $B = 0$ and everything is well-defined. On the other hand, if $E_x$ happens to be zero, then its phase $e^{i\delta_x}$ is indeterminate. In this case we let $E_{\rm eff} = |E_y|e^{i\delta_y}$, $B = 1$, and $\delta = 0$.
The overall field strength $E_{\rm eff}$ is often unimportant in a discussion of polarization. It represents the strength of an effective linearly polarized field that would correspond to the same intensity as (6.4). Specifically, from (6.5) and (2.62) we have
$$I = \langle S \rangle_t = \frac{1}{2}nc\epsilon_0\,\mathbf{E}\cdot\mathbf{E}^* = \frac{1}{2}nc\epsilon_0|E_{\rm eff}|^2 \tag{6.10}$$
The phase of $E_{\rm eff}$ represents an overall phase shift that one can trivially adjust by physically moving the light source (a laser, say) forward or backward by a fraction of a wavelength.
The portion of (6.5) that is relevant to our discussion of polarization is the vector $A\hat{\mathbf{x}} + Be^{i\delta}\hat{\mathbf{y}}$, referred to as the Jones vector. This vector contains the essential information regarding field polarization. Notice that the Jones vector is a kind of unit vector, in that $(A\hat{\mathbf{x}} + Be^{i\delta}\hat{\mathbf{y}})\cdot(A\hat{\mathbf{x}} + Be^{i\delta}\hat{\mathbf{y}})^* = 1$. (The asterisk represents the complex conjugate.) When writing a Jones vector we dispense with the $\hat{\mathbf{x}}$ and $\hat{\mathbf{y}}$ notation and organize the components into a column vector (for later use in matrix algebra) as follows:
$$\begin{pmatrix} A \\ Be^{i\delta} \end{pmatrix} \tag{6.11}$$
This vector can describe the polarization state of any plane wave field. Table 6.1 lists some Jones vectors representing various polarization states.
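The bookkeeping behind the Jones vector can be sketched in code. The function below is a hypothetical helper, not part of the text. It assumes the definitions $E_{\rm eff} = \sqrt{|E_x|^2 + |E_y|^2}\,e^{i\delta_x}$ and $\delta = \delta_y - \delta_x$, together with the special case for $E_x = 0$ described above, and splits a pair of complex amplitudes into $E_{\rm eff}$, $A$, $B$, and $\delta$.

```python
import numpy as np

def jones_from_components(Ex, Ey):
    """Split complex amplitudes (Ex, Ey) into (E_eff, A, B, delta).

    Assumed conventions: E_eff carries the overall strength and the phase of Ex;
    delta is the phase of Ey relative to Ex, reduced to [0, 2*pi).
    """
    norm = np.hypot(abs(Ex), abs(Ey))
    if norm == 0:
        raise ValueError("zero field has no polarization state")
    if Ex == 0:
        # Phase of Ex is indeterminate: absorb the phase of Ey into E_eff.
        return abs(Ey) * np.exp(1j * np.angle(Ey)), 0.0, 1.0, 0.0
    delta_x = np.angle(Ex)
    E_eff = norm * np.exp(1j * delta_x)
    A = abs(Ex) / norm
    B = abs(Ey) / norm
    delta = (np.angle(Ey) - delta_x) % (2.0 * np.pi) if Ey != 0 else 0.0
    return E_eff, A, B, delta

# Example: Ey = i Ex, the circularly polarized field discussed earlier.
E_eff, A, B, delta = jones_from_components(1.0, 1j)
jones = np.array([A, B * np.exp(1j * delta)])
print(E_eff, jones)
```

Multiplying the returned Jones vector by $E_{\rm eff}$ recovers the original pair of amplitudes, and $A^2 + B^2 = 1$ by construction.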
with respect to the $x$-axis (see P6.8). This angle sometimes corresponds to the minor axis and sometimes to the major axis of the ellipse, depending on the exact values of $A$, $B$, and $\delta$. The other axis of the ellipse (major or minor) then occurs at $\alpha \pm \pi/2$ (see Fig. 6.3). We can deduce whether (6.12) corresponds to the major or minor axis of the ellipse by comparing the strength of the electric field when it spirals through the direction specified by $\alpha$ and when it spirals through $\alpha \pm \pi/2$. The strength of the electric field at $\alpha$ is given by (see P6.8)
$$E_\alpha = |E_{\rm eff}|\sqrt{A^2\cos^2\alpha + B^2\sin^2\alpha + AB\cos\delta\sin 2\alpha} \qquad (E_{\max} \text{ or } E_{\min}) \tag{6.13}$$
and the strength of the field when it spirals through the orthogonal direction ($\alpha \pm \pi/2$) is given by
$$E_{\alpha \pm \pi/2} = |E_{\rm eff}|\sqrt{A^2\sin^2\alpha + B^2\cos^2\alpha - AB\cos\delta\sin 2\alpha} \qquad (E_{\max} \text{ or } E_{\min}) \tag{6.14}$$
After computing (6.13) and (6.14), we decide which represents $E_{\min}$ and which $E_{\max}$ according to
$$E_{\max} \geq E_{\min} \tag{6.15}$$
We could predict in advance which of (6.13) or (6.14) corresponds to the major axis and which corresponds to the minor axis. However, making this prediction is as complicated as simply evaluating (6.13) and (6.14) and determining which is greater. Elliptically polarized light is often characterized by the ellipticity, given by the ratio of the minor axis to the major axis:
$$e \equiv \frac{E_{\min}}{E_{\max}} \tag{6.16}$$
Figure 6.3 The electric field of elliptically polarized light traces an ellipse in the plane perpendicular to its propagation direction. The two plots are for different values of $A$, $B$, and $\delta$. The angle $\alpha$ can describe the major axis (top) or the minor axis (bottom), depending on the values of these parameters.
The ellipticity $e$ ranges between zero (corresponding to linearly polarized light) and one (corresponding to circularly polarized light). Finally, the helicity or handedness of elliptically polarized light is as follows (see P6.2):
$$0 < \delta < \pi \qquad \text{left-handed helicity} \tag{6.17}$$
$$\pi < \delta < 2\pi \qquad \text{right-handed helicity} \tag{6.18}$$
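The recipe for the ellipse orientation, ellipticity, and helicity can be collected into one short routine. This is a sketch under stated assumptions: it takes (6.12) to be $\tan 2\alpha = 2AB\cos\delta/(A^2 - B^2)$, uses `arctan2` (a choice made here to avoid dividing by zero when $A = B$), and works in units of $|E_{\rm eff}|$. The function name is hypothetical.

```python
import numpy as np

def ellipse_parameters(A, B, delta):
    """Return (alpha, ellipticity, helicity) for the Jones vector (A, B e^{i delta})."""
    # Assumed form of (6.12): tan(2 alpha) = 2 A B cos(delta) / (A^2 - B^2).
    alpha = 0.5 * np.arctan2(2.0 * A * B * np.cos(delta), A**2 - B**2)
    cross = A * B * np.cos(delta) * np.sin(2.0 * alpha)
    # Squared field strengths along alpha and along alpha +/- pi/2, per (6.13)-(6.14).
    E_alpha_sq = A**2 * np.cos(alpha)**2 + B**2 * np.sin(alpha)**2 + cross
    E_perp_sq = A**2 * np.sin(alpha)**2 + B**2 * np.cos(alpha)**2 - cross
    # Clip tiny negative values caused by rounding before taking square roots.
    E_alpha = np.sqrt(max(E_alpha_sq, 0.0))
    E_perp = np.sqrt(max(E_perp_sq, 0.0))
    E_max, E_min = max(E_alpha, E_perp), min(E_alpha, E_perp)   # (6.15)
    d = delta % (2.0 * np.pi)
    if 0.0 < d < np.pi:
        helicity = "left"       # (6.17)
    elif np.pi < d < 2.0 * np.pi:
        helicity = "right"      # (6.18)
    else:
        helicity = "none (linear)"
    return alpha, E_min / E_max, helicity                        # (6.16)

print(ellipse_parameters(1 / np.sqrt(2), 1 / np.sqrt(2), np.pi / 2))  # circular case
print(ellipse_parameters(0.8, 0.6, np.pi / 3))                        # generic ellipse
```

For a circular state the ellipticity comes out as one, and for a linear state it comes out as zero, in line with the limits quoted above.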
Figure 6.4 Light transmitting through a Polaroid sheet. The conducting polymer chains run vertically in this drawing, and light polarized along the chains is absorbed. Light polarized perpendicular to the polymer chains passes through the polarizer.
polymer molecules. For this polarization component, the wave passes through the material much like it does through typical dielectrics such as glass (i.e. the refractive index is real). Today, there is a wide variety of technologies for making polarizers, many very different from Polaroid. A polarizer can be represented as a $2 \times 2$ matrix that operates on Jones vectors.$^2$ The function of a polarizer is to pass only the component of electric field that is oriented along the polarizer transmission axis. Thus, if a polarizer is oriented with its transmission axis along the $x$-dimension, then only the $x$-component of polarization transmits; the $y$-component is killed. If the polarizer is oriented with its transmission axis along the $y$-dimension, then only the $y$-component of the field transmits, and the $x$-component is killed. These two scenarios can be represented with the following Jones matrices:
$$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix} \qquad \text{(polarizer with transmission along x-axis)} \tag{6.19}$$
$$\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix} \qquad \text{(polarizer with transmission along y-axis)} \tag{6.20}$$
These matrices operate on any Jones vector representing the polarization of incident light. The result gives the Jones vector for the light exiting the polarizer. Example 6.1
Use the Jones matrix (6.19) to calculate the effect of a horizontal polarizer on light that is initially horizontally polarized, vertically polarized, and arbitrarily polarized.
Solution: First we consider a horizontally polarized plane wave traversing a polarizer with its transmission axis oriented also horizontally ($x$-dimension):
$$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \qquad \text{(horizontal polarizer on horizontally polarized field)}$$
As expected, the polarization state is unaffected by the polarizer. (We have ignored possible attenuation from surface reflections.) Now consider vertically polarized light traversing the same horizontal polarizer. In this case, we have:
$$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \qquad \text{(horizontal polarizer on vertical linear polarization)}$$
As expected, the polarizer extinguishes the light. Finally, when a horizontally oriented polarizer operates on light with an arbitrary Jones vector (6.11), we have
$$\begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}\begin{pmatrix} A \\ Be^{i\delta} \end{pmatrix} = \begin{pmatrix} A \\ 0 \end{pmatrix} \qquad \text{(horizontal polarizer on arbitrary polarization)}$$
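The three products in Example 6.1 can be reproduced directly with matrix multiplication. The sketch below uses NumPy; the values chosen for $A$, $B$, and $\delta$ in the arbitrary state are illustrative only.

```python
import numpy as np

# Jones matrix (6.19): polarizer with transmission axis along x.
P_x = np.array([[1, 0],
                [0, 0]], dtype=complex)

horizontal = np.array([1, 0], dtype=complex)
vertical = np.array([0, 1], dtype=complex)

# An arbitrary normalized state (A, B e^{i delta}); values chosen for illustration.
A, B, delta = 0.6, 0.8, np.pi / 3
arbitrary = np.array([A, B * np.exp(1j * delta)])

out_horizontal = P_x @ horizontal   # passes unchanged
out_vertical = P_x @ vertical       # extinguished
out_arbitrary = P_x @ arbitrary     # only the x component survives

print(out_horizontal, out_vertical, out_arbitrary)
```

The last result is no longer normalized: its squared magnitude is $A^2$, which is the fraction of the incident intensity that the polarizer transmits.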
While you might readily agree that the matrices given in (6.19) and (6.20) can be used to get the right result for light traversing a horizontal or a vertical polarizer, you probably aren't very impressed as of yet. In the next few sections, we will derive Jones matrices for a number of optical elements that can modify polarization: polarizers at arbitrary angle, wave plates at arbitrary angle, and reflection or transmissions at an interface. Table 6.2 shows Jones matrices for each of these devices. Before deriving these specific Jones matrices, however, we take a moment to appreciate why the Jones matrix formulation is useful. The real power of the formalism becomes clear as we consider situations where light encounters multiple polarization elements in sequence. In these situations, we use a product of Jones matrices to represent the effect of the compound systems. We can represent this situation by
$$\begin{pmatrix} A' \\ B' \end{pmatrix} = \mathbf{J}_{\rm system}\begin{pmatrix} A \\ Be^{i\delta} \end{pmatrix} \tag{6.21}$$
where the unprimed Jones vector represents light going into the system and the primed Jones vector represents light emerging from the system. In general, $A'$ and $B'$ will turn out to be complex. However, if desired they can be changed into the usual form by writing
$$\begin{pmatrix} A' \\ B' \end{pmatrix} = e^{i\delta_{A'}}\begin{pmatrix} |A'| \\ |B'|e^{i\delta'} \end{pmatrix}$$
where $\delta_{A'}$ is an unimportant overall phase, and $\delta'$ is the phase difference between $B'$ and $A'$. The matrix $\mathbf{J}_{\rm system}$ is a Jones matrix formed by the series of polarization devices. If there are $N$ devices in the system, the compound matrix is calculated as
$$\mathbf{J}_{\rm system} \equiv \mathbf{J}_N \mathbf{J}_{N-1} \cdots \mathbf{J}_2 \mathbf{J}_1 \tag{6.22}$$
where $\mathbf{J}_n$ is the matrix for the $n$th polarizing optical element encountered in the system. Notice that the matrices operate on the Jones vector in the order that the light encounters the devices. Therefore, the matrix for the first device ($\mathbf{J}_1$) is written on the right, and so on until the last device encountered, which is written on the left, farthest from the Jones vector. When part of the light is absorbed by passing through one or more polarizers in a system, the Jones vector of the exiting light does not necessarily remain normalized to magnitude one (see Example 6.1). Since the components of a Jones vector represent the electric field, we find the factor by which the intensity of the light decreases by dotting the vector with its complex conjugate. In accordance with (6.10), the intensity of the exiting light is
$$\begin{aligned}
I' &= \frac{1}{2}nc\epsilon_0|E_{\rm eff}|^2\left(A'\hat{\mathbf{x}} + B'e^{i\delta'}\hat{\mathbf{y}}\right)\cdot\left(A'\hat{\mathbf{x}} + B'e^{i\delta'}\hat{\mathbf{y}}\right)^* \\
&= \frac{1}{2}nc\epsilon_0|E_{\rm eff}|^2\left(|A'|^2 + |B'|^2\right)
\end{aligned} \tag{6.23}$$
Notice that the intensity is attenuated by the factor $|A'|^2 + |B'|^2$ after propagating through the system. Recall that $E_{\rm eff}$ represents the effective strength of the field before it enters the polarizer (or other device), so that the initial Jones vector is normalized to one (see (6.10)). As a reminder, we normally remove an overall phase factor from the Jones vector so that $A$ is real and non-negative, and we choose $\delta$ so that $B$ is real and non-negative. However, if we don't bother doing this, the absolute value signs on $A'$ and $B'$ in (6.23) ensure that we get the correct value for intensity.
Table 6.2 Common Jones Matrices. The angle $\theta$ is measured with respect to the $x$-axis and specifies the transmission axis of a linear polarizer or the fast axis of a wave plate.
Linear polarizer: $\begin{pmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{pmatrix}$
Half wave plate: $\begin{pmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{pmatrix}$
Quarter wave plate: $\begin{pmatrix} \cos^2\theta + i\sin^2\theta & (1 - i)\sin\theta\cos\theta \\ (1 - i)\sin\theta\cos\theta & \sin^2\theta + i\cos^2\theta \end{pmatrix}$
Right circular polarizer: $\dfrac{1}{2}\begin{pmatrix} 1 & i \\ -i & 1 \end{pmatrix}$
Left circular polarizer: $\dfrac{1}{2}\begin{pmatrix} 1 & -i \\ i & 1 \end{pmatrix}$
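The ordering rule (6.22) and the attenuation factor in (6.23) can be illustrated with a short sketch. It uses the linear-polarizer matrix from Table 6.2; the helper names `linear_polarizer`, `system_matrix`, and `intensity_factor` are hypothetical, and the three-polarizer arrangement is simply a convenient test case.

```python
from functools import reduce

import numpy as np

def linear_polarizer(theta):
    """Jones matrix of an ideal linear polarizer with transmission axis at theta."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c * c, s * c],
                     [s * c, s * s]], dtype=complex)

def system_matrix(elements):
    """Compound matrix J_N ... J_2 J_1 for elements listed in the order light meets them."""
    return reduce(lambda acc, J: J @ acc, elements, np.eye(2, dtype=complex))

def intensity_factor(jones_out):
    """Attenuation factor |A'|^2 + |B'|^2 for a normalized input Jones vector."""
    return float(np.sum(np.abs(jones_out) ** 2))

x_polarized = np.array([1, 0], dtype=complex)

# Two crossed polarizers: nothing gets through.
crossed = system_matrix([linear_polarizer(0.0), linear_polarizer(np.pi / 2)])
factor_crossed = intensity_factor(crossed @ x_polarized)

# Inserting a 45-degree polarizer between them lets light through.
three = system_matrix([linear_polarizer(0.0),
                       linear_polarizer(np.pi / 4),
                       linear_polarizer(np.pi / 2)])
factor_three = intensity_factor(three @ x_polarized)

print(factor_crossed, factor_three)
```

Listing the elements in the order the light encounters them, and letting the helper place each new matrix on the left, avoids the most common mistake of multiplying in the wrong order.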
In this section we develop a Jones matrix for describing an ideal polarizer with its transmission axis at an arbitrary angle $\theta$ from the $x$-axis. We will do this in a general context so that we can take advantage of the present work when discussing wave plates in the next section. To help keep things on a more conceptual level, we revert back to using electric field components directly. We will make the connection with Jones calculus at the end. The polarizer acts on a plane wave with arbitrary polarization. The electric field of our plane wave may be written as
$$\mathbf{E}(z, t) = \left(E_x\hat{\mathbf{x}} + E_y\hat{\mathbf{y}}\right)e^{i(kz - \omega t)} \tag{6.24}$$
Figure 6.5 Light transmitting through a polarizer oriented with transmission axis at angle $\theta$ from $x$-axis.
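Let the transmission axis of the polarizer be specified by the unit vector $\hat{\mathbf{e}}_1$ and the absorption axis of the polarizer be specified by $\hat{\mathbf{e}}_2$ (orthogonal to the transmission axis). The vector $\hat{\mathbf{e}}_1$ is oriented at an angle $\theta$ from the $x$-axis, as shown in Fig. 6.6. We need to write the electric field components in terms of the new basis specified by $\hat{\mathbf{e}}_1$ and $\hat{\mathbf{e}}_2$. By inspection of the geometry, the $x$-$y$ unit vectors are connected to the new coordinate system via:

$$\hat{\mathbf{x}} = \cos\theta\,\hat{\mathbf{e}}_1 - \sin\theta\,\hat{\mathbf{e}}_2, \qquad \hat{\mathbf{y}} = \sin\theta\,\hat{\mathbf{e}}_1 + \cos\theta\,\hat{\mathbf{e}}_2 \tag{6.25}$$

Substitution of (6.25) into (6.24) yields for the electric field

$$\mathbf{E}(z,t) = \left(E_1\hat{\mathbf{e}}_1 + E_2\hat{\mathbf{e}}_2\right) e^{i(kz - \omega t)} \tag{6.26}$$

where

$$E_1 \equiv E_x\cos\theta + E_y\sin\theta, \qquad E_2 \equiv -E_x\sin\theta + E_y\cos\theta \tag{6.27}$$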
Now we introduce the effect of the polarizer on the field: $E_1$ is transmitted unaffected, while $E_2$ is extinguished. To account for the effect of the device, we multiply $E_2$ by a parameter $\xi$. In the case of the polarizer, $\xi$ is zero, but when we consider wave plates we will use other values for $\xi$. After traversing the polarizer, the field becomes

$$\mathbf{E}_{\text{after}}(z,t) = \left(E_1\hat{\mathbf{e}}_1 + \xi E_2\hat{\mathbf{e}}_2\right) e^{i(kz - \omega t)} \tag{6.28}$$
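We now have the field after the polarizer, but it would be nice to rewrite it in terms of the original $x$-$y$ basis. By inverting (6.25), or again by inspection of Fig. 6.6, we see that

$$\hat{\mathbf{e}}_1 = \cos\theta\,\hat{\mathbf{x}} + \sin\theta\,\hat{\mathbf{y}}, \qquad \hat{\mathbf{e}}_2 = -\sin\theta\,\hat{\mathbf{x}} + \cos\theta\,\hat{\mathbf{y}} \tag{6.29}$$

Substitution of these relationships into (6.28) together with the definitions (6.27) for $E_1$ and $E_2$ yields

$$\begin{aligned}
\mathbf{E}_{\text{after}}(z,t) &= \left(E_x\cos\theta + E_y\sin\theta\right)\left(\cos\theta\,\hat{\mathbf{x}} + \sin\theta\,\hat{\mathbf{y}}\right) e^{i(kz-\omega t)} \\
&\quad + \xi\left(-E_x\sin\theta + E_y\cos\theta\right)\left(-\sin\theta\,\hat{\mathbf{x}} + \cos\theta\,\hat{\mathbf{y}}\right) e^{i(kz-\omega t)} \\
&= \left[E_x\left(\cos^2\theta + \xi\sin^2\theta\right) + E_y\left(\sin\theta\cos\theta - \xi\sin\theta\cos\theta\right)\right] e^{i(kz-\omega t)}\,\hat{\mathbf{x}} \\
&\quad + \left[E_x\left(\sin\theta\cos\theta - \xi\sin\theta\cos\theta\right) + E_y\left(\sin^2\theta + \xi\cos^2\theta\right)\right] e^{i(kz-\omega t)}\,\hat{\mathbf{y}}
\end{aligned} \tag{6.30}$$

Notice that if $\xi = 1$ (i.e. no polarizer), then we get back exactly what we started with (i.e. (6.30) reduces to (6.24)). To get to the Jones matrix for the polarizer, we note that (6.30) is a linear mixture of $E_x$ and $E_y$ which can be represented with matrix algebra. If we represent the electric field as a two-dimensional column vector with its $x$ component in the top and its $y$ component in the bottom (like a Jones vector), then we can rewrite (6.30) as

$$\mathbf{E}_{\text{after}}(z,t) = \begin{bmatrix} \cos^2\theta + \xi\sin^2\theta & \sin\theta\cos\theta - \xi\sin\theta\cos\theta \\ \sin\theta\cos\theta - \xi\sin\theta\cos\theta & \sin^2\theta + \xi\cos^2\theta \end{bmatrix} \begin{bmatrix} E_x \\ E_y \end{bmatrix} e^{i(kz-\omega t)} \tag{6.31}$$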
The matrix here is a properly normalized Jones matrix, even though we did not bother factoring out $E_{\text{eff}}$ to make a properly normalized Jones vector, as specified in (6.5). We can now write down the Jones matrix for a polarizer by inserting $\xi = 0$ into the matrix:

$$\begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix} \quad \text{(polarizer with transmission axis at angle } \theta\text{)} \tag{6.32}$$
Notice that when $\theta = 0$ this matrix reduces to that of a horizontal polarizer (6.19), and when $\theta = \pi/2$, it reduces to that of a vertical polarizer (6.20).
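As a quick numerical check of these limits, the general matrix with the parameter $\xi$ can be coded directly. This is only a sketch; the function names `jones_element` and `polarizer` are hypothetical.

```python
import math

def jones_element(theta, xi):
    """General matrix of (6.31): the field along e1 (at angle theta) passes
    unchanged, and the field along e2 is multiplied by xi."""
    c, s = math.cos(theta), math.sin(theta)
    off = s * c - xi * s * c
    return [[c * c + xi * s * s, off],
            [off, s * s + xi * c * c]]

def polarizer(theta):
    """Ideal polarizer (6.32): set xi = 0."""
    return jones_element(theta, 0)

def max_difference(m, n):
    """Largest absolute element-wise difference between two 2x2 matrices."""
    return max(abs(m[i][j] - n[i][j]) for i in range(2) for j in range(2))

horizontal = polarizer(0.0)            # expect [[1, 0], [0, 0]]
vertical = polarizer(math.pi / 2)      # expect [[0, 0], [0, 1]] up to rounding
no_device = jones_element(0.7, 1)      # xi = 1 gives the identity at any angle
```

Setting `xi = 1` returns the identity matrix for any angle, which mirrors the remark that (6.30) reduces to (6.24) when no polarizer is present.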
When a plane wave passes through a wave plate, the component of the electric field oriented along the fast axis travels faster than its orthogonal counterpart, which introduces a relative phase between the two polarization components. As light passes through a wave plate of thickness $d$, the phase difference that accumulates between the fast and the slow polarization components is

$$k_{\text{slow}} d - k_{\text{fast}} d = \frac{2\pi d}{\lambda_{\text{vac}}}\left(n_{\text{slow}} - n_{\text{fast}}\right) \tag{6.33}$$

[Figure: wave plate showing the fast axis and the slow axis.]

By adjusting the thickness of the wave plate, one can introduce any desired phase difference. The most common types of wave plates are the quarter-wave plate and the half-wave plate. The quarter-wave plate introduces a phase difference of

$$k_{\text{slow}} d - k_{\text{fast}} d = \pi/2 + 2\pi m \quad \text{(quarter-wave plate)} \tag{6.34}$$

between the two polarization components, where $m$ is an integer. This means that the polarization component along the slow axis is delayed spatially by a quarter wavelength (or five quarters, etc.). The half-wave plate introduces a phase difference of

$$k_{\text{slow}} d - k_{\text{fast}} d = \pi + 2\pi m \quad \text{(half-wave plate)} \tag{6.35}$$
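The relation (6.33) is simple to evaluate numerically. The sketch below uses hypothetical function names and made-up material values (indices 1.56 and 1.55 at a vacuum wavelength of 600 nm), chosen only for illustration.

```python
import math

def retardance(d, n_slow, n_fast, lam_vac):
    """Phase difference (6.33) accumulated across a plate of thickness d."""
    return 2 * math.pi * d * (n_slow - n_fast) / lam_vac

def zero_order_thickness(fraction, n_slow, n_fast, lam_vac):
    """Thickness giving a delay of `fraction` wavelengths with m = 0
    (fraction = 0.25 for a quarter-wave plate, 0.5 for a half-wave plate)."""
    return fraction * lam_vac / (n_slow - n_fast)

# Hypothetical material values, for illustration only.
N_SLOW, N_FAST, LAM = 1.56, 1.55, 600e-9

d_quarter = zero_order_thickness(0.25, N_SLOW, N_FAST, LAM)
d_half = zero_order_thickness(0.5, N_SLOW, N_FAST, LAM)
phase_quarter = retardance(d_quarter, N_SLOW, N_FAST, LAM)
phase_half = retardance(d_half, N_SLOW, N_FAST, LAM)
```

With these assumed numbers the zero-order quarter-wave thickness comes out near 15 µm, which shows why zero-order plates made from a single birefringent slab tend to be very thin.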
where $m$ is an integer. This means that the polarization component along the slow axis is delayed spatially by a half wavelength (or three halves, etc.). When $m = 0$ in either (6.34) or (6.35), the wave plate is said to be zero order. The derivation of the Jones matrix for the two wave plates is essentially the same as the derivation for the polarizer in the previous section. Let $\hat{\mathbf{e}}_1$ correspond to the fast axis, and let $\hat{\mathbf{e}}_2$ correspond to the slow axis, as illustrated in Fig. 6.7. We proceed as before. However, instead of setting $\xi$ equal to zero in (6.31), we must choose values for $\xi$ appropriate for each wave plate. Since nothing is absorbed, $\xi$ should have a magnitude equal to one. The important feature is the phase of $\xi$. As seen in (6.33), the field component along the slow axis accumulates excess phase relative to the component along the fast axis, and we let $\xi$ account for this. In the case of the quarter-wave plate, the appropriate factor from (6.34) is

$$\xi = e^{i\pi/2} = i \quad \text{(quarter-wave plate)} \tag{6.36}$$
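This describes a relative phase delay for the light emerging with polarization along the slow axis. Substituting (6.36) into (6.30) yields the Jones matrix for a quarter-wave plate:

$$\begin{bmatrix} \cos^2\theta + i\sin^2\theta & \sin\theta\cos\theta - i\sin\theta\cos\theta \\ \sin\theta\cos\theta - i\sin\theta\cos\theta & \sin^2\theta + i\cos^2\theta \end{bmatrix} \quad \text{(quarter-wave plate Jones matrix)} \tag{6.37}$$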
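For the half-wave plate, the appropriate factor applied to the slow axis is

$$\xi = e^{i\pi} = -1 \quad \text{(half-wave plate)} \tag{6.38}$$

and the Jones matrix becomes:

$$\begin{bmatrix} \cos^2\theta - \sin^2\theta & 2\sin\theta\cos\theta \\ 2\sin\theta\cos\theta & \sin^2\theta - \cos^2\theta \end{bmatrix} = \begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix} \quad \text{(half-wave plate Jones matrix)} \tag{6.39}$$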
Remember that $\theta$ refers to the angle that the fast axis makes with respect to the $x$-axis. Before moving on, consider the following two examples that illustrate how wave plates are often used: Example 6.2
Calculate the Jones matrix for a quarter-wave plate at $\theta = 45^\circ$, and determine its effect on horizontally polarized light. Solution: At $\theta = 45^\circ$, the Jones matrix for the quarter-wave plate (6.37) reduces to

$$\frac{e^{i\pi/4}}{\sqrt{2}} \begin{bmatrix} 1 & -i \\ -i & 1 \end{bmatrix} \quad \text{(quarter-wave plate, fast axis at } \theta = 45^\circ\text{)} \tag{6.40}$$
Figure 6.8 Animation showing effects of polarizers and wave plates on polarized light.
The overall phase factor $e^{i\pi/4}$ in front is not important since it merely accompanies the overall phase of the beam, which can be adjusted arbitrarily by moving the light source forwards or backwards through a fraction of a wavelength. Now we calculate the effect of the quarter-wave plate (oriented at $\theta = 45^\circ$) operating on horizontally polarized light:

$$\frac{1}{\sqrt{2}} \begin{bmatrix} 1 & -i \\ -i & 1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ -i \end{bmatrix} \tag{6.41}$$
The previous example shows that a quarter-wave plate (properly oriented) can turn linearly polarized light into right-circularly polarized light (see Table 6.1). On the other hand, as seen in the next example, a half-wave plate can rotate the polarization angle of linearly polarized light by varying degrees while preserving the linear polarization. Example 6.3
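Calculate the effect of a half-wave plate at an arbitrary $\theta$ on horizontally polarized light. Solution: Carrying out the multiplication, we obtain

$$\begin{bmatrix} \cos 2\theta & \sin 2\theta \\ \sin 2\theta & -\cos 2\theta \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} \cos 2\theta \\ \sin 2\theta \end{bmatrix} \tag{6.42}$$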
The resulting Jones vector describes linearly polarized light at an angle of $\alpha = 2\theta$ from the $x$-axis (see Table 6.1).
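Both examples can be reproduced numerically from the general wave-plate matrix with $\xi = i$ or $\xi = -1$. The following is a minimal sketch with hypothetical helper names; the 20° fast-axis angle is an arbitrary choice.

```python
import math

def wave_plate(theta, xi):
    """Wave-plate Jones matrix with fast axis at theta; xi is the factor
    applied to the slow-axis component (i for quarter-wave, -1 for half-wave)."""
    c, s = math.cos(theta), math.sin(theta)
    off = (1 - xi) * s * c
    return [[c * c + xi * s * s, off],
            [off, s * s + xi * c * c]]

def apply(m, v):
    """Apply a 2x2 Jones matrix to a two-component Jones vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

horizontal = [1, 0]

# Example 6.2: quarter-wave plate at 45 degrees acting on horizontal light.
out_quarter = apply(wave_plate(math.pi / 4, 1j), horizontal)
ratio = out_quarter[1] / out_quarter[0]   # y/x ratio; -i matches (6.41)

# Example 6.3: half-wave plate at an arbitrary angle (20 degrees chosen here).
theta = math.radians(20)
out_half = apply(wave_plate(theta, -1), horizontal)
```

Taking the ratio of the two components removes the overall phase factor $e^{i\pi/4}$, so the comparison with (6.41) does not depend on it.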
Figure 6.9 Incident, reflected and transmitted plane waves, each propagating along the $z$-axis of its own reference frame.
By convention, we place the minus sign on the coefficient $r_p$ to take care of handedness inversion. We could put the minus sign on $r_s$ instead; the important point is that the two polarizations acquire a relative phase differential of $\pi$ when the propagation direction flips.3 The Fresnel coefficients specify the ratios of the exiting fields to the incident ones. When (6.43) operates on an arbitrary Jones vector such as (6.11), $r_p$ multiplies the horizontal component of the field, and $r_s$ multiplies the vertical component of the field. Especially in the case of reflection from an absorbing surface such as a metal, the phases of the two polarization components can vary markedly (see P6.11). Thus, linearly polarized light containing both $s$- and $p$-components in general becomes elliptically polarized when reflected from such a surface. When light undergoes total internal reflection, again the phases of the $s$- and $p$-components differ markedly, which can cause linearly polarized light to become elliptically polarized (see P6.12). Transmission through a material interface can also influence the polarization of the field, although typically to a lesser degree. However, there is no handedness inversion, since the light continues on in a forward sense. The Jones matrix for transmission is

$$\begin{bmatrix} t_p & 0 \\ 0 & t_s \end{bmatrix} \quad \text{(Jones matrix for transmission)} \tag{6.44}$$

3 The minus sign is needed for our specific convention of field directions, as drawn in Fig. 6.9. In our convention, $r_s$ and $r_p$ are identical at normal incidence.
If a beam of light encounters a series of mirrors, the final polarization is determined by multiplying the sequence of appropriate Jones matrices (6.43) onto the initial polarization. This procedure is straightforward if the normals to all of the mirrors lie in a single plane (say parallel to the surface of an optical bench). However, if the beam path deviates from this plane (due to vertical tilt on the mirrors), then we must reorient our coordinate system before each mirror to have a new horizontal ($p$-polarized dimension) and a new vertical ($s$-polarized dimension). Earlier in this chapter we performed a rotation of a coordinate system through an angle $\theta$, described in (6.27), which is also useful here. The rotation can be accomplished by multiplying the following matrix onto the incident Jones vector:

$$\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix} \quad \text{(rotation of coordinates through an angle } \theta\text{)} \tag{6.45}$$
This is understood as a rotation about the $z$-axis. The angle of rotation $\theta$ is chosen such that the rotated $x$-axis lies in the plane of incidence for the mirror. When such a reorientation of coordinates is necessary, the two orthogonal field components in the initial coordinate system are stirred together to form the field components in the new system. This does not change the intrinsic characteristics of the polarization, just its representation.

Figure 6.10 If the plane of incidence does not coincide for successive elements in an optical system, a rotation matrix must be applied to rotate the $x$-axis to the plane of incidence before computing the effect of each element. (Figure labels: original $x$-axis, original $y$-axis, rotated $y$-axis.)
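One way to see that the rotation matrix (6.45) only changes the representation is to use it to rebuild the tilted polarizer (6.32): rotate the coordinates so the transmission axis becomes the new $x$-axis, apply a horizontal polarizer, and rotate back. The sketch below assumes the standard horizontal-polarizer matrix; the helper names and the 30° angle are hypothetical.

```python
import math

def rotation(theta):
    """Rotation of coordinates through angle theta, as in (6.45)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, s], [-s, c]]

def matmul2(m, n):
    """Product m*n of two 2x2 matrices stored as nested lists."""
    return [[sum(m[i][k] * n[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def polarizer(theta):
    """Polarizer with transmission axis at angle theta, as in (6.32)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c * c, s * c], [s * c, s * s]]

HORIZONTAL = [[1, 0], [0, 0]]   # assumed standard horizontal polarizer

theta = math.radians(30)
# Rightmost factor acts first: rotate, polarize horizontally, rotate back.
sandwich = matmul2(rotation(-theta), matmul2(HORIZONTAL, rotation(theta)))
direct = polarizer(theta)
difference = max(abs(sandwich[i][j] - direct[i][j])
                 for i in range(2) for j in range(2))
```

The two matrices agree to rounding error, and the same sandwich pattern can be applied with a mirror matrix in the middle when successive planes of incidence differ.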
do not try to extract the helicity of the light, but only the ellipticity. In this case only polarizers are needed, which can be made to work over a wide range of wavelengths. If, in addition, a variety of incident angles are measured, it is possible to extract detailed information about the optical constants $n$ and $\kappa$ and the thicknesses of possibly many layers of materials influencing the reflection. Commercial ellipsometers4 typically employ two polarizers, one before and one after the sample, where $s$- and $p$-polarized reflections take place. The first polarizer ensures that linearly polarized light arrives at the test surface (polarized at angle $\alpha$ to give both $s$- and $p$-components). The Jones matrix for the test surface reflection is given by (6.43), and the Jones matrix for the analyzing polarizer oriented at angle $\theta$ is given by (6.32). The Jones vector for the light arriving at the detector is then

$$\begin{bmatrix} \cos^2\theta & \sin\theta\cos\theta \\ \sin\theta\cos\theta & \sin^2\theta \end{bmatrix} \begin{bmatrix} -r_p & 0 \\ 0 & r_s \end{bmatrix} \begin{bmatrix} \cos\alpha \\ \sin\alpha \end{bmatrix} = \begin{bmatrix} -r_p\cos\alpha\cos^2\theta + r_s\sin\alpha\sin\theta\cos\theta \\ -r_p\cos\alpha\sin\theta\cos\theta + r_s\sin\alpha\sin^2\theta \end{bmatrix} \tag{6.46}$$

and the intensity arriving to the detector is

$$\begin{aligned}
I &\propto \left|-r_p\cos\alpha\cos^2\theta + r_s\sin\alpha\cos\theta\sin\theta\right|^2 + \left|-r_p\cos\alpha\cos\theta\sin\theta + r_s\sin\alpha\sin^2\theta\right|^2 \\
&= |r_p|^2\cos^2\alpha\cos^2\theta + |r_s|^2\sin^2\alpha\sin^2\theta - \frac{r_p r_s^* + r_s r_p^*}{4}\sin 2\alpha\,\sin 2\theta
\end{aligned} \tag{6.47}$$

For ellipsometry measurements, it is customary to express the ratio of Fresnel coefficients as

$$\frac{r_p}{r_s} \equiv \tan\Psi\, e^{i\Delta} \tag{6.48}$$

In this case, the intensity may be shown to be proportional to (see problem P6.13)

$$I \propto 1 - \Xi\sin 2\theta + \Upsilon\cos 2\theta \tag{6.49}$$

where

$$\Xi \equiv \frac{2\tan\Psi\cos\Delta\tan\alpha}{\tan^2\Psi + \tan^2\alpha} \quad\text{and}\quad \Upsilon \equiv \frac{\tan^2\Psi - \tan^2\alpha}{\tan^2\Psi + \tan^2\alpha} \tag{6.50}$$

In commercial ellipsometers, the angle $\theta$ of the analyzing polarizer often rotates at a high speed, and the time dependence of the light reaching a detector is analyzed. From this type of measurement, the coefficients $\Xi$ and $\Upsilon$ can be extracted with high precision. Then equations (6.50) can be inverted (see problem P6.13) to reveal

$$\tan\Psi = \sqrt{\frac{1+\Upsilon}{1-\Upsilon}}\,|\tan\alpha| \quad\text{and}\quad \cos\Delta = \frac{\Xi}{\sqrt{1-\Upsilon^2}}\,\text{sign}(\alpha) \tag{6.51}$$

From a series of these types of measurements, it is possible to extract the values of $n$ and $\kappa$ for materials from the expressions for $r_s$ and $r_p$ (with the aid of a computer!). A more extensive series of such measurements is needed in the case of multilayer coatings with varying thicknesses.
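The rotating-analyzer procedure can be mimicked numerically. The sketch below simulates the detector signal directly from the Jones calculus (with the minus sign placed on $r_p$), extracts the normalized $\sin 2\theta$ and $\cos 2\theta$ harmonics by summing over analyzer angles, and then inverts them for $\tan\Psi$ and $\cos\Delta$. The function names and the reflection coefficients are hypothetical, and the sign bookkeeping (the $\sin 2\theta$ harmonic is taken to equal $-\Xi$) is an assumption tied to that convention.

```python
import cmath
import math

def detector_signal(r_p, r_s, alpha, theta):
    """Intensity (arbitrary units) behind an analyzer at angle theta.
    The field passed by the analyzer is the projection of
    (-r_p cos(alpha), r_s sin(alpha)) onto the transmission axis."""
    amp = (-r_p * math.cos(alpha) * math.cos(theta)
           + r_s * math.sin(alpha) * math.sin(theta))
    return abs(amp) ** 2

def normalized_harmonics(r_p, r_s, alpha, samples=180):
    """Return (s2, c2) such that signal/mean = 1 + s2 sin(2θ) + c2 cos(2θ)."""
    thetas = [math.pi * k / samples for k in range(samples)]
    signal = [detector_signal(r_p, r_s, alpha, t) for t in thetas]
    mean = sum(signal) / samples
    s2 = 2 * sum(i * math.sin(2 * t) for i, t in zip(signal, thetas)) / samples / mean
    c2 = 2 * sum(i * math.cos(2 * t) for i, t in zip(signal, thetas)) / samples / mean
    return s2, c2

def invert(s2, c2, alpha):
    """Recover tan(Psi) and cos(Delta), assuming s2 = -Xi and c2 = Upsilon."""
    xi, upsilon = -s2, c2
    tan_psi = math.sqrt((1 + upsilon) / (1 - upsilon)) * abs(math.tan(alpha))
    cos_delta = xi / math.sqrt(1 - upsilon ** 2) * math.copysign(1, alpha)
    return tan_psi, cos_delta

# Hypothetical reflection coefficients and input polarization angle.
r_s = 0.9 + 0j
r_p = 0.5 * cmath.exp(1.1j)
alpha = 0.6

s2, c2 = normalized_harmonics(r_p, r_s, alpha)
tan_psi, cos_delta = invert(s2, c2, alpha)
```

Because the signal contains only a constant and second harmonics of $\theta$, the discrete sums over equally spaced analyzer angles recover the coefficients to rounding error; a real instrument would fit noisy data instead.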
4 See Spectroscopic Ellipsometry Tutorial at J. A. Woollam Co.
The degree of polarization takes on values between zero and one. Thus, if the light is completely unpolarized (such that $I_{\text{pol}} = 0$), the degree of polarization is zero, and if the beam is fully polarized (such that $I_{\text{un}} = 0$), the degree of polarization is one. A Stokes vector, which characterizes a partially polarized beam, is written as

$$\begin{bmatrix} S_0 \\ S_1 \\ S_2 \\ S_3 \end{bmatrix}$$

The parameter

$$S_0 \equiv \frac{I}{I_{\text{in}}} \tag{6.54}$$

is a comparison of the beam's intensity (or power) to a benchmark or input intensity, $I_{\text{in}}$, measured before the beam enters the optical system under consideration. $I$ represents the intensity at the point of investigation, where one wishes to characterize the beam. Thus, the value $S_0 = 1$ represents the input intensity, and $S_0$ can drop to values less than one, to account for attenuation of light by polarizers in the system. ($S_0$ could increase in the atypical case of amplification.) The next parameter, $S_1$, describes how much the light looks either horizontally or vertically polarized, and it is defined as

$$S_1 \equiv \frac{2 I_{\text{hor}}}{I_{\text{in}}} - S_0 \tag{6.55}$$
Here, $I_{\text{hor}}$ represents the amount of light detected if an ideal linear polarizer is placed with its axis aligned horizontally directly in front of the detector (inserted where the light is characterized). $S_1$ ranges between negative one and one, taking on these extremes when the light is linearly polarized vertically or horizontally, respectively. If the light has been attenuated, it may still be perfectly horizontally polarized even if $S_1$ has a magnitude less than one. (Alternatively, you might wish to examine $S_1/S_0$, which is guaranteed to range between negative one and one.) The parameter $S_2$ describes how much the light looks linearly polarized along the diagonals. It is given by

$$S_2 \equiv \frac{2 I_{45^\circ}}{I_{\text{in}}} - S_0 \tag{6.56}$$

Similar to the previous case, $I_{45^\circ}$ represents the amount of light detected if an ideal linear polarizer is placed with its axis at $45^\circ$ directly in front of the detector (inserted where the light is characterized). As before, $S_2$ ranges between negative one and one, taking on these extremes when the light is linearly polarized at $135^\circ$ or $45^\circ$, respectively. Finally, $S_3$ characterizes the extent to which the beam is either right or left circularly polarized:

$$S_3 \equiv \frac{2 I_{\text{r-cir}}}{I_{\text{in}}} - S_0 \tag{6.57}$$

Here, $I_{\text{r-cir}}$ represents the amount of light detected if an ideal right-circular polarizer is placed directly in front of the detector. A right-circular polarizer is one that passes right-handed polarized light, but blocks left-handed polarized light. One way to construct such a polarizer is a quarter-wave plate, followed by a linear polarizer with the transmission axis aligned $45^\circ$ from the wave-plate fast axis, followed by another quarter-wave plate at $45^\circ$ from the polarizer (see P6.14).6 Again, this parameter ranges between negative one and one, taking on these extremes for left and right circular polarization, respectively. Importantly, if any of the parameters $S_1$, $S_2$, or $S_3$ take on their extreme values (i.e. a magnitude equal to $S_0$), the other two parameters necessarily equal zero.

As an example, if a beam is linearly horizontally polarized with $I = I_{\text{in}}$, then we have $I_{\text{hor}} = I_{\text{in}}$, $I_{45^\circ} = I_{\text{in}}/2$, and $I_{\text{r-cir}} = I_{\text{in}}/2$. This yields $S_0 = 1$, $S_1 = 1$, $S_2 = 0$, and $S_3 = 0$. As a second example, suppose that the light has been attenuated to $I = I_{\text{in}}/3$ but is purely left circularly polarized. Then we have $I_{\text{hor}} = I_{\text{in}}/6$, $I_{45^\circ} = I_{\text{in}}/6$, and $I_{\text{r-cir}} = 0$, so the Stokes parameters are $S_0 = 1/3$, $S_1 = 0$, $S_2 = 0$, and $S_3 = -1/3$. Another interesting case is completely unpolarized light, which transmits 50% through all of the polarizers discussed above. In this case, $I_{\text{hor}} = I_{45^\circ} = I_{\text{r-cir}} = I/2$ and $S_1 = S_2 = S_3 = 0$. Example 6.4

6 The final quarter-wave plate puts the light back into its original circular state; it is not needed to measure the Stokes parameter.
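Find the Stokes parameters for perfectly polarized light, represented by an arbitrary Jones vector $\begin{bmatrix} A \\ B \end{bmatrix}$ where $A$ and $B$ are complex.7 Depending on the values $A$ and $B$, the polarization can follow any ellipse. Solution: The input intensity of this polarized beam is $I_{\text{in}} = I_{\text{pol}} = |A|^2 + |B|^2$, according to Eq. (6.23), where we absorb the factor $\frac{1}{2}\epsilon_0 c\,|E_{\text{eff}}|^2$ into $|A|^2$ and $|B|^2$ for convenience. The Jones vector for the light that passes through a horizontal polarizer is

$$\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} A \\ B \end{bmatrix} = \begin{bmatrix} A \\ 0 \end{bmatrix}$$

which gives a measured intensity of $I_{\text{hor}} = |A|^2$. Similarly, the Jones vector when the beam is passed through a polarizer oriented at $45^\circ$ is

$$\frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} A \\ B \end{bmatrix} = \frac{A + B}{2} \begin{bmatrix} 1 \\ 1 \end{bmatrix}$$

leading to an intensity of

$$I_{45^\circ} = \frac{|A + B|^2}{2} = \frac{|A|^2 + |B|^2 + A^*B + AB^*}{2}$$

Finally, the Jones vector for light passing through a right-circular polarizer (see P6.14) is

$$\frac{1}{2} \begin{bmatrix} 1 & i \\ -i & 1 \end{bmatrix} \begin{bmatrix} A \\ B \end{bmatrix} = \frac{A + iB}{2} \begin{bmatrix} 1 \\ -i \end{bmatrix}$$

giving an intensity of

$$I_{\text{r-cir}} = \frac{|A + iB|^2}{2} = \frac{|A|^2 + |B|^2 + i\left(A^*B - AB^*\right)}{2}$$

Since $I = I_{\text{in}}$ here, $S_0 = 1$, and the remaining Stokes parameters follow from (6.55)-(6.57):

$$S_1 = \frac{2|A|^2}{I_{\text{in}}} - \frac{|A|^2 + |B|^2}{I_{\text{in}}} = \frac{|A|^2 - |B|^2}{I_{\text{in}}}$$

$$S_2 = \frac{|A|^2 + |B|^2 + A^*B + AB^*}{I_{\text{in}}} - \frac{|A|^2 + |B|^2}{I_{\text{in}}} = \frac{A^*B + AB^*}{I_{\text{in}}}$$

$$S_3 = \frac{|A|^2 + |B|^2 + i\left(A^*B - AB^*\right)}{I_{\text{in}}} - \frac{|A|^2 + |B|^2}{I_{\text{in}}} = i\,\frac{A^*B - AB^*}{I_{\text{in}}}$$

7 ... $\begin{bmatrix} A \\ B \end{bmatrix}$ instead of $\begin{bmatrix} |A| \\ |B| e^{i\delta} \end{bmatrix}$, where $\delta$ is the phase ...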
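The result of Example 6.4 translates directly into a few lines of code. This is a sketch with a hypothetical function name; it uses the identities $A^*B + AB^* = 2\,\text{Re}(A^*B)$ and $i(A^*B - AB^*) = -2\,\text{Im}(A^*B)$ so that the returned values are real.

```python
import math

def stokes_from_jones(a, b, i_in=None):
    """Stokes parameters of fully polarized light with Jones vector (a, b).
    If i_in is omitted, the beam itself is taken as the reference intensity."""
    a, b = complex(a), complex(b)
    i_pol = abs(a) ** 2 + abs(b) ** 2
    if i_in is None:
        i_in = i_pol
    cross = a.conjugate() * b              # A* B
    s0 = i_pol / i_in
    s1 = (abs(a) ** 2 - abs(b) ** 2) / i_in
    s2 = 2 * cross.real / i_in             # (A*B + AB*) / I_in
    s3 = -2 * cross.imag / i_in            # i (A*B - AB*) / I_in
    return (s0, s1, s2, s3)

r = 1 / math.sqrt(2)
horizontal = stokes_from_jones(1, 0)
diagonal = stokes_from_jones(r, r)
right_circular = stokes_from_jones(r, -1j * r)
left_circular = stokes_from_jones(r, 1j * r)
# Left-circular light attenuated to one third of the input intensity.
attenuated_left = stokes_from_jones(r, 1j * r, i_in=3.0)
```

The last call reproduces the attenuated left-circular case discussed before the example, where $S_0 = 1/3$ and $S_3 = -1/3$.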
It is clear from the linear dependence of $S_0$, $S_1$, $S_2$, and $S_3$ on intensity (see Eqs. (6.54)-(6.57)) that the overall Stokes vector may be regarded as the sum of the individual Stokes vectors for polarized and unpolarized light. That is, we may write $S_j = S_j^{(\text{pol})} + S_j^{(\text{un})}$, $j = 0, 1, 2, 3$. This is certainly true for

$$S_0 = \frac{I_{\text{pol}} + I_{\text{un}}}{I_{\text{in}}} = \frac{I}{I_{\text{in}}} \tag{6.58}$$

and in the other cases the unpolarized portion of the light does not contribute to the Stokes parameters. Half of the unpolarized light survives any of the test filters, which cancels neatly with the unpolarized portion of $S_0$ in Eqs. (6.55)-(6.57). With the aid of the results in Example 6.4, a completely general form of the Stokes vector may then be written as

$$\begin{bmatrix} S_0 \\ S_1 \\ S_2 \\ S_3 \end{bmatrix} = \frac{1}{I_{\text{in}}} \begin{bmatrix} I_{\text{pol}} + I_{\text{un}} \\ |A|^2 - |B|^2 \\ A^*B + AB^* \\ i\left(A^*B - AB^*\right) \end{bmatrix} \tag{6.59}$$

where the Jones vector for the polarized portion of the light is $\begin{bmatrix} A \\ B \end{bmatrix}$ and the intensity of the polarized portion of the light is

$$I_{\text{pol}} = |A|^2 + |B|^2 \tag{6.60}$$
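Again, we have hidden the factor $\frac{1}{2}\epsilon_0 c\,|E_{\text{eff}}|^2$ for the polarized portion of the light inside $|A|^2$ and $|B|^2$. We would like to express the degree of polarization in terms of the Stokes parameters. We first note that the quantity $\sqrt{S_1^2 + S_2^2 + S_3^2}$ can be expressed as

$$\sqrt{S_1^2 + S_2^2 + S_3^2} = \sqrt{\left(\frac{|A|^2 - |B|^2}{I_{\text{in}}}\right)^2 + \left(\frac{A^*B + AB^*}{I_{\text{in}}}\right)^2 + \left(\frac{i\left(A^*B - AB^*\right)}{I_{\text{in}}}\right)^2} = \frac{|A|^2 + |B|^2}{I_{\text{in}}} = \frac{I_{\text{pol}}}{I_{\text{in}}} \tag{6.61}$$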
Substituting (6.58) and (6.61) into the expression for the degree of polarization (6.53) yields

$$\Pi_{\text{pol}} = \frac{1}{S_0}\sqrt{S_1^2 + S_2^2 + S_3^2} \tag{6.62}$$

If the light is polarized such that it perfectly transmits through or is perfectly extinguished by one of the three test polarizers associated with $S_1$, $S_2$, or $S_3$, then the degree of polarization will be unity. Obviously, it is possible to have pure polarization states that are not aligned with the axes of any one of these test polarizers. In this situation, the degree of polarization is still one, although the values $S_1$, $S_2$, and $S_3$ may all three contribute to (6.62). Finally, it is possible to represent polarizing devices as matrices that operate on the Stokes vectors in much the same way that Jones matrices operate on Jones vectors. Since Stokes vectors are four-dimensional, the matrices used are four-by-four. These are known as Mueller matrices.8
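A short sketch of (6.62) follows, with hypothetical function names and made-up intensities. It builds the Stokes vector of a mixed beam by adding the Stokes vectors of a polarized part and an unpolarized part, as described above, and then evaluates the degree of polarization.

```python
import math

def degree_of_polarization(stokes):
    """Evaluate (6.62): sqrt(S1^2 + S2^2 + S3^2) / S0."""
    s0, s1, s2, s3 = stokes
    return math.sqrt(s1 ** 2 + s2 ** 2 + s3 ** 2) / s0

def add_stokes(u, v):
    """Stokes vectors of incoherent beams add component by component."""
    return tuple(x + y for x, y in zip(u, v))

# Hypothetical beam: 60% of the input intensity is elliptically polarized
# (S1 : S3 = 3 : 4) and 40% is unpolarized.
polarized_part = (0.6, 0.36, 0.0, 0.48)
unpolarized_part = (0.4, 0.0, 0.0, 0.0)
mixed = add_stokes(polarized_part, unpolarized_part)

pi_mixed = degree_of_polarization(mixed)
pi_pure = degree_of_polarization(polarized_part)
```

For this assumed mixture the degree of polarization equals the polarized fraction of the intensity, while the polarized part taken alone gives unity even though no single Stokes parameter reaches its extreme value.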
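(see table 6.1). As usual, let $\theta$ give the angle of the transmission axis relative to the horizontal. The Jones matrix (6.32) acts on the polarized portion of the light as follows:

$$\begin{bmatrix} A_2' \\ B_2' \end{bmatrix} = \begin{bmatrix} \cos^2\theta & \cos\theta\sin\theta \\ \cos\theta\sin\theta & \sin^2\theta \end{bmatrix} \begin{bmatrix} A \\ B \end{bmatrix} = \left[A\cos\theta + B\sin\theta\right] \begin{bmatrix} \cos\theta \\ \sin\theta \end{bmatrix}$$

Here the subscript 2 labels the transmitted part of the originally polarized light, while $\begin{bmatrix} A_1' \\ B_1' \end{bmatrix}$ labels the transmitted part of the originally unpolarized light.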
vectors. He was an engaging lecturer into the 1950s and was known for his exciting demonstrations. He was a student of Arnold Sommerfeld, and did seminal work on ferroelectricity (he is reported to have coined the term). See Laszlo Tisza, "Adventures of a Theoretical Physicist, Part II: America," Phys. Perspect.
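the two beams are not coherent. As mentioned previously, unpolarized light necessarily contains multiple frequencies, and so the fields from the polarized and unpolarized beams destructively interfere as often as they constructively interfere. In this case, we simply add intensities rather than fields. That is, we have

$$\begin{aligned}
|A'|^2 = |A_1'|^2 + |A_2'|^2 &= \frac{I_{\text{un}}}{2}\cos^2\theta + \left|A\cos\theta + B\sin\theta\right|^2\cos^2\theta \\
&= \left[\frac{I_{\text{un}}}{2} + |A|^2\cos^2\theta + |B|^2\sin^2\theta + \left(A^*B + AB^*\right)\sin\theta\cos\theta\right]\cos^2\theta \\
&= I_{\text{in}}\left[\frac{S_0}{2} + \frac{\cos 2\theta}{2} S_1 + \frac{\sin 2\theta}{2} S_2\right]\cos^2\theta
\end{aligned}$$

Similarly,

$$|B'|^2 = |B_1'|^2 + |B_2'|^2 = I_{\text{in}}\left[\frac{S_0}{2} + \frac{\cos 2\theta}{2} S_1 + \frac{\sin 2\theta}{2} S_2\right]\sin^2\theta$$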
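Since the light has gone through a linear polarizer, we are guaranteed that $A'$ and $B'$ have the same phase. Therefore, $A'^*B' = A'B'^* = |A'||B'|$. In view of (6.59), these results lead to

$$\begin{aligned}
S_0' &= \frac{|A'|^2 + |B'|^2}{I_{\text{in}}} = \frac{1}{2} S_0 + \frac{\cos 2\theta}{2} S_1 + \frac{\sin 2\theta}{2} S_2 \\
S_1' &= \frac{|A'|^2 - |B'|^2}{I_{\text{in}}} = \frac{\cos 2\theta}{2} S_0 + \frac{\cos^2 2\theta}{2} S_1 + \frac{\sin 4\theta}{4} S_2 \\
S_2' &= \frac{A'^*B' + A'B'^*}{I_{\text{in}}} = 2\left[\frac{S_0}{2} + \frac{\cos 2\theta}{2} S_1 + \frac{\sin 2\theta}{2} S_2\right]\cos\theta\sin\theta = \frac{\sin 2\theta}{2} S_0 + \frac{\sin 4\theta}{4} S_1 + \frac{\sin^2 2\theta}{2} S_2 \\
S_3' &= i\,\frac{A'^*B' - A'B'^*}{I_{\text{in}}} = 0
\end{aligned}$$

These transformations expressed in matrix format become

$$\begin{bmatrix} S_0' \\ S_1' \\ S_2' \\ S_3' \end{bmatrix} = \frac{1}{2} \begin{bmatrix} 1 & \cos 2\theta & \sin 2\theta & 0 \\ \cos 2\theta & \cos^2 2\theta & \frac{1}{2}\sin 4\theta & 0 \\ \sin 2\theta & \frac{1}{2}\sin 4\theta & \sin^2 2\theta & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} S_0 \\ S_1 \\ S_2 \\ S_3 \end{bmatrix}$$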
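A small numerical sketch of the polarizer Mueller matrix, with hypothetical function names, makes its behavior easy to check: unpolarized light loses half its intensity and emerges fully polarized, and horizontally polarized light is attenuated by $\cos^2\theta$.

```python
import math

def mueller_polarizer(theta):
    """Mueller matrix of an ideal linear polarizer with transmission axis at theta."""
    c2, s2, s4 = math.cos(2 * theta), math.sin(2 * theta), math.sin(4 * theta)
    return [[0.5,      0.5 * c2,      0.5 * s2,      0.0],
            [0.5 * c2, 0.5 * c2 * c2, 0.25 * s4,     0.0],
            [0.5 * s2, 0.25 * s4,     0.5 * s2 * s2, 0.0],
            [0.0,      0.0,           0.0,           0.0]]

def apply4(m, s):
    """Apply a 4x4 Mueller matrix to a Stokes vector."""
    return [sum(m[i][j] * s[j] for j in range(4)) for i in range(4)]

theta = math.radians(25)
unpolarized_out = apply4(mueller_polarizer(theta), [1, 0, 0, 0])
horizontal_out = apply4(mueller_polarizer(theta), [1, 1, 0, 0])
```

In both cases the output satisfies $S_1'^2 + S_2'^2 + S_3'^2 = S_0'^2$, as expected for light that has just passed through an ideal polarizer.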
The Mueller matrix for a half-wave plate is worked out below. The Mueller matrix for a quarter-wave plate is deferred to problem P6.15.
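As usual, $\theta$ is the angle of the fast axis relative to the horizontal. (As expected, $|A'|^2 + |B'|^2 = |A|^2 + |B|^2$; the intensity of the light is unaltered.) Using (6.59) we get

$$S_1' = \frac{|A'|^2 - |B'|^2}{I_{\text{in}}} = \frac{\left(|A|^2 - |B|^2\right)\cos 4\theta + \left(A^*B + AB^*\right)\sin 4\theta}{I_{\text{in}}} = S_1\cos 4\theta + S_2\sin 4\theta$$

$$S_2' = \frac{A'^*B' + A'B'^*}{I_{\text{in}}} = \frac{|A|^2 - |B|^2}{I_{\text{in}}}\sin 4\theta - \frac{AB^* + A^*B}{I_{\text{in}}}\cos 4\theta = S_1\sin 4\theta - S_2\cos 4\theta$$

$$\begin{aligned}
S_3' = i\,\frac{A'^*B' - A'B'^*}{I_{\text{in}}} &= i\,\frac{\left(A\cos 2\theta + B\sin 2\theta\right)^*\left(A\sin 2\theta - B\cos 2\theta\right)}{I_{\text{in}}} - i\,\frac{\left(A\cos 2\theta + B\sin 2\theta\right)\left(A\sin 2\theta - B\cos 2\theta\right)^*}{I_{\text{in}}} \\
&= -i\,\frac{A^*B - AB^*}{I_{\text{in}}} = -S_3
\end{aligned}$$

These transformations expressed in matrix format become

$$\begin{bmatrix} S_0' \\ S_1' \\ S_2' \\ S_3' \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos 4\theta & \sin 4\theta & 0 \\ 0 & \sin 4\theta & -\cos 4\theta & 0 \\ 0 & 0 & 0 & -1 \end{bmatrix} \begin{bmatrix} S_0 \\ S_1 \\ S_2 \\ S_3 \end{bmatrix}$$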
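The half-wave plate result can be cross-checked numerically by taking two routes to the same Stokes vector: apply the Jones matrix (6.39) and then convert to Stokes parameters via (6.59), or convert first and then apply the Mueller matrix. The sketch below uses hypothetical helper names and an arbitrary test state.

```python
import cmath
import math

def stokes(a, b):
    """Stokes vector of fully polarized light, per (6.59) with I_in = |a|^2 + |b|^2."""
    a, b = complex(a), complex(b)
    i_in = abs(a) ** 2 + abs(b) ** 2
    cross = a.conjugate() * b
    return [1.0, (abs(a) ** 2 - abs(b) ** 2) / i_in,
            2 * cross.real / i_in, -2 * cross.imag / i_in]

def jones_half_wave(theta, a, b):
    """Apply the half-wave plate Jones matrix (6.39) to the vector (a, b)."""
    c, s = math.cos(2 * theta), math.sin(2 * theta)
    return c * a + s * b, s * a - c * b

def mueller_half_wave(theta):
    """Half-wave plate Mueller matrix with fast axis at theta."""
    c4, s4 = math.cos(4 * theta), math.sin(4 * theta)
    return [[1, 0, 0, 0],
            [0, c4, s4, 0],
            [0, s4, -c4, 0],
            [0, 0, 0, -1]]

def apply4(m, s):
    """Apply a 4x4 Mueller matrix to a Stokes vector."""
    return [sum(m[i][j] * s[j] for j in range(4)) for i in range(4)]

# Arbitrary elliptical test state and fast-axis angle.
a, b = 0.6, 0.8 * cmath.exp(0.4j)
theta = 0.3

via_jones = stokes(*jones_half_wave(theta, a, b))
via_mueller = apply4(mueller_half_wave(theta), stokes(a, b))
mismatch = max(abs(x - y) for x, y in zip(via_jones, via_mueller))
```

Agreement between the two routes for an arbitrary elliptical state is a useful sanity check on the signs in the matrix, in particular the sign flip of $S_3$ that reverses the handedness.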
Exercises
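Exercises for 6.2 Jones Vectors for Representing Polarization
P6.1 Show that $\left(A\hat{\mathbf{x}} + B e^{i\delta}\hat{\mathbf{y}}\right)\cdot\left(A\hat{\mathbf{x}} + B e^{i\delta}\hat{\mathbf{y}}\right)^* = 1$, as defined in connection with (6.5).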
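P6.2 Prove that if $0 < \delta < \pi$, the helicity is left-handed, and if $\pi < \delta < 2\pi$ the helicity is right-handed. HINT: Write the relevant real field associated with (6.5)

$$\mathbf{E}(z,t) = |E_{\text{eff}}|\left[\hat{\mathbf{x}} A\cos\left(kz - \omega t + \alpha\right) + \hat{\mathbf{y}} B\cos\left(kz - \omega t + \alpha + \delta\right)\right]$$

where $\alpha$ is the phase of $E_{\text{eff}}$. Freeze time at, say, $t = \alpha/\omega$. Determine the field at $z = 0$ and at $z = \lambda/4$ (a quarter cycle), say. If $\mathbf{E}(0,t)\times\mathbf{E}(\lambda/4,t)$ points in the direction of $\mathbf{k}$, then the helicity matches that of a wood screw.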
L6.3 Determine how much right-handed circularly polarized light ($\lambda_{\text{vac}} = 633$ nm) is delayed (or advanced) with respect to left-handed circularly polarized light as it goes through approximately 3 cm of Karo syrup (the neck of the bottle). This phenomenon is called optical activity. Because of a definite handedness to the molecules in the syrup, right- and left-handed polarized light experience slightly different refractive indices. (video)

[Figure: laser, polarizer, Karo light corn syrup, polarizer, screen.]

HINT: Linearly polarized light contains equal amounts of right and left circularly polarized light. Consider

$$\frac{1}{2}\begin{bmatrix} 1 \\ i \end{bmatrix} + \frac{e^{i\phi}}{2}\begin{bmatrix} 1 \\ -i \end{bmatrix}$$

where $\phi$ is the phase delay of the right circular polarization. Show that this can be written as

$$e^{i\phi/2}\begin{bmatrix} \cos\phi/2 \\ \sin\phi/2 \end{bmatrix}$$

The overall phase is unimportant. Compare this with $\begin{bmatrix} \cos\alpha \\ \sin\alpha \end{bmatrix}$ where $\alpha$ is the angle of linearly polarized light (see table 6.1).

Exercises for 6.3 Elliptically Polarized Light
P6.4 For the following cases, what is the orientation of the major axis, and what is the ellipticity of the light? Case I: $A = B = 1/\sqrt{2}$; $\delta = 0$. Case II: $A = B = 1/\sqrt{2}$; $\delta = \pi/2$. Case III: $A = B = 1/\sqrt{2}$; $\delta = \pi/4$.
Exercises for 6.4 Linear Polarizers and Jones Matrices
P6.5 (a) Suppose that linearly polarized light is oriented at an angle $\alpha$ with respect to the horizontal axis ($x$-axis) (see table 6.1). What fraction of the original intensity gets through a vertically oriented polarizer? (b) If the original light is right-circularly polarized, what fraction of the original intensity gets through the same polarizer?
Exercises for 6.5 Jones Matrix for Polarizers at Arbitrary Angles
P6.6 Horizontally polarized light ($\alpha = 0$) is sent through two polarizers, the first oriented at $\theta_1 = 45^\circ$ and the second at $\theta_2 = 90^\circ$. What fraction of the original intensity emerges? What is the fraction if the ordering of the polarizers is reversed?
P6.7 (a) Suppose that linearly polarized light is oriented at an angle $\alpha$ with respect to the horizontal or $x$-axis. What fraction of the original intensity emerges from a polarizer oriented with its transmission at angle $\theta$ from the $x$-axis?
Answer: $\cos^2(\theta - \alpha)$; compare with P6.5.
(b) If the original light is right circularly polarized, what fraction of the original intensity emerges from the same polarizer?
P6.8 Derive (6.12), (6.13), and (6.14). HINT: Analyze the Jones vector just as you would analyze light in the laboratory. Put a polarizer in the beam and observe the intensity of the light as a function of polarizer angle. Compute the intensity via (6.23). Then find the polarizer angle (call it $\alpha$) that gives a maximum (or a minimum) of intensity. The angle then corresponds to an axis of the ellipse inscribing the E-field as it spirals. When taking the arctangent, remember that it is defined only over a range of $\pi$. You can add $\pi$ for another valid result (which corresponds to the second ellipse axis).
Exercises for 6.6 Jones Matrices for Wave Plates
L6.9 Create a source of unknown elliptical polarization by reflecting a linearly polarized laser beam (with both $s$- and $p$-components) from a metal mirror with a large incident angle (i.e. $\theta_i \sim 80^\circ$). Use a quarter-wave plate and a polarizer to determine the Jones vector of the reflected beam. Find the ellipticity, the helicity (right or left handed), and the orientation of the major axis. (video)

[Figure: laser, polarizer set at $45^\circ$, silver mirror at ~$80^\circ$ angle of incidence, polarizer.]

HINT: A polarizer alone can reveal the direction of the major and minor axes and the ellipticity, but it does not reveal the helicity. Use a quarter-wave plate (oriented at a special angle $\theta$) to convert the unknown elliptically polarized light into linearly polarized light. A subsequent polarizer can then extinguish the light, from which you can determine the Jones vector of the light coming through the wave plate. This must equal the original (unknown) Jones vector (6.11) operated on by the wave plate (6.37). As you solve the matrix equation, it is helpful to note that the inverse of (6.37) is its own complex conjugate.
P6.10 What is the minimum thickness (called zero-order thickness) of a quartz plate made to operate as a quarter-wave plate for $\lambda_{\text{vac}} = 500$ nm? The indices of refraction are $n_{\text{fast}} = 1.54424$ and $n_{\text{slow}} = 1.55335$.
Exercises for 6.7 Polarization Effects of Reflection and Transmission
P6.11 Light is linearly polarized at $\alpha = 45^\circ$ with a Jones vector according to table 6.1. The light is reflected from a vertical silver mirror with angle of incidence $\theta_i = 80^\circ$, as described in (P3.15). Find the Jones vector representation for the polarization of the reflected light. NOTE: The answer may be somewhat different than the result measured in L6.9. For one thing, we have not considered that a silver mirror inevitably has a thin oxide layer.

[Figure: reflection at $80^\circ$ with $s$ and $p$ directions labeled.]

P6.12 Calculate the angle $\theta$ to cut the glass in a Fresnel rhomb such that after the two internal reflections there is a phase difference of $\pi/2$ between the two polarization states. The rhomb then acts as a quarter-wave plate. HINT: You need to find the phase difference between (3.40) and (3.41). Set the difference equal to $\pi/4$ for each bounce. The equation you get does not have a clean analytic solution, but you can plot it to find a numerical solution.
Answer: There are two angles that work: $\theta = 50^\circ$ and $\theta = 53^\circ$.

[Figure: Fresnel rhomb, side view.]
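Exercises for 6.A Ellipsometry
P6.13 Derive (6.49) and (6.51), often used for ellipsometry measurements. HINT: Using $\sin^2\theta = \frac{1 - \cos 2\theta}{2}$ and $\cos^2\theta = \frac{1 + \cos 2\theta}{2}$, first show

$$I \propto 1 - \frac{\dfrac{r_p r_s^* + r_s r_p^*}{|r_s|^2}\tan\alpha}{\dfrac{|r_p|^2}{|r_s|^2} + \tan^2\alpha}\,\sin 2\theta + \frac{\dfrac{|r_p|^2}{|r_s|^2} - \tan^2\alpha}{\dfrac{|r_p|^2}{|r_s|^2} + \tan^2\alpha}\,\cos 2\theta$$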
Exercises for 6.B Partially Polarized Light
P6.14 (a) One way to construct a right-circular polarizer is using a quarter-wave plate with fast axis at $45^\circ$, followed by a linear polarizer oriented vertically, and finally a quarter-wave plate with fast axis at $-45^\circ$. Calculate the Jones matrix for this system.
Answer: $\dfrac{1}{2}\begin{bmatrix} 1 & i \\ -i & 1 \end{bmatrix}$
(b) Check that the device leaves right-circularly polarized light unaltered while killing left-circularly polarized light.
P6.15 Derive the Mueller matrix for a quarter-wave plate.
Answer:

$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos^2 2\theta & \frac{1}{2}\sin 4\theta & -\sin 2\theta \\ 0 & \frac{1}{2}\sin 4\theta & \sin^2 2\theta & \cos 2\theta \\ 0 & \sin 2\theta & -\cos 2\theta & 0 \end{bmatrix}$$
Chapter 7
Sir Isaac Newton (1643-1727, English) was born in Lincolnshire, England three months after the death of his father, who was a farmer. Newton spent much of his childhood with his maternal grandmother, after his mother remarried. (Newton did not like his stepfather.) In his teenage years, Newton's mother tried to persuade him to take up farming, but his love for education won out. He became the top-ranked student and was admitted into Trinity College, Cambridge at age 18. Newton was influenced by the works of Descartes, Copernicus, Galileo, and Kepler. Upon graduation four years later, the university closed for two years because of a plague. Newton's return to farm life coincided with a remarkable period when he first developed ideas on calculus, gravitation, and optics. Newton later returned to Cambridge where he spent his extraordinarily prolific career and became the first scientist to be knighted. In optics, Newton advanced the ray theory of light and image formation. He showed that 'white' light is comprised of many colors and that the amount of refraction depends on color. He built the first reflecting telescope, which avoids chromatic aberration. Newton advocated against the wave theory of light in favor of his 'corpuscular' theory. (Imagining that by this Newton foresaw the quantized nature of light energy gives too much credit!) (Wikipedia)
ment of the center of the wave packet. For narrowband packets (i.e. packets comprised of a narrow range of frequencies and hence long duration), the packet tends to maintain its shape (with some spreading) while propagating at the group velocity. On the other hand, broadband pulses (i.e. packets comprised of many frequencies and possibly of short duration) tend to distort severely while propagating in materials. Nevertheless, the group velocity tracks the center of the pulse. It turns out that group velocity can become superluminal when significant absorption and/or amplification of the light pulse is involved. This is no cause for alarm (nor is it cause for an abundance of gee-whiz papers on the subject). Absorption and amplification can cause a pulse to appear to move unexpectedly fast through a reshaping effect. Group velocity, or rather its inverse, group delay, takes this into account, which makes it remarkably general. In such a scenario, energy can be lost from the back of a pulse or perhaps added to an already-present forward portion of a pulse such that the average pulse position appears to advance abruptly. When all energy is accounted for (both the energy in the medium and in the light pulse), however, nothing advances faster than the universal speed limit $c$. Appendix 7.B gives a good look under the hood at how a medium exchanges energy with a pulse to produce these eye-catching effects.
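$$\mathbf{E}(\mathbf{r},t) = \sum_j \mathbf{E}_j e^{i(\mathbf{k}_j\cdot\mathbf{r} - \omega_j t)} \tag{7.1}$$

$$\mathbf{B}(\mathbf{r},t) = \sum_j \mathbf{B}_j e^{i(\mathbf{k}_j\cdot\mathbf{r} - \omega_j t)} = \sum_j \frac{\mathbf{k}_j\times\mathbf{E}_j}{\omega_j} e^{i(\mathbf{k}_j\cdot\mathbf{r} - \omega_j t)} \tag{7.2}$$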
As usual, the (time- and space-independent) individual field components $\mathbf{E}_j$ contain both amplitude and phase information for each plane wave. The Poynting vector (2.52) associated with the fields (7.1) and (7.2) is

$$\mathbf{S}(\mathbf{r},t) = \text{Re}\{\mathbf{E}(\mathbf{r},t)\}\times\frac{\text{Re}\{\mathbf{B}(\mathbf{r},t)\}}{\mu_0} = \frac{1}{\mu_0}\sum_{j,m}\text{Re}\left\{\mathbf{E}_j e^{i(\mathbf{k}_j\cdot\mathbf{r} - \omega_j t)}\right\}\times\left[\frac{\mathbf{k}_m}{\omega_m}\times\text{Re}\left\{\mathbf{E}_m e^{i(\mathbf{k}_m\cdot\mathbf{r} - \omega_m t)}\right\}\right] \tag{7.3}$$

where we have assumed that the $\mathbf{k}_m$ vectors are real. (Recall the conspiracy that only the real parts of the fields are relevant, which is crucial when multiplying.) The above expression is cumbersome because of the many cross terms that arise when the two summations are multiplied. We need some simplifying assumptions before we can make any real progress on this expression. For example, we can time-average the rapid fluctuations in the expression that vary on the scale of optical frequencies. Additionally, it is common to encounter the situation where all plane-wave components travel roughly parallel to each other, which will be a big help in simplifying (7.3).

Intensity for Quasi Parallel-Traveling Light
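For simplicity, we assume that all vectors $\mathbf{k}_j$ are real. If the wave vectors are complex, the result is essentially the same, but, as in (2.62), the field amplitudes $\mathbf{E}_j$ correspond to local amplitudes (adjusted for absorption or amplification during prior propagation). We apply the BAC-CAB rule (P0.3) to (7.3) and obtain
$$\begin{aligned}\mathbf{S}(\mathbf{r},t)=\sum_{j,m}\frac{1}{\omega_m\mu_0}\Big[\,&\mathbf{k}_m\left(\mathrm{Re}\left\{\mathbf{E}_j e^{i(\mathbf{k}_j\cdot\mathbf{r}-\omega_j t)}\right\}\cdot\mathrm{Re}\left\{\mathbf{E}_m e^{i(\mathbf{k}_m\cdot\mathbf{r}-\omega_m t)}\right\}\right)\\ &-\mathrm{Re}\left\{\mathbf{E}_m e^{i(\mathbf{k}_m\cdot\mathbf{r}-\omega_m t)}\right\}\left(\mathbf{k}_m\cdot\mathrm{Re}\left\{\mathbf{E}_j e^{i(\mathbf{k}_j\cdot\mathbf{r}-\omega_j t)}\right\}\right)\Big]\end{aligned} \tag{7.4}$$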
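The last term in (7.4) can be dismissed if all k-vectors are approximately parallel to each other, in which case all of the $\mathbf{k}_m$ are essentially perpendicular to each of the $\mathbf{E}_j$. We will make this rather stringent assumption and kill the last line in (7.4). The magnitude of the Poynting vector then becomes (with the help of (0.30))
$$\begin{aligned}S(\mathbf{r},t)&=\sum_{j,m}\frac{k_m}{\omega_m\mu_0}\,\frac{\mathbf{E}_j e^{i(\mathbf{k}_j\cdot\mathbf{r}-\omega_j t)}+\mathbf{E}_j^* e^{-i(\mathbf{k}_j\cdot\mathbf{r}-\omega_j t)}}{2}\cdot\frac{\mathbf{E}_m e^{i(\mathbf{k}_m\cdot\mathbf{r}-\omega_m t)}+\mathbf{E}_m^* e^{-i(\mathbf{k}_m\cdot\mathbf{r}-\omega_m t)}}{2}\\ &=\sum_{j,m}\frac{k_m}{4\omega_m\mu_0}\Big[\mathbf{E}_j\cdot\mathbf{E}_m\, e^{i[(\mathbf{k}_j+\mathbf{k}_m)\cdot\mathbf{r}-(\omega_j+\omega_m)t]}+\mathbf{E}_j^*\cdot\mathbf{E}_m^*\, e^{-i[(\mathbf{k}_j+\mathbf{k}_m)\cdot\mathbf{r}-(\omega_j+\omega_m)t]}\\ &\qquad\qquad+\mathbf{E}_j\cdot\mathbf{E}_m^*\, e^{i[(\mathbf{k}_j-\mathbf{k}_m)\cdot\mathbf{r}-(\omega_j-\omega_m)t]}+\mathbf{E}_j^*\cdot\mathbf{E}_m\, e^{-i[(\mathbf{k}_j-\mathbf{k}_m)\cdot\mathbf{r}-(\omega_j-\omega_m)t]}\Big]\quad\text{(parallel k-vectors)}\end{aligned} \tag{7.5}$$
The terms involving $(\omega_j+\omega_m)t$ oscillate rapidly and time-average to zero. By comparison, the terms involving $(\omega_j-\omega_m)t$ oscillate slowly (especially when the $\omega_j$ are all in the neighborhood of the $\omega_m$) or not at all when $j=m$. We retain the slower fluctuations and discard the rapid oscillations. For purposes of computing the intensity (as opposed to determining phase changes with propagation) we can approximate the index as a constant, and write $k_m/(\omega_m\mu_0)\cong n\epsilon_0 c$. (We seldom measure intensity inside of materials anyway.) With these simplifications, (7.5) becomes
$$\begin{aligned}\langle S(\mathbf{r},t)\rangle_{\rm osc}&=\frac{n\epsilon_0 c}{2}\sum_{j,m}\frac{\mathbf{E}_j\cdot\mathbf{E}_m^*\, e^{i[(\mathbf{k}_j-\mathbf{k}_m)\cdot\mathbf{r}-(\omega_j-\omega_m)t]}+\mathbf{E}_j^*\cdot\mathbf{E}_m\, e^{-i[(\mathbf{k}_j-\mathbf{k}_m)\cdot\mathbf{r}-(\omega_j-\omega_m)t]}}{2}\\ &=\frac{n\epsilon_0 c}{2}\,\mathrm{Re}\left\{\sum_j\mathbf{E}_j e^{i(\mathbf{k}_j\cdot\mathbf{r}-\omega_j t)}\cdot\sum_m\mathbf{E}_m^* e^{-i(\mathbf{k}_m\cdot\mathbf{r}-\omega_m t)}\right\}\\ &=\frac{n\epsilon_0 c}{2}\,\mathrm{Re}\left\{\mathbf{E}(\mathbf{r},t)\cdot\mathbf{E}^*(\mathbf{r},t)\right\}\end{aligned} \tag{7.6}$$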
The final expression in (7.6) is already manifestly real so there is no need to apply the operation $\mathrm{Re}\{\}$. The time-averaged intensity for light composed of parallel wave vectors is then well approximated by
$$I(\mathbf{r},t)=\frac{n\epsilon_0 c}{2}\,\mathbf{E}(\mathbf{r},t)\cdot\mathbf{E}^*(\mathbf{r},t) \tag{7.7}$$
(valid for parallel or antiparallel k-vectors and constant $n$).

In a surprising turn of events, it is important that $\mathbf{E}(\mathbf{r},t)$ in (7.7) be written as the entire complex expression for the electric field rather than just the real part. Then (7.7) automatically time-averages over rapid oscillations in such a way that $I(\mathbf{r},t)$ retains a slowly varying time dependence. This expression is reminiscent of (2.62), but it should be kept in mind that we previously considered only a single plane wave (perhaps with two distinct polarization components). If some of the k-vectors point in an anti-parallel direction, we can still use (7.7) with negative signs entered explicitly for those components.

This brings up a distinction between irradiance $S$ and intensity $I$. For example, $S$ is zero for standing waves because there is no net flow of energy, whereas (7.7) still gives a result. Intensity specifies whether atoms locally experience an oscillating electric field without regard for whether there is a net flow of energy carried by a light field. At extreme intensities, however, when the influence of the magnetic field becomes comparable to that of the electric field, the distinction between propagating and standing fields becomes important to the behavior of charged particles in that field.

The assumption that all vectors $\mathbf{k}_j$ are parallel is not as serious as it might seem at first. For example, the output of a Michelson interferometer (studied in chapter 8) is the superposition of two fields, each composed of a range of frequencies with parallel $\mathbf{k}_j$'s. We can relax the restriction of parallel $\mathbf{k}_j$'s slightly and apply (7.7) also to plane waves with nearly parallel $\mathbf{k}_j$'s such as occurs in a Young's two-slit diffraction experiment (studied in chapter 8). In such diffraction problems, (7.7) is viewed as an approximation valid to the extent that the vectors $\mathbf{k}_j$ are close to parallel. For the remainder of the chapter we will assume that the k-vectors for all frequency components in our waveform are essentially parallel.
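The claim that (7.7) performs the average over rapid oscillations automatically can be checked numerically. The following is a minimal sketch, not part of the original text: it superimposes two equal-amplitude waves at a single point, averages the instantaneous flux $n\epsilon_0 c\,(\mathrm{Re}\,E)^2$ over one carrier period, and compares the result with (7.7). The frequencies, the scalar-field simplification, and the helper names are assumptions chosen only for illustration.

```python
import numpy as np

EPS0 = 8.8541878128e-12   # vacuum permittivity (F/m)
C = 299792458.0           # speed of light (m/s)

def intensity(E, n=1.0):
    """Eq. (7.7): I = (n eps0 c / 2) E E* for a complex scalar field E."""
    return 0.5 * n * EPS0 * C * np.abs(E) ** 2

def poynting_flux(E, n=1.0):
    """Instantaneous flux for parallel k-vectors: n eps0 c (Re E)^2."""
    return n * EPS0 * C * np.real(E) ** 2

def two_wave_field(t, w1, w2, E0=1.0):
    """Sum of two equal-amplitude plane waves evaluated at r = 0."""
    return E0 * (np.exp(-1j * w1 * t) + np.exp(-1j * w2 * t))

# Hypothetical, dimensionless frequencies chosen so that the beat is slow.
w1, w2 = 2 * np.pi * 100.0, 2 * np.pi * 101.0
carrier_period = 2 * np.pi / (0.5 * (w1 + w2))

M = 200                                  # samples per carrier period
dt = carrier_period / M
t = np.arange(300 * M) * dt              # roughly three beat periods

# Boxcar-average the rapidly oscillating flux over one carrier period.
flux = poynting_flux(two_wave_field(t, w1, w2))
flux_avg = np.convolve(flux, np.ones(M) / M, mode="valid")

# Evaluate (7.7) at the center of each averaging window and compare.
t_center = t[: flux_avg.size] + 0.5 * (M - 1) * dt
I = intensity(two_wave_field(t_center, w1, w2))

rel_err = np.max(np.abs(flux_avg - I)) / np.max(I)
print(f"largest deviation relative to peak intensity: {rel_err:.2e}")
```

The residual deviation shrinks as the two frequencies move closer together relative to the carrier, which is the regime in which the rapid-oscillation average is meaningful.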
and
$$\mathbf{E}_2=\mathbf{E}_0\, e^{i(\mathbf{k}_2\cdot\mathbf{r}-\omega_2 t)} \tag{7.8}$$
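As we previously studied (see P1.9), the velocities of the wave crests for these two waves are
$$v_{p1}=\omega_1/k_1\quad\text{and}\quad v_{p2}=\omega_2/k_2 \tag{7.9}$$
These are known as the phase velocities of the individual plane waves.

Next consider a composite wave created from the superposition of the above two plane waves:
$$\mathbf{E}(\mathbf{r},t)=\mathbf{E}_0\, e^{i(\mathbf{k}_1\cdot\mathbf{r}-\omega_1 t)}+\mathbf{E}_0\, e^{i(\mathbf{k}_2\cdot\mathbf{r}-\omega_2 t)} \tag{7.10}$$
The two plane waves interfere, producing regions of higher and lower intensity that move in time. Remarkably, these intensity peaks can propagate at speeds quite different from either of the phase velocities in (7.9). The intensity (7.7) for the field (7.10) is computed as follows:
$$\begin{aligned}I(\mathbf{r},t)&=\frac{n\epsilon_0 c}{2}\,\mathbf{E}_0\cdot\mathbf{E}_0^*\left[e^{i(\mathbf{k}_1\cdot\mathbf{r}-\omega_1 t)}+e^{i(\mathbf{k}_2\cdot\mathbf{r}-\omega_2 t)}\right]\left[e^{-i(\mathbf{k}_1\cdot\mathbf{r}-\omega_1 t)}+e^{-i(\mathbf{k}_2\cdot\mathbf{r}-\omega_2 t)}\right]\\ &=\frac{n\epsilon_0 c}{2}\,\mathbf{E}_0\cdot\mathbf{E}_0^*\left[2+e^{i[(\mathbf{k}_2-\mathbf{k}_1)\cdot\mathbf{r}-(\omega_2-\omega_1)t]}+e^{-i[(\mathbf{k}_2-\mathbf{k}_1)\cdot\mathbf{r}-(\omega_2-\omega_1)t]}\right]\\ &=n\epsilon_0 c\,\mathbf{E}_0\cdot\mathbf{E}_0^*\left[1+\cos\left[(\mathbf{k}_2-\mathbf{k}_1)\cdot\mathbf{r}-(\omega_2-\omega_1)t\right]\right]\\ &=n\epsilon_0 c\,\mathbf{E}_0\cdot\mathbf{E}_0^*\left[1+\cos\left(\Delta\mathbf{k}\cdot\mathbf{r}-\Delta\omega\,t\right)\right]\end{aligned} \tag{7.11}$$
where
$$\Delta\mathbf{k}\equiv\mathbf{k}_2-\mathbf{k}_1,\qquad\Delta\omega\equiv\omega_2-\omega_1 \tag{7.12}$$
The darker line in Fig. 7.2 shows the intensity computed with (7.11). Keep in mind that this intensity is averaged over rapid oscillations. For comparison, the lighter line shows the Poynting flux with the rapid oscillations retained, according to (7.5). It is left as an exercise (see P7.3) to show that the rapid-oscillation peaks in Fig. 7.2 move with a phase velocity derived from the average $k$ and average $\omega$ of the two plane waves:
$$v_p\equiv\frac{\mathrm{ave}\{\omega\}}{\mathrm{ave}\{k\}} \tag{7.13}$$
An examination of the cosine argument in (7.11) reveals that the time-averaged curve in Fig. 7.2 (solid) travels with speed
$$v_g\equiv\frac{\Delta\omega}{\Delta k} \tag{7.14}$$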
This is known as the group velocity. Essentially, $v_g$ may be thought of as the velocity for the envelope that encloses the rapid oscillations. In general, $v_g$ and $v_p$ are not the same. This means that as the waveform propagates, the rapid oscillations move within the larger modulation pattern, for example, continually disappearing at the front and reappearing at the back of each modulation. The group velocity is identified with the propagation of overall waveforms. The presence of field energy in a waveform is clearly tied more to $v_g$ than to $v_p$.

Example 7.1
Determine the phase velocity and group velocity for the superposition of two plane waves in a plasma (see P2.7).
Figure 7.2 Intensity of two interfering plane waves. The solid line shows intensity averaged over rapid oscillations.
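The phase velocity for each frequency is computed as
$$v_p=\frac{\omega_1+\omega_2}{n_{\rm plasma}(\omega_1)\,\omega_1/c+n_{\rm plasma}(\omega_2)\,\omega_2/c}\cong\frac{c}{n_{\rm plasma}(\omega)} \tag{7.16}$$
For convenience, we have taken $\omega_1$ and $\omega_2$ to lie very close to each other. Since $n_{\rm plasma}<1$, both of these velocities exceed $c$. However, the group velocity is
$$v_g=\frac{\Delta\omega}{\Delta k}\cong\frac{d\omega}{dk}=\left[\frac{dk}{d\omega}\right]^{-1}=\left[\frac{d}{d\omega}\left(\frac{n_{\rm plasma}(\omega)\,\omega}{c}\right)\right]^{-1}=n_{\rm plasma}(\omega)\,c \tag{7.17}$$
which is clearly less than $c$. The derivation of the final expression in (7.17) from the previous one is left as an exercise.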
Example 7.1 illustrates that in an environment where the index of refraction is real (i.e. no net exchange of energy with the medium), the group velocity does not exceed $c$, even when the phase velocity does. The fast-moving phase velocity $v_p$ results merely from an interplay between the field and the plasma. In a similar sense, the intersection of an ocean wave with the shoreline can also exceed $c$, if different points on the wave front happen to strike the shore nearly simultaneously. The point of intersection between the wave and the shoreline does not constitute an actual object under motion. Similarly, wave crests of individual plane waves do not necessarily constitute actual objects that are moving. In short, $v_p$ is not the relevant speed at which events upstream influence events downstream.

Individual plane waves have infinite length and infinite duration. They do not exist in isolation except in our imagination. All real waveforms are composed of a range of frequency components, and so interference always happens. Energy is associated with regions of constructive interference between those waves. Group velocity $v_g$ tracks the presence of field energy, whether that energy propagates or is extracted from a medium. Although sometimes $v_g$ can exceed $c$ (i.e. when absorption or amplification is involved), energy is never transported faster than the universal speed limit $c$. An examination of energy flow is given in Appendix 7.B.
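The conclusion of Example 7.1 can be reproduced with a few lines of code. This is a sketch under stated assumptions rather than the book's own calculation: it assumes the standard collisionless plasma index $n_{\rm plasma}(\omega)=\sqrt{1-\omega_p^2/\omega^2}$, picks hypothetical values for $\omega_p$ and $\omega$, and estimates $dk/d\omega$ with a central finite difference so that the group velocity can be compared against the closed form $n_{\rm plasma}\,c$ in (7.17).

```python
import numpy as np

C = 299792458.0  # speed of light (m/s)

def n_plasma(w, wp):
    """Assumed plasma index, n = sqrt(1 - wp^2 / w^2), valid for w > wp."""
    return np.sqrt(1.0 - (wp / w) ** 2)

def k_plasma(w, wp):
    """Wave number k = n(w) w / c."""
    return n_plasma(w, wp) * w / C

def phase_velocity(w, wp):
    """v_p = w / k."""
    return w / k_plasma(w, wp)

def group_velocity(w, wp, rel_step=1e-6):
    """v_g = (dk/dw)^-1, estimated with a central finite difference."""
    dw = rel_step * w
    dk = k_plasma(w + dw, wp) - k_plasma(w - dw, wp)
    return 2.0 * dw / dk

wp = 1.0e15   # hypothetical plasma frequency (rad/s)
w = 2.0e15    # hypothetical light frequency (rad/s), chosen above wp

vp = phase_velocity(w, wp)
vg = group_velocity(w, wp)
print(f"n       = {n_plasma(w, wp):.4f}")
print(f"v_p / c = {vp / C:.4f}")
print(f"v_g / c = {vg / C:.4f}")
```

For this particular index the product $v_p v_g$ equals $c^2$, so the phase velocity exceeds $c$ by exactly the factor by which the group velocity falls short of it.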
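$$\mathbf{E}(\mathbf{r},t)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}(\mathbf{r},\omega)\,e^{-i\omega t}\,d\omega \tag{7.18}$$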
The function $\mathbf{E}(\mathbf{r},\omega)$, called the spectrum, has units of field per frequency. Essentially, it gives the amplitude and phase of each plane wave that makes up the overall waveform. It includes any spatially dependent factors such as $\exp\{i\mathbf{k}(\omega)\cdot\mathbf{r}\}$. We distinguish the spectrum $\mathbf{E}(\mathbf{r},\omega)$ from the wholly separate function $\mathbf{E}(\mathbf{r},t)$ by its argument (i.e. $\omega$ instead of $t$). (Sorry for using $\mathbf{E}$ for both functions, but this is standard notation.) The operation (7.18) is called an inverse Fourier transform as outlined in section 0.4; actually, it would be a good idea to review section 0.4 thoroughly. Now. Why haven't you turned to section 0.4 yet? The factor $1/\sqrt{2\pi}$ is introduced to match our Fourier-transform convention. Regardless of what the function is called, please notice that (7.18) merely sums together a range of plane waves in much the same way that our earlier discrete summation (7.1) does.

If we already know a waveform $\mathbf{E}(\mathbf{r},t)$, one might wonder what plane waves should be added together in order to construct it. Equation (7.18) can be inverted, and the result remarkably has a very similar form:
$$\mathbf{E}(\mathbf{r},\omega)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}(\mathbf{r},t)\,e^{i\omega t}\,dt \tag{7.19}$$
This operation is called the Fourier transform. It is used to generate the spectrum $\mathbf{E}(\mathbf{r},\omega)$ from the field $\mathbf{E}(\mathbf{r},t)$ in much the same way that (7.18) is used to generate the field $\mathbf{E}(\mathbf{r},t)$ from the spectrum $\mathbf{E}(\mathbf{r},\omega)$. Although only the real part of $\mathbf{E}(\mathbf{r},t)$ is physically relevant, we can continue our habit of working with the complex field and taking the real part of $\mathbf{E}(\mathbf{r},t)$ at our leisure.$^1$ In fact, we will find it advantageous to work with the complex field instead of only the real part. We will not run into trouble as long as we remember never to discard the imaginary part of $\mathbf{E}(\mathbf{r},\omega)$, only the imaginary part of $\mathbf{E}(\mathbf{r},t)$. The intensity formula (7.7) remains useful for continuous superpositions of plane waves (i.e. a field defined by the inverse Fourier transform (7.18)):
$$I(\mathbf{r},t)\equiv\frac{n\epsilon_0 c}{2}\,\mathbf{E}(\mathbf{r},t)\cdot\mathbf{E}^*(\mathbf{r},t) \tag{7.20}$$
Remember, this formula specifically requires the fields to be in complex format, and it takes care of the time-average over rapid oscillations automatically.$^2$ Moreover, the above expression for $I(\mathbf{r},t)$ assumes that all relevant k-vectors are essentially parallel. Similarly, we will define the power spectrum produced from $\mathbf{E}(\mathbf{r},\omega)$, which we write as
$$I(\mathbf{r},\omega)\equiv\frac{n\epsilon_0 c}{2}\,\mathbf{E}(\mathbf{r},\omega)\cdot\mathbf{E}^*(\mathbf{r},\omega) \tag{7.21}$$
$^1$ Since Fourier transforms are linear, one can take the Fourier transform of the real and imaginary parts of a field separately. Appropriate modifications to $\mathbf{E}(\mathbf{r},\omega)$ in the frequency domain will not cause the two parts to become mingled. Upon taking the inverse Fourier transform to obtain $\mathbf{E}(\mathbf{r},t)$ again, the original real part remains purely real, and the original imaginary part remains purely imaginary.
$^2$ To use this expression there needs to be a sufficient number of oscillations within the waveform to make the rapid time average meaningful.
The power spectrum $I(\mathbf{r},\omega)$ is what one observes when the waveform is sent into a spectral analyzer or spectrometer. We must apologize again for the potentially confusing notation (in wide usage): $I(\mathbf{r},\omega)$ is not the Fourier transform of $I(\mathbf{r},t)$! They are defined exclusively through (7.20) and (7.21). Parseval's theorem (see Example 0.7) imposes an interesting connection between the time-integral of the intensity and the frequency-integral of the power spectrum:
$$\int_{-\infty}^{\infty}I(\mathbf{r},t)\,dt=\int_{-\infty}^{\infty}I(\mathbf{r},\omega)\,d\omega \tag{7.22}$$
With the above formalities out of the way, we will illustrate the use of Fourier transforms through some examples.
Example 7.2
Find $\mathbf{E}(\mathbf{r},\omega)$ associated with the field
$$\mathbf{E}(\mathbf{r},t)=\mathbf{E}_0(\mathbf{r})\,e^{-t^2/2T^2}e^{-i\omega_0 t} \tag{7.23}$$
Figure 7.3 Real part of electric field (7.23) with $T=4/\omega_0$ and $T=10/\omega_0$, where $2\pi/\omega_0$ is the period of the carrier frequency.
The real part of this field is shown in Fig. 7.3 for two different durations $T$. The intensity profile computed by (7.20) is shown in Fig. 7.4. Solution: The argument $\mathbf{r}$ is unimportant to our calculation. It merely specifies that we are considering the field at the point $\mathbf{r}$. We compute the Fourier transform as follows:
$$\mathbf{E}(\mathbf{r},\omega)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}_0(\mathbf{r})\,e^{-t^2/2T^2}e^{-i\omega_0 t}e^{i\omega t}\,dt=\frac{\mathbf{E}_0(\mathbf{r})}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-t^2/2T^2+i(\omega-\omega_0)t}\,dt \tag{7.24}$$
This integral can be performed with the help of (0.55), and we obtain
$$\mathbf{E}(\mathbf{r},\omega)=T\,\mathbf{E}_0(\mathbf{r})\,e^{-\frac{T^2(\omega-\omega_0)^2}{2}} \tag{7.25}$$
Notice that $\mathbf{E}(\mathbf{r},\omega)$ has units of field multiplied by time, or in other words, field per frequency.
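The result (7.25) is easy to confirm numerically, which is a useful habit before trusting a transform convention. The sketch below is illustrative only: it evaluates (7.19) by direct summation on a uniform time grid and compares with (7.25). The dimensionless values of $T$ and $\omega_0$, the grid sizes, and the function names are assumptions.

```python
import numpy as np

def fourier_transform(E_t, t, w):
    """Eq. (7.19) by direct summation on a uniform time grid."""
    dt = t[1] - t[0]
    kernel = np.exp(1j * np.outer(w, t))          # e^{+i w t}
    return kernel @ E_t * dt / np.sqrt(2.0 * np.pi)

def gaussian_pulse(t, T, w0, E0=1.0):
    """Eq. (7.23) evaluated at a fixed point r."""
    return E0 * np.exp(-t**2 / (2.0 * T**2)) * np.exp(-1j * w0 * t)

def gaussian_spectrum(w, T, w0, E0=1.0):
    """Eq. (7.25)."""
    return T * E0 * np.exp(-0.5 * T**2 * (w - w0) ** 2)

T, w0 = 1.0, 10.0                                  # hypothetical, dimensionless
t = np.linspace(-10.0 * T, 10.0 * T, 4001)
w = np.linspace(w0 - 5.0 / T, w0 + 5.0 / T, 101)

numeric = fourier_transform(gaussian_pulse(t, T, w0), t, w)
analytic = gaussian_spectrum(w, T, w0)
max_err = np.max(np.abs(numeric - analytic))
print(f"largest |numeric - analytic| = {max_err:.1e}")
```

Direct summation costs one complex exponential per (time, frequency) pair, which is fine at this size; an FFT would be the usual choice for large grids, at the price of keeping track of its sign and normalization conventions.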
In general, $\mathbf{E}(\mathbf{r},\omega)$ is a complex function. $\mathbf{E}(\mathbf{r},\omega)$ keeps track of the amplitude and phase of each plane wave needed to compose the waveform $\mathbf{E}(\mathbf{r},t)$. More often than not, $\mathbf{E}(\mathbf{r},\omega)$ exhibits a complicated complex phase structure, depending on the time-shape of $\mathbf{E}(\mathbf{r},t)$. The spectrum of the field in Example 7.2 is shown in Fig. 7.5. The complex phase turns out to be boringly uniform for this example; if $\mathbf{E}_0$ is real, the imaginary part of the spectrum turns out to be zero for all frequencies. The corresponding power spectrum (7.21) is plotted in Fig. 7.6. As expected, the waveform includes frequencies in the neighborhood of $\omega_0$. A range of frequencies is needed to construct waveforms that turn on and off. The shorter the duration of the waveform, the more frequency components are necessary. This trend can be seen for the two pulse durations $T$ plotted.
Example 7.3
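Check Parseval's theorem for the field and spectrum in Example 7.2. Solution: The time integration in (7.22) yields
$$\int_{-\infty}^{\infty}I(\mathbf{r},t)\,dt=\frac{n\epsilon_0 c}{2}\,\mathbf{E}_0(\mathbf{r})\cdot\mathbf{E}_0^*(\mathbf{r})\int_{-\infty}^{\infty}e^{-t^2/T^2}\,dt=\frac{n\epsilon_0 c}{2}\,\mathbf{E}_0(\mathbf{r})\cdot\mathbf{E}_0^*(\mathbf{r})\,T\sqrt{\pi}$$
where we have used (0.55) to perform the integration. This result has units of energy per area. It is the energy per area absorbed by a detector after the pulse has concluded. The frequency integration in (7.22) yields
$$\int_{-\infty}^{\infty}I(\mathbf{r},\omega)\,d\omega=\frac{n\epsilon_0 c}{2}\,\mathbf{E}_0(\mathbf{r})\cdot\mathbf{E}_0^*(\mathbf{r})\,T^2\int_{-\infty}^{\infty}e^{-T^2(\omega-\omega_0)^2}\,d\omega=\frac{n\epsilon_0 c}{2}\,\mathbf{E}_0(\mathbf{r})\cdot\mathbf{E}_0^*(\mathbf{r})\,T^2\,\frac{\sqrt{\pi}}{T}$$
in agreement with the time integration.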
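The same check can be done numerically, which also guards against slips in the Gaussian integrals. In this minimal sketch the common prefactor $n\epsilon_0 c/2$ is dropped from both sides of (7.22), and the values of $T$, $\omega_0$ and $E_0$ are hypothetical dimensionless choices.

```python
import numpy as np

def riemann(y, x):
    """Integrate samples y on a uniform grid x by simple summation."""
    return np.sum(y) * (x[1] - x[0])

T, w0, E0 = 2.0, 10.0, 1.0                  # hypothetical, dimensionless units
t = np.linspace(-12.0 * T, 12.0 * T, 8001)
w = np.linspace(w0 - 12.0 / T, w0 + 12.0 / T, 8001)

E_t = E0 * np.exp(-t**2 / (2.0 * T**2)) * np.exp(-1j * w0 * t)   # Eq. (7.23)
E_w = T * E0 * np.exp(-0.5 * T**2 * (w - w0) ** 2)               # Eq. (7.25)

# Both sides of (7.22) with the prefactor n eps0 c / 2 removed.
time_side = riemann(np.abs(E_t) ** 2, t)
freq_side = riemann(np.abs(E_w) ** 2, w)
expected = abs(E0) ** 2 * T * np.sqrt(np.pi)

print(time_side, freq_side, expected)
```

All three numbers agree to within rounding, as Parseval's theorem requires for this transform convention.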
Figure 7.5 Spectral components (7.25) of the fields in Fig. 7.3 with $T=4/\omega_0$ and $T=10/\omega_0$, where $2\pi/\omega_0$ is the period of the carrier frequency.
As mentioned previously, the inverse Fourier transform is interpreted as summing together many plane waves to create a waveform.
Example 7.4
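Take the inverse Fourier transform of (7.25) to recover the original waveform (7.23).

Solution: We have
$$\begin{aligned}\mathbf{E}(\mathbf{r},t)&=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}(\mathbf{r},\omega)\,e^{-i\omega t}\,d\omega=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}T\,\mathbf{E}_0(\mathbf{r})\,e^{-\frac{T^2(\omega-\omega_0)^2}{2}}e^{-i\omega t}\,d\omega\\ &=\frac{T\,\mathbf{E}_0(\mathbf{r})}{\sqrt{2\pi}}\,e^{-\frac{T^2\omega_0^2}{2}}\int_{-\infty}^{\infty}e^{-\frac{T^2}{2}\omega^2+(T^2\omega_0-it)\omega}\,d\omega\end{aligned} \tag{7.26}$$
This integral can be performed with the help of (0.55), which gives
$$\mathbf{E}(\mathbf{r},t)=\frac{T\,\mathbf{E}_0(\mathbf{r})}{\sqrt{2\pi}}\,e^{-\frac{T^2\omega_0^2}{2}}\sqrt{\frac{\pi}{T^2/2}}\;e^{\frac{(T^2\omega_0-it)^2}{4(T^2/2)}}=\mathbf{E}_0(\mathbf{r})\,e^{-t^2/2T^2}e^{-i\omega_0 t}$$

Figure 7.6 Power spectrum based on (7.21) for the spectral components shown in Fig. 7.5.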
Since only the real part of the time profile $\mathbf{E}(\mathbf{r},t)$ is physically relevant, you might be curious about how the Fourier transform of the real part of the field compares with that of the complex version of the field that we have been using. Indeed, there are situations where it is more appropriate to use the real version of the field rather than its complex form. For example, if a waveform includes multiple propagation directions or if a waveform contains only a few cycles, then the motivation behind (7.20) and the convenience of the complex format begin to wane.
Example 7.5
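Take the Fourier transform of just the real part of waveform (7.23). Solution: The real part of (7.23) is
$$\mathbf{E}_r(\mathbf{r},t)=\frac{\mathbf{E}(\mathbf{r},t)+\mathbf{E}^*(\mathbf{r},t)}{2}=e^{-t^2/2T^2}\,\frac{\mathbf{E}_0(\mathbf{r})\,e^{-i\omega_0 t}+\mathbf{E}_0^*(\mathbf{r})\,e^{i\omega_0 t}}{2} \tag{7.27}$$
If $\mathbf{E}_0(\mathbf{r})$ is real, then this field can be written as $\mathbf{E}_0(\mathbf{r})\,e^{-t^2/2T^2}\cos\omega_0 t$. The Fourier transform (7.19) yields (see P0.24)
$$\mathbf{E}_r(\mathbf{r},\omega)=T\,\frac{\mathbf{E}_0^*(\mathbf{r})\,e^{-\frac{T^2(\omega+\omega_0)^2}{2}}+\mathbf{E}_0(\mathbf{r})\,e^{-\frac{T^2(\omega-\omega_0)^2}{2}}}{2} \tag{7.28}$$
Figure 7.7 Spectrum based on (7.28) with $T=10/\omega_0$. Compare with the lower curve in Fig. 7.5.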
From the above example, you might notice that the transform of the real part of a field tends to be more cumbersome than the transform of the entire complex field. For the real field, both positive and negative frequency components contribute to the overall spectrum.$^3$ Moreover, the Fourier transform of a real function $\mathbf{E}_r(\mathbf{r},t)$ obeys the following symmetry relation:
$$\mathbf{E}_r(\mathbf{r},-\omega)=\mathbf{E}_r^*(\mathbf{r},\omega)\qquad\text{(if }\mathbf{E}_r(\mathbf{r},t)\text{ is real)} \tag{7.29}$$
The spectrum in Fig. 7.7 obeys this symmetry relation, whereas the Fourier transform of the complex field depicted in Fig. 7.5 does not.
$^3$ Essentially, the spectrum of the complex representation of the field can be understood to be twice the spectrum of the real representation, but plotted only for the positive frequencies.
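$$\mathbf{E}(\mathbf{r}_0+\Delta\mathbf{r},t)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}(\mathbf{r}_0+\Delta\mathbf{r},\omega)\,e^{-i\omega t}\,d\omega=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}(\mathbf{r}_0,\omega)\,e^{i(\mathbf{k}(\omega)\cdot\Delta\mathbf{r}-\omega t)}\,d\omega \tag{7.31}$$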
Example 7.6

If a waveform at $\mathbf{r}_0=0$ has the form $\mathbf{E}(0,t)=\mathbf{E}_0\,e^{-t^2/2T^2}e^{-i\omega_0 t}$, compute the waveform at $\Delta\mathbf{r}=z\hat{\mathbf{z}}$ if propagation occurs in vacuum in the $\hat{\mathbf{z}}$-direction.

Solution: Of course, after traversing $\Delta\mathbf{r}=z\hat{\mathbf{z}}$ in vacuum, the waveform will look the same, only arriving a time $z/c$ later. We'll demonstrate that the tools described above yield this expected result. The Fourier transform of the Gaussian pulse is given in (7.25):
$$\mathbf{E}(0,\omega)=T\,\mathbf{E}_0\,e^{-\frac{T^2(\omega-\omega_0)^2}{2}}$$
4 See J. D. Jackson, Classical Electrodynamics, 3rd ed., Sect. 7.8 (New York: John Wiley, 1999).
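To find the field downstream we invoke (7.30), assuming $\mathbf{k}(\omega)=k_{\rm vac}(\omega)\,\hat{\mathbf{z}}=\frac{\omega}{c}\hat{\mathbf{z}}$, which gives the appropriate phase shift for each plane wave component:
$$\mathbf{E}(z,\omega)=\mathbf{E}(0,\omega)\,e^{i\mathbf{k}(\omega)\cdot\Delta\mathbf{r}}=T\,\mathbf{E}_0\,e^{-\frac{T^2(\omega-\omega_0)^2}{2}}e^{i\frac{\omega}{c}z}$$
Taking the inverse Fourier transform (7.31) of this spectrum gives
$$\mathbf{E}(z,t)=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}_0\,T\,e^{-\frac{T^2(\omega-\omega_0)^2}{2}}e^{i\frac{\omega}{c}z}e^{-i\omega t}\,d\omega=\mathbf{E}_0\,e^{-\frac{(t-z/c)^2}{2T^2}}e^{-i\omega_0(t-z/c)} \tag{7.32}$$
Not surprisingly, after traveling a distance $z$ through vacuum, the pulse looks identical to the original pulse, only delayed by time $z/c$.

A waveform propagating in a material such as glass can undergo significant temporal dispersion, as different frequency components experience different indices of refraction. Each frequency component propagates at its own phase velocity. The speed of the pulse, however, can be quite different; it propagates approximately with the group velocity, as will be shown below. The exponent in (7.30) is called the phase delay for the pulse propagation. It is often expanded in a Taylor series about the pulse carrier frequency $\omega_0$:
$$\mathbf{k}\cdot\Delta\mathbf{r}=\left[\mathbf{k}\big|_{\omega_0}+\frac{\partial\mathbf{k}}{\partial\omega}\bigg|_{\omega_0}(\omega-\omega_0)+\frac{1}{2}\frac{\partial^2\mathbf{k}}{\partial\omega^2}\bigg|_{\omega_0}(\omega-\omega_0)^2+\cdots\right]\cdot\Delta\mathbf{r} \tag{7.33}$$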
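The k-vector has a sometimes-complicated frequency dependence through the functional form of $n(\omega)$. If we retain only the first two terms in this expansion then (7.31) becomes
$$\begin{aligned}\mathbf{E}(\mathbf{r}_0+\Delta\mathbf{r},t)&=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}(\mathbf{r}_0,\omega)\,e^{i\left[\left(\mathbf{k}(\omega_0)+\frac{\partial\mathbf{k}}{\partial\omega}\big|_{\omega_0}(\omega-\omega_0)\right)\cdot\Delta\mathbf{r}-\omega t\right]}\,d\omega\\ &=e^{i\left[\mathbf{k}(\omega_0)-\omega_0\frac{\partial\mathbf{k}}{\partial\omega}\big|_{\omega_0}\right]\cdot\Delta\mathbf{r}}\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}(\mathbf{r}_0,\omega)\,e^{-i\omega\left(t-\frac{\partial\mathbf{k}}{\partial\omega}\big|_{\omega_0}\cdot\Delta\mathbf{r}\right)}\,d\omega\\ &=e^{i[\mathbf{k}(\omega_0)\cdot\Delta\mathbf{r}-\omega_0 t']}\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}(\mathbf{r}_0,\omega)\,e^{-i\omega(t-t')}\,d\omega\end{aligned} \tag{7.34}$$
where in the last line we have introduced the definition
$$t'\equiv\frac{\partial\,\mathrm{Re}\{\mathbf{k}\}}{\partial\omega}\bigg|_{\omega_0}\cdot\Delta\mathbf{r} \tag{7.35}$$
and assumed that the imaginary part of $\mathbf{k}$ is roughly constant near $\omega_0$ so that $t'$ is real. Then the integral in (7.34) is recognized as the inverse Fourier transform of the original pulse with a new time argument:
$$\mathbf{E}(\mathbf{r}_0+\Delta\mathbf{r},t)=\mathbf{E}(\mathbf{r}_0,t-t')\,e^{i(\mathbf{k}(\omega_0)\cdot\Delta\mathbf{r}-\omega_0 t')} \tag{7.36}$$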
Notice that (7.32) from Example 7.6 agrees with this result, since $\mathbf{k}_{\rm vac}(\omega_0)\cdot\Delta\mathbf{r}=\omega_0 z/c$. The second factor in (7.36) merely gives an overall phase shift due to propagation. The phase shift is dictated by the phase velocity of the carrier frequency (see (7.9)):
$$v_p(\omega_0)=\frac{\omega_0}{k(\omega_0)} \tag{7.37}$$
Otherwise the pulse in (7.36) is unaltered except for a delay $t'$, the time required for the pulse to traverse the displacement $\Delta\mathbf{r}$. The function $\partial\,\mathrm{Re}\,\mathbf{k}/\partial\omega\cdot\Delta\mathbf{r}$ is known as the group delay function, and in (7.35) it is evaluated only at the carrier frequency $\omega_0$. Traditional group velocity is obtained by dividing the displacement $\Delta r$ by the group delay time $t'$ to obtain
$$v_g^{-1}(\omega_0)=\frac{\partial\,\mathrm{Re}\{k(\omega)\}}{\partial\omega}\bigg|_{\omega_0} \tag{7.38}$$
Group delay (or group velocity) essentially tracks the center of the packet. In our derivation we have assumed that the phase delay $\mathbf{k}(\omega)\cdot\Delta\mathbf{r}$ could be well represented by the first two terms of the expansion (7.33). While this assumption gives results that are often useful, higher-order terms can also play a role. In section 7.5 we'll find that the next term in the expansion controls the rate at which the wave packet spreads as it travels. We should also note that there are times when the expansion (7.33) fails to converge (when $\omega_0$ is near a resonance of the medium), and the above expansion approach is not valid. We'll analyze pulse propagation in these sticky situations in section 7.6.
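To see the group delay at work without making the two-term approximation, one can evaluate the inverse transform (7.31) numerically with the full $k(\omega)$ and compare the arrival of the pulse peak with the prediction of (7.35). The following sketch is only an illustration under stated assumptions: the dispersion relation is a hypothetical plasma-like one with real $k$, units are dimensionless with $c=1$, and the pulse is the Gaussian spectrum (7.25) with $E_0=1$.

```python
import numpy as np

def k_of_w(w, wp=5.0, c=1.0):
    """Hypothetical real dispersion relation (plasma-like), valid for w > wp."""
    return np.sqrt(w**2 - wp**2) / c

def propagate(spectrum, w, z, t):
    """Eq. (7.31) along z by direct summation over a uniform frequency grid."""
    dw = w[1] - w[0]
    kernel = np.exp(-1j * np.outer(t, w))                  # e^{-i w t}
    shifted = spectrum * np.exp(1j * k_of_w(w) * z)        # phase delay k(w) z
    return kernel @ shifted * dw / np.sqrt(2.0 * np.pi)

T, w0, z = 5.0, 10.0, 20.0                                 # narrowband pulse
w = np.linspace(w0 - 8.0 / T, w0 + 8.0 / T, 801)
t = np.linspace(0.0, 50.0, 1001)
spectrum = T * np.exp(-0.5 * T**2 * (w - w0) ** 2)         # Eq. (7.25), E0 = 1

E_zt = propagate(spectrum, w, z, t)
t_peak = t[np.argmax(np.abs(E_zt) ** 2)]

# Group delay (7.35) from a central-difference derivative of k at w0.
h = 1e-4
group_delay = (k_of_w(w0 + h) - k_of_w(w0 - h)) / (2.0 * h) * z

print(f"peak arrives at t = {t_peak:.2f}, group delay predicts {group_delay:.2f}")
print(f"vacuum transit time would be {z:.2f}")
```

With these numbers the pulse peak arrives later than the vacuum transit time $z/c$, consistent with a group velocity below $c$ for a real index.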
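numerically. For our present purposes, we again resort to an expansion of the type (7.33), but this time we will keep one additional term:
$$k(\omega)\,z\cong k_0 z+v_g^{-1}(\omega-\omega_0)\,z+\alpha(\omega-\omega_0)^2 z+\cdots \tag{7.39}$$
where
$$k_0\equiv k(\omega_0)=\frac{\omega_0\,n(\omega_0)}{c} \tag{7.40}$$
$$v_g^{-1}\equiv\frac{\partial k}{\partial\omega}\bigg|_{\omega_0}=\frac{n(\omega_0)}{c}+\frac{\omega_0\,n'(\omega_0)}{c} \tag{7.41}$$
$$\alpha\equiv\frac{1}{2}\frac{\partial^2 k}{\partial\omega^2}\bigg|_{\omega_0}=\frac{n'(\omega_0)}{c}+\frac{\omega_0\,n''(\omega_0)}{2c} \tag{7.42}$$
As before, we have supposed that the imaginary part of the index is negligible. Unfortunately, we can't calculate a general formula for the effect of quadratic dispersion on an arbitrary initial pulse. However, we can get a general idea for how quadratic dispersion works by considering the specific example of a Gaussian pulse.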
Example 7.7
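A Gaussian waveform similar to that in Example 7.6 propagates through a piece of glass with thickness $\Delta\mathbf{r}=z\hat{\mathbf{z}}$. Compute the waveform exiting the glass. Solution: Again, the Fourier transform of the Gaussian pulse before propagation is given by (7.25):
$$\mathbf{E}(0,\omega)=T\,\mathbf{E}_0\,e^{-\frac{T^2(\omega-\omega_0)^2}{2}}$$
With the aid of expansion (7.39), the inverse Fourier transform (7.31) (which yields the pulse after propagation) becomes
$$\begin{aligned}\mathbf{E}(z,t)&=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}_0\,T\,e^{-\frac{T^2(\omega-\omega_0)^2}{2}}\,e^{ik_0 z+iv_g^{-1}(\omega-\omega_0)z+i\alpha(\omega-\omega_0)^2 z-i\omega t}\,d\omega\\ &=\frac{T\,\mathbf{E}_0\,e^{i(k_0 z-\omega_0 t)}}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-(T^2/2-i\alpha z)(\omega-\omega_0)^2+iv_g^{-1}z(\omega-\omega_0)-it(\omega-\omega_0)}\,d\omega\end{aligned} \tag{7.43}$$
We can avoid considerable clutter if we change variables to $\omega'\equiv\omega-\omega_0$. Then the inverse Fourier transform becomes
$$\mathbf{E}(z,t)=\frac{T\,\mathbf{E}_0\,e^{i(k_0 z-\omega_0 t)}}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-\frac{T^2}{2}\left(1-i2\alpha z/T^2\right)\omega'^2-i(t-z/v_g)\,\omega'}\,d\omega' \tag{7.44}$$
The above integral can be performed with the aid of (0.55). The result is
$$\begin{aligned}\mathbf{E}(z,t)&=\frac{T\,\mathbf{E}_0\,e^{i(k_0 z-\omega_0 t)}}{\sqrt{2\pi}}\sqrt{\frac{\pi}{\frac{T^2}{2}\left(1-i2\alpha z/T^2\right)}}\;e^{-\frac{(t-z/v_g)^2}{4\frac{T^2}{2}\left(1-i2\alpha z/T^2\right)}}\\ &=\mathbf{E}_0\,e^{i(k_0 z-\omega_0 t)}\,\frac{e^{\frac{i}{2}\tan^{-1}\left(2\alpha z/T^2\right)}}{\sqrt[4]{1+\left(2\alpha z/T^2\right)^2}}\;e^{-\frac{(t-z/v_g)^2\left(1+i2\alpha z/T^2\right)}{2T^2\left[1+\left(2\alpha z/T^2\right)^2\right]}}\end{aligned} \tag{7.45}$$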
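Next, we spruce up the appearance of this rather cumbersome formula as follows:
$$\mathbf{E}(z,t)=\frac{\mathbf{E}_0}{\sqrt{\mathcal{T}(z)/T}}\;e^{-\frac{(t-z/v_g)^2}{2\mathcal{T}^2(z)}}\;e^{-i\frac{(t-z/v_g)^2}{2\mathcal{T}^2(z)}\Phi(z)+i(k_0 z-\omega_0 t)+\frac{i}{2}\tan^{-1}\Phi(z)} \tag{7.46}$$
where
$$\Phi(z)\equiv\frac{2\alpha}{T^2}\,z \tag{7.47}$$
and
$$\mathcal{T}(z)\equiv T\sqrt{1+\Phi^2(z)} \tag{7.48}$$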
We can immediately make a few observations about (7.46). First, note that at $z=0$ (i.e. zero thickness of glass), (7.46) reduces to the input pulse $\mathbf{E}(0,t)=\mathbf{E}_0\,e^{-t^2/2T^2}e^{-i\omega_0 t}$, as it should. Secondly, the peak of the pulse moves at speed $v_g$ since the factor $e^{-(t-z/v_g)^2/2\mathcal{T}^2(z)}$ controls the pulse amplitude, while the other terms (multiplied by $i$) in the exponent of (7.46) merely alter the phase. Also note that the duration of the pulse increases and its peak intensity decreases as it travels, since $\mathcal{T}(z)$ increases with $z$. In P7.8 we will find that (7.46) also predicts that for large $z$, the field of the spread-out pulse oscillates less rapidly at the beginning of the pulse than at the end (assuming $\alpha>0$). This phenomenon, known as pulse chirping, means that red frequencies get ahead of blue frequencies during propagation since the red frequencies experience a lower index of refraction.

While Example 7.7 is worked out for the specific case of a Gaussian pulse, the results are qualitatively similar for all pulses. The exact details vary with pulse shape, but all short pulses eventually broaden and chirp as they propagate through a dispersive medium such as glass. Higher-order terms in the expansion (7.33) that were neglected cause additional spreading, chirping, and other deformations to the pulses as they propagate. The influence of each order becomes progressively more cumbersome to study analytically. In that case, it is easier to perform the inverse Fourier transform numerically; there is no need to resort to the expansion of $k(\omega)$ if the integration is done numerically.
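The broadening described by (7.47) and (7.48) can be checked against a direct numerical evaluation of the integral (7.44). The sketch below makes several assumptions that are not in the text: it works in the frame of the pulse center, $\tau=t-z/v_g$, omits the unit-modulus carrier factor, sets $E_0=1$, and uses illustrative numbers ($T=10$ fs, $\alpha=18$ fs$^2$/mm, roughly the order of magnitude for glass in the near infrared, and $z=10$ mm). The duration is extracted from the variance of the intensity, which for $I\propto e^{-\tau^2/\mathcal{T}^2}$ equals $\mathcal{T}^2/2$.

```python
import numpy as np

def stretch_factor(T, alpha, z):
    """Phi(z) = 2 alpha z / T^2, Eq. (7.47)."""
    return 2.0 * alpha * z / T**2

def stretched_duration(T, alpha, z):
    """Duration after propagation, T sqrt(1 + Phi^2), Eq. (7.48)."""
    return T * np.sqrt(1.0 + stretch_factor(T, alpha, z) ** 2)

def envelope_after(T, alpha, z, tau, n_w=2001):
    """Integral (7.44) with E0 = 1 and the carrier factor omitted.

    tau = t - z / v_g is time measured in the frame of the pulse center,
    and w below stands for omega - omega_0.
    """
    w = np.linspace(-8.0 / T, 8.0 / T, n_w)
    dw = w[1] - w[0]
    integrand = np.exp(-(0.5 * T**2 - 1j * alpha * z) * w**2)
    kernel = np.exp(-1j * np.outer(tau, w))
    return T * (kernel @ integrand) * dw / np.sqrt(2.0 * np.pi)

# Assumed illustrative numbers: fs, fs^2/mm, mm.
T, alpha, z = 10.0, 18.0, 10.0
tau = np.linspace(-300.0, 300.0, 601)

I = np.abs(envelope_after(T, alpha, z, tau)) ** 2
mean = np.sum(tau * I) / np.sum(I)
T_numeric = np.sqrt(2.0 * np.sum((tau - mean) ** 2 * I) / np.sum(I))
T_formula = stretched_duration(T, alpha, z)

print(f"duration from (7.48): {T_formula:.2f} fs, from the integral: {T_numeric:.2f} fs")
print(f"peak intensity relative to input: {I.max():.3f}")
```

The peak intensity falls by the factor $T/\mathcal{T}(z)$, the square of the amplitude prefactor in (7.46), so that the time-integrated intensity is conserved in this lossless case.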
Figure 7.9 Animation of a Gaussian-envelope pulse (electric field) undergoing dispersion during transit.
Figure 7.11 Transit time dened as the difference between arrival time at two points.
describe accurately the phase delay $\mathbf{k}(\omega)\cdot\Delta\mathbf{r}$. Moreover, if the bandwidth of the waveform is wider than the spectral resonance of the medium, the series altogether fails to converge. These difficulties have led to the traditional viewpoint that group velocity loses meaning for broadband waveforms near a resonance. In this section, we study a broader context for group velocity (or rather its inverse, group delay $dk/d\omega$), which is always valid, even for broadband pulses where the expansion (7.33) utterly fails. The analysis avoids the expansion and so is not restricted to a narrowband context.

We are interested in the arrival time of a waveform (or pulse) to a point, say, where a detector is located. The definition of the arrival time of pulse energy need only involve the Poynting flux (or the intensity), since it alone is responsible for energy transport. To deal with arbitrary broadband pulses, the arrival time should avoid presupposing a specific pulse shape, since the pulse may evolve in complicated ways during propagation. For example, the pulse peak or the midpoint on the rising edge of a pulse are poor indicators of arrival time if the pulse contains multiple peaks or a long and non-uniform rise time. For the reasons given, we use a time expectation integral (or time center-of-mass) to describe the arrival time of a pulse:
$$\langle t\rangle_{\mathbf{r}}\equiv\frac{\int_{-\infty}^{\infty}t\,I(\mathbf{r},t)\,dt}{\int_{-\infty}^{\infty}I(\mathbf{r},t)\,dt} \tag{7.49}$$
For simplification, we have assumed that the light travels in a uniform direction by using intensity rather than the Poynting vector. Consider a pulse as it travels from point $\mathbf{r}_0$ to point $\mathbf{r}=\mathbf{r}_0+\Delta\mathbf{r}$ in a homogeneous medium. The difference in arrival times at the two points is
$$\Delta t\equiv\langle t\rangle_{\mathbf{r}}-\langle t\rangle_{\mathbf{r}_0} \tag{7.50}$$
The pulse shape can evolve in complicated ways between the two points, spreading with different portions being absorbed (or amplified) during transit as depicted in Fig. 7.11. Nevertheless, (7.50) renders an unambiguous time interval between the passage of the pulse center at each point.
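The expectation integral (7.49) is straightforward to evaluate on sampled data. In the following sketch the two intensity profiles are hypothetical (a single Gaussian at the first point and a distorted, double-peaked profile at the second), chosen only to show that the center-of-mass delay (7.50) stays well defined when the peak position is a poor indicator of arrival time.

```python
import numpy as np

def arrival_time(t, intensity):
    """Eq. (7.49): intensity-weighted mean time on a uniform grid."""
    return np.sum(t * intensity) / np.sum(intensity)

def gaussian(t, center, width):
    """Profile exp(-((t - center) / width)^2), arbitrary units."""
    return np.exp(-((t - center) / width) ** 2)

t = np.linspace(-20.0, 40.0, 6001)

# Hypothetical intensity profiles at r0 and at r0 + dr.
I_start = gaussian(t, 0.0, 1.0)
I_end = 0.3 * gaussian(t, 8.0, 1.0) + 0.1 * gaussian(t, 14.0, 2.0)

delay = arrival_time(t, I_end) - arrival_time(t, I_start)     # Eq. (7.50)
t_peak = t[np.argmax(I_end)]

print(f"center-of-mass delay: {delay:.2f}")
print(f"peak of the distorted pulse: {t_peak:.2f}")
```

Here the two lobes carry areas in the ratio 3:2, so the center of mass lies between them even though the tallest peak sits on the earlier lobe.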
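This difference in arrival time can be shown to consist of two terms (see P7.11):$^6$
$$\Delta t=\Delta t_G(\mathbf{r})+\Delta t_R(\mathbf{r}_0) \tag{7.51}$$
The first term, called the net group delay, dominates if the field waveform is initially symmetric in time (e.g. an unchirped Gaussian). It amounts to a spectral average of the group delay function taken with respect to the spectral content of the pulse arriving at the final point $\mathbf{r}=\mathbf{r}_0+\Delta\mathbf{r}$:
$$\Delta t_G(\mathbf{r})\equiv\frac{\int_{-\infty}^{\infty}I(\mathbf{r},\omega)\left[\frac{\partial\,\mathrm{Re}\,\mathbf{k}}{\partial\omega}\cdot\Delta\mathbf{r}\right]d\omega}{\int_{-\infty}^{\infty}I(\mathbf{r},\omega)\,d\omega} \tag{7.52}$$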
where $I(\mathbf{r},\omega)$ is given in (7.21). The two curves in Fig. 7.12 show $I(\mathbf{r}_0,\omega)$ (before propagation) and $I(\mathbf{r},\omega)$ (after propagation) for an initially Gaussian pulse. As seen in (7.52), the pulse travel time depends on the spectral shape of the pulse at the end of propagation. Note the close resemblance between the formulas (7.49) and (7.52). Both are expectation integrals. The former is executed as a center-of-mass integral on time; the latter is executed in the frequency domain on $\partial\,\mathrm{Re}\,\mathbf{k}/\partial\omega\cdot\Delta\mathbf{r}$, the group delay function (7.38). The group delay at every frequency present in the pulse influences the result. If the pulse has a narrow bandwidth in the neighborhood of $\omega_0$, the integral reduces to $\partial\,\mathrm{Re}\,\mathbf{k}/\partial\omega|_{\omega_0}\cdot\Delta\mathbf{r}$, in agreement with (7.38) (see P7.9). The net group delay depends only on the spectral content of the pulse, independent of its temporal organization (i.e., the phase of $\mathbf{E}(\mathbf{r},\omega)$ has no influence). Only the real part of the k-vector plays a direct role in (7.52).

Figure 7.12 Normalized power spectrum of a broadband pulse before and after propagation through an absorbing medium with the complex index shown in Fig. 7.10. The absorption line eats a hole in the spectrum.

The second term in (7.51) is the reshaping delay $\Delta t_R$. It represents a delay that arises solely from a reshaping of the spectral amplitude. Often this term is negligible. The term takes into account how the pulse time center-of-mass shifts as portions of the spectrum are removed (or added), as illustrated in Fig. 7.13. It is computed at $\mathbf{r}_0$ before propagation takes place:$^7$
$$\Delta t_R(\mathbf{r}_0)\equiv\langle t\rangle_{\mathbf{r}_0}^{\rm altered}-\langle t\rangle_{\mathbf{r}_0} \tag{7.53}$$
Here $\langle t\rangle_{\mathbf{r}_0}$ represents the usual arrival time of the pulse at the initial point $\mathbf{r}_0$, according to (7.49). The intensity at this point is associated with a field $\mathbf{E}(\mathbf{r}_0,t)$ whose spectrum is $\mathbf{E}(\mathbf{r}_0,\omega)$. On the other hand, $\langle t\rangle_{\mathbf{r}_0}^{\rm altered}$ is the arrival time of a pulse with modified spectrum $\mathbf{E}(\mathbf{r}_0,\omega)\,e^{-\mathrm{Im}\,\mathbf{k}\cdot\Delta\mathbf{r}}$. Notice that $\mathbf{E}(\mathbf{r}_0,\omega)\,e^{-\mathrm{Im}\,\mathbf{k}\cdot\Delta\mathbf{r}}$ is still evaluated at the initial point $\mathbf{r}_0$. Only the spectral amplitude (not the phase) is modified, according to what is anticipated to be lost (or gained) during the trip. In contrast to the net group delay, the reshaping delay is sensitive to how a pulse
$^6$ M. Ware, S. A. Glasgow, and J. Peatross, "The Role of Group Velocity in Tracking Field Energy in Linear Dielectrics," Opt. Express 9, 506-518 (2001).
$^7$ The reshaping delay can instead be computed after propagation takes place, in which case the net group delay should be computed with the initial rather than final spectrum.
Figure 7.13 The center of a chirped pulse can shift owing to the reshaping effect when spectrum is removed.
is organized. The reshaping delay is negligible if the pulse is initially symmetric (in amplitude and phase) before propagation. The reshaping delay also goes to zero in the narrowband limit, and the total delay reduces to the net group delay.
Example 7.8
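Find the time required for a Gaussian pulse (7.23) to traverse a slab of absorbing material (neglecting possible surface reflections). Let the material response be described by the Lorentz model described in section 2.2 with the carrier frequency of the pulse, $\omega_0$, coinciding with the material resonance frequency. Let the slab have thickness $\Delta r=c\gamma^{-1}/10$ and absorption strength $\omega_p^2=10$.

Solution: The spectrum of the initially Gaussian pulse is given by (7.25), and its power spectrum is$^8$
$$I(\mathbf{r}_0,\omega)\propto e^{-T^2(\omega-\omega_0)^2}$$
After propagating from $\mathbf{r}_0$ to $\mathbf{r}=\mathbf{r}_0+\Delta\mathbf{r}$, the power spectrum becomes
$$I(\mathbf{r},\omega)\propto e^{-T^2(\omega-\omega_0)^2}\,e^{-2\kappa(\omega)\frac{\omega}{c}\Delta r}$$
The net group delay is then
$$\Delta t_G(\mathbf{r})=\Delta r\,\frac{\int_{-\infty}^{\infty}I(\mathbf{r},\omega)\,\frac{\partial(n\omega/c)}{\partial\omega}\,d\omega}{\int_{-\infty}^{\infty}I(\mathbf{r},\omega)\,d\omega}=\frac{\Delta r}{c}\,\frac{\int_{-\infty}^{\infty}e^{-T^2(\omega-\omega_0)^2}\,e^{-2\kappa\frac{\omega}{c}\Delta r}\left[n+\omega\frac{\partial n}{\partial\omega}\right]d\omega}{\int_{-\infty}^{\infty}e^{-T^2(\omega-\omega_0)^2}\,e^{-2\kappa\frac{\omega}{c}\Delta r}\,d\omega}$$
The index of refraction $n+i\kappa$ is given by (2.39) (see also (2.27) and (2.29)). Since the expressions for $n$ and $\kappa$ are complicated, the integration in the above formula must be performed numerically. The result when $T=T_1=10\gamma^{-1}/\sqrt{2}$ (narrowband) is
$$\Delta t_G=-5.1/\gamma=-51\,\Delta r/c=-0.72\,T_1$$
and the result when $T=T_2=\gamma^{-1}/\sqrt{2}$ (broadband) is
$$\Delta t_G=0.67/\gamma=6.7\,\Delta r/c=0.95\,T_2$$
The reshaping delay (7.53) in both cases is negligible.

Figure 7.14 Animation comparing narrowband vs. broadband Gaussian pulses traversing an absorbing slab (green stripe) on resonance. Note the logarithmic scale. See Example 7.8.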
The narrowband pulse (with duration T1 ) in Example 7.8 traverses the absorbing medium superluminally (i.e. faster than c ). The negative transit time means that the center-of-mass of the exiting pulse emerges even before the center-of-mass of the entering pulse reaches the medium! On the other hand, the broadband pulse (with the shorter duration T2 ) has a large positive delay time, indicating that the exiting pulse emerges subluminally.
8 In general, one should write 0 to distinguish the carrier frequency of the pulse from the
Figure 7.14 shows the intensity proles for these two pulses as they traverse the absorption slab, calculated with the aid of (7.31). By eye, one can see how the centers of the two pulses are either advanced or delayed as they go through the absorption medium. In both cases, the pulse that emerges is well within the envelope of the original pulse propagated forward at c . In the case of the broadband pulse, the absorption peak eats a hole in the center of the spectrum as shown in Fig. 7.12, causing the emerging pulse to be distorted in time. The analysis in this section predicts the center of pulses, whereas to see the shape of pulses one needs to calculate (7.31). The results for the two pulse durations in Example 7.8 indicate a trend. Superluminal behavior only occurs for long boring pulses. In the case of a single absorption resonance, this comes with a severe cost of attenuation. Figure 7.15 shows the delay time as a function of pulse duration. As the injected pulse becomes more sharply dened in time, the superluminal behavior does not persist. Sharply dened waveforms (i.e. broadband) cannot propagate superluminally precisely because much of their bandwidth lies away from the frequencies with superluminal group delays. We should mention that superluminal propagation cannot persist for indenite distances since the medium eventually removes the superluminal spectral components through absorption (or else adds subluminal spectral components in the case of amplication). This limits the amount that a pulse center can be advancedon the scale of the pulses own duration. As we saw for the absorption situation the exiting pulse is tiny and resides well within the original envelope of the pulse propagated forward at speed c , as depicted in Fig. 7.16. Without the absorbing material in place, the signal would be detectable just as early. This statement is also true for any spectral behavior of a medium, including amplifying media. 
You can use the Lorentz model (2.40) to describe an amplifying medium with a negative oscillator strength $f$. Figure 7.17 shows narrowband and broadband pulses traversing an amplifying medium. In this case, superluminal behavior occurs for spectra near, but not on, an amplifying resonance. If the pulse is too broadband, its spectrum overlaps the resonance and is amplified there, which adds slower components to the overall group delay.
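As a rough numerical illustration of this behavior, the sketch below evaluates the group index $n_g = c\,d({\rm Re}\,k)/d\omega$ for a single-resonance Lorentz medium with negative oscillator strength. It assumes the susceptibility has the common form $\chi = f\omega_p^2/(\omega_0^2-\omega^2-i\gamma\omega)$; the function names and all parameter values (in units where $\omega_0 = 1$) are hypothetical choices for illustration, not values taken from the examples in the text.

```python
import numpy as np

def lorentz_index(w, w0=1.0, gamma=0.01, strength=-1e-4):
    """Complex index n = sqrt(1 + chi) for one Lorentz resonance.

    `strength` stands for f * wp**2 (assumed notation); a negative
    value models an amplifying resonance.
    """
    chi = strength / (w0**2 - w**2 - 1j * gamma * w)
    return np.sqrt(1.0 + chi)

def group_index(w, dw=1e-6, **kw):
    """n_g = c d(Re k)/dw with k = n w / c, by central difference."""
    k_plus = (w + dw) * lorentz_index(w + dw, **kw).real
    k_minus = (w - dw) * lorentz_index(w - dw, **kw).real
    return (k_plus - k_minus) / (2.0 * dw)

# Group index on resonance and one linewidth above resonance.
n_g_on = group_index(1.0)
n_g_off = group_index(1.0 + 0.01)
print("on resonance :", n_g_on)
print("off resonance:", n_g_off)
```

With these assumed numbers the group index is greater than one on the gain resonance (slow light) and less than one about a linewidth away, which is the off-resonance superluminal regime described above.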
Figure 7.17 Animation comparing narrowband vs. broadband Gaussian pulses traversing an amplifying slab (green stripe) slightly off resonance.
separated. In the present analysis, we consider an infinitely wide plane-wave pulse incident upon a grating. The scenario is depicted in Fig. 7.19: a short plane-wave pulse strikes the grating at an angle, and a spreading pulse emerges.

Consider a plane-wave pulse that ricochets between a pair of parallel grating surfaces. Although different k-vectors point at different angles, they are all straightened out upon diffracting from the second grating. For simplicity, we consider the pulse just before the first bounce and just after the second bounce, even though we are interested in the dispersion that takes place between the gratings. Therefore, we can consider all k-vectors as being parallel to each other.

Consider a plane wave incident on a grating at an angle $\theta_i$ with respect to the grating normal (aligned with the $x$-axis in our coordinate system), as depicted in Fig. 7.18. The plane wave diffracts from the first grating at an angle $\theta_r$ (also referenced from the grating normal). This angle is governed by the grating diffraction formula$^{9}$

$$\theta_r(\omega) = \sin^{-1}\left(\frac{2\pi c}{\omega d} - \sin\theta_i\right) \tag{7.54}$$

where $d$ is the grating groove spacing. By examining the geometry of the figure, we see that the reflected k-vector is given by $\mathbf{k} = \left(\hat{\mathbf{x}}\cos\theta_r + \hat{\mathbf{y}}\sin\theta_r\right)\omega/c$.

Suppose we know the pulse at a point $\mathbf{r}_0$ on the first grating. Next we choose a point $\mathbf{r}_0 + \Delta\mathbf{r}$ on the second grating where we will determine the outgoing pulse. Since we are considering an infinitely wide plane-wave pulse, it doesn't matter where we choose that point as long as it lies on the surface of the second grating. The waveform will be the same everywhere along the surface of the second grating; only its arrival time will trivially differ. For convenience, we might as well take the second point to be $\mathbf{r}_0 + \Delta\mathbf{r} = \mathbf{r}_0 + L\hat{\mathbf{x}}$, as shown in Fig. 7.18. The phase delay needed for (7.30) becomes

$$\mathbf{k}(\omega)\cdot\Delta\mathbf{r} = \frac{\omega L}{c}\cos\theta_r \tag{7.55}$$
Figure 7.18 Direction of k-vector between parallel gratings (top view). Grating rulings run in and out of the page.
We will express this as a Taylor-series expansion similar to (7.39) so that we can perform the inverse Fourier transform analytically. We will approximate (7.55) as

$$\mathbf{k}(\omega)\cdot\Delta\mathbf{r} \cong k_0 L + v_g^{-1}(\omega-\omega_0)L + \alpha(\omega-\omega_0)^2 L + \cdots \tag{7.56}$$
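so that we can take advantage of formula (7.46). To calculate the terms in this expansion we will need the derivative of $\theta_r$:

$$\frac{d\theta_r}{d\omega} = -\frac{1}{\sqrt{1-\left(\dfrac{2\pi c}{\omega d}-\sin\theta_i\right)^2}}\,\frac{2\pi c}{\omega^2 d} = -\frac{2\pi c}{\omega^2 d}\,\frac{1}{\sqrt{1-\sin^2\theta_r}} = -\frac{2\pi c}{\omega^2 d\cos\theta_r} \tag{7.57}$$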
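Using (7.54) to write $2\pi c/(\omega d) = \sin\theta_i + \sin\theta_r$, the derivatives of the phase delay (7.55) are

$$\begin{aligned}
\frac{d\left(\mathbf{k}\cdot\Delta\mathbf{r}\right)}{d\omega} &= \frac{L}{c}\left[\cos\theta_r - \omega\sin\theta_r\frac{d\theta_r}{d\omega}\right] \\
&= \frac{L}{c}\left[\cos\theta_r + \sin\theta_r\,\frac{\sin\theta_i+\sin\theta_r}{\cos\theta_r}\right] \\
&= \frac{L}{c}\,\frac{1+\sin\theta_r\sin\theta_i}{\cos\theta_r}
\end{aligned} \tag{7.58}$$

and

$$\begin{aligned}
\frac{d^2\left(\mathbf{k}\cdot\Delta\mathbf{r}\right)}{d\omega^2} &= \frac{L}{c}\left[\sin\theta_i + \frac{\sin\theta_r\left(1+\sin\theta_r\sin\theta_i\right)}{\cos^2\theta_r}\right]\frac{d\theta_r}{d\omega} \\
&= -\frac{L}{c}\,\frac{\sin\theta_i+\sin\theta_r}{\cos^2\theta_r}\,\frac{\sin\theta_i+\sin\theta_r}{\omega\cos\theta_r} \\
&= -\frac{L}{\omega c}\,\frac{\left(\sin\theta_i+\sin\theta_r\right)^2}{\cos^3\theta_r}
\end{aligned} \tag{7.59}$$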
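Comparing with the expansion (7.56), the coefficients are obtained by evaluating (7.55), (7.58), and (7.59) at the carrier frequency $\omega_0$:

$$k_0 L = \frac{\omega_0 L}{c}\cos\theta_r(\omega_0) \tag{7.60}$$

$$v_g^{-1} L = \left.\frac{d\left(\mathbf{k}\cdot\Delta\mathbf{r}\right)}{d\omega}\right|_{\omega_0} \tag{7.61}$$

$$\alpha L = \frac{1}{2}\left.\frac{d^2\left(\mathbf{k}\cdot\Delta\mathbf{r}\right)}{d\omega^2}\right|_{\omega_0} \tag{7.62}$$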
In the case of a Gaussian pulse, we can employ (7.46), where $L$ takes the place of $z$, and $k_0$, $v_g^{-1}$, and $\alpha$ are defined by (7.60)-(7.62). The duration of the pulse is controlled by (7.62) and the spacing between the gratings $L$.
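The closed-form derivatives (7.58) and (7.59) can be checked numerically against finite differences of the phase delay (7.55). The sketch below does this with the diffraction angle taken from (7.54). The grating parameters are loosely borrowed from the numbers quoted in P7.12; the separation `L = 0.1` m and all function names are hypothetical choices made only for this illustration.

```python
import math

C = 299792458.0  # speed of light in vacuum (m/s)

def theta_r(w, theta_i, d):
    """Diffraction angle from the grating formula (7.54)."""
    return math.asin(2.0 * math.pi * C / (w * d) - math.sin(theta_i))

def phase_delay(w, theta_i, d, L):
    """k(w) . dr = (w L / c) cos(theta_r), as in (7.55)."""
    return w * L / C * math.cos(theta_r(w, theta_i, d))

def first_derivative(w, theta_i, d, L):
    """Closed form (7.58)."""
    tr = theta_r(w, theta_i, d)
    return (L / C) * (1.0 + math.sin(tr) * math.sin(theta_i)) / math.cos(tr)

def second_derivative(w, theta_i, d, L):
    """Closed form (7.59)."""
    tr = theta_r(w, theta_i, d)
    s = math.sin(theta_i) + math.sin(tr)
    return -(L / (w * C)) * s**2 / math.cos(tr) ** 3

# Illustrative (assumed) parameters.
THETA_I = math.radians(20.0)       # incident angle
D = 1.67e-6                        # groove spacing (m)
L = 0.1                            # grating separation (m), hypothetical
W0 = 2.0 * math.pi * C / 800e-9    # carrier frequency for 800 nm

h = 1e-4 * W0  # finite-difference step in frequency
fd1 = (phase_delay(W0 + h, THETA_I, D, L)
       - phase_delay(W0 - h, THETA_I, D, L)) / (2.0 * h)
fd2 = (first_derivative(W0 + h, THETA_I, D, L)
       - first_derivative(W0 - h, THETA_I, D, L)) / (2.0 * h)

inv_vg = first_derivative(W0, THETA_I, D, L) / L        # v_g^{-1}
alpha = second_derivative(W0, THETA_I, D, L) / (2 * L)  # quadratic coefficient
print("v_g^-1 (s/m):", inv_vg, " alpha (s^2/m):", alpha)
```

The negative sign of `alpha` reflects the fact that lower frequencies diffract at larger angles and therefore travel a longer path between the gratings.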
Figure 7.19 Animation showing a short plane-wave pulse diffracting from a grating positioned along the left edge of the frame.
519-532 (2001).
transfer of energy from the medium. The actual transport of energy is strictly bounded by $c$; superluminal propagation of a signal front is impossible.

In accordance with Poynting's theorem (2.51), the total energy density stored in an electromagnetic field and in a medium is given by

$$u(\mathbf{r},t) = u_{\rm field}(\mathbf{r},t) + u_{\rm med}(\mathbf{r},t) + u(\mathbf{r},-\infty) \tag{7.63}$$

where the time-dependent accumulation of energy transferred into the medium from the field (ignoring possible free current $\mathbf{J}_{\rm free}$) is

$$u_{\rm med}(\mathbf{r},t) = \int_{-\infty}^{t}\mathbf{E}(\mathbf{r},t')\cdot\frac{\partial\mathbf{P}(\mathbf{r},t')}{\partial t'}\,dt' \tag{7.64}$$
The expression (7.63) for the energy density includes all (relevant) forms of energy, including a non-zero integration constant $u(\mathbf{r},-\infty)$ corresponding to energy stored in the medium before the arrival of any pulse (important in the case of an amplifying medium). $u_{\rm field}(\mathbf{r},t)$ and $u_{\rm med}(\mathbf{r},t)$ are both zero before the arrival of the pulse (i.e. at $t=-\infty$). In addition, $u_{\rm field}(\mathbf{r},t)$, given by (2.53), returns to zero after the pulse has passed (i.e. at $t=+\infty$). As $u_{\rm med}$ increases, the energy in the medium increases. Conversely, as $u_{\rm med}$ decreases, the medium surrenders energy to the electromagnetic field. While it is possible for $u_{\rm med}$ to become negative, the combination $u_{\rm med}+u(-\infty)$ (i.e. the net energy in the medium) can never go negative since a material cannot surrender more energy than it has to begin with.

Poynting's theorem (2.51) has the form of a continuity equation, which when integrated spatially over a small volume $V$ yields

$$\oint_A \mathbf{S}\cdot d\mathbf{a} = -\frac{\partial}{\partial t}\int_V u\,dV \tag{7.65}$$
where the left-hand side has been transformed into a surface integral (via the divergence theorem (0.11)) representing the power leaving the volume. Let the volume be small enough to take $\mathbf{S}$ to be uniform throughout $V$. We can define an energy transport velocity (directed along $\mathbf{S}$) as the effective speed at which all of the energy density would need to travel in order to achieve the Poynting flux:

$$\mathbf{v}_E \equiv \frac{\mathbf{S}}{u} \tag{7.66}$$

Note that this ratio of the Poynting flux to the energy density has units of velocity. When the total energy density $u$ is used in computing (7.66), the energy transport velocity has a fictitious nature; it is not the actual velocity of the total energy (since part is stationary), but rather the effective velocity necessary to achieve the same energy transport that the electromagnetic flux alone delivers. If we reduce the denominator to the subset of the energy that can move, namely $u_{\rm field}$, the Cauchy-Schwarz inequality (i.e. $\alpha^2+\beta^2\ge 2\alpha\beta$) ensures that the energy transport velocity $v_E$ remains strictly bounded by the speed of light in vacuum $c$. The total energy density $u$ is at least as great as the field energy density $u_{\rm field}$. Hence, this strict luminality is maintained.

Centroid of Energy
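Consider a weighted average of the energy transport velocity:

$$\langle\mathbf{v}_E\rangle \equiv \frac{\int\mathbf{v}_E\,u\,d^3r}{\int u\,d^3r} = \frac{\int\mathbf{S}\,d^3r}{\int u\,d^3r} \tag{7.67}$$

Integration by parts leads to

$$\langle\mathbf{v}_E\rangle = -\frac{\int\mathbf{r}\,\nabla\cdot\mathbf{S}\,d^3r}{\int u\,d^3r} \tag{7.68}$$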
where we have assumed that the volume for the integration encloses all energy in the system and that the field near the edges of this volume is zero. Since we have included all energy, Poynting's theorem (2.51) can be written with no source terms (i.e. $\nabla\cdot\mathbf{S}+\partial u/\partial t=0$). This means that the total energy in the system is conserved and is given by the integral in the denominator of (7.68). This allows the derivative to be brought out in front of the entire expression, giving

$$\langle\mathbf{v}_E\rangle = \frac{\partial\langle\mathbf{r}\rangle}{\partial t} \quad\text{where}\quad \langle\mathbf{r}\rangle \equiv \frac{\int\mathbf{r}\,u\,d^3r}{\int u\,d^3r} \tag{7.69}$$

The latter expression represents the center-of-mass or centroid of the total energy in the system, which is guaranteed to evolve strictly luminally since $\mathbf{v}_E$ is everywhere luminal.$^{11}$
It is enlightening to consider $u_{\rm med}$ within a frequency-domain context. In an isotropic medium, the polarization for an individual plane wave can be written in terms of the linear susceptibility defined in (2.16):

$$\mathbf{P}(\mathbf{r},\omega) = \epsilon_0\chi(\mathbf{r},\omega)\mathbf{E}(\mathbf{r},\omega) \tag{7.70}$$
We can use this to express $u_{\rm med}$ in terms of the electric field and material susceptibility.
11 Although (7.69) guarantees that the centroid of the total energy moves strictly luminally, there is no such limitation on the centroid of field energy alone. The steps leading to (7.69) are not possible if $u_{\rm field}$ is used in place of $u$. Explicitly, that is

$$\left\langle\frac{\mathbf{S}}{u_{\rm field}}\right\rangle \ne \frac{\partial}{\partial t}\,\frac{\int\mathbf{r}\,u_{\rm field}\,d^3r}{\int u_{\rm field}\,d^3r}$$

As was pointed out, the left-hand side is strictly luminal. However, the right-hand side can easily exceed $c$ as the medium exchanges energy with the field. In an amplifying medium, for example, the rapid appearance of a pulse downstream can occur when the leading portion of a pulse stimulates energy already present in the medium to convert to the form of field energy. Group velocity is related to this method of accounting, which is why it also can become superluminal.
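The time derivative of the polarization follows from its inverse Fourier transform:

$$\mathbf{P}(\mathbf{r},t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{P}(\mathbf{r},\omega)e^{-i\omega t}d\omega \quad\Rightarrow\quad \frac{\partial\mathbf{P}(\mathbf{r},t)}{\partial t} = \frac{-i}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\omega\,\mathbf{P}(\mathbf{r},\omega)e^{-i\omega t}d\omega \tag{7.71}$$

Substituting this into (7.64), with the field also written as an inverse Fourier transform, gives

$$u_{\rm med}(\mathbf{r},\infty) = \int_{-\infty}^{\infty}\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\mathbf{E}(\mathbf{r},\omega')e^{-i\omega' t}d\omega'\right]\cdot\left[\frac{-i\epsilon_0}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\omega\,\chi(\mathbf{r},\omega)\mathbf{E}(\mathbf{r},\omega)e^{-i\omega t}d\omega\right]dt \tag{7.72}$$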
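where we have incorporated (7.70) and evaluated $u_{\rm med}$ after the pulse is over at $t=\infty$. We may change the order of integration and write

$$u_{\rm med}(\mathbf{r},\infty) = -i\epsilon_0\int_{-\infty}^{\infty}d\omega\,\omega\,\chi(\mathbf{r},\omega)\mathbf{E}(\mathbf{r},\omega)\cdot\int_{-\infty}^{\infty}d\omega'\,\mathbf{E}(\mathbf{r},\omega')\,\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{-i(\omega+\omega')t}dt \tag{7.73}$$

The final integral is a delta function similar to (0.54), which allows the middle integral also to be performed. The expression for $u_{\rm med}$ then reduces to

$$u_{\rm med}(\mathbf{r},\infty) = -i\epsilon_0\int_{-\infty}^{\infty}\omega\,\chi(\mathbf{r},\omega)\mathbf{E}(\mathbf{r},\omega)\cdot\mathbf{E}(\mathbf{r},-\omega)\,d\omega \tag{7.74}$$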
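In this derivation, we take $\mathbf{E}(\mathbf{r},t)$ and $\mathbf{P}(\mathbf{r},t)$ to be real functions, so we can employ the symmetry (7.29) along with

$$\mathbf{P}^*(\mathbf{r},\omega) = \mathbf{P}(\mathbf{r},-\omega) \quad\text{and}\quad \chi^*(\mathbf{r},\omega) = \chi(\mathbf{r},-\omega).$$

Then we obtain

$$u_{\rm med}(\mathbf{r},\infty) = \epsilon_0\int_{-\infty}^{\infty}\omega\,{\rm Im}\,\chi(\mathbf{r},\omega)\,\mathbf{E}(\mathbf{r},\omega)\cdot\mathbf{E}^*(\mathbf{r},\omega)\,d\omega \tag{7.75}$$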
The expression (7.75) describes the net energy density transferred to a point in the medium after all action has finished (i.e. at $t=\infty$). It involves the power spectrum of the pulse. We can modify this formula in an intuitive way so that it describes the transfer of energy density to the medium for any time during the pulse. Since the medium is unable to anticipate the spectrum of the entire pulse before experiencing it, the material responds to the pulse according to the history of the field up to each instant. In particular, the material has to be prepared for
12 We assume that the real forms of the fields in the time domain are used for the sake of this multiplication.
the possibility of an abrupt cessation of the pulse at any moment, in which case all exchange of energy with the medium immediately ceases. In this extreme scenario, there is no possibility for the medium to recover from previously incorrect attenuation or amplification, so it must have gotten it right already. If the pulse were in fact to abruptly terminate at a given instant, it would not be necessary to integrate the inverse Fourier transform (7.19) beyond the termination time $t$, after which all contributions are zero. Causality requires that the medium be indifferent to whether a pulse actually terminates if that possibility lies in the future. Therefore, (7.75) can apply for any time $t$ (not just for $t=\infty$) if the spectrum (7.19) is evaluated just for that portion of the field previously experienced by the medium (up to time $t$). The following is then an exact representation for the energy density (7.64) transferred to the medium:

$$u_{\rm med}(\mathbf{r},t) = \epsilon_0\int_{-\infty}^{\infty}\omega\,{\rm Im}\,\chi(\mathbf{r},\omega)\,\mathbf{E}_t(\mathbf{r},\omega)\cdot\mathbf{E}_t^*(\mathbf{r},\omega)\,d\omega \tag{7.76}$$

where

$$\mathbf{E}_t(\mathbf{r},\omega) \equiv \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t}\mathbf{E}(\mathbf{r},t')\,e^{i\omega t'}dt' \tag{7.77}$$
This time dependence enters only through $\mathbf{E}_t(\mathbf{r},\omega)\cdot\mathbf{E}_t^*(\mathbf{r},\omega)$, known as the instantaneous power spectrum.

The expression (7.76) gives physical insight into the manner in which causal dielectric materials exchange energy with different parts of an electromagnetic pulse. Since the function $\mathbf{E}_t(\omega)$ is the Fourier transform of the pulse truncated at the current time $t$ and set to zero thereafter, it can include many frequency components that are not present in the pulse taken in its entirety. This explains why the medium can respond differently to the front of a pulse compared to the back. Even though absorption or amplification resonances may lie outside of the spectral envelope of a pulse taken in its entirety, the instantaneous spectrum on a portion of the pulse can momentarily lap onto or off of resonances in the medium.

In view of (7.76) and (7.77), it is straightforward to predict when the electromagnetic energy of a pulse will exhibit superluminal or subluminal behavior. In section 7.5, we saw that this behavior is controlled by the group velocity function. However, with (7.76) and (7.77), it is not necessary to examine the group velocity directly, but only the imaginary part of the susceptibility $\chi(\mathbf{r},\omega)$.

If the entire pulse passing through point $\mathbf{r}$ has a spectrum in the neighborhood of an amplifying resonance, but not on the resonance, superluminal behavior can result. The instantaneous spectrum during the front portion of the pulse is generally wider and can therefore lap onto the nearby gain peak. The medium accordingly amplifies this perceived spectrum, and the front of the pulse grows. The energy is then returned to the medium from the latter portion of the pulse as the instantaneous spectrum narrows and withdraws from the gain peak. The
Figure 7.20 Real and imaginary parts of the refractive index for an amplifying medium.
Figure 7.21 Animation of a narrowband pulse traversing an amplifying medium off resonance. The black dot shows the movement of the center of all energy. The red line inside the medium shows the energy held in that medium, which cannot go negative. The lower figure shows the instantaneous spectrum of the pulse at the front of the medium relative to the narrow amplifying resonance.
effect is not only consistent with the principle of causality, it is a direct and general consequence of causality, as demonstrated by (7.76) and (7.77).

As an illustration, consider the broadband waveform of duration $T_2$ described in Example 7.8. Consider an amplifying medium with index shown in Fig. 7.20, with the amplifying resonance (negative oscillator strength) set on the frequency $\omega_0 = \bar{\omega}_0 + 2\gamma$, where $\bar{\omega}_0$ is the carrier frequency. Thus, the resonance structure is centered a modest distance above the carrier frequency, and there is only minor spectral overlap between the pulse and the resonance structure.

Superluminal behavior can occur in amplifying materials when the forward edge of a narrowband pulse receives extra amplification. Fig. 7.21 shows how the early portion of a pulse has a wide instantaneous spectrum, computed by (7.77), that can lap onto the amplifying resonance. As the wings grow and access the neighboring resonance, the pulse extracts more energy from the medium. As the wings diminish, the pulse surrenders much of that energy back to the medium, which shifts the center of the pulse forward.

In this appendix we have indirectly proven that a sharply defined signal edge cannot propagate faster than $c$. If a signal edge begins abruptly at time $t_0$, the instantaneous spectrum $\mathbf{E}_t(\omega)$ clearly remains identically zero until that time. In other words, no energy may be exchanged with the medium until the field energy from the pulse arrives. Since, as was pointed out in connection with (7.66), the Cauchy-Schwarz inequality prevents the field energy from traveling faster than $c$, at no point in the medium can a signal front exceed $c$.
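The broadening of the instantaneous spectrum at early times can be seen in a small numerical experiment based on (7.77). The sketch below assumes a Gaussian field envelope $e^{-t^2/2T^2}$ on a carrier $\omega_0$ (so that the full power spectrum is proportional to $e^{-T^2(\omega-\omega_0)^2}$), works in units where $T=1$, and compares the instantaneous power at a detuning of $3/T$ with that at the carrier. The function names, the detuning, and the truncation times are hypothetical choices for illustration.

```python
import numpy as np

def instantaneous_power(t_cut, detuning, T=1.0, dt=1e-3, t_start=-10.0):
    """|E_t(w0 + detuning)|^2 from (7.77) for a Gaussian envelope.

    The carrier phase exp(-i w0 t) cancels against exp(i w t), so only
    the detuning from the carrier enters the integrand.
    """
    t = np.arange(t_start, t_cut, dt)          # history up to time t_cut
    envelope = np.exp(-t**2 / (2.0 * T**2))
    E_t = np.sum(envelope * np.exp(1j * detuning * t)) * dt / np.sqrt(2.0 * np.pi)
    return np.abs(E_t) ** 2

def wing_to_center_ratio(t_cut, detuning=3.0):
    """Instantaneous power at the detuned frequency relative to the carrier."""
    return instantaneous_power(t_cut, detuning) / instantaneous_power(t_cut, 0.0)

ratio_early = wing_to_center_ratio(-1.0)   # only the leading edge has arrived
ratio_full = wing_to_center_ratio(10.0)    # essentially the entire pulse
print("early:", ratio_early, " full pulse:", ratio_full)
```

For the complete pulse the ratio approaches the Gaussian value $e^{-9}$, while on the leading edge the detuned frequency carries a far larger share of the instantaneous spectrum. This is the sense in which the front of a pulse can "lap onto" a nearby resonance.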
$$P(\omega) = \epsilon_0\chi(\omega)E(\omega) \tag{7.78}$$

They made an argument based on causality (i.e. effect cannot precede cause), which allows one to obtain the real part of $\chi(\omega)$ from the imaginary part of $\chi(\omega)$, if it is known for all $\omega$. Similarly, one can obtain the imaginary part of $\chi(\omega)$ from the real part of $\chi(\omega)$. We develop the Kramers-Kronig formulas below.$^{13}$

We can replace $E(\omega)$ in (7.78) with the Fourier transform of $E(t)$ in accordance with (7.19). In addition, we take the inverse Fourier transform (7.19) of both sides of (7.78) and obtain

$$P(t) = \frac{\epsilon_0}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\chi(\omega)\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}E(t')\,e^{i\omega t'}dt'\right]e^{-i\omega t}d\omega \tag{7.79}$$
13 See J. D. Jackson, Classical Electrodynamics, 3rd ed., Sect. 7.10 (New York: John Wiley, 1999). Also B. Y.-K. Hu, Kramers-Kronig in two lines, Am. J. Phys. 57, 821 (1989).
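Rearranging the order of integration gives

$$P(t) = \frac{\epsilon_0}{2\pi}\int_{-\infty}^{\infty}E(t')\left[\int_{-\infty}^{\infty}\chi(\omega)\,e^{-i\omega(t-t')}d\omega\right]dt' \tag{7.80}$$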
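Now for the causality argument: The polarization of the medium $P(t)$ cannot depend on the field $E(t')$ at future times $t'>t$. Therefore the expression in square brackets must be identically zero unless $t-t'>0$. This places a restriction on the functional form of $\chi(\omega)$, as we shall see. The causality argument comes explicitly into play when we employ the following integral formula:$^{14}$

$$e^{-i\omega(t-t')} = \frac{{\rm sign}\{t-t'\}}{i\pi}\int_{-\infty}^{\infty}\frac{e^{-i\omega'(t-t')}}{\omega-\omega'}\,d\omega' \tag{7.81}$$

where ${\rm sign}\{t-t'\}$ equals $+1$ for $t>t'$ and $-1$ for $t<t'$. Upon substitution of (7.81) into (7.80) and after changing the order of integration within the square brackets we obtain

$$P(t) = \frac{\epsilon_0}{2\pi}\int_{-\infty}^{\infty}E(t')\left[\int_{-\infty}^{\infty}\left(\frac{{\rm sign}\{t-t'\}}{i\pi}\int_{-\infty}^{\infty}\frac{\chi(\omega')}{\omega'-\omega}\,d\omega'\right)e^{-i\omega(t-t')}d\omega\right]dt' \tag{7.82}$$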
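For (7.82) to agree with (7.80) when $t>t'$, we need

$$\chi(\omega) = \frac{1}{i\pi}\int_{-\infty}^{\infty}\frac{\chi(\omega')}{\omega'-\omega}\,d\omega' \tag{7.83}$$

or, writing out the real and imaginary parts,

$${\rm Re}\,\chi(\omega) + i\,{\rm Im}\,\chi(\omega) = \frac{1}{i\pi}\int_{-\infty}^{\infty}\frac{{\rm Re}\,\chi(\omega') + i\,{\rm Im}\,\chi(\omega')}{\omega'-\omega}\,d\omega' \tag{7.84}$$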
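Finally, equating separately the real and imaginary parts of the above equation yields

$${\rm Re}\,\chi(\omega) = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{{\rm Im}\,\chi(\omega')}{\omega'-\omega}\,d\omega' \quad\text{and}\quad {\rm Im}\,\chi(\omega) = -\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{{\rm Re}\,\chi(\omega')}{\omega'-\omega}\,d\omega' \tag{7.85}$$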
14 This integral, which is a specific instance of Cauchy's theorem, is tricky because it involves two diverging pieces, to either side of the singularity $\omega'=\omega$. The divergences have opposite sign so that they cancel. The integration must approach the singularity in the same manner from either side, in which case the result is called the principal value. In practical terms, if the integral is performed numerically, the sampling of points should straddle the singularity symmetrically; other sampling schemes can change the result dramatically and give incorrect values.
These are known as the Kramers-Kronig relations on the real and imaginary parts of $\chi$.$^{15}$ If the real part of $\chi$ is known at all frequencies, we can use the Kramers-Kronig relations to generate the imaginary part, and vice versa. We see that the real and imaginary parts of $\chi$ cannot be chosen independently if we are to respect the principle of causality.
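The relations (7.85) can be tried out numerically on a causal model susceptibility. The sketch below uses a single Lorentz resonance, assumed here to have the form $\chi = s/(\omega_0^2-\omega^2-i\gamma\omega)$, and evaluates the principal-value integrals on a grid that straddles the singularity symmetrically, as the footnotes recommend. The function names, parameter values, grid spacing, and integration range are hypothetical choices for illustration.

```python
import numpy as np

def chi_lorentz(w, w0=1.0, gamma=0.1, strength=1.0):
    """Model susceptibility for one Lorentz resonance (assumed form)."""
    return strength / (w0**2 - w**2 - 1j * gamma * w)

def principal_value(w, func, half_width=100.0, h=5e-4):
    """(1/pi) P-integral of func(w') / (w' - w) over w'.

    Sample points sit at w + (j + 1/2) h, so they straddle the
    singularity at w' = w symmetrically and never land on it.
    """
    n = int(half_width / h)
    offsets = (np.arange(-n, n) + 0.5) * h   # w' - w, antisymmetric set
    return np.sum(func(w + offsets) / offsets) * h / np.pi

w = 0.9
# Re chi from Im chi, and Im chi from Re chi, following (7.85).
re_est = principal_value(w, lambda x: chi_lorentz(x).imag)
im_est = -principal_value(w, lambda x: chi_lorentz(x).real)
print("Re chi:", re_est, "vs", chi_lorentz(w).real)
print("Im chi:", im_est, "vs", chi_lorentz(w).imag)
```

Shifting the grid so that it is no longer symmetric about $\omega'=\omega$ leaves an uncancelled piece of the $1/(\omega'-\omega)$ divergence, which is why the symmetric sampling matters.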
Example 7.9
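Show that the expression in square brackets of (7.80) is zero when $t'>t$, if $\chi(\omega)$ satisfies the Kramers-Kronig relations (7.85).

Solution: The expression may be written as

$$\begin{aligned}
\int_{-\infty}^{\infty}\chi(\omega)\,e^{-i\omega(t-t')}d\omega &= \int_{-\infty}^{\infty}{\rm Re}\,\chi(\omega)\,e^{-i\omega(t-t')}d\omega + i\int_{-\infty}^{\infty}{\rm Im}\,\chi(\omega)\,e^{-i\omega(t-t')}d\omega \\
&= \int_{-\infty}^{\infty}{\rm Re}\,\chi(\omega)\,e^{-i\omega(t-t')}d\omega + i\int_{-\infty}^{\infty}\left[-\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{{\rm Re}\,\chi(\omega')}{\omega'-\omega}\,d\omega'\right]e^{-i\omega(t-t')}d\omega \\
&= \int_{-\infty}^{\infty}{\rm Re}\,\chi(\omega)\,e^{-i\omega(t-t')}d\omega + \int_{-\infty}^{\infty}{\rm Re}\,\chi(\omega')\left[\frac{1}{i\pi}\int_{-\infty}^{\infty}\frac{e^{-i\omega(t-t')}}{\omega'-\omega}\,d\omega\right]d\omega'
\end{aligned} \tag{7.86}$$

where we have invoked the Kramers-Kronig relation for ${\rm Im}\,\chi(\omega)$ (7.85) and interchanged the order of integration in the final expression. Since we are specifically considering future times $t'>t$, we have by (7.81)

$$\frac{1}{i\pi}\int_{-\infty}^{\infty}\frac{e^{-i\omega(t-t')}}{\omega'-\omega}\,d\omega = -e^{-i\omega'(t-t')}$$

Hence

$$\int_{-\infty}^{\infty}\chi(\omega)\,e^{-i\omega(t-t')}d\omega = \int_{-\infty}^{\infty}{\rm Re}\,\chi(\omega)\,e^{-i\omega(t-t')}d\omega - \int_{-\infty}^{\infty}{\rm Re}\,\chi(\omega')\,e^{-i\omega'(t-t')}d\omega' = 0 \tag{7.87}$$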
Finally, it is worth noting that the Kramers-Kronig relations also apply to the real and imaginary parts of the index of refraction:$^{16}$
15 As with (7.81), the principal value of the integral must be calculated. If the integral is performed numerically, the sampling of points should straddle the singularity symmetrically. Separately, the integral on each side of $\omega'=\omega$ diverges, but with opposite sign.
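$$n(\omega)-1 = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\kappa(\omega')}{\omega'-\omega}\,d\omega' \quad\text{and}\quad \kappa(\omega) = -\frac{1}{\pi}\int_{-\infty}^{\infty}\frac{n(\omega')-1}{\omega'-\omega}\,d\omega' \tag{7.88}$$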
One can use the Kramers-Kronig relations to find the real part of the index from a measurement of absorption, if the measurement is done over a broad enough range of the spectrum. This is the most useful form of the Kramers-Kronig relations. It is sometimes convenient to multiply the numerator and denominator inside the integrands of (7.88) by $\omega'+\omega$. Then noting that $n$ is an even function and $\kappa$ is an odd function allows us to dismiss either $\omega$ or $\omega'$ in the numerator and integrate$^{17}$ over positive frequencies only:

$$n(\omega)-1 = \frac{2}{\pi}\int_{0}^{\infty}\frac{\omega'\kappa(\omega')}{\omega'^2-\omega^2}\,d\omega' \quad\text{and}\quad \kappa(\omega) = -\frac{2\omega}{\pi}\int_{0}^{\infty}\frac{n(\omega')-1}{\omega'^2-\omega^2}\,d\omega' \tag{7.89}$$
16 This follows from Cauchy's theorem since the index (subtract one) is the square root of $\chi(\omega)$. The Kramers-Kronig relations for $\chi(\omega)$ guarantee that $\chi(\omega)$ has no poles in the upper half complex plane, when $\omega$ is considered (for mathematical purposes) to be a complex variable. Taking the square root does not introduce poles into the upper half plane.

17 The integrals (7.88) and (7.89) diverge to either side of $\omega'=\omega$, but with opposite sign. Again, the principal value of the integral is required, which means a numeric grid should straddle the singularity symmetrically.
Exercises
Exercises for 7.1 Intensity of Superimposed Plane Waves

P7.1 (a) Consider two counter-propagating fields described by $\hat{\mathbf{x}}E_1e^{i(kz-\omega t)}$ and $\hat{\mathbf{x}}E_2e^{i(-kz-\omega t)}$, where $E_1$ and $E_2$ are both real. Show that their sum can be written as

$$\hat{\mathbf{x}}E_{\rm tot}(z)\,e^{i(\Phi(z)-\omega t)}$$

where

$$E_{\rm tot}(z) = E_1\sqrt{\left(1-\frac{E_2}{E_1}\right)^2 + 4\frac{E_2}{E_1}\cos^2 kz}$$

and

$$\Phi(z) = \tan^{-1}\left[\frac{1-E_2/E_1}{1+E_2/E_1}\tan kz\right]$$

Outside the range $-\pi/2\le kz\le\pi/2$ the pattern repeats.

(b) Suppose that two counter-propagating laser fields have separate intensities, $I_1$ and $I_2=I_1/100$. The ratio of the fields is then $E_2/E_1=1/10$. In the standing interference pattern that results, what is the ratio of the peak intensity to the minimum intensity? Are you surprised how high this is?

P7.2 Equation (7.7) implies that there is no interference between fields that are polarized along orthogonal dimensions. That is, the intensity of

$$\mathbf{E}(\mathbf{r},t) = \hat{\mathbf{x}}E_0e^{i[(k\hat{\mathbf{z}})\cdot\mathbf{r}-\omega t]} + \hat{\mathbf{y}}E_0e^{i[(k\hat{\mathbf{x}})\cdot\mathbf{r}-\omega t]}$$

according to (7.7) is uniform throughout space. Of course (7.7) does not apply since the k-vectors are not parallel. Show that the time-average of $\mathbf{S}(\mathbf{r},t)$ according to (7.4) exhibits interference in the distribution of net energy flow.
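Exercises for 7.2 Group vs. Phase Velocity: Sum of Two Plane Waves

P7.3 Show that (7.10) can be written as

$$\mathbf{E}(\mathbf{r},t) = 2\mathbf{E}_0\,e^{i\left(\frac{\mathbf{k}_2+\mathbf{k}_1}{2}\cdot\mathbf{r}-\frac{\omega_2+\omega_1}{2}t\right)}\cos\left(\frac{\Delta\mathbf{k}}{2}\cdot\mathbf{r}-\frac{\Delta\omega}{2}t\right)$$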
From this show that the speed of the rapid-oscillation intensity peaks where /k in Fig. 7.2 is v p = (k 1 + k 2 ) k 2 P7.4 and (1 + 2 ) 2
Exercises for 7.3 Frequency Spectrum of Light

P7.5 The continuous field of a very narrowband continuous laser may be approximated as a pure plane wave: $\mathbf{E}(\mathbf{r},t)=\mathbf{E}_0e^{i(k_0z-\omega_0t)}$. Suppose the wave encounters a shutter at the plane $z=0$.

(a) Compute the power spectrum of the light before the shutter. HINT: The answer is proportional to the square of a delta function centered on $\omega_0$ (see (0.54)).

(b) Compute the power spectrum after the shutter if it is opened during the interval $-T/2\le t\le T/2$. Plot the result. Are you surprised that the shutter appears to create extra frequency components? HINT: Write your answer in terms of the sinc function defined by ${\rm sinc}\,\alpha\equiv\sin\alpha/\alpha$.

P7.6 (a) Determine the Full-Width-at-Half-Maximum of the intensity (i.e. the width of $I(\mathbf{r},t)$, represented by $\Delta t_{\rm FWHM}$) and of the power spectrum (i.e. the width of $I(\mathbf{r},\omega)$, represented by $\Delta\omega_{\rm FWHM}$) for the Gaussian pulse defined in (7.25). HINT: Both answers are in terms of $T$.

(b) Give an uncertainty principle for the product of $\Delta t_{\rm FWHM}$ and $\Delta\omega_{\rm FWHM}$.
Exercises for 7.5 Quadratic Dispersion

P7.7 The intensity of a Gaussian laser pulse has a FWHM duration $T_{\rm FWHM}=25$ fs with carrier frequency $\omega_0$ corresponding to $\lambda_{\rm vac}=800$ nm. The pulse goes through a lens of thickness $\ell=1$ cm (laser-quality glass type BK7) with index of refraction given approximately by

$$n(\omega)\cong 1.4948 + 0.016\,\frac{\omega}{\omega_0}$$

What is the full-width-at-half-maximum of the intensity for the emerging pulse? HINT: For the input pulse we have $T = T_{\rm FWHM}/\left(2\sqrt{\ln 2}\right)$ (see P7.6).

P7.8 If the pulse defined in (7.46) travels through the material for a very long distance $z$ such that $T(z)\to T\,\Phi(z)$ and $\tan^{-1}\Phi(z)\to\pi/2$, show that the instantaneous frequency of the pulse is

$$\omega_0 + \frac{t-2z/v_g}{4\alpha z}$$
COMMENT: As the wave travels, the earlier part of the pulse oscillates more slowly than the later part. This is called chirp, and it means that the red frequencies get ahead of the blue ones since they experience a lower index.
Exercises for 7.6 Generalized Context for Group Delay

P7.9 When the spectrum is narrow compared to features in a resonance (such as in Fig. 7.10), the reshaping delay (7.53) tends to zero and can be ignored. Show that when the spectrum is narrow the net group delay (7.52) reduces to

$$\lim_{T\to\infty}\Delta t_G(\mathbf{r}) = \frac{\partial\,{\rm Re}\,\mathbf{k}}{\partial\omega}\cdot\Delta\mathbf{r}$$

P7.10 When the spectrum is very broad the reshaping delay (7.53) also tends to zero and can be ignored. Show that when the spectrum is extremely broad, the net group delay reduces to

$$\lim_{T\to 0}\Delta t_G(\mathbf{r}) = \frac{\Delta r}{c}$$

assuming $\mathbf{k}$ and $\Delta\mathbf{r}$ are parallel. This implies that a sharply defined signal cannot travel faster than $c$. HINT: The real index of refraction $n$ goes to unity far from resonance, and the imaginary part $\kappa$ goes to zero.

P7.11 Work through the derivation of (7.51). HINT: This somewhat lengthy derivation can be found in the reference in the footnote near (7.51).
Exercises for 7.A Pulse Chirping in a Grating Pair

P7.12 A Gaussian pulse with $T=20$ fs is incident with $\theta_i=20^\circ$ on a grating pair with groove separation $d=1.67\ \mu$m. What grating separation $L$ will lead to a pulse duration of $T=100$ ps? Assume two passes through the grating pair for a total effective separation of $2L$. Take the pulse carrier frequency to correspond to $\lambda_0=800$ nm.
Chapter 8
Coherence Theory
Coherence theory is the study of correlations that exist between different parts of a light field. In temporal coherence theory, we focus on the correlation between the fields at different times, $\mathbf{E}(\mathbf{r},t)$ and $\mathbf{E}(\mathbf{r},t+\tau)$. In spatial coherence theory, we focus on the correlations between fields at different spatial locations, $\mathbf{E}(\mathbf{r},t)$ and $\mathbf{E}(\mathbf{r}+\Delta\mathbf{r},t)$.

Because light oscillations are too fast to resolve directly, we usually need to study optical coherence using interference techniques. In these techniques, light from different times or places in the light field is brought together at a detection point. If the two fields have a high degree of coherence, they consistently interfere either constructively or destructively at the detection point. If the two fields are not coherent, the interference at the detection point rapidly fluctuates between constructive and destructive interference, so that a time-averaged signal does not show interference.

You are probably already familiar with two instruments that measure coherence: the Michelson interferometer, which measures temporal coherence, and Young's two-slit interferometer, which measures spatial coherence. Your preliminary understanding of these instruments was probably gained in terms of single-frequency plane waves, which are perfectly coherent for all separations in time and space. In this chapter, we build on that foundation and derive descriptions that are appropriate when light with imperfect coherence is sent through these instruments. We also discuss a practical application known as Fourier spectroscopy (Section 8.4), which allows us to measure the spectrum of light using a Michelson interferometer rather than a grating spectrometer.
defined by $\tau\equiv d/c$. If the input light is a plane wave, the net field at the detector consists of the field coming from one arm of the interferometer, $\mathbf{E}_0e^{i(kz-\omega t)}$, added to the field coming from the other arm, $\mathbf{E}_0e^{i(kz-\omega(t-\tau))}$. These two fields are identical except for the delay $\tau$. The intensity seen at the detector as a function of path difference is computed to be

$$\begin{aligned}
I_{\rm tot}(\tau) &= \frac{c\epsilon_0}{2}\left[\mathbf{E}_0e^{i(kz-\omega t)}+\mathbf{E}_0e^{i(kz-\omega(t-\tau))}\right]\cdot\left[\mathbf{E}_0e^{i(kz-\omega t)}+\mathbf{E}_0e^{i(kz-\omega(t-\tau))}\right]^* \\
&= \frac{c\epsilon_0}{2}\left[2\mathbf{E}_0\cdot\mathbf{E}_0^*+2\mathbf{E}_0\cdot\mathbf{E}_0^*\cos(\omega\tau)\right] \\
&= 2I_0\left[1+\cos(\omega\tau)\right] \qquad\text{(Plane Wave Input)}
\end{aligned} \tag{8.1}$$

where $I_0\equiv\frac{c\epsilon_0}{2}\mathbf{E}_0\cdot\mathbf{E}_0^*$ is the intensity from one beam alone (when the other arm of the interferometer is blocked). This formula is probably familiar. It describes how the intensity at the detector oscillates between zero and four times the intensity of the beam from one arm when the other is blocked,$^{1}$ as plotted in Fig. 8.2.

When light containing a continuous band of frequencies is sent through the interferometer, (8.1) no longer holds. Instead of repeating indefinitely, the oscillations in the intensity at the detector become less pronounced as $\tau$ increases. The concept of temporal coherence describes how fast fringe visibility diminishes as delay is introduced in an arm of the Michelson interferometer. The less coherent the light source, the faster the fringes die out as $\tau$ is increased.

To model this behavior, we need to expand our analysis beyond (8.1). Consider an arbitrary waveform $\mathbf{E}(t)$ (comprised of many frequency components) that has traveled through the first arm of a Michelson interferometer to arrive at the detector in Fig. 8.1. Again, $\mathbf{E}(t)$ is the value of the field at the detector when the second arm is blocked. The beam that travels through the second arm of the interferometer is identical, but delayed by the round-trip delay $\tau$: $\mathbf{E}(t-\tau)$. The total field at the detector is the sum of these two fields:
Figure 8.2 The intensity seen at the detector of a Michelson interferometer with a plane-wave input. Because the plane wave is infinitely coherent, the output oscillates forever in both directions. Energy is conserved, so when the intensity at the detector is zero, all of the input light is being sent back on the input arm of the interferometer.
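$$\mathbf{E}_{\rm tot}(t,\tau) = \mathbf{E}(t)+\mathbf{E}(t-\tau) \tag{8.2}$$

The total intensity $I_{\rm tot}$ at the detector is found using (7.21) with $n=1$:

$$\begin{aligned}
I_{\rm tot}(t,\tau) &= \frac{c\epsilon_0}{2}\mathbf{E}_{\rm tot}(t,\tau)\cdot\mathbf{E}_{\rm tot}^*(t,\tau) \\
&= \frac{c\epsilon_0}{2}\left[\mathbf{E}(t)\cdot\mathbf{E}^*(t)+\mathbf{E}(t)\cdot\mathbf{E}^*(t-\tau)+\mathbf{E}(t-\tau)\cdot\mathbf{E}^*(t)+\mathbf{E}(t-\tau)\cdot\mathbf{E}^*(t-\tau)\right] \\
&= I(t)+I(t-\tau)+\frac{c\epsilon_0}{2}\left[\mathbf{E}(t)\cdot\mathbf{E}^*(t-\tau)+\mathbf{E}(t-\tau)\cdot\mathbf{E}^*(t)\right] \\
&= I(t)+I(t-\tau)+c\epsilon_0\,{\rm Re}\left[\mathbf{E}(t)\cdot\mathbf{E}^*(t-\tau)\right]
\end{aligned} \tag{8.3}$$

The function $I(t)$ corresponds to the intensity of one of the beams arriving at the detector while the opposite path of the interferometer is blocked.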
1 Keep in mind that if a 50:50 beam splitter is used, then the intensity arriving at the detector from one arm alone (with the other arm blocked) is one fourth of the original beam, since the light meets the beam splitter twice.
For now we treat $\mathbf{E}(t)$ as a pulse with a finite duration and energy to simplify the math. Later we illustrate how to adapt this analysis for continuous light sources. In (8.3) we have retained the $t$ dependence of $I_{\rm tot}(t,\tau)$ in addition to the dependence on the path delay $\tau$. This allows for pulses with arbitrary duration and shape. The rapid oscillations of the light are automatically averaged away in $I(t)$ since we used (7.21), but the slowly varying envelope of the pulse is retained.

For a pulsed source, the physical signal from a Michelson interferometer is proportional to the total amount of pulse energy arriving at the detector as a function of $\tau$.$^{2}$ This physical signal, which we'll denote by ${\rm Sig}(\tau)$, is proportional to the total energy per area, or fluence, accumulated at the detector:

$${\rm Sig}(\tau)\propto\int_{-\infty}^{\infty}I_{\rm tot}(t,\tau)\,dt \tag{8.4}$$
The proportionality constant will depend on the area of the beam, as well as the units with which the detector reports ${\rm Sig}(\tau)$ (volts, etc.). We can manipulate the fluence integral in (8.4) into a more useful form that will make the coherence properties more evident.

$$\int_{-\infty}^{\infty}I_{\rm tot}(t,\tau)\,dt = \int_{-\infty}^{\infty}I(t)\,dt+\int_{-\infty}^{\infty}I(t-\tau)\,dt+c\epsilon_0\,{\rm Re}\int_{-\infty}^{\infty}\mathbf{E}(t)\cdot\mathbf{E}^*(t-\tau)\,dt \tag{8.5}$$

The first two integrals on the right-hand side of (8.5) are equal,$^{3}$ and give the fluence $\mathcal{E}$ from one arm of the interferometer when the other arm is blocked:

$$\mathcal{E}\equiv\int_{-\infty}^{\infty}I(t)\,dt = \int_{-\infty}^{\infty}I(\omega)\,d\omega \tag{8.6}$$

merchant. Michelson attended high school in San Francisco. He entered the US Naval Academy in 1869 (with intervention from US President Grant after Michelson pleaded his case on the grounds near the White House). After two years at sea, Michelson returned to the Naval Academy to teach physics and mathematics for several years. Michelson was fascinated by the problem of determining the speed of light, and developed successive experiments to measure it more accurately. He is probably most famous for his experiment conducted at Case School of Applied Science in Cleveland with Edward Morley to detect the motion of the earth through the ether. Michelson later was a professor at the University of Chicago and then at Caltech. In 1907, he became the first American to win the Nobel prize, for his contributions to optics. Michelson married late in life and was the father of four. (Wikipedia)
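The final integral in (8.5) remains unchanged if we take a Fourier transform followed by an inverse Fourier transform:

$$\int_{-\infty}^{\infty}\mathbf{E}(t)\cdot\mathbf{E}^*(t-\tau)\,dt = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}d\omega\,e^{-i\omega\tau}\left[\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}d\tau'\,e^{i\omega\tau'}\int_{-\infty}^{\infty}\mathbf{E}(t)\cdot\mathbf{E}^*(t-\tau')\,dt\right] \tag{8.7}$$

The reason for this procedure is so that we can take advantage of the autocorrelation theorem described in P0.27. With it, the expression in square brackets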
2 For sub-nanosecond laser pulses, a detector automatically integrates the entire energy of the pulse since a detector cannot keep up with temporal variations on such a rapid time scale. For longer pulses, it may be necessary to force the integration.

3 Note that the second integral is insensitive to $\tau$ since a change of variables $t'=t-\tau$ converts it into the first integral.
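simplifies to $\sqrt{2\pi}\,\mathbf{E}(\omega)\cdot\mathbf{E}^*(\omega)=2\sqrt{2\pi}\,I(\omega)/c\epsilon_0$. Then with the aid of (8.6) and (8.7), the overall fluence (8.5) becomes

$$\int_{-\infty}^{\infty}I_{\rm tot}(t,\tau)\,dt = 2\mathcal{E}\left[1+\frac{1}{\mathcal{E}}\,{\rm Re}\int_{-\infty}^{\infty}I(\omega)\,e^{-i\omega\tau}d\omega\right] \tag{8.8}$$

With (8.8), we can rewrite the physical signal (8.4) in the more useful form

$${\rm Sig}(\tau)\propto 2\mathcal{E}\left[1+{\rm Re}\,\gamma(\tau)\right] \tag{8.9}$$

where the dependence on the path delay $\tau$ is entirely contained in the degree of coherence function $\gamma(\tau)$:$^{4}$

$$\gamma(\tau)\equiv\frac{\int_{-\infty}^{\infty}I(\omega)\,e^{-i\omega\tau}d\omega}{\int_{-\infty}^{\infty}I(\omega)\,d\omega} \tag{8.10}$$

The denominator of (8.10) was rewritten with the help of Parseval's theorem, $\mathcal{E}\equiv\int I(t)\,dt=\int I(\omega)\,d\omega$. Remarkably, the signal out of the Michelson interferometer does not depend on the phase of $\mathbf{E}(\omega)$. It depends only on the amount of light associated with each frequency through $I(\omega)\equiv\frac{\epsilon_0c}{2}\mathbf{E}(\omega)\cdot\mathbf{E}^*(\omega)$.

Alternate derivation of (8.9)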
We could have derived (8.9) using another strategy, which may seem more intuitive than the approach above. Equation (8.1) gives the intensity at the detector when a single plane wave of frequency $\omega$ goes through the interferometer. Now suppose that a waveform composed of many frequencies is sent through the interferometer. The intensity associated with each frequency acts independently, obeying (8.1) individually. The total energy (per area) accumulated at the detector is then a linear superposition of the spectral intensities of all frequencies present:

$$\int_{-\infty}^{\infty}I_{\rm tot}(\omega,\tau)\,d\omega = \int_{-\infty}^{\infty}2I(\omega)\left[1+\cos(\omega\tau)\right]d\omega \tag{8.11}$$
While this procedure may seem obvious, the fact that we can do it is remarkable! Remember that it is usually the elds that we must add together before nding the intensity of the resulting superposition. The formula (8.11) with its superposition of intensities relies on the fact that the different frequencies inside the interferometer when time-averaged (over all time) do not interfere. Certainly, the elds at different frequencies do interfere (or beat in time). However, they constructively interfere as often as they destructively interfere, and in a time-averaged picture it is as though the individual frequency components transmit independently. Again,
4 M. Born and E. Wolf, Principles of Optics, 7th ed., p. 570 (Cambridge University Press, 1999).
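in writing (8.11) we considered the light to be pulsed rather than continuous so that the integrals converge. We can manipulate (8.11) as follows:
$$\int_{-\infty}^{\infty}I_{\rm tot}(\omega,\tau)\,d\omega=2\int_{-\infty}^{\infty}I(\omega)\,d\omega\left[1+\frac{\int_{-\infty}^{\infty}I(\omega)\cos(\omega\tau)\,d\omega}{\int_{-\infty}^{\infty}I(\omega)\,d\omega}\right] \tag{8.12}$$
This is the same as (8.8) since we can replace $\cos(\omega\tau)$ with $\mathrm{Re}\{e^{-i\omega\tau}\}$, and we can apply Parseval's theorem (8.6) to the other integrals. Thus, the above arguments lead to (8.9) and (8.10).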
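As a numerical illustration of (8.10), the following Python sketch evaluates the degree of coherence for a spectrum given as discrete samples. The function name `degree_of_coherence`, the two-line spectrum, and all numerical values are assumptions chosen for illustration; they do not come from the text. Because (8.10) is a ratio, rescaling the spectrum by any constant leaves the result unchanged, which the last lines check.

```python
import cmath
import math


def degree_of_coherence(omegas, spectrum, tau):
    """Discrete version of (8.10): intensity-weighted average of exp(-i*omega*tau)."""
    numerator = sum(I * cmath.exp(-1j * w * tau) for w, I in zip(omegas, spectrum))
    denominator = sum(spectrum)
    return numerator / denominator


# Hypothetical source: two equally strong, very narrow spectral lines
# (angular frequencies in arbitrary units).
omegas = [9.0, 11.0]
spectrum = [1.0, 1.0]

# At zero delay every frequency interferes constructively, so gamma = 1.
gamma_zero = degree_of_coherence(omegas, spectrum, 0.0)

# The two fringe patterns are exactly out of step when (w2 - w1) * tau = pi.
tau_null = math.pi / (omegas[1] - omegas[0])
gamma_null = degree_of_coherence(omegas, spectrum, tau_null)

# Rescaling the spectrum (different units or detector gain) does not change gamma.
gamma_a = degree_of_coherence(omegas, spectrum, 0.3)
gamma_b = degree_of_coherence(omegas, [7.0 * I for I in spectrum], 0.3)

print(abs(gamma_zero), abs(gamma_null), abs(gamma_a - gamma_b))
```

For a measured spectrum, the same function could be applied to samples on a uniform frequency grid, since the grid spacing cancels between numerator and denominator.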
Example 8.1
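Compute the output signal when a Gaussian pulse with spectrum (7.25) is sent into a Michelson interferometer.

Solution: The power spectrum of the pulse is⁵
$$I(\mathbf{r},\omega)=\frac{\epsilon_0 c}{2}\,\mathbf{E}_0\cdot\mathbf{E}_0^*\,T^2 e^{-T^2(\omega-\omega_0)^2}$$
where $T$ is the pulse duration, not to be confused with $\tau$, the delay of the interferometer arm. As shown in Example 7.3, we also have
$$\int_{-\infty}^{\infty}I(\mathbf{r},\omega)\,d\omega=\frac{\epsilon_0 c}{2}\,\mathbf{E}_0\cdot\mathbf{E}_0^*\,T\sqrt{\pi}$$
The degree of coherence (8.10) is then
$$\begin{aligned}
\gamma(\tau)&=\frac{T}{\sqrt{\pi}}\int_{-\infty}^{\infty}e^{-T^2(\omega-\omega_0)^2}e^{-i\omega\tau}\,d\omega
=\frac{T}{\sqrt{\pi}}\int_{-\infty}^{\infty}e^{-T^2\omega^2+(2T^2\omega_0-i\tau)\omega-T^2\omega_0^2}\,d\omega\\
&=\frac{T}{\sqrt{\pi}}\sqrt{\frac{\pi}{T^2}}\;e^{\frac{(2T^2\omega_0-i\tau)^2}{4T^2}-T^2\omega_0^2}
=e^{-\frac{\tau^2}{4T^2}}\,e^{-i\omega_0\tau}
\end{aligned}$$
Formula (0.55) was used to complete the integration. According to (8.9), the signal at the detector is then
$$\mathrm{Sig}(\tau)\propto 2\mathcal{E}\left[1+\mathrm{Re}\,\gamma(\tau)\right]=2\mathcal{E}\left[1+e^{-\frac{\tau^2}{4T^2}}\cos(\omega_0\tau)\right]$$

Figure 8.3 The output or signal from a Michelson interferometer for light with a Gaussian spectrum.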
Figure 8.3 shows this signal for a given T . As delay is added (or subtracted), the output signal oscillates. Eventually enough delay is introduced such that the very short pulses no longer interfere (arriving sequentially), and the output signal becomes steady.
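The closed-form result of Example 8.1 can be checked numerically. The sketch below is only an illustration: it assumes $T = 1$ and $\omega_0 = 20$ in arbitrary units, samples the Gaussian spectrum on a uniform frequency grid, and compares a Riemann-sum evaluation of (8.10) with $e^{-\tau^2/4T^2}e^{-i\omega_0\tau}$. The helper names are hypothetical.

```python
import cmath
import math

T = 1.0     # pulse duration (assumed value, arbitrary units)
W0 = 20.0   # carrier frequency omega_0 (assumed value)

# Uniform frequency grid spanning +/- 8/T around W0; the spectrum is negligible outside.
N = 4001
HALF_SPAN = 8.0 / T
OMEGAS = [W0 - HALF_SPAN + 2.0 * HALF_SPAN * i / (N - 1) for i in range(N)]
SPECTRUM = [math.exp(-(T * (w - W0)) ** 2) for w in OMEGAS]  # I(omega) up to a constant


def gamma_numeric(tau):
    """Riemann-sum estimate of (8.10); the grid spacing cancels in the ratio."""
    numerator = sum(I * cmath.exp(-1j * w * tau) for w, I in zip(OMEGAS, SPECTRUM))
    return numerator / sum(SPECTRUM)


def gamma_analytic(tau):
    """Closed-form degree of coherence from Example 8.1."""
    return math.exp(-tau ** 2 / (4.0 * T ** 2)) * cmath.exp(-1j * W0 * tau)


def normalized_signal(tau):
    """Sig(tau) divided by twice the fluence, following (8.9)."""
    return 1.0 + gamma_numeric(tau).real


for tau in (0.0, 0.5, 1.5, 3.0):
    print(tau, abs(gamma_numeric(tau) - gamma_analytic(tau)), normalized_signal(tau))
```

Evaluating `normalized_signal` on a fine grid of delays should trace out a curve of the kind described for Fig. 8.3: rapid fringes under a Gaussian envelope that settle to a steady value at large delay.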
5 Technically, the output intensity is one fourth this, but our calculation of the degree of coher-
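$$\tau_c\equiv\int_{-\infty}^{\infty}\left|\gamma(\tau)\right|^2d\tau=2\int_{0}^{\infty}\left|\gamma(\tau)\right|^2d\tau \tag{8.13}$$
The coherence length is the distance that light travels in this time:
$$\ell_c\equiv c\,\tau_c \tag{8.14}$$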
Another useful concept is fringe visibility. The fringe visibility is defined in the following way:
$$V(\tau)\equiv\frac{\max\mathrm{Sig}(\tau)-\min\mathrm{Sig}(\tau)}{\max\mathrm{Sig}(\tau)+\min\mathrm{Sig}(\tau)} \tag{8.15}$$
where $\max\mathrm{Sig}(\tau)$ refers to the detector signal when the mirror is positioned such that the amount of throughput to the detector is a local maximum, and $\min\mathrm{Sig}(\tau)$ refers to the detector signal when the mirror is positioned such that the amount of throughput to the detector is a local minimum. The minimum and the maximum don't occur at exactly the same $\tau$, but for optical frequencies the difference in $\tau$ is only about half an optical period. As the mirror moves a large distance from the equal-path-length position, the oscillations in $\mathrm{Sig}(\tau)$ become less pronounced as the max and min tend to the same value, and the fringe visibility goes to zero when $\gamma(\tau)=0$. It is left as an exercise (see P8.1) to show that the fringe visibility can be written as⁶
$$V(\tau)=\left|\gamma(\tau)\right| \tag{8.16}$$
Note that the fringe visibility depends only upon the frequency content of the light without regard to whether the frequency components are organized into a short pulse or a longer time pattern.
6 M. Born and E. Wolf, Principles of Optics, 7th ed., p. 570 (Cambridge University Press, 1999).
Example 8.2
Find the fringe visibility and the coherence time for the Gaussian pulse studied in Example 8.1.

Solution: By (8.16), the fringe visibility is
$$V(\tau)=\left|\gamma(\tau)\right|=e^{-\frac{\tau^2}{4T^2}}$$
This is shown as the dashed line in Fig. 8.4. As expected, the fringe visibility dies off as the delay $\tau$ gets farther from the origin, the point where the interferometer arms are equidistant. From (8.13) the coherence time is
$$\tau_c=\int_{-\infty}^{\infty}\left|\gamma(\tau)\right|^2d\tau=\int_{-\infty}^{\infty}e^{-\frac{\tau^2}{2T^2}}\,d\tau=\sqrt{2\pi}\,T$$
Figure 8.4 $\mathrm{Re}\,\gamma(\tau)$ (solid) and $|\gamma(\tau)|$ (dashed) for a light pulse with a Gaussian spectrum as in Examples 8.1 and 8.2.
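The coherence time of Example 8.2 can also be obtained by direct numerical integration of (8.13), which is how one would proceed for a spectrum without a closed-form answer. The sketch below assumes a pulse duration of 10 fs purely for illustration; the function names and the trapezoid-rule settings are likewise assumptions.

```python
import math

C_LIGHT = 2.998e8   # speed of light in m/s
T = 10e-15          # assumed pulse duration of 10 fs (illustrative value)


def visibility(tau):
    """Fringe visibility |gamma(tau)| for the Gaussian pulse of Example 8.2."""
    return math.exp(-tau ** 2 / (4.0 * T ** 2))


def coherence_time(n_steps=20000, tau_max=10.0 * T):
    """Trapezoid-rule estimate of (8.13), using twice the integral from 0 to tau_max."""
    d_tau = tau_max / n_steps
    total = 0.5 * (visibility(0.0) ** 2 + visibility(tau_max) ** 2)
    total += sum(visibility(i * d_tau) ** 2 for i in range(1, n_steps))
    return 2.0 * total * d_tau


tau_c = coherence_time()
length_c = C_LIGHT * tau_c   # coherence length from (8.14)

# Compare the numerical coherence time with the closed form sqrt(2*pi)*T.
print(tau_c, math.sqrt(2.0 * math.pi) * T, length_c)
```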
$$\left\langle I(t)\right\rangle_t\equiv\frac{1}{T}\int_{-T/2}^{T/2}I(t)\,dt \tag{8.17}$$
The duration $T$ must be large enough to average over any fluctuations that are present in the light source. The average in (8.17) should not be used on a pulsed light source since the result would depend on the duration $T$ of the temporal window. For a continuous light source, the signal at the detector (8.9) becomes
$$\mathrm{Sig}(\tau)\propto 2\left\langle I(t)\right\rangle_t\left[1+\mathrm{Re}\,\gamma(\tau)\right]\quad\text{(continuous source)} \tag{8.18}$$
Although technically the integrals used in (8.10) to compute $\gamma(\tau)$ also diverge in the case of continuous light, the numerator and the denominator diverge in the same way. Therefore, we may renormalize $I(\omega)$ in any way we like to deal with this problem. Both the numerator and denominator of (8.10) contain $I(\omega)$, so regardless of how large $I(\omega)$ is or what units the measurement gives (volts or whatever), we can just plug the instrument reading directly into (8.10). The units in the numerator and denominator cancel so that $\gamma(\tau)$ always remains dimensionless. Once we have the degree of coherence function $\gamma(\tau)$, we can calculate the coherence time and fringe visibility just as we did for pulsed sources.
$$\mathrm{Sig}(\tau)\propto 2\mathcal{E}+2\,\mathrm{Re}\int_{-\infty}^{\infty}I(\omega)\,e^{-i\omega\tau}\,d\omega \tag{8.19}$$
Typically, the signal comes in the form of a voltage or a current from a sensor. However, the signal can easily be normalized to the beam fluence. In particular, for large $\tau$ the fringe visibility goes to zero (i.e. $\gamma(\tau)=0$), and the normalized signal must approach
$$\lim_{\tau\to\infty}\mathrm{Sig}(\tau)=2\mathcal{E}=2\int_{-\infty}^{\infty}I(t)\,dt \tag{8.20}$$
We will assume that this normalization has taken place and write (8.19) as an equality. Given our measurement of $\mathrm{Sig}(\tau)$, we would like to find the power spectrum $I(\omega)$. Unfortunately, $I(\omega)$ is buried within an integral in (8.19). However, since the integral looks like an inverse Fourier transform of $I(\omega)$, we will be able to extract the desired spectrum after some manipulation. This procedure for extracting $I(\omega)$ from an interferometric measurement is known as Fourier spectroscopy.⁷

Extracting $I(\omega)$
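We first take the Fourier transform of (8.19):⁸
$$F\left\{\mathrm{Sig}(\tau)\right\}=F\left\{2\mathcal{E}\right\}+F\left\{2\,\mathrm{Re}\int_{-\infty}^{\infty}I(\omega)\,e^{-i\omega\tau}\,d\omega\right\} \tag{8.21}$$
The left-hand side is known since it is the measured data, and a computer can be employed to take the Fourier transform of it. The first term on the right-hand side is the Fourier transform of a constant:
$$F\left\{2\mathcal{E}\right\}=2\mathcal{E}\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{i\omega\tau}\,d\tau=2\mathcal{E}\sqrt{2\pi}\,\delta(\omega) \tag{8.22}$$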
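Notice that (8.22) is zero everywhere except where $\omega=0$, where a spike occurs. This represents the DC component of $F\{\mathrm{Sig}(\tau)\}$. The second term of (8.21) can be written as
$$\begin{aligned}
F\left\{2\,\mathrm{Re}\int_{-\infty}^{\infty}I(\omega)\,e^{-i\omega\tau}d\omega\right\}
&=F\left\{\int_{-\infty}^{\infty}I(\omega)\,e^{-i\omega\tau}d\omega+\int_{-\infty}^{\infty}I(\omega)\,e^{i\omega\tau}d\omega\right\}\\
&=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty}I(\omega')\,e^{-i\omega'\tau}d\omega'\right]e^{i\omega\tau}d\tau
+\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\left[\int_{-\infty}^{\infty}I(\omega')\,e^{i\omega'\tau}d\omega'\right]e^{i\omega\tau}d\tau
\end{aligned}$$
which we rearrange to
$$\sqrt{2\pi}\int_{-\infty}^{\infty}I(\omega')\left(\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{i(\omega-\omega')\tau}d\tau\right)d\omega'
+\sqrt{2\pi}\int_{-\infty}^{\infty}I(\omega')\left(\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{i(\omega+\omega')\tau}d\tau\right)d\omega'$$
From (0.52) we note that the terms in parentheses are delta functions, so we have
$$\sqrt{2\pi}\int_{-\infty}^{\infty}I(\omega')\,\delta(\omega'-\omega)\,d\omega'+\sqrt{2\pi}\int_{-\infty}^{\infty}I(\omega')\,\delta(\omega'+\omega)\,d\omega'$$
The remaining frequency integrals can then be easily performed to obtain our final form:
$$F\left\{2\,\mathrm{Re}\int_{-\infty}^{\infty}I(\omega)\,e^{-i\omega\tau}d\omega\right\}=\sqrt{2\pi}\left[I(\omega)+I(-\omega)\right] \tag{8.23}$$
7 J. Peatross and S. Bergeson, Fourier Spectroscopy of Ultrashort Laser Pulses, Am. J. Phys. 74, 842-845 (2006).
8 This is weird since normally we take Fourier transforms on fields rather than expressions involving intensity!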
With (8.22) and (8.23), the Fourier transform of (8.19) can be written as
$$\frac{F\left\{\mathrm{Sig}(\tau)\right\}}{\sqrt{2\pi}}=2\mathcal{E}\,\delta(\omega)+I(\omega)+I(-\omega) \tag{8.24}$$
The Fourier transform of the measured signal is seen to contain three terms, one of which is the power spectrum $I(\omega)$ that we are after. Fortunately, when graphed as a function of $\omega$ (shown in Fig. 8.5), the three terms on the right-hand side typically do not overlap. As a reminder, the measured signal as a function of $\tau$ looks something like that in Fig. 8.3. The oscillation frequency of the fringes lies in the neighborhood of $\omega_0$. The procedure to obtain $I(\omega)$ is (1) record $\mathrm{Sig}(\tau)$; (2) if desired, normalize by its value at large $\tau$; (3) take its Fourier transform; and (4) extract the curve at positive frequencies.
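The four-step procedure can be tried on synthetic data. The following sketch is a hedged illustration rather than a description of any particular instrument: it assumes the Gaussian-pulse signal of Example 8.1 with $T = 1$ and $\omega_0 = 20$ (arbitrary units), and it evaluates the Fourier integral by a direct sum instead of an FFT library. Subtracting the large-delay baseline before transforming is a design choice made here so that the finite scan window does not smear the spike at $\omega = 0$ into the positive-frequency peak.

```python
import cmath
import math

T, W0 = 1.0, 20.0        # assumed pulse duration and carrier frequency
D_TAU, N = 0.02, 1201    # assumed delay step and number of samples (tau from -12 to 12)
taus = [(i - N // 2) * D_TAU for i in range(N)]

# Step 1: "record" the signal, here synthesized from Example 8.1 in units of 2*fluence.
sig = [1.0 + math.exp(-t ** 2 / (4.0 * T ** 2)) * math.cos(W0 * t) for t in taus]

# Step 2: estimate the large-delay baseline and remove it (this drops only the DC spike).
baseline = sum(sig[:50] + sig[-50:]) / 100.0
fringes = [s - baseline for s in sig]


# Step 3: Fourier transform (without the 1/sqrt(2*pi) prefactor), evaluated by direct summation.
def transform(w):
    return D_TAU * sum(f * cmath.exp(1j * w * t) for f, t in zip(fringes, taus))


# Step 4: keep the curve at positive frequencies near the fringe frequency.
test_freqs = [W0 + 0.05 * k for k in range(-60, 61)]
recovered = [abs(transform(w)) for w in test_freqs]
peak_freq = test_freqs[recovered.index(max(recovered))]

# For this synthetic input the recovered curve should follow the Gaussian spectrum.
expected = [T * math.sqrt(math.pi) * math.exp(-(T * (w - W0)) ** 2) for w in test_freqs]
max_error = max(abs(r - e) for r, e in zip(recovered, expected))
print(peak_freq, max_error)
```

With real data, the delay step must be small enough to sample each fringe several times, and the scan must extend until the fringes have died away.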
Figure 8.6 A point source produces coherent (locked phases) light. When this light traverses two slits and arrives at a screen, it produces a fringe pattern.
Depending on the coherence of the light entering each slit, the fringe pattern observed can exhibit good or poor visibility. Just as the Michelson interferometer is sensitive to the spectral content of light, the Young's two-slit setup is sensitive to the spatial extent of the light source illuminating the two slits. For example, if light from a distant star (restricted by a filter to a narrow spectral range) is used to illuminate a double-slit setup, the resulting interference pattern appearing on a subsequent screen shows good or poor fringe visibility depending on the angular width of the star. Michelson was the first to use this type of setup to measure the angular width of stars.

Light emerging from a single ideal point source has wave fronts that are spatially uniform in a lateral sense (see Fig. 8.6). Such wave fronts are said to be spatially coherent, even if the temporal coherence is not perfect (i.e. if a range of frequencies is present). When spatially coherent light illuminates a Young's two-slit setup, fringes of maximum visibility are seen at a distant screen, meaning the fringes vary between a maximum intensity and zero.

Consider a Young's two-slit setup illuminated by a single point source. We represent the fields on a subsequent screen that transmit through each slit, respectively, as $E_0e^{i(kd_1-\omega t)}$ and $E_0e^{i(kd_2-\omega t)}$. We have assumed that the slits are equidistant from the point source and that the two fields at the screen are identical other than for their phases. In close analogy with (8.1), the resulting intensity pattern on a far-away screen is
$$I_{\rm tot}(h)=2I_0\left[1+\cos(kd_2-kd_1)\right]=2I_0\left[1+\cos\left(\frac{khy}{D}\right)\right] \tag{8.25}$$
Notice the close similarity between this expression and the output from a Michelson interferometer for a plane wave (8.1). We will consider $h$ (the separation of the slits) to be the counterpart of $\tau$ (the delay introduced by moving a mirror in the Michelson interferometer). To obtain the final expression in (8.25) we made the approximations
$$d_1(y)=\sqrt{(y-h/2)^2+D^2}=D\sqrt{1+\frac{(y-h/2)^2}{D^2}}\cong D\left[1+\frac{(y-h/2)^2}{2D^2}\right] \tag{8.26}$$
and
$$d_2(y)=\sqrt{(y+h/2)^2+D^2}=D\sqrt{1+\frac{(y+h/2)^2}{D^2}}\cong D\left[1+\frac{(y+h/2)^2}{2D^2}\right] \tag{8.27}$$

Figure 8.7 Light from an extended source is only partially coherent. Fringes are still possible, but they exhibit less contrast.
These approximations are valid as long as $D\gg y$ and $D\gg h$.

We next consider how to modify (8.25) so that it applies to the case when the two slits are illuminated by a collection of point sources distributed over a finite lateral extent. This situation is depicted in Fig. 8.7, and it leads to partial spatial coherence if the phase of each point emitter fluctuates randomly.⁹ When a Young's two-slit setup is illuminated by an extended random source, the wave fronts at the two slits are less correlated. This makes the fringes move around on the screen rapidly and partially wash out when time averaged, meaning worse fringe visibility.

To simplify our analysis, we restrict the distribution of point sources to vary only in the $y'$ dimension.¹⁰ We assume that the light is quasi-monochromatic so that its frequency is approximately $\omega$ with a phase that fluctuates randomly over time intervals much longer than the period of oscillation $2\pi/\omega$.¹¹ The light emerging from the $j$th point at $y'_j$ travels by means of two very narrow slits to a point $y$ on a screen. Let $E_1(y'_j)$ and $E_2(y'_j)$ be the fields on the screen at $y$, both originating from the point $y'_j$, but traveling respectively through the two different slits. We assume that these fields have the same polarization, and we will suppress the vectorial nature of the fields. For simplicity, we assume the two fields have the same (real) amplitude at the screen, $E_0(y'_j)$. Thus, we write the two fields as
$$E_1(y'_j)=E_0(y'_j)\,e^{i\left\{k\left[r_1(y'_j)+d_1(y)\right]-\omega t+\phi(y'_j)\right\}} \tag{8.28}$$
and
$$E_2(y'_j)=E_0(y'_j)\,e^{i\left\{k\left[r_2(y'_j)+d_2(y)\right]-\omega t+\phi(y'_j)\right\}} \tag{8.29}$$
We have explicitly included an arbitrary phase $\phi(y'_j)$, which we will take to be different for each point source. We now set about finding the cumulative field at $y$ arising from the many points indexed by the subscript $j$. The total field on the screen at point $y$ is
$$E_{\rm tot}(h)=\sum_j\left[E_1(y'_j)+E_2(y'_j)\right] \tag{8.30}$$
9 A laser beam does not fit the definition of a light source with randomly varying spatial phase. Instead, in this section we consider a source such as the surface of a star (filtered to a narrow frequency range). See appendix 8.B for more discussion.
10 The results can be generalized to a two-dimensional source.
11 Random phase fluctuations necessarily imply some frequency bandwidth, however small. Hence the need to specify quasi-monochromatic light.
Obviously, in addition to $h$, the total field depends on $y$, $R$, $D$, and $k$ as well as on the phase $\phi(y'_j)$ at each point. Nevertheless, in the end we will mainly emphasize the dependence on the slit separation $h$. The intensity associated with (8.30) is
$$\begin{aligned}
I_{\rm tot}(h)&=\frac{\epsilon_0 c}{2}\left|E_{\rm tot}(h)\right|^2\\
&=\frac{\epsilon_0 c}{2}\left[\sum_j E_1(y'_j)+E_2(y'_j)\right]\left[\sum_m E_1(y'_m)+E_2(y'_m)\right]^*\\
&=\frac{\epsilon_0 c}{2}\sum_{j,m}\left[E_1(y'_j)E_1^*(y'_m)+E_2(y'_j)E_2^*(y'_m)+E_1(y'_j)E_2^*(y'_m)+E_2(y'_j)E_1^*(y'_m)\right]\\
&=\frac{\epsilon_0 c}{2}\sum_{j,m}E_0(y'_j)E_0(y'_m)\left\{e^{ik\left[r_1(y'_j)-r_1(y'_m)\right]}+e^{ik\left[r_2(y'_j)-r_2(y'_m)\right]}+2\,\mathrm{Re}\,e^{ik\left[r_1(y'_j)-r_2(y'_m)\right]}e^{ik\left[d_1(y)-d_2(y)\right]}\right\}e^{i\left[\phi(y'_j)-\phi(y'_m)\right]}
\end{aligned} \tag{8.31}$$
At this juncture we make a critical assumption: that the phase of the emission $\phi(y'_j)$ varies in time independently at every point on the source. This is sometimes called the stochastic assumption, and it is appropriate for the emission from thermal sources such as starlight, a glowing filament (filtered to a narrow frequency range), or spontaneous emission from an excited gas or plasma. However, it is not appropriate for coherent sources like lasers.

A wonderful simplification happens to (8.31) when the phase difference $\phi(y'_j)-\phi(y'_m)$ varies randomly. If $j\neq m$, then $\exp\{i[\phi(y'_j)-\phi(y'_m)]\}$ time-averages to zero. On the other hand, if $j=m$, then the factor reduces to $e^0=1$. Formally, this is written
$$\left\langle e^{i\left[\phi(y'_j)-\phi(y'_m)\right]}\right\rangle_t=\delta_{j,m}\equiv\begin{cases}1 & \text{if } j=m,\\ 0 & \text{if } j\neq m.\end{cases} \tag{8.32}$$
where $\delta_{j,m}$ is known as the Kronecker delta function. The time-averaged intensity under the stochastic assumption (8.32) then reduces to
$$\left\langle I_{\rm tot}(h)\right\rangle_t=\sum_j\left\{I(y'_j)+I(y'_j)+2\,\mathrm{Re}\left[I(y'_j)\,e^{ik\left[r_1(y'_j)-r_2(y'_j)\right]}e^{ik\left[d_1(y)-d_2(y)\right]}\right]\right\} \tag{8.33}$$
We may use (8.26) to simplify $d_1(y)-d_2(y)\cong-hy/D$. Very similarly, we may also write $r_1(y'_j)-r_2(y'_j)\cong-hy'_j/R$. The only thing left to do is to put (8.33) into a slightly more familiar form:
$$\left\langle I_{\rm tot}(h)\right\rangle_t=2\sum_j I(y'_j)\left[1+\mathrm{Re}\,\gamma(h)\right] \tag{8.34}$$
We have introduced
$$\gamma(h)\equiv e^{-i\frac{khy}{D}}\,\frac{\sum_j I(y'_j)\,e^{-i\frac{khy'_j}{R}}}{\sum_j I(y'_j)} \tag{8.35}$$
which is known as the degree of coherence. It controls the fringe pattern seen at the screen. We can generalize (8.34) so that it applies to the case of a continuous distribution of light as opposed to a collection of discrete point sources. In Appendix 8.A we show how summations in (8.34) and (8.35) become integrals over the source intensity distribution, and we write
$$\left\langle I_{\rm net}(h)\right\rangle_t=2\left\langle I_{\rm oneslit}\right\rangle_t\left[1+\mathrm{Re}\,\gamma(h)\right] \tag{8.36}$$
where
$$\gamma(h)\equiv e^{-i\frac{khy}{D}}\,\frac{\int_{-\infty}^{\infty}I(y')\,e^{-i\frac{khy'}{R}}\,dy'}{\int_{-\infty}^{\infty}I(y')\,dy'} \tag{8.37}$$
Here $I(y')$ has units of intensity per length. The factor $\exp(-ikhy/D)$ defines the positions of the periodic fringes on the screen. The remainder of (8.37) controls the depth of the fringes as the slit separation $h$ is varied. When the slit separation $h$ increases, the amplitude of $\gamma(h)$ tends to diminish until the intensity at the screen becomes uniform. When the two slits have very small separation (such that $e^{-ikhy'/R}\cong1$), then we have $|\gamma(h)|=1$ and very good fringe visibility results. $\gamma(h)$ dictates the degree of spatial coherence in much the same way that $\gamma(\tau)$ dictates the degree of temporal coherence. Notice the close similarity between (8.37) and (8.10). As the slit separation $h$ increases, the fringe visibility
$$V(h)=\left|\gamma(h)\right| \tag{8.38}$$
diminishes, eventually approaching zero (see (8.16)). In analogy to the temporal case (see (8.13)), we can define a slit separation sufficiently large to make the fringes at the screen wash out:
$$h_c\equiv2\int_{0}^{\infty}\left|\gamma(h)\right|^2dh \tag{8.39}$$
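To make the discrete formula (8.35) concrete, the sketch below evaluates $\gamma(h)$ for a hypothetical source consisting of two equally bright, mutually incoherent points (a crude model of a binary star). The function name, the source geometry, and every numerical value are assumptions for illustration. The sign convention in the exponents follows $r_1 - r_2 \cong -hy'/R$ and $d_1 - d_2 \cong -hy/D$; the fringe visibility $|\gamma(h)|$ does not depend on that choice.

```python
import cmath
import math


def spatial_coherence(h, y_screen, sources, k, R, D):
    """Degree of coherence (8.35) for point sources given as (position, intensity) pairs."""
    fringe_phase = cmath.exp(-1j * k * h * y_screen / D)
    numerator = sum(I * cmath.exp(-1j * k * h * y / R) for y, I in sources)
    denominator = sum(I for _, I in sources)
    return fringe_phase * numerator / denominator


WAVELENGTH = 500e-9               # assumed wavelength in meters
K = 2.0 * math.pi / WAVELENGTH    # wave number
R = 1.0                           # assumed source-to-slits distance in meters
D = 2.0                           # assumed slits-to-screen distance in meters
S = 1.0e-3                        # assumed separation of the two point sources in meters
SOURCES = [(-S / 2.0, 1.0), (S / 2.0, 1.0)]


def visibility(h):
    """Fringe visibility V(h) = |gamma(h)| from (8.38)."""
    return abs(spatial_coherence(h, 0.0, SOURCES, K, R, D))


# For two equal points, |gamma| = |cos(k*h*S/(2R))|, which first vanishes here:
h_first_null = WAVELENGTH * R / (2.0 * S)

print(visibility(0.0), visibility(h_first_null), visibility(2.0 * h_first_null))
```

In this model the fringes disappear at the first null and then return at larger slit separation, which is why a measurement of visibility versus $h$ carries information about the angular size and structure of the source.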
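$$\sum_j E_1(y'_j)\rightarrow\int E_1(y')\,dy'\quad\text{and}\quad\sum_m E_1(y'_m)\rightarrow\int E_1(y'')\,dy''$$
$$\sum_j E_2(y'_j)\rightarrow\int E_2(y')\,dy'\quad\text{and}\quad\sum_m E_2(y'_m)\rightarrow\int E_2(y'')\,dy'' \tag{8.40}$$
Rather than deal with a time average of randomly varying phases, we will instead work with a linear superposition of all conceivable phase factors. That is, we will write the phase $\phi(y')$ as $Ky'$, where $K$ is a parameter with units of inverse length, which we allow to take on all possible real values with uniform likelihood. The way we modify (8.32) for the continuous case is then
$$\left\langle e^{i\left[\phi(y'_j)-\phi(y'_m)\right]}\right\rangle_t=\delta_{j,m}\;\rightarrow\;\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{iK(y'-y'')}\,dK=\delta(y''-y') \tag{8.41}$$
With these replacements, (8.31) becomes
$$I_{\rm tot}(h)=\frac{\epsilon_0 c}{2}\int dy'\,E(y')\int dy''\,E(y'')\left\{e^{ik\left[r_1(y')-r_1(y'')\right]}+e^{ik\left[r_2(y')-r_2(y'')\right]}+2\,\mathrm{Re}\,e^{ik\left[r_1(y')-r_2(y'')\right]}e^{ik\left[d_1(y)-d_2(y)\right]}\right\}\delta(y''-y') \tag{8.42}$$
Again, consistent with (8.26), we may write $d_1(y)-d_2(y)\cong-hy/D$ and $r_1(y')-r_2(y')\cong-hy'/R$, and (8.42) reduces to
$$I_{\rm tot}(h)=2\int I(y')\,dy'+2\,\mathrm{Re}\left[e^{-i\frac{khy}{D}}\int I(y')\,e^{-i\frac{khy'}{R}}\,dy'\right] \tag{8.43}$$
where
$$I(y')\equiv\frac{1}{2}\epsilon_0 c\left|E(y')\right|^2 \tag{8.44}$$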
For $I_{\rm tot}$ to have normal units of intensity, $I(y')$ must have units of intensity per length of source, implying that $E(y')$ has units of field per square root of length. Hence, $\int I(y')\,dy'$ is the intensity at the screen caused by the entire extended source when only one slit is open. We see that (8.43) is equivalent to (8.36) and (8.37).
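The stochastic assumption (8.32), which underlies the result (8.43), can be illustrated with a small Monte Carlo experiment. The sketch below is an assumption-laden toy model: each emitter's phase is redrawn uniformly on $[0, 2\pi)$ at every time sample, and the function name, number of emitters, and sample count are all hypothetical choices.

```python
import cmath
import math
import random


def averaged_phase_factor(j, m, n_sources=3, n_samples=20000, seed=0):
    """Time average of exp(i*(phi_j - phi_m)) for independently fluctuating phases."""
    rng = random.Random(seed)
    total = 0j
    for _ in range(n_samples):
        # Every emitter receives a fresh, independent random phase at each time sample.
        phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_sources)]
        total += cmath.exp(1j * (phases[j] - phases[m]))
    return total / n_samples


same_emitter = averaged_phase_factor(1, 1)        # j = m: the factor is exactly 1
different_emitters = averaged_phase_factor(0, 2)  # j != m: the average tends toward 0

print(abs(same_emitter), abs(different_emitters))
```

The residual for different emitters shrinks roughly as the inverse square root of the number of samples, which mirrors the requirement that the detector average over many phase fluctuations.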
$$\begin{aligned}
I_{\rm tot}(h)&=\frac{\epsilon_0 c}{2}\left|\int\left[E(y')\,e^{i\phi(y')+i\frac{ky'^2}{2R}}\right]e^{-i\frac{khy'}{2R}}\,dy'\right|^2
+\frac{\epsilon_0 c}{2}\left|\int\left[E(y')\,e^{i\phi(y')+i\frac{ky'^2}{2R}}\right]e^{i\frac{khy'}{2R}}\,dy'\right|^2\\
&\quad+2\,\mathrm{Re}\left\{e^{-i\frac{khy}{D}}\,\frac{\epsilon_0 c}{2}\left[\int\left[E(y')\,e^{i\phi(y')+i\frac{ky'^2}{2R}}\right]e^{-i\frac{khy'}{2R}}\,dy'\right]\left[\int\left[E(y')\,e^{i\phi(y')+i\frac{ky'^2}{2R}}\right]e^{i\frac{khy'}{2R}}\,dy'\right]^*\right\}
\end{aligned} \tag{8.45}$$
where we have employed (8.26) and (8.27) and similar expressions involving $R$ and $y'$. The first term on the right-hand side of (8.45) is the intensity on the screen when the lower slit is covered. The second term is the intensity on the screen when the upper slit is covered. The last term is the interference term, which modifies the sum of the individual intensities when both slits are uncovered.

Notice the occurrence of Fourier transforms (over position) on the quantities inside of the square brackets. Later, when we study diffraction theory, we will recognize these transforms as determining the strength of fields impinging on the individual slits. This corresponds to a major difference between a spatially coherent source and a random-phase source. With the random-phase source, the slits are always illuminated with the same strength regardless of the separation. However, with a coherent source, beaming can occur such that the strength as well as phase of the field at each slit depends on the slit separation.

A beautiful simplification occurs when the phase of the emitted light has the following distribution:
$$\phi(y')=-\frac{ky'^2}{2R} \tag{8.46}$$
Equation (8.46) is not as arbitrary as it may first appear. This particular phase is an approximation to a concave spherical wave front converging to the center between the two slits. This type of wave front is created when a plane wave passes
through a lens. With the special phase (8.46), the intensity (8.45) reduces to
$$\begin{aligned}
I_{\rm tot}(h)&=\frac{\epsilon_0 c}{2}\left|\int E(y')\,e^{-i\frac{khy'}{2R}}\,dy'\right|^2+\frac{\epsilon_0 c}{2}\left|\int E(y')\,e^{i\frac{khy'}{2R}}\,dy'\right|^2\\
&\quad+2\,\mathrm{Re}\left\{e^{-i\frac{khy}{D}}\,\frac{\epsilon_0 c}{2}\left[\int E(y')\,e^{-i\frac{khy'}{2R}}\,dy'\right]\left[\int E(y')\,e^{i\frac{khy'}{2R}}\,dy'\right]^*\right\}
\end{aligned} \tag{8.47}$$
Note the resemblance between the expression
$$\int E(y')\,e^{-i\frac{khy'}{2R}}\,dy' \tag{8.48}$$
and the magnitude of the degree of coherence $V=|\gamma(h/2)|$ from (8.37). Again, this corresponds to the field that goes through the upper slit, when it is positioned at $h/2$, and which impinges on the screen. Let this field be denoted by $|E_1(h/2)|$. The field strength when the single slit is positioned at $h$ compared to that when it is positioned at zero is
$$\frac{\left|E_1(h)\right|}{\left|E_1(0)\right|}=\frac{\left|\int\left|E(y')\right|e^{-i\frac{khy'}{R}}\,dy'\right|}{\int\left|E(y')\right|dy'} \tag{8.49}$$
This looks very much like $|\gamma(h)|$ of (8.37) except that the magnitude of the field appears in (8.49), whereas the intensity appears in (8.37). This may seem rather contrived, but at least it is cute, and it is known as the van Cittert-Zernike theorem.¹² It says that the spatial coherence of an extended source with randomly varying phase drops off with lateral slit separation in the same way that the field pattern at the focus of a converging spherical wave would drop off, whose field amplitude distribution is the same as the original intensity distribution.
12 M. Born and E. Wolf, Principles of Optics, 7th ed., p. 574 (Cambridge University Press, 1999).
Exercises
Exercises for 8.2 Coherence Time and Fringe Visibility

P8.1 (a) Verify that (8.16) gives the fringe visibility. HINT: Write $\gamma=|\gamma|e^{i\alpha}$ and assume that the oscillations in $\gamma$ that give rise to fringes are due entirely to changes in $\alpha$ and that $|\gamma|$ is a slowly varying function in comparison to the oscillations.
(b) What is the coherence time $\tau_c$ of the light in P8.4?

P8.2 (a) Show that the fringe visibility of a Gaussian spectral distribution (see Example 8.2) goes from 1 to $e^{-\pi/2}=0.21$ as the round-trip path in one arm of the instrument is extended by a coherence length.
(b) Find the FWHM bandwidth in wavelength $\Delta\lambda_{\rm FWHM}$ in terms of the coherence length $\ell_c$ and the center wavelength $\lambda_0$. HINT: First determine $\Delta\omega_{\rm FWHM}$, defined to be the width of $I(\omega)$ at half of its peak. To convert to a wavelength difference, use $\lambda=\frac{2\pi c}{\omega}\Rightarrow\Delta\lambda_{\rm FWHM}\cong-\frac{2\pi c}{\omega_0^2}\Delta\omega_{\rm FWHM}$. You can ignore the minus sign; it simply means that wavelength decreases as frequency increases.

Exercises for 8.3 Temporal Coherence of Continuous Sources

P8.3 Show that $\mathrm{Re}\{\gamma(\tau)\}$ defined in (8.10) reduces to $\cos(\omega_0\tau)$ in the case of a plane wave $E(t)=E_0e^{i(k_0z-\omega_0t)}$ being sent through a Michelson interferometer. In other words, the output intensity from the interferometer reduces to $I=2I_0\left[1+\cos(\omega_0\tau)\right]$, as you already expect. HINT: Don't be afraid of delta functions. After integration, the left-over delta functions cancel.

P8.4 Light emerging from a dense hot gas has a collisionally broadened power spectrum described by the Lorentzian function
$$I(\omega)=\frac{I(\omega_0)}{1+\left(\frac{\omega-\omega_0}{\Delta\omega_{\rm FWHM}/2}\right)^2}$$
The light is sent into a Michelson interferometer. Make a graph of the average power arriving to the detector as a function of $\tau$. HINT: See (0.56).
P8.5 Consider the light source described in P8.4.
(a) Regardless of how the phase of $E(\omega)$ is organized, the oscillation of the energy arriving to the detector as a function of $\tau$ is the same. The spectral phase of the light in P8.4 is randomly organized. Describe qualitatively how the light probably looks as a function of time.
(b) Now suppose that the phase of the light is somehow neatly organized such that
$$E(\omega)=\frac{iE(\omega_0)\,e^{i\frac{\omega}{c}z}}{i+\frac{\omega-\omega_0}{\Delta\omega_{\rm FWHM}/2}}$$
Perform the inverse Fourier transform on the field and find how the intensity of the light looks as a function of time. HINT:
$$\int_{-\infty}^{\infty}\frac{e^{-iax}}{x+\beta}\,dx=\begin{cases}-2i\pi e^{ia\beta} & \text{if } a>0\\ 0 & \text{if } a<0\end{cases}\qquad(\mathrm{Im}\,\beta>0)$$
Exercises for 8.4 Fourier Spectroscopy

L8.6 (a) Use a scanning Michelson interferometer to measure the wavelength of the ultrashort laser pulses from a mode-locked Ti:sapphire oscillator.¹³
(b) Measure the coherence length of the source by observing the distance over which the visibility diminishes. From your measurement, what is the bandwidth $\Delta\lambda_{\rm FWHM}$ of the source, assuming the Gaussian profile in the previous problem? See P8.2.
(c) Use a computer to perform a fast Fourier transform (FFT) of the signal output. For the positive frequencies, plot the laser spectrum as a function of $\lambda$ and compare with the results of (a) and (b).
(d) How do the results change if the ultrashort pulses are first stretched in time by traversing a thick piece of glass?

Figure 8.8 (interferometer setup showing the beam splitter and detector)
Exercises for 8.5 Young's Two-Slit Setup and Spatial Coherence

P8.7 (a) A point source with wavelength $\lambda=500$ nm illuminates two parallel slits separated by $h=1.0$ mm. If the screen is $D=2$ m away, what is the separation between the diffraction peaks on the screen? Make a sketch.
(b) A thin piece of glass with thickness $d=0.01$ mm and index $n=1.5$ is placed in front of one of the slits. By how many fringes does the pattern at the screen move? HINT: Add $\phi_2-\phi_1$ to $k(d_2-d_1)$ in (8.25), where $\phi_2-\phi_1$ is the relative phase between the two paths. Compare the phase of the light when traversing the glass versus traversing an empty region of the same thickness.

L8.8 (a) Carefully measure the separation of a double slit in the lab ($h\approx0.1$ mm separation) by shining a HeNe laser ($\lambda=633$ nm) through it and measuring the diffraction peak separations on a distant wall (say, 2 m from the slits). HINT: For better accuracy, measure across several fringes and divide.

13 J. Peatross and S. Bergeson, Fourier Spectroscopy of Ultrashort Laser Pulses, Am. J. Phys. 74, 842-845 (2006).
Figure 8.9 (setup: laser, rotating diffuser to create phase variation, single slit of width $a$, double slit of separation $h$, filter, and CCD camera)
(b) Create an extended light source with a HeNe laser using a time-varying diffuser followed by an adjustable single slit. (The diffuser must rotate rapidly to create random time variation of the phase at each point, as would occur automatically for a natural source such as a star.) Place the double slit at a distance of $R\approx100$ cm after the first slit. (Take note of the exact value of $R$, as you will need it for the next problem.) Use a lens to image the diffraction pattern that would have appeared on a far-away screen into a video camera. Observe the visibility of the fringes. Adjust the width of the source with the single slit until the visibility of the fringes disappears. After making the source wide enough to cause the fringe pattern to degrade, measure the single slit width $a$ by shining a HeNe laser through it and observing the diffraction pattern on the distant wall. (video)
HINT: As we will study later, a single slit of width $a$ produces an intensity pattern on a screen a distance $L$ away described by
$$I(x)=I_{\rm peak}\,\mathrm{sinc}^2\left(\frac{\pi a}{\lambda L}x\right)\quad\text{where}\quad\mathrm{sinc}(\alpha)\equiv\frac{\sin\alpha}{\alpha}\quad\text{and}\quad\lim_{\alpha\to0}\frac{\sin\alpha}{\alpha}=1.$$
NOTE: It would have been nicer to vary the separation of the two slits to determine the width of a fixed source. However, because it is hard to make an adjustable double slit, we varied the size of the source until the spatial coherence of the light matched the slit separation.

P8.9 (a) Compute $h_c$ for a uniform intensity distribution of width $a$ using (8.39).
(b) Use this formula to check that your measurements in L8.8 agree with spatial coherence theory. HINT: In your experiment $h_c$ is the double slit separation. Use your measured $R$ and $h$ to calculate what the width of the single slit (i.e. $a$) should have been when the fringes disappeared and compare this calculation to your direct measurement of $a$.
$$\begin{aligned}
\gamma(h)&=\frac{\int_{-a/2}^{a/2}I_0\exp\left[-ikh\left(\frac{y'}{R}+\frac{y}{D}\right)\right]dy'}{\int_{-a/2}^{a/2}I_0\,dy'}
=\frac{e^{-ikh\frac{y}{D}}}{a}\int_{-a/2}^{a/2}e^{-ikh\frac{y'}{R}}\,dy'
=\frac{e^{-ikh\frac{y}{D}}}{a}\left[\frac{e^{-ikh\frac{y'}{R}}}{-ikh/R}\right]_{-a/2}^{a/2}\\
&=e^{-ikh\frac{y}{D}}\;\frac{e^{-ikh\frac{a/2}{R}}-e^{ikh\frac{a/2}{R}}}{-2ikh\frac{a/2}{R}}
=e^{-ikh\frac{y}{D}}\,\mathrm{sinc}\left(\frac{kha}{2R}\right)
\end{aligned}$$
Note that
$$\int_{0}^{\infty}\frac{\sin^2\alpha x}{(\alpha x)^2}\,dx=\frac{\pi}{2\alpha}$$
Review, Chapters 6-8

True and False Questions

R29 T or F: In our notation (widely used), $I(t)$ is the Fourier transform of $I(\omega)$.
R30 T or F: The integral of $I(t)$ over all $t$ equals the integral of $I(\omega)$ over all $\omega$.
R31 T or F: The phase velocity of light (the speed of an individual frequency component of the field) never exceeds the speed of light $c$.
R32 T or F: The group velocity of light in a homogeneous material can exceed $c$ if absorption or amplification takes place.
R33 T or F: The group velocity of light never exceeds the phase velocity.
R34 T or F: A Michelson interferometer can be used to measure the spectral intensity of light $I(\omega)$.
R35 T or F: A Michelson interferometer can be used to measure the duration of a short laser pulse and thereby characterize its chirp.
R36 T or F: A Michelson interferometer can be used to measure the wavelength of light.
R37 T or F: A Michelson interferometer can be used to measure the phase of $E(\omega)$.
R38 T or F: The Fourier transform (or inverse Fourier transform if you prefer) of $I(\omega)$ is proportional to the degree of temporal coherence.
R39 T or F: A Michelson interferometer is ideal for measuring the spatial coherence of light.
R40 T or F: The Young's two-slit setup is ideal for measuring the temporal coherence of light.
R41 T or F: Vertically polarized light illuminates a Young's double-slit setup and fringes are seen on a distant screen with good visibility. A half wave plate is placed in front of one of the slits so that the polarization for that slit becomes horizontally polarized. Here's the statement: The fringes at the screen will shift position but maintain their good visibility.
Problems

R42 (a) Horizontally polarized light enters a system and first travels through a horizontal and then a vertical polarizer in series. What is the Jones vector of the transmitted field?
(b) Now a polarizer at 45° is inserted between the two polarizers in the system described in (a). What is the Jones vector of the transmitted field? How does the final intensity compare to initial intensity?
(c) Now a quarter wave plate with a fast-axis angle at 45° is inserted between the two polarizers (instead of the polarizer of part (b)). What is the Jones vector of the transmitted field? How does the final intensity compare to initial intensity?

Figure 8.10 (horizontal polarizer followed by vertical polarizer)

R43 (a) Find the Jones matrix for a half wave plate with its fast axis making an arbitrary angle $\theta$ with the $x$-axis. HINT: Project an arbitrary polarization with $E_x$ and $E_y$ onto the fast and slow axes of the wave plate. Shift the slow axis phase by $\pi$, and then project the field components back onto the horizontal and vertical axes. The answer is
$$\begin{pmatrix}\cos^2\theta-\sin^2\theta & 2\sin\theta\cos\theta\\ 2\sin\theta\cos\theta & \sin^2\theta-\cos^2\theta\end{pmatrix}$$
(b) We desire to create a variable attenuator for a polarized laser beam using a half wave plate and a polarizer aligned to the initial polarization of the beam (see figure). The fast axis of the half wave plate is initially aligned in the direction of polarization and then rotated through an angle $\theta$. What is the ratio of the intensity exiting the polarizer to the incoming intensity as a function of $\theta$?

R44 (a) What is the spectral content (i.e., $I(\omega)$) of a square laser pulse
$$E(t)=\begin{cases}E_0e^{-i\omega_0t}, & |t|\le\tau/2\\ 0, & |t|>\tau/2\end{cases}$$
Make a sketch of $I(\omega)$, indicating the location of the first zeros.
(b) What is the temporal shape (i.e., $I(t)$) of a light pulse with frequency content
$$E(\omega)=\begin{cases}E_0, & |\omega-\omega_0|\le\Delta\omega/2\\ 0, & |\omega-\omega_0|>\Delta\omega/2\end{cases}$$
where in this case $E_0$ has units of E-field per frequency. Make a sketch of $I(t)$, indicating the location of the first zeros.
(c) If $E(\omega)$ is known (any arbitrary function, not the same as above), and the light goes through a material of thickness $\ell$ and index of refraction $n(\omega)$, how would you find the form of the pulse $E(t)$ after passing through the material? Please set up the integral.

R45 (a) Prove Parseval's theorem:
$$\int_{-\infty}^{\infty}\left|E(\omega)\right|^2d\omega=\int_{-\infty}^{\infty}\left|E(t)\right|^2dt.$$
HINT:
$$\delta(t'-t)=\frac{1}{2\pi}\int_{-\infty}^{\infty}e^{i\omega(t'-t)}\,d\omega$$
(b) Explain the physical relevance of Parseval's theorem to light pulses. Suppose that you have a detector that measures the total energy in a pulse of light, say 1 mJ directed onto an area of 1 mm². Next you measure the spectrum of light and find it to have a width of $\Delta\lambda=50$ nm, centered at $\lambda_0=800$ nm. Assume that the light has a Gaussian frequency profile
$$I(\omega)=I(\omega_0)\,e^{-\left(\frac{\omega-\omega_0}{\Delta\omega}\right)^2}$$
Use as an approximate value $\Delta\omega\cong\frac{2\pi c}{\lambda^2}\Delta\lambda$. Find a value and correct units for $I(\omega_0)$. HINT:
$$\int_{-\infty}^{\infty}e^{-Ax^2+Bx+C}\,dx=\sqrt{\frac{\pi}{A}}\,e^{B^2/4A+C}\qquad\mathrm{Re}\{A\}>0$$
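Continuous light entering a Michelson interferometer has a spectrum described by
$$I(\omega)=\begin{cases}I_0, & |\omega-\omega_0|\le\Delta\omega/2\\ 0, & |\omega-\omega_0|>\Delta\omega/2\end{cases}$$
The Michelson interferometer uses a 50:50 beam splitter. The emerging light has intensity $\left\langle I_{\rm det}(t,\tau)\right\rangle_t=2\left\langle I(t)\right\rangle_t\left[1+\mathrm{Re}\,\gamma(\tau)\right]$, where the degree of coherence is
$$\gamma(\tau)=\frac{\int_{-\infty}^{\infty}I(\omega)\,e^{-i\omega\tau}\,d\omega}{\int_{-\infty}^{\infty}I(\omega)\,d\omega}$$
Find the fringe visibility $V\equiv(I_{\max}-I_{\min})/(I_{\max}+I_{\min})$ as a function of $\tau$ (i.e. the round-trip delay due to moving one of the mirrors).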
R47 Light emerging from a point travels by means of two very narrow slits to a point $y$ on a screen. The intensity at the screen arising from a point source at position $y'$ is found to be
$$I_{\rm screen}(y',h)=2I(y')\left[1+\cos\left(kh\left(\frac{y}{D}+\frac{y'}{R}\right)\right)\right]$$
where an approximation has restricted us to small angles.
(a) Now, suppose that $I(y')$ characterizes emission from a wider source with randomly varying phase across its width. Write down an expression (in integral form) for the resulting intensity at the screen:
$$I_{\rm screen}(h)\equiv\int_{-\infty}^{\infty}I_{\rm screen}(y',h)\,dy'$$
(b) Assume that the source has an emission distribution with the form $I(y')=(I_0/\Delta y')\,e^{-y'^2/\Delta y'^2}$. What is the function $\gamma(h)$ where the intensity is written $I_{\rm screen}(h)=2\sqrt{\pi}\,I_0\left[1+\mathrm{Re}\,\gamma(h)\right]$? HINT:
$$\int_{-\infty}^{\infty}e^{-Ax^2+Bx+C}\,dx=\sqrt{\frac{\pi}{A}}\,e^{B^2/4A+C}\qquad\mathrm{Re}\{A\}>0.$$
(c) As $h$ varies, the intensity at a point on the screen $y$ oscillates. As $h$ grows wider, the amplitude of oscillations decreases. How wide must the slit separation $h$ become (in terms of $R$, $k$, and $\Delta y'$) to reduce the visibility to
$$V\equiv\frac{I_{\max}-I_{\min}}{I_{\max}+I_{\min}}=\frac{1}{3}$$
Selected Answers
R42: (b) 1/4, (c) 1/2. R45: (b) $3.8 \times 10^{-16}$ J/(cm$^2\cdot$s$^{-1}$).
Chapter 9
Light as Rays
So far in our study of optics, we have described light in terms of waves, which satisfy Maxwell's equations. However, as you are probably aware, in many situations light can be thought of as rays pointing along the direction of wave propagation. A ray picture is useful when one is interested in the macroscopic flow of light energy, but rays fail to reveal fine details, in particular wave and diffraction phenomena. For example, simple ray theory suggests that a lens can focus light down to a point. However, if a beam of light were concentrated onto a true point, the intensity would be infinite! Nevertheless, ray theory is useful for predicting where a focus occurs. It is also useful for describing imaging properties of optical systems (e.g. lenses and mirrors).

Beginning in section 9.3 we study the details of ray theory and the imaging properties of optical systems. First, however, we examine the justification for ray theory starting from Maxwell's equations. In the short-wavelength limit, Maxwell's equations give rise to the eikonal equation, which governs the direction of rays in a medium with an index of refraction that varies with position. The German word eikonal comes from the Greek εἰκών, from which the modern word icon derives. The eikonal equation therefore has a descriptive title since it controls the formation of images. Although we will not use the eikonal equation extensively, we will show how it embodies the underlying justification for ray theory. As will be apparent in its derivation, the eikonal equation relies on an approximation that the features of interest in the light distribution are large relative to the wavelength of the light.

The eikonal equation describes the direction of ray propagation, even in complicated situations such as desert mirages where air is heated near the ground and has a different index than the air farther from the ground.
Rays of light from the sky that initially are directed toward the ground can be bent such that they travel parallel to or even up from the ground, owing to the inhomogeneous refractive index. The eikonal equation can also be used to deduce Fermat's principle, which in short says that light travels from point A to point B following a path that takes the minimum time. This principle can be used, for example, to derive Snell's law. Of course Fermat asserted his principle more than a century before Maxwell's equations were known, but it is nice to give justification retroactively to Fermat's principle using the modern perspective.

In this chapter, we will analyze the propagation of rays through optical systems composed of lenses and/or curved mirrors in the context of paraxial ray theory. The paraxial approximation restricts rays to travel nearly parallel to the axis of such systems. We consider the effects of three basic optical elements acting on paraxial rays:
1) Unobstructed propagation through a distance $d$ in a uniform medium; a ray may move farther away from (or closer to) the optical axis as it travels.
2) Reflection from a curved spherical mirror, which changes a ray's angle with respect to the optical axis.
3) Transmission through a spherical interface between two materials with differing refractive indices.
The effect of each of these basic elements on a ray of light can be represented as a $2\times 2$ matrix, and these matrices can be multiplied together to construct more complex imaging systems (such as a lens or a series of lenses and curved mirrors). We will study image formation in the context of the paraxial approximation, which in the case of a curved mirror or a thin lens gives rise to the familiar formula
$$\frac{1}{f} = \frac{1}{d_o} + \frac{1}{d_i} \tag{9.1}$$
Even a complicated multi-element optical system obeys (9.1) if $d_o$ and $d_i$ are measured from principal planes rather than the single plane of, for example, a thin lens. Paraxial ray theory can also be used to study the stability of laser cavities. The formalism predicts whether a ray, after many round trips in the cavity, remains near the optical axis (trapped and therefore stable) or if it drifts endlessly away from the axis of the cavity on successive round trips. In appendix 9.A we address deviations from the paraxial ray theory known as aberrations. We also comment on ray-tracing techniques, used for designing optical systems that minimize such aberrations.
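As a trial solution for (9.2), we take
$$\mathbf{E}(\mathbf{r}, t) = \mathbf{E}_0(\mathbf{r})\, e^{i[k_{\rm vac} R(\mathbf{r}) - \omega t]} \tag{9.3}$$
where
$$k_{\rm vac} = \frac{\omega}{c} = \frac{2\pi}{\lambda_{\rm vac}} \tag{9.4}$$
Here $R(\mathbf{r})$ is a real scalar function (which depends on position) having the dimension of length. By taking $R(\mathbf{r})$ to be real, we do not take into account possible absorption or amplification in the medium. Even though the trial solution (9.3) looks somewhat like a plane wave,$^{1}$ the function $R(\mathbf{r})$ accommodates wave fronts that can be curved or distorted as depicted in Fig. 9.1. At any given instant $t$, the curved surfaces described by $R(\mathbf{r}) = \text{constant}$ have uniform phase and can be interpreted as wave fronts of the solution. The wave fronts travel in the direction for which $R(\mathbf{r})$ varies the fastest. This direction is aligned with $\nabla R(\mathbf{r})$, which lies in the direction perpendicular to surfaces of constant phase. The substitution of the trial solution (9.3) into the wave equation (9.2) gives
$$\frac{1}{k_{\rm vac}^2}\nabla^2\left[\mathbf{E}_0(\mathbf{r})\, e^{i k_{\rm vac} R(\mathbf{r})}\right] + n^2(\mathbf{r})\,\mathbf{E}_0(\mathbf{r})\, e^{i k_{\rm vac} R(\mathbf{r})} = 0 \tag{9.5}$$
For the $x$ component of the field, the Laplacian evaluates to
$$\nabla^2\left[E_{0x}(\mathbf{r})\, e^{i k_{\rm vac} R(\mathbf{r})}\right] = \left(\nabla^2 E_{0x}(\mathbf{r}) - k_{\rm vac}^2 E_{0x}(\mathbf{r})\,[\nabla R(\mathbf{r})]\cdot[\nabla R(\mathbf{r})] + i k_{\rm vac} E_{0x}(\mathbf{r})\,\nabla^2 R(\mathbf{r}) + 2 i k_{\rm vac}\,[\nabla E_{0x}(\mathbf{r})]\cdot[\nabla R(\mathbf{r})]\right) e^{i k_{\rm vac} R(\mathbf{r})}$$
Upon combining the result for each vector component of $\mathbf{E}_0(\mathbf{r})$, the required spatial derivative can be written as
$$\nabla^2\left[\mathbf{E}_0(\mathbf{r})\, e^{i k_{\rm vac} R(\mathbf{r})}\right] = \Big(\nabla^2\mathbf{E}_0(\mathbf{r}) - k_{\rm vac}^2\,\mathbf{E}_0(\mathbf{r})\,[\nabla R(\mathbf{r})]\cdot[\nabla R(\mathbf{r})] + i k_{\rm vac}\,\mathbf{E}_0(\mathbf{r})\,\nabla^2 R(\mathbf{r}) + 2 i k_{\rm vac}\left\{\hat{\mathbf{x}}\,[\nabla E_{0x}(\mathbf{r})]\cdot[\nabla R(\mathbf{r})] + \hat{\mathbf{y}}\,[\nabla E_{0y}(\mathbf{r})]\cdot[\nabla R(\mathbf{r})] + \hat{\mathbf{z}}\,[\nabla E_{0z}(\mathbf{r})]\cdot[\nabla R(\mathbf{r})]\right\}\Big)\, e^{i k_{\rm vac} R(\mathbf{r})}$$

Figure 9.1 Wave fronts (i.e. surfaces of constant phase given by $R(\mathbf{r})$) distributed throughout space in the presence of a spatially inhomogeneous refractive index. The gradient of $R$ gives the direction of travel for a wavefront.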
1 If the index is spatially independent (i.e. $n(\mathbf{r}) \to n$), then (9.3) reduces to the usual plane-wave solution of the wave equation. In this case, we have $R(\mathbf{r}) = \mathbf{k}\cdot\mathbf{r}/k_{\rm vac}$ and the field amplitude becomes constant (i.e. $\mathbf{E}_0(\mathbf{r}) \to \mathbf{E}_0$).
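After performing the Laplacian and after some rearranging, (9.5) becomes
$$\left[\nabla R(\mathbf{r})\cdot\nabla R(\mathbf{r}) - n^2(\mathbf{r})\right]\mathbf{E}_0(\mathbf{r}) = \frac{\nabla^2\mathbf{E}_0(\mathbf{r})}{k_{\rm vac}^2} + \frac{i}{k_{\rm vac}}\,\mathbf{E}_0(\mathbf{r})\,\nabla^2 R(\mathbf{r}) + \frac{2i}{k_{\rm vac}}\left\{\hat{\mathbf{x}}\,\nabla E_{0x}(\mathbf{r})\cdot\nabla R(\mathbf{r}) + \hat{\mathbf{y}}\,\nabla E_{0y}(\mathbf{r})\cdot\nabla R(\mathbf{r}) + \hat{\mathbf{z}}\,\nabla E_{0z}(\mathbf{r})\cdot\nabla R(\mathbf{r})\right\} \tag{9.6}$$
Don't be afraid; at this point we are ready to make an important approximation. We take the limit of a very short wavelength (i.e. $1/k_{\rm vac} = \lambda_{\rm vac}/2\pi \to 0$), and the entire right-hand side of (9.6) vanishes. (Thank goodness!) With it we lose the effects of diffraction. We also lose surface reflections at abrupt index changes unless specifically considered. This approximation works best in situations where only macroscopic features are of concern. Our wave equation has been simplified to
$$[\nabla R(\mathbf{r})]\cdot[\nabla R(\mathbf{r})] = n^2(\mathbf{r}) \tag{9.7}$$
Written another way, this equation is
$$\nabla R(\mathbf{r}) = n(\mathbf{r})\,\hat{\mathbf{s}}(\mathbf{r}) \tag{9.8}$$
where $\hat{\mathbf{s}}$ is a unit vector pointing in the direction $\nabla R(\mathbf{r})$, the direction normal to wave front surfaces. Equation (9.8) is called the eikonal equation.$^{2}$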
Example 9.1
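Suppose that a region of air above the desert on a hot day has an index of refraction that varies with height $y$ according to $n(y) = n_0\sqrt{1 + y^2/h^2}$. Verify that $R(x, y) = n_0\left(x \pm y^2/2h\right)$ is a solution to the eikonal equation. (See problem P9.1 for a more general solution.)

Solution: The gradient of our trial solution gives
$$\nabla R(x, y) = n_0\left(\hat{\mathbf{x}} \pm \hat{\mathbf{y}}\, y/h\right)$$
Substituting this into (9.7) gives
$$\nabla R\cdot\nabla R = n_0^2\left(\hat{\mathbf{x}} \pm \hat{\mathbf{y}}\, y/h\right)\cdot\left(\hat{\mathbf{x}} \pm \hat{\mathbf{y}}\, y/h\right) = n_0^2\left(1 + y^2/h^2\right) = n^2(y)$$
Computed at various heights, the direction for rays turns out to be
$$\hat{\mathbf{s}}(h) = \frac{\hat{\mathbf{x}} \pm \hat{\mathbf{y}}}{\sqrt{2}}, \qquad \hat{\mathbf{s}}(h/2) = \frac{\hat{\mathbf{x}} \pm \hat{\mathbf{y}}/2}{\sqrt{5/4}}, \qquad \hat{\mathbf{s}}(h/4) = \frac{\hat{\mathbf{x}} \pm \hat{\mathbf{y}}/4}{\sqrt{17/16}}$$

Figure 9.2 Depiction of possible light ray paths in a region with varying index.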
2 M. Born and E. Wolf, Principles of Optics, 7th ed., Sect. 3.1.1 (Cambridge University Press, 1999).
These are represented in Fig. 9.2. In a desert mirage, light from the sky can appear to come from a lower position. We can determine a path for the rays by setting $dy/dx$ equal to the slope of $\hat{\mathbf{s}}$:
$$\frac{dy}{dx} = \pm\frac{y}{h} \quad\Rightarrow\quad y = y_0\, e^{\pm(x - x_0)/h}$$
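The verification in Example 9.1 can also be checked numerically. The following is a minimal sketch, assuming the trial solution $R(x,y) = n_0(x \pm y^2/2h)$ and the index profile $n(y) = n_0\sqrt{1+y^2/h^2}$; the function names, the finite-difference step, and the sample values of $n_0$ and $h$ are illustrative choices, not part of the text.

```python
import math

def index_profile(y, n0, h):
    """Refractive index n(y) = n0 * sqrt(1 + y^2/h^2) from Example 9.1."""
    return n0 * math.sqrt(1.0 + (y / h) ** 2)

def eikonal_R(x, y, n0, h, sign):
    """Trial solution R(x, y) = n0 * (x + sign * y^2 / (2h)), with sign = +1 or -1."""
    return n0 * (x + sign * y ** 2 / (2.0 * h))

def grad_R(x, y, n0, h, sign, eps=1e-6):
    """Central-difference gradient of R (eps is an illustrative step size)."""
    dRdx = (eikonal_R(x + eps, y, n0, h, sign) - eikonal_R(x - eps, y, n0, h, sign)) / (2 * eps)
    dRdy = (eikonal_R(x, y + eps, n0, h, sign) - eikonal_R(x, y - eps, n0, h, sign)) / (2 * eps)
    return dRdx, dRdy

def eikonal_residual(x, y, n0, h, sign):
    """grad(R).grad(R) - n^2, which should vanish if R satisfies (9.7)."""
    gx, gy = grad_R(x, y, n0, h, sign)
    return gx * gx + gy * gy - index_profile(y, n0, h) ** 2

def ray_direction(x, y, n0, h, sign):
    """Unit vector s = grad(R) / n, following (9.8)."""
    gx, gy = grad_R(x, y, n0, h, sign)
    n = index_profile(y, n0, h)
    return gx / n, gy / n

if __name__ == "__main__":
    n0, h = 1.0003, 2.0  # hypothetical sample values
    for y in (h, h / 2, h / 4):
        print(y, eikonal_residual(0.0, y, n0, h, +1), ray_direction(0.0, y, n0, h, +1))
```

The residual stays at the level of floating-point noise for either sign, and the printed directions reproduce the unit vectors quoted at heights $h$, $h/2$, and $h/4$.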
Under the assumption of an infinitely short wavelength, the Poynting vector is directed along $\hat{\mathbf{s}}$ as demonstrated in P9.2. In other words, the direction of $\hat{\mathbf{s}}$ specifies the direction of energy flow. The unit vector $\hat{\mathbf{s}}$ at each location in space points perpendicular to the wave fronts and indicates the direction that the distributed waves travel as seen in Fig. 9.1. We refer to a collection of vectors $\hat{\mathbf{s}}$ throughout space as rays.

In retrospect, we might have jumped straight to (9.8) without going through the above derivation. After all, we know that each part of a wave front advances in the direction of its gradient $\nabla R(\mathbf{r})$ (i.e. in the direction that $R(\mathbf{r})$ varies most rapidly). We also know that each part of a wave front defined by $R(\mathbf{r}) = \text{constant}$ travels at speed $c/n(\mathbf{r})$. The slower a given part of the wave front advances, the more rapidly $R(\mathbf{r})$ changes with position $\mathbf{r}$ and the closer the contours of constant phase. It follows that $\nabla R(\mathbf{r})$ must be proportional to $n(\mathbf{r})$ since $\nabla R(\mathbf{r})$ denotes the rate of change in $R(\mathbf{r})$.
Taking the curl of the eikonal equation (9.8) gives
$$\nabla\times\left[n(\mathbf{r})\,\hat{\mathbf{s}}(\mathbf{r})\right] = \nabla\times\left[\nabla R(\mathbf{r})\right] = 0 \tag{9.9}$$
This can be integrated over an open surface of area $A$ to give
$$\int_A \nabla\times\left[n(\mathbf{r})\,\hat{\mathbf{s}}(\mathbf{r})\right]\cdot d\mathbf{a} = \oint_C n(\mathbf{r})\,\hat{\mathbf{s}}(\mathbf{r})\cdot d\boldsymbol{\ell} = 0 \tag{9.10}$$
where we have applied Stokes' theorem (0.12) to convert the area integral into a path integral around the perimeter contour $C$.

…ometry, probability theory, and number theory. He was often quite secretive about the methods used to obtain his results. Mathematicians suspect that Fermat didn't actually prove his famous last theorem, which could not be verified until the 1990s. Fermat was the first to assert that the path taken by a beam of light is the one that can be traveled in the least amount of time. (Wikipedia)
3 M. Born and E. Wolf, Principles of Optics, 7th ed., Sect. 3.3.2 (Cambridge University Press, 1999).
4 The curl of a gradient is identically zero for any function.
Equation (9.10) states that the integration of $n\,\hat{\mathbf{s}}\cdot d\boldsymbol{\ell}$ around a closed loop is always zero. If we consider a closed loop comprised of a path from point A to point B and then a different path from point B back to point A again, the integrals for the two legs always cancel, even while holding one path fixed while varying the other. This means
$$\int_A^B n\,\hat{\mathbf{s}}\cdot d\boldsymbol{\ell} \quad \text{is independent of the path from A to B.} \tag{9.11}$$
Now consider a path from A to B that is parallel to $\hat{\mathbf{s}}$, as depicted in Fig. 9.3. In this case, the cosine in the dot product is always one. If we choose some other path that connects A and B, the cosine associated with the dot product is less than one at most points along that path, whereas the result of the integral is the same. Therefore, if we artificially remove the dot product from the integral (i.e. exclude the cosine factor), the result of the integral will exceed the true value unless the path chosen follows the direction of $\hat{\mathbf{s}}$ (i.e. the path that corresponds to the one that light rays actually follow). That is,
$$\int_A^B n\,\hat{\mathbf{s}}\cdot d\boldsymbol{\ell} = \min\left\{\int_A^B n\, d\ell\right\} \tag{9.12}$$
The integral on the right is called the optical path length ($OPL$) between points A and B:
$$OPL\,\big|_A^B \equiv \int_A^B n\, d\ell \tag{9.13}$$
The conclusion is that the true path that light follows between two points (i.e. the one that stays parallel to $\hat{\mathbf{s}}$) is the one with the shortest optical path length. The index $n$ may vary with position and therefore can be different for each of the incremental distances $d\ell$.

Figure 9.3 A ray of light leaving point A arriving at B.
Fermat's principle is usually stated in terms of the time it takes light to travel between points. The travel time $\Delta t$ depends not only on the path taken by the light but also on the velocity of the light $v(\mathbf{r})$, which varies spatially with the refractive index:
$$\Delta t\,\big|_A^B = \int_A^B \frac{d\ell}{v(\mathbf{r})} = \int_A^B \frac{d\ell}{c/n(\mathbf{r})} = \frac{OPL\,\big|_A^B}{c} \tag{9.14}$$
To find the correct path for the light ray that leaves point A and crosses point B, we need only minimize the optical path length between the two points. Minimizing the optical path length is equivalent to minimizing the time of travel since it differs from the time of travel only by the constant $c$. The optical path length is not the actual distance that the light travels; it is proportional to the number of wavelengths that fit into that distance (see (2.24)). Thus, as the wavelength shortens due to a higher index of refraction, the optical path length increases. The correct ray traveling from A to B does not necessarily follow a straight line but can follow a complicated curve according to how the index varies.
An imaging situation occurs when many paths from point A to point B have the same optical path length. An example of this occurs when a lens causes an image to form. In this case all rays leaving point A (on an object) and traveling through the system to point B (on the image) experience equal optical path lengths. This situation is depicted in Fig. 9.4. Note that while the rays traveling through the center of the lens have a shorter geometric path length, they travel through more material so that the optical path length is the same for all rays.

To summarize Fermat's principle, of the many rays that might emanate from a point A, the ray that crosses a second point B is the one that follows the shortest optical path length. If many rays tie for having the shortest optical path, we say that an image of point A forms at point B. It should be noted that Fermat's principle, as we have written it, does not work for anisotropic media such as crystals where $n$ depends on the direction of a ray as well as on its location (see P9.4).

Example 9.2
Use Fermat's principle to derive Snell's law.

Solution: Consider the many rays of light that leave point A seen in Fig. 9.5. Only one of the rays passes through point B. Within each medium we expect the light to travel in a straight line since the index is uniform. However, at the boundary we must allow for bending since the index changes. The optical path length between points A and B may be written
$$OPL = n_i\sqrt{x_i^2 + y_i^2} + n_t\sqrt{x_t^2 + y_t^2} \tag{9.15}$$
We need to minimize this optical path length to find the correct one according to Fermat's principle. Since points A and B are fixed, we may regard $x_i$ and $x_t$ as constants. The distances $y_i$ and $y_t$ are not constants although the combination $y_{\rm tot} = y_i + y_t$ is constant. Thus, we may rewrite (9.15) as
$$OPL(y_i) = n_i\sqrt{x_i^2 + y_i^2} + n_t\sqrt{x_t^2 + (y_{\rm tot} - y_i)^2} \tag{9.16}$$
where everything on the right-hand side is constant except for $y_i$. We now minimize the optical path length by taking the derivative and setting it equal to zero:
$$\frac{d(OPL)}{dy_i} = n_i\frac{y_i}{\sqrt{x_i^2 + y_i^2}} + n_t\frac{-(y_{\rm tot} - y_i)}{\sqrt{x_t^2 + (y_{\rm tot} - y_i)^2}} = 0 \tag{9.18}$$
Notice that
$$\sin\theta_i = \frac{y_i}{\sqrt{x_i^2 + y_i^2}} \qquad\text{and}\qquad \sin\theta_t = \frac{y_t}{\sqrt{x_t^2 + y_t^2}} \tag{9.19}$$
When these are substituted into (9.18) we obtain
$$n_i\sin\theta_i = n_t\sin\theta_t \tag{9.20}$$
which is the familiar Snell's law.

Figure 9.4 Rays of light leaving point A with the same optical path length to B.

Figure 9.5 Rays of light leaving point A; not all of them will traverse point B.
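The minimization in Example 9.2 can be mimicked numerically rather than with calculus. The sketch below is one possible illustration, assuming the geometry of the example (fixed $x_i$, $x_t$, and $y_{\rm tot}$) and using a golden-section search, which is a technique chosen here for convenience and not one the text prescribes; all function names and sample numbers are hypothetical.

```python
import math

def opl(y_i, n_i, n_t, x_i, x_t, y_tot):
    """Optical path length from A to B when the ray crosses the boundary at y_i."""
    return n_i * math.hypot(x_i, y_i) + n_t * math.hypot(x_t, y_tot - y_i)

def minimize_golden(f, lo, hi, tol=1e-12):
    """Golden-section search for the minimum of a unimodal function on [lo, hi]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - invphi * (b - a)
    d = a + invphi * (b - a)
    while b - a > tol:
        if f(c) < f(d):
            b = d
        else:
            a = c
        c = b - invphi * (b - a)
        d = a + invphi * (b - a)
    return 0.5 * (a + b)

def snell_sides(y_i, n_i, n_t, x_i, x_t, y_tot):
    """Return (n_i sin(theta_i), n_t sin(theta_t)) for a crossing point y_i."""
    sin_i = y_i / math.hypot(x_i, y_i)
    sin_t = (y_tot - y_i) / math.hypot(x_t, y_tot - y_i)
    return n_i * sin_i, n_t * sin_t

if __name__ == "__main__":
    n_i, n_t, x_i, x_t, y_tot = 1.0, 1.5, 1.0, 1.0, 1.0  # sample values
    y_best = minimize_golden(lambda y: opl(y, n_i, n_t, x_i, x_t, y_tot), 0.0, y_tot)
    print(y_best, snell_sides(y_best, n_i, n_t, x_i, x_t, y_tot))
```

At the crossing point that minimizes the optical path length, the two printed products agree to within the accuracy of the search, which is the content of Snell's law.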
Example 9.3
Use Fermat's principle to derive the equation of curvature for a reflective surface that causes all rays leaving one point to image to another. Do the calculation in two dimensions rather than in three.$^{5}$

Solution: We adopt the convention that the origin is half way between the points, which are separated by a distance $2a$, as shown in Fig. 9.6. If the points are to image to each other, Fermat's principle requires that the total path length be a constant; call it $b$. By inspection of the figure, we see that the path (which reflects once) from one point to the other is
$$\sqrt{(x + a)^2 + y^2} + \sqrt{(x - a)^2 + y^2} = b \tag{9.21}$$
To get (9.21) into a more recognizable form, we isolate the first square root and square both sides of the equation, which gives
$$(x + a)^2 + y^2 = b^2 + (x - a)^2 + y^2 - 2b\sqrt{(x - a)^2 + y^2}$$
After squaring the two binomial terms, some nice cancellations occur, and we get
$$4ax - b^2 = -2b\sqrt{(x - a)^2 + y^2}$$
which we square again to obtain
$$16a^2x^2 - 8ab^2x + b^4 = 4b^2\left[x^2 - 2ax + a^2 + y^2\right]$$
After some cancellations and regrouping this becomes
$$\left(16a^2 - 4b^2\right)x^2 - 4b^2y^2 = 4a^2b^2 - b^4$$
Finally, we divide both sides by the term on the right to obtain the (hopefully) familiar form of an ellipse
$$\frac{x^2}{b^2/4} + \frac{y^2}{b^2/4 - a^2} = 1 \tag{9.22}$$

Figure 9.6
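As a quick sanity check on Example 9.3, one can sample points on the ellipse (9.22) and confirm that the reflected path between the two points at $x = \pm a$ always has length $b$. This is a small sketch with hypothetical function names and sample values; it assumes $b > 2a$ so that the ellipse exists.

```python
import math

def ellipse_point(t, a, b):
    """Point on x^2/(b^2/4) + y^2/(b^2/4 - a^2) = 1 at parameter t (needs b > 2a)."""
    semi_major = b / 2.0
    semi_minor = math.sqrt(b * b / 4.0 - a * a)
    return semi_major * math.cos(t), semi_minor * math.sin(t)

def reflected_path_length(x, y, a):
    """Distance from (-a, 0) to (x, y) plus distance from (x, y) to (+a, 0)."""
    return math.hypot(x + a, y) + math.hypot(x - a, y)

if __name__ == "__main__":
    a, b = 1.0, 3.0  # sample half-separation and total path length
    for k in range(8):
        x, y = ellipse_point(2.0 * math.pi * k / 8, a, b)
        print(round(x, 4), round(y, 4), reflected_path_length(x, y, a))
```

Every sampled point returns the same total path length, consistent with the requirement that all single-reflection paths tie.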
5 This configuration is used to direct flash lamp energy into a laser amplifier rod. One point in Fig. 9.6 represents the end of an amplifier rod while the other represents the end of a thin flash-lamp tube.
Here, the angle $\theta$ (in radians) represents the angle that a particular ray makes with respect to the optical axis. There is an important mathematical reason for this approximation. The sine is a nonlinear function, but at small angles it is approximately linear and can be represented by its argument. It is this linearity that is crucial to the process of forming images. The linearity also greatly simplifies the formulation since it reduces the problem to linear algebra. Conveniently, we will be able to keep track of imaging effects with a $2\times 2$ matrix formalism.

Consider a ray propagating in the $y$–$z$ plane where the optical axis is in the $z$ direction. Let us specify a ray at position $z_1$ by two coordinates: the displacement from the axis $y_1$ and the orientation angle $\theta_1$ (see Fig. 9.7). If the index is uniform everywhere, the ray travels along a straight path. It is straightforward to predict the coordinates of the same ray downstream, say at $z_2$. First, since the ray continues in the same direction, we have
$$\theta_2 = \theta_1 \tag{9.25}$$
By referring to Fig. 9.7 we can write $y_2$ in terms of $y_1$ and $\theta_1$:
$$y_2 = y_1 + d\tan\theta_1 \tag{9.26}$$
where $d \equiv z_2 - z_1$. Equation (9.26) is nonlinear in $\theta_1$. However, in the paraxial approximation (9.24), it becomes linear, which after all is the point of the approximation. In this approximation the expression for $y_2$ simplifies to
$$y_2 = y_1 + \theta_1 d \tag{9.27}$$
Equations (9.25) and (9.27) describe a linear transformation which in matrix notation can be consolidated into the form
$$\begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} 1 & d \\ 0 & 1 \end{bmatrix}\begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix} \qquad \text{(ABCD matrix for propagation through a distance } d\text{)} \tag{9.28}$$
Here, the vectors in this equation specify the essential information about the ray before and after traversing the distance $d$, and the matrix describes the effect of traversing the distance. This type of matrix is called an ABCD matrix;$^{6}$ sometimes physicists are not very inventive with names.
Example 9.4
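Let the distance $d$ be subdivided into two distances, $a$ and $b$, such that $d = a + b$. Show that an application of the ABCD matrix for distance $a$ followed by an application of the ABCD matrix for $b$ renders the same result as an application of the ABCD matrix for distance $d$.

Solution: Individually, the effects of propagation through $a$ and through $b$ are
$$\begin{bmatrix} y_{\rm mid} \\ \theta_{\rm mid} \end{bmatrix} = \begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix}\begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix} \qquad\text{and}\qquad \begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} 1 & b \\ 0 & 1 \end{bmatrix}\begin{bmatrix} y_{\rm mid} \\ \theta_{\rm mid} \end{bmatrix} \tag{9.29}$$
where the subscript "mid" refers to the ray in the middle position after traversing the distance $a$. If we combine the equations, we get
$$\begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} 1 & b \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix}\begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix} \tag{9.30}$$
which is in agreement with (9.28) since the ABCD matrix for the entire displacement is
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & b \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} 1 & a + b \\ 0 & 1 \end{bmatrix} \tag{9.31}$$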
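The bookkeeping in Example 9.4 is easy to reproduce in a few lines. The following is a minimal sketch, assuming $2\times 2$ matrices stored as nested tuples and rays stored as $(y, \theta)$ pairs; the helper names `matmul2`, `propagation`, and `apply` are illustrative, not from the text.

```python
def matmul2(m2, m1):
    """Product m2 * m1 of 2x2 matrices stored as nested tuples (m1 acts on the ray first)."""
    (a2, b2), (c2, d2) = m2
    (a1, b1), (c1, d1) = m1
    return ((a2 * a1 + b2 * c1, a2 * b1 + b2 * d1),
            (c2 * a1 + d2 * c1, c2 * b1 + d2 * d1))

def propagation(d):
    """ABCD matrix (9.28) for propagation through a distance d."""
    return ((1.0, d), (0.0, 1.0))

def apply(m, ray):
    """Apply an ABCD matrix to a ray (y, theta) and return the new (y, theta)."""
    (a, b), (c, d) = m
    y, theta = ray
    return (a * y + b * theta, c * y + d * theta)

if __name__ == "__main__":
    a, b = 0.3, 0.2                      # sample distances
    two_steps = matmul2(propagation(b), propagation(a))
    one_step = propagation(a + b)
    print(two_steps, one_step)
    print(apply(two_steps, (0.01, 0.05)), apply(one_step, (0.01, 0.05)))
```

Note the ordering: the matrix for the first distance traversed sits on the right of the product, exactly as in (9.30).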
6 P. W. Milonni and J. H. Eberly, Lasers, Sect. 14.2 (New York: Wiley, 1988).
since the ray has no chance to go anywhere. We adopt the widely used convention that, upon reflection, the positive $z$ direction is reoriented so that we consider the rays still to travel in the positive $z$ sense. An easy way to remember this is that the positive $z$ direction is always taken to be downstream of where the light is headed. Notice that in Fig. 9.8, the reflected ray approaches the $z$-axis. In this case $\theta_2$ is a negative angle (as opposed to $\theta_1$ which is drawn as a positive angle) and is equal to
$$\theta_2 = -(\theta_1 + 2\theta_i) \tag{9.33}$$
where $\theta_i$ is the angle of incidence with respect to the normal to the spherical mirror surface. By the law of reflection, the incident and reflected rays both occur at an angle $\theta_i$ referenced to the surface normal. The surface normal points towards the center of curvature of the mirror surface, which we assume is on the $z$-axis a distance $R$ away. By convention, the radius of curvature $R$ is a positive number if the mirror surface is concave and a negative number if the mirror surface is convex.

Elimination of $\theta_i$ from (9.33) in favor of $\theta_1$ and $y_1$

By inspection of Fig. 9.8 we can write
$$\frac{y_1}{R} = \sin\phi \cong \phi \tag{9.34}$$
where we have applied the paraxial approximation (9.23). (The angles in Fig. 9.8 are exaggerated. In fact, when $\phi$ is small enough for (9.34) to hold, we may also neglect the small distance $\delta$.) By inspection of the geometry, we also have
$$\phi = \theta_1 + \theta_i \tag{9.35}$$
and when this is combined with (9.34), we get
$$\theta_i = \frac{y_1}{R} - \theta_1 \tag{9.36}$$
With this we are able to put (9.33) into a useful linear form:
$$\theta_2 = -\frac{2}{R}\,y_1 + \theta_1 \tag{9.37}$$

Figure 9.8 A ray depicted in the act of reflection from a spherical surface.
Equations (9.32) and (9.37) describe a linear transformation that can be concisely formulated as
$$\begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -2/R & 1 \end{bmatrix}\begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix} \tag{9.38}$$
The ABCD matrix in this transformation describes the act of reflection from a concave mirror with radius of curvature $R$. The radius $R$ is negative when the mirror is convex.

The final basic element that we shall consider is a spherical interface between two materials with indices $n_i$ and $n_t$ (see Fig. 9.9). This has an effect similar to that of the curved mirror, which changes the direction of a ray without altering its distance $y_1$ from the optical axis. Please note that here the radius of curvature is considered to be positive for a convex surface (opposite convention from that of the mirror). In this way, if the lower index is on the left, a positive radius $R$ for either the interface or the mirror tends to deflect rays towards the axis. Again, we are interested only in the act of transmission without any travel before or after the interface. As before, (9.32) applies (i.e. $y_2 = y_1$). At the interface, the rays obey Snell's law, which in the paraxial approximation is written
$$n_i\theta_i = n_t\theta_t \tag{9.39}$$
The angles $\theta_i$ and $\theta_t$ are referenced from the surface normal, as seen in Fig. 9.9.

Substituting $\theta_1$, $\theta_2$ and $y_1$ into Snell's law

By inspection of Fig. 9.9, we have
$$\theta_i = \theta_1 + \phi \tag{9.40}$$
and
$$\theta_t = \theta_2 + \phi \tag{9.41}$$
where $\phi$ is the angle that the surface normal makes with the $z$-axis. As before (see (9.34)), within the paraxial approximation we may write $\phi \cong y_1/R$. When this is used in (9.40) and (9.41), which are substituted into (9.39), Snell's law becomes
$$\theta_2 = \left(\frac{n_i}{n_t} - 1\right)\frac{y_1}{R} + \frac{n_i}{n_t}\,\theta_1 \tag{9.42}$$
The compact matrix form of (9.32) and (9.42) is written
$$\begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ (n_i/n_t - 1)/R & n_i/n_t \end{bmatrix}\begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix} \qquad \text{(ABCD matrix for a curved interface)} \tag{9.43}$$

Figure 9.9 A ray depicted in the act of transmission at a curved material interface.
Example 9.5
Derive the ABCD matrix for a thin lens, where the thickness between the two lens surfaces is ignored. (See P9.6 for the more general case of a thick lens.)

Solution: A thin lens is depicted in Fig. 9.10. $R_1$ is the radius of curvature for the first surface (which is positive if convex as drawn), and $R_2$ is the radius of curvature for the second surface (which is negative as drawn). For either surface, the radius of curvature is considered to be positive if the surface is convex from the perspective of rays that encounter it. We take the index outside of the lens to be unity and that of the lens material to be $n$. We apply the ABCD matrix (9.43) in sequence, once for entering the lens and once for exiting:
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ (n - 1)\dfrac{1}{R_2} & n \end{bmatrix}\begin{bmatrix} 1 & 0 \\ \left(\dfrac{1}{n} - 1\right)\dfrac{1}{R_1} & \dfrac{1}{n} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -(n - 1)\left(\dfrac{1}{R_1} - \dfrac{1}{R_2}\right) & 1 \end{bmatrix} \tag{9.44}$$
The matrix for the first interface is written on the right, where it operates first on an incoming ray vector. In this case, $n_i = 1$ and $n_t = n$. The matrix for the second surface is written on the left so that it operates afterwards. For the second surface, $n_i = n$ and $n_t = 1$.

Table 9.1 entry (thick lens of thickness $d$):
$$\begin{bmatrix} 1 + \dfrac{d}{R_1}\left(\dfrac{1}{n} - 1\right) & \dfrac{d}{n} \\ -(n - 1)\left(\dfrac{1}{R_1} - \dfrac{1}{R_2}\right) + \dfrac{d}{R_1 R_2}\left(2 - n - \dfrac{1}{n}\right) & 1 + \dfrac{d}{R_2}\left(1 - \dfrac{1}{n}\right) \end{bmatrix}$$
Notice the close similarity between the ABCD matrix for a thin lens (9.44) and the ABCD matrix for a curved mirror (9.38). The ABCD matrix for either the thin lens or the mirror can be written as
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix} \tag{9.45}$$
where in the case of the thin lens the focal length is given by the lens maker's formula
$$\frac{1}{f} = (n - 1)\left(\frac{1}{R_1} - \frac{1}{R_2}\right) \qquad \text{(focal length of thin lens)} \tag{9.46}$$
and in the case of a curved mirror, the focal length is
$$f = R/2 \qquad \text{(focal length for a curved mirror)} \tag{9.47}$$

Figure 9.10 Thin lens.
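The thin-lens result can be cross-checked by multiplying the two interface matrices numerically and comparing the lower-left element with $-1/f$ from the lens maker's formula. This is a small sketch with illustrative helper names and sample values (a symmetric biconvex lens with $n = 1.5$); none of the names come from the text.

```python
def matmul2(m2, m1):
    """Product m2 * m1 of 2x2 matrices stored as nested tuples (m1 acts first)."""
    (a2, b2), (c2, d2) = m2
    (a1, b1), (c1, d1) = m1
    return ((a2 * a1 + b2 * c1, a2 * b1 + b2 * d1),
            (c2 * a1 + d2 * c1, c2 * b1 + d2 * d1))

def interface(n_i, n_t, R):
    """ABCD matrix (9.43) for a curved interface of radius R (positive if convex)."""
    return ((1.0, 0.0), ((n_i / n_t - 1.0) / R, n_i / n_t))

def thin_lens_from_interfaces(n, R1, R2):
    """Thin lens in air: second-surface matrix times first-surface matrix, as in (9.44)."""
    return matmul2(interface(n, 1.0, R2), interface(1.0, n, R1))

def lensmaker_focal_length(n, R1, R2):
    """Focal length from the lens maker's formula (9.46)."""
    return 1.0 / ((n - 1.0) * (1.0 / R1 - 1.0 / R2))

if __name__ == "__main__":
    n, R1, R2 = 1.5, 0.1, -0.1           # sample values, lengths in meters
    lens = thin_lens_from_interfaces(n, R1, R2)
    f = lensmaker_focal_length(n, R1, R2)
    print(lens, -1.0 / f)
```

For a curved mirror the same comparison would use $f = R/2$ in place of the lens maker's formula.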
Example 9.6
Derive the ABCD matrix for a window with thickness $d$ and index $n$.

Solution: We can again take advantage of the ABCD matrix for a curved interface (9.43), only in this problem we will let $R_1 = \infty$ and $R_2 = \infty$ to provide flat surfaces. We take the index outside of the window to be unity and the index inside the window to be $n$. We use the ABCD matrix (9.43) twice, once for each interface, sandwiching matrix (9.31), which endows the window with thickness:
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & n \end{bmatrix}\begin{bmatrix} 1 & d \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ 0 & 1/n \end{bmatrix} = \begin{bmatrix} 1 & d/n \\ 0 & 1 \end{bmatrix} \qquad \text{(window)} \tag{9.48}$$
As far as rays are concerned, a window is effectively shorter to traverse than free space.7 Fig. 9.11 illustrates why this is the case. The displacement of the exiting ray is not as great as it would have been without the window. The window impedes the rate at which the ray can move away from or toward the optical axis.
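To see the "effectively shorter" behavior of a window concretely, one can build the window matrix from two flat interfaces and compare the exit height of a tilted ray with that of the same ray after an equal distance of free space. The sketch below assumes flat surfaces are modeled by an infinite radius in (9.43); the helper names and sample numbers are illustrative only.

```python
import math

def matmul2(m2, m1):
    """Product m2 * m1 of 2x2 matrices stored as nested tuples (m1 acts first)."""
    (a2, b2), (c2, d2) = m2
    (a1, b1), (c1, d1) = m1
    return ((a2 * a1 + b2 * c1, a2 * b1 + b2 * d1),
            (c2 * a1 + d2 * c1, c2 * b1 + d2 * d1))

def propagation(d):
    """ABCD matrix for propagation through a distance d."""
    return ((1.0, d), (0.0, 1.0))

def interface(n_i, n_t, R):
    """ABCD matrix (9.43); R = math.inf gives a flat surface."""
    return ((1.0, 0.0), ((n_i / n_t - 1.0) / R, n_i / n_t))

def window(d, n):
    """Flat window in air: exit interface * propagation * entrance interface."""
    entering = interface(1.0, n, math.inf)
    exiting = interface(n, 1.0, math.inf)
    return matmul2(exiting, matmul2(propagation(d), entering))

def exit_height(m, y, theta):
    """Height of the ray (y, theta) after passing through the element m."""
    (a, b), _ = m
    return a * y + b * theta

if __name__ == "__main__":
    d, n = 0.01, 1.5                     # sample thickness (m) and index
    print(window(d, n))
    print(exit_height(window(d, n), 0.0, 0.1), exit_height(propagation(d), 0.0, 0.1))
```

The ray leaves the window displaced by $\theta d/n$ rather than $\theta d$, while its angle is unchanged.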
Example 9.7
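Find the ray $\begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix}$ that results when $\begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix}$ propagates through a distance $a$, reflects from a mirror of radius $R$, and then propagates through a distance $b$. See Fig. 9.12.

Solution: The final ray in terms of the initial one is computed as follows:
$$\begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} 1 & b \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ -2/R & 1 \end{bmatrix}\begin{bmatrix} 1 & a \\ 0 & 1 \end{bmatrix}\begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix} = \begin{bmatrix} 1 - 2b/R & a + b - 2ab/R \\ -2/R & 1 - 2a/R \end{bmatrix}\begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix} = \begin{bmatrix} (1 - 2b/R)\,y_1 + (a + b - 2ab/R)\,\theta_1 \\ (-2/R)\,y_1 + (1 - 2a/R)\,\theta_1 \end{bmatrix} \tag{9.49}$$
As always, the ordering of the matrices is important. The first effect that the ray experiences is represented by the matrix on the right, which is in the position that first operates on $\begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix}$.

Figure 9.12 A ray that travels through a distance $a$, reflects from a mirror, and then travels through a distance $b$.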
7 In contrast, the optical path length OPL is effectively longer than free space by the factor n.
We have derived our basic ABCD matrices for rays traveling in the $y$–$z$ plane, as suggested in Figs. 9.7–9.12. This may have given the impression that it is necessary to work within a plane that contains the optical axis (i.e. the $z$-axis in our case). However, within the paraxial approximation, the ABCD matrices are valid for rays that become displaced simultaneously in both the $x$ and $y$ dimensions while propagating along $z$. As we demonstrate below, rays behave independently in the $x$ and $y$ dimensions. If desired, one can write a ray vector for each dimension, namely $\begin{bmatrix} x \\ \theta_x \end{bmatrix}$ and $\begin{bmatrix} y \\ \theta_y \end{bmatrix}$. Moreover, the identical matrices, for example any in table 9.1, are used for either dimension. Figs. 9.7–9.12 therefore represent projections of rays onto the $y$–$z$ plane. To complete the story, one can imagine corresponding figures representing the projection of the rays onto the $x$–$z$ plane.
In the paraxial approximation, we have $\cos\phi \cong 1 - \phi^2/2$. And since in this approximation we may also write $\phi \cong \sqrt{x^2 + y^2}\big/R$, (9.50) becomes
$$\delta \cong \frac{x^2}{2R} + \frac{y^2}{2R} \tag{9.51}$$
… the phases of Venus, similar to the phases of the moon. He used these observations to argue in favor of the Copernican model of the solar system, but this conflicted with the prevailing views of the Catholic Church at the time, and he was placed under house arrest and forbidden to publish any of his works. While under house arrest, he wrote much on kinematics and other principles of physics and is considered to be the father of modern physics. Galileo attempted to measure the speed of light by observing an assistant uncover a lantern on a distant hill in response to a light signal. He concluded that light is "really fast" if not instantaneous. (Wikipedia)
In the paraxial approximation, we see that the curve of the mirror is parabolic, and therefore separable between the $x$ and $y$ dimensions. That is, the curvature in the $x$-dimension (i.e. $\partial\delta/\partial x = x/R$) is independent of $y$, and the curvature in the $y$-dimension (i.e. $\partial\delta/\partial y = y/R$) is independent of $x$. A similar argument can be made for a spherical interface between two media.
process is
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 - b/f & a + b - ab/f \\ -1/f & 1 - a/f \end{bmatrix} \tag{9.52}$$
where by (9.47) we have replaced $2/R$ with $1/f$. Because of the similarity between the behavior of a curved mirror and a thin lens, the above expression can also represent a ray traveling a distance $a$, traversing a thin lens with focal length $f$, and then traveling a distance $b$. The only difference is that, in the case of the thin lens, $f$ is given by the lens maker's formula (9.46).

As is well known, it is possible to form an image with either a curved mirror or a lens. Suppose that the initial ray is one of many rays that leaves a particular point on an object positioned $a = d_o$ before the mirror (or lens). In order for an image to occur at $d_i = b$, it is essential that all rays leaving the particular point on the object converge to a corresponding point on the image. That is, we want rays leaving the point $y_1$ on the object (which may take on a range of angles $\theta_1$) all to converge to a single point $y_2$ at the image. In the following equation we need $y_2$ to be independent of $\theta_1$:
$$\begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} A & B \\ C & D \end{bmatrix}\begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix} = \begin{bmatrix} A y_1 + B\theta_1 \\ C y_1 + D\theta_1 \end{bmatrix} \tag{9.53}$$
The condition for image formation is therefore
$$B = 0 \tag{9.54}$$
With $a = d_o$ and $b = d_i$ in (9.52), this condition reads $d_o + d_i - d_o d_i/f = 0$, or
$$\frac{1}{f} = \frac{1}{d_o} + \frac{1}{d_i} \tag{9.55}$$
which is the familiar imaging formula (9.1). When the object is infinitely far away (i.e. $d_o \to \infty$), the image appears at $d_i \to f$. This gives a physical interpretation to the focal length $f$, as we have been calling it. Please note that $d_o$ and $d_i$ can each be either positive (real as depicted in Fig. 9.13) or negative (virtual, meaning a screen cannot be inserted to display the image). The magnification of the image is found by comparing the size of $y_2$ to $y_1$. From (9.52)–(9.55), the magnification is found to be
$$M \equiv \frac{y_2}{y_1} = A = 1 - \frac{d_i}{f} = -\frac{d_i}{d_o} \tag{9.56}$$
The negative sign indicates that for positive distances $d_o$ and $d_i$ the image is inverted.

In the above discussion, we have examined image formation by a thin lens or a curved mirror. Of course, images can also be formed by thick lenses or by more complex composite optical systems (e.g. a system of lenses and spaces). The ABCD matrices for the elements in a composite system are simply multiplied together (the first element that rays encounter appearing on the right) to obtain an overall ABCD matrix. The principles for image formation with an arbitrary ABCD matrix are the same as those for a thin lens or curved mirror. As before, consider propagation a distance $d_o$ from an object to the optical element followed by propagation a distance $d_i$ to an image. The ABCD matrix for the overall operation is
$$\begin{bmatrix} A' & B' \\ C' & D' \end{bmatrix} = \begin{bmatrix} 1 & d_i \\ 0 & 1 \end{bmatrix}\begin{bmatrix} A & B \\ C & D \end{bmatrix}\begin{bmatrix} 1 & d_o \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} A + d_i C & d_o A + B + d_o d_i C + d_i D \\ C & d_o C + D \end{bmatrix} \tag{9.57}$$
An image occurs according to (9.54) when $B' = 0$, or
$$d_o A + B + d_o d_i C + d_i D = 0 \tag{9.58}$$
with magnification
$$M = A + d_i C \tag{9.59}$$
For a complex lens system, the matrix elements $A$, $B$, $C$ and $D$ can be complicated expressions. There is a convenient way to simplify the analysis, which is discussed in the next section.

Example 9.8

Beginning students are often taught to draw ray diagrams such as the one in Fig. 9.14, which shows a real image formed by a thin lens. Several key rays aid in a graphic prediction of the location and size of the image. Use ABCD-matrix analysis to describe the effect of the lens on the three rays drawn.
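Figure 9.14 Formation of a real image by a thin lens.

Solution: Ray A is parallel to the axis with height $y_1$ before traversing the lens. Just after the lens, ray A is described by
$$\begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix}\begin{bmatrix} y_1 \\ 0 \end{bmatrix} = \begin{bmatrix} y_1 \\ -y_1/f \end{bmatrix}$$
After traveling a further distance $f$, this ray crosses the axis:
$$\begin{bmatrix} 1 & f \\ 0 & 1 \end{bmatrix}\begin{bmatrix} y_1 \\ -y_1/f \end{bmatrix} = \begin{bmatrix} 0 \\ -y_1/f \end{bmatrix}$$
Meanwhile, ray B traverses the lens just where it crosses the axis. The lens does nothing to this ray:
$$\begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix}\begin{bmatrix} 0 \\ -y_1/d_o \end{bmatrix} = \begin{bmatrix} 0 \\ -y_1/d_o \end{bmatrix}$$
Ray B is undeflected. Finally, ray C, which goes through the point $d = f$ before the lens, becomes parallel to the axis following the lens:
$$\begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix}\begin{bmatrix} M y_1 \\ M y_1/f \end{bmatrix} = \begin{bmatrix} M y_1 \\ 0 \end{bmatrix}$$
Note that starting from the left focus, we have just before the lens
$$\begin{bmatrix} 1 & f \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 0 \\ M y_1/f \end{bmatrix} = \begin{bmatrix} M y_1 \\ M y_1/f \end{bmatrix}$$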
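The three key rays of Example 9.8 can also be traced end to end, from the object plane to the image plane, to confirm that they meet at a single height $M y_1$. The following is a sketch under the assumptions of the thin-lens imaging formula (9.1) and magnification $M = -d_i/d_o$; the helper names and sample values are hypothetical.

```python
def matmul2(m2, m1):
    """Product m2 * m1 of 2x2 matrices stored as nested tuples (m1 acts first)."""
    (a2, b2), (c2, d2) = m2
    (a1, b1), (c1, d1) = m1
    return ((a2 * a1 + b2 * c1, a2 * b1 + b2 * d1),
            (c2 * a1 + d2 * c1, c2 * b1 + d2 * d1))

def propagation(d):
    """ABCD matrix for propagation through a distance d."""
    return ((1.0, d), (0.0, 1.0))

def thin_lens(f):
    """ABCD matrix (9.45) for a thin lens of focal length f."""
    return ((1.0, 0.0), (-1.0 / f, 1.0))

def apply(m, ray):
    """Apply an ABCD matrix to a ray (y, theta)."""
    (a, b), (c, d) = m
    y, theta = ray
    return (a * y + b * theta, c * y + d * theta)

def image_distance(f, d_o):
    """Solve 1/f = 1/d_o + 1/d_i for d_i."""
    return 1.0 / (1.0 / f - 1.0 / d_o)

def trace_key_rays(f, d_o, y1):
    """Trace rays A, B, C from the object point (height y1) to the image plane."""
    d_i = image_distance(f, d_o)
    system = matmul2(propagation(d_i), matmul2(thin_lens(f), propagation(d_o)))
    rays = {
        "A": (y1, 0.0),                  # parallel to the axis
        "B": (y1, -y1 / d_o),            # aimed at the center of the lens
        "C": (y1, -y1 / (d_o - f)),      # passes through the front focal point
    }
    return d_i, {name: apply(system, ray) for name, ray in rays.items()}

if __name__ == "__main__":
    d_i, result = trace_key_rays(f=0.1, d_o=0.3, y1=0.01)
    print(d_i, result)
```

All three rays arrive at the same height in the image plane, and ray C arrives with zero angle, as the example describes.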
Figure 9.15 A multi-element system represented as an ABCD matrix for which principal planes always exist.
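$$\begin{bmatrix} 1 & p_2 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} A & B \\ C & D \end{bmatrix}\begin{bmatrix} 1 & p_1 \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} A + p_2 C & p_1 A + B + p_1 p_2 C + p_2 D \\ C & p_1 C + D \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -1/f_{\rm eff} & 1 \end{bmatrix} \tag{9.60}$$
The final matrix is that of a simple thin lens, and it takes the place of the composite system including the distances to the principal planes.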
8 R. Guenther, Modern Optics, p. 186 (New York: Wiley, 1990). 9 The starting and ending refractive index must be the same.
Our task is to find the values of $p_1$ and $p_2$ that make (9.60) true. We can straightaway make the definition

$$f_{\rm eff} \equiv -1/C \tag{9.61}$$

We can also solve for $p_1$ and $p_2$ by setting the diagonal elements of the matrix to 1. Explicitly, we get

$$p_1 C + D = 1 \quad\Rightarrow\quad p_1 = \frac{1 - D}{C} \tag{9.62}$$

and

$$A + p_2 C = 1 \quad\Rightarrow\quad p_2 = \frac{1 - A}{C} \tag{9.63}$$

It remains to be shown that the upper-right element in (9.60) (i.e. $p_1 A + B + p_1 p_2 C + p_2 D$) automatically goes to zero for our choices of $p_1$ and $p_2$. This may seem unlikely at first, but watch what happens! When (9.62) and (9.63) are substituted into the upper-right matrix element of (9.60) we get

$$p_1 A + B + p_1 p_2 C + p_2 D = \frac{1 - D}{C}A + B + \frac{1 - D}{C}\,\frac{1 - A}{C}\,C + \frac{1 - A}{C}D = \frac{1}{C}\left[1 - AD + BC\right] = \frac{1}{C}\left[1 - \begin{vmatrix} A & B \\ C & D \end{vmatrix}\right] \tag{9.64}$$

This vanishes (as desired) if the determinant of the original ABCD matrix equals one. Fortunately, this is always the case as long as we begin and end in the same index of refraction:

$$\begin{vmatrix} A & B \\ C & D \end{vmatrix} = 1 \tag{9.65}$$

Notice that the determinants of all of the matrices in table 9.1 are one. Moreover, ABCD matrices constructed of these will also have determinants equal to one.$^{10}$
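As a hedged illustration of (9.61)-(9.65), the sketch below computes $p_1$, $p_2$, and $f_{\rm eff}$ for a made-up system of two thin lenses and confirms that padding the system with those distances reduces it to a single thin lens. NumPy is assumed, and the function name `principal_planes` and the lens values are hypothetical.

```python
import numpy as np

def propagate(d):
    """ABCD matrix for free propagation through a distance d."""
    return np.array([[1.0, d], [0.0, 1.0]])

def thin_lens(f):
    """ABCD matrix for a thin lens of focal length f."""
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

def principal_planes(abcd):
    """Return (p1, p2, f_eff) from (9.61)-(9.63) for a unit-determinant ABCD matrix."""
    (A, B), (C, D) = abcd
    if abs(A * D - B * C - 1.0) > 1e-9:
        raise ValueError("determinant must be one (same index before and after)")
    if C == 0.0:
        raise ValueError("C = 0: the system has no finite focal length")
    return (1.0 - D) / C, (1.0 - A) / C, -1.0 / C

# Assumed example: lenses with f = 10 and f = 20 separated by 5 (arbitrary units).
system = thin_lens(20.0) @ propagate(5.0) @ thin_lens(10.0)
p1, p2, f_eff = principal_planes(system)

# Padding with p1 and p2 should leave the matrix of a thin lens of focal length f_eff.
reduced = propagate(p2) @ system @ propagate(p1)
print("p1 =", p1, " p2 =", p2, " f_eff =", f_eff)
print(reduced)
```

Negative values of $p_1$ or $p_2$ simply mean that the corresponding principal plane lies inside the optical system.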
Figure 9.16 (a) A ray bouncing between two parallel flat mirrors. (b) A ray bouncing between two curved mirrors in an unstable configuration. (c) A ray bouncing between two curved mirrors in a stable configuration. (d) Stable cavity utilizing a lens and two flat end mirrors.
As might be expected, the mirrors must be carefully aligned or successive reflections might cause rays to walk continuously away from the optical axis, so that they eventually leave the cavity out the side. If a simple cavity is formed with two flat mirrors that are perfectly aligned parallel to each other, one might suppose that the mirrors would provide ideal feedback. However, all rays except for those that are perfectly aligned to the mirror surface normals would eventually wander out of the side of the cavity as illustrated in Fig. 9.16a. Such a cavity is said to be unstable. We would like to do a better job of trapping the light in the cavity.

To improve the situation, a cavity can be constructed with concave end mirrors to help confine the beams within the cavity. Even so, one must choose carefully the curvature of the mirrors and their separation $L$. If this is not done correctly, the curved mirrors can overcompensate for the tendency of the rays to wander out of the cavity and thus aggravate the problem. Such an unstable scenario is depicted in Fig. 9.16b. Figure 9.16c depicts a cavity made with curved mirrors where the separation $L$ is chosen appropriately to make the cavity stable. Although a ray, as it makes successive bounces, can strike the end mirrors at a variety of points, the curvature of the mirrors keeps the trajectories contained within a narrow region so that they cannot escape out the sides of the cavity.

There are many ways to make a stable laser cavity. For example, a stable cavity can be made using a lens between two flat end mirrors as shown in Fig. 9.16d. Any combination of lenses (perhaps more than one) and curved mirrors can be used to create stable cavity configurations. Ring cavities can also be made to be stable, where in no place do the rays retro-reflect from a mirror but instead circulate through a series of elements like cars going around a racetrack. The ABCD matrix for a round trip in the cavity will be useful for this analysis.
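Example 9.9
Find the round-trip ABCD matrix for the cavities shown in Figs. 9.16c and 9.16d.

Solution: The round-trip ABCD matrix for the cavity shown in Fig. 9.16c is

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & L \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ -2/R_2 & 1 \end{bmatrix}\begin{bmatrix} 1 & L \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ -2/R_1 & 1 \end{bmatrix} \tag{9.66}$$

where we have begun the round trip just after a reflection from the first mirror. The round-trip ABCD matrix for the cavity shown in Fig. 9.16d is

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & 2L_1 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix}\begin{bmatrix} 1 & 2L_2 \\ 0 & 1 \end{bmatrix}\begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix} \tag{9.67}$$

where we have begun the round trip just after a transmission through the lens moving to the right. It is somewhat arbitrary where a round trip begins. The multiplication of the above matrices will need to be carried out to do problems P9.15 and P9.16.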
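The matrix products in Example 9.9 are tedious by hand but short in code. The following is a minimal sketch, assuming NumPy and the factor ordering of (9.66) and (9.67); the function names and the numerical cavity dimensions are hypothetical.

```python
import numpy as np

def propagate(d):
    """ABCD matrix for free propagation through a distance d."""
    return np.array([[1.0, d], [0.0, 1.0]])

def mirror(R):
    """ABCD matrix for a concave mirror of radius R (R > 0), as used in (9.66)."""
    return np.array([[1.0, 0.0], [-2.0 / R, 1.0]])

def thin_lens(f):
    """ABCD matrix for a thin lens of focal length f."""
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

def round_trip_two_mirror(L, R1, R2):
    """Round-trip matrix for the two-mirror cavity, factors ordered as in (9.66)."""
    return propagate(L) @ mirror(R2) @ propagate(L) @ mirror(R1)

def round_trip_lens_cavity(L1, L2, f):
    """Round-trip matrix for the lens cavity, factors ordered as in (9.67)."""
    return propagate(2.0 * L1) @ thin_lens(f) @ propagate(2.0 * L2) @ thin_lens(f)

# Assumed cavity dimensions in meters (not from the text).
M_mirrors = round_trip_two_mirror(L=0.5, R1=1.0, R2=2.0)
M_lens = round_trip_lens_cavity(L1=0.3, L2=0.4, f=0.5)

print(M_mirrors, np.linalg.det(M_mirrors))
print(M_lens, np.linalg.det(M_lens))
```

Both determinants come out equal to one, as expected for a round trip that begins and ends in the same index of refraction.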
To determine whether a given configuration of a cavity is stable, we need to know what a ray does after making many round trips in the cavity. To find the effect of propagation through many round trips, we multiply the round-trip ABCD matrix together $N$ times, where $N$ is the number of round trips that we wish to consider. We can then examine what happens to an arbitrary ray after making $N$ round trips in the cavity as follows:

$$\begin{bmatrix} y_{N+1} \\ \theta_{N+1} \end{bmatrix} = \begin{bmatrix} A & B \\ C & D \end{bmatrix}^N \begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix} \tag{9.68}$$

At this point you might be concerned that taking an ABCD matrix to the $N$th power can be a lot of work. (It is already a significant amount of work just to compute the ABCD matrix for a single round trip.) In addition, we are interested in letting $N$ be very large, perhaps even infinity. You can relax because we have a neat trick to accomplish this daunting task. By Sylvester's theorem in appendix 0.3, we have

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^N = \frac{1}{\sin\theta}\begin{bmatrix} A\sin N\theta - \sin(N-1)\theta & B\sin N\theta \\ C\sin N\theta & D\sin N\theta - \sin(N-1)\theta \end{bmatrix} \tag{9.69}$$

where

$$\cos\theta = \frac{1}{2}(A + D). \tag{9.70}$$

This is valid as long as the determinant of the ABCD matrix is one. As noted earlier (see (9.65)), we are in luck! The determinant is one any time a ray begins and stops in the same refractive index, which by definition is guaranteed for any round trip. We therefore can employ Sylvester's theorem for any $N$ that we might choose, including very large integers.

We would like the elements of (9.69) to remain finite as $N$ becomes very large. If this is the case, then we know that a ray remains trapped within the cavity and stays reasonably close to the optical axis. Since $N$ only appears within the argument of a sine function, which is always bounded between $-1$ and $1$ for real arguments, it might seem that the elements of (9.69) always remain finite as $N$ approaches infinity. However, it turns out that $\theta$ can become imaginary depending on the outcome of (9.70), in which case the sine becomes a hyperbolic sine, which can blow up as $N$ becomes large. In the end, the condition for cavity stability is that a real $\theta$ must exist for (9.70), or in other words we need

$$-1 < \frac{1}{2}(A + D) < 1 \qquad \text{(condition for a stable cavity)} \tag{9.71}$$

It is left as an exercise to apply this condition to (9.66) and (9.67) to find the necessary relationships between the various element curvatures and spacing in order to achieve cavity stability.
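To see Sylvester's theorem and the stability condition at work, one can compare the closed form (9.69) with brute-force matrix powers. This is a sketch under stated assumptions: NumPy, a symmetric two-mirror cavity with made-up dimensions, and hypothetical helper names. The closed form as coded applies only when $\theta$ is real and $\sin\theta \neq 0$.

```python
import numpy as np

def round_trip(L, R1, R2):
    """Two-mirror round-trip matrix with factors ordered as in (9.66)."""
    P = np.array([[1.0, L], [0.0, 1.0]])
    M1 = np.array([[1.0, 0.0], [-2.0 / R1, 1.0]])
    M2 = np.array([[1.0, 0.0], [-2.0 / R2, 1.0]])
    return P @ M2 @ P @ M1

def is_stable(abcd):
    """Stability condition (9.71): -1 < (A + D)/2 < 1."""
    return abs(0.5 * (abcd[0, 0] + abcd[1, 1])) < 1.0

def sylvester_power(abcd, N):
    """N-th power of a unit-determinant matrix via (9.69); requires real theta."""
    (A, B), (C, D) = abcd
    theta = np.arccos(0.5 * (A + D))
    s_N, s_Nm1 = np.sin(N * theta), np.sin((N - 1) * theta)
    return np.array([[A * s_N - s_Nm1, B * s_N],
                     [C * s_N, D * s_N - s_Nm1]]) / np.sin(theta)

# Assumed cavities (meters): same mirrors, two different separations.
stable = round_trip(L=0.5, R1=1.0, R2=1.0)
unstable = round_trip(L=2.5, R1=1.0, R2=1.0)

N = 25
print(is_stable(stable), is_stable(unstable))
print(sylvester_power(stable, N))
print(np.linalg.matrix_power(stable, N))               # agrees with the line above
print(np.abs(np.linalg.matrix_power(unstable, N)).max())  # grows rapidly with N
```

For the stable case the matrix elements stay of order one however large $N$ becomes, while for the unstable case they grow geometrically.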
Figure 9.17 (a) Paraxial theory predicts that the light imaged from a point source will converge to a point (i.e. have spherical wave fronts coming to the image point). (b) The image of a point source made by a real lens with aberrations is an extended and blurred patch of light, and the converging wave fronts are only quasi-spherical.
The paraxial approximation places serious limitations on the performance of optical systems (see (9.23) and (9.24)). To stay within the approximation, all rays traveling in the system should travel very close to the optical axis and make very shallow angles with respect to it. To the extent that this is not the case, the collection of rays associated with a single point on an object may not converge to a single point on the associated image. The resulting distortion or blurring of the image is known as aberration.

Common experience with photographic and video equipment suggests that it is possible to image scenes that have a relatively wide angular extent (many tens of degrees), in apparent serious violation of the paraxial approximation. The paraxial approximation is indeed violated in these devices, so they must be designed using more complicated analysis techniques than those we have learned in this chapter. The most common approach is to use a computationally intensive procedure called ray tracing in which $\sin\theta$ and $\tan\theta$ are rendered exactly. The nonlinearity of these functions precludes the possibility of obtaining analytic solutions describing the imaging performance of such optical systems.

The typical procedure is to start with a collection of rays from a test point such as shown in Fig. 9.18. Each ray is individually traced through the system using the exact representation of geometric surfaces as well as the exact representation of Snell's law. On close analysis, the rays typically do not converge to a distinct imaging point. Rather, the rays can be blurred out over a range of points where the image is supposed to occur. Depending on the angular distribution of the rays as well as on the elements in the setup, the spread of rays around the image point can be large or small. The engineer who designs the system must determine whether the amount of aberration is acceptable, given the various constraints of the device.
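A toy version of exact ray tracing conveys the idea. The sketch below traces rays that arrive parallel to the axis onto a single convex spherical surface (air to glass) using exact geometry and Snell's law, and compares the axis-crossing point with the paraxial focus $nR/(n-1)$. The geometry, function names, and numbers are assumptions chosen for illustration; this is not how commercial design software is organized.

```python
import numpy as np

def axis_crossing(h, R, n):
    """Distance from the surface vertex at which a ray crosses the axis.

    The ray arrives parallel to the axis at height h and refracts from air
    into glass of index n at a convex spherical surface of radius R.
    """
    theta_i = np.arcsin(h / R)                 # exact angle of incidence
    theta_t = np.arcsin(np.sin(theta_i) / n)   # exact Snell's law
    sag = R - np.sqrt(R**2 - h**2)             # axial position where the ray hits
    return sag + h / np.tan(theta_i - theta_t)

def paraxial_focus(R, n):
    """Paraxial prediction for the same surface."""
    return n * R / (n - 1.0)

R, n = 1.0, 1.5   # assumed values
for h in (1e-4, 0.1, 0.3, 0.5):
    print(f"h = {h:6.4f}  crossing = {axis_crossing(h, R, n):.5f}  "
          f"paraxial = {paraxial_focus(R, n):.5f}")
```

Rays near the axis cross at the paraxial focus, while rays at larger heights cross progressively closer to the surface, which is the spread of rays described above.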
To reduce aberrations below typical tolerance levels, several lenses can be used together. If properly chosen, the lenses (some positive, some negative), separated by specific distances, can result in remarkably low aberration levels over certain ranges of operation for the device. Ray tracing is best done with commercial software designed for this purpose (e.g. Zemax or other professional products). Such software packages are able to develop and optimize designs for specific applications. A nice feature is that the user can specify that the design should employ only standard optical components available from known optics companies. In any case, it is typical to specify that all lenses in the system should have spherical surfaces since these are much less expensive to manufacture.

We mention briefly a few types of aberrations that you may encounter. Multiple aberrations can often be observed in a single lens. Chromatic aberration arises from the fact that the index of refraction for glass varies with the wavelength of light. Since the focal length of a lens depends on the index of refraction (see, for example, Eq. (9.46)), the focal length of a lens varies with the wavelength of light. Chromatic aberration can be compensated for by using a pair of lenses made from two types of glass as shown in Fig. 9.19 (the pair is usually cemented together to form a doublet lens). The lens with the shortest focal length is made of the glass whose index has the lesser dependence on wavelength. By properly choosing the prescription of the two lenses, you can exactly compensate for chromatic aberration at two wavelengths and do a good job for a wide range of others. Achromatic doublets can also be designed to minimize spherical aberration (see below), so they are often a good choice when you need a high-quality lens.

Monochromatic aberrations arise from the shape of the lens rather than the variation of $n$ with wavelength. Before computers facilitated the widespread use of ray tracing, these aberrations had to be analyzed primarily with analytic techniques. The analytic results derived previously in this chapter were based on first-order approximations (e.g. $\sin\theta \approx \theta$). This analysis predicts that a lens can image a point source to an exact image point, which implies spherically converging wave fronts at the image point as shown in Fig. 9.17(a). You can increase the accuracy of the theory for non-paraxial rays by retaining second-order correction terms in the analysis. With these second-order terms included, the wave fronts converging towards an image point are mostly spherical, but have second-order aberration terms added in (shown conceptually in Fig. 9.17(b)). There are five aberration terms in this second-order analysis, and these represent a convenient basis for discussing aberration.

The first aberration term is known as spherical aberration. This type of aberration results from the fact that rays traveling through a spherical lens at large radii experience a different focal length than those traveling near the axis. For a converging lens, this causes wide-radius rays to focus before the near-axis rays as shown in Fig. 9.20.
This problem can be helped by orienting lenses so that the face with the least curvature is pointed towards the side where the light rays have the largest angle. This procedure splits the bending of rays more evenly between the front and back surfaces of the lens. As mentioned above, you can also cement two lenses made from different types of glass together so that spherical aberrations from one lens are corrected by the other.

Figure 9.19 Chromatic aberration causes lenses to have different focal lengths for different wavelengths. It can be corrected using an achromatic doublet lens.

The aberration term referred to as astigmatism occurs when an off-axis object point is imaged to an off-axis image point. In this case a spherical lens has a different focal length in the horizontal and vertical dimensions. For a focusing lens this causes the two dimensions to focus at different distances, producing a vertical line at one image plane and a horizontal line at another. A lens can also be inherently astigmatic even when viewed on axis if it is football shaped rather than spherical. In this case, the astigmatic aberration can be corrected by inserting a cylindrical lens at the correct orientation (this is a common correction needed in eyeglasses).

A third aberration term is referred to as coma. This is observed when off-axis points are imaged and produces a comet-shaped tail with its head at the point predicted by paraxial theory. (The term coma refers to the atmosphere of a comet, which is how the aberration got its name.) This aberration is distinct from astigmatism, which is also observed for off-axis points, since coma is observed even when all of the rays are in one plane (see Fig. 9.21). You have probably seen coma if you've ever played with a magnifying glass in the sun: just tilt the lens slightly and you see a comet-like image rather than a point.

Figure 9.21 Illustration of coma. Rays traveling through the center of the lens are imaged to point a, as predicted by paraxial theory. Rays that travel through the lens at radius $\rho_b$ in the plane of the figure are imaged to point b. Rays that travel through the lens at radius $\rho_b$, but outside the plane of the figure, are imaged to other points on the circle (in the image plane) containing point b. Rays that travel through the lens at other radii (e.g. $\rho_c$) also form circles in the image plane, with radius proportional to $\rho^2$ and with the center offset from point a by a distance proportional to $\rho^2$. When light from each of these circles combines on the screen it produces an imaged point with a comet tail.

The curvature of the field aberration term arises from the fact that spherical lenses image spherical surfaces to another spherical surface, rather than imaging a plane to a plane. This is not so bad for your eyeball, which has a curved screen, but for things like cameras and movie projectors we would like to image to a flat screen. When a flat screen is used and the curvature of the field aberration is present, the image will be focused well near the center, but become progressively out of focus as you move to the edge of the screen (i.e. the flat screen is farther from the curved image surface as you move from the center).

The final aberration term is referred to as distortion. This aberration occurs when the magnification of a lens depends on the distance from the center of the screen. If magnification decreases as the distance from the center increases, then barrel distortion is observed. When magnification increases with distance, pincushion distortion is observed (see Fig. 9.22).

Figure 9.22 Distortion occurs when magnification is not constant across an extended image. (Panels: undistorted, barrel distortion, pincushion distortion.)

All lenses will exhibit some combination of the aberrations listed above (i.e. chromatic aberration plus the five second-order aberration terms). In addition to the five named monochromatic aberrations, there are many other higher-order aberrations that also have to be considered. Aberrations can be corrected to a high degree with multiple-element systems (designed using ray-tracing techniques) composed of lenses and irises to eliminate off-axis light. For example, a camera lens with a focal length of 50 mm, one of the simplest lenses in photography, is typically composed of about six individual elements. However, optical systems never completely eliminate all aberration, so designing a system always involves some degree of compromise in choosing which aberrations to minimize and which ones you can live with.
Exercises
Exercises for 9.1 The Eikonal Equation P9.1 Consider the index described in Example 9.1. The solution given in the example corresponds to rays that asymptotically approach y = 0. A more general solution is given by 1+ y R = n 0 x y 2 /h 2 1 + > 0 and y 2 /h 2 > 0
This corresponds to rays that either hit the ground or return toward the sky without reaching the ground, depending on the sign of . (a) Verify that R satises the eikonal equation and determine the function R x , y . HINT: d 2 =
2
2 2 ln +
x x 0 h 1+
( > 0).
x x 0 h 1+
when
> 0 and is given by y = h || sinh when < 0. Consider only the region y > 0 (i.e. above ground). Notice that these solutions can make rays that travel either to the right or to the left. d d HINT: cosh2 sinh2 = 1 d cosh = sinh d sinh = cosh . (c) Make a sketch of these two solution classes in the case of = 4. P9.2 Prove that under the approximation of very short wavelength, the . Poynting vector is directed along R (r) or s
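Solution: (partial) First, from Faraday's law (1.36) we have

$$\mathbf{B}(\mathbf{r},t) = -\frac{i}{\omega}\nabla\times\left[\mathbf{E}_0(\mathbf{r})\,e^{i(k_{\rm vac}R(\mathbf{r})-\omega t)}\right]$$

Applying the identity $\nabla\times(\psi\mathbf{a}) = (\nabla\psi)\times\mathbf{a} + \psi\,\nabla\times\mathbf{a}$ to this equation, we obtain:

$$\mathbf{B}(\mathbf{r},t) = -\frac{i}{\omega}\left\{e^{i(k_{\rm vac}R(\mathbf{r})-\omega t)}\left[\nabla\times\mathbf{E}_0(\mathbf{r})\right] + ik_{\rm vac}\,e^{i(k_{\rm vac}R(\mathbf{r})-\omega t)}\left[\nabla R(\mathbf{r})\times\mathbf{E}_0(\mathbf{r})\right]\right\}$$
$$= -\frac{i\lambda_{\rm vac}}{2\pi c}\,e^{i[k_{\rm vac}R(\mathbf{r})-\omega t]}\left[\nabla\times\mathbf{E}_0(\mathbf{r})\right] + \frac{1}{c}\,e^{i[k_{\rm vac}R(\mathbf{r})-\omega t]}\left[\nabla R(\mathbf{r})\times\mathbf{E}_0(\mathbf{r})\right]$$

The first term vanishes in the limit of very short wavelength, and we have:

$$\mathbf{B}(\mathbf{r},t) \approx \frac{1}{c}\left[\nabla R(\mathbf{r})\right]\times\mathbf{E}_0(\mathbf{r})\,e^{i[k_{\rm vac}R(\mathbf{r})-\omega t]} \tag{9.72}$$

Next, from Gauss's law (1.34) and the constitutive relation (2.16) we have

$$\nabla\cdot\left\{\left[1+\chi(\mathbf{r})\right]\mathbf{E}_0(\mathbf{r})\,e^{i(k_{\rm vac}R(\mathbf{r})-\omega t)}\right\} = 0$$

Applying the identity $\nabla\cdot(\psi\mathbf{a}) = \mathbf{a}\cdot\nabla\psi + \psi\,\nabla\cdot\mathbf{a}$ to this expression yields:

$$e^{i(k_{\rm vac}R(\mathbf{r})-\omega t)}\,\nabla\cdot\left\{\left[1+\chi(\mathbf{r})\right]\mathbf{E}_0(\mathbf{r})\right\} + ik_{\rm vac}\,e^{i(k_{\rm vac}R(\mathbf{r})-\omega t)}\left[1+\chi(\mathbf{r})\right]\left[\nabla R(\mathbf{r})\cdot\mathbf{E}_0(\mathbf{r})\right] = 0$$

Canceling the common exponential term, using $k_{\rm vac} = 2\pi/\lambda_{\rm vac}$, and some algebra then gives

$$-\frac{i\lambda_{\rm vac}}{2\pi}\,\frac{\nabla\cdot\left\{\left[1+\chi(\mathbf{r})\right]\mathbf{E}_0(\mathbf{r})\right\}}{1+\chi(\mathbf{r})} + \nabla R(\mathbf{r})\cdot\mathbf{E}_0(\mathbf{r}) = 0$$

In the limit of very short wavelength, this becomes

$$\nabla R(\mathbf{r})\cdot\mathbf{E}_0(\mathbf{r}) \approx 0 \tag{9.73}$$

Finally, compute the time average of the Poynting vector

$$\langle\mathbf{S}\rangle_t = \left\langle\frac{1}{\mu_0}\,{\rm Re}\{\mathbf{E}(\mathbf{r},t)\}\times{\rm Re}\{\mathbf{B}(\mathbf{r},t)\}\right\rangle_t = \frac{1}{4\mu_0}\left\langle\left[\mathbf{E}(\mathbf{r},t)+\mathbf{E}^*(\mathbf{r},t)\right]\times\left[\mathbf{B}(\mathbf{r},t)+\mathbf{B}^*(\mathbf{r},t)\right]\right\rangle_t$$

You will need to employ expressions (9.72) and (9.73), as well as the BAC-CAB rule (see P0.3).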
P9.3 Use Fermat's Principle to derive the law of reflection (3.6) for a reflective surface. HINT: Do not consider light that goes directly from A to B; require a single bounce.

P9.4 Show that Fermat's Principle fails to give the correct path for an extraordinary ray entering a uniaxial crystal whose optic axis is perpendicular to the surface. HINT: With the index given by (5.29), show that Fermat's principle leads to an answer that neither agrees with the direction of the k-vector (5.32) nor with the direction of the Poynting vector (5.40).

Figure 9.23

Exercises for 9.4 Reflection and Refraction at Curved Surfaces

P9.5 Derive the ABCD matrix that takes a ray on a round trip through a simple laser cavity consisting of a flat mirror and a concave mirror of radius $R$ separated by a distance $L$. HINT: Start at the flat mirror. Use the matrix in (9.28) to travel a distance $L$. Use the matrix in (9.38) to represent reflection from the curved mirror. Then use the matrix in (9.28) to return to the flat mirror. The matrix for reflection from the flat mirror is the identity matrix (i.e. $R_{\rm flat} \to \infty$).
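P9.6 Derive the ABCD matrix for a thick lens made of material $n_2$ surrounded by a liquid of index $n_1$. Let the lens have curvatures $R_1$ and $R_2$ and thickness $d$.

Answer:

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 + \dfrac{d}{R_1}\left(\dfrac{n_1}{n_2} - 1\right) & d\,\dfrac{n_1}{n_2} \\[2ex] -\left(\dfrac{n_2}{n_1} - 1\right)\left(\dfrac{1}{R_1} - \dfrac{1}{R_2}\right) + \dfrac{d}{R_1 R_2}\left(2 - \dfrac{n_1}{n_2} - \dfrac{n_2}{n_1}\right) & 1 - \dfrac{d}{R_2}\left(\dfrac{n_1}{n_2} - 1\right) \end{bmatrix}$$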
Exercises for 9.6 Image Formation

P9.7 (a) Show that the ABCD matrix for a thick lens (see P9.6) reduces to that of a thin lens (9.45) when the thickness goes to zero. Take the index outside of the lens to be $n_1 = 1$. (b) Find the ABCD matrix for a thick window (thickness $d$). Take the index outside of the window to be $n_1 = 1$. HINT: A window is a thick lens with infinite radii of curvature.

P9.8 An object is placed in front of a concave mirror. Find the location of the image $d_i$ and magnification $M$ when $d_o = R$, $d_o = R/2$, $d_o = R/4$, and $d_o = -R/2$ (virtual object). Make a diagram for each situation, depicting rays traveling from a single off-axis point on the object to a corresponding point on the image. You may want to emphasize especially the ray that initially travels parallel to the axis and the ray that initially travels in a direction intersecting the axis at the focal point $R/2$.

P9.9 Perform an analysis similar to Example 9.8 for the virtual image formed by the positive lens in Fig. 9.24.

P9.10 Perform an analysis similar to Example 9.8 for the virtual image formed by the negative lens in Fig. 9.25.

Figure 9.25 Formation of a virtual image by a thin lens with negative focal length.
Exercises for 9.7 Principal Planes for Complex Optical Systems

P9.11 A complicated lens element is represented by an ABCD matrix. An object placed a distance $d_1$ before the unknown element causes an image to appear a distance $d_2$ after the unknown element. Suppose that when $d_1 = \ell$, we find that $d_2 = 2\ell$. Also, suppose that when $d_1 = 2\ell$, we find that $d_2 = 3\ell/2$ with magnification $-1/2$. What is the ABCD matrix for the unknown element? HINT: Use the conditions for an image (9.58) and (9.59). If the index of refraction is the same before and after, then (9.65) applies. HINT: First find linear expressions for $A$, $B$, and $C$ in terms of $D$. Then put the results into (9.65).

Figure 9.26
P9.12 (a) Consider a lens with thickness $d = 5$ cm, $R_1 = 5$ cm, $R_2 = 10$ cm, $n = 1.5$. Compute the ABCD matrix of the lens. HINT: See P9.6. (b) Where are the principal planes located and what is the effective focal length $f_{\rm eff}$ for this system?

Figure 9.27
L9.13 Deduce the positions of the principal planes and the effective focal length of a compound lens system. Reference the positions of the principal planes to the outside ends of the metal hardware that encloses the lens assembly. (video) HINT: Obtain three sets of distances to the object and image planes and place the data into (9.58) to create three distinct equations for the unknowns $A$, $B$, $C$, and $D$. Find $A$, $B$, and $C$ in terms of $D$ and place the results into (9.65) to obtain the values for $A$, $B$, $C$, and $D$. The effective focal length and principal planes can then be found through (9.61)-(9.63).

Figure 9.28

P9.14 Use a computer program to calculate the ABCD matrix for the compound system shown in Fig. 9.29, known as the Tessar lens. The details of this lens are as follows (all distances are in the same units, and only the magnitudes of the curvatures are given; you decide the sign): Convex-convex lens 1 (thickness 0.357, $R_1 = 1.628$, $R_2 = 27.57$, $n = 1.6116$) is separated by 0.189 from concave-concave lens 2 (thickness 0.081, $R_1 = 3.457$, $R_2 = 1.582$, $n = 1.6053$), which is separated by 0.325 from plano-concave lens 3 (thickness 0.217, $R_1 = \infty$, $R_2 = 1.920$, $n = 1.5123$), which is directly followed by convex-convex lens 4 (thickness 0.396, $R_1 = 1.920$, $R_2 = 2.400$, $n = 1.6116$). HINT: You can reduce the number of matrices you need to multiply by using the thick lens matrix.

Figure 9.29
Exercises for 9.8 Stability of Laser Cavities

P9.15 (a) Show that the cavity depicted in Fig. 9.16c is stable if

$$0 < \left(1 - \frac{L}{R_1}\right)\left(1 - \frac{L}{R_2}\right) < 1$$

(b) The two concave mirrors have radii $R_1 = 60$ cm and $R_2 = 100$ cm. Over what range of mirror separation $L$ is it possible to form a stable laser cavity? HINT: There are two different stable ranges with an unstable range between them.

P9.16 Find the stable ranges for $L_1 = L_2 = L$ for the laser cavity depicted in Fig. 9.16d with focal length $f = 50$ cm.

L9.17 Experimentally determine the stability range of a HeNe laser with adjustable end mirrors. Check that this agrees reasonably well with theory. Can you think of reasons for any discrepancy? (video)

Figure 9.30
Chapter 10
Diffraction
In the 1600s, Christian Huygens developed a wave description for light. Unfortunately, his ideas were largely overlooked at the time because Sir Isaac Newton promoted a competing theory. Newton proposed that light should be thought of as many tiny bullets, or corpuscles, as he called them. Newton's ideas prevailed for more than a century, perhaps because he was right on so many other things, until 1807 when Thomas Young performed his famous two-slit experiment, conclusively demonstrating the wave nature of light. Even then, Young's conclusions were accepted only gradually by others, a notable exception being a young Frenchman named Augustin Fresnel. The two formed a close friendship through correspondence, and it was Fresnel who followed up on Young's conclusions and dedicated his life to a study of light.

Fresnel's skill as a mathematician allowed him to transform physical intuition into powerful and concise ideas. Perhaps Fresnel's greatest accomplishment was the adaptation of Huygens' principle of wavelet superposition into a mathematical formula. Ironically, he used Newton's calculus to achieve this. Huygens' principle asserts that a wave front can be thought of as many wavelets, which propagate and interfere to form new wave fronts. This is illustrated in Fig. 10.1. The phenomenon of diffraction is then understood as the spilling of wavelets around obstructions in the path of light.

After formulating Huygens' principle as a diffraction integral, Fresnel made an approximation to his own formula, called the Fresnel approximation, for the sake of making the integration easier to perform. As far as approximations go, the Fresnel approximation is surprisingly accurate in describing the light field in the region downstream from an aperture. The diffraction pattern can evolve in complicated ways as the distance from an aperture increases.
At distances far downstream from an aperture, the diffraction pattern acquires a final form that no longer evolves, other than to grow in proportion to distance. This far-field limit is often of interest, and it turns out that the Fresnel diffraction formula can be simplified further in this case. The far-field limit of the Fresnel diffraction formula is called the Fraunhofer approximation.

From the modern perspective, Fresnel's diffraction formula needs justification starting from Maxwell's equations. The diffraction formula is based on scalar diffraction theory, which ignores polarization effects. In some situations, ignoring polarization is benign, but in other situations ignoring polarization effects produces significant errors. These issues as well as the approximations leading to scalar diffraction theory are discussed in section 10.2.
$$E(x,y,z) = -\frac{i}{\lambda}\iint\limits_{\rm aperture} E(x',y',0)\,\frac{e^{ikR}}{R}\,dx'\,dy' \tag{10.1}$$

where

$$R = \sqrt{(x-x')^2 + (y-y')^2 + z^2} \tag{10.2}$$

is the radius of each wavelet as it individually intersects the point $(x, y, z)$. The factor $-i/\lambda$ in front of the integral in (10.1) ensures the right phase and field strength (not to mention units). Justification for this factor is given in section 10.3 and in appendix 10.A. To summarize, (10.1) tells us how to compute the field downstream given knowledge of the field in an aperture. The field at each point $(x', y')$ in the aperture, which may vary in strength and phase, is treated as the source for a spherical wave. The integral in (10.1) sums the contributions from all of these wavelets.

Figure 10.2

1 For simplicity, we use the term spherical wave in this book to refer to waves of the type imagined by Huygens (i.e. of the form $e^{ikR}/R$). There is a different family of waves based on spherical harmonics that are also sometimes referred to as spherical waves. These waves have angular as well as radial dependence, and they are solutions to Maxwell's equations. See J. D. Jackson, Classical Electrodynamics, 3rd ed., pp. 429-432 (New York: John Wiley, 1999).
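Example 10.1
Find the on-axis$^{2}$ (i.e. $x, y = 0$) intensity following a circular aperture of diameter $\ell$ illuminated by a uniform plane wave.

Solution: The diffraction integral (10.1) takes the form

$$E(0,0,z) = -\frac{i}{\lambda}\iint\limits_{\rm aperture} E(x',y',0)\,\frac{e^{ik\sqrt{x'^2+y'^2+z^2}}}{\sqrt{x'^2+y'^2+z^2}}\,dx'\,dy'$$

The circular hole encourages a change to cylindrical coordinates: $x' = \rho'\cos\phi'$ and $y' = \rho'\sin\phi'$; $dx'\,dy' \to \rho'\,d\rho'\,d\phi'$. In this case, the limits of integration define the geometry of the aperture, and the integration is accomplished as follows:

$$E(0,0,z) = -\frac{iE_0}{\lambda}\int_0^{2\pi}d\phi'\int_0^{\ell/2}\frac{e^{ik\sqrt{\rho'^2+z^2}}}{\sqrt{\rho'^2+z^2}}\,\rho'\,d\rho' = -E_0\,e^{ik\sqrt{\rho'^2+z^2}}\Big|_0^{\ell/2} = -E_0\left[e^{ik\sqrt{(\ell/2)^2+z^2}} - e^{ikz}\right]$$

The on-axis intensity becomes

$$I(0,0,z) \propto |E_0|^2\left[e^{ik\sqrt{(\ell/2)^2+z^2}} - e^{ikz}\right]\left[e^{-ik\sqrt{(\ell/2)^2+z^2}} - e^{-ikz}\right] = 2|E_0|^2\left[1 - \cos\left(k\sqrt{(\ell/2)^2+z^2} - kz\right)\right] \tag{10.3}$$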
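The closed-form result of Example 10.1 can be cross-checked by evaluating the radial integral numerically. The following is a rough sketch, assuming NumPy, a simple trapezoid rule, and made-up values for the wavelength, aperture diameter, and distance; the function names are hypothetical.

```python
import numpy as np

def on_axis_field_closed_form(z, ell, wavelength, E0=1.0):
    """On-axis field behind a circular aperture of diameter ell (Example 10.1)."""
    k = 2.0 * np.pi / wavelength
    return -E0 * (np.exp(1j * k * np.sqrt((ell / 2) ** 2 + z ** 2)) - np.exp(1j * k * z))

def on_axis_field_numeric(z, ell, wavelength, E0=1.0, samples=20001):
    """Same field from (10.1) in cylindrical coordinates, by the trapezoid rule."""
    k = 2.0 * np.pi / wavelength
    rho = np.linspace(0.0, ell / 2, samples)
    R = np.sqrt(rho ** 2 + z ** 2)
    integrand = np.exp(1j * k * R) / R * rho
    d_rho = rho[1] - rho[0]
    radial = d_rho * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
    return -1j / wavelength * E0 * 2.0 * np.pi * radial   # the phi' integral gives 2*pi

def on_axis_intensity(z, ell, wavelength, E0=1.0):
    """Equation (10.3), in units where the proportionality constant is one."""
    k = 2.0 * np.pi / wavelength
    return 2.0 * abs(E0) ** 2 * (1.0 - np.cos(k * np.sqrt((ell / 2) ** 2 + z ** 2) - k * z))

# Assumed values: 500 nm light, 1 mm aperture, observation point 0.5 m downstream.
z, ell, wavelength = 0.5, 1.0e-3, 500e-9
print(on_axis_field_closed_form(z, ell, wavelength))
print(on_axis_field_numeric(z, ell, wavelength))
print(on_axis_intensity(z, ell, wavelength))
```

With these particular numbers the phase difference between the edge and center wavelets is very nearly $\pi$, so the on-axis intensity is close to four times that of the incident plane wave.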
When an aperture has a complicated shape, it may be convenient to break up the diffraction integral (10.1) into several pieces. You are probably already used to doing this sort of piecewise approach to integration in other settings. It seems hardly worth giving a name to this technique, but it is called Babinet's principle; perhaps in Babinet's day people were not as comfortable with calculus.

As an example of how to use Babinet's principle, suppose that we have an aperture that consists of a circular obstruction within a square opening as depicted in Fig. 10.4. Thus, the light transmits through the region between the circle and the square. One can evaluate the overall diffraction pattern by first evaluating the diffraction integral for the entire square (ignoring the circular block) and then subtracting the diffraction integral for a circular opening having the shape of the block. This removes the unwanted part of the previous integration and yields the overall result. It is important to add and subtract the integrals (i.e. fields), not their squares (i.e. intensity).

Figure 10.4 Aperture comprised of the region between a circle and a square.

As trivial as Babinet's principle may seem to you, it may not be obvious at first that Babinet's principle also applies to an infinitely wide plane wave that is interrupted by finite obstructions. In this case, one computes the diffraction of the blocked portions of the field as though these portions were openings in a mask. This result is then subtracted from the plane wave (no integration needed for the plane), as depicted in Fig. 10.5.

Figure 10.5 A block in a plane wave giving rise to diffraction in the geometric shadow.

When Fresnel first presented his diffraction formula to the French Academy of Sciences, a certain judge of scientific papers named Siméon Poisson noticed that Fresnel's formula predicted that there should be light in the center of the geometric shadow behind a circular obstruction. This seemed so absurd to Poisson that he initially disbelieved the theory, until the spot was shortly thereafter experimentally confirmed, much to Poisson's chagrin. Needless to say, Fresnel's paper was then awarded first prize, and this spot appearing behind circular blocks has since been known as Poisson's spot.

2 An analytical solution is not possible off axis.

Example 10.2
Find the on-axis (i.e. $x, y = 0$) intensity behind a circular block of diameter $\ell$ placed in a uniform plane wave.

Solution: From Example 10.1, the on-axis field behind a circular aperture is $E_0 e^{ikz} - E_0 e^{ik\sqrt{(\ell/2)^2+z^2}}$. Babinet's principle says to subtract this result from a plane wave to obtain the field behind the circular block. The situation is depicted in Fig. 10.5 (side view). The on-axis field is then

$$E(0,0,z) = E_0 e^{ikz} - \left[E_0 e^{ikz} - E_0 e^{ik\sqrt{(\ell/2)^2+z^2}}\right] = E_0\,e^{ik\sqrt{(\ell/2)^2+z^2}}$$

The on-axis intensity becomes

$$I(0,0,z) \propto E(0,0,z)\,E^*(0,0,z) = |E_0|^2\,e^{ik\sqrt{(\ell/2)^2+z^2}}\,e^{-ik\sqrt{(\ell/2)^2+z^2}} = |E_0|^2$$

This result says that, in the exact center of the shadow behind a circular obstruction, the intensity is the same as that of the illuminating plane wave for all distances $z$. A spot of light in the center forms right away; no wonder Poisson was astonished!
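The Babinet subtraction in Example 10.2 is easy to reproduce numerically, and doing so also shows why fields rather than intensities must be subtracted. This is a small sketch assuming NumPy and illustrative values for the block diameter and wavelength; the function names are hypothetical.

```python
import numpy as np

def aperture_field(z, ell, wavelength, E0=1.0):
    """On-axis field behind a circular aperture of diameter ell (Example 10.1)."""
    k = 2.0 * np.pi / wavelength
    return E0 * np.exp(1j * k * z) - E0 * np.exp(1j * k * np.sqrt((ell / 2) ** 2 + z ** 2))

def block_field(z, ell, wavelength, E0=1.0):
    """Babinet's principle: plane wave minus the field of the complementary aperture."""
    k = 2.0 * np.pi / wavelength
    return E0 * np.exp(1j * k * z) - aperture_field(z, ell, wavelength, E0)

# Assumed values: 1 mm block, 500 nm light, distances from 1 cm to 2 m.
z = np.linspace(0.01, 2.0, 50)
ell, wavelength = 1.0e-3, 500e-9

spot_intensity = np.abs(block_field(z, ell, wavelength)) ** 2
wrong_intensity = 1.0 - np.abs(aperture_field(z, ell, wavelength)) ** 2  # subtracting intensities

print(spot_intensity.min(), spot_intensity.max())    # both equal to |E0|^2 = 1
print(wrong_intensity.min(), wrong_intensity.max())  # varies with z, even goes negative
```

The correctly computed on-axis intensity equals $|E_0|^2$ at every distance sampled, whereas subtracting intensities gives a result that oscillates with $z$ and can even be negative.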
where $k \equiv n\omega/c$ is the magnitude of the usual wave vector (see also (9.2)). Equation (10.4) is called the Helmholtz equation. Again, it is merely the wave equation written for the case of a single frequency, where the trivial time dependence has been removed. To obtain the full wave solution, just append the factor $e^{-i\omega t}$ to the solution of the Helmholtz equation $\mathbf{E}(\mathbf{r})$.

At this point we take an egregious step: We ignore the vectorial nature of $\mathbf{E}(\mathbf{r})$ and write (10.4) using only the magnitude $E(\mathbf{r})$. When using scalar diffraction theory, we must keep in mind that it is based on this serious step. Under the scalar approximation, the vector Helmholtz equation (10.4) becomes the scalar Helmholtz equation:

$$\nabla^2 E(\mathbf{r}) + k^2 E(\mathbf{r}) = 0 \tag{10.5}$$

This equation of course is consistent with (10.4) in the case of a plane wave. However, we are interested in spherical waves of the form $E(r) = E_0 r_0 e^{ikr}/r$. It turns out that such spherical waves are exact solutions to the scalar Helmholtz equation (10.5). The proof is left as an exercise (see P10.3). Nevertheless, spherical waves of this form only approximately satisfy the vector Helmholtz equation (10.4). We can get away with this sleight of hand if the radius $r$ is large compared to a wavelength (i.e., $kr \gg 1$) and if we restrict $\mathbf{r}$ to a narrow range perpendicular to the polarization.

Significance of the Scalar Wave Approximation
The solution of the scalar Helmholtz equation is not completely unassociated with the solution to the vector Helmholtz equation. In fact, if $E_{\rm scalar}(\mathbf{r})$ obeys the scalar Helmholtz equation (10.5), then

$$\mathbf{E}(\mathbf{r}) = \mathbf{r} \times \nabla E_{\rm scalar}(\mathbf{r}) \tag{10.6}$$

obeys the vector Helmholtz equation (10.4). Consider a spherical wave, which is a solution to the scalar Helmholtz equation:

$$E_{\rm scalar}(\mathbf{r}) = E_0 r_0 \frac{e^{ikr}}{r} \tag{10.7}$$

Remarkably, when this expression is placed into (10.6) the result is zero. Although zero is in fact a solution to the vector Helmholtz equation, it is not very interesting. A more interesting solution to the scalar Helmholtz equation is

$$E_{\rm scalar}(\mathbf{r}) = r_0 E_0 \left(1 + \frac{i}{kr}\right) \frac{e^{ikr}}{r} \cos\theta \tag{10.8}$$

which is one of an infinite number of unique spherical solutions that exist. Notice that in the limit of large r, this expression looks similar to (10.7), aside from the factor $\cos\theta$. The vector form of this field according to (10.6) is

$$\mathbf{E}(\mathbf{r}) = -\hat{\boldsymbol{\phi}}\, r_0 E_0 \left(1 + \frac{i}{kr}\right) \frac{e^{ikr}}{r} \sin\theta \tag{10.9}$$

This field looks approximately like the scalar spherical wave solution (10.7) in the limit of large r if the angle is chosen to lie near $\theta \cong \pi/2$ (spherical coordinates).
Since our use of the scalar Helmholtz equation is in connection with this spherical wave under these conditions, the results are close to those obtained from the vector Helmholtz equation.
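Whether a given spherical wave satisfies the scalar Helmholtz equation can be tested numerically. For a field of the form $f(r)$ times a Legendre angular factor of order $\ell$ (1 for $\ell = 0$, $\cos\theta$ for $\ell = 1$), equation (10.5) reduces to the radial condition $f'' + (2/r) f' + [k^2 - \ell(\ell+1)/r^2] f = 0$. The sketch below evaluates that residual by finite differences; the value k = 1 and the function names are assumptions made for illustration only.

```python
import cmath

def radial_residual(f, r, k, ell, h=1e-4):
    """Finite-difference value of f'' + (2/r) f' + (k^2 - ell(ell+1)/r^2) f at r."""
    d1 = (f(r + h) - f(r - h)) / (2 * h)             # central first derivative
    d2 = (f(r + h) - 2 * f(r) + f(r - h)) / h ** 2   # central second derivative
    return d2 + 2 * d1 / r + (k ** 2 - ell * (ell + 1) / r ** 2) * f(r)

K = 1.0  # wave number in arbitrary units (assumed value)

def monopole(r):
    """Angle-independent spherical wave e^{ikr}/r."""
    return cmath.exp(1j * K * r) / r

def dipole(r):
    """Radial factor (1 + i/(kr)) e^{ikr}/r that multiplies cos(theta)."""
    return (1 + 1j / (K * r)) * cmath.exp(1j * K * r) / r

if __name__ == "__main__":
    print(abs(radial_residual(monopole, 2.0, K, ell=0)))  # close to zero
    print(abs(radial_residual(dipole, 2.0, K, ell=1)))    # close to zero
    print(abs(radial_residual(monopole, 2.0, K, ell=1)))  # not zero: wrong angular factor
```

Both residuals vanish to within the finite-difference error, whereas pairing $e^{ikr}/r$ with a $\cos\theta$ factor does not give a solution.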
Fresnel developed his diffraction formula (10.1) a half century before Maxwell assembled the equations of electromagnetic theory. In 1882, Gustav Kirchhoff demonstrated that Fresnel's diffraction formula satisfies the scalar Helmholtz equation. In doing this he clearly showed the approximations implicit in the theory, and made a slight revision to the formula:

$$E(x,y,z) = -\frac{i}{\lambda} \iint_{\rm aperture} E(x',y',0)\, \frac{e^{ikR}}{R} \left[\frac{1+\cos(\mathbf{R},\hat{\mathbf{z}})}{2}\right] dx'\,dy' \tag{10.10}$$

Figure 10.6
The factor in square brackets, Kirchhoff's revision, is known as the obliquity factor. Here, $\cos(\mathbf{R},\hat{\mathbf{z}})$ indicates the cosine of the angle between $\mathbf{R}$ and $\hat{\mathbf{z}}$. Notice that this factor is approximately equal to one when the point (x, y, z) is chosen to be in the forward direction; we usually study diffraction under this circumstance. On the other hand, the obliquity factor equals zero for fields traveling in the reverse direction (i.e. in the $-\hat{\mathbf{z}}$ direction). This fixes a problem with Fresnel's version of the formula (10.1) based on Huygens' wavelets, which suggested that light could as easily diffract in the reverse direction as in the forward direction.

In honor of Kirchhoff's work, (10.10) is referred to as the Fresnel-Kirchhoff diffraction formula. The details of Kirchhoff's more rigorous derivation, including how the factor $-i/\lambda$ naturally arises, are given in Appendix 10.A. Since the Fresnel-Kirchhoff formula can be understood as a superposition of spherical waves, it is not surprising that it satisfies the scalar Helmholtz equation (10.5).
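A few sample values make the behavior of the obliquity factor concrete. This is a small sketch; the function name `obliquity` is chosen here for illustration.

```python
import math

def obliquity(angle_rad):
    """Obliquity factor [1 + cos(angle)]/2, with the angle measured between R and z-hat."""
    return (1 + math.cos(angle_rad)) / 2

if __name__ == "__main__":
    for degrees in (0, 10, 45, 90, 180):
        print(f"{degrees:3d} deg -> {obliquity(math.radians(degrees)):.4f}")
```

The factor is one in the forward direction, one half at right angles, and zero in the reverse direction; at the small angles usually of interest it stays close to one.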
3 J. W. Goodman, Introduction to Fourier Optics, Sect. 4-1 (New York: McGraw-Hill, 1968).
The above approximation is wholly inappropriate in the exponent of (10.10) since small changes in R can result in dramatic variations in the periodic function $e^{ikR}$. To approximate R in the exponent, we must proceed with caution. To this end we expand (10.2) under the assumption $z^2 \gg (x-x')^2 + (y-y')^2$. Again, this is consistent with the idea of restricting ourselves to relatively small angles. The expansion of (10.2) is written as

$$R = z\sqrt{1 + \frac{(x-x')^2 + (y-y')^2}{z^2}} \cong z\left[1 + \frac{(x-x')^2 + (y-y')^2}{2z^2}\right] \quad \text{(exponent; Fresnel approximation)} \tag{10.12}$$

Substitution of (10.11) and (10.12) into the Fresnel diffraction formula (10.1) yields

$$E(x,y,z) \cong -\frac{i\, e^{ikz}\, e^{i\frac{k}{2z}(x^2+y^2)}}{\lambda z} \iint_{\rm aperture} E(x',y',0)\, e^{i\frac{k}{2z}(x'^2+y'^2)}\, e^{-i\frac{k}{z}(xx'+yy')}\, dx'\,dy' \quad \text{(Fresnel approximation)} \tag{10.13}$$

This is Fresnel's approximation to his diffraction integral formula. It may look a bit messier than before, but in terms of being able to make progress on integration we are better off than previously. Notice that the integral can be interpreted as a two-dimensional Fourier transform on $E(x',y',0)\, e^{i\frac{k}{2z}(x'^2+y'^2)}$.

Example 10.3
Compute the Fresnel diffraction field following a rectangular aperture (dimensions $\Delta x$ by $\Delta y$) illuminated by a uniform plane wave.

Solution: According to (10.13), the field downstream is

$$E(x,y,z) = -i E_0 \frac{e^{ikz}}{\lambda z}\, e^{i\frac{k}{2z}(x^2+y^2)} \int_{-\Delta x/2}^{\Delta x/2} dx'\, e^{i\frac{k}{2z}x'^2}\, e^{-i\frac{kx}{z}x'} \int_{-\Delta y/2}^{\Delta y/2} dy'\, e^{i\frac{k}{2z}y'^2}\, e^{-i\frac{ky}{z}y'}$$
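The two one-dimensional integrals in this expression lend themselves to direct numerical quadrature. The following is a minimal sketch using a midpoint rule; the function names, the default sample count, and the parameter values in the usage note are assumptions for illustration, and the sample count must be large enough to resolve the oscillating phase across the aperture.

```python
import cmath
import math

def fresnel_1d(x, z, width, wavelength, samples=4000):
    """Midpoint-rule estimate of the x' integral in the Fresnel formula for one aperture dimension."""
    k = 2 * math.pi / wavelength
    dx = width / samples
    total = 0j
    for j in range(samples):
        xp = -width / 2 + (j + 0.5) * dx                      # midpoint of the j-th strip
        phase = k * xp * xp / (2 * z) - k * x * xp / z        # quadratic plus linear phase
        total += cmath.exp(1j * phase) * dx
    return total

def fresnel_rect_field(x, y, z, wx, wy, wavelength, e0=1.0):
    """Field after a wx-by-wy rectangular aperture in the Fresnel approximation."""
    k = 2 * math.pi / wavelength
    prefactor = (-1j * e0 * cmath.exp(1j * k * z)
                 * cmath.exp(1j * k * (x * x + y * y) / (2 * z)) / (wavelength * z))
    return prefactor * fresnel_1d(x, z, wx, wavelength) * fresnel_1d(y, z, wy, wavelength)
```

For example, `abs(fresnel_rect_field(0.0, 0.0, 0.01, 1e-4, 2e-4, 500e-9))` gives the on-axis field magnitude 1 cm behind a 0.1 mm by 0.2 mm opening illuminated with 500 nm light. Far from the aperture, each one-dimensional factor approaches the aperture width times a sinc function.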
Unfortunately, the integration in the preceding example must be performed numerically. This is often the case for diffraction integrals in the Fresnel approximation. Figure 10.7 shows the result of such an integration for a rectangular aperture with a height twice its width.

Figure 10.7 Field amplitude following a rectangular aperture computed in the Fresnel approximation.

Paraxial Wave Equation

If we assume that the light coming through the aperture is highly directional, such that it propagates mainly in the z-direction, we are motivated to write the field as $E(x,y,z) = \tilde{E}(x,y,z)\, e^{ikz}$. Upon substitution of this into the scalar Helmholtz equation (10.5), we arrive at

$$\frac{\partial^2 \tilde{E}}{\partial x^2} + \frac{\partial^2 \tilde{E}}{\partial y^2} + 2ik\frac{\partial \tilde{E}}{\partial z} + \frac{\partial^2 \tilde{E}}{\partial z^2} = 0 \tag{10.14}$$
At this point we make the paraxial wave approximation,4 which is $\left|2k\, \partial\tilde{E}/\partial z\right| \gg \left|\partial^2\tilde{E}/\partial z^2\right|$. That is, we assume that the amplitude of the field varies slowly in the z-direction such that the wave looks much like a plane wave. We permit the amplitude to change as the wave propagates in the z-direction as long as it does so on a scale much longer than a wavelength. This leads to the paraxial wave equation:

$$\left(\frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} + 2ik\frac{\partial}{\partial z}\right)\tilde{E}(x,y,z) \cong 0 \tag{10.15}$$

It turns out that the Fresnel approximation (10.13) is an exact solution to the paraxial wave equation. As demonstrated in problem P10.5, (10.15) is satisfied by

$$\tilde{E}(x,y,z) = -\frac{i}{\lambda z} \iint \tilde{E}(x',y',0)\, e^{i\frac{k}{2z}\left[(x-x')^2 + (y-y')^2\right]}\, dx'\,dy' \tag{10.16}$$
4 P. W. Milonni and J. H. Eberly, Lasers, Sect. 14.4 (New York: Wiley, 1988).
5 J. W. Goodman, Introduction to Fourier Optics, p. 61 (New York: McGraw-Hill, 1968).
By removing the factor (10.17) from (10.13), we obtain the Fraunhofer diffraction formula:

$$E(x,y,z) \cong -\frac{i\, e^{ikz}\, e^{i\frac{k}{2z}(x^2+y^2)}}{\lambda z} \iint_{\rm aperture} E(x',y',0)\, e^{-i\frac{k}{z}(xx'+yy')}\, dx'\,dy' \quad \text{(Fraunhofer approximation)} \tag{10.19}$$

Obviously, the removal of $e^{i\frac{k}{2z}(x'^2+y'^2)}$ from the integrand improves our chances of being able to perform the integration. Notice that the integral can now be interpreted as a two-dimensional Fourier transform on the aperture field $E(x',y',0)$. Once we are in the Fraunhofer regime, a change in z is not very interesting since it appears in the combination x/z or y/z inside the integral. At a larger distance z, the same diffraction pattern is obtained with a proportionately larger value of x or y. The Fraunhofer diffraction pattern thus preserves itself indefinitely as the field propagates. It grows in size as the distance z increases, but the angular size defined by x/z or y/z remains the same.
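Example 10.4

Compute the Fraunhofer diffraction pattern following a rectangular aperture (dimensions $\Delta x$ by $\Delta y$) illuminated by a uniform plane wave.

Solution: According to (10.19), the field downstream is

$$E(x,y,z) = -i E_0 \frac{e^{ikz}}{\lambda z}\, e^{i\frac{k}{2z}(x^2+y^2)} \int_{-\Delta x/2}^{\Delta x/2} dx'\, e^{-i\frac{kx}{z}x'} \int_{-\Delta y/2}^{\Delta y/2} dy'\, e^{-i\frac{ky}{z}y'}$$

It is left as an exercise (see P10.8) to perform the integration and compute the intensity. The result turns out to be

$$I(x,y,z) = I_0\, \frac{\Delta x^2\, \Delta y^2}{\lambda^2 z^2}\, \mathrm{sinc}^2\!\left(\frac{\pi \Delta x}{\lambda z}x\right) \mathrm{sinc}^2\!\left(\frac{\pi \Delta y}{\lambda z}y\right) \tag{10.20}$$

where $\mathrm{sinc}(\alpha) \equiv \sin\alpha/\alpha$, which approaches one as $\alpha \to 0$.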
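The closed-form pattern (10.20) is easy to evaluate directly. The sketch below assumes the unnormalized definition sinc(a) = sin(a)/a and uses illustrative function names; it also shows that the first zero along x falls at $x = \lambda z/\Delta x$.

```python
import math

def sinc(a):
    """Unnormalized sinc, sin(a)/a, with the limiting value 1 at a = 0."""
    return 1.0 if a == 0 else math.sin(a) / a

def fraunhofer_rect_intensity(x, y, z, wx, wy, wavelength, i0=1.0):
    """Intensity (10.20) behind a uniformly illuminated wx-by-wy rectangular aperture."""
    ax = math.pi * wx * x / (wavelength * z)
    ay = math.pi * wy * y / (wavelength * z)
    return i0 * (wx * wy / (wavelength * z)) ** 2 * sinc(ax) ** 2 * sinc(ay) ** 2

if __name__ == "__main__":
    # Assumed illustrative values: 0.1 mm by 0.2 mm aperture, 500 nm light, 1 m away.
    wx, wy, lam, z = 1e-4, 2e-4, 500e-9, 1.0
    first_zero = lam * z / wx
    print("center:", fraunhofer_rect_intensity(0, 0, z, wx, wy, lam))
    print("first zero in x at", first_zero, "m:",
          fraunhofer_rect_intensity(first_zero, 0, z, wx, wy, lam))
```

Because x and z enter only through the ratio x/z, doubling both leaves the sinc arguments unchanged, which is the self-preserving angular pattern described above.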
Figure 10.8 Fraunhofer diffraction pattern (field amplitude) generated by a uniformly illuminated rectangular aperture with a height twice the width.
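where $\rho \equiv \sqrt{x^2 + y^2}$. Under cylindrical symmetry, the two-dimensional integration over x′ and y′ in (10.13) or (10.19) can be reduced to a single-dimensional integral over a cylindrical coordinate ρ′. With the coordinate transformation

$$x \equiv \rho\cos\theta, \quad y \equiv \rho\sin\theta, \quad x' \equiv \rho'\cos\theta', \quad y' \equiv \rho'\sin\theta' \tag{10.22}$$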
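the Fresnel diffraction formula (10.13) becomes

$$E(\rho,z) = -\frac{i\, e^{ikz}\, e^{i\frac{k\rho^2}{2z}}}{\lambda z} \int_0^{2\pi} d\theta' \int_{\rm aperture} \rho'\,d\rho'\, E(\rho',0)\, e^{i\frac{k\rho'^2}{2z}}\, e^{-i\frac{k}{z}(\rho\rho'\cos\theta'\cos\theta + \rho\rho'\sin\theta'\sin\theta)} \tag{10.23}$$

Notice that in the exponent of (10.23) we can write

$$\cos\theta'\cos\theta + \sin\theta'\sin\theta = \cos(\theta' - \theta) \tag{10.24}$$

With this simplification, (10.23) becomes

$$E(\rho,z) = -\frac{i\, e^{ikz}\, e^{i\frac{k\rho^2}{2z}}}{\lambda z} \int_{\rm aperture} \rho'\,d\rho'\, E(\rho',0)\, e^{i\frac{k\rho'^2}{2z}} \int_0^{2\pi} d\theta'\, e^{-i\frac{k\rho\rho'}{z}\cos(\theta' - \theta)} \tag{10.25}$$

We are able to perform the integration over θ′ with the help of the formula (0.57):

$$\int_0^{2\pi} e^{-i\frac{k\rho\rho'}{z}\cos(\theta' - \theta)}\, d\theta' = 2\pi J_0\!\left(\frac{k\rho\rho'}{z}\right) \tag{10.26}$$

The Fresnel diffraction formula with cylindrical symmetry is then

$$E(\rho,z) = -\frac{2\pi i\, e^{ikz}\, e^{i\frac{k\rho^2}{2z}}}{\lambda z} \int_{\rm aperture} \rho'\,d\rho'\, E(\rho',0)\, e^{i\frac{k\rho'^2}{2z}}\, J_0\!\left(\frac{k\rho\rho'}{z}\right) \tag{10.27}$$

The integral in (10.27) is called a Hankel transform on $E(\rho',0)\, e^{i\frac{k\rho'^2}{2z}}$. In the case of the Fraunhofer approximation, the diffraction integral becomes a Hankel transform on just the field $E(\rho', z=0)$ since $\exp\left(i\frac{k\rho'^2}{2z}\right)$ goes to one. Under cylindrical symmetry, the Fraunhofer approximation is

$$E(\rho,z) = -\frac{2\pi i\, e^{ikz}\, e^{i\frac{k\rho^2}{2z}}}{\lambda z} \int_{\rm aperture} \rho'\,d\rho'\, E(\rho',0)\, J_0\!\left(\frac{k\rho\rho'}{z}\right) \quad \text{(Fraunhofer approximation with cylindrical symmetry)} \tag{10.28}$$

Just as fast Fourier transform algorithms aid in the numerical evaluation of diffraction integrals in Cartesian coordinates, fast Hankel transforms also exist and can be used with cylindrically symmetric diffraction integrals.

Example 10.5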
Compute the Fresnel and Fraunhofer diffraction patterns following a circular aperture (diameter $\ell$) illuminated by a uniform plane wave.

Solution: According to (10.27), the field downstream is

$$E(\rho,z) = -i E_0 \frac{2\pi\, e^{ikz}\, e^{i\frac{k\rho^2}{2z}}}{\lambda z} \int_0^{\ell/2} \rho'\,d\rho'\, e^{i\frac{k\rho'^2}{2z}}\, J_0\!\left(\frac{k\rho\rho'}{z}\right)$$

Figure 10.9 Field amplitude following a circular aperture computed in the Fresnel approximation.
Unfortunately, this Fresnel integral must be performed numerically. The result of the calculation for a uniform field illuminating a circular aperture is shown in Fig. 10.9. On the other hand, the field in the Fraunhofer limit (10.28) is

$$E(\rho,z) = -i E_0 \frac{2\pi\, e^{ikz}\, e^{i\frac{k\rho^2}{2z}}}{\lambda z} \int_0^{\ell/2} \rho'\,d\rho'\, J_0\!\left(\frac{k\rho\rho'}{z}\right)$$

which can be integrated analytically. It is left as an exercise to perform the integration and to show that the intensity of the Fraunhofer pattern is

$$I(\rho,z) = I_0 \left(\frac{\pi\ell^2}{4\lambda z}\right)^2 \left[\frac{2 J_1(k\ell\rho/2z)}{k\ell\rho/2z}\right]^2 \tag{10.29}$$

The function $2J_1(\alpha)/\alpha$ (sometimes called the jinc function) looks similar to the sinc function (see Example 10.4) except that its first zero is at $\alpha = 1.22\pi$ rather than at $\pi$. Note that $\lim_{\alpha\to 0} 2J_1(\alpha)/\alpha = 1$.
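The jinc function and the Fraunhofer pattern of a circular aperture can be evaluated without a special-function library. The sketch below computes $J_1$ from the standard integral representation $J_1(x) = \frac{1}{\pi}\int_0^\pi \cos(\tau - x\sin\tau)\,d\tau$ with a midpoint rule; this numerical route and all function names are choices made here for illustration, not the method of the text.

```python
import math

def bessel_j1(x, steps=2000):
    """J1(x) from (1/pi) * integral_0^pi cos(t - x sin t) dt, via the midpoint rule."""
    h = math.pi / steps
    total = 0.0
    for j in range(steps):
        t = (j + 0.5) * h
        total += math.cos(t - x * math.sin(t))
    return total * h / math.pi

def jinc(a):
    """2 J1(a)/a, with the limiting value 1 at a = 0."""
    return 1.0 if a == 0 else 2 * bessel_j1(a) / a

def airy_intensity(rho, z, diameter, wavelength, i0=1.0):
    """Fraunhofer intensity (10.29) for a circular aperture of the given diameter."""
    a = math.pi * diameter * rho / (wavelength * z)          # equals k * diameter * rho / (2 z)
    return i0 * (math.pi * diameter ** 2 / (4 * wavelength * z)) ** 2 * jinc(a) ** 2

if __name__ == "__main__":
    print("jinc near the origin:", jinc(1e-3))
    print("jinc at 1.22*pi     :", jinc(1.22 * math.pi))
```

The value at $1.22\pi$ is close to zero, consistent with the first dark ring of the pattern, and the value near the origin is close to one.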
Figure 10.10 Fraunhofer diffraction pattern (field amplitude) generated for a uniformly illuminated circular aperture.
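$$\oint_S \left[U\frac{\partial V}{\partial n} - V\frac{\partial U}{\partial n}\right] da = \int_V \left[U\nabla^2 V - V\nabla^2 U\right] dv \tag{10.30}$$

The notation $\partial/\partial n$ implies a derivative in the direction normal to the surface. We choose the following functions:

$$V \equiv e^{ikr}/r \qquad U \equiv E(\mathbf{r}) \tag{10.31}$$

where $E(\mathbf{r})$ is assumed to satisfy the scalar Helmholtz equation, (10.5). When these functions are used in Green's theorem (10.30), we obtain

$$\oint_S \left[E\frac{\partial}{\partial n}\frac{e^{ikr}}{r} - \frac{e^{ikr}}{r}\frac{\partial E}{\partial n}\right] da = \int_V \left[E\nabla^2\frac{e^{ikr}}{r} - \frac{e^{ikr}}{r}\nabla^2 E\right] dv \tag{10.32}$$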
6 See J. W. Goodman, Introduction to Fourier Optics, Sect. 3-3 (New York: McGraw-Hill, 1968). 7 We exclude the point r = 0; see P0.4 and P0.5.
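The right-hand side of (10.32) vanishes because $E(\mathbf{r})$ and $e^{ikr}/r$ both satisfy (10.5): each Laplacian can be replaced by $-k^2$ times the function it acts on, and the two terms cancel. This is exactly the reason for our judicious choices of the functions V and U, since with them we were able to make half of (10.30) disappear. We are left with

$$\oint_S \left[E\frac{\partial}{\partial n}\frac{e^{ikr}}{r} - \frac{e^{ikr}}{r}\frac{\partial E}{\partial n}\right] da = 0 \tag{10.34}$$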
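Now consider a volume between a small sphere of radius $\epsilon$ at the origin and an outer surface of whatever shape. The total surface that encloses the volume comprises two parts (i.e. $S = S_1 + S_2$ as depicted in Fig. 10.11). When we apply (10.34) to the surface in Fig. 10.11, we have

$$\oint_{S_2} \left[E\frac{\partial}{\partial n}\frac{e^{ikr}}{r} - \frac{e^{ikr}}{r}\frac{\partial E}{\partial n}\right] da = -\oint_{S_1} \left[E\frac{\partial}{\partial n}\frac{e^{ikr}}{r} - \frac{e^{ikr}}{r}\frac{\partial E}{\partial n}\right] da \tag{10.35}$$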
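Our motivation for choosing this geometry with multiple surfaces is that eventually we want to find the field at the origin (inside the little sphere) from knowledge of the field on the outside surface. To this end, we assume that $\epsilon$ is so small that $E(\mathbf{r})$ is approximately the same everywhere on the surface $S_1$. Then the integral over $S_1$ becomes

$$\oint_{S_1} \left[E\frac{\partial}{\partial n}\frac{e^{ikr}}{r} - \frac{e^{ikr}}{r}\frac{\partial E}{\partial n}\right] da = \lim_{\epsilon\to 0} \int_0^{2\pi} d\phi \int_0^{\pi} \left[E\,\frac{\partial r}{\partial n}\frac{\partial}{\partial r}\frac{e^{ikr}}{r} - \frac{e^{ikr}}{r}\frac{\partial r}{\partial n}\frac{\partial E}{\partial r}\right]_{r=\epsilon} r^2 \sin\theta\, d\theta \tag{10.36}$$

Figure 10.11 A two-part surface enclosing volume V.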
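where we have used spherical coordinates. Notice that we have employed the chain rule to execute the normal derivative $\partial/\partial n$. Since $\mathbf{r}$ always points opposite to the direction of the surface normal $\hat{\mathbf{n}}$, the normal derivative $\partial r/\partial n$ is always equal to $-1$.8 We can now perform the integration in (10.36) as well as take the limit as $\epsilon \to 0$ to obtain

$$\begin{aligned}
\lim_{\epsilon\to 0} \oint_{S_1} \left[E\frac{\partial}{\partial n}\frac{e^{ikr}}{r} - \frac{e^{ikr}}{r}\frac{\partial E}{\partial n}\right] da
&= -4\pi \lim_{\epsilon\to 0} \left[r^2\left(-\frac{e^{ikr}}{r^2} + ik\frac{e^{ikr}}{r}\right)E - r^2\frac{e^{ikr}}{r}\frac{\partial E}{\partial r}\right]_{r=\epsilon} \\
&= -4\pi \lim_{\epsilon\to 0} \left[\left(-e^{ik\epsilon} + ik\epsilon\, e^{ik\epsilon}\right)E - \epsilon\, e^{ik\epsilon}\frac{\partial E}{\partial r}\right]_{r=\epsilon} \\
&= 4\pi E(0)
\end{aligned} \tag{10.37}$$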
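With the aid of (10.37), Green's theorem applied to our specific geometry reduces to

$$E(0) = \frac{1}{4\pi} \oint_{S_2} \left[\frac{e^{ikr}}{r}\frac{\partial E}{\partial n} - E\frac{\partial}{\partial n}\frac{e^{ikr}}{r}\right] da \tag{10.38}$$

8 From the definition of the normal derivative we have $\partial r/\partial n \equiv \nabla r \cdot \hat{\mathbf{n}} = -\hat{\mathbf{n}} \cdot \hat{\mathbf{n}} = -1$.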
If we know E everywhere on the outer surface $S_2$, this equation allows us to predict the field E(0) at the origin. Of course we are free to choose any coordinate system in order to find the field anywhere inside the surface $S_2$, by moving the origin.

Now let us choose a specific surface $S_2$. Consider an infinite mask with a finite aperture connected to a hemisphere of infinite radius $R \to \infty$. In the end, we will suppose that light enters through the mask and propagates to our origin point (among other points). In our present coordinate system, the vectors $\mathbf{r}$ and $\hat{\mathbf{n}}$ point opposite to the incoming light. We must evaluate (10.38) on the surface depicted in the figure. For the portion of $S_2$ which is on the hemisphere, the integrand tends to zero as R becomes large. To argue this, it is necessary to recognize the fact that at large distances the field takes on a form proportional to $e^{ikR}/R$ so that the two terms in the integrand cancel. On the mask, we assume, as did Kirchhoff, that both $\partial E/\partial n$ and E are zero.9 Thus, we are left with only the integration over the open aperture:

$$E(0) = \frac{1}{4\pi} \iint_{\rm aperture} \left[\frac{e^{ikr}}{r}\frac{\partial E}{\partial n} - E\frac{\partial}{\partial n}\frac{e^{ikr}}{r}\right] da \tag{10.39}$$
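We have essentially arrived at the result that we are seeking. The field coming through the aperture is integrated to find the field at the origin, which is located beyond the aperture. Let us manipulate the formula a little further. The second term in the integral of (10.39) can be rewritten as follows:

$$\frac{\partial}{\partial n}\frac{e^{ikr}}{r} = \frac{\partial r}{\partial n}\frac{\partial}{\partial r}\frac{e^{ikr}}{r} = \left(\frac{ik}{r} - \frac{1}{r^2}\right) e^{ikr} \cos(\mathbf{r},\hat{\mathbf{n}}) \cong \frac{ik\, e^{ikr}}{r}\cos(\mathbf{r},\hat{\mathbf{n}}) \tag{10.40}$$

where $\partial r/\partial n = \cos(\mathbf{r},\hat{\mathbf{n}})$ indicates the cosine of the angle between $\mathbf{r}$ and $\hat{\mathbf{n}}$. We have also assumed that the distance r is much larger than a wavelength in order to drop a term. Next, we assume that the field illuminating the aperture can be written as $E \cong \tilde{E}(x,y)\, e^{ikz}$. This represents a plane-wave field traveling through the aperture from left to right. Then, we have

$$\frac{\partial E}{\partial n} = \frac{\partial E}{\partial z}\frac{\partial z}{\partial n} = ik\tilde{E}(x,y)\, e^{ikz}\,(-1) = -ikE \tag{10.41}$$

Substituting (10.40) and (10.41) into (10.39) yields

$$E(0) = -\frac{i}{\lambda} \iint_{\rm aperture} E\, \frac{e^{ikr}}{r} \left[\frac{1 + \cos(\mathbf{r},\hat{\mathbf{n}})}{2}\right] da \tag{10.42}$$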
Finally, we wish to rearrange our coordinate system to that depicted in Fig. 10.2. In our derivation, it was less cumbersome to place the origin at a point after the aperture. Now that we have completed our mathematics, it is convenient to make a change of coordinate system and move the origin to the plane of the aperture as in Fig. 10.2. Then, we can obtain the field at a point lying somewhere after the aperture by computing

$$E(x,y,z=d) = -\frac{i}{\lambda} \iint_{\rm aperture} E(x',y',z=0)\, \frac{e^{ikR}}{R} \left[\frac{1 + \cos(\mathbf{r},\hat{\mathbf{z}})}{2}\right] dx'\,dy' \tag{10.43}$$

where

$$R = \sqrt{(x-x')^2 + (y-y')^2 + d^2} \tag{10.44}$$

9 Later Sommerfeld noticed that these two assumptions actually contradict each other, and he revised Kirchhoff's work to be more accurate. In practice this revision makes only a tiny difference as light spills onto the back of the aperture, over a length scale of a wavelength. We will ignore this effect and go with Kirchhoff's (slightly flawed) assumption. For further discussion see J. W. Goodman, Introduction to Fourier Optics, Sect. 3-4 (New York: McGraw-Hill, 1968).
Equation (10.10) is the same as (10.42) after applying a coordinate transformation. It is called the Fresnel-Kirchhoff diffraction formula and it agrees with (10.1) except for the obliquity factor $[1 + \cos(\mathbf{r},\hat{\mathbf{z}})]/2$.
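$$\oint_S \mathbf{f}\cdot\hat{\mathbf{n}}\, da = \int_V \nabla\cdot\mathbf{f}\, dv \tag{10.45}$$

The unit vector $\hat{\mathbf{n}}$ always points along the outward normal to the surface of volume V over which the integral is taken. Let the vector function $\mathbf{f}$ be $U\nabla V$, where U and V are both analytical functions of the position coordinate $\mathbf{r}$. Then (10.45) becomes

$$\oint_S (U\nabla V)\cdot\hat{\mathbf{n}}\, da = \int_V \nabla\cdot(U\nabla V)\, dv \tag{10.46}$$

We recognize $\nabla V\cdot\hat{\mathbf{n}}$ as the directional derivative of V, directed along the surface normal $\hat{\mathbf{n}}$. This is often represented in shorthand notation as

$$\nabla V\cdot\hat{\mathbf{n}} = \frac{\partial V}{\partial n} \tag{10.47}$$

The argument of the integral on the right-hand side of (10.46) can be expanded with the product rule:

$$\nabla\cdot(U\nabla V) = \nabla U\cdot\nabla V + U\nabla^2 V \tag{10.48}$$

With these substitutions, (10.46) becomes

$$\oint_S U\frac{\partial V}{\partial n}\, da = \int_V \left[\nabla U\cdot\nabla V + U\nabla^2 V\right] dv \tag{10.49}$$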
Actually, so far we haven't done much. Equation (10.49) is nothing more than the divergence theorem applied to the vector function $U\nabla V$. Similarly, we can apply the divergence theorem to an alternative vector function given by the reverse combination $V\nabla U$. Thus, we can write an equation similar to (10.49) where U and V are interchanged:

$$\oint_S V\frac{\partial U}{\partial n}\, da = \int_V \left[\nabla V\cdot\nabla U + V\nabla^2 U\right] dv \tag{10.50}$$

We subtract (10.50) from (10.49), and this leads to (10.30), known as Green's theorem.
Exercises
Exercises for 10.1 Huygens' Principle as Formulated by Fresnel

P10.1 Huygens' principle is often used to describe diffraction through slits, but it can also be used to describe refraction. Use a drawing program or a ruler and compass to produce a picture similar to Fig. 10.13, which shows the graphical prediction of the refracted angle from Huygens' principle. Verify that the Huygens picture matches the numerical prediction from Snell's Law for an incident angle of your choice. Use $n_i = 1$ and $n_t = 2$.
HINT: Draw the wavefronts hitting the interface at an angle and treat each point where the wavefronts strike the interface as the source of circular waves propagating into the n = 2 material. The wavelength of the circular waves must be exactly half the wavelength of the incident light since $\lambda = \lambda_{\rm vac}/n$. Use at least four point sources and connect the matching wavefronts by drawing tangent lines as in the figure.

Figure 10.13

P10.2 (a) Show that the function

$$f(r) = \frac{A\cos(kr - \omega t)}{r}$$

is a solution to the wave equation in spherical coordinates with only radial dependence,

$$\frac{1}{r^2}\frac{\partial}{\partial r}\left(r^2\frac{\partial f}{\partial r}\right) = \frac{1}{v^2}\frac{\partial^2 f}{\partial t^2}$$

Determine what v is, in terms of k and $\omega$.
(b) If the electric field were a scalar field, we might be done there. However, it's a vector field, and moreover it must satisfy Maxwell's equations. We know from experience that it's generally transverse, and since it's traveling radially let's make a guess that it's oscillating in the $\hat{\boldsymbol{\phi}}$ direction:

$$\mathbf{E}(r) = \hat{\boldsymbol{\phi}}\,\frac{A\cos(kr - \omega t)}{r}$$

Show that this choice for E is not consistent with Maxwell's equations. In particular: (i) show that it does satisfy Gauss's Law (1.1); (ii) compute the curl of E and use Faraday's Law (1.3) to deduce B; (iii) show that this B does satisfy Gauss's Law for magnetism (1.2); (iv) show that this B does not satisfy Ampere's law (1.4).
(c) A somewhat more complicated spherical wave

$$\mathbf{E}(r,\theta) = \hat{\boldsymbol{\phi}}\,\frac{A\sin\theta}{r}\left[\cos(kr - \omega t) - \frac{1}{kr}\sin(kr - \omega t)\right]$$

does satisfy Maxwell's equations. Describe how this wave behaves as a function of r and $\theta$. What conditions need to be satisfied for this equation to reduce to the spherical wave formula used in the diffraction formulas?
Exercises for 10.2 Scalar Diffraction Theory

P10.3 Show that $E(r) = E_0 r_0 e^{ikr}/r$ is a solution to the scalar Helmholtz equation (10.5).
HINT:

$$\nabla^2\psi = \frac{1}{r}\frac{\partial^2}{\partial r^2}(r\psi) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial\theta}\left(\sin\theta\frac{\partial\psi}{\partial\theta}\right) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2\psi}{\partial\phi^2}$$

P10.4 Learn by heart the derivation of the Fresnel-Kirchhoff diffraction formula (outlined in Appendix 10.A). Indicate the percentage of how well you understand the derivation. If you write 100%, it means that you can reproduce the derivation after closing your notes.

P10.5 Check that (10.16) is the solution to the paraxial wave equation (10.15).
Exercises for 10.4 Fraunhofer Approximation

P10.6 (a) Repeat Example 10.1 to find the on-axis intensity after a circular aperture in both the Fresnel and Fraunhofer approximations. (HINT: Use (10.27) and (10.28) to obtain the fields with $\rho = 0$.) Also make suitable approximations directly to (10.3) to obtain the same answers.
(b) Check how well the Fresnel and Fraunhofer approximations work by graphing the three curves (i.e. (10.3) and the curves obtained in part (a)) on a single plot as a function of z. Take $\ell = 10\ \mu\text{m}$ and $\lambda = 500$ nm. To see the result better, use a log scale on the z-axis.

Figure 10.15 (Plot with curves labeled Fresnel, Fraunhofer, and Fresnel-Kirchhoff; horizontal axis z (mm) on a log scale from 10^-3 to 10^-1.)
L10.7 (a) Why does the on-axis intensity behind a circular opening fluctuate (see Example 10.1) whereas the on-axis intensity behind a circular obstruction remains constant (see Example 10.2)?
(b) Create a collimated laser beam several centimeters wide. Observe the on-axis intensity on a movable screen (e.g. a hand-held card) behind a small circular aperture and behind a small circular obstruction placed in the beam. (video)
(c) In the case of the circular aperture, measure the distance to several on-axis minima and check that it agrees with prediction. (See problem P10.6.)

Figure 10.16 (Schematic; component label: Laser.)
P10.8 Calculate the Fraunhofer diffraction field and intensity patterns for a rectangular aperture (dimensions $\Delta x$ by $\Delta y$) illuminated by a plane wave $E_0$. In other words, derive (10.20).

P10.9 A single narrow slit has a mask placed over it so the aperture function is not a square pulse but rather a cosine: $E(x',y',0) = E_0\cos(\pi x'/L)$ for $-L/2 < x' < L/2$ and $E(x',y',0) = 0$ otherwise. Calculate the far-field (Fraunhofer) diffraction pattern. Make a plot of intensity as a function of $xkL/2z$; qualitatively compare the pattern to that of a regular single slit.

Exercises for 10.5 Diffraction with Cylindrical Symmetry

P10.10 Calculate the Fraunhofer diffraction intensity pattern (10.29) for a circular aperture (diameter $\ell$) illuminated by a plane wave $E_0$.
Chapter 11
Diffraction Applications
In this chapter, we consider a number of practical examples of diffraction. We first discuss diffraction theory in systems involving lenses. The Fraunhofer diffraction pattern discussed in section 10.4, applicable in the far-field limit, is imaged to the focus of a lens when the lens is placed in the stream of light. This has important implications for the resolution of instruments such as telescopes or the human eye.

The array theorem, which applies in the Fraunhofer limit, is introduced in section 11.3. This theorem is a powerful mathematical tool that enables one to deal conveniently with diffraction from an array of identical apertures. One of the important uses of the array theorem is in determining Fraunhofer diffraction from a grating, since a diffraction grating can be thought of as an array of narrow slit apertures. In section 11.5, we study the workings of a diffraction spectrometer. To find the resolution limitations, one combines the diffraction properties of gratings with the Fourier properties of lenses.

Finally, we consider a Gaussian laser beam to understand its focusing and diffraction properties. The information presented here comes up remarkably often in research activity. We often think of lasers as collimated beams of light that propagate indefinitely without expanding. However, the laws of diffraction require that every finite beam eventually grow in width. The rate at which a laser beam diffracts depends on its beam waist size. Because laser beams usually have narrow divergence angles and therefore obey the paraxial approximation, we can calculate their behavior via the Fresnel approximation discussed in section 10.3. Appendix 11.A discusses the ABCD law for Gaussian beams, which is a method of computing the effects of optical elements represented by ABCD matrices on Gaussian laser beams.
from an aperture is sufficiently large (see (10.18) and (10.19)). Mathematically, it is obtained via a two-dimensional Fourier transform. The intensity of the far-field diffraction pattern is

$$I(x,y,z) = \frac{1}{2}c\epsilon_0 \left|\frac{1}{\lambda z} \iint_{\rm aperture} E(x',y',0)\, e^{-ik\left(\frac{x}{z}x' + \frac{y}{z}y'\right)}\, dx'\,dy'\right|^2 \tag{11.1}$$
Notice that the dependence of the diffraction on x, y, and z comes only through the combinations $\theta_x \cong x/z$ and $\theta_y \cong y/z$. Therefore, the diffraction pattern in the Fraunhofer limit is governed by the two angles $\theta_x$ and $\theta_y$, and the pattern preserves itself indefinitely. As the light continues to propagate, the pattern increases in size at a rate proportional to distance traveled so that the angular width is preserved. The situation is depicted in Fig. 11.1.

Recall that in order to use the Fraunhofer diffraction formula we need to satisfy $z \gg \pi\,(\text{aperture radius})^2/\lambda$ (see (10.18)). As an example, if an aperture with a 1 cm radius (not necessarily circular) is used with visible light, the light must travel more than a kilometer in order to reach the Fraunhofer limit. It may therefore seem unlikely to reach the Fraunhofer limit in a typical optical system, especially if the aperture or beam size is relatively large. Nevertheless, spectrometers, which typically utilize diffraction gratings many centimeters wide, depend on achieving the Fraunhofer limit within the confines of a manageable instrument box. This is accomplished using imaging techniques. The Fraunhofer limit is also important to the performance of other optical instruments that use lenses (e.g. a telescope).

Consider a lens with focal length f placed in the path of light following an aperture (see Fig. 11.2). Let the lens be placed an arbitrary distance L after the aperture. The lens produces an image of the Fraunhofer pattern at a new location $d_i$ following the lens according to the imaging formula (see (9.55))

$$\frac{1}{f} = \frac{1}{-(z - L)} + \frac{1}{d_i}. \tag{11.2}$$
Keep in mind that the lens interrupts the light before the Fraunhofer pattern has a chance to form. This means that the Fraunhofer diffraction pattern may be thought of as a virtual object a distance $z - L$ to the right of the lens. Since the Fraunhofer diffraction pattern occurs at very large distances (i.e. $z \to \infty$) the image of the Fraunhofer pattern appears at the focus of the lens:

$$d_i \cong f. \tag{11.3}$$

Figure 11.2 Imaging of the Fraunhofer diffraction pattern to the focus of a lens.
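Two numbers help make this argument concrete: the distance scale that z must greatly exceed to reach the Fraunhofer limit without a lens, and the image distance obtained from the imaging formula as z grows. The following sketch assumes 500 nm for "visible light" and arbitrary illustrative values for f and L; the function names are hypothetical.

```python
import math

def fraunhofer_distance(aperture_radius, wavelength):
    """Distance scale pi * (aperture radius)^2 / wavelength that z must greatly exceed."""
    return math.pi * aperture_radius ** 2 / wavelength

def image_distance(f, z, lens_position):
    """Solve 1/f = 1/(-(z - L)) + 1/d_i for d_i, treating the pattern as a virtual object."""
    return 1.0 / (1.0 / f + 1.0 / (z - lens_position))

if __name__ == "__main__":
    # 1 cm aperture radius, 500 nm light (assumed values).
    print("Fraunhofer distance scale (m):", fraunhofer_distance(0.01, 500e-9))
    # f = 0.5 m lens placed L = 0.2 m after the aperture (assumed values).
    for z in (10.2, 1e3, 1e6):
        print(f"z = {z:g} m  ->  d_i = {image_distance(0.5, z, 0.2):.6f} m")
```

The distance scale comes out to roughly 630 m, so z must indeed exceed a kilometer, while the image distance converges to the focal length as z becomes large.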
Thus, a lens makes it very convenient to observe the Fraunhofer diffraction pattern even from relatively large apertures. It is not necessary to let the light propagate for kilometers. We need only observe the pattern at the focus of the lens as shown in Fig. 11.2. Notice that the spacing L between the aperture and the lens is unimportant to this conclusion.

Even though we know that the Fraunhofer diffraction pattern occurs at the focus of a lens, the question remains as to the size of the image. To find the answer, let us examine the magnification (9.56), which is given by

$$M = -\frac{d_i}{-(z - L)} \tag{11.4}$$

Taking the limit of very large z and employing (11.3), the magnification becomes

$$M \to \frac{f}{z} \tag{11.5}$$
This is a remarkable result. When the lens is inserted, the size of the diffraction pattern decreases by the ratio of the lens focal length f to the original distance z to a far-away screen. Since in the Fraunhofer regime the diffraction pattern is proportional to distance (i.e. size $\propto z$), the image at the focus of the lens scales in proportion to the focal length (i.e. size $\propto f$). This means that the angular width of the pattern is preserved! With the lens in place, we can rewrite (11.1) straightaway as

$$I(x,y,L+f) \cong \frac{1}{2}c\epsilon_0 \left|\frac{1}{\lambda f} \iint_{\rm aperture} E(x',y',0)\, e^{-i\frac{k}{f}(xx'+yy')}\, dx'\,dy'\right|^2 \tag{11.6}$$

which describes the intensity distribution pattern at the focus of the lens. Although (11.6) correctly describes the intensity, we cannot easily write the electric field since the imaging techniques that we have used do not render the phase information. To obtain an expression for the field, it will be necessary to employ the Fresnel diffraction formula. In addition, we need to know how a lens adjusts the phase fronts of the light passing through it.

Phase Front Alteration by a Lens
Consider a monochromatic light field that goes through a thin lens with focal length f. In traversing the lens, the wavefront undergoes a phase shift that varies across the lens. We will reference the phase shift to that experienced by the light that goes through the center of the lens. In Fig. 11.3, $R_1$ is a positive radius of curvature, and $R_2$ is a negative radius of curvature, according to our previous convention. We take the distances $\ell_1$ and $\ell_2$, as drawn, to be positive. The light passing through the off-axis portion of the lens experiences less material than the light passing through the center. The difference in optical path length is $(n-1)(\ell_1 + \ell_2)$ (see discussion connected with (9.14)). This means that the phase of the field passing through the off-axis portion of the lens relative to the phase of the field passing through the center is

$$\Delta\phi = -k(n-1)(\ell_1 + \ell_2). \tag{11.7}$$

Figure 11.3 A thin lens, which modifies the phase of a field passing through.

The negative sign indicates a phase advance (i.e. same sign as $-\omega t$). Since the off-axis light travels through less material, the phase of the wave front gets ahead of the light traveling through the center of the lens. In (11.7), k represents the wave number in vacuum (i.e. $2\pi/\lambda_{\rm vac}$) since $\ell_1$ and $\ell_2$ correspond to distances outside of the lens material.

We can find expressions for $\ell_1$ and $\ell_2$ from the equations describing the spherical surfaces of the lens:

$$(R_1 - \ell_1)^2 + x^2 + y^2 = R_1^2 \quad\text{and}\quad (R_2 + \ell_2)^2 + x^2 + y^2 = R_2^2 \tag{11.8}$$
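In the Fresnel approximation, which takes place in the paraxial limit, it is appropriate to neglect the terms $\ell_1^2$ and $\ell_2^2$ in comparison with the other terms present. Within this approximation, equations (11.8) become

$$\ell_1 \cong \frac{x^2 + y^2}{2R_1} \quad\text{and}\quad \ell_2 \cong -\frac{x^2 + y^2}{2R_2} \tag{11.9}$$

Substitution of (11.9) into (11.7) yields

$$\Delta\phi = -k(n-1)\left(\frac{1}{R_1} - \frac{1}{R_2}\right)\frac{x^2 + y^2}{2} = -\frac{k}{2f}\left(x^2 + y^2\right) \tag{11.10}$$

where the focal length of a thin lens f has been introduced according to the lens-maker's formula (9.46). In summary, the light traversing a lens experiences a relative phase shift given by

$$E(x,y,z_{\text{after lens}}) = E(x,y,z_{\text{before lens}})\, e^{-i\frac{k}{2f}(x^2+y^2)} \tag{11.11}$$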
Figure 11.4 The phase fronts of a plane wave are bent as they pass through a lens.
Equation (11.11) introduces a wave-front curvature to the field. For example, if a plane wave (i.e. a uniform field $E_0$) passes through the lens, the field emerges with a spherical-like wave front converging towards the focus of the lens.
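The quality of the paraxial step behind the lens phase can be gauged by comparing the exact spherical-surface distances with their quadratic approximations. The sketch below assumes a symmetric biconvex lens with n = 1.5 and radii of 10 cm, and 500 nm light; these values and the function names are illustrative assumptions.

```python
import math

def thin_lens_focal_length(n, r1, r2):
    """Lens-maker's formula 1/f = (n - 1)(1/R1 - 1/R2), with R2 negative for a biconvex lens."""
    return 1.0 / ((n - 1) * (1.0 / r1 - 1.0 / r2))

def sag_exact(rho, radius):
    """Exact gap between a spherical surface and its tangent plane at transverse radius rho."""
    return abs(radius) - math.sqrt(radius ** 2 - rho ** 2)

def phase_from_thickness(rho, n, r1, r2, wavelength):
    """Relative phase -k (n - 1)(l1 + l2) using the exact surface gaps."""
    k = 2 * math.pi / wavelength
    return -k * (n - 1) * (sag_exact(rho, r1) + sag_exact(rho, r2))

def lens_phase(x, y, focal_length, wavelength):
    """Paraxial lens phase -k (x^2 + y^2) / (2 f)."""
    k = 2 * math.pi / wavelength
    return -k * (x * x + y * y) / (2 * focal_length)

if __name__ == "__main__":
    n, r1, r2, lam = 1.5, 0.1, -0.1, 500e-9      # assumed lens and wavelength
    f = thin_lens_focal_length(n, r1, r2)
    for rho in (1e-3, 5e-3, 1e-2):
        print(rho, phase_from_thickness(rho, n, r1, r2, lam), lens_phase(rho, 0.0, f, lam))
```

Close to the axis the two phases agree to a few parts in $10^5$; the discrepancy grows with distance from the axis, which is where the neglected squared terms begin to matter.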
We compute the diffraction pattern after the lens in three steps, as illustrated in Fig. 11.5. First, we use the Fresnel diffraction formula to compute the field arriving at the lens. Second, we adjust the phase front of the light passing through the lens according to (11.11). Third, we use the field exiting the lens as the input for a second Fresnel diffraction integral to find the field at the lens focus. The result gives an intensity pattern in agreement with (11.6). It also provides the full expression for the field, including its phase.
Starting from the known field $E(x',y',0)$ at the aperture, we compute the field incident on the lens using the Fresnel approximation:

$$E(x'',y'',L) = -i\frac{e^{ikL}\, e^{i\frac{k}{2L}(x''^2+y''^2)}}{\lambda L} \iint E(x',y',0)\, e^{i\frac{k}{2L}(x'^2+y'^2)}\, e^{-i\frac{k}{L}(x''x'+y''y')}\, dx'\,dy' \tag{11.12}$$

(The double primes keep track of distinct variables in sequential diffraction integrals.) As mentioned, the field gains a phase factor according to (11.11) upon transmitting through the lens. Finally, we use the Fresnel diffraction formula a second time to propagate the distance f from the back of the thin lens:

$$E(x,y,L+f) = -i\frac{e^{ikf}\, e^{i\frac{k}{2f}(x^2+y^2)}}{\lambda f} \iint \left[E(x'',y'',L)\, e^{-i\frac{k}{2f}(x''^2+y''^2)}\right] e^{i\frac{k}{2f}(x''^2+y''^2)}\, e^{-i\frac{k}{f}(xx''+yy'')}\, dx''\,dy'' \tag{11.13}$$
As you can probably appreciate, the injection of (11.12) into (11.13) makes a rather long formula involving four dimensions of integration. Nevertheless, two of the integrals can be performed in advance of choosing the aperture (i.e. those over x″ and y″). This is accomplished with the help of the integral formula (0.55) (even though in this instance the real part of a is zero). After this cumbersome work, (11.13) reduces to

$$E(x,y,L+f) = -i\frac{e^{ik(L+f)}\, e^{i\frac{k}{2f}(x^2+y^2)}\, e^{-i\frac{kL}{2f^2}(x^2+y^2)}}{\lambda f} \iint E(x',y',0)\, e^{-i\frac{k}{f}(xx'+yy')}\, dx'\,dy' \tag{11.14}$$
Notice that at least the integration portion of this formula looks exactly like the Fraunhofer diffraction formula! This happened even though in the preceding discussion we did not at any time specifically make the Fraunhofer approximation. The result (11.14) implies the intensity distribution (11.6) as anticipated. However, the phase of the field is also revealed in (11.14). In general, the field carries a wave front curvature as it passes through the focal plane of the lens. In the special case L = f, the diffraction formula takes a particularly simple form:

$$E(x,y,L+f)\Big|_{L=f} = -i\frac{e^{2ikf}}{\lambda f} \iint E(x',y',0)\, e^{-i\frac{k}{f}(xx'+yy')}\, dx'\,dy' \tag{11.15}$$
When the lens is placed at this special distance following the aperture, the Fraunhofer diffraction pattern viewed at the focus of the lens carries a at wave front.
Figure 11.6 To resolve distinct images at the focus of a lens, the angular separation must exceed the width of the Fraunhofer diffraction patterns.
optical instruments. In essence, any optical instrument incorporates an aperture, limiting the light that enters. If nothing else, the diameter of a lens itself acts effectively as an aperture. The pupil of the human eye is an aperture that induces a Fraunhofer diffraction pattern to occur at the retina. Cameras have irises which aperture the light, again causing a Fraunhofer diffraction pattern to occur at the image plane. Of course, the focus of the lens is just where one needs to look in order to see images of distant objects. The Fraunhofer pattern, which occurs at the focus, represents the ultimate amount of diffraction caused by an aperture. This has the effect of blurring out features in the image and limiting resolution. This illustrates why it is impossible to focus light to a true point.

Suppose you point a telescope at two distant stars. An image of each star is formed in the focal plane of the lens. The angular separation between the two images (referenced from the lens) is the same as the angular separation between the stars.¹ This is depicted in Fig. 11.6. A resolution problem occurs when the Fraunhofer diffraction causes the image of each star to blur by more than the angular separation between them. In this case the two images cannot be resolved because they bleed into one another. The Fraunhofer diffraction pattern from a circular aperture was computed previously (see (10.29)). At the focus of a lens, this pattern becomes

$$I(\rho, f) = I_0 \left( \frac{\pi \ell^2}{4 \lambda f} \right)^2 \left[ \frac{2 J_1(k\ell\rho/2f)}{k\ell\rho/2f} \right]^2 \tag{11.16}$$

where $f$, the focal length of the lens, takes the place of $z$ in the diffraction formula. The parameter $\ell$ is the diameter of the lens. This intensity pattern contains the first-order Bessel function $J_1$, which behaves somewhat like a sine wave, as seen in Fig. 11.7. The main differences are that the zero crossings are not exactly periodic and the function slowly diminishes with larger arguments. The first zero crossing (after $x = 0$) occurs at $1.22\pi$.

The intensity pattern described by (11.16) contains the factor $2J_1(\alpha)/\alpha$, where $\alpha$ represents the combination $k\ell\rho/2f$. As noticed in Fig. 11.7, $J_1(\alpha)$ goes to zero at $\alpha = 0$. Thus, we have a zero-divided-by-zero situation when evaluating $2J_1(\alpha)/\alpha$ at the origin. This is similar to the sinc function (i.e. $\sin(\alpha)/\alpha$), which approaches one at the origin. In fact, $2J_1(\alpha)/\alpha$ is sometimes called the jinc function because it also approaches one at the origin. The square of the jinc is shown in Fig. 11.7b. This curve is proportional to the intensity described in (11.16). This pattern is sometimes called an Airy pattern after Sir George Biddell Airy (English, 1801–1892)
1 In the thin-lens approximation, the ray from either star that traverses the center of the lens (i.e.
Figure 11.7 (a) First-order Bessel function. (b) Square of the Jinc function.
who first described the pattern. As can be seen in Fig. 11.7b, the intensity quickly drops at larger radii.

We now return to the question of whether the images of two nearby stars as depicted in Fig. 11.6 can be distinguished. Since the peak in Fig. 11.7b is the dominant feature in the diffraction pattern, we will say that the two stars are resolved if the angle between them is enough to keep their respective diffraction peaks from seriously overlapping. We will adopt the criterion suggested by Lord Rayleigh that the peaks are distinguishable if the peak of one pattern is no closer than the first zero to the other peak. This situation is shown in Fig. 11.8. The angle that corresponds to this separation of diffraction patterns is found by setting the argument of the Bessel function in (11.16) equal to $1.22\pi$, the location of the first zero:

$$\frac{k \ell \rho}{2 f} = 1.22\pi \tag{11.17}$$

With a little rearranging (using $k = 2\pi/\lambda$) we have

$$\theta_{\min} \equiv \frac{\rho}{f} = \frac{1.22\lambda}{\ell} \tag{11.18}$$

Here we have associated the ratio $\rho/f$ (i.e. the radius of the diffraction pattern compared to the distance from the lens) with an angle $\theta_{\min}$. The Rayleigh criterion requires that the diffraction patterns be separated by at least this angle before we say that they are resolved. $\theta_{\min}$ depends on the diameter $\ell$ of the lens as well as on the wavelength of the light. Since the angle between the images and the angle between the objects is the same, $\theta_{\min}$ gives the minimum angle between objects that can be resolved with a given instrument. This analysis assumes that the light from the two objects is incoherent, meaning the intensities in the image plane add; interferences between the two fields fluctuate rapidly in time and average away.
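The factor 1.22 can be checked numerically. The following is a minimal sketch, not part of the original text: it evaluates $J_1$ from Bessel's integral representation with a midpoint rule and locates its first positive zero by bisection. The function names, step count, and bracketing interval are illustrative choices.

```python
import math

def bessel_j1(x, steps=2000):
    """J1(x) from Bessel's integral (1/pi) * int_0^pi cos(t - x sin t) dt,
    evaluated with the midpoint rule."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += math.cos(t - x * math.sin(t))
    return total * h / math.pi

def jinc(x):
    """2 J1(x) / x, with the limiting value 1 used at the origin."""
    if abs(x) < 1e-8:
        return 1.0
    return 2.0 * bessel_j1(x) / x

def first_zero_of_j1(lo=3.0, hi=4.5, tol=1e-10):
    """Bisection for the first positive root of J1 (bracket chosen by hand)."""
    f_lo = bessel_j1(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = bessel_j1(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

alpha_zero = first_zero_of_j1()
print(alpha_zero / math.pi)  # close to 1.22
```

The ratio printed at the end is the number that appears in (11.17) and (11.18).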
Example 11.1
What minimum telescope diameter is required to distinguish a Jupiter-like planet (orbital radius $8 \times 10^{8}$ km) from its star if they are 10 light-years away?

Solution: From (11.18) and assuming 500 nm light, we need

$$\ell > \frac{1.22\lambda}{\theta_{\min}} = \frac{1.22\,(500 \times 10^{-9}\ \mathrm{m})}{(8 \times 10^{11}\ \mathrm{m})/(10\ \mathrm{ly})} \left( \frac{9.5 \times 10^{15}\ \mathrm{m}}{\mathrm{ly}} \right) = 0.07\ \mathrm{m}$$

This seems like a piece of cake; a telescope with a diameter bigger than 7 cm will do the trick. However, the vastly unequal brightness of the star and the planet is the real technical challenge. The faint diffraction rings in the star's diffraction pattern completely swamp the faint signal from the planet.
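The arithmetic of Example 11.1 can be reproduced in a few lines. This is only a sketch of the Rayleigh-criterion estimate (11.18); the function names and the light-year conversion constant are illustrative, with the constant taken from the value used in the example.

```python
def rayleigh_min_angle(wavelength, diameter):
    """Minimum resolvable angle (radians), theta_min = 1.22 * lambda / diameter."""
    return 1.22 * wavelength / diameter

def min_diameter(wavelength, angle):
    """Smallest aperture diameter that resolves a given angular separation."""
    return 1.22 * wavelength / angle

LIGHT_YEAR_M = 9.5e15              # meters per light-year, as used in the example

orbit_radius = 8e11                # m (8 x 10^8 km)
distance = 10 * LIGHT_YEAR_M       # m
theta = orbit_radius / distance    # small-angle separation in radians

diameter = min_diameter(500e-9, theta)
print(diameter)                    # about 0.07 m
```

Calling `rayleigh_min_angle(500e-9, diameter)` returns the same angle `theta`, which is a quick consistency check on the two helper functions.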
$$E(x', y', 0) = \sum_{n=1}^{N} E_{\rm aperture}(x' - x'_n,\, y' - y'_n,\, 0) \tag{11.19}$$
We next compute the Fraunhofer diffraction pattern for the above field. Upon inserting (11.19) into the Fraunhofer diffraction formula (10.19) we obtain

$$E(x, y, z) = -i\,\frac{e^{ikz}\, e^{i\frac{k}{2z}(x^2 + y^2)}}{\lambda z} \sum_{n=1}^{N} \int_{-\infty}^{\infty} dx' \int_{-\infty}^{\infty} dy'\, E_{\rm aperture}(x' - x'_n,\, y' - y'_n,\, 0)\, e^{-i\frac{k}{z}(x x' + y y')} \tag{11.20}$$
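where we have taken the summation out in front of the integral. We have also integrated over the entire (infinitely wide) mask since $E_{\rm aperture}$ is nonzero only inside each aperture. Even without yet choosing the shape of the identical apertures, we can make some progress on (11.20) with the change of variables $x'' \equiv x' - x'_n$ and $y'' \equiv y' - y'_n$:

$$E(x, y, z) = -i\,\frac{e^{ikz}\, e^{i\frac{k}{2z}(x^2 + y^2)}}{\lambda z} \sum_{n=1}^{N} \int_{-\infty}^{\infty} dx'' \int_{-\infty}^{\infty} dy''\, E_{\rm aperture}(x'', y'', 0)\, e^{-i\frac{k}{z}\left[ x (x'' + x'_n) + y (y'' + y'_n) \right]} \tag{11.21}$$

Next we pull the factor $\exp\{-i\frac{k}{z}(x x'_n + y y'_n)\}$ out in front of the integral to arrive at our final result:

$$E(x, y, z) = \left[ \sum_{n=1}^{N} e^{-i\frac{k}{z}(x x'_n + y y'_n)} \right] \left[ -i\,\frac{e^{ikz}\, e^{i\frac{k}{2z}(x^2 + y^2)}}{\lambda z} \int_{-\infty}^{\infty} dx' \int_{-\infty}^{\infty} dy'\, E_{\rm aperture}(x', y', 0)\, e^{-i\frac{k}{z}(x x' + y y')} \right] \tag{11.22}$$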
For the sake of elegance, we have traded back $x'$ for $x''$ and $y'$ for $y''$ as the variables of integration. Equation (11.22) is known as the array theorem.² Note that the second factor in brackets is exactly the Fraunhofer diffraction pattern from a single aperture centered on $x' = 0$ and $y' = 0$. When more than one identical aperture is present, we only need to evaluate the Fraunhofer diffraction formula for a single aperture. Then, the single-aperture result is multiplied by the summation in front, which entirely contains the information about the placement of the (many) identical apertures.

Example 11.2
Calculate the Fraunhofer diffraction pattern for two identical circular apertures with diameter $\ell$ whose centers are separated by a spacing $h$.

Solution: As computed previously, the single-aperture Fraunhofer diffraction pattern from a circular aperture (see (10.29)) is

$$I(\rho, z) = I_0 \left( \frac{\pi \ell^2}{4 \lambda z} \right)^2 \left[ \frac{2 J_1(k\ell\rho/2z)}{k\ell\rho/2z} \right]^2$$

From the array theorem (11.22), the intensity of the overall diffraction pattern is

$$I(x, y, z) = \left| \sum_{n=1}^{2} e^{-i\frac{k}{z}(x x'_n + y y'_n)} \right|^2 I_0 \left( \frac{\pi \ell^2}{4 \lambda z} \right)^2 \left[ \frac{2 J_1(k\ell\rho/2z)}{k\ell\rho/2z} \right]^2$$

Placing the two apertures at $x'_1 = -h/2$ and $x'_2 = h/2$ with $y'_1 = y'_2 = 0$, the summation becomes

$$\sum_{n=1}^{2} e^{-i\frac{k}{z}(x x'_n + y y'_n)} = e^{i\frac{k}{z}\frac{hx}{2}} + e^{-i\frac{k}{z}\frac{hx}{2}} = 2\cos\left( \frac{khx}{2z} \right)$$

so that the overall pattern is

$$I(x, y, z) = 4 I_0 \left( \frac{\pi \ell^2}{4 \lambda z} \right)^2 \left[ \frac{2 J_1(k\ell\rho/2z)}{k\ell\rho/2z} \right]^2 \cos^2\left( \frac{khx}{2z} \right)$$

Figure 11.10 Fraunhofer diffraction pattern from two identical circular holes separated by twice their diameters.
$$E_{\rm aperture}(x' - x'_n,\, y' - y'_n,\, 0) = \int_{-\infty}^{\infty} dx'' \int_{-\infty}^{\infty} dy''\, \delta(x'' - x'_n)\, \delta(y'' - y'_n)\, E_{\rm aperture}(x' - x'',\, y' - y'',\, 0)$$

The integral in (11.20) therefore may be viewed as a 2-D Fourier transform of a convolution, where $kx/z$ and $ky/z$ play the role of spatial frequencies. The convolution theorem (see P0.26) indicates that this is the same as the product of Fourier transforms. The 2-D Fourier transform for the delta function (times $2\pi$) is

$$\int_{-\infty}^{\infty} dx' \int_{-\infty}^{\infty} dy'\, \delta(x' - x'_n)\, \delta(y' - y'_n)\, e^{-i\frac{k}{z}(x x' + y y')} = e^{-i\frac{k}{z}(x x'_n + y y'_n)}$$

The array theorem (11.22) exhibits this factor. It multiplies the single-slit Fraunhofer diffraction integral, which is the Fourier transform of the other function.
The only part of (11.22) that remains to be evaluated is the summation out in front. Let the apertures be positioned at

$$x'_n = \left( n - \frac{N+1}{2} \right) h, \qquad y'_n = 0 \tag{11.24}$$

where $N$ is the total number of slits. Then the summation in the array theorem, (11.22), becomes

$$\sum_{n=1}^{N} e^{-i\frac{k}{z}(x x'_n + y y'_n)} = e^{i\frac{khx}{z}\frac{N+1}{2}} \sum_{n=1}^{N} e^{-i\frac{khx}{z} n} \tag{11.25}$$

Figure 11.11 Transmission grating.
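This summation is recognized as a geometric sum, which can be performed using formula (0.65). Equation (11.25) then simplifies to

$$\sum_{n=1}^{N} e^{-i\frac{k}{z}(x x'_n + y y'_n)} = e^{i\frac{khx}{z}\frac{N+1}{2}}\, e^{-i\frac{khx}{z}}\, \frac{e^{-i\frac{khx}{z}N} - 1}{e^{-i\frac{khx}{z}} - 1} = \frac{e^{-i\frac{khx}{2z}N} - e^{i\frac{khx}{2z}N}}{e^{-i\frac{khx}{2z}} - e^{i\frac{khx}{2z}}} = \frac{\sin\left( N\frac{khx}{2z} \right)}{\sin\left( \frac{khx}{2z} \right)} \tag{11.26}$$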
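As a sanity check on (11.26), the phase factors can be summed directly and compared with the closed form. This is a minimal sketch; the function names and the shorthand `phi` for $khx/z$ are illustrative and do not appear in the text.

```python
import cmath
import math

def array_sum(num_slits, phi):
    """Direct sum of exp(-i * phi * (n - (N+1)/2)) for n = 1..N,
    i.e. the left side of (11.25) with slit positions from (11.24)
    and phi standing for k*h*x/z."""
    offset = (num_slits + 1) / 2
    return sum(cmath.exp(-1j * phi * (n - offset))
               for n in range(1, num_slits + 1))

def closed_form(num_slits, phi):
    """Right side of (11.26): sin(N*phi/2) / sin(phi/2)."""
    return math.sin(num_slits * phi / 2) / math.sin(phi / 2)

direct = array_sum(5, 0.7)
print(direct.real, closed_form(5, 0.7))  # the two values agree
print(abs(direct.imag))                  # imaginary part vanishes up to rounding
```

Because the slit positions in (11.24) are symmetric about the origin, the sum is purely real, which is why the closed form contains no leftover phase factor.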
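By combining (11.23) and (11.26) we obtain the full Fraunhofer diffraction pattern for a diffraction grating. The expression for the field is

$$E(x, y, z) = \left[ \frac{\sin\left( N\frac{khx}{2z} \right)}{\sin\left( \frac{khx}{2z} \right)} \right] \left[ -i\,\frac{E_0\, \Delta x\, \Delta y\, e^{ikz}}{\lambda z}\, e^{i\frac{k}{2z}(x^2 + y^2)}\, \mathrm{sinc}\left( \frac{\pi \Delta x}{\lambda z} x \right) \mathrm{sinc}\left( \frac{\pi \Delta y}{\lambda z} y \right) \right] \tag{11.27}$$

Now let us suppose that the slits are really tall (parallel to the $y$-dimension) such that $\Delta y \gg \lambda$. If the slits are infinitely tall, the final sinc function in Eq. (11.27) can be approximated as one.³ The intensity pattern in the horizontal direction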
3 This is mostly the right idea, but is still a bit of a fake. In fact, the field often does not have a uniform phase along the entire slit in the $y$-dimension, so our use of the function $\mathrm{sinc}\left( \frac{\pi \Delta y}{\lambda z} y \right)$ was inappropriate to begin with. The energy in a real spectrometer is usually spread out in a diffuse pattern in the $y$-dimension. However, its form in $y$ is of little relevance; the spectral information is carried in the $x$-dimension only.
can then be written in terms of the peak intensity of the diffraction pattern on the screen:

$$I(x) = I_{\rm peak}\, \mathrm{sinc}^2\left( \frac{\pi \Delta x}{\lambda z} x \right) \frac{\sin^2\left( N \frac{\pi h x}{\lambda z} \right)}{N^2 \sin^2\left( \frac{\pi h x}{\lambda z} \right)} \tag{11.28}$$

Note that $\lim_{\alpha \to 0} \sin(N\alpha)/\sin\alpha = N$, which is the reason for the factor $N^2$ in the denominator of (11.28) in conjunction with introducing our definition of $I_{\rm peak}$, which represents the intensity on the screen at $x = 0$. In principle, the intensity $I_{\rm peak}$ is a function of $y$ and depends on the exact details of how the slits are illuminated as a function of $y$, but this is usually not of interest as long as we stay with a given value of $y$ as we scan along $x$.

It is left as an exercise to study the functional form of (11.28), especially how the number of slits $N$ influences the behavior. The case of $N = 2$ describes the diffraction pattern for a Young's double-slit experiment. We now have a description of the Young's two-slit pattern in the case that the slits have finite openings of width $\Delta x$ rather than infinitely narrow ones.

A final note: You may wonder why we are interested in Fraunhofer diffraction from a grating. The reason is that we are actually interested in separating different wavelengths by observing their distinct diffraction patterns separated in space. In order to achieve good spatial separation between light of different wavelengths, it is necessary to allow the light to propagate a far distance. Optimal separation (the maximum possible) therefore occurs in the Fraunhofer regime.
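The behavior of (11.28) is easy to explore numerically. The sketch below is an illustration only: the function names are hypothetical, the sample grating values are borrowed from P11.7 ($h = 10\ \mu$m, $\Delta x = 5\ \mu$m, $\lambda = 500$ nm, $z = 1$ m), and the zero-over-zero points at the principal maxima are handled by substituting the limiting value 1.

```python
import math

def sinc(u):
    """sin(u)/u with the limiting value 1 at u = 0."""
    return 1.0 if abs(u) < 1e-12 else math.sin(u) / u

def grating_intensity(x, num_slits, h, slit_width, wavelength, z, i_peak=1.0):
    """Intensity (11.28) for N slits of width slit_width and spacing h."""
    alpha = math.pi * h * x / (wavelength * z)
    envelope = sinc(math.pi * slit_width * x / (wavelength * z)) ** 2
    s = math.sin(alpha)
    if abs(s) < 1e-9:
        # Principal maximum: sin^2(N a) / (N^2 sin^2 a) tends to 1.
        interference = 1.0
    else:
        interference = (math.sin(num_slits * alpha) / (num_slits * s)) ** 2
    return i_peak * envelope * interference

h, dx, lam, z = 10e-6, 5e-6, 500e-9, 1.0
x_first_order = lam * z / h            # position of the m = 1 peak

print(grating_intensity(0.0, 100, h, dx, lam, z))            # I_peak at the center
print(grating_intensity(x_first_order, 100, h, dx, lam, z))  # reduced by the sinc^2 envelope
```

For $N = 2$ the interference factor reduces to $\cos^2(\pi h x/\lambda z)$, the familiar Young's double-slit pattern, which provides a convenient check on the function.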
11.5 Spectrometers
The formula (11.28) can be exploited to make wavelength measurements. This forms the basis of a diffraction grating spectrometer. A spectrometer has relatively poor resolving power compared to a Fabry-Perot interferometer. Nevertheless, a spectrometer is not hampered by the serious limitation imposed by free spectral range. A spectrometer is able to measure a wide range of wavelengths simultaneously. The Fabry-Perot interferometer and the grating spectrometer in this sense are complementary, the one being able to make very precise measurements within a narrow wavelength range and the other being able to characterize wide ranges of wavelengths simultaneously.

To appreciate how a spectrometer works, consider Fraunhofer diffraction from a grating, as described by (11.28). The structure of the diffraction pattern has various peaks. For example, Fig. 11.12a shows the diffraction peaks from a Young's double slit (i.e. $N = 2$). The diffraction pattern is composed of the typical Young's double-slit pattern multiplied by the diffraction pattern of a single slit.
(Note that $\sin^2\left( 2\frac{\pi h x}{\lambda z} \right) / \left[ 4 \sin^2\left( \frac{\pi h x}{\lambda z} \right) \right] = \cos^2\left( \frac{\pi h x}{\lambda z} \right)$.) As the number of slits $N$ is increased, the peaks seen in the Young's double-slit pattern tend to sharpen, with additional smaller peaks appearing in between. Figure 11.12b shows the case for $N = 5$. The more significant peaks occur when $\sin(\pi h x/\lambda z)$ in the denominator of (11.28) goes to zero. Keep in mind that the numerator goes to zero at the same places, creating a zero-over-zero situation, so the peaks are not infinitely tall. With larger values of $N$, the peaks can become extremely sharp, and the small secondary peaks in between are smaller in comparison. Fig. 11.12c shows the case of $N = 10$ and Fig. 11.12d shows the case of $N = 100$.

When very many slits are used, the diffraction pattern becomes very useful for measuring spectra of light, since the position of the diffraction peaks depends on wavelength (except for the center peak at $x = 0$). If light of different wavelengths is simultaneously present, then the diffraction peaks associated with different wavelengths appear in different locations. It helps to have very many slits involved (i.e. large $N$) so that the diffraction peaks are sharply defined. Then closely spaced wavelengths can be more easily distinguished.

Consider the inset in Fig. 11.12d, which gives a close-up view of the first-order diffraction peak for $N = 100$. The location of this peak on a distant screen varies with the wavelength of the light. How much must the wavelength change to cause the peak to move by half of its width as marked in the inset of Fig. 11.12d? We will say that this is the minimum separation of wavelengths that still allows the two peaks to be distinguished.

Figure 11.12 Diffraction through various numbers of slits, each with $\Delta x = h/2$ (slit widths half the separation). The dotted line shows the single-slit diffraction pattern. (a) Diffraction from a double slit. (b) Diffraction from 5 slits. (c) Diffraction from 10 slits. (d) Diffraction from 100 slits.

Finding the Minimum Distinguishable Wavelength Separation
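As mentioned, the main diffraction peaks occur when the denominator of (11.28) goes to zero, i.e.

$$\frac{\pi h x}{\lambda z} = m\pi \tag{11.29}$$

The numerator of (11.28) goes to zero at these same locations (i.e. $N\pi h x/\lambda z = N m \pi$), so the peaks remain finite. If two nearby wavelengths $\lambda_1$ and $\lambda_2$ are sent through the grating simultaneously, their $m$th peaks are located at

$$x_1 = \frac{m z \lambda_1}{h} \quad \text{and} \quad x_2 = \frac{m z \lambda_2}{h} \tag{11.30}$$

These are spatially separated by

$$\Delta x_\lambda \equiv x_2 - x_1 = \frac{m z}{h} \Delta\lambda \tag{11.31}$$

where $\Delta\lambda \equiv \lambda_2 - \lambda_1$. Meanwhile, we can find the spatial width of, say, the first peak by considering the change in $x_1$ that causes the sine in the numerator of (11.28) to reach the nearby zero (see inset in Fig. 11.12d). This condition implies

$$N \frac{\pi h \left( x_1 + \Delta x_{\rm peak} \right)}{\lambda_1 z} = N m \pi + \pi \tag{11.32}$$

We will say that two peaks, associated with $\lambda_1$ and $\lambda_2$, are barely distinguishable when $\Delta x_\lambda = \Delta x_{\rm peak}$. We also substitute from (11.30) and (11.31) to rewrite (11.32) as

$$N \frac{\pi h \left( m z \lambda_1/h + m z \Delta\lambda/h \right)}{\lambda_1 z} = N m \pi + \pi \quad \Rightarrow \quad \frac{\Delta\lambda}{\lambda} = \frac{1}{N m} \tag{11.33}$$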
As we did for the Fabry-Perot interferometer, we can define the resolving power of the diffraction grating as

$$RP \equiv \frac{\lambda}{\Delta\lambda} = m N \tag{11.34}$$
The resolving power is proportional to the number of slits illuminated on the diffraction grating. The resolving power also improves for higher diffraction orders m .
Example 11.3
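What is the resolving power with $m = 1$ of a 2-cm-wide grating with 500 slits per millimeter, and how wide is the 1st-order diffraction peak for 500-nm light after 1-m focusing?

Solution: From (11.34) the resolving power is

$$RP = m N = 2\ \mathrm{cm} \times \frac{500}{0.1\ \mathrm{cm}} = 10^4$$

and the minimum distinguishable wavelength separation is

$$\Delta\lambda = \lambda/RP = 500\ \mathrm{nm}/10^4 = 0.05\ \mathrm{nm}$$

From (11.31), with $z \to f$, we have

$$\Delta x_\lambda = \frac{m f}{h} \Delta\lambda = \frac{1\ \mathrm{m}}{2 \times 10^{-6}\ \mathrm{m}} \times 0.05\ \mathrm{nm} = 25\ \mu\mathrm{m}$$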
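The numbers in Example 11.3 follow directly from (11.31) and (11.34). A short sketch, with illustrative function and variable names that are not part of the text:

```python
def resolving_power(order, illuminated_width, slits_per_meter):
    """RP = m * N, where N is the number of illuminated slits."""
    return order * illuminated_width * slits_per_meter

order = 1
slits_per_meter = 500e3            # 500 slits per millimeter
grating_width = 0.02               # 2 cm
wavelength = 500e-9                # 500 nm
focal_length = 1.0                 # 1 m focusing

rp = resolving_power(order, grating_width, slits_per_meter)
delta_lambda = wavelength / rp                 # minimum distinguishable separation
h = 1.0 / slits_per_meter                      # slit spacing, 2 micrometers
peak_width = order * focal_length / h * delta_lambda   # (11.31) with z -> f

print(rp)            # about 1e4
print(delta_lambda)  # about 5e-11 m (0.05 nm)
print(peak_width)    # about 2.5e-5 m (25 micrometers)
```

Doubling either the diffraction order or the illuminated width doubles `rp`, in line with the remark that the resolving power is proportional to both $m$ and $N$.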
$$E(x', y', 0) = E_0\, e^{-\frac{x'^2 + y'^2}{w_0^2}} \tag{11.35}$$

where $w_0$, called the beam waist, specifies the radius of the Gaussian profile. It is depicted in Fig. 11.14. To better appreciate the meaning of $w_0$, consider the intensity of the above field distribution:

$$I(x', y', 0) = I_0\, e^{-2\rho'^2/w_0^2} \tag{11.36}$$

where $\rho'^2 \equiv x'^2 + y'^2$. In (11.36) we see that $w_0$ indicates the radius at which the intensity reduces by the factor $e^{-2} = 0.135$. We would like to know how this field evolves when it propagates forward from the plane $z = 0$. We compute the field downstream using the Fresnel approximation (10.13):
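$$E(x, y, z) = -i\,\frac{e^{ikz}\, e^{i\frac{k}{2z}(x^2 + y^2)}}{\lambda z} \int_{-\infty}^{\infty} dx' \int_{-\infty}^{\infty} dy' \left[ E_0\, e^{-(x'^2 + y'^2)/w_0^2} \right] e^{i\frac{k}{2z}(x'^2 + y'^2)}\, e^{-i\frac{k}{z}(x x' + y y')} \tag{11.37}$$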
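The Gaussian profile itself limits the dimension of the emission region, so there is no problem in integrating to infinity. Equation (11.37) can be rewritten as

$$E(x, y, z) = -i\,\frac{E_0\, e^{ikz}\, e^{i\frac{k}{2z}(x^2 + y^2)}}{\lambda z} \int_{-\infty}^{\infty} dx'\, e^{-\left( \frac{1}{w_0^2} - i\frac{k}{2z} \right) x'^2 - i\frac{kx}{z} x'} \int_{-\infty}^{\infty} dy'\, e^{-\left( \frac{1}{w_0^2} - i\frac{k}{2z} \right) y'^2 - i\frac{ky}{z} y'} \tag{11.38}$$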
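The integrals over $x'$ and $y'$ have the identical form and can be done individually with the help of the integral formula (0.55). The algebra is cumbersome, but the integral in the $x'$ dimension becomes

$$\begin{aligned}
\int_{-\infty}^{\infty} dx'\, e^{-\left( \frac{1}{w_0^2} - i\frac{k}{2z} \right) x'^2 - i\frac{kx}{z} x'}
&= \left[ \frac{\pi}{\frac{1}{w_0^2} - i\frac{k}{2z}} \right]^{1/2} \exp\left[ \frac{\left( -i\frac{kx}{z} \right)^2}{4\left( \frac{1}{w_0^2} - i\frac{k}{2z} \right)} \right] \\
&= \left[ \frac{\pi}{-i\frac{k}{2z}\left( 1 + i\frac{2z}{kw_0^2} \right)} \right]^{1/2} \exp\left[ -\frac{\frac{kx^2}{2z}}{\frac{2z}{kw_0^2} - i} \right] \\
&= \left[ \frac{i\lambda z}{\sqrt{1 + \left( \frac{2z}{kw_0^2} \right)^2}} \right]^{1/2} e^{-\frac{i}{2}\tan^{-1}\frac{2z}{kw_0^2}} \exp\left[ -\frac{kx^2}{2z}\, \frac{\frac{2z}{kw_0^2} + i}{1 + \left( \frac{2z}{kw_0^2} \right)^2} \right]
\end{aligned} \tag{11.39}$$

A similar expression results from the integration on $y'$. When (11.39) and the equivalent expression for the $y$-dimension are used in (11.38), the result is

$$E(x, y, z) = E_0\, e^{ikz}\, e^{i\frac{k}{2z}(x^2 + y^2)}\, \frac{\exp\left[ -\dfrac{(x^2 + y^2)\left( \frac{1}{w_0^2} + i\frac{k}{2z} \right)}{1 + \left( \frac{2z}{kw_0^2} \right)^2} \right]}{\sqrt{1 + \left( \frac{2z}{kw_0^2} \right)^2}}\, e^{-i\tan^{-1}\frac{2z}{kw_0^2}} \tag{11.40}$$

This rather complicated-looking expression for the field distribution is in fact very useful and can be directly interpreted, as discussed in the next section.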
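In cylindrical coordinates, the same field follows from a single radial Fresnel integral:

$$\begin{aligned}
E(\rho, z) &= -i\,\frac{2\pi\, e^{ikz}\, e^{i\frac{k\rho^2}{2z}}}{\lambda z} \int_0^{\infty} \rho'\, d\rho'\, E_0\, e^{-\rho'^2/w_0^2}\, e^{i\frac{k\rho'^2}{2z}}\, J_0\left( \frac{k\rho\rho'}{z} \right) \\
&= -i\,\frac{2\pi\, e^{ikz}\, e^{i\frac{k\rho^2}{2z}}}{\lambda z}\, E_0\, \frac{\exp\left[ -\dfrac{(k\rho/z)^2}{4\left( \frac{1}{w_0^2} - i\frac{k}{2z} \right)} \right]}{2\left( \frac{1}{w_0^2} - i\frac{k}{2z} \right)} \\
&= E_0\, e^{ikz}\, e^{i\frac{k\rho^2}{2z}}\, \frac{\exp\left[ -\dfrac{\rho^2\left( \frac{1}{w_0^2} + i\frac{k}{2z} \right)}{1 + \left( \frac{2z}{kw_0^2} \right)^2} \right]}{\sqrt{1 + \left( \frac{2z}{kw_0^2} \right)^2}}\, e^{-i\tan^{-1}\frac{2z}{kw_0^2}}
\end{aligned}$$

With the definitions below, this field takes the compact form

$$E(\rho, z) = E_0\, \frac{w_0}{w(z)}\, e^{-\frac{\rho^2}{w^2(z)}}\, e^{ikz + i\frac{k\rho^2}{2R(z)} - i\tan^{-1}\frac{z}{z_0}} \tag{11.41}$$

where

$$\rho^2 \equiv x^2 + y^2 \tag{11.42}$$

$$w(z) \equiv w_0 \sqrt{1 + z^2/z_0^2} \tag{11.43}$$

$$R(z) \equiv z + z_0^2/z \tag{11.44}$$

$$z_0 \equiv \frac{k w_0^2}{2} \tag{11.45}$$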
This formula describes the lowest-order Gaussian mode, the most common laser beam profile. (Please be aware that some lasers are multimode and exhibit more complicated structures.) It turns out that (11.41) works equally well for negative values of $z$. The expression can therefore be used to describe the field of a simple laser beam everywhere (before and after it goes through a focus). In fact, the expression works also near $z = 0$!⁴ At $z = 0$ the diffracted field (11.41) returns the exact
4 There is good reason for this since the Fresnel diffraction integral is an exact solution to the paraxial wave equation (10.15). The beam (11.41) therefore satisfies the paraxial wave equation for all $z$.
expression for the original field profile (11.35) (see P11.11). In short, (11.41) may be used with impunity as long as the divergence angle of the beam is not too wide. To begin our interpretation of (11.41), consider the intensity profile $I \propto E^{*}E$ as depicted in Fig. 11.15:

$$I(\rho, z) = I_0\, \frac{w_0^2}{w^2(z)}\, e^{-\frac{2\rho^2}{w^2(z)}} = \frac{I_0}{1 + z^2/z_0^2}\, e^{-\frac{2\rho^2}{w^2(z)}} \tag{11.46}$$
Figure 11.15 A Gaussian laser field profile in the vicinity of its beam waist.
By inspection, we see that $w(z)$ gives the radius of the beam anywhere along $z$. At $z = 0$, the beam waist, $w(z = 0)$ reduces to $w_0$, as expected. The parameter $z_0$, known as the Rayleigh range, specifies the distance along the axis from $z = 0$ to the point where the intensity decreases by a factor of 2. Note that $w_0$ and $z_0$ are not independent of each other but are connected through the wavelength according to (11.45). There is a tradeoff: a small beam waist means a short depth of focus. That is, a small $w_0$ means a small Rayleigh range $z_0$.

We next consider the phase terms that appear in the field expression (11.41). The phase term $ikz + ik\rho^2/2R(z)$ describes the phase of curved wave fronts, where $R(z)$ is the radius of curvature of the wave front at $z$. At $z = 0$, the radius of curvature is infinite (see (11.44)), meaning that the wave front is flat at the laser beam waist. In contrast, at very large values of $z$ we have $R(z) \cong z$ (see (11.44)). In this case, we may write these phase terms as $kz + \frac{k\rho^2}{2z} \cong k\sqrt{z^2 + \rho^2}$. This describes a spherical wave front emanating from the origin out to the point $(\rho, z)$. The Fresnel approximation (same as the paraxial approximation) represents spherical wave fronts with the former parabolic approximation. As a reminder, to restore the temporal dependence of the field, we append $e^{-i\omega t}$ to the solution, as discussed in connection with (10.4).

The phase $-i\tan^{-1} z/z_0$ is perhaps a bit more mysterious. It is called the Gouy shift and is actually present for any light that goes through a focus, not just laser beams. The Gouy shift is not overly dramatic since the expression $\tan^{-1} z/z_0$ ranges from $-\pi/2$ (at $z = -\infty$) to $\pi/2$ (at $z = +\infty$). Nevertheless, when light goes through a focus, it experiences an overall phase shift of magnitude $\pi$.

Example 11.4
Write the beam waist $w_0$ in terms of the f-number, defined to be the ratio of $z$ to the beam diameter $2w(z)$ far from the beam waist.

Solution: Far away from the beam waist (i.e. $z \gg z_0$) the laser beam expands along a cone. That is, its diameter increases in proportion to distance:

$$w(z) = w_0 \sqrt{1 + z^2/z_0^2} \to w_0 z/z_0$$

The cone angle is parameterized by the f-number, the ratio of the cone height to its base:

$$f\# \equiv \lim_{z \to \pm\infty} \frac{z}{2w(z)} = \frac{z}{2 w_0 z/z_0} = \frac{z_0}{2 w_0}$$

Substituting $z_0 = k w_0^2/2$ from (11.45), with $k = 2\pi/\lambda$, gives

$$w_0 = \frac{2\lambda f\#}{\pi} \tag{11.47}$$
Figure 11.16
Equation (11.47) gives a convenient way to predict the size of a laser focus. One calculates the f-number by dividing the distance to the focus by the diameter of the beam at the lens. However, in practice you may be very surprised at how badly a beam focuses compared to the theoretical prediction (due to aberrations, etc.). It is always good practice to directly measure your focus if its size is important to an experiment.
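The Gaussian-beam quantities discussed above are simple enough to tabulate. The sketch below assumes the relation $z_0 = k w_0^2/2 = \pi w_0^2/\lambda$ (consistent with the combination $2z/kw_0^2$ in (11.40)) and the f-number result $f\# = z_0/2w_0$ from Example 11.4, which together give $w_0 = 2\lambda f\#/\pi$. The function names and the sample waist are illustrative.

```python
import math

def rayleigh_range(w0, wavelength):
    """z0 = k * w0^2 / 2 = pi * w0^2 / lambda (assumed form of (11.45))."""
    return math.pi * w0 ** 2 / wavelength

def beam_radius(z, w0, wavelength):
    """w(z) = w0 * sqrt(1 + z^2 / z0^2)."""
    z0 = rayleigh_range(w0, wavelength)
    return w0 * math.sqrt(1.0 + (z / z0) ** 2)

def curvature_radius(z, w0, wavelength):
    """R(z) = z + z0^2 / z; infinite (flat wave front) at the waist."""
    if z == 0:
        return math.inf
    z0 = rayleigh_range(w0, wavelength)
    return z + z0 ** 2 / z

def gouy_phase(z, w0, wavelength):
    """tan^-1(z / z0), which runs from -pi/2 to +pi/2 through the focus."""
    return math.atan(z / rayleigh_range(w0, wavelength))

def waist_from_f_number(f_number, wavelength):
    """w0 = 2 * lambda * f# / pi, from f# = z0 / (2 w0)."""
    return 2.0 * wavelength * f_number / math.pi

w0, lam = 1e-3, 633e-9          # 1 mm waist, HeNe wavelength (sample values)
z0 = rayleigh_range(w0, lam)
print(z0)                        # Rayleigh range in meters
print(beam_radius(z0, w0, lam))  # sqrt(2) * w0 at one Rayleigh range
print(waist_from_f_number(z0 / (2 * w0), lam))  # recovers w0
```

At $z = z_0$ the beam area has doubled, so the on-axis intensity has dropped by a factor of 2, matching the definition of the Rayleigh range given in the text.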
Figure 11.17 Gaussian laser beam traversing an optical system described by an ABCD matrix. The dark lines represent the incoming and exiting beams. The gray line represents where the exiting beam appears to have been.
where $A$, $B$, $C$, and $D$ are the matrix elements of the optical system. The imaginary number $i \equiv \sqrt{-1}$ imbues the law with complex arithmetic. It makes two equations from one, since the real and imaginary parts of (11.48) must separately be equal.

We now prove the ABCD law. We begin by showing that the law holds for two specific ABCD matrices. First, consider the matrix for propagation through a distance $d$:

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & d \\ 0 & 1 \end{bmatrix} \tag{11.49}$$

We know that simple propagation has minimal effect on a beam. The Rayleigh range is unchanged, so we expect that the ABCD law should give $z_0' = z_0$. The propagation through a distance $d$ modifies the beam position by $z' = z + d$. We now check that the ABCD law agrees with these results by inserting (11.49) into (11.48):

$$z' + i z_0' = \frac{1\,(z + i z_0) + d}{0\,(z + i z_0) + 1} = z + d + i z_0 \quad \text{(propagation through distance } d\text{)} \tag{11.50}$$

Thus, the law holds in this case. Next we consider the ABCD matrix of a thin lens (or a curved mirror):

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix} \tag{11.51}$$

A beam that traverses a thin lens undergoes the phase shift $-k\rho^2/2f$, according to (11.11). This modifies the original phase of the wave front $k\rho^2/2R(z)$, seen in (11.41). The phase of the exiting beam is therefore

$$\frac{k\rho^2}{2R(z')} = \frac{k\rho^2}{2R(z)} - \frac{k\rho^2}{2f} \tag{11.52}$$

where we do not keep track of unimportant overall phases such as $kz$ or $kz'$. With (11.44) this relationship reduces to

$$\frac{1}{R(z')} = \frac{1}{R(z)} - \frac{1}{f} \quad \Rightarrow \quad \frac{1}{z' + z_0'^2/z'} = \frac{1}{z + z_0^2/z} - \frac{1}{f} \tag{11.53}$$
In addition to this relationship, the local radius of the beam given by (11.43) cannot change while traversing the thin lens. Therefore,

$$w(z') = w(z) \quad \Rightarrow \quad z_0' \left( 1 + \frac{z'^2}{z_0'^2} \right) = z_0 \left( 1 + \frac{z^2}{z_0^2} \right) \tag{11.54}$$

On the other hand, the ABCD law for the thin lens gives

$$z' + i z_0' = \frac{1\,(z + i z_0) + 0}{-(1/f)(z + i z_0) + 1} \quad \text{(traversing a thin lens with focal length } f\text{)} \tag{11.55}$$

It is left as an exercise (see P11.14) to show that (11.55) is consistent with (11.53) and (11.54).

So far we have shown that the ABCD law works for two specific examples, namely propagation through a distance $d$ and transmission through a thin lens with focal length $f$. From these elements we can derive more complicated systems. The ABCD matrix for a thick lens cannot be constructed from just these two elements, but it can be constructed if we sandwich a thick window (as opposed to empty space) between two thin lenses. The proof that the matrix for a thick window obeys the ABCD law is left as an exercise (see P11.17). With these relatively few elements, essentially any optical system can be constructed, provided that the beam propagation begins and ends in the same index of refraction.

To complete our proof of the general ABCD law, we need only show that when it is applied to the compound element

$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} A_2 & B_2 \\ C_2 & D_2 \end{bmatrix} \begin{bmatrix} A_1 & B_1 \\ C_1 & D_1 \end{bmatrix} = \begin{bmatrix} A_2 A_1 + B_2 C_1 & A_2 B_1 + B_2 D_1 \\ C_2 A_1 + D_2 C_1 & C_2 B_1 + D_2 D_1 \end{bmatrix} \tag{11.56}$$

it gives the same answer as when the law is applied sequentially, first on $\begin{bmatrix} A_1 & B_1 \\ C_1 & D_1 \end{bmatrix}$ and then on $\begin{bmatrix} A_2 & B_2 \\ C_2 & D_2 \end{bmatrix}$.
Explicitly, we have

$$\begin{aligned}
z'' + i z_0'' &= \frac{A_2 (z' + i z_0') + B_2}{C_2 (z' + i z_0') + D_2}
= \frac{A_2 \left[ \dfrac{A_1 (z + i z_0) + B_1}{C_1 (z + i z_0) + D_1} \right] + B_2}{C_2 \left[ \dfrac{A_1 (z + i z_0) + B_1}{C_1 (z + i z_0) + D_1} \right] + D_2} \\
&= \frac{A_2 \left[ A_1 (z + i z_0) + B_1 \right] + B_2 \left[ C_1 (z + i z_0) + D_1 \right]}{C_2 \left[ A_1 (z + i z_0) + B_1 \right] + D_2 \left[ C_1 (z + i z_0) + D_1 \right]} \\
&= \frac{(A_2 A_1 + B_2 C_1)(z + i z_0) + (A_2 B_1 + B_2 D_1)}{(C_2 A_1 + D_2 C_1)(z + i z_0) + (C_2 B_1 + D_2 D_1)} \\
&= \frac{A (z + i z_0) + B}{C (z + i z_0) + D}
\end{aligned} \tag{11.57}$$
Thus, we can construct any ABCD matrix that we wish from matrices that are known to obey the ABCD law. The resulting matrix also obeys the ABCD law.
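The composition property proved above can also be checked numerically. The sketch below applies the ABCD law in the form used in (11.50) and (11.55), acting on the complex quantity $z + i z_0$, first element by element and then with the multiplied matrix. The helper names, the tuple representation of matrices, and the sample distances are illustrative assumptions.

```python
def abcd_law(q, matrix):
    """Apply the ABCD law to q = z + i*z0: q' = (A q + B) / (C q + D)."""
    (a, b), (c, d) = matrix
    return (a * q + b) / (c * q + d)

def matmul(m2, m1):
    """2x2 matrix product m2 * m1 (m1 acts on the beam first)."""
    (a2, b2), (c2, d2) = m2
    (a1, b1), (c1, d1) = m1
    return ((a2 * a1 + b2 * c1, a2 * b1 + b2 * d1),
            (c2 * a1 + d2 * c1, c2 * b1 + d2 * d1))

def propagation(d):
    """Matrix (11.49) for propagation through a distance d."""
    return ((1.0, d), (0.0, 1.0))

def thin_lens(f):
    """Matrix (11.51) for a thin lens of focal length f."""
    return ((1.0, 0.0), (-1.0 / f, 1.0))

q_in = complex(0.0, 2.0)   # beam at its waist (z = 0) with z0 = 2 m

# Sequential application: propagate 0.5 m, thin lens f = 0.25 m, propagate 0.3 m.
q_seq = abcd_law(abcd_law(abcd_law(q_in, propagation(0.5)),
                          thin_lens(0.25)),
                 propagation(0.3))

# Single application with the compound matrix (last element on the left).
compound = matmul(propagation(0.3), matmul(thin_lens(0.25), propagation(0.5)))
q_comp = abcd_law(q_in, compound)

print(q_seq, q_comp)   # the two results agree
```

The real part of the output is the position $z'$ relative to the new waist and the imaginary part is the new Rayleigh range $z_0'$, so a single complex number tracks both beam parameters through the system.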
Exercises
Exercises for 11.1 Fraunhofer Diffraction Through a Lens

P11.1 Fill in the steps leading to (11.14) from (11.13). Show that the intensity distribution (11.6) is consistent with (11.14).

L11.2 Set up a collimated plane wave in the laboratory using a HeNe laser ($\lambda = 633$ nm) and appropriate lenses. (a) Choose a rectangular aperture ($\Delta x$ by $\Delta y$) and place it in the plane wave. Observe the Fraunhofer diffraction on a very far away screen (i.e., where $z \gg \frac{k}{2}(\text{aperture radius})^2$ is satisfied). Check that the location of the zeros agrees with (10.20). (b) Place a lens in the beam after the aperture. Use a CCD camera to observe the Fraunhofer diffraction profile at the focus of the lens. Check that the location of the zeros agrees with (10.20), replacing $z$ with $f$. (c) Repeat parts (a) and (b) using a circular aperture with diameter $\ell$. Check the position of the first zero. (video)

Figure 11.18 (labels: Filters, CCD Camera, Screen)
Exercises for 11.2 Resolution of a Telescope

P11.3 On the night of April 18, 1775, a signal was sent from the Old North Church steeple to Paul Revere, who was 1.8 miles away: "One if by land, two if by sea." If in the dark, Paul's pupils had 4 mm diameters, what is the minimum possible separation between the two lanterns that would allow him to correctly interpret the signal? Assume that the predominant wavelength of the lanterns was 580 nm. HINT: In the eye, the index of refraction is about 1.33 so the wavelength is shorter. This leads to a smaller diffraction pattern on the retina. However, in accordance with Snell's law, two rays separated by an angle $\theta$ outside of the eye are separated by an angle $\theta/1.33$ inside the eye. The two rays then hit on the retina closer together. As far as resolution is concerned, the two effects exactly compensate.
L11.4 Simulate two stars with laser beams ($\lambda = 633$ nm). Align them nearly parallel with a small lateral displacement. Send the beams down a long corridor until diffraction causes both beams to grow into one another so that it is no longer apparent that they are from two distinct sources. Use a lens to image the two sources onto a CCD camera. The camera should be placed close to the focal plane of the lens. Use a variable iris near the lens to create different pupil openings.

Experimentally determine the pupil diameter that just allows you to resolve the two sources according to the Rayleigh criterion. Check your measurement against theoretical prediction. (video) HINT: The angular separation between the two sources is obtained by dividing the lateral separation of the beams by the propagation distance.

Figure 11.19 (labels: Laser, Laser)
Exercises for 11.3 The Array Theorem

P11.5 Find the diffraction pattern created by an array of nine circles, each with radius $a$, which are centered at the following $(x', y')$ coordinates: $(-b, b)$, $(0, b)$, $(b, b)$, $(-b, 0)$, $(0, 0)$, $(b, 0)$, $(-b, -b)$, $(0, -b)$, $(b, -b)$ ($a$ is less than $b$). Make a plot of the result for the situation where (in some choice of units) $a = 1$, $b = 5a$, and $k/d = 1$. View the plot at different zoom levels to see the finer detail.

P11.6 (a) A plane wave is incident on a screen of $N^2$ uniformly spaced identical rectangular apertures of dimension $\Delta x$ by $\Delta y$ (see Fig. 11.20). Their positions are described by $x_n = h\left( n - \frac{N+1}{2} \right)$ and $y_m = s\left( m - \frac{N+1}{2} \right)$. Find the far-field (Fraunhofer) pattern of the light transmitted by the grid. (b) You look at a distant sodium street lamp (somewhat monochromatic) through a curtain made from a fine mesh fabric with crossed threads. Make a sketch of what you expect to see (how the lamp will look to you). HINT: Remember that the lens of your eye causes the Fraunhofer diffraction of the mesh to appear at the retina.
Figure 11.20
Exercises for 11.4 Diffraction Grating

P11.7 Consider Fraunhofer diffraction from a grating of $N$ slits having widths $\Delta x$ and equal separations $h$. Make plots (label relevant points and scaling) of the intensity pattern for $N = 1$, $N = 2$, $N = 5$, and $N = 1000$ in the case where $h = 2\Delta x$, $\Delta x = 5\ \mu$m, and $\lambda = 500$ nm. Let the Fraunhofer diffraction be observed at the focus of a lens with focal length $f = 100$ cm. Do you expect $I_{\rm peak}$ to be the same value for all of these cases?

P11.8 For the case of $N = 1000$ in P11.7, you wish to position a narrow slit at the focus of the lens so that it transmits only the first-order diffraction peak (i.e. at $khx/2f = \pi$). (a) How wide should the slit be if it is to be half the separation between the first intensity zeros to either side of the peak? (b) What small change in wavelength (away from $\lambda = 500$ nm) will cause the intensity peak to shift by the width of the slit found in part (a)?
Exercises for 11.5 Spectrometers

L11.9 (a) Use a HeNe laser to determine the period $h$ of a reflective grating. (b) Give an estimate of the blaze angle on the grating. HINT: Assume that the blaze angle is optimized for first-order diffraction of the HeNe laser (for one side) at normal incidence. The blaze angle enables a mirror-like reflection of the diffracted light on each groove. (video) (c) You have two mirrors of focal length 75 cm and the reflective grating in the lab. You also have two very narrow adjustable slits and the ability to tune the angle of the grating. Sketch how to use these items to make a monochromator (scans through one wavelength at a time). If the beam that hits the grating is 5 cm wide, what do you expect the ultimate resolving power of the monochromator to be in the wavelength range of 500 nm? Do not worry about aberrations such as astigmatism from using the mirrors off axis.
Figure 11.21
Figure 11.22 (labels: Slit, Light in)
L11.10 Study the Jarrell Ash monochromator. Use a tungsten lamp as a source and observe how the instrument works by taking the entire top off. Do not breathe or touch when you do this. In the dark, trace the light inside of the instrument with a white plastic card and observe what happens when you change the wavelength setting. Place the top back on when you are done. (video) (a) Predict the best theoretical resolving power that this instrument can achieve, assuming 1200 lines per millimeter. (b) What should the width $\Delta x$ of the entrance and exit slits be to obtain this resolving power? Assume $\lambda = 500$ nm. HINT: Set $\Delta x$ to be the distance between the peak and the first zero of the diffraction pattern at the exit slit for monochromatic light.
Exercises for 11.7 Gaussian Laser Beams

P11.11 (a) Confirm that (11.41) reduces to (11.35) when $z = 0$. (b) Take the limit $z \gg z_0$ to find the field far from the laser focus.
P11.12 Use the Fraunhofer integral formula (either (10.19) or (10.28)) to determine the far-field pattern of a Gaussian laser focus (11.35). HINT: The answer should agree with P11.11 part (b).
L11.13 Consider the following setup where a diverging laser beam is collimated using an uncoated lens. A double reflection from both surfaces of the lens (known as a ghost) comes out in the forward direction, focusing after a short distance. Use a CCD camera to study this focused beam. The collimated beam serves as a reference to reveal the phase of the focused beam through interference. Because the weak ghost beam concentrates near its focus, the two beams can have similar intensities for optimal interference effects. (video)5
Figure 11.23 [diagram label: Ghost Beam]
Figure 11.24 [diagram labels: Filter, Laser, Lens, 150 cm, Pin Hole, Uncoated Lens, CCD Camera]
The ghost beam $E_1(\rho, z)$ is described by (11.41), where the origin is at the focus. Let the collimated beam be approximated as a plane wave $E_2 e^{ikz + i\phi}$, where $\phi$ is the relative phase between the two beams. The net intensity is then $I_t(\rho, z) \propto \left|E_1(\rho, z) + E_2 e^{ikz + i\phi}\right|^2$ or
$$I_t(\rho, z) = I_2 + I_1(\rho, z) + 2\sqrt{I_2 I_1(\rho, z)}\,\cos\left(\frac{k\rho^2}{2R(z)} - \tan^{-1}\frac{z}{z_0} - \phi\right)$$
where $I_1(\rho, z)$ is given by (11.46). We now have a formula that retains both $R(z)$ and the Gouy shift $\tan^{-1}(z/z_0)$, which are not present in the intensity distribution of a single beam (see (11.46)).
(a) Determine the f-number for the ghost beam (see Example 11.4). Use this measurement to predict a value for $w_0$. HINT: You know that at the lens, the focusing beam is the same size as the collimated beam.
(b) Measure the actual spot size $w_0$ at the focus. How does it compare to the prediction? HINT: Before measuring the spot size, make a subtle adjustment to the tilt of the lens. This incidentally causes the phase between the two beams to vary by small amounts, which you can set to $\phi = \pm\pi/2$. Then at the focus the cosine term vanishes and the two beams don't interfere (i.e. the intensities simply add). This is accomplished if the center of the interference pattern is as dark as possible either far before or far after the focus.
(c) Observe the effect of the Gouy shift. Since $\tan^{-1}(z/z_0)$ varies over a range of $\pi$, you should see that the ring pattern before versus after the focus inverts (i.e. the bright rings exchange with the dark ones).
(d) Predict the Rayleigh range $z_0$ and check that the radius of curvature $R(z) \equiv z + z_0^2/z$ agrees with measurement. HINT: You should see interference rings similar to those in Fig. 11.25. The only phase term that varies with $\rho$ is $k\rho^2/2R(z)$. If you count $N$ fringes out to a radius $\rho$, then $k\rho^2/2R(z)$ has varied by $2\pi N$.
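The interference pattern expected in L11.13 can be previewed numerically. The sketch below assumes the standard Gaussian-beam expressions $w(z) = w_0\sqrt{1 + z^2/z_0^2}$, $R(z) = z + z_0^2/z$, $z_0 = kw_0^2/2$, and $I_1 \propto (w_0/w)^2 e^{-2\rho^2/w^2}$, and takes the cosine argument to be $k\rho^2/2R(z) - \tan^{-1}(z/z_0) - \phi$. The function name and the sample beam parameters are illustrative assumptions, not values from the lab.

```python
import numpy as np

def net_intensity(rho, z, w0, wavelength, i_plane, i_focus_peak, phi):
    """Intensity of a focused Gaussian beam interfering with a plane wave."""
    k = 2 * np.pi / wavelength
    z0 = k * w0 ** 2 / 2                      # Rayleigh range
    w = w0 * np.sqrt(1 + (z / z0) ** 2)       # beam radius at z
    inv_radius = z / (z ** 2 + z0 ** 2)       # 1/R(z), finite at z = 0
    gouy = np.arctan(z / z0)                  # Gouy shift
    i_gauss = i_focus_peak * (w0 / w) ** 2 * np.exp(-2 * rho ** 2 / w ** 2)
    phase = k * rho ** 2 * inv_radius / 2 - gouy - phi
    return i_plane + i_gauss + 2 * np.sqrt(i_plane * i_gauss) * np.cos(phase)

# Assumed sample values: 20 micron waist, HeNe wavelength, equal peak intensities.
w0 = 20e-6
wavelength = 633e-9
z0 = np.pi * w0 ** 2 / wavelength
phi = np.pi / 2

rho = np.linspace(0.0, 20 * w0, 500)
before = net_intensity(rho, -5 * z0, w0, wavelength, 1.0, 1.0, phi)
after = net_intensity(rho, +5 * z0, w0, wavelength, 1.0, 1.0, phi)
```

With $\phi = \pi/2$, the on-axis value of `before` lies above the simple sum of the two intensities and the on-axis value of `after` lies below it, which is the ring inversion described in part (c).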
5 J. Peatross and M. V. Pack, Viewing the Mathematical Structure of Gaussian Laser Beams in a Student Laboratory, Am. J. Phys. 69, 1169 (2001).
Figure 11.25 [panels: $z = 0$, $z = \pm z_0$, $z = \pm 2z_0$, $z = \pm 3z_0$, $z = +4z_0$]
Exercises for 11.A ABCD Law for Gaussian Beams
P11.14 Find the solutions to (11.55) (i.e. find $z'$ and $z_0'$ in terms of $z$ and $z_0$). Show that the results are in agreement with (11.53) and (11.54).
P11.15 Assuming a collimated beam (i.e. $z = 0$ and beam waist $w_0$), find the location $L = -z'$ and size $w_0'$ of the resulting focus when the beam goes through a thin lens with focal length $f$.
L11.16 Place a lens in a HeNe laser beam soon after the exit mirror of the cavity. Characterize the focus of the resulting laser beam, and compare the results with the expressions derived in P11.15.
P11.17 Prove the ABCD law for a beam propagating through a thick window of material with matrix
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix} = \begin{bmatrix} 1 & d/n \\ 0 & 1 \end{bmatrix}$$
Chapter 12
12.1 Interferograms
Consider the Michelson interferometer seen in Fig. 12.1. Suppose that the beamsplitter divides the fields evenly, so that the overall output intensity is given by (8.1):
$$I_\text{tot} = 2I_0\left[1 + \cos(\omega\tau)\right] \qquad (12.1)$$
As a reminder, $\tau$ is the roundtrip delay time of one path relative to the other. This equation is based on the idealized case, where the amplitude and phase of the two beams are uniform and perfectly aligned to each other following the beamsplitter. The entire beam blinks on and off as the delay path $\tau$ is varied.
What happens if one of the retro-reflecting mirrors is misaligned by a small angle $\theta$? The fringe patterns seen in Fig. 12.2 (a)-(c) are the result. By the law of reflection, the beam returning from the misaligned mirror deviates from the ideal path by an angle $2\theta$. This puts a relative phase variation of
$$\phi = kx\sin(2\theta_x) + ky\sin(2\theta_y) \qquad (12.2)$$
on the misaligned beam.3 Here $\theta_x$ represents the tilt of the mirror in the $x$-dimension and $\theta_y$ represents the amount of tilt in the $y$-dimension. When the two plane waves join, the resulting intensity pattern is
$$I_\text{tot} = 2I_0\left[1 + \cos(\phi + \omega\tau)\right] \qquad (12.3)$$
The phase term $\phi$ depends on the local position within the beam through $x$ and $y$. Regions of uniform phase, called fringes (in this case individual stripes), have the same intensity. As the delay $\tau$ is varied, the fringes seem to move across the detector. In this case, the fringes appear at one edge of the beam and disappear at the other.
Another interesting situation arises when the beams in a Michelson interferometer are diverging. A fringe pattern of concentric circles will be seen at the detector when the two beam paths are unequal (see Fig. 12.2 (d)). The radius of curvature for the beam traveling the longer path is increased by the added amount of delay $d = \tau c$. Thus, if beam 1 has radius of curvature $R_1$ when returning to the beam splitter, then beam 2 will have radius $R_2 = R_1 + d$ upon return (assuming flat mirrors). The relative phase (see phase term in (11.41)) between the two beams is
$$\phi = k\rho^2/2R_1 - k\rho^2/2R_2 \qquad (12.4)$$
1 See M. Born and E. Wolf, Principles of Optics, 7th ed., Sect. 7.5.5 (Cambridge: Cambridge University Press, 1999).
2 In fact, a grating can be considered to be a hologram and holographic techniques are often employed to produce gratings.
Figure 12.2 Fringe patterns for a Michelson interferometer: (a) Horizontally misaligned beams. (b) Vertically misaligned beams. (c) Both vertically and horizontally misaligned beams. (d) Diverging beam with unequal paths. (e) Diverging beam with unequal paths and horizontal misalignment.
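The stripe spacing implied by the tilt phase $kx\sin(2\theta_x)$ follows from asking how far $x$ must change for that phase to advance by $2\pi$. The sketch below works this out. The function name is illustrative, and the sample values (a HeNe wavelength and a mirror tilt of 0.1 degrees) are assumptions chosen only for demonstration.

```python
import math

def tilt_fringe_spacing(wavelength, mirror_tilt):
    """Distance between adjacent fringes for a mirror tilted by mirror_tilt (radians).

    The returning beam deviates by twice the mirror tilt, so the relative
    phase k*x*sin(2*tilt) advances by 2*pi over one fringe period.
    """
    return wavelength / math.sin(2 * mirror_tilt)

# Assumed sample values.
wavelength = 633e-9               # meters
mirror_tilt = math.radians(0.1)   # radians

spacing = tilt_fringe_spacing(wavelength, mirror_tilt)
print(f"fringe spacing: {spacing * 1e3:.3f} mm")
```

Halving the tilt doubles the spacing, which is why the stripes broaden and finally fill the whole beam as the interferometer approaches perfect alignment.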
A typical industry standard for research-grade optics is to specify the surface flatness to within one tenth of an optical wavelength (633 nm HeNe laser). This means that the interferometer should reveal no more than one fifth of a fringe variation across the substrate surface. The fringe pattern tells the technician how the surface should continue to be polished in order to achieve the desired surface flatness. Figure 12.3(a) shows the fringe pattern for a surface with significant variations in the surface figure.
When testing a surface, it is not necessary to remove all tilt from the alignment before the effects of surface variations become apparent in the fringe pattern. In fact, it can be helpful to observe the distortions as deflections in a normally regularly striped fringe pattern. Figure 12.3(b) shows fringes from a distorted surface when some tilt is left in the interferometer alignment. An important advantage to leaving some tilt in the beam is that one can better tell the sign of the phase errors. We can see, for example, in the case of tilt that the two major distortion regions in Fig. 12.3 have opposite phase; we can tell that one region of the substrate protrudes while the other dishes in. On the other hand, this is not clear for an interferogram with no tilt.
Other types of optical components (besides flat mirrors) can also be tested with an interferometer. Figure 12.4 shows how a lens can be tested using a convex mirror to compensate for the focusing action of the lens. With appropriate spacing, the lens-mirror combination can act like a flat surface. Distortions in the lens figure are revealed in the fringe pattern. In this case, the surfaces of the lens are tested together, and variations in optical path length are observed.
In order to record fringes, say with a CCD camera, it is often convenient to image a larger beam onto a relatively small active area of the detector. The imaging objective should be adjusted to produce an image of the test optic on the detector screen. The diameter of the objective lens needs to accommodate the whole beam.
Figure 12.3 (a) Fringe pattern arising from an arbitrarily distorted mirror in a perfectly aligned interferometer with plane wave beams. (b) Fringe pattern from the same mirror as (a) when the mirror is tilted (still plane wave beams). The distortion due to surface variation is still easily seen.
[Diagram labels: Optic to be tested, Imaging Objective, Camera]
[Diagram labels: Object, Film, Beamsplitter]
out. For simplicity, we neglect the vector nature of the electric field, assuming that the scattering from the object for the most part preserves polarization and that the angle between the two beams incident on the film is modest (so that the electric fields of the two beams are close to parallel). To the extent that the light scattered from the object contains the polarization component orthogonal to that of the reference beam, it provides a uniform (unwanted) background exposure to the film on top of which the fringe pattern is recorded. In general terms, we may write the electric field arriving at the film as5
$$E_\text{film}(\mathbf{r})\,e^{-i\omega t} = E_\text{object}(\mathbf{r})\,e^{-i\omega t} + E_\text{ref}(\mathbf{r})\,e^{-i\omega t} \qquad (12.5)$$
Here, the coordinate $\mathbf{r}$ indicates locations on the film surface, which may have arbitrary shape, but often is a plane. The field $E_\text{object}(\mathbf{r})$, which is scattered from the object, is in general very complicated. The field $E_\text{ref}(\mathbf{r})$ may be equally complicated, but typically it is convenient if it has a simple form such as a plane wave, since this beam must be re-created later in order to view the hologram. The intensity of the field (12.5) is given by
$$I_\text{film}(\mathbf{r}) = \frac{1}{2}c\epsilon_0\left|E_\text{object}(\mathbf{r}) + E_\text{ref}(\mathbf{r})\right|^2 = \frac{1}{2}c\epsilon_0\left[\left|E_\text{object}(\mathbf{r})\right|^2 + \left|E_\text{ref}(\mathbf{r})\right|^2 + E_\text{ref}^*(\mathbf{r})E_\text{object}(\mathbf{r}) + E_\text{ref}(\mathbf{r})E_\text{object}^*(\mathbf{r})\right] \qquad (12.6)$$
For typical photographic film, the exposure of the film is proportional to the intensity of the light hitting it. This is known as the linear response regime. That is, after the film is developed, the transmittance $T$ of the light through the film is proportional to the intensity of the light that exposed it ($I_\text{film}$). However, for low exposure levels, or for film specifically designed for holography, the transmission of the light through the film can be proportional to the square of the intensity of the light that exposes the film. Thus, after the film is exposed to the fringe pattern and developed, the film acquires a spatially varying transmission function according to
$$T(\mathbf{r}) \propto I_\text{film}^2(\mathbf{r}) \qquad (12.7)$$
If at a later point in time light of intensity $I_\text{incident}$ is directed onto the film, it will transmit according to $I_\text{transmitted} = T(\mathbf{r})\,I_\text{incident}$. In this case, the field, as it emerges from the other side of the film, will be
$$E_\text{transmitted}(\mathbf{r}) = t(\mathbf{r})\,E_\text{incident}(\mathbf{r}) \propto I_\text{film}(\mathbf{r})\,E_\text{incident}(\mathbf{r}) \qquad (12.8)$$
where
$$t(\mathbf{r}) = \sqrt{T(\mathbf{r})} \qquad (12.9)$$
5 See P. W. Milonni and J. H. Eberly, Lasers, Sect. 16.4-16.5 (New York: Wiley, 1988); G. R. Fowles,
and view the light that is transmitted. According to (12.6) and (12.8), the transmitted field is proportional to
$$E_\text{transmitted}(\mathbf{r}) \propto I_\text{film}(\mathbf{r})\,E_\text{ref}(\mathbf{r}) = \left[\left|E_\text{object}(\mathbf{r})\right|^2 + \left|E_\text{ref}(\mathbf{r})\right|^2\right]E_\text{ref}(\mathbf{r}) + \left|E_\text{ref}(\mathbf{r})\right|^2 E_\text{object}(\mathbf{r}) + E_\text{ref}^2(\mathbf{r})\,E_\text{object}^*(\mathbf{r}) \qquad (12.10)$$
Although (12.10) looks fairly complicated, each of the three terms has a direct interpretation. The first term is just the reference beam $E_\text{ref}(\mathbf{r})$ with an amplitude modified by the transmission through the film. It is the residual undeflected beam, similar to the zero-order diffraction peak for a transmission grating.
The second term is interpreted as a reconstruction of the light field originally scattered from the object $E_\text{object}(\mathbf{r})$. Its amplitude is modified by the intensity of the reference beam, but if the reference beam is uniform across the film, this hardly matters. An observer looking into the film sees a wavefront identical to the one produced by the original object (superimposed with the other fields in (12.10)). Thus, the observer sees a virtual image at the location of the original object. Since the wavefront of the original object has genuinely been recreated, the image looks three-dimensional, because the observer is free to view from different perspectives.
The final term in (12.10) is proportional to the complex conjugate of the original field from the object. It also contains twice the phase of the reference beam, which we can overlook if the reference beam is uniform on the film. In this case, the complex conjugate of the object field actually converges to a real image of the original object. This image is located on the observer's side of the film, but it is often of less interest since the image is inside out. An ideal screen for viewing this real image would be an item shaped identical to the original object, which of course defeats the purpose of the hologram! To the extent that the film is not flat or to the extent that the reference beam is not a plane wave, the phase of $E_\text{ref}^2(\mathbf{r})$ severely distorts the image. On the other hand, the virtual image previously described never suffers from this problem.
Figure 12.6 Holographic reconstruction of wavefront through diffraction from fringes on film. Compare with Fig. 12.5. [diagram labels: Film, Observer]
Example 12.1
Analyze the three field terms in (12.10) for a hologram made from a point object, as depicted in Fig. 12.7.
Solution: Presumably, the point object is illuminated sufficiently brightly so as to make the scattered light have an intensity similar to the reference beam at the film. Let the reference plane wave strike the film at normal incidence. Then the reference field will have constant amplitude and phase across it; call it $E_\text{ref}$. The field from the point object can be treated as a spherical wave:
$$E_\text{object}(\rho) = \frac{E_\text{ref}\,L}{\sqrt{L^2 + \rho^2}}\,e^{ik\sqrt{L^2 + \rho^2}} \qquad (12.11)$$
Here $\rho$ represents the radial distance from the center of the film to some other point on the film. We have taken the amplitude of the object field to match $E_\text{ref}$ in the center of the film. After the film is exposed, developed, and re-illuminated by the reference beam, the field emerging from the right-hand side of the film, according to (12.10), becomes
$$E_\text{transmitted}(\rho) \propto \left[\frac{\left|E_\text{ref}\right|^2 L^2}{L^2 + \rho^2} + \left|E_\text{ref}\right|^2\right]E_\text{ref} + \left|E_\text{ref}\right|^2\frac{E_\text{ref}\,L}{\sqrt{L^2 + \rho^2}}\,e^{ik\sqrt{L^2 + \rho^2}} + E_\text{ref}^2\,\frac{E_\text{ref}^*\,L}{\sqrt{L^2 + \rho^2}}\,e^{-ik\sqrt{L^2 + \rho^2}} \qquad (12.12)$$
We see the three distinct waves that emerge from the holographic film. The first term in (12.12) represents the plane wave reference beam passing straight through the film with some variation in amplitude (depicted in Fig. 12.8 (a)). The second term in (12.12) has the identical form as the field from the original object (aside from an overall amplitude factor). It describes an outward-expanding spherical wave, which gives rise to a virtual image at the location of the original point object, as depicted in Fig. 12.8 (b). The final term in (12.12) corresponds to a converging spherical wave, which focuses to a point at a distance $L$ from the observer's side of the screen (depicted in Fig. 12.8 (c)).
Figure 12.7 Exposure to holographic film by a point source and a reference plane wave. The holographic fringe pattern for a point object and a plane wave reference beam exposing a flat film is shown on the right. [diagram labels: Reference Beam, Film, Point Object]
Figure 12.8 Reference beam incident on previously exposed holographic film. (a) Part of the beam goes through undeflected. (b) Part of the beam takes on the field profile of the original object. (c) Part of the beam converges to a real image of the original object. [diagram labels: Film, Virtual image]
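The split of the re-illuminated field into an undeflected beam, a virtual-image wave, and a real-image wave can be checked numerically for the point-object hologram of Example 12.1. The sketch below builds the spherical object wave on a flat film, forms $I_\text{film}E_\text{ref}$ directly, and compares it with the sum of the three terms. The numerical values (wavelength, distance $L$, film radius, and the phase given to $E_\text{ref}$) are assumptions for illustration, and constant prefactors such as $c\epsilon_0/2$ are dropped.

```python
import numpy as np

# Assumed illustration values.
wavelength = 633e-9                   # meters
k = 2 * np.pi / wavelength
L = 0.32                              # point object to film distance, meters
rho = np.linspace(0.0, 2e-3, 2001)    # radial position on the film, meters

# Reference plane wave at normal incidence: constant complex amplitude.
E_ref = 1.0 * np.exp(1j * 0.7)

# Spherical wave from the point object, matched to E_ref at the film center.
r = np.sqrt(L ** 2 + rho ** 2)
E_object = E_ref * (L / r) * np.exp(1j * k * r)

# Exposure pattern and the field transmitted on re-illumination.
I_film = np.abs(E_object + E_ref) ** 2
E_transmitted = I_film * E_ref

# The three terms identified in the text.
undeflected = (np.abs(E_object) ** 2 + np.abs(E_ref) ** 2) * E_ref
virtual_image = np.abs(E_ref) ** 2 * E_object
real_image = E_ref ** 2 * np.conj(E_object)

print(np.allclose(E_transmitted, undeflected + virtual_image + real_image))
```

The virtual-image term keeps the diverging phase $e^{+ik\sqrt{L^2+\rho^2}}$ of the object wave, while the real-image term carries the conjugate phase, so it describes a wave converging toward a point a distance $L$ beyond the film.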
Exercises
Exercises for 12.1 Interferograms
P12.1 An ideal Michelson interferometer that uses flat mirrors is perfectly aligned to a wide collimated laser beam. Suppose that one of the mirrors is then misaligned by 0.1°. What is the spacing between adjacent fringes on the screen if the wavelength is $\lambda = 633$ nm? What would happen if, instead of tilting one of the mirrors, the angle of the input beam (before the beamsplitter) changed by 0.1°?
P12.2 An ideal Michelson interferometer uses flat mirrors perfectly aligned to an expanding beam that diverges from a point 50 cm before the beamsplitter. Suppose that one mirror is 10 cm away from the beam splitter, and the other is 11 cm. Suppose also that the center of the resulting bull's-eye fringe pattern is dark. If a screen is positioned 10 cm after the beam splitter, what is the radial distance to the next dark fringe on the screen if the wavelength is $\lambda = 633$ nm?
Exercises for 12.2 Testing Optical Components
L12.3 Set up an interferometer and observe distortions to a mirror substrate when the setscrew is overtightened.
Exercises for 12.3 Generating Holograms
P12.4 Consider a diffraction grating as a simple hologram. Let the light from the object be a plane wave (object placed at infinity) directed onto a flat film at angle $\theta$. Let the reference beam strike the film at normal incidence, and take the wavelength to be $\lambda$. (a) What is the period of the fringes? (b) Show that when re-illuminated by the reference beam, the three terms in (12.10) give rise to zero-order and $\pm$1st-order diffraction (occurring on each side of zero-order).
P12.5 Consider the holographic pattern produced by the point object described in section 12.4. (a) Show that the phase of the real image in (12.12) may be approximated as $\phi = -k\rho^2/2L$, aside from a spatially independent overall phase. Compare with (11.10) and comment. (b) This hologram is similar to a Fresnel zone plate, used to focus extreme ultraviolet light or x-rays, for which it is difficult to make a lens. Graph the field transmission for the hologram as a function of $\rho$ and superimpose a similar graph for a best-fit mask that has regions of either 100% or 0% transmission. Use $\lambda = 633$ nm and $L = \left(5 \times 10^5 - \frac{1}{4}\right)\lambda$ (this places the point source about 32 cm before the screen). See Fig. 12.9.
L12.6 Make a hologram.
Figure 12.9 Field transmission for a point-source hologram (upper) and a Fresnel zone plate (middle), and a plot of both as a function of radius (bottom). [axis label: Hologram Transmittance]
R49
R50
R51
R52
R53
R54
R55
R56
R57
R58 R59
R60 T or F: The array theorem is useful for deriving the Fresnel diffraction from a grating.
R61 T or F: A diffraction grating with a period $h$ smaller than a wavelength is ideal for making a spectrometer.
R62 T or F: The blaze on a reflection grating can improve the amount of energy in a desired order of diffraction.
R63 T or F: The resolving power of a spectrometer used in a particular diffraction order depends only on the number of lines illuminated (not wavelength or grating period).
R64 T or F: The central peak of the Fraunhofer diffraction from two narrow slits separated by spacing $h$ has the same width as the central diffraction peak from a single slit with width $\Delta x = h$.
R65 T or F: The central peak of the Fraunhofer diffraction from a circular aperture of diameter $\ell$ has the same width as the central diffraction peak from a single slit with width $\Delta x = \ell$.
R66 T or F: The Fraunhofer diffraction pattern appearing at the focus of a lens varies in angular width, depending on the focal length of the lens used.
R67 T or F: Fraunhofer diffraction can be viewed as a spatial Fourier transform (or inverse transform if you prefer) on the field at the aperture.
Problems
R68 (a) Derive Snell's law using Fermat's principle. (b) Derive the law of reflection using Fermat's principle.
R69 (a) Consider a ray of light emitted from an object, which travels a distance $d_o$ before traversing a lens of focal length $f$ and then traveling a distance $d_i$. Write a vector equation relating $\begin{bmatrix} y_2 \\ \theta_2 \end{bmatrix}$ to $\begin{bmatrix} y_1 \\ \theta_1 \end{bmatrix}$. Be sure to simplify the equation so that only one ABCD matrix is involved. HINT: $\begin{bmatrix} 1 & 0 \\ -1/f & 1 \end{bmatrix}$, $\begin{bmatrix} 1 & d \\ 0 & 1 \end{bmatrix}$
Figure 12.10 [diagram labels: object, image]
(b) Explain the requirement on the ABCD matrix in part (a) that ensures that an image appears for the distances chosen. From this requirement, extract a familiar constraint on $d_o$ and $d_i$. Also, make a reasonable definition for magnification $M$ in terms of $y_1$ and $y_2$, then substitute to find $M$ in terms of $d_o$ and $d_i$.
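A numerical experiment can help build intuition for R69 before working it algebraically. The sketch below multiplies the two hint matrices in the order the ray encounters them (propagation, thin lens, propagation) and prints the combined ABCD matrix for one assumed set of distances. The helper names and the sample numbers are illustrative assumptions.

```python
import numpy as np

def propagate(distance):
    """ABCD matrix for free propagation over the given distance."""
    return np.array([[1.0, distance], [0.0, 1.0]])

def thin_lens(focal_length):
    """ABCD matrix for a thin lens."""
    return np.array([[1.0, 0.0], [-1.0 / focal_length, 1.0]])

def object_to_image_matrix(d_o, focal_length, d_i):
    """Single ABCD matrix for travel d_o, then a thin lens, then travel d_i."""
    # The matrix for the first element the ray meets sits farthest to the right.
    return propagate(d_i) @ thin_lens(focal_length) @ propagate(d_o)

# Assumed sample values (any consistent length unit).
focal_length = 10.0
d_o = 30.0
d_i = 1.0 / (1.0 / focal_length - 1.0 / d_o)

system = object_to_image_matrix(d_o, focal_length, d_i)
A, B = system[0]
C, D = system[1]
print(system)
```

For this choice of `d_i` the element `B` comes out at zero to rounding error, so the output height no longer depends on the input ray angle. Trying other values of `d_i` shows that this happens only at one particular distance.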
(c) A telescope is formed with two thin lenses separated by the sum of their focal lengths $f_1$ and $f_2$. Rays from a given far-away point all strike the first lens with essentially the same angle $\theta_1$. Angular magnification $M_\theta$ quantifies the telescope's purpose of enlarging the apparent angle between points in the field of view. Give a sensible definition for angular magnification in terms of $\theta_1$ and $\theta_2$. Use ABCD-matrix formulation to derive the angular magnification of the telescope in terms of $f_1$ and $f_2$.
R70 (a) Show that a system represented by a matrix $\begin{bmatrix} A & B \\ C & D \end{bmatrix}$ (beginning and ending in the same index of refraction) can be made to look like the matrix for a thin lens if the beginning and ending positions along the z-axis are referenced from two principal planes, located distances $p_1$ and $p_2$ before and after the system. HINT: $\begin{vmatrix} A & B \\ C & D \end{vmatrix} = 1$.
Figure 12.11
(b) Where are the principal planes located and what is the effective focal length for two identical thin lenses with focal lengths $f$ that are separated by a distance $d = f$ (see Fig. 12.12)?
Figure 12.12
R71 Derive the on-axis intensity (i.e. $x, y = 0$) of a Gaussian laser beam if you know that at $z = 0$ the electric field of the beam is $E(\rho', z = 0) = E_0 e^{-\rho'^2/w_0^2}$.
Fresnel:
$$E(x, y, d) = -\frac{i e^{ikd} e^{i\frac{k}{2d}(x^2 + y^2)}}{\lambda d}\iint E(x', y', 0)\,e^{i\frac{k}{2d}(x'^2 + y'^2)}\,e^{-i\frac{k}{d}(xx' + yy')}\,dx'\,dy'$$
$$\int_{-\infty}^{\infty} e^{-Ax^2 + Bx + C}\,dx = \sqrt{\frac{\pi}{A}}\,e^{B^2/4A + C}$$
R72 (a) You decide to construct a simple laser cavity with a flat mirror and another mirror with concave curvature of $R = 100$ cm. What is the longest possible stable cavity that you can make? HINT: Sylvester's theorem is
$$\begin{bmatrix} A & B \\ C & D \end{bmatrix}^N = \frac{1}{\sin\theta}\begin{bmatrix} A\sin N\theta - \sin(N-1)\theta & B\sin N\theta \\ C\sin N\theta & D\sin N\theta - \sin(N-1)\theta \end{bmatrix}$$
where $\cos\theta = \frac{1}{2}(A + D)$.
(b) The amplifier is YLF crystal, which lases at $\lambda = 1054$ nm. You decide to make the cavity 10 cm shorter than the longest possible (i.e. found in part (a)). What is the value of $w_0$, and where is the beam waist located inside the cavity (the place we assign to $z = 0$)? HINT: One can interpret the parameter $R(z)$ as the radius of curvature of the wave front. For a mode to exist in a laser cavity, the radius of curvature of each of the end mirrors must match the radius of curvature of the beam at that location.
$$E(\rho, z) = E_0\frac{w_0}{w(z)}\,e^{-\frac{\rho^2}{w^2(z)}}\,e^{ikz + i\frac{k\rho^2}{2R(z)}}\,e^{-i\tan^{-1}\frac{z}{z_0}}$$
$$\rho^2 \equiv x^2 + y^2, \qquad w(z) \equiv w_0\sqrt{1 + z^2/z_0^2}, \qquad R(z) \equiv z + z_0^2/z, \qquad z_0 \equiv \frac{kw_0^2}{2}$$
R73 (a) Compute the Fraunhofer diffraction intensity pattern for a uniformly illuminated circular aperture with diameter $\ell$. HINT:
$$E(x, y, d) = -\frac{i e^{ikd} e^{i\frac{k}{2d}(x^2 + y^2)}}{\lambda d}\iint E(x', y', 0)\,e^{-i\frac{k}{d}(xx' + yy')}\,dx'\,dy'$$
$$J_0(\alpha) = \frac{1}{2\pi}\int_0^{2\pi} e^{\pm i\alpha\cos(\theta - \theta')}\,d\theta, \qquad \int_0^a J_0(bx)\,x\,dx = \frac{a}{b}J_1(ab)$$
$$J_1(1.22\pi) = 0, \qquad \lim_{x \to 0}\frac{2J_1(x)}{x} = 1$$
(b) The first lens of a telescope has a diameter of 30 cm, which is the only place where light is clipped. You wish to use the telescope to examine two stars in a binary system. The stars are approximately 25 light-years away. How far apart need the stars be (in the perpendicular sense) for you to distinguish them in the visible range of $\lambda = 500$ nm? Compare with the radius of Earth's orbit, $1.5 \times 10^8$ km.
R74 (a) Derive the Fraunhofer diffraction pattern for the field from a uniformly illuminated single slit of width $\Delta x$. (Don't worry about the $y$-dimension.)
(b) Find the Fraunhofer intensity pattern for a grating of $N$ slits of width $\Delta x$ positioned on the mask at $x'_n = h\left(n - \frac{N+1}{2}\right)$ so that the spacing between all slits is $h$. HINT: The array theorem says that the diffraction pattern is $\sum_{n=1}^{N} e^{-i\frac{k}{d}xx'_n}$ times the diffraction pattern of a single slit. You will need
$$\sum_{n=1}^{N} r^n = r\,\frac{r^N - 1}{r - 1}$$
(c) Consider Fraunhofer diffraction from the grating in part (b). The grating is 5.0 cm wide and is uniformly illuminated. For best resolution in a monochromator with a 50 cm focal length, what should the width of the exit slit be? Assume a wavelength of $\lambda = 500$ nm.
R75 (a) A monochromatic plane wave with intensity $I_0$ and wavelength $\lambda$ is incident on a circular aperture of diameter $\ell$ followed by a lens of focal length $f$. Write the intensity distribution at a distance $f$ behind the lens.
(b) You wish to spatially filter the beam such that, when it emerges from the focus, it varies smoothly without diffraction rings or hard edges. A pinhole is placed at the focus, which transmits only the central portion of the Airy pattern (inside of the first zero). Calculate the intensity pattern at a distance $f$ after the pinhole using the approximation given in the hint below. HINT: A reasonably good approximation of the transmitted field is that of a Gaussian $E(\rho, 0) = E_f e^{-\rho^2/w_0^2}$, where $E_f$ is the magnitude of the field at the center of the focus found in part (a), and the width is $w_0 = 2\lambda f^\#/\pi$ and $f^\# \equiv f/\ell$. The figure below shows how well the Gaussian approximation fits the actual curve. We have assumed that the first aperture is a distance $f$ before the lens so that at the focus after the lens the wave front is flat at the pinhole. To avoid integration, you may want to use the result of P11.12 or P11.11(b) to get the Fraunhofer limit of the Gaussian profile. (See figure below.)
Selected Answers
R72: (a) 100 cm (b) 0.32 mm. R73: (b) $4.8 \times 10^8$ km. R74: (c) 5 $\mu$m.
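The selected answers can be checked with a few lines of arithmetic. The sketch below is one way to arrive at them, under stated assumptions. For R72(b) the waist is assumed to sit on the flat mirror and the wave-front curvature $R(z) = z + z_0^2/z$ is matched to the curved mirror. For R73(b) the Rayleigh criterion $1.22\lambda/\ell$ is used, with a light-year taken as $9.461 \times 10^{12}$ km. For R74(c) the exit slit is set to the peak-to-first-zero distance $\lambda f/(Nh)$, with $Nh$ equal to the illuminated grating width. Variable names are illustrative.

```python
import math

# R72(b): cavity 10 cm shorter than the 100 cm limit, waist on the flat mirror.
mirror_radius = 1.00           # meters
cavity_length = 0.90           # meters
wavelength_ylf = 1054e-9       # meters
# R(z) = z + z0^2/z evaluated at the curved mirror gives z0^2 = z*(R - z).
z0 = math.sqrt(cavity_length * (mirror_radius - cavity_length))
# z0 = k*w0^2/2 = pi*w0^2/lambda
w0_mm = math.sqrt(wavelength_ylf * z0 / math.pi) * 1e3

# R73(b): Rayleigh criterion for a 30 cm aperture at 500 nm.
LIGHT_YEAR_KM = 9.461e12       # assumed conversion factor
angle = 1.22 * 500e-9 / 0.30   # radians
separation_km = angle * 25 * LIGHT_YEAR_KM

# R74(c): peak-to-first-zero distance at the exit slit.
slit_um = 500e-9 * 0.50 / 0.05 * 1e6

print(f"R72(b) w0 = {w0_mm:.2f} mm")
print(f"R73(b) separation = {separation_km:.2e} km")
print(f"R74(c) exit slit = {slit_um:.1f} microns")
```

The star separation comes out at roughly three times the quoted radius of Earth's orbit.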
Figure 12.14
Figure 12.13
Chapter 13
Blackbody Radiation
Hot objects glow. In 1860, Kirchhoff proposed that the radiation emitted by hot objects as a function of frequency is approximately the same for all materials.1 The notion that all materials behave similarly led to the concept of an ideal blackbody radiator. Most materials have a certain shininess that causes light to reflect or scatter in addition to being absorbed and reemitted. However, light that falls upon an ideal blackbody is absorbed perfectly before the possibility of reemission, hence the name blackbody.
The distribution of frequencies emitted by a blackbody radiator is related to its temperature. We often consider a blackbody radiator that is in thermal equilibrium with the surrounding light that is absorbed and reemitted. If it is not in thermal equilibrium, for example, if more light is emitted than absorbed, then the object inevitably cools as light escapes to the environment, moving the system toward thermal equilibrium.
The Sun is a good example of a blackbody radiator. The light emitted from the Sun is associated with its surface temperature. Any light that arrives to the Sun from outer space is virtually 100% absorbed, however little light that might be, so the name blackbody aptly describes it. Mostly, light escapes to the much colder surrounding space (i.e. it is not in thermal equilibrium), and the temperature of the Sun's surface is maintained by the fusion process within. As another example, a glowing tungsten filament in an ordinary light bulb may be reasonably described as a blackbody radiator. However, surface reflections make it less than ideal both for absorption and emission.
Experimentally, a near perfect blackbody radiator can be constructed from a hollow object. An example is shown in Fig. 13.1. As the interior of the object is heated, the light present inside the internal cavity is in equilibrium with the glowing walls. A small hole can be drilled through the wall to observe the radiation inside without significantly disturbing the system. The observation hole can be thought of as a perfect blackbody since any light entering the hole from the outside is eventually absorbed (before being potentially reemitted), if not on the first bounce then on subsequent bounces inside the cavity.
In this chapter, we develop a theoretical understanding of blackbody radiation and provide some historical perspective. The explanation given by Max Planck in 1900 marks the birth of quantum mechanics. He postulated the existence of electromagnetic quanta, which we now call photons. Einstein used Planck's ideas to explain the photoelectric effect and to develop the concept of stimulated and spontaneous emission. Because of his analysis, Einstein can be thought of as the father of light amplification by stimulated emission of radiation (LASER).
1 An important exception is atomic vapors, which have relatively few discrete spectral lines. However, Kirchhoff's assumption holds quite well for most solids, which are sufficiently complex.
Gustav Kirchhoff (1824–1887, German) was born in Königsberg, the son of a lawyer. Kirchhoff attended the University of Königsberg. While still a student, he developed what are now called Kirchhoff's laws for electrical circuits. During his career, Kirchhoff was a professor in Breslau, Heidelberg, and finally Berlin. Kirchhoff was one of the first to study the spectra emitted by various objects when heated. Not coincidentally, his colleague in Heidelberg was Robert Bunsen, inventor of the Bunsen burner. Kirchhoff coined the term 'blackbody' radiation. He demonstrated that an excited gas gives off a discrete spectrum, and that an unexcited gas surrounding a blackbody emitter produces dark lines in the blackbody spectrum. Together Kirchhoff and Bunsen discovered caesium and rubidium. Later in his career, Kirchhoff showed how to derive Fresnel's diffraction formula starting from the wave equation. (Wikipedia)
Figure 13.1 Blackbody radiator. Thermal light emerges from the small hole in the end.
where $\sigma$ is called the Stefan-Boltzmann constant and $T$ is the absolute temperature (in Kelvin) of the surface. The value of the Stefan-Boltzmann constant is $\sigma = 5.6696 \times 10^{-8}\ \text{W/m}^2\cdot\text{K}^4$. The dimensionless parameter $e$, called the emissivity, is equal to one for an ideal blackbody surface. However, it takes on smaller values for actual materials because of surface reflections. For example, the emissivity of tungsten is approximately $e = 0.4$. This takes into account surface reflections, which make it harder for a material to emit light as well as to absorb light.4
As mentioned in the introduction, one can construct an ideal blackbody radiator from a material with $e < 1$ by creating an enclosure, or cavity, as depicted in Fig. 13.2. A small hole in the wall behaves to the outside world like an ideal blackbody surface. From the perspective of the outside world, the hole's surface has emissivity $e = 1$. Light within the cavity recirculates until it is eventually absorbed. The intensity emerging from the hole automatically approaches that of an ideal blackbody radiator.
It is sometimes useful to express intensity in terms of the energy density of the light field $u_\text{field}$ (given by (2.53) in units of energy per volume). The connection between the intensity emerging from the observation hole in the wall of a blackbody cavity and the energy density of the thermal light within the cavity is
$$I = \frac{c\,u_\text{field}}{4} \quad\Leftrightarrow\quad u_\text{field} = \frac{4\sigma T^4}{c} \qquad (13.2)$$
Figure 13.2 Blackbody radiator constructed as a cavity with a small hole to sample the internal light.
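To get a feel for the magnitudes involved, the Stefan-Boltzmann intensity and the cavity energy density of (13.2) can be evaluated directly. The sketch below assumes that (13.1) has the usual form $I = e\sigma T^4$. The function names and the sample temperatures (a filament near 3000 K and a cavity near 5800 K) are assumptions for illustration.

```python
SIGMA = 5.6696e-8           # Stefan-Boltzmann constant quoted in the text, W/m^2/K^4
SPEED_OF_LIGHT = 2.998e8    # m/s

def emitted_intensity(temperature, emissivity=1.0):
    """Total intensity leaving a surface, assuming I = e * sigma * T^4."""
    return emissivity * SIGMA * temperature ** 4

def cavity_energy_density(temperature):
    """Energy density of thermal light in a blackbody cavity, u = 4*sigma*T^4/c."""
    return 4 * SIGMA * temperature ** 4 / SPEED_OF_LIGHT

# Assumed sample temperatures in kelvin.
filament = emitted_intensity(3000.0, emissivity=0.4)   # tungsten, e = 0.4
ideal = emitted_intensity(3000.0)                      # ideal blackbody surface
density = cavity_energy_density(5800.0)                # J/m^3

print(f"tungsten at 3000 K: {filament:.3e} W/m^2")
print(f"ideal blackbody at 3000 K: {ideal:.3e} W/m^2")
print(f"cavity at 5800 K: {density:.3e} J/m^3")
```

Because of the $T^4$ dependence, doubling the temperature raises both the intensity and the energy density by a factor of sixteen.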
1.2 (San Diego: Academic Press, 1994).
3 It is less effort to obtain the Stefan-Boltzmann law using the Planck radiation formula as a starting point (see P13.3).
4 Emissivity typically has some frequency dependence, so what is presented here is an oversimplification.
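Within the enclosed cavity, light travels at speed $c$ isotropically in all directions. A factor of 1/2 arises because only half of the energy travels towards the hole from within the cavity as opposed to away. The remaining factor of 1/2 occurs because the light emerging from the hole is directionally distributed over a hemisphere as opposed to flowing in the direction of the surface normal $\hat{\mathbf{n}}$. The average over the hemisphere is carried out as follows:
$$\frac{\int_0^{2\pi} d\phi \int_0^{\pi/2} \hat{\mathbf{r}}\cdot\hat{\mathbf{n}}\,\sin\theta\,d\theta}{\int_0^{2\pi} d\phi \int_0^{\pi/2} \sin\theta\,d\theta} = \frac{\int_0^{2\pi} d\phi \int_0^{\pi/2} \cos\theta\,\sin\theta\,d\theta}{\int_0^{2\pi} d\phi \int_0^{\pi/2} \sin\theta\,d\theta} = \frac{1}{2} \qquad (13.3)$$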
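The hemisphere average can be confirmed by a direct numerical integration. The sketch below uses a simple midpoint rule in the polar angle, with $\hat{\mathbf{r}}\cdot\hat{\mathbf{n}} = \cos\theta$ when $\theta$ is measured from the surface normal. The grid size is an arbitrary choice.

```python
import numpy as np

# Midpoint-rule grid in the polar angle over the hemisphere, 0 to pi/2.
n_steps = 100_000
d_theta = (np.pi / 2) / n_steps
theta = (np.arange(n_steps) + 0.5) * d_theta

# The azimuthal integral contributes 2*pi to both numerator and denominator.
numerator = 2 * np.pi * np.sum(np.cos(theta) * np.sin(theta)) * d_theta
denominator = 2 * np.pi * np.sum(np.sin(theta)) * d_theta

average = numerator / denominator
print(average)
```

The denominator is the solid angle of the hemisphere, $2\pi$, and the numerator is the projected area factor, $\pi$.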
Although (13.1) describes the total intensity of the light that leaves a blackbody surface, it does not describe what frequencies make up the radiation field. This frequency distribution was not fully described for another two decades, when Max Planck developed his famous formula. Planck was first to arrive at the correct formula for the spectrum of blackbody radiation, building on the work of others, most notably Wien, who came very close. At first, Planck tweaked Wien's formula to match newly available experimental data. When he attempted to explain it, he was forced to introduce the concept of light quanta. Even Planck was uncomfortable with and perhaps disbelieved the assumption that his formula implied, but he deserves credit for recognizing and articulating it.
potential energy). The problem then reduces to that of finding the number of unique modes for the radiation at each frequency.5 The idea is that requiring each mode of electromagnetic energy to hold energy $k_B T$ should reveal the spectral shape of blackbody radiation.
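$$\mathrm{Re}\left\{ \sum_{n=-\infty}^{\infty} \sum_{m=-\infty}^{\infty} \sum_{\ell=-\infty}^{\infty} E_{n,m,\ell}\, e^{i\left(n k_0 x + m k_0 y + \ell k_0 z\right)} \right\} \tag{13.4}$$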
where each component of the wave number in any of the three dimensions is an integer times
$$k_0 = 2\pi/L \tag{13.5}$$
Considering a box of size $L$ does not artificially restrict our analysis, since we may later take the limit $L \to \infty$ so that our box represents the entire universe. Moreover, $L$ will naturally disappear from our calculation when we later consider the density of modes. We can think of a given wave number $k$ as specifying the equation of a sphere in a coordinate system with axes labeled $n$, $m$, and $\ell$:
$$n^2 + m^2 + \ell^2 = \left(\frac{k}{k_0}\right)^2 \tag{13.6}$$
The fact that the integers $n$, $m$, and $\ell$ range over both positive and negative values automatically takes into account that the field may travel in the forward or the backward direction. We need to know how many more ways there are to choose $n$, $m$, and $\ell$ when the wave number $k/k_0$ increases to $(k + dk)/k_0$. The answer is the difference in the volume of the two spheres shown in Fig. 13.3:
$$\#\,\text{modes in}\,(k, k+dk) = \frac{4\pi k^2\, dk}{k_0^3} \tag{13.7}$$
This is the number of terms in (13.4) associated with a wave number between $k$ and $k + dk$.
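The shell-volume estimate (13.7) can be compared with a brute-force count of integer triples. The sketch below is illustrative only: the function name `count_modes` and the choices $k/k_0 = 40$ and $dk/k_0 = 1$ are assumptions. Because the shell has finite thickness, the count and the differential estimate are expected to agree only to within a few percent.

```python
import numpy as np


def count_modes(r_inner, dr):
    """Count integer triples (n, m, ell) with r_inner <= |(n, m, ell)| < r_inner + dr."""
    r_outer = r_inner + dr
    limit = int(np.ceil(r_outer))
    axis = np.arange(-limit, limit + 1)
    n, m, ell = np.meshgrid(axis, axis, axis, indexing="ij")
    radius_sq = n**2 + m**2 + ell**2
    in_shell = (radius_sq >= r_inner**2) & (radius_sq < r_outer**2)
    return int(np.count_nonzero(in_shell))


r = 40.0   # assumed value of k / k0
dr = 1.0   # assumed value of dk / k0

counted = count_modes(r, dr)
estimate = 4.0 * np.pi * r**2 * dr   # right-hand side of (13.7) in units of k0

print("counted modes:   ", counted)
print("4 pi k^2 dk/k0^3:", estimate)
```

The agreement improves as $k/k_0$ grows relative to $dk/k_0$, which is the regime reached in the limit of a large box.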
5 See O. Svelto, Principles of Lasers, 4th ed., translated by D. C. Hanna, Sect. 2.2.1 (New York: Plenum Press, 1998).
6 The Fourier expansion (13.4) implies that the field on the right and left of each dimension match up, which is known as periodic boundary conditions.
According to the Rayleigh-Jeans assumption, each mode should carry on average equal energy $k_B T$. The energy density associated with a specified range of wave numbers $dk$ is then $k_B T/L^3$ times the number of modes within that range (13.7). The total energy density in the field involving all wave numbers is then7
$$u_{\text{field}} = 2 \times \int_0^\infty \frac{k_B T}{L^3} \frac{4\pi k^2}{k_0^3}\, dk = k_B T \int_0^\infty \frac{k^2}{\pi^2}\, dk \tag{13.8}$$
where the extra factor of 2 accounts for two independent polarizations, not specified in (13.4). As anticipated, the dependence on $L$ has disappeared from (13.8) after substituting from (13.5). We can immediately see that (13.8) disagrees drastically with the Stefan-Boltzmann law (13.2), since (13.8) is proportional to temperature rather than to its fourth power. In addition, the integral in (13.8) is seen to diverge, meaning that regardless of the temperature, the light carries infinite energy density! This has since been named the ultraviolet catastrophe since the divergence occurs on the short wavelength end of the spectrum. This is a clear failure of classical physics to explain blackbody radiation. Nevertheless, Rayleigh emphasized the fact that his formula works well for the longer wavelengths. It is instructive to make the change of variables $k = \omega/c$ in the integral to write
$$u_{\text{field}} = k_B T \int_0^\infty \frac{\omega^2}{\pi^2 c^3}\, d\omega \tag{13.9}$$
The important factor $\omega^2/\pi^2 c^3$ can now be understood to be the number of modes per frequency. Then (13.9) is rewritten as
$$u_{\text{field}} = \int_0^\infty \rho(\omega)\, d\omega \tag{13.10}$$
tributions was the development of the Jeans length, the critical radius for interstellar clouds, which determines whether a cloud will collapse to form a star. In his later career, Jeans became somewhat well known to the public for his lay-audience books highlighting scientific advances, in particular relativity and cosmology. (Wikipedia)
where
$$\rho_{\text{Rayleigh-Jeans}}(\omega) = k_B T\, \frac{\omega^2}{\pi^2 c^3} \tag{13.11}$$
describes (incorrectly) the spectral energy density of the radiation field associated with blackbody radiation.
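To make the ultraviolet catastrophe concrete, the sketch below integrates the Rayleigh-Jeans density (13.11) analytically up to a finite cutoff frequency. The cutoff values, the 300 K temperature, and the function names are illustrative assumptions; the point is only that the result grows as the cube of the cutoff and never converges.

```python
import math

K_B = 1.380e-23   # Boltzmann's constant, J/K
C = 2.9979e8      # speed of light in vacuum, m/s


def rho_rayleigh_jeans(omega, temperature_k):
    """Rayleigh-Jeans spectral energy density (13.11), in J s/m^3."""
    return K_B * temperature_k * omega**2 / (math.pi**2 * C**3)


def energy_density_up_to(omega_cutoff, temperature_k):
    """Closed-form integral of (13.11) from 0 to omega_cutoff, in J/m^3."""
    return K_B * temperature_k * omega_cutoff**3 / (3.0 * math.pi**2 * C**3)


T_ASSUMED = 300.0  # assumed temperature in kelvin
for cutoff in (1e14, 1e15, 1e16):  # assumed cutoff frequencies in rad/s
    print(cutoff, energy_density_up_to(cutoff, T_ASSUMED))
```

Each tenfold increase in the cutoff multiplies the energy density by a thousand, while doubling the temperature merely doubles it, in contrast to the $T^4$ dependence of (13.2).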
became available over a fairly wide wavelength range. In keeping with Kirchhoff's notion of an ideal blackbody radiator, the results were observed to be independent of the material for most solids. The intensity per frequency depended only on temperature and when integrated over all frequencies agreed with the Stefan-Boltzmann law (13.1). In 1896, Wilhelm Wien considered the known physical and mathematical constraints on the spectrum of blackbody radiation and proposed a spectral function that seemed to work:8
$$\rho_{\text{Wien}}(\omega) = \frac{\hbar \omega^3 e^{-\hbar\omega/k_B T}}{\pi^2 c^3} \tag{13.12}$$
Figure 13.4 Energy density per frequency according to Planck, Wien, and Rayleigh-Jeans.
An important feature of (13.12) is that it gives a result proportional to $T^4$ when integrated over all frequency (i.e. the Stefan-Boltzmann law). Wien's formula did a fairly good job of fitting the experimental data. However, in 1900 Lummer and Pringsheim, colleagues of Max Planck, reported experimental data that deviated from the Wien distribution at long wavelengths (infrared). Planck was privy to this information early on and introduced a modest revision to Wien's formula that fit the data beautifully everywhere:
$$\rho_{\text{Planck}}(\omega) = \frac{\hbar \omega^3}{\pi^2 c^3 \left(e^{\hbar\omega/k_B T} - 1\right)} \tag{13.13}$$
where $\hbar = 1.054 \times 10^{-34}\ \mathrm{J \cdot s}$ is an experimentally determined constant.9 Figure 13.4 shows the Planck spectral distribution curve together with the Rayleigh-Jeans curve (13.11) and the Wien curve (13.12). As is apparent, the Wien distribution does a good job nearly everywhere. However, at long wavelengths it was off by just enough for the experimentalists to notice that something was wrong. At this point, it may seem fair to ask, what did Planck do that was so great? After all, he simply guessed a function that was only a slight modification of Wien's distribution. And he knew the answer from the back of the book, namely Lummer's and Pringsheim's well-done experimental results. (At the time, Planck was unaware of the work by Rayleigh.) Planck gets well-deserved credit for interpreting the meaning of his new formula. His interpretation was what he called an act of desperation. He did not necessarily believe in the implications of his formula; in fact, he presented them somewhat apologetically. It was several years later that the young Einstein published his paper explaining the photoelectric effect in the context of Planck's work. Planck's insight was an enormous step toward understanding the quantum nature of light. Nevertheless, it took another three decades to develop a more
8 The constant $\hbar$ had not yet been introduced by Planck. The actual way that Wien wrote his distribution was $\rho_{\text{Wien}}(\omega) = a\omega^3 e^{-b\omega/T}$, where $a$ and $b$ were parameters used to fit the data.
9 Planck's constant was first introduced as $h = 6.626 \times 10^{-34}\ \mathrm{J \cdot s}$, convenient for working with frequency $\nu$, expressed in Hz. It is common to write $\hbar \equiv h/2\pi$ when working with frequency $\omega$, expressed in rad/s.
complete theory of quantum electrodynamics. Students should appreciate that the very people who developed quantum mechanics were also bothered by its confrontation with deep-seated intuition. If quantum mechanics bothers you, you are in good company! Planck found that he could derive his formula only if he made the following strange assumption: A given mode of the electromagnetic field is not able to carry an arbitrary amount of energy (for example, $k_B T$ as Rayleigh and Jeans used, which varies continuously as the temperature varies). Rather, the field can only carry discrete amounts of energy separated by spacing $\hbar\omega$. Under this assumption, the probability $P_n$ that a mode of the field is excited to the $n$th level is proportional to the Boltzmann statistical weighting factor $e^{-n\hbar\omega/k_B T}$. A review of the Boltzmann factor is given in Appendix 13.B.
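$$P_n = \frac{e^{-n\hbar\omega/k_B T}}{\sum_{m=0}^{\infty} e^{-m\hbar\omega/k_B T}} = e^{-n\hbar\omega/k_B T}\left[1 - e^{-\hbar\omega/k_B T}\right] \tag{13.14}$$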
We used (0.66) to accomplish the above sum, which is a geometric series. The expected energy in a particular mode of the field is the sum of each possible energy level (i.e. $n\hbar\omega$) times the probability of it occurring:
$$\begin{aligned}
\sum_{n=0}^{\infty} n\hbar\omega P_n &= \hbar\omega\left[1 - e^{-\hbar\omega/k_B T}\right] \sum_{n=0}^{\infty} n e^{-n\hbar\omega/k_B T} \\
&= -\hbar\omega\left[1 - e^{-\hbar\omega/k_B T}\right] \frac{\partial}{\partial(\hbar\omega/k_B T)} \sum_{n=0}^{\infty} e^{-n\hbar\omega/k_B T} \\
&= -\hbar\omega\left[1 - e^{-\hbar\omega/k_B T}\right] \frac{\partial}{\partial(\hbar\omega/k_B T)} \frac{1}{1 - e^{-\hbar\omega/k_B T}} \\
&= \frac{\hbar\omega}{e^{\hbar\omega/k_B T} - 1}
\end{aligned} \tag{13.15}$$
He became an associate professor of theoretical physics at the University of Kiel and then a few years later took over Kirchhoff's post at the University of Berlin. After nearly twenty years of idyllic and happy family life, a series of tragedies hit the Planck household. Planck's first wife, the mother of four, died. Then his eldest son was killed in action during World War I. Soon after, his twin daughters each died giving birth to their first child. Later Planck's remaining son from his first marriage was executed for participating in a failed attempt to assassinate Hitler. Planck won the Nobel prize in 1918 for his introduction of energy quanta, but he had serious reservations about the course that quantum mechanics took. (Wikipedia)
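The closed form in (13.15) can be checked by summing the series directly. In the sketch below the dimensionless variable `x` stands for $\hbar\omega/k_B T$ and energies are expressed in units of $\hbar\omega$; the function names and the truncation at 2000 terms are illustrative assumptions.

```python
import math


def mean_energy_by_sum(x, n_terms=2000):
    """Sum of n * P_n over levels n, in units of hbar*omega, with x = hbar*omega/(k_B T)."""
    weights = [math.exp(-n * x) for n in range(n_terms)]
    normalization = sum(weights)
    return sum(n * w for n, w in enumerate(weights)) / normalization


def mean_energy_closed_form(x):
    """Right-hand side of (13.15) in units of hbar*omega: 1 / (e^x - 1)."""
    return 1.0 / math.expm1(x)


for x in (0.1, 1.0, 5.0):
    print(x, mean_energy_by_sum(x), mean_energy_closed_form(x))
```

For small `x` (low frequency or high temperature) the product of `x` and the mean energy approaches one, which is the classical value $k_B T$ per mode; for large `x` the mean energy is exponentially suppressed, which removes the ultraviolet divergence.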
Equation (13.15) provides the expected energy in any of the modes of the radiation field, as dictated by Planck's assumption. To obtain the Planck distribution (13.13), we replace $k_B T$ in the Rayleigh-Jeans formula (13.11) with the correct expected energy (13.15).10 It is interesting that we are now able to derive the constant $\sigma$ in the Stefan-Boltzmann law (13.2) in terms of Planck's constant (see P13.3). The Stefan-Boltzmann law is obtained by integrating the spectral density function (13.13)
10 See O. Svelto, Principles of Lasers, 4th ed., translated by D. C. Hanna, Sect. 2.2.2 (New York:
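over all frequencies to obtain the total field energy density, which is in thermal equilibrium with the blackbody radiator:
$$u_{\text{field}} = \int_0^\infty \rho_{\text{Planck}}(\omega)\, d\omega = \frac{4}{c}\, \frac{\pi^2 k_B^4}{60 c^2 \hbar^3}\, T^4 = \frac{4\sigma T^4}{c} \tag{13.16}$$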
Since Planck's constant was not introduced until a couple of decades after the Stefan-Boltzmann law was developed, one might more appropriately say that the Stefan-Boltzmann constant pins down Planck's constant.
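Reading the Stefan-Boltzmann constant off (13.16) gives $\sigma = \pi^2 k_B^4 / 60 c^2 \hbar^3$. The sketch below evaluates this expression with the rounded constants used in this book, so agreement with the tabulated value is expected only to about three significant figures; the variable names are arbitrary.

```python
import math

HBAR = 1.054e-34   # Planck's constant divided by 2*pi, J s
K_B = 1.380e-23    # Boltzmann's constant, J/K
C = 2.9979e8       # speed of light in vacuum, m/s

# Stefan-Boltzmann constant read off from (13.16).
sigma = math.pi**2 * K_B**4 / (60.0 * C**2 * HBAR**3)

print(sigma, "W/(m^2 K^4)")
```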
Example 13.1
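Determine $\rho_{\text{Planck}}(\lambda)$ such that
$$u_{\text{field}} = \int_0^\infty \rho_{\text{Planck}}(\omega)\, d\omega = \int_0^\infty \rho_{\text{Planck}}(\lambda)\, d\lambda$$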
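where $\rho_{\text{Planck}}(\omega)$ and $\rho_{\text{Planck}}(\lambda)$ represent distinct functions distinguished by their arguments.
Solution: The change of variables $\omega = 2\pi c/\lambda \Rightarrow d\omega = -2\pi c\, d\lambda/\lambda^2$ gives
$$u_{\text{field}} = \int_0^\infty \rho_{\text{Planck}}\!\left(\omega = \frac{2\pi c}{\lambda}\right) \frac{2\pi c}{\lambda^2}\, d\lambda \quad\Rightarrow\quad \rho_{\text{Planck}}(\lambda) = \frac{16\pi^2 \hbar c}{\lambda^5 \left(e^{2\pi\hbar c/\lambda k_B T} - 1\right)} = \frac{8\pi h c}{\lambda^5 \left(e^{hc/\lambda k_B T} - 1\right)} \tag{13.17}$$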
where we have written $h \equiv 2\pi\hbar$. It is interesting to note that the maximum of $\rho_{\text{Planck}}(\lambda)$ occurring at $\lambda_{\max}$ and the maximum of $\rho_{\text{Planck}}(\omega)$ occurring at $\omega_{\max}$ do not correspond to a matching wavelength and frequency. That is, $\lambda_{\max} \neq 2\pi c/\omega_{\max}$, because of the nonlinear nature of the variable transformation. (See problem P13.4.)
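A quick numerical check of Example 13.1 is to integrate both forms of the Planck density and confirm that they return the same total energy density. The sketch below is illustrative: the temperature 5750 K, the logarithmic frequency grid, and the helper names are assumptions, and the integration limits are truncated where the integrand is negligible.

```python
import numpy as np

HBAR = 1.054e-34   # J s
K_B = 1.380e-23    # J/K
C = 2.9979e8       # m/s


def rho_planck_omega(omega, temperature_k):
    """Planck energy density per angular frequency, (13.13)."""
    return HBAR * omega**3 / (np.pi**2 * C**3 * np.expm1(HBAR * omega / (K_B * temperature_k)))


def rho_planck_lambda(lam, temperature_k):
    """Planck energy density per wavelength, (13.17)."""
    h = 2.0 * np.pi * HBAR
    return 8.0 * np.pi * h * C / (lam**5 * np.expm1(h * C / (lam * K_B * temperature_k)))


def trapezoid(y, x):
    """Trapezoid-rule integral of samples y taken at points x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


T = 5750.0  # assumed temperature in kelvin
# Grid spanning hbar*omega/(k_B T) from 1e-3 to 50, where the integrand is non-negligible.
omega = (K_B * T / HBAR) * np.logspace(-3.0, np.log10(50.0), 20001)
lam = 2.0 * np.pi * C / omega[::-1]   # matching wavelengths, in increasing order

u_from_omega = trapezoid(rho_planck_omega(omega, T), omega)
u_from_lambda = trapezoid(rho_planck_lambda(lam, T), lam)
u_expected = np.pi**2 * K_B**4 * T**4 / (15.0 * C**3 * HBAR**3)   # from (13.16)

print(u_from_omega, u_from_lambda, u_expected)
```

The two integrals agree even though the two integrands peak at non-matching wavelength and frequency, since the Jacobian factor $2\pi c/\lambda^2$ reshapes the curve.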
transitions between energy levels. In addition, he postulated that some transitions must occur spontaneously. (If the possibility of spontaneous transitions is not included, then there can be no way for a field mode to receive energy if none is present to begin with.) Einstein wrote down rate equations for populations of the two levels $N_1$ and $N_2$ associated with the transition $\hbar\omega$:11
$$\dot{N}_1 = A_{21} N_2 - B_{12}\rho(\omega) N_1 + B_{21}\rho(\omega) N_2, \qquad \dot{N}_2 = -A_{21} N_2 + B_{12}\rho(\omega) N_1 - B_{21}\rho(\omega) N_2 \tag{13.18}$$
The coefficient $A_{21}$ is the rate of spontaneous emission from state 2 to state 1, $B_{12}\rho(\omega)$ is the rate of stimulated absorption from state 1 to state 2, and $B_{21}\rho(\omega)$ is the rate of stimulated emission from state 2 to state 1. In thermal equilibrium, the rate equations (13.18) are both equal to zero (i.e., $\dot{N}_1 = \dot{N}_2 = 0$), since the relative populations of each level must remain constant. We can then solve for the spectral density $\rho(\omega)$ at the given frequency. In this case, either expression in (13.18) yields
$$\rho(\omega) = \frac{A_{21}}{\dfrac{N_1}{N_2} B_{12} - B_{21}} \tag{13.19}$$
In thermal equilibrium, the spectral density must match the Planck spectral density formula (13.13). In making the comparison, we should first rewrite the ratio $N_1/N_2$ of the populations in the two levels using the Boltzmann probability factor (see Appendix 13.B):
$$\frac{N_1}{N_2} = \frac{e^{-E_1/k_B T}}{e^{-E_2/k_B T}} = e^{(E_2 - E_1)/k_B T} = e^{\hbar\omega/k_B T} \tag{13.20}$$
withdrew. Einstein then attended school in Switzerland, and subsequently entered a mathematics program at the Polytechnic in Zurich. There, Einstein met his first wife, Mileva Maric, a fellow math student, whom he later divorced before marrying Elsa Lowenthal. Early on, Einstein could not find a job as a professor, and so he worked in the Swiss patent office until his "Miracle Year" (1905), when he published four major papers, including relativity and the
photoelectric effect (for which he later received the Nobel prize). Thereafter, job offers were never in short supply. In 1933, as the Nazi regime came to power, Einstein immigrated from Berlin to the US and became a professor at Princeton University. Einstein is most noted for special and general relativity, for which he became a celebrity scientist in his own lifetime. Einstein also made huge contributions to statistical and quantum mechanics. (Wikipedia)
Then when equating (13.19) to the Planck blackbody spectral density (13.13) we get
$$\frac{A_{21}}{e^{\hbar\omega/k_B T} B_{12} - B_{21}} = \frac{\hbar\omega^3}{\pi^2 c^3 \left(e^{\hbar\omega/k_B T} - 1\right)} \tag{13.21}$$
From this expression we deduce that12
$$B_{12} = B_{21} \tag{13.22}$$
and
$$A_{21} = \frac{\hbar\omega^3}{\pi^2 c^3} B_{21} \tag{13.23}$$
We see from (13.22) that the rate of stimulated absorption is the same as the rate of stimulated emission. In addition, if one knows the rate of stimulated
11 See P. W. Milonni, The Quantum Vacuum: An Introduction to Quantum Electrodynamics, Sect. 1.8 (San Diego: Academic Press, 1994).
12 We assume that energy levels 1 and 2 are non-degenerate. Some modifications must be made in the case of degenerate levels, but the procedure is similar.
emission between a pair of states, it follows from (13.23) that one also knows the rate of spontaneous emission. This is remarkable because to derive $A_{21}$ directly, one needs to use the full theory of quantum electrodynamics (the complete photon description). However, to obtain $B_{21}$, it is actually only necessary to use a semiclassical theory, where the light is treated classically and the energy levels in the material are treated quantum-mechanically using the Schrödinger equation. In writing the rate equations (13.18), Einstein predicted the possibility of creating lasers fifty years in advance of their development. These rate equations are still valid even if the light is not in thermal equilibrium with the material. The equations suggest that if the population in the upper state 2 can be made artificially large, then amplification will result via the stimulated transition. The rate equations also show that a population inversion (more population in the upper state than in the lower one) cannot be achieved by pumping the material with the same frequency of light that one hopes to amplify. This is because the stimulated absorption rate is balanced by the stimulated emission rate. The material-dependent parameters $A_{21}$ and $B_{12} = B_{21}$ are called the Einstein A and B coefficients.
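The consistency of (13.19) through (13.23) can be verified numerically: with $B_{12} = B_{21}$ and $A_{21}$ chosen according to (13.23), the equilibrium density (13.19) should reproduce the Planck formula for any value of $B_{21}$. In the sketch below, the value of `b21`, the frequency, and the temperature are arbitrary illustrative assumptions, not data for any real transition.

```python
import math

HBAR = 1.054e-34   # J s
K_B = 1.380e-23    # J/K
C = 2.9979e8       # m/s


def planck_density(omega, temperature_k):
    """Planck spectral energy density (13.13)."""
    return HBAR * omega**3 / (math.pi**2 * C**3 * math.expm1(HBAR * omega / (K_B * temperature_k)))


def equilibrium_density(omega, temperature_k, b21):
    """Spectral density from the rate equations in equilibrium, (13.19)."""
    b12 = b21                                                  # (13.22)
    a21 = HBAR * omega**3 / (math.pi**2 * C**3) * b21          # (13.23)
    population_ratio = math.exp(HBAR * omega / (K_B * temperature_k))  # N1/N2, (13.20)
    return a21 / (population_ratio * b12 - b21)


omega = 3.0e15   # assumed transition frequency, rad/s
T = 5750.0       # assumed temperature, K
b21 = 1.0e20     # hypothetical B coefficient; its value cancels in the result

print(equilibrium_density(omega, T, b21), planck_density(omega, T))
```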
on the walls of the container. This can be derived from the fact that radiation of energy $\Delta E$ imparts a momentum
$$\Delta p = \frac{\Delta E \cos\theta}{c} \tag{13.25}$$
when it is absorbed with incident angle $\theta$ on a surface.14 A similar momentum is imparted when radiation is emitted.
13 See P. W. Milonni, The Quantum Vacuum: An Introduction to Quantum Electrodynamics, Sect. 1.2 (San Diego: Academic Press, 1994).
14 The fact that light carries momentum was understood well before the development of the theory of relativity and the photon description of light.
Derivation of (13.24)
Consider a thin layer of space adjacent to a container wall with area $A$. If the layer has thickness $\Delta z$, then the volume in the layer is $A\Delta z$. Half of the radiation inside the layer flows toward the wall, where it is absorbed. The total energy in the layer that will be absorbed is then $\Delta E = (A\Delta z)\,u_{\text{field}}/2$, which arrives during the interval $\Delta t = \Delta z/(c\cos\theta)$, assuming for the moment that all light is directed with angle $\theta$; we must average the angle of light propagation over a hemisphere. The pressure on the wall due to absorption (i.e. force or $dp/dt$ per area) is then
$$P_{\text{abs}} = \frac{\int_0^{2\pi} d\phi \int_0^{\pi/2} \dfrac{\Delta p}{\Delta t} \dfrac{1}{A} \sin\theta\, d\theta}{\int_0^{2\pi} d\phi \int_0^{\pi/2} \sin\theta\, d\theta} = \frac{u_{\text{field}}}{2} \int_0^{\pi/2} \cos^2\theta \sin\theta\, d\theta = \frac{u_{\text{field}}}{6} \tag{13.26}$$
In equilibrium, an equal amount of radiation is also emitted from the wall. This gives an additional pressure $P_{\text{emit}} = P_{\text{abs}}$, which confirms that the total pressure is given by (13.24).
We derive the Stefan-Boltzmann law using the concept of entropy, which is defined in differential form by the quantity
$$dS \equiv \frac{dQ}{T} \tag{13.27}$$
where $dQ$ is the injection of heat (or energy) into the radiation field in the box and $T$ is the temperature at which that injection takes place. We would like to write $dQ$ in terms of $u_{\text{field}}$, $V$, and $T$. Then we may invoke the fact that $S$ is a state variable, which implies
$$\frac{\partial^2 S}{\partial T\, \partial V} = \frac{\partial^2 S}{\partial V\, \partial T} \tag{13.28}$$
This is a mathematical statement of the fact that $S$ is fully defined if the internal energy, temperature, and volume of a system are specified. That is, $S$ does not depend on past temperature and volume history; it is dictated by the present state of the system. To obtain $dQ$ in the form that we need, we can use the first law of thermodynamics. It states that a change in internal energy $dU = d(u_{\text{field}} V)$ can take place by the injection of heat $dQ$ or by doing work $dW = P\,dV$ as the volume increases:
$$\begin{aligned}
dQ &= dU + P\,dV = d(u_{\text{field}} V) + P\,dV \\
&= V\, du_{\text{field}} + u_{\text{field}}\, dV + \frac{1}{3} u_{\text{field}}\, dV \\
&= V \frac{du_{\text{field}}}{dT}\, dT + \frac{4}{3} u_{\text{field}}\, dV
\end{aligned} \tag{13.29}$$
We have used energy density times volume to obtain the total energy $U$ in the radiation field in the box. We have also used (13.24) to obtain the work accomplished by pressure as the volume changes.
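When we differentiate (13.30) with respect to temperature or volume we get
$$\frac{\partial S}{\partial V} = \frac{4 u_{\text{field}}}{3T}, \qquad \frac{\partial S}{\partial T} = \frac{V}{T} \frac{du_{\text{field}}}{dT} \tag{13.31}$$
We are now able to evaluate the partial derivatives in (13.28), which give
$$\frac{\partial^2 S}{\partial T\, \partial V} = \frac{4}{3} \frac{\partial}{\partial T} \frac{u_{\text{field}}}{T} = \frac{4}{3} \frac{1}{T} \frac{\partial u_{\text{field}}}{\partial T} - \frac{4}{3} \frac{u_{\text{field}}}{T^2}, \qquad \frac{\partial^2 S}{\partial V\, \partial T} = \frac{1}{T} \frac{du_{\text{field}}}{dT} \tag{13.32}$$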
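Since by (13.28) these two expressions must be equal, we get a differential equation relating the internal energy of the system to the temperature:
$$\frac{4}{3} \frac{1}{T} \frac{\partial u_{\text{field}}}{\partial T} - \frac{4}{3} \frac{u_{\text{field}}}{T^2} = \frac{1}{T} \frac{du_{\text{field}}}{dT} \quad\Rightarrow\quad \frac{\partial u_{\text{field}}}{\partial T} = \frac{4 u_{\text{field}}}{T} \tag{13.33}$$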
The solution to this differential equation is (13.2), where $4\sigma/c$ is a constant to be determined experimentally.
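For readers who want the intermediate step, the differential equation (13.33) can be solved by separation of variables. The following is a sketch of that step; the symbol $a$ for the integration constant is introduced here only for illustration.

```latex
\begin{aligned}
\frac{d u_{\text{field}}}{u_{\text{field}}} &= 4\,\frac{dT}{T}
  && \text{(separate variables in (13.33))} \\
\ln u_{\text{field}} &= 4 \ln T + \ln a
  && \text{(integrate both sides)} \\
u_{\text{field}} &= a\,T^4
  && \text{(exponentiate; compare with (13.2): } a = 4\sigma/c\text{)}
\end{aligned}
```

Thermodynamics alone fixes the $T^4$ dependence but not the value of the constant, which is why $\sigma$ had to be measured until Planck's formula supplied it through (13.16).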
which depends on the number of configurations $n_{\text{obj}}$ for a given state (defined, for example, by fixed energy and volume). Now imagine that the object is placed in contact with a very large thermal reservoir. The object could be the electromagnetic radiation inside a hollow blackbody apparatus, and the reservoir could be the walls of the apparatus, capable of holding far more energy than the light field can hold. The condition for thermal equilibrium between the object and the reservoir is
$$\frac{\partial S_{\text{obj}}}{\partial U_{\text{obj}}} = \frac{\partial S_{\text{res}}}{\partial U_{\text{res}}} \equiv \frac{1}{T} \tag{13.35}$$
where temperature has been introduced as a definition, which is consistent with (13.27). The total number of configurations for the combined system is $N = n_{\text{obj}} n_{\text{res}}$, where $n_{\text{obj}}$ and $n_{\text{res}}$ are the number of configurations available within the object and the reservoir separately. A thermodynamic principle is that all possible
configurations are equally probable. In thermal equilibrium, the probability for a given configuration in the object is therefore proportional to
$$P \propto \frac{N}{n_{\text{obj}}} = n_{\text{res}} = e^{S_{\text{res}}/k_B} \tag{13.36}$$
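where we have invoked (13.34). Meanwhile, a Taylor series expansion of $S_{\text{res}}$ yields
$$S_{\text{res}}(U_{\text{res}}) = S_{\text{res}}\!\left(U_{\text{res}}^{\text{eq}}\right) + \left.\frac{\partial S_{\text{res}}}{\partial U_{\text{res}}}\right|_{U_{\text{res}}^{\text{eq}}} \left(U_{\text{res}} - U_{\text{res}}^{\text{eq}}\right) + \ldots \tag{13.37}$$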
Higher order terms are not needed since we assume the reservoir to be very large so that it is disturbed only slightly by variations in the object. Since the overall energy of the system is fixed, we may write
$$U_{\text{res}} - U_{\text{res}}^{\text{eq}} = \Delta U_{\text{res}} = -\Delta U_{\text{obj}} \tag{13.38}$$
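where $\Delta U_{\text{obj}}$ is a small change in energy in the object. When (13.35), (13.37), and (13.38) are introduced into (13.36), the probability for the specific configuration becomes $P \propto e^{\frac{1}{k_B} S_{\text{res}}\left(U_{\text{res}}^{\text{eq}}\right) - \frac{\Delta U_{\text{obj}}}{k_B T}}$, or simply
$$P \propto e^{-\frac{\Delta U_{\text{obj}}}{k_B T}} \tag{13.39}$$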
since the first term in the exponent is constant. $\Delta U_{\text{obj}}$ represents an amount of energy added to the object to establish a configuration. In the case of blackbody radiation, a mode takes on energy $\Delta U_{\text{obj}} = n\hbar\omega$, where $n$ is the number of energy quanta in the mode. The probability that a mode carries energy $n\hbar\omega$ is therefore proportional to $e^{-n\hbar\omega/k_B T}$.
Exercises
Exercises for 13.1 Stefan-Boltzmann Law
P13.1 The Sun has a radius of $R_S = 6.96 \times 10^8$ m. What is the total power that it radiates, given a surface temperature of 5750 K?
P13.2 A 1 cm-radius spherical ball of polished gold hangs suspended inside an evacuated chamber that is at room temperature 20 °C. There is no pathway for thermal conduction to the chamber wall.
(a) If the gold is at a temperature of 100 °C, what is the initial rate of temperature loss in °C/s? The emissivity for polished gold is $e = 0.02$. The specific heat of gold is 129 J/kg·°C and its density is 19.3 g/cm³.
HINT: $Q = mc\Delta T$ and Power $= Q/\Delta t$.
(b) What is the initial rate of temperature loss if the ball is coated with flat black paint, which has emissivity $e = 0.95$?
HINT: You should consider the energy flowing both ways.
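Exercises for 13.3 Planck's Formula
P13.3 Derive (or try to derive) the Stefan-Boltzmann law by integrating the
(a) Rayleigh-Jeans energy density
$$u_{\text{field}} = \int_0^\infty \rho_{\text{Rayleigh-Jeans}}(\omega)\, d\omega$$
(b) Wien energy density
$$u_{\text{field}} = \int_0^\infty \rho_{\text{Wien}}(\omega)\, d\omega$$
Please evaluate $\sigma$.
HINT: $\int_0^\infty x^3 e^{-ax}\, dx = \dfrac{6}{a^4}$.
(c) Planck energy density
$$u_{\text{field}} = \int_0^\infty \rho_{\text{Planck}}(\omega)\, d\omega$$
HINT: $\int_0^\infty \dfrac{x^3\, dx}{e^{ax} - 1} = \dfrac{\pi^4}{15 a^4}$.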
P13.4
which gives the strongest wavelength present in the blackbody spectral distribution. HINT: See Example 13.1. You may like to know that the solution to the transcendental equation $(5 - x)e^x = 5$ is $x = 4.965$.
(b) What is the strongest wavelength emitted by the Sun, which has a surface temperature of 5750 K (see P13.1)?
(c) Also find $\nu_{\max}$ and show that it is not the same as $c/\lambda_{\max}$. Why would we be interested mainly in $\lambda_{\max}$?
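Physical Constants

Constant                     Symbol          Value
Permittivity                 $\epsilon_0$    $8.8542 \times 10^{-12}\ \mathrm{C^2/N \cdot m^2}$
Permeability                 $\mu_0$         $4\pi \times 10^{-7}\ \mathrm{T \cdot m/A}$ (or $\mathrm{kg \cdot m/C^2}$)
Speed of light in vacuum     $c$             $2.9979 \times 10^{8}\ \mathrm{m/s}$
Charge of an electron        $q_e$           $1.602 \times 10^{-19}\ \mathrm{C}$
Mass of an electron          $m_e$           $9.108 \times 10^{-31}\ \mathrm{kg}$
Boltzmann's constant         $k_B$           $1.380 \times 10^{-23}\ \mathrm{J/K}$
Planck's constant            $h$             $6.626 \times 10^{-34}\ \mathrm{J \cdot s}$
                             $\hbar$         $1.054 \times 10^{-34}\ \mathrm{J \cdot s}$
Stefan-Boltzmann constant    $\sigma$        $5.670 \times 10^{-8}\ \mathrm{W/m^2 \cdot K^4}$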