Solving high-dimensional partial differential equations using deep learning

Jiequn Han, Arnulf Jentzen, and Weinan E

Program in Applied and Computational Mathematics, Princeton University, Princeton, NJ 08544; Seminar for Applied Mathematics, Department of Mathematics, ETH Zürich, 8092 Zürich, Switzerland; Department of Mathematics, Princeton University, Princeton, NJ 08544; and Beijing Institute of Big Data Research, Beijing 100871, China

Edited by George Papanicolaou, Stanford University, Stanford, CA, and approved June 19, 2018 (received for review November 7, 2017)

PNAS | August 21, 2018 | vol. 115 | no. 34 | 8505–8510 | www.pnas.org/cgi/doi/10.1073/pnas.1718942115

Developing algorithms for solving high-dimensional partial differential equations (PDEs) has been an exceedingly difficult task for a long time, due to the notoriously difficult problem known as the "curse of dimensionality." This paper introduces a deep learning-based approach that can handle general high-dimensional parabolic PDEs. To this end, the PDEs are reformulated using backward stochastic differential equations and the gradient of the unknown solution is approximated by neural networks, very much in the spirit of deep reinforcement learning with the gradient acting as the policy function. Numerical results on examples including the nonlinear Black–Scholes equation, the Hamilton–Jacobi–Bellman equation, and the Allen–Cahn equation suggest that the proposed algorithm is quite effective in high dimensions, in terms of both accuracy and cost. This opens up possibilities in economics, finance, operational research, and physics, by considering all participating agents, assets, resources, or particles together at the same time, instead of making ad hoc assumptions on their interrelationships.

partial differential equations | backward stochastic differential equations | high dimension | deep learning | Feynman–Kac

Partial differential equations (PDEs) are among the most ubiquitous tools used in modeling problems in nature. Some of the most important ones are naturally formulated as PDEs in high dimensions. Well-known examples include the following:

i) The Schrödinger equation in the quantum many-body problem. In this case the dimensionality of the PDE is roughly three times the number of electrons or quantum particles in the system.
ii) The nonlinear Black–Scholes equation for pricing financial derivatives, in which the dimensionality of the PDE is the number of underlying financial assets under consideration.
iii) The Hamilton–Jacobi–Bellman equation in dynamic programming. In a game theory setting with multiple agents, the dimensionality goes up linearly with the number of agents; similarly, in a resource allocation problem, the dimensionality goes up linearly with the number of devices and resources.

As elegant as these PDE models are, their practical use has proved to be very limited due to the curse of dimensionality (1): The computational cost for solving them goes up exponentially with the dimensionality.

Another area where the curse of dimensionality has been an essential obstacle is machine learning and data analysis, where the complexity of nonlinear regression models, for example, goes up exponentially with the dimensionality. In both cases the essential problem we face is how to represent or approximate a nonlinear function in high dimensions. The traditional approach, by building functions using polynomials, piecewise polynomials, wavelets, or other basis functions, is bound to run into the curse of dimensionality.
In recent years a new class of techniques, the deep neural network model, has shown remarkable success in artificial intelligence (e.g., refs. 2–6). The neural network is an old idea, but recent experience has shown that deep networks with many layers seem to do a surprisingly good job in modeling complicated datasets. In terms of representing functions, the neural network model is compositional: It uses compositions of simple functions to approximate complicated ones. In contrast, the approach of classical approximation theory is usually additive. Mathematically, there are universal approximation theorems stating that a single hidden-layer neural network can approximate a wide class of functions on compact subsets (see, e.g., the survey in ref. 7 and the references therein), even though we still lack a theoretical framework for explaining the seemingly unreasonable effectiveness of multilayer neural networks, which are widely used nowadays. Despite this, the practical success of deep neural networks in artificial intelligence has been very astonishing and encourages applications to other problems where the curse of dimensionality has been a tormenting issue.

Significance

Partial differential equations (PDEs) are among the most ubiquitous tools used in modeling problems in nature. However, solving high-dimensional PDEs has been notoriously difficult due to the "curse of dimensionality." This paper introduces a practical algorithm for solving nonlinear PDEs in very high (hundreds and potentially thousands of) dimensions. Numerical results suggest that the proposed algorithm is quite effective for a wide variety of problems, in terms of both accuracy and speed. We believe that this opens up a host of possibilities in economics, finance, operational research, and physics, by considering all participating agents, assets, resources, or particles together at the same time, instead of making ad hoc assumptions on their interrelationships.

Author contributions: J.H., A.J., and W.E. designed research, performed research, and wrote the paper.

In this paper, we extend the power of deep neural networks to another dimension by developing a strategy for solving a large class of high-dimensional nonlinear PDEs using deep learning. The class of PDEs that we deal with is (nonlinear) parabolic PDEs. Special cases include the Black–Scholes equation and the Hamilton–Jacobi–Bellman equation. To do so, we make use of the reformulation of these PDEs as backward stochastic differential equations (BSDEs) (e.g., refs. 8 and 9) and approximate the gradient of the solution using deep neural networks. The methodology bears some resemblance to deep reinforcement learning, with the BSDE playing the role of model-based reinforcement learning (or control theory models) and the gradient of the solution playing the role of the policy function. Numerical examples manifest that the proposed algorithm is quite satisfactory in both accuracy and computational cost.

Due to the curse of dimensionality, there are only a very limited number of cases where practical high-dimensional algorithms have been developed in the literature. For linear parabolic PDEs, one can use the Feynman–Kac formula and Monte Carlo methods to develop efficient algorithms to evaluate solutions at any given space-time location.
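To illustrate, a minimal Monte Carlo sketch for the d-dimensional heat equation ∂u/∂t + ½Δu = 0 with terminal condition u(T, x) = g(x), whose Feynman–Kac representation is u(t, x) = E[g(x + W_{T−t})], reads as follows (the payoff g below is an arbitrary test function, not one from this paper):

import numpy as np

def u_monte_carlo(t, x, g, T=1.0, n_samples=100_000, seed=0):
    # Feynman-Kac for du/dt + (1/2)*Laplacian(u) = 0, u(T, x) = g(x):
    # u(t, x) = E[ g(x + W_{T-t}) ].  One Gaussian increment per sample;
    # no spatial grid is needed, so the cost grows only linearly in d.
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=np.sqrt(T - t), size=(n_samples, x.size))
    return g(x[None, :] + w).mean()

d = 100                                                   # works unchanged in high dimension
g = lambda y: np.exp(-np.sum(y**2, axis=1) / (2.0 * d))   # arbitrary test payoff
print(u_monte_carlo(0.0, np.zeros(d), g))

Because the estimator averages over sample paths rather than discretizing space, its cost is dictated by the sample count and the dimension only through the Gaussian draws; this is precisely why the linear case has long been tractable while the nonlinear case has not.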
For a class of inviscid Hamilton–Jacobi equations, Darbon and Osher (10) recently developed an effective algorithm in the high-dimensional case, based on the Hopf formula for the Hamilton–Jacobi equations. A general algorithm for nonlinear parabolic PDEs based on the multilevel decomposition of Picard iteration is developed in ref. 11 and has been shown to be quite efficient on a number of examples in finance and physics. The branching diffusion method is proposed in refs. 12 and 13; it exploits the fact that solutions of semilinear PDEs with polynomial nonlinearity can be represented as an expectation of a functional of branching diffusion processes. This method does not suffer from the curse of dimensionality, but it still has limited applicability due to the blow-up of approximated solutions in finite time.

The starting point of the present paper is deep learning. It should be stressed that even though deep learning has been a very successful tool for a number of applications, adapting it to the current setting with practical success is still a highly nontrivial task. Here, by using the BSDE reformulation, we are able to cast the problem of solving PDEs as a learning problem, and we design a deep-learning framework that fits naturally to that setting. This has proved to be quite successful in practice.

Methodology

We consider a general class of PDEs known as semilinear parabolic PDEs. These PDEs can be represented as

  (∂u/∂t)(t, x) + ½ Tr(σσᵀ(t, x)(Hessₓu)(t, x)) + ∇u(t, x) · μ(t, x) + f(t, x, u(t, x), σᵀ(t, x)∇u(t, x)) = 0,  [1]

with some specified terminal condition u(T, x) = g(x). Here t and x represent the time and the d-dimensional space variable, respectively, μ is a known vector-valued function, σ is a known d × d matrix-valued function, σᵀ denotes the transpose of σ, ∇u and Hessₓu denote the gradient and the Hessian of the function u with respect to x, Tr denotes the trace of a matrix, and f is a known nonlinear function. To fix ideas, we are interested in the solution at t = 0, x = ξ for some vector ξ ∈ ℝᵈ.

Let {W_t}_{t∈[0,T]} be a d-dimensional Brownian motion and {X_t}_{t∈[0,T]} be a d-dimensional stochastic process that satisfies

  X_t = ξ + ∫₀ᵗ μ(s, X_s) ds + ∫₀ᵗ σ(s, X_s) dW_s.  [2]

Then the solution of Eq. 1 satisfies the following BSDE (cf., e.g., refs. 8 and 9):

  u(t, X_t) − u(0, X₀) = −∫₀ᵗ f(s, X_s, u(s, X_s), σᵀ(s, X_s)∇u(s, X_s)) ds + ∫₀ᵗ [∇u(s, X_s)]ᵀ σ(s, X_s) dW_s.  [3]

We refer to Materials and Methods for further explanation of Eq. 3.

To derive a numerical algorithm to compute u(0, X₀), we treat u(0, X₀) ≈ θ_{u₀} and ∇u(0, X₀) ≈ θ_{∇u₀} as parameters in the model and view Eq. 3 as a way of computing the values of u at the terminal time T, knowing u(0, X₀) and ∇u(t, X_t). We apply a temporal discretization to Eqs. 2 and 3, based on a partition of the time interval [0, T], 0 = t₀ < t₁ < ⋯ < t_N = T. […]

…(λ > 0), we can turn the Allen–Cahn Eq. 15 into the form of Eq. 1 such that the deep BSDE method can be used. Fig. 3, Top shows the mean and the SD of the relative error of u(t = 0.3, x = (0, …, 0)). The exact solution of Eq. 15 at t = 0.3, x = (0, …, 0) is not explicitly known and has been approximately computed by means of the branching diffusion method (cf., e.g., refs. 12 and 13): u(t = 0.3, x = (0, …, 0)) ≈ 0.0528. For this 100-dimensional example PDE, the deep BSDE method achieves a relative error of 0.30% in a runtime of 647 s on a MacBook Pro (cf. Fig. 3, Top). We also use the deep BSDE method to approximately compute the time evolution of u(t, x = (0, …, 0)) for t ∈ [0, 0.3] (Fig. 3, Bottom).

Fig. 3. (Top) Relative error of the deep BSDE method for u(t = 0.3, x = (0, …, 0)) against the number of iteration steps, in the case of the 100-dimensional Allen–Cahn Eq. 15 with 20 equidistant time steps (N = 20) and learning rate 0.0005. The shaded area depicts the mean ± the SD of the relative error for five different runs. The deep BSDE method achieves a relative error of size 0.30% in a runtime of 647 s. (Bottom) Time evolution of u(t, x = (0, …, 0)) for t ∈ [0, 0.3] in the case of the 100-dimensional Allen–Cahn Eq. 15, computed by means of the deep BSDE method.
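A condensed sketch of the deep BSDE method on this Allen–Cahn example is given below, written in present-day TensorFlow/Keras rather than in the paper's original implementation; the terminal function g, the initializations, and the iteration count are illustrative assumptions, not values taken from the text. After reversing time, the Allen–Cahn equation fits Eq. 1 with μ = 0, σ = √2·Id, and f(y) = y − y³, and the loop below is the forward Euler discretization of Eqs. 2 and 3, with one subnetwork per interior time step and the loss E|g(X_T) − Y_T|².

import tensorflow as tf

d, N, T = 100, 20, 0.3            # dimension, time steps, horizon (as in Fig. 3)
dt, sqrt2 = T / N, 2.0 ** 0.5

# One subnetwork per interior time step, approximating sigma^T grad u(t_n, .).
subnets = [tf.keras.Sequential([
    tf.keras.layers.Dense(d + 10, activation="relu"),
    tf.keras.layers.Dense(d + 10, activation="relu"),
    tf.keras.layers.Dense(d)]) for _ in range(N - 1)]
y0 = tf.Variable(0.1)                                    # u(0, X_0): the number we are after
z0 = tf.Variable(tf.random.normal([1, d], stddev=0.1))   # sigma^T grad u at t_0

f = lambda y: y - y**3            # Allen-Cahn nonlinearity
g = lambda x: 1.0 / (2.0 + 0.4 * tf.reduce_sum(x**2, axis=1))   # assumed terminal function

opt = tf.keras.optimizers.Adam(5e-4)                     # learning rate from the Fig. 3 caption

def train_step(batch=64):
    dw = tf.random.normal([batch, N, d], stddev=dt ** 0.5)   # Brownian increments
    with tf.GradientTape() as tape:
        x = tf.zeros([batch, d])                  # X_0 = (0, ..., 0)
        y = y0 * tf.ones([batch])
        z = tf.tile(z0, [batch, 1])
        for n in range(N):                        # Euler steps for Eqs. 2 and 3
            y = y - f(y) * dt + tf.reduce_sum(z * dw[:, n], axis=1)
            x = x + sqrt2 * dw[:, n]
            if n < N - 1:
                z = subnets[n](x)                 # subnetwork at time t_{n+1}
        loss = tf.reduce_mean((g(x) - y) ** 2)    # match the terminal condition
    grads = tape.gradient(loss, tape.watched_variables())
    opt.apply_gradients(zip(grads, tape.watched_variables()))
    return loss

for step in range(4000):                          # Fig. 3 shows ~4,000 iteration steps
    train_step()
print(float(y0))                                  # should approach u(0.3, (0,...,0)) ~ 0.0528

The paper's implementation additionally applies batch normalization inside each subnetwork (see Implementation in Materials and Methods); it is omitted here for brevity, and wrapping train_step in tf.function would speed up the sketch considerably.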
Conclusions

The algorithm proposed in this paper opens up a host of possibilities in several different areas. For example, in economics one can consider many different interacting agents at the same time, instead of using the "representative agent" model. Similarly, in finance one can consider all of the participating instruments at the same time, instead of relying on ad hoc assumptions about their relationships. In operational research, one can handle cases with hundreds and thousands of participating entities directly, without the need to make ad hoc approximations.

It should be noted that although the methodology presented here is fairly general, we are so far not able to deal with the quantum many-body problem, due to the difficulty of dealing with the Pauli exclusion principle.

Materials and Methods

BSDE Reformulation. The link between (nonlinear) parabolic PDEs and BSDEs has been extensively investigated in the literature (e.g., refs. 8, 9, 26, and 27). In particular, Markovian BSDEs give a nonlinear Feynman–Kac representation of some nonlinear parabolic PDEs. Let (Ω, F, P) be a probability space, W: [0, T] × Ω → ℝᵈ be a d-dimensional standard Brownian motion, and {F_t}_{t∈[0,T]} be the normal filtration generated by {W_t}_{t∈[0,T]}. Consider the following BSDE:

  X_t = ξ + ∫₀ᵗ μ(s, X_s) ds + ∫₀ᵗ σ(s, X_s) dW_s,  [16]

  Y_t = g(X_T) + ∫ₜᵀ f(s, X_s, Y_s, Z_s) ds − ∫ₜᵀ (Z_s)ᵀ dW_s,  [17]

for which we are seeking an {F_t}_{t∈[0,T]}-adapted solution process {(X_t, Y_t, Z_t)}_{t∈[0,T]} with values in ℝᵈ × ℝ × ℝᵈ. Under suitable regularity assumptions on the coefficient functions μ, σ, and f, one can prove existence and up-to-indistinguishability uniqueness of solutions (cf., e.g., refs. 8 and 26). Furthermore, the nonlinear parabolic PDE 1 is related to the BSDEs 16 and 17 in the sense that for all t ∈ [0, T] it holds P-a.s. that

  Y_t = u(t, X_t) and Z_t = σᵀ(t, X_t) ∇u(t, X_t)  [18]

(cf., e.g., refs. 8 and 9). Therefore, we can compute the quantity u(0, X₀) associated with Eq. 1 through Y₀ by solving the BSDEs 16 and 17. More specifically, we plug the identities in Eq. 18 into Eq. 17 and rewrite the equation forwardly: taking the difference of Eq. 17 at times 0 and t and substituting Eq. 18, noting that (Z_s)ᵀ dW_s = [∇u(s, X_s)]ᵀσ(s, X_s) dW_s, yields exactly the formula in Eq. 3. Then we discretize the equation temporally and use neural networks to approximate the spatial gradients and, finally, the unknown function, as introduced in Methodology.

Neural Network Architecture. In this subsection we briefly illustrate the architecture of the deep BSDE method. To simplify the presentation, we restrict ourselves in these illustrations to the case where the diffusion coefficient σ in Eq. 1 satisfies σ(x) = Id_{ℝᵈ} (the d × d identity matrix) for all x ∈ ℝᵈ. Fig. 4 illustrates the network architecture for the deep BSDE method. Note that ∇u(t_n, X_{t_n}) denotes the variable we approximate directly by subnetworks, and u(t_n, X_{t_n}) denotes the variable we compute iteratively in the network. There are three types of connections in this network:

i) X_{t_n} → h¹_n → h²_n → ⋯ → hᴴ_n → ∇u(t_n, X_{t_n}) is the multilayer feedforward neural network approximating the spatial gradient at time t = t_n.
The weights θ_n of this subnetwork are the parameters we aim to optimize.

ii) (u(t_n, X_{t_n}), ∇u(t_n, X_{t_n}), W_{t_{n+1}} − W_{t_n}) → u(t_{n+1}, X_{t_{n+1}}) is the forward iteration giving the final output of the network as an approximation of u(t_N, X_{t_N}); it is completely characterized by Eqs. 5 and 6. There are no parameters to be optimized in this type of connection.

iii) (X_{t_n}, W_{t_{n+1}} − W_{t_n}) → X_{t_{n+1}} is the shortcut connecting blocks at different times; it is characterized by Eqs. 4 and 6. There are also no parameters to be optimized in this type of connection.

If we use H hidden layers in each subnetwork, as illustrated in Fig. 4, then the whole network has (H + 1)(N − 1) layers in total that involve free parameters to be optimized simultaneously.

Fig. 4. Illustration of the network architecture for solving semilinear parabolic PDEs, with H hidden layers for each subnetwork and N time intervals. The whole network has (H + 1)(N − 1) layers in total that involve free parameters to be optimized simultaneously. Each column for t = t₁, t₂, …, t_{N−1} corresponds to a subnetwork at time t; h¹_n, …, hᴴ_n are the intermediate neurons in the subnetwork at time t = t_n for n = 1, 2, …, N − 1.

It should be pointed out that the proposed deep BSDE method can also be used if we are interested in values of the PDE solution u in a region D ⊂ ℝᵈ at time t = 0, instead of at a single space point ξ ∈ ℝᵈ. In this case we choose X₀ to be a nondegenerate D-valued random variable, and we use two additional neural networks, parameterized by {θ_{u₀}, θ_{∇u₀}}, for approximating the functions D ∋ x ↦ u(0, x) ∈ ℝ and D ∋ x ↦ ∇u(0, x) ∈ ℝᵈ. Upper and lower bounds for approximation errors of stochastic approximation algorithms for PDEs and BSDEs, respectively, can be found in refs. 27–29 and the references therein.

Implementation. We describe in detail the implementation for the numerical examples presented in this paper. Each subnetwork is fully connected and consists of four layers (except for the example in the next subsection), with one input layer (d-dimensional), two hidden layers (both (d + 10)-dimensional), and one output layer (d-dimensional). We choose the rectifier function (ReLU) as our activation function. We also adopt the technique of batch normalization (30) in the subnetworks, right after each linear transformation and before activation. This technique accelerates the training by allowing a larger step size and easier parameter initialization. All of the parameters are initialized through a normal or a uniform distribution without any pretraining.

We use TensorFlow (31) to implement our algorithm with the Adam optimizer (14) to optimize parameters. Adam is a variant of the SGD algorithm, based on adaptive estimates of lower-order moments. We set the default values for the corresponding hyperparameters as recommended in ref. 14 and choose the batch size as 64. In each of the presented numerical examples, the means and the SDs of the relative approximation errors are computed approximately by means of five independent runs of the algorithm with different random seeds. All of the numerical examples reported are run on a MacBook Pro with a 2.9-GHz Intel Core i5 processor and 16 GB of memory.
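In present-day Keras notation, one such subnetwork could be sketched as follows. This is an illustration of the description above, not the original code: it places batch normalization between each linear map and its ReLU activation as stated, and dropping the now-redundant bias in the linear maps is our choice rather than something documented in the text.

import tensorflow as tf

def make_subnet(d):
    # One fully connected subnetwork: input layer (d), two hidden layers
    # (d + 10), output layer (d), with batch normalization applied right
    # after each linear transformation and before the ReLU activation.
    x_in = tf.keras.Input(shape=(d,))
    h = x_in
    for _ in range(2):
        h = tf.keras.layers.Dense(d + 10, use_bias=False)(h)   # linear map; BN supplies the shift
        h = tf.keras.layers.BatchNormalization()(h)
        h = tf.keras.layers.Activation("relu")(h)
    out = tf.keras.layers.Dense(d)(h)         # approximates sigma^T grad u(t_n, .)
    return tf.keras.Model(x_in, out)

make_subnet(100).summary()                    # widths 100 -> 110 -> 110 -> 100 for d = 100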
Effect of the Number of Hidden Layers. The accuracy of the deep BSDE method certainly depends on the number of hidden layers in the subnetwork approximation (Eq. 7). To test this effect, we solve a reaction–diffusion-type PDE with different numbers of hidden layers in the subnetwork. The PDE is a high-dimensional version (d = 100) of the example analyzed numerically in Gobet and Turkedjiev (32) (d = 2):

  (∂u/∂t)(t, x) + ½ Δu(t, x) + min{1, [u(t, x) − u*(t, x)]²} = 0,  [19]

in which u*(t, x) is the explicit oscillating solution

  u*(t, x) = κ + sin(λ(x₁ + ⋯ + x_d)) exp(λ²d(t − T)/2).

The parameters are chosen in the same way as in ref. 32: κ = 1.6, λ = 0.1, and T = 1. A residual network structure with skip connections is used in each subnetwork, with each hidden layer having d neurons. We increase the number of hidden layers in each subnetwork from zero to four and report the relative error in Table 1. It is evident that the approximation accuracy increases as the number of hidden layers in the subnetwork increases.

Table 1. The mean and SD of the relative error for the PDE in Eq. 19, obtained by the deep BSDE method with different numbers of hidden layers

No. of layers*   29      58      87      116     145
Mean, %          2.29    0.90    0.60    0.56    0.53
SD               0.0026  0.0016  0.0017  0.0017  0.0014

The PDE is solved until convergence, with 30 equidistant time steps (N = 30) and 40,000 iteration steps. The learning rate is 0.01 for the first half of the iterations and 0.001 for the second half.
*We count only the layers that have free parameters to be optimized.
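Since u* is explicit, the reference values behind the relative errors in Table 1 can be computed directly. A small sketch, using the formula as reconstructed above with the quoted parameters (κ = 1.6, λ = 0.1, T = 1):

import numpy as np

KAPPA, LAM, T, D = 1.6, 0.1, 1.0, 100      # kappa, lambda, horizon, dimension

def u_star(t, x):
    # u*(t, x) = kappa + sin(lambda * sum_i x_i) * exp(lambda^2 * d * (t - T) / 2)
    return KAPPA + np.sin(LAM * np.sum(x)) * np.exp(LAM**2 * D * (t - T) / 2.0)

print(u_star(0.0, np.zeros(D)))            # 1.6 at the origin, since sin(0) = 0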
ACKNOWLEDGMENTS. The work of J.H. and W.E. is supported in part by National Natural Science Foundation of China (NSFC) Grant 91130005, US Department of Energy (DOE) Grant DE-SC0009248, and US Office of Naval Research (ONR) Grant N00014-13-1-0338.

1. Bellman RE (1957) Dynamic Programming (Princeton Univ Press, Princeton, NJ).
2. Goodfellow I, Bengio Y, Courville A (2016) Deep Learning (MIT Press, Cambridge, MA).
3. LeCun Y, Bengio Y, Hinton G (2015) Deep learning. Nature 521:436–444.
4. Krizhevsky A, Sutskever I, Hinton GE (2012) ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems, eds Pereira F, Burges CJC, Bottou L, Weinberger KQ (Curran Associates, Red Hook, NY), Vol 25, pp 1097–1105.
5. Hinton G, et al. (2012) Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE Signal Process Mag 29:82–97.
6. Silver D, Huang A, Maddison CJ, et al. (2016) Mastering the game of Go with deep neural networks and tree search. Nature 529:484–489.
7. Pinkus A (1999) Approximation theory of the MLP model in neural networks. Acta Numer 8:143–195.
8. Pardoux É, Peng S (1992) Backward stochastic differential equations and quasilinear parabolic partial differential equations. Stochastic Partial Differential Equations and Their Applications (Charlotte, NC, 1991), Lecture Notes in Control and Information Sciences, eds Rozovskii BL, Sowers RB (Springer, Berlin), Vol 176, pp 200–217.
9. Pardoux É, Tang S (1999) Forward–backward stochastic differential equations and quasilinear parabolic PDEs. Probab Theory Relat Fields 114:123–150.
10. Darbon J, Osher S (2016) Algorithms for overcoming the curse of dimensionality for certain Hamilton–Jacobi equations arising in control theory and elsewhere. Res Math Sci 3:19.
11. E W, Hutzenthaler M, Jentzen A, Kruse T (2016) On multilevel Picard numerical approximations for high-dimensional nonlinear parabolic partial differential equations and high-dimensional nonlinear backward stochastic differential equations. arXiv:1607.03295.
12. Henry-Labordère P (2012) Counterparty risk valuation: A marked branching diffusion approach. arXiv:1203.2369.
13. Henry-Labordère P, Tan X, Touzi N (2014) A numerical algorithm for a class of BSDEs via the branching process. Stochastic Process Appl 124:1112–1140.
14. Kingma D, Ba J (2014) Adam: A method for stochastic optimization. arXiv:1412.6980. Preprint, posted December 22, 2014.
15. Black F, Scholes M (1973) The pricing of options and corporate liabilities. J Polit Econ 81:637–654. Reprinted in Financial Risk Measurement and Management, International Library of Critical Writings in Economics (Edward Elgar, Cheltenham, UK).
16. Duffie D, Schroder M, Skiadas C (1996) Recursive valuation of defaultable securities and the timing of resolution of uncertainty. Ann Appl Probab 6:1075–1090.
17. Bender C, Schweizer N, Zhuo J (2017) A primal–dual algorithm for BSDEs. Math Finance 27:866–900.
18. Bergman YZ (1995) Option pricing with differential interest rates. Rev Financ Stud 8:475–500.
19. Leland HE (1985) Option pricing and replication with transactions costs. J Finance 40:1283–1301.
20. Avellaneda M, Levy A, Parás A (1995) Pricing and hedging derivative securities in markets with uncertain volatilities. Appl Math Finance 2:73–88.
21. Crépey S, Gerboud R, Grbac Z, Ngor N (2013) Counterparty risk and funding: The four wings of the TVA. Int J Theor Appl Finance 16:1350006.
22. Forsyth PA, Vetzal KR (2001) Implicit solution of uncertain volatility/transaction cost option pricing models with discretely observed barriers. Appl Numer Math 36:427–445.
23. Powell WB (2011) Approximate Dynamic Programming: Solving the Curses of Dimensionality (Wiley, New York).
24. Yong J, Zhou XY (1999) Stochastic Controls: Hamiltonian Systems and HJB Equations (Springer, New York).
25. Emmerich H (2003) The Diffuse Interface Approach in Materials Science: Thermodynamic Concepts and Applications of Phase-Field Models (Springer, New York).
26. El Karoui N, Peng S, Quenez MC (1997) Backward stochastic differential equations in finance. Math Finance 7:1–71.
27. Gobet E (2016) Monte-Carlo Methods and Stochastic Processes: From Linear to Non-Linear (Chapman & Hall/CRC, Boca Raton, FL).
28. Heinrich S (2006) The randomized information complexity of elliptic PDE. J Complexity 22:220–249.
29. Geiss S, Ylinen J (2014) Decoupling on the Wiener space, related Besov spaces, and applications to BSDEs. arXiv:1409.5322.
30. Ioffe S, Szegedy C (2015) Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv:1502.03167.
31. Abadi M, et al. (2016) TensorFlow: A system for large-scale machine learning. 12th USENIX Symposium on Operating Systems Design and Implementation (USENIX Association, Berkeley, CA), pp 265–283.
32. Gobet E, Turkedjiev P (2017) Adaptive importance sampling in least-squares Monte Carlo algorithms for backward stochastic differential equations. Stochastic Process Appl 127:1171–1203.
