
Chapter 0 Introduction

CURSO DE IA PROBABILÍSTICA

Introduction

This chapter introduces Python functions, graphics, widgets, and pandas.

To run the code, click on the rocket symbol at the top of the page and select Live Code. The server will then launch and, after some time (be patient), you will see "ready". You can then click Run on the cells below. Run the cells one at a time, without skipping, as values computed in one cell are reused in subsequent cells.

Let's run the next cell.

    print('Hello world')

    Hello world

Next, change the values of x and y in the cell below and click Run.

    x = 2  # example value; change it
    y = 3  # example value; change it
    print('x + y =', x + y)

In the cell above, you could try to print x * y instead of x + y. You can use that cell to try your own code. If you are not familiar with Python, or if you want a refresher, you may find Appendix 1 useful.

We now load some necessary libraries. (Run the next cell.)

    from IPython.display import HTML, display  # display() is used with the widgets below
    import numpy as np
    import matplotlib
    import scipy
    from scipy.stats import norm
    from scipy.stats import binom
    from scipy.stats import poisson
    import pandas as pd
    params = {'figure.figsize': (12, 6),  # These are plot parameters
              'legend.fontsize': 23}
    from matplotlib import pyplot as plt
    matplotlib.rcParams.update(params)
    import random
    import ipywidgets as widgets  # makes the name widgets available explicitly
    from ipywidgets import *
    print('The libraries loaded successfully')

    The libraries loaded successfully

We start with some Python experiments with random variables.

Plotting Distributions of Random Variables

Binomial: B(100,p)

We plot the probability mass function of the Binomial distribution with parameters $N = 100$ and $p = 0.1, 0.2, 0.5$. For $n = 0, \ldots, 100$, this is the probability that 100 flips of a coin that yields heads with probability $p$ result in $n$ heads.

    N = 101  # n takes the values 0, ..., 100
    a, b, c = 0.1, 0.2, 0.5  # the three values of p
    A = np.zeros(N)
    B = np.zeros(N)
    C = np.zeros(N)
    X = np.arange(N)
    for n in range(N):
        A[n] = binom.pmf(n, N-1, a)
        B[n] = binom.pmf(n, N-1, b)
        C[n] = binom.pmf(n, N-1, c)
    plt.plot(A, label='p=0.1')  # continuous plot gives a sense of the shape of the pmf
    plt.scatter(X, A, s=30)  # we add the markers on the integers
    plt.plot(B, label='p=0.2')
    plt.scatter(X, B, s=30)
    plt.plot(C, label='p=0.5')
    plt.scatter(X, C, s=30)
    plt.title('p.m.f. of $B(100,p)$')
    plt.xlabel('n')
    plt.legend()
    plt.show()

[Figure: p.m.f. of B(100, p)]

PPF of B(100,0.2):

The percent point function (ppf) is the inverse CDF. For instance, binom.ppf(0.95, 100, 0.2) is the smallest value of $n$ such that $P(X \le n) \ge 0.95$, where $X = B(100, 0.2)$.

    X = np.zeros(N)
    B = np.zeros(N)
    for n in range(N):
        X[n] = 0.9 + 0.05*n/N  # probability levels between 0.9 and 0.95
        B[n] = binom.ppf(X[n], 100, b)  # b = 0.2, as set in the previous cell
    plt.scatter(X, B, s=50)
    plt.title('p.p.f. of $B(100,0.2)$')
    plt.xlabel('x')
    plt.show()

[Figure: p.p.f. of B(100, 0.2)]

Poisson: P(λ)

The Poisson distribution with parameter $\lambda$ is that of the number of phone calls that you receive in one day, if you receive $\lambda$ phone calls per day, on average.

    N = 30  # we plot the p.m.f. for n = 0, ..., N - 1 (range chosen for readability)
    a, b, c = 1, 4, 10  # the three values of lambda
    A = np.zeros(N)
    B = np.zeros(N)
    C = np.zeros(N)
    X = np.arange(N)
    for n in range(N):
        A[n] = poisson.pmf(n, a)
        B[n] = poisson.pmf(n, b)
        C[n] = poisson.pmf(n, c)
    plt.plot(A, label=r'$\lambda = 1$')
    plt.scatter(X, A, s=30)
    plt.plot(B, label=r'$\lambda = 4$')
    plt.scatter(X, B, s=30)
    plt.plot(C, label=r'$\lambda = 10$')
    plt.scatter(X, C, s=30)
    plt.title(r'p.m.f. of $P(\lambda)$')
    plt.xlabel('n')
    plt.legend()
    plt.show()

[Figure: p.m.f. of P(λ)]

Exponential: Expo(λ)

The exponential distribution with parameter $\lambda$ is that of the time (in days) until the next phone call, if you get $\lambda$ phone calls per day, on average.

    N = 100  # number of grid points (example value)
    a, b, c, d = 0.25, 0.5, 0.75, 1  # the four values of lambda
    A = np.zeros(N)
    B = np.zeros(N)
    C = np.zeros(N)
    D = np.zeros(N)
    X = (10/N)*np.arange(N)  # time grid, in days
    for n in range(N):
        A[n] = a*np.exp(-a*X[n])
        B[n] = b*np.exp(-b*X[n])
        C[n] = c*np.exp(-c*X[n])
        D[n] = d*np.exp(-d*X[n])
    plt.plot(X, A, label=r'$\lambda = 0.25$')
    plt.plot(X, B, label=r'$\lambda = 0.5$')
    plt.plot(X, C, label=r'$\lambda = 0.75$')
    plt.plot(X, D, label=r'$\lambda = 1$')
    plt.title(r'p.d.f. of $Expo(\lambda)$')
    plt.xlabel('x')
    plt.legend()
    plt.show()

[Figure: p.d.f. of Expo(λ)]

Generating and Plotting Random Variables

np.random.uniform(a, b) returns one value chosen uniformly at random in the interval [a, b]. np.random.uniform(a, b, n) returns n values chosen independently and uniformly at random in the interval [a, b]. Let X = np.random.uniform(0, 1, 100). We then plot these values.

    N = 100  # number of values
    X = np.random.uniform(0, 1, N)  # uniform random values in [0, 1]
    plt.plot(X);  # plot these values. The ; at the end of the line suppresses unwanted outputs

[Figure: line plot of the 100 values of X]

Since X is a sequence of values, it is more appropriate to use a scatter plot of X as a function of the index. To do this, we define timeSteps = np.arange(100) = [0, 1, ..., 99] and we plot X against timeSteps.

    timeSteps = np.arange(N)
    plt.scatter(timeSteps, X);

Strong Law of Large Numbers

The values above look like noise, because they are independent. Is there some statistical regularity of any kind in this noise? We compute Y[n], the sample average of the first n + 1 values, for n = 0, ..., 99, and we plot Y. That is, Y[n] = (X[0] + ... + X[n])/(n+1) for n = 0, ..., 99. (Note that indexing requires some care.)

The values of X are kept in memory, so we don't have to generate them again if we ran the previous cell.

    Y = np.zeros(N)  # an array of N zeros
    for n in range(N):  # for n = 0, ..., N - 1 (remember that Python starts with 0)
        Y[n] = sum(X[:n+1])/(n+1)  # X[:n+1] is the list of the first n+1 values of X
    plt.scatter(timeSteps, Y);
    # We could have written the code more efficiently as follows (can you tell why it is more efficient?):
    # Y[0] = X[0]
    # for n in range(N - 1):
    #     Y[n+1] = ((n+1)*Y[n] + X[n+1])/(n+2)
    # plt.plot(Y);

[Figure: scatter plot of the sample averages Y[n]]

Central Limit Theorem

Observe that, even though the values X fluctuate wildly, their sample average converges to 0.5. That property is the Strong Law of Large Numbers. Note that Y[99] tends to be closer to 0.5 than Y[15]. The Central Limit Theorem (CLT) makes that observation precise.
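The comparison between Y[15] and Y[99] can be checked numerically. The following is a minimal sketch, not part of the original notebook: it assumes NumPy's default_rng generator, and the helper name sample_mean_spread, the number of trials, and the seed are illustrative choices. It estimates the standard deviation of the average of m uniform values over many independent runs and prints it next to $\sigma/\sqrt{m}$, where $\sigma = 1/\sqrt{12}$ is the standard deviation of a uniform [0, 1] random variable.

```python
import numpy as np

def sample_mean_spread(num_values, trials=20000, seed=0):
    """Empirical standard deviation of the average of num_values uniform [0, 1] values."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(trials, num_values))  # one row per experiment
    return X.mean(axis=1).std()  # spread of the sample averages across experiments

sigma = 1 / np.sqrt(12)  # standard deviation of a uniform [0, 1] random variable
spread_16 = sample_mean_spread(16)    # Y[15] is the average of 16 values
spread_100 = sample_mean_spread(100)  # Y[99] is the average of 100 values
print('Y[15]: empirical', spread_16, 'vs sigma/4 =', sigma / 4)
print('Y[99]: empirical', spread_100, 'vs sigma/10 =', sigma / 10)
```

Averaging along the rows of a two-dimensional array replaces the explicit loop over experiments, which is convenient when only the final averages are needed rather than the whole sequence Y.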
To illustrate that result, let us repeat the previous experiment 100 times and count the fraction of times that Y[n] falls in intervals of width 0.01, for n = 15 and n = 99. We do this the hard way, just to build some familiarity with the code.

    V = np.zeros(100)  # V[k] will be the number of times that Y[15] falls in [k*0.01, (k+1)*0.01)
    W = np.zeros(100)  # W[k] will be the number of times that Y[99] falls in [k*0.01, (k+1)*0.01)
    for trial in range(100):  # trial is the index of our experiment
        X = np.random.uniform(0, 1, N)  # we generate X
        Y[0] = X[0]
        for n in range(N-1):
            Y[n+1] = ((n+1)*Y[n] + X[n+1])/(n+2)  # we generate Y, using the more efficient code
        k = int(100*Y[15])  # int(x) is the integer part of x. For instance, int(15.7) = 15
        # this is the value of k such that Y[15] is in [k*0.01, (k+1)*0.01)
        V[k] += 1  # that is, V[k] = V[k] + 1
        k = int(100*Y[99])
        W[k] += 1
    values = 0.01*np.arange(100)  # this is [0, 0.01, 0.02, ..., 0.99]
    plt.scatter(values, V/100);  # the scatter plot gives the value of V as a function of values
    # we divide V by 100 to get the fraction of times
    plt.scatter(values, W/100);

We can add legends to the scatter plot as follows.

    fig, ax = plt.subplots()
    ax.scatter(values, V/100, label='Y[15]')
    ax.scatter(values, W/100, label='Y[99]')
    ax.legend()
    ax.grid(True)
    plt.show()

[Figure: fraction of experiments in each interval, for Y[15] and Y[99], 100 experiments]

These plots show that most of the time, Y[99] is very close to 0.5 whereas Y[15] tends to be more dispersed. To make the case a bit stronger, we repeat the experiment 10000 times, instead of 100.

    V = np.zeros(100)  # V[k] will be the number of times that Y[15] falls in [k*0.01, (k+1)*0.01)
    W = np.zeros(100)  # W[k] will be the number of times that Y[99] falls in [k*0.01, (k+1)*0.01)
    for trial in range(10000):  # trial is the index of our experiment
        X = np.random.uniform(0, 1, 100)  # we generate X
        Y[0] = X[0]
        for n in range(99):
            Y[n+1] = ((n+1)*Y[n] + X[n+1])/(n+2)  # we generate Y, using the more efficient code
        k = int(100*Y[15])  # k is such that Y[15] is in [k*0.01, (k+1)*0.01)
        V[k] += 1  # that is, V[k] = V[k] + 1
        k = int(100*Y[99])
        W[k] += 1
    values = 0.01*np.arange(100)  # this is [0, 0.01, 0.02, ..., 0.99]
    fig, ax = plt.subplots()
    ax.scatter(values, V/10000, label='Y[15]')
    ax.scatter(values, W/10000, label='Y[99]')
    ax.legend()
    ax.grid(True)
    plt.show()

[Figure: fraction of experiments in each interval, for Y[15] and Y[99], 10000 experiments]

The two plots show that Y[15] and Y[99] have a probability distribution that looks like a bell curve, i.e., a Gaussian distribution. The distribution of Y[15] is more spread out than that of Y[99], which is more concentrated around 0.5. The spread is measured by the standard deviation of the distribution, and its value is $\sigma/\sqrt{16} = \sigma/4$ for Y[15] and $\sigma/\sqrt{100} = \sigma/10$ for Y[99]. Here, $\sigma$ is the standard deviation of a uniform [0, 1] random variable, and $\sigma = 1/\sqrt{12} \approx 0.3$. Roughly, about 70% of the values fall within one standard deviation of 0.5. Thus, one expects that about 70% of the values of Y[15] fall within $\sigma/4 \approx 0.07$ of 0.5 and 70% of the values of Y[99] fall within $\sigma/10 \approx 0.03$ of 0.5.

Let's try to see how far the empirical distribution of Y[99] is from a Gaussian. The Central Limit Theorem states that Y[99] should be almost distributed like a Gaussian random variable with mean 0.5 and standard deviation $\sigma/10 \approx 0.03$. We plot the fraction of experiments, out of 10000, where Y[99] is less than $k \times 0.01$, as a function of $k \in \{1, \ldots, 100\}$, and we also plot $F(k/100)$ where $F$ is the cumulative distribution function (cdf) of a Gaussian $G$ with mean 0.5 and standard deviation 0.03. We use the cdf norm.cdf of a Gaussian $SG$ with mean 0 and standard deviation 1. The trick is that $(G - 0.5)/0.03$ is then distributed like $SG$, so that the probability that $G < x$ is the probability that $(G - 0.5)/0.03 < (x - 0.5)/0.03$, i.e., the probability that $SG < (x - 0.5)/0.03$. As you see, the plot confirms the prediction of the CLT.

    B = np.zeros(100)
    C = np.zeros(100)
    for k in range(1, 100):
        B[k] = B[k-1] + W[k-1]  # number of experiments where Y[99] < k*0.01
        C[k] = norm.cdf((values[k] - 0.5)/0.03)
    fig, ax = plt.subplots()
    ax.scatter(values, B/10000, label='Empirical')
    ax.scatter(values, C, label='Gaussian')
    ax.legend()
    ax.grid(True)
    plt.show()

[Figure: empirical cdf of Y[99] and Gaussian cdf]

Markov Chain

Using a uniform random variable, we can generate random variables with an arbitrary distribution. Let us generate a random variable that takes the value $x_k$ with probability $p_k$ for $k = 0, 1, \ldots, K - 1$, where the numbers $p_k$ are nonnegative and add up to one. We define a function discreteRV(x, p) that returns such a random variable. Let $U$ be a random variable uniformly distributed in [0, 1]. Define $X = x_k$ when $p_0 + \cdots + p_{k-1} \le U < p_0 + \cdots + p_k$. Since that interval has length $p_k$, the random variable $X$ takes the value $x_k$ with probability $p_k$.

    def discreteRV(x, p):  # x = [x[0], ..., x[K-1]], p = [p[0], ..., p[K-1]]
        # returns x[k] with probability p[k]
        U = np.random.uniform(0, 1)
        total = 0
        for k in range(len(p)):
            total += p[k]  # total = p[0] + ... + p[k]
            if U < total:
                return x[k]
        return x[-1]  # reached only because of rounding errors in the sum

    def MC(N, x0, P):  # simulates N steps of the Markov chain with transition matrix P and initial state x0
        M = len(P)  # number of states
        states = np.arange(M)
        X = np.zeros(N, dtype=int)  # X[n] is the state at time n
        Q = np.zeros([M, N])  # Q[i, n] is the probability that X[n] = i
        X[0] = x0
        Q[x0, 0] = 1
        for n in range(N-1):
            X[n+1] = discreteRV(states, P[X[n]])  # the next state is drawn from row X[n] of P
            Q[:, n+1] = np.einsum('i,ij->j', Q[:, n], P)  # the einstein sum notation indicates how to manipulate indices
            # np.einsum('ij,jk->ik', A, B) is the product of A and B
        return X, Q

    P = [[0.4, 0.6, 0], [0.7, 0, 0.3], [0.2, 0.5, 0.3]]
    timeSteps = np.arange(N)
    x0 = 0
    X, Q = MC(N, x0, P)
    labels = ['0', '1', '2']
    plt.yticks([0, 1, 2], labels)
    plt.ylabel('states')
    plt.xlabel('$n$')
    plt.title('Markov Chain $X$ with initial state ' + str(x0))
    plt.scatter(timeSteps, X);
    plt.show()
    fig, ax = plt.subplots()
    plt.xlabel('$n$')
    plt.title('Distribution of $X[n]$ with initial state ' + str(x0))
    for i in range(len(P)):
        ax.scatter(timeSteps, Q[i, :], label='P[X[n] = ' + str(i) + ']')
    ax.legend()
    ax.grid(True)
    fig.set_figheight(6)
    fig.set_figwidth(12)  # We specify the size of the figure
    plt.show()

[Figure: Markov Chain X with initial state 0]

[Figure: Distribution of X[n] with initial state 0, showing P[X[n] = 0], P[X[n] = 1], P[X[n] = 2]]

Let's calculate the fraction of time Z[i,n] that the Markov chain takes the value i during {0, 1, ..., n}. To do this, we generate the Markov chain, then compute the Z[i,n].
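The fraction of time spent in each state can be computed from indicator variables and a cumulative sum. The following is a minimal sketch, not the chapter's own code: the function name occupation_fractions and the short sample path are hypothetical and serve only as an illustration. It assumes the states are labeled 0, ..., num_states - 1 and that Z[i, n] counts the steps 0, ..., n.

```python
import numpy as np

def occupation_fractions(X, num_states):
    """Return Z with Z[i, n] = fraction of the steps 0, ..., n at which X equals i."""
    X = np.asarray(X, dtype=int)
    # indicators[i, n] is True exactly when X[n] == i
    indicators = (X[None, :] == np.arange(num_states)[:, None])
    counts = np.cumsum(indicators, axis=1)  # visits to state i up to and including step n
    steps = np.arange(1, len(X) + 1)        # number of steps observed so far
    return counts / steps

path = [0, 1, 0, 0, 2, 1]  # a made-up sample path, for illustration only
Z = occupation_fractions(path, 3)
print(Z[:, -1])  # fractions of time spent in states 0, 1, 2 over the whole path
```

Each column of Z adds up to one, since the chain is in exactly one state at every step; that property is a convenient sanity check on any implementation.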
(Don't forget to run the previous cell first.)

    Z = np.zeros([len(P), N])  # Z is two-dimensional: len(P) by N
    for i in range(len(P)):
        Z[i, 0] = (i == x0)  # thus, Z[x0,0] = 1 and Z[i,0] = 0 for i not equal to x0
    for n in range(N-1):
        for i in range(len(P)):
            Z[i, n+1] = ((n+1)*Z[i, n] + (X[n+1] == i))/(n+2)
    fig, ax = plt.subplots()
    for i in range(len(P)):
        ax.scatter(timeSteps, Z[i, :], label='Z[' + str(i) + ', n]')
    ax.legend()
    ax.grid(True)
    plt.show()

[Figure: fractions of time Z[0, n], Z[1, n], Z[2, n]]

Markov Chain of Fig. 3.11

As another example, we simulate the Markov chain in Fig. 3.11.

    M = 20  # number of states (example value)
    la = 0.4  # example value
    mu = 0.5  # example value
    P = np.zeros([M, M])
    p2 = la*(1 - mu)  # probability that the backlog increases by one
    p0 = mu*(1 - la)  # probability that the backlog decreases by one
    p1 = 1 - p0 - p2  # probability that the backlog does not change
    EX = p2/(p0 - p2)  # expected backlog (formula for a chain without an upper limit)
    P[0, 0] = 1 - p2
    P[0, 1] = p2
    P[M-1, M-1] = 1 - p0
    P[M-1, M-2] = p0
    for i in range(1, M-1):
        P[i, i-1] = p0
        P[i, i] = p1
        P[i, i+1] = p2
    P = P.tolist()
    X, Q = MC(N, 0, P)
    AX = np.zeros(N)  # AX[n] is the average backlog during {0, 1, ..., n}
    AX[0] = X[0]
    for n in range(1, N):
        AX[n] = (n*AX[n-1] + X[n])/(n+1)
    timeSteps = np.arange(N)
    labels = [str(m) for m in range(M)]
    plt.yticks(np.arange(M), labels)
    plt.xlabel('$n$')
    plt.scatter(timeSteps, X, label='buffer backlog $X[n]$')
    plt.plot(EX*np.ones(N), label='expected backlog $E[X]$', c='red')
    plt.plot(AX, label=r'average backlog during $\{0, 1, \ldots, n\}$', c='green')
    plt.legend()
    plt.show()

[Figure: buffer backlog X[n], expected backlog E[X], and average backlog during {0, 1, ..., n}]

Widgets

Instead of modifying parameters in the code of a cell, it is sometimes convenient to expose these parameters in 'widgets'. We illustrate this method on the plot of a Gaussian with mean $\mu$ and standard deviation $\sigma$. In a Jupyter Notebook, the widgets are well integrated with code cells, and adjusting a widget triggers a new run of the code. Unfortunately, there is a major bug in Jupyter Book, and widgets do not work as smoothly as in Jupyter Notebooks. As a work-around, we separate the widgets and the code into separate cells. Also, some useful widgets still do not work with this fix. Oh well... we will live with what works.
    def dummy(mud, sigmad):  # stores the widget values in the global variables mu and sigma
        global mu, sigma
        mu, sigma = float(mud), float(sigmad)

    mud = widgets.Dropdown(options=['-3', '-2', '-1', '0', '1', '2', '3'], value='0', description=r'$\mu$', disabled=False)
    sigmad = widgets.Dropdown(options=['0.1', '0.2', '0.3', '1', '1.5', '2'], value='1', description=r'$\sigma$', disabled=False)
    z = widgets.interactive(dummy, mud=mud, sigmad=sigmad)
    display(z)

[Widgets: dropdown menus for $\mu$ (set to 0) and $\sigma$ (set to 1)]

    def plotGaussian(mu, sigma):
        N = 1000  # the x-axis has 2N + 1 points (example value)
        values = 0.005*np.arange(2*N + 1) - 0.005*N  # this is our x-axis
        B = np.zeros(2*N + 1)  # this is our y-axis for the cdf
        C = np.zeros(2*N + 1)  # this is our y-axis for the pdf
        for n in range(2*N + 1):
            B[n] = norm.cdf((values[n] - mu)/sigma)
            C[n] = norm.pdf((values[n] - mu)/sigma)/sigma  # the pdf is scaled by 1/sigma
        fig, ax = plt.subplots()
        fig.set_figheight(6)
        fig.set_figwidth(12)  # We specify the size of the figure
        ax.scatter(values, B, label='CDF', s=4)
        ax.scatter(values, C, label='PDF', s=4)
        ax.legend()
        ax.grid(True)
        plt.title(r'$\mathcal{N}(\mu, \sigma^2)$')  # We add a title to the graph
        plt.show()

    plotGaussian(mu, sigma)
    print('To change the parameter values, go back to the previous cell, \n adjust widgets, and run this cell again')

[Figure: CDF and PDF of the Gaussian with mean $\mu$ and variance $\sigma^2$]

    To change the parameter values, go back to the previous cell,
     adjust widgets, and run this cell again

Basic Data Analysis

We use pandas to explore sample data. We use a table of weights and heights borrowed from http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights

This data is in the Excel spreadsheet HW.xlsx of this chapter's directory. First, we read the spreadsheet into a DataFrame that we print. We then calculate the linear regression of weight over height. (More about this in later chapters.) We then plot. Many statistical tools are available in Python. The key is to understand not only how to use them, but what they do.

    from sklearn.linear_model import LinearRegression
    HWdf = pd.read_excel('HW.xlsx', index_col=0)
    print(HWdf)
    H = np.array(HWdf.iloc[:, 0]).reshape((-1, 1))  # convert H into a column
    W = HWdf.iloc[:, 1]
    linear_regressor = LinearRegression()  # create object for the class
    linear_regressor.fit(H, W)  # perform linear regression
    W_pred = linear_regressor.predict(H)  # make predictions
    plt.ylabel('weight')
    plt.xlabel('height')
    plt.title('Linear Regression of Weight over Height')
    plt.scatter(H, W)
    plt.plot(H, W_pred, color='red')
    plt.show()
    print('Look ma, I am doing Data Science!')

       Height (Inches)  Weight (Pounds)
    1            65.78           112.99
    ...
    [200 rows x 2 columns]

[Figure: Linear Regression of Weight over Height]

    Look ma, I am doing Data Science!

Monte Carlo

We use simulations to estimate the area of the intersection of two unit circles whose centers are separated by $C > 1$. The figure illustrates the setup.

The code generates 10000 points chosen uniformly inside the unit square and calculates the fraction of these points that fall in the intersection of the two circles. This fraction is an estimate of half of the area of the intersection of the two circles: by symmetry, the part of the intersection above the horizontal axis is half of it, and that part lies inside the unit square, which has area 1. A point $(x, y)$ is inside the first circle if $a^2 = x^2 + y^2 < 1$. It is inside the second circle if $b^2 = (x - C)^2 + y^2 < 1$. We include an estimate of the error based on the standard deviation of a Bernoulli random variable.

    def estimateArea(C):
        N = 10000  # number of random points
        estimate = 0
        for n in range(N):
            X = np.random.uniform(0, 1)
            Y = np.random.uniform(0, 1)
            estimate += (X**2 + Y**2 <= 1 and (X - C)**2 + Y**2 <= 1)/N
        # estimate is an average of N Bernoulli variables; the area is 2*estimate,
        # so its standard deviation is 2*sqrt(estimate*(1 - estimate))/sqrt(N), with sqrt(N) = 100
        dev = 2*2.6*(estimate*(1 - estimate))**(0.5)/100
        print('The area value is in [', round(2*estimate - dev, 3), ',', round(2*estimate + dev, 3), '] with probability 99%')

    # Cd = widgets.FloatSlider(description='C', min = 1, max = 2, step = 0.1, value = 1.5, position = 'bottom', continuous_update = False)
    # z = widgets.interactive(estimateArea, C = Cd)
    # display(z)
    estimateArea(1.5)

Sample output (the values change from run to run):

    The area value is in [ 0.419 , 0.463 ] with probability 99%

Conclusions:

The previous examples illustrate how to generate random variables, their pdf and cdf, Markov chains, how to plot sequences, adjust widgets, and how to use pandas to manipulate DataFrames and analyze data with statistical tools. If you are comfortable with Python, you can move on to the next chapter. Otherwise, you can review Appendix 1.

By Jean Walrand
© Copyright 2021.
