0% found this document useful (0 votes)
49 views

Python Pandas - I

Uploaded by

Soham Saha
Copyright
© © All Rights Reserved
Available Formats
Download as PDF or read online on Scribd
0% found this document useful (0 votes)
49 views

Python Pandas - I

Uploaded by

Soham Saha
Copyright
© © All Rights Reserved
Available Formats
Download as PDF or read online on Scribd
You are on page 1/ 32
i 12 Python Pandas - | Outline ‘1.1 Introduction 1.2. Using Pandas 13. Pandas Data Stuctues Introduction Pandas or Python Pandas is Python's library for data 1.4 Series Data Structure analysis. Pandas has derived its name from “panel data 1.5 Accessing a Series Object system”, which is an ecometrics term for multi- ‘and fs Elements dimensional, structured data sets. Today, Pandas has 16°. Operations on Series Object LT. Series Objects vs. 1D Data Structures and 1D Numpy Array 1.8 DataFrame Data Structure 1.9 Creating and Displaying a DF 1.10 DataFrame Attributes 111 Dataframe vs. Series and become a popular choice for data analysis. As you must be aware of that data analysis refers to process of evaluating big data sets using analytical and statistical tools so as to discover useful information and conclusions to support business decision-making. Pandas makes available various tools for data analysis 2D Numpy Array and makes it a simple and easy process as compared to «1.12 Selecting or Accessing Data other available tools. The main author of Pandas is Wes 1.13 _Adding/Moditying Rows'/Columns" McKinney. \Values in DataFrames 4.14 DeletingRenaming ColumnsiRows 1.15. More on DataFrame Indexing ~ Boolean Indexing This chapter will introduce Python's Pandas library, its data structures series and dataframes and some useful functions with these data structures. “TMPORTANT Please note tha although Pandas i separate brary yet it uses Num as it support library and hence many datatypes, constants and functions of NumPy ae frequent! used with Pandas, Since NumPy was removed from lat year's class 11th syllabus because of COVID situation, students could not study NumPy. Thus, we are giving full NumPy chapter © of class XL as a SUPPORT MATERIAL in SIPO App. To get this chapter, open SIPO App ‘and then Open Support tab to find it. Using Pandas Pandas is an open source, BSD library built for Python programming language Pandas offers high-performance, easy-to-use data structures and data analysis tools. In order to work with pandas in Python, you need to import pandas library in your python environment. You can do this in either on the shell prompt or in your script file (py) by writing: import pandas as pd 1 wy 2 . Pee a aime, Ma Ms ea ce” fr mre ee : i me | tan aassngens HO YAIN Ny Pig oan ee ‘Why Pondos ® sar brary in the cet Pylon ecsystet for doing data analysis Pandas isthe most popular any tasks incuding Pome boenea ferent dat formats (nfeger, flat, doublet). «9 Ttean read or wate in many ata is organized ie, across rows and down columns. ie toy old ining dt a oe eter = pandas.series( ) series() Saupe ‘The above statement will create an empty Series type object with no value having default datatype which is float64. Consider following statement S25 ead» oe Series) pre Series), tye: Floste) 1. Creating non-empty Series objects ‘To create non-empty Series objects, you need to specify arguments for data and indexes as per following syntax ‘Series object = pd.Series(data, index-idx) ‘where idx isa valid Numpy datatype and data is the data part ofthe Series ok oni ae Part ofthe Series object, it can be one © A Python sequence © An ndarray ‘© A Python dictionary © Ascalar value CChopter 1: PYTHON PANDAS — Folk ° owing Slsetions talk about the ways to creat Series objets as pera () Specify dota os Python Sequence mores Simplest way to create Series i sin ‘ype objet isto givea sequence of values as attribute to Series), ‘Series Object> = Series () It will return an object of Series ries type. For instance, consider following example ing some Python sequences Seampiesatenens at create two Series type objects ust : Series cjectobjt cE >> print(ndat) (3 osu.) rndod = np.arange(3, 13, 2.5) >>>-serl = pd.Series(ndat) print(ndat) : ser = pd. Sertes(ndat) eee print(sera) lo 30 hos You can ako combine te code ines of yg ae above code. Carefully read the following : arte Pd-serses(DB-3range(3, 13, 3.5)) aoe se een inv cale 7 tc naray deel hoo ‘ou can create a Series object from any ndarray created from any fun i Zeepn cae Sere op ty ndarray fom any function, Consider following ‘Chapter I: PYTHON PANDAS: 7 EXAMPLE EI Wate progam eaea Sow object using an nara ‘hat has 5 elements inthe range 2 to 64 * Output SOLUTION eens smote ap ae S6=pd.Series (np. Linspace(24, 64, 5)) 2 print(s6) pace type: Flares EXAMPLE T Wites pono ten S lit gm ny lS oe HST soic). SOLUTION ae {import pandas as pa ar 57 = paSerses (op t10(3,5},2)) 3 print(s7) an type: int32 (ii) Specify data as a Python Dictionary ‘The sequence that you provide with Series() can be any sequence, including dictionaries. Let us see how you can create a Series abject by specifying indexes and values through a dictionary. {ee below) pa-Series( (an voy Feb? 38) ars YD) ee, ef otra oak the Keys of parsed as indexes and values form te data values (Aso, notice nat indexes ae nol inthe same mer 38 given inthe dictonary above ) Here, one thing is noteworthy that if you are creating a Series object from a dictionary object then keys ofthe dictionary become index of the Series and the values of dictionary become the data of Series object. Also, the indexes, which ar created from keys may not bein the same order as you have typed them. EXAMPLE Bo) Write program to create a Series bject using a dictionary tha stores te number of students in cach section of class 12 in your school. Output SOLUTION ire import pandas as pd aaaal stu {°A':39, 'B:41, "C142, con “4 = pd.Series(stu! ° See type: int6e print (s8) INFORMATICS PRACTICES ~ Xit 8 (@) Specify data os © Scolar Volue tase veo 2 N The daca ine mt nder gent ganas a ee value, BUT if data is sala provided. The scalar va on une be prod to Sexiest) function must Pe Pan the length of index iven as data) will be repé enor agents 1B 2 labels of any type) ¢. consider this jes(10, index = range(®, 2)) re Cintra “ile sequence of rumbers oF is ergamens wil specify he inde for Sones ce lem ‘nedalskon = pd. Ser’ rnedals2 « pd.Seres(15, index =.ra Series("Vet to start’, index = serz=| ee below the Series objects created as per above statements Yet to start Yet to start! Sar ae TOD ear SEMPLE Epp re St so ab fej art ofte war, QO and QP ouput coon gtrl 50000 import panda 3 4 eaten seecrrentestsne, ince [tet "Qe2", Rr", “@ee'D) gerd 50000 aera 50000 prwees) dtype: int64 Spann NI ar ct on 20 er Univers goes deer allrat yer tod bree Sei jt hat sores thse meals forges fb hd the decade 2020-2028, SOLUTION Ostpnt 2020 200 nport pandas as pd pata 510 « pd.Series(209, index = range(2020,2028,2)) 2028 200 print(ste) cee ee 2028 200 dtype: ints 1.4.2 Creating Series Objects ~ Additional Functionality Now that you have a fair idea about how to create Series type objects, let us talk about ‘ditional functionality of Series) that you can use to create Pandas Series objec. ) Specitying/Adding NoN values in o Series Object Sometimes you need to create a series obj vlna a ee cea Of rinsizebut you do nothave complete dla "can fill missing data with a NaN (Not a Number) value (i) Specity index(es) as well as data with Series( ‘Chapter I= PYTHON PANDAS - | 9 'Py module and hence you can use np.NaN to specify a wing code Legal empty value NaN is defined in Num missing value, or use None eg, see follo >>> ong 23 >>> obfa Gas 0 hays cope : Soran 2 ctype: Use apna or None to add missing data, While creating Series type object is that along with values, you also provide indexes. Both. values and indexes are sequences, ‘The syntax for this is as follows : Tees ou sip tee parte aer © [31, 28, 31, 30] Laon © [ant “Fi, Mh’, “toe) Sere ajo Sead wi bob vs wa EE ‘oven as soquenoas >>> obj3 « pd.Series(date = arr, index = won) poe obi feo 28 Sores elevated wih above sora] ivr 32 aor 3 {etype: intse |Sonobia = pdsertet date © (92, 34, 25], index © 0s Temes prereset ‘You could skip keyword data also, ic, following statement will also do the same as above ‘obj3 = pd.Series(arr, index = non) ‘You may use loop for defining index sequence also, si =pd,Series(range(1, 15, 3), index = [x for xn ‘abede']) 10 “The above code will create a series objet as shown belt LSS aa pd. Series(range(ts 15+ 3+ indexz[x for * 31 Caution! / indexes equal to the number of speci indees expt sigan index sevens YOu mi prov inso a values in data anay; providing to0 few or too many indices will Error TCA, B,C, Def clas 12 in your schol. sa aon es : feel we charity fund endorsed by the school. Write “Another list contri stores the contribution made by these students toa itactee Sree tha stares the contrition anount ste vals he ection names asthe eves SOLUTION Output Snport pandas a pd oreo sections ('A', ‘81, °C, 0°) 25600 contri = (6700, 5600, 5860, 5200) a s1t «pd. Series(data = contri, index= section) > $200 print(sit) type: snes (i) Specify Dato Type along with dato and index ‘You can also specify datatype along with data and index with Series()as per following syntax ) To understand its functioning, consider following example statements NamPy arava created as ‘supports vectrzed operation >>> print(a) it 9 4811 12) 59> obj? = pd.Series(index <4, date = a * 2) ita seccene dren a8 37) 1s, oer vaua of a doubled >>> objB = pd.Series(index = , date = a ** 2) >>> aba ow Ho 389 ini in la ‘type: int? The vectorized operations on a NumPy array (¢g.,* 2 or a "2 ) will be applied on every clement of NumPy array and stored as data part of Series object. Now consider another code: SSSust = (9, 20, 12, 12] Notice thstime the eqresion or ata aay mole tad se ow thas impacted he irl data aay “The above code tries to create data array for Series object obj8 by giving expression 2 *Lst where Lst is a Python list and as you know a number multiplied with alist replicates the list those many times and hence the data array has the same list values replicated twice (= 2* Lst) INFORMATICS PRACTICES ~ XII 12 previous example erent fom the slowing example. Iti sit lar yet differen Carefully read the fol Ta. Fe oc dre math ll be doubled. AMPLE BEM Sequences section ne the rept 670, S600, S00, 5200, i Contin as made by ac set ie the dain YS iit cde to crete a Series objec that stores the conriution indees with datatype a float32. SOLUTION ers eon aso i import pandas as pd Te a econ ae reg {nport nunpy 35°F ne oll cho conten section =['A', '8", 'C, ‘D's E'T $200, mt) contri = ap.array( {678,560,520 seo Serenata contri 2, inde ected PFISTZ2) print(s12) nis wit double ack va of the comer array. This is Fe ile the sols count 0 char at ———— aed NOTE A 1400.0 3200.0 nen joustore a NaN ave na sees © 30000.0 bjs Pandas rege the datatype > 1000.0 conn he us pfs Sts cist sing poet ye. Even You re Spiraea specly an integer type, Pande wil ype: Floneaz «et om dtl at promate to» Roane pnt ype {automataly) because NAN is not supported by integer types IMPORTANT ‘While creating a Series object, when you give index array as a sequence then there is no compulsion for the uniqueness of indexes. That i, you can have duplicate entries in the index array and Python won't raise any error, eg, see figure below : shar Tienes 6, 37) — fsoowa [scat ets rray([ 2.75, 12.5 , 22.25, 32. 41.75]) >opeb3 = pd.Series (val, index = ['2", pias te) soooba 27s 12150 225 32.09 fe as None {type floats “aomenenanaennnearenanarnnnnnnnnnnnnnnnnanay (tt need not be unique in Pandas ‘Seties Object. This will only cause an ‘eror if/when you perform an operation that requires unique indice. Chopter 1 - PYTHON PANDAS - 13 ‘You have read till now that Series can store he >) sms es ee set pe ‘explore this statement litle more. Have a look atthe following examples, a 1. Creating © Series object that stores three integer values de> BLS pa.sezten( (11, 22, 3317 1 2 ‘Se, hs Sores oe trig ol neg BB mero tat ef Seyper ince Iatger dtanpes Since all the values being stored are integers, an integer datarype Is chesen by Pandas (int here), so that all values can be efficiently stored and processed. All values are of int64 types —hence homogeneous dataype 2. Creating @ Series object thot stores thee floating type values S93 a2 = palserieet (01-1, 12.2, 13.31) 33 a2 oo na ‘Sc, hs Serie bet i strng al ‘oaig pales nd ts. ts type: Eloates << stanpe ison of the float datas Since al the values being stored are floating point numbers, a floating-point dataype is chosen by Pandas (float her), so that all values can be efficiently stored and processed. Al values are of loat6# types — hence homogeneous datatype. 3. Creating a Series object that stores 0 mix of inieger and floating type values, 553 a3 = paSeriee( (i, 22.2, 23-39) 333 83 Achy Se, ths Sees be string all nr — ono npr an agp values ad s,s dba ton ofthe Rot datatypes o 2 deype: Hoates sscommadate all pes of amber Since all the values being stored are numbers - some integers and some floating point numbers, a floating- point datatype is chosen by Pandas (float here), so tha all values can be efficiently stored and processed, All values ae of float64 types — hence homogeneous datatype, 4, Creating o Series object thot stores o mix of different type values op a = pdsdeeben! (14, 17-3) 38.8, TF ° Sx, th Series bjt is rae roles whase apes are diferes. Tas, Pandas wil chose Atay abet) whch con aorta the Since all the values being stored have diferent datatypes (integers, floating point numbers, string etc), Pandas will select a datatype, which is capable of holding all these values, Hence, Pandas will chose is datatype as abject, which is capable of holding any typeof value. All values ar of object types ~ hence homogeneous datatype. INFORMATICS PRACTICES = yg), 4 1.43 Series Object Atributes ‘When you create Series type object through atibutes. You can use these a set information about the Series obec: ‘Series object> sue The me common at 1 Series object ae listed inthe table bel a oe fr th rageot tee Me flow the Table 12 atlinformation related toitis available ttributes in the following format to Tobie 1.2 Common attributes of Series objects Attbute Description “Series cbject>.index | The index axis abel) of the Series “Series ebject>-index.nane | Name ofthe index; can be use to asign new name to index ‘Series object».values | Rum Series as ndamay or ndareay-ke depending on the diype Series object».dtype | retum the dtype object of the undecying data Series object».shape | retum.a tuple of the shape of the underiying data Series objecto.nbytes | retum the rumber of bytes inthe underlying data Series cbjecto.ndin | etum the numberof dimensions of the underlying data Series object>.size | retum the umber of elements in the underlying data Series abject>.itensize | retum the Sze ofthe dlype ofthe te ofthe underlying data ‘Series object>-hasnans | retum True if there are any NaN values; otherwise return False Series object>.empty | retum True if the Series object i empty fab otherwise Series object>.nane | retu orasign name to Series objet Following examples show how to view various attributes of Series objects. () Retrieving Index Array (index atribute) & Data Ary (values atribute) of a Series Object You can acces the index array and data values’ array of an existing Series object objS as shown ‘See, .name attribute can be used to get or set the name of a Series object, eg, >>> serd aa b 2 63 type: inte >>> sert.name >>> sent aa b2 3 Nane: Mys, dtype: ints >>> serd.nane a {¢) Retrieving Data Type (dtype) and Size of Type (itemsize) ‘To retrieve the data type of individual elements of a series object, use type. ‘To know about the type of Series object itself, you can use typel) of Python. You can use itemsize attribute to know the number of bytes allocated to each data item eg, ‘atime ype a ata vais stored in eres cect {Sosebia.dtiee {ype("floatse’) | sosobj2.itensis {>>> type(obs2) | pandas. core. series. Series ee 16 ————— 1 tae svn seia i (se arte tl ust NOTE ‘The shape of # Ser including missing oT elements it comtins s theshape of series object els how bigit is sf may rere isl 08 2888S yee conan ‘ince yas (NAN Cape itis shown 38: Chements in he objet soo prine(ab. shape» OSBIMPE) oyna t= G0 Sra aon ) where is the mumber of Tt or empty values Nah). bute), rieving Dimension {number ‘of axis : ndim attr 8 = ear Number of Bytes (nbytes attribute) ber of axis), use .ndim. ‘To know about the dimension (num cee ‘Tokaow about the number of elements inthe Series object, use bytes (nbytes is qual tothe size *itemsize) >> obf2andin 3 >o> print (obf2 sie, ob3.size) a 92 print (obj2.nbytes, obj3.nbytes) obj2 ha 4 eemess, hance ee ee i ion ce 8 (g} Checking Emptiness (empty ottribute) ‘and Presence of NoNs {hasnans attribute) See we have created an empty Series object namely obj1 also and then checked emptiness of ‘obj along with emptiness of objects obj2 and obj3 of Reference 1.2. (given on previous page). Scie be dimension obec 82 has flan tg has 3 eons 52 band oj oer [S5S0uga empty The cbject oti onety but ob/2 and ob) = | False (ot 102) 3 ot ompty ae, asthe contin some data {>> >0bj2.enpty ‘alse © Similarly, to check if Series object contains some NOTE [NaN value or not, you can use hasnans attribute a shown below (using objects of reference 2.2) ‘© You can use fen) o get total number of elements and .count) method with Series object to {get the count of non-NaN values in a series objec, £8 S>3ebi2hesnans Fratee IM you use len) on a Series object, then it Fetus total elements in including NaN but eres count) returns ony the count of por-Na values ina Seres objec. {>>> ten(obia) Tuan abe ale Ihe presence of NaN vues and cunt tur zur oon_NaN vais f>>>0852.count() 4 S>n0bf3.hasnans Trae |>>20b3-count() a CChopter} : PYTHON PANDAS — 1 7 EXAMPLE NB Goer eo do apn ed ca TTR Prt te arnt at eae vee oes tn Attribute name object si2._—object s12 Data type Shape No. oF bytes No, oF dinensions Item size Has ais? Enpty > SOLUTION {import pandas as pd { statenents here to create objects S11 and s12 from previous exanples print ("Attribute nane \t\t Object si \t object s12") print(" \t\e ) print ("Data type( type) \", sLL.dtype, "\t\t", si12.dtype) print("Shape (shape) |\t", sUL.shape, "\t\e", s12.shape) print("No. of bytes (.nbytes) :\t", std.nbytes, "\t\t", s12.nbytes) Print("No. of dimensions(.ndin) :\t", s12.ndin, "\t\t", s12.ndim) print("Ttem size (.itensize) :\t", sid.itensize, "\t\t', s12.itensize) print("has NaNs? (.hasnans) :\t", si-hasnans, "\t\t", s12-hasnans) print("*enpty? (empty) \t", siL-enpty, "\t\t", s12.enpty) Output actribute nane object sil object siz Data type. dtype) ines Floats? Shape (.shape) » a G6. No. of bytes (.nbytes) 3 20 no. of dimensions(.ndim) : 1 a ten size (itemize): 8 4 as naNs? (hasnans) ralse True empty? Cenpty) False False 18 ts 15 Accessing a Series Object ands Flere sd Series type objet sit in many ways. You cam access its ect itis o created Series Jo eee paividal elements and slices you have e Shap a os 8 07 28 0 E19) roving Series objet ()5, OPIS: a reese ow. Coie oo ny fa jo 81 jie 1088 ta aa jo ia Udevees ieta2 Figure 1.3 Some sample pandas Series Objects ror al the flowing operations we shall be using te sample objets shown in Fig. 13, 1.5.1. Accessing Individucl Elements from 0 Series Object “To access individual elements ofa Series object, you can give it with its name, ie, as ts index in square brackets along “Series Object nave>[«>ob350 Feb] 28 “Seo a theve object legal ndore are ured to access individual lent of those bjt. ‘As you seein above figure, we have use only valid or legal indexes (ie, which exist in series object) to access an element. Ifthe Series object has duplicate indexes, then giving an index with the Series object will return all the entries with that index, eg, see below Sosa fSssouat'e'T i dpa deve i 2 Seis object, all je a.78 fo soba") ni with he same e258 bas je il ees dtype: floats Rots Fae ag OTT fetype: Hoste Tegtnderes ar 90 12-refer stove hace heer. BUT if you try to give an index which is not a legal index for a Series object, it will give you an error, See the adjacent figure. RR ile “cipython snes 075057438560", Lbne 1, fn asa] (Chapter 1 TIONS PARAS — | 9 1.5.2. Extracting Slices from Series Object Like other sequences, you can extract slice too from a Series object to retrieve subsets, Let us see how you can extract slices from Series objects. Here, you need to understand an important thing about slicing, which is that Slicing takes place postion wise and noc the index wise ina series object. ‘Tounderstand this, let us consider the same Series objects as given in Fig. 13. Internally there isa pposition associated with element ~ first element gets the position as 0, second element gets the pposition as 1 and so on. Irrespective of their indexes, positions always start with O and go on like 1, 2, and so on. (ee Fig. 1.4) objs obis 7 Position index Dota Position Index _Doto Position Index Dota ¢ [Feb] 28 eo Ley a efL>| # 2 ana Hi a a ea 2 Darr : 2 7 2[ nba H ae 3 [ape 4 2 “india loots Rave poston mbes arg rm 0 cnwards a, or freeman, for slomenl and so on Figue 1.4 Postion number associated with each element of Series object When you have toextract slices, then you need to specify slices as start: end step ] like you do for other sequences, but the start and stop signify the positions of elements not the indexes. Consider following examples fpceersircy io Latype: ints? {étype: int32 Al other rules of slices apply, ie, NOTE you can specify steps, you can Teverse the elements, the range of A slie object is created from Series object using a syntax of Mice can be oubide fhe range of AIS acs mt mcs Tube inal Saks poste ae ‘object is also a panda Series type object. Even houghobIr has pesebiats 1) indexes 10,11, 12 but iSSsapiiesy ino Series((], stype: int32) ino * iio 20 is a8 not indox wise ‘gvoes int32 INFORMATICS PRACTICES ~ a 20 Fans ch ti of sas hat tres the mur i = A 8 ae Se y- per ticket as utput seat cpensontegcr 30-78 Op tet oe an 1.6 Operations on Series Object 1 offers flexibility with storage as well as operations _ASeiess a one-dimensional trcture which offers fle nella pe aetna about how you can perform various types of operations on Pandas Series abject 1.6.1 Modifying Elements of Series Object “The data values ofa Series cbect can be easily modified through item assignment, ie, “Sersesobjects [inde ] = ‘Above assignment will change the data value ofthe given index in the Series object. «sertesobject3(start : stop] Above assignment will replace all the values falling in given slice. Consider following screenshots ew data value> ‘floatee ‘Chopter 1 : PYTHON PANDAS — 1 21 1.6.2 Renaming Indexes ‘You can even change or rename indexes of a Series obj it lexes of a Series object by assigning new index aray to its index attribute, i.e., : .index = , ¢.g., see below : sas eoewenarcen is use => pe jo 42.78 Bike ae 1S 3sas i 46.50 ——__ NOTE The sizeof new index array must match with existing index ese not that Seis oles array's size. In other words, you cannot change the size of | Cgmt So you cans that eres Series object by assigning more or less number of indexes objects are wolvemutable but {see below) sireimmutabe obec. File "C:\Prograsbata\Anaconda3\lib\site-packages\pandas\core Ainternals.py", Line 3074, In set_axis (eld_len, neuen) Length alsnatch: Expected axis has § elements, new values have 7 EXAMPLE BED Consider the Seis abject 13 tat stores the contribution ofeach section, as shou Below a 6700 8 5600 c 008 D 5200 Output Write code to modify the amount of section ‘A’ as ele icra ae cree Moe Cee N88 ae ea SOLUTION © 7000 print("Series object after modifying anounts:") print(s13) ed. 22 iit.) Functions: tail() functic 116.3 The heodt } ord till) rach finns arsransoe a 10M retumg ‘The head!) funn ‘The syntax to use these fun Jast rows from @ pandas object nead(["]) i cr .tail(°D) you do not pro respectively of a wie any value form then head) and fa will return first 5 and last: i) Je any v ) e ‘Pandas object. Consider below given 21.85 nas 36.7 2.00 aie 31.85 08 36.79 cs ass ype: Float 6.49 - 51.25 BRAMPLE BYR A series objec erate consistsoferoud 2500 rows of data, Write a program to print the flo details: (First 100 rows of date (i) Last 5 rows of data SOLUTION import pandas as pa : ‘# trdata object”s creation or loading happens here print(tréata-head(100)) print(trdata.tail()) 1.6.4 Vector Operations on Series Objects Vector operations mean tha if you apply a function or expression then it is individually applied ‘on each item ofthe object. Since Series objects are built upon NumPy arrays (ndarrays), they also support vectorized operations, just like ndarrays. Following examples will make it clear to you. Suppose we have a Pandas Series object ob2 A sample Series object, CChopler Is PYTHON PANDAS Then following all are legal operations 0b2+2, 0623, obf0b2**2, ob2>15 ete. In all the above expressions, the given operation will be carried out in vectorized way, ie,, will be applied to each item of the Series objec. Consider following examples z SSeb2 > 1s False False True 16275625, fe True 576.0000 type: bool 3242.5625 fe 21622500, type: Floste type: Floatee Sev each oft epemona igh wpmed one Swan ype coco w cried oton ‘soo ni tam ote Sees ect - Vector Operations Figure 1.5 Vector operations on Series objects 1.6.5 Arithmetic on Series Objects You can perform arithmetic like addition, subtraction, division etc. with two Series objects and it will calculate result on two corresponding items of the two objects given in expression BUT it hhas a caveat - the operation is performed only on the matching indexes, e.¢, if first object has indexes 0,1,2 then it will perform arithmetic only with objects having 0,1,2 indexes ; forall other indexes, it will produce NaN (not a number). ‘Also, if the data items of the two matching indexes are not compatible for the operation, it will return NaN (Not a Number) as the result of those operations. ‘Tounderstand this, consider below given (Fig. 1.6) ive Series objects ob1, ob2, ob3, ob and obS (ob1 and ob3 have matching indexes ; ob2 and ob5 have matching indexes; obs has some indexes matching with obt and ob3) Ldeyee: flostee Figure 1.6 (0) Some sample Series objects with matching and non-matching indexes Bie) a aying ot antec OPAaHONS O° OB yg ook atthe statements Now carefully low): indexes (see bel —— matching [Soseb2 + obs Fosbenn + 03 a 2.755 t ors bas 280 4 33.805 Ass. 449.330 Beas fe 68.885 12 ssut.758 Hdeypes floats 0104) Recently caries SE ey ner whndex0 coh BoD es mt ee oregrnag rs faing ner "bth a rent ven re a 20°. repre ne en Oe linden re ad say yme or all non-matching indexes, then cpa peeagntnr inborn t 8 maar et hasan tek Peale ig {SSbenn + obe Cenuieaihe ven == | eciae so nT oe {ay NOTE Sg ca anti = eee epee result of object arithmetic in another object, which will also be a Series object, ‘Then ob6 will also be a Series object i obt and obs are Pandas Series objects). CChopter 1: PFTHON PANDAS — 25 EXAMPLE HED Nerf sons canes Hand res rea (Soo Commerce and "Huranies) arstored int eves ebjets 11 and 12, Write codeto fin total number of students in clases 1 and 12, steam we, SOLUTION Amport pandas as pd 4 creating Series objects C11 = pd.Series(data = (32, 40, 58], index = ['Science' ,‘Conmerce’, "Humanities" ]) €12 = pd. Series (data = (37, 44, 45], index = ["Science' , “Conmerce’ ,‘Hunanities"}) ‘adding two objects to get total no. of students print("Total no, of students”) print(c1i#e12) # series objects arithmetic Output Total no. of students science 67 commerce 84 mumanities 95 type: ineee EXAMPLE BBY Objectt Population stores the details of population in four msiro cies of India nd Obec2 Anglncome stores the total average income reported in previous year in each ofthese metros. Calculate income per cxpita foreach of these metro cities. pig cle tp. ahr np the le SOLUTION ‘tone ing be Amport pandas as pd LO Population = pd.Series({18927986, 12691836, 4631392, 4328063 ], \ index= [‘Delhi", ‘Mumbai, ‘Kolkata’, ‘Chennai']) AvgIncone = pd.Series([72167810927986, 85087812691836, 4226784631392, 5261784328063 ], \ index= (‘Delhi", ‘Mumbai, ‘Kolkata’, “Chennai']) Semen common mark, Dona pit wie percapttacavgiocome / population te Erin wali per ip em print( "Population in four metro cities") print Population) peint(“Avg. Incoxe in four metro cities") I print Avgrncone) DS Sranthiet pereanite oy) print("Per Capita Incone in four netro cities p print (percapita) [Population in four metro cities toetnt ” haoavone thomtat 12691886 ' tkolkats 4531392 ' The output produced by above file is as shown chennai 4528063 : aires Jaye: ints ' lug. Income in four aetro cities | tooth: 7ate7e1e827006 ‘ See how easy it has become to calculate per Mumbai #5087812691836 ' y teotkata "4226784631392 : capita income if we have huge data stored {KolMat gu267#s639092 : about all the cities and average income ete. _ietypa:intea : ter Capita Incone in four metro cities! Ini ereossezevos ef Ronda 6.7081376096 : [kolkata 9.126381645 t lehennal 12187370006 ' ‘type: Floatée i 26 series Objects stare of Boolean ie 1.6.6 Filtering Entries in Series series ojectusingexresions ta TYPE: Cie, the Youn fiero oma or loin expressions that yield 8 { ebootean expression of Ser388 onject>] cnjectrt (« ov ves ich are based on objets 082 02 and ob3 from Fig. 16) onside folowing eam std rn oS oth semen tt >p>eba{ob2>10] 12.75 obt{ob1>5] 24.00 35.25, 46.50 type: Floats 12.75 se ple nom (Bose api? et a cic sen ttn Tor he to ope vison operator drelly on a Pandas Series object then t works like this check on each individual element of Series objet (as You en you apply this check with the Series object inside ] a that it recurs filtered result containing only the canseein() and above). When you apply a compat -vectorized operation and applies can seein (a) and (c) above) BUT wt per syntax given above then, you will fin ‘Values that return True forthe given Boolean expression (as you EXAMPLE ED hat wi be the oxtput produced by the following program ? inport pandas as pd info pd.Series(data = [31, 41, 54]) int(info) print(info > 48) print(info( info > 48]) SOLUTION oon ws aa i ct type: ints 0 False pees Toe ee ees ae Mec opaion elt type: boot 1a 2 type: ine6s tol ele CChople 1 = PYTHON PANDAS 7 EXAMPLE BID soviescbjer 11 sores the arty cibaion make by each section (ee below) A 6700 8 S600 cs00e > 5260 Write a program to display which sections made a contribution more than 255001- SOLUTION oe eee . ; ecemeetee conebaon > 500 by print ("Contribution > 5508 by :") 8 S600 Frineatt [ia 550]) pence 1.6.7. Sorting Series Volues ‘You can sort the values of a Series object on the basis of values and indexes, Sorting on the Bosis of Values To sorta Series object on the basis of values, you may use sort_values( ) function as per the following index rot argent ‘Series object>.sort_values( [ascending =Trug|False]) ‘The argument ascending is optional and if skipped, it takes the value True by default. Itmeans, the sort_values() arranges the values in a Series object in ascending order by default For example consider the Series object s11, >>> sit A 6700 8 S600 cc se88 D 5200 type: Antes >>> sll. sort_values() >>) stL.sort_values (ascending = False) S280 A 6700 © 5200 8 5690 5200 8 5600 ~ This time the values oe A 6700 cee sored in descending order type: intes \ type: intee To sort values in descending order, you may Values sorted in ascending order vite .sort_values(ascending = False) (default setting) TORI FRAC! oH 28 sering on the Boss of Inderes sort index) Function as per the ‘To sorta Series object on the basis following index series abject>.sort.sf “The argument ascending i optional and of indexes, you may USE pina argent ndex([ascending = TrvelF2ls€]) skipped takes the value Tre by default Foca >>» si.sort_index() es ascending F088) : Spats set_nde(ascenng pene ose mam ‘o —_ c ‘erder af nce aa — BED ‘D 5208 Soe itt type: ita Solved problems 14, 16 are based on sorting a Series object 1.7 Series Objects vs. 1D Data Structures and 1D Numpy Arrays Series Objects vs. 10 Data Structures and 1D Numpy ATAYS works, doesn't this, Now that you have a clear idea about what a Series object is and how it works, rt Joep ns Hw in Sees jt sir rift rm ter D data sites such as ists, dictionaries or 1D NumPy Arvays ? Let us tak about the same in this section 1.7.1 Series Objects vs. Lists Series objects are essentially 1D data structures while lists can be 1D or even 2D (when nested) Following table explores Series objects os ists. Tobie 13 Series objects vs tts Sno[ Series Object ists | | 1. | eis essentially 10 data structure [it can be 1D and even multidimensional with ested ists in it It can take numeric indexes only. 2. | cam have numeric indexes as well as labels, 3 | Reports xpi indeing i, we can It does not support explicit indexing ; only Programmatically choose, provide and change | supports implicit indexing whereby the indexes Indews in terms of numbers or labels | are implicitly given 0 onwards in forward indexing and ~1 onwards in backward indexing. 4. | Indexes can be duplicate Indexes cannot be duplicate 5. | Homagensos elements: Sri bes sore ke | Heterogeneous elements: Li ments of same data type (values may be different | element vo entdaa ype Bath dnaypei tesa acerca | nme ferent dts ype 1.7.2 Series Objects vs. Dicionories Sees jc 1D da sue ht an alow hs ose doug mea san tons CChopter 1 PYTHON PANDAS — 1 29 Table 1.4 Series Objects vs. Dictionaries [sino Series Object 1. | Itis essentially a 1D data structure Tecan be 1D and even multidimensional with nested ictionaries init a values. 2 | Tecan be thought of as similar to detonais| it can be thought of similar to series object such 2 dlctionaries store values against keys, that its keys canbe considered equivalent tat of ries stores values against indexeslabels | indexes/labels of Series object and tlues equivalent to the elements stored in Series objet. 3._| Its indeves can be numbers or labels only. _| Its keys can only be of immutable 1.7.3 Difference between NumPy Arrays and Series Objects ‘The major differences in ndarrays and Series objects are listed below {In case of ndarrays, you can perform vectorized operations only ifthe shapes of two ndarrays match, otherwise it retums an error. But with Series objects, in case of vectorized operations, the data of two Series objects is aligned as per matching indexes and operation is performed on them and for non-matching indexes, NaN is retumed, ‘You have already read about this functionality in section 14.5, point 4. (id) In ndarrays, the indexes are always numeric starting from 0 onwards, BUT Series, objects can have any type of indexes, including numbers (not necessarily starting from 0), letters, labels, strings etc. Son Herray([a, 2, 3, 41) bo>a3 lareay((t, 42, 43, 14, 18, recent call Last) File "cipython-input-90-£395242637429", Live 1, in ‘Tlie 15. series object vs 10 NumPy Array SNe. Series Object 10 Ndoreay Ttstores homogeneous elements (ame datatype) 1. | itsores homogeneous element ame datatype 2. | supports explicit indexing ic, we can prog: | It does not support explicit indesing ; only TIERutulychowe provide and change indees | supports implict indexing whereby the indexes in erme of umes or abel | sre ienpliclygiven 0 onwards 3, | It supports indexes of numeric as well of string | It supports indexes of only numeric types ype tt ean perform vecorized operations on two | I can pesform vectored epeatons on to See rere even i ther shapes ae dierent | ndartays nl thei shapes mach By tang Non for tonematching indevesfabels. Intakes more memory compared to numpy aeray._| Tt takes lesser memory compared to 8 Series objec Reindexing Sometimes you nee rendexing forbs Ps series object wih his, the same staves and their index inthe rind. Se below to crete asia object pose a ett SX ject» reindex “ index atuibute, 9 >>> vowels = pa.Sertes( [2,5,6,3,8] ,\ index= ['a', 'e', “i', 0", "u')) >>> vowels i type: intos Dropping Enties from on Axis sing drop) spe tis sya For example, se below bsea ut with a diferent order of (sequence with new indexes wil be stored in the ne” «Series object» drop (cindex to be renoved>) 30 aap. {b 12.75 } ~ buss eat pce) | ee ng lees dpe! fant a ee A same indexes. You can use order of indexes>) jects perthe defined order of pss5c052 ‘Another way of changing the indexes ofa series object is just by assigning a new list of indexes to uy yoo newind =['A", >>> vowels. index = newind en —— re Lae a3 oa fancied Sometimes, you donot need a data valu ta particular index. You can remove that etry from series object CChopler 1 : PYTHON PANDAS ~ | 31 DaTAFRAME © vautrame is 2 two Amensiona beled ary fie Pandas data structure tht stotes on ordered collection Columns thet can store data of erent types. ” 1.8 Dataframe Data Structure ‘A DataFrame is another Pandas structure, which stores data in two-dimensional way. It is actually a two-dimensional (tabular ‘and spreadsheet like) labelled array, which is actually an ordered collection of columns where columns ‘may store different types of data, eg., numeric or string or ‘floating point or Boolean type etc. Since DataFrame data structure is ike a two-dimensional array, let us first understand what a two-dimensional array is like. ‘A two-dimensional array is an array in which each element is itself an array. For instance, an array A (m] [n] is an M by N table with M rows and N columns containing MN elements ‘The number of elements in a 2-D os ae. noo array can be determined by | multiplying number of rows with A ‘number of columns. For example, i the number of elements in an array c AI7] [9] is calculated as7x9=63. we ANIN-11 Characteristics Major characteristics of a DataFrame data structure can be listed as: Ontarians ea (axis = 0) anda column inder (axis =D. (Fig. 170)] ran (i) Conceptually it is like a spreadsheet where each value is fole-a0) identifiable with the combination of row index and column index. Bas ‘The row index is known as index in general and the column ae index is called the columm-name. [Fig. 1.7()] sean aR Cote (iil) The indexes can be of numbers or leters or strings. (Fig. 17(@)] ape DEF Soro BO G0 o0 80 60 90 09 Ho 9 00 0 00 aa fo ap a0 uo an Ga a 0 BD 08 00 09 Data Vales (alvaues than NaN ae ata als) ah _ Pee Li rena [i “ae “ts ‘canbe +2 so] isin] { a “aa _Mising seuary > Values Figue 1.7 (a) Some sample Datarame objects (6) Anatomy ofa DataFame object. Qe INFORMATICS PRACTICES. a 7 ail data There is no condition oF ig. 1.70) ° cae value-mutable. resi ocanemeee ) You can a om vy or delete rowsicolumis oe (oy You can add of dee ie ne ares eh out DataFrames —— estan No at 05 a a th DARE 8 ae = ‘nad in following gure [8-170 ‘ fr Frame ind Displaying @ Data Sreatng 3 se by pasing data in wordiensiona FOTMA Like ag can be creat juke make sure to import Pandas and Nant teatements on your code or Python console 19 pba Aaa te sr a . yc rtm tt) man svhere the 2D data structure passed tit contains the data values. Ifyou have imported Pandas asp, then you can create a DataFrame as pd.DataFramet ) aso. ands Both D and Fae capital lt Please note that when you pas no argument fo DataFrame() function, Pandas will create an ‘empty dataframe, ¢g, >>> FL= pd.Datarane( ) DF wilde nated x empty dtaame it pind and cola. ‘You can create a DataFrame object by passing data in many different ways, such as () Two-dimensional dictionaties ie, dictionaries having lists or dictionaries or ndarrays or Series objects ete (2 Two-dimensional ndarays (NumPy array) NOTE (i) Series type abject (io) Another DataFrame object Pandas Dataramel) with omen creates an emoty datatrame Paving 2 ‘ows and no columns. 1. Creating « Dotaframe Object from 0 2-0 Dictionary ‘A two dimensional dictionary isa dictionary havin ; ictionary having tems as (key : value) where value parts ae structure of any type: another dictionary, an ndarray, a Series object alist et. But hee the "ale parts of al the keys should have similar structure and equal lengths. Following sections explain these with the help of many examples 7. Mark, “Gurjyot", “Janaat], CChaplee 1: PYTHON PANDAS = 1 >>> dlctd « ("Students : [‘Ruchia', “neha Marks” : (78.5, 3.75, 74, 68.5, 69) “Sport! : ["Cricket", "Badminton", ‘Football eae ‘ton’, ‘Football’, ‘Athletics’, ‘Kabaddi") } (harks': (79.3) 93.75, 74, 86.5, 9), "Sore {'rleke™ Reine’, ‘ental Aheties, Nab") CituchikaSyena", “Mark, ‘Gurjyot", "Jonaek"}) >> tft «pd, Datarrane(aicta) >oodtf Marks Sport Students @ 79.50 Cricket Ruchika 103.75 Badminton Neha 2 74.00 Football ark 3 Gurjyot 2.50 Athletics (le 5.00 Kabaddi Taneal into msl nnd ‘As you can ee thatthe created Datarame object hast index assigned automaticaly (onward) just ast happens with Series objects and the columns are placed in sorted order You can specify your own indexes too by specifying a sequence by the name index in the DataFrame( ) function, ¢., >» dt2= pd.DataFrane(dict1, index [‘I', "I", "TIT, "IV", ‘V']) doo ate Marks Sport Students Spy the now nds by ig inde sequence T 79.50 Cricket —_—Ruchika Tr 83.75 Badminton Neha TIT 74.00 Football Mark IV 88.50 Athletics Gurjyot v_ 89.00 Kabaddi Jamaal Sethe inde amt pete inde segue If you specify the index sequence, Python will take indexes from the index sequence, BUT the number of indexes given in the index sequence MUST MATCH the length of dictionary’s values, otherwise Python will give error (se below) pa DataFrane(dictl, index =[ 1 jeaceback (eoet recent call Tast File “C:\ProgranData\Anacondad\1ib\site-packages\pandas\corelinternals.py", line 1608, i> construction er patted, inplied)) Isluckror: Shape of passed values is (3, 5), indices imply (, 6) - “7 34 anes Tt : contr age Cy OY srt Wo 8) ou eee ». aoe dictions Contr s, ne tra on dn ew mS , on “4 ms 1 560¢ SOLUTION ire. cs *, :3 re pesection 8B Cy oe Ceperi (6708, 60 sam rn =p atatrane(@ice) rint(oF ne having values os dictionary objects ction ftarome from 0 20 dicionary ma eh (6) Creating o 3 se ae . /A2D dictionary can have ¥é cn po) “object using such 2D dictionary ‘object, Serena are POPE. cas ale’ jy oe re (name's "onstage" As "SE a : » en ee ge 250: Fea)? Snarketing’: (ane: Neha» erste teat aries are exacy the same (name, age’ and ‘Sex. Here, the keys of inner dict ‘You ean create a DataFrame by passing this dictionary >>» pd.Datafrane(people) ma Marketing Sales aos | age BOM NE? rane Neha Rohit sex Fenale Male ‘Ths time the keys of inner dictionaries make the indexes and the keys of outer dictionary’ the columns. ‘Cea anddsley a DataFrare from «2D dictionary, Sales, which stores the quarter-wise ser dictionary fr too yes, as sam below sales {"yet": ("Qte1' : 34500, “Qtr2" : 56008, “@tr3" : 47000, "Qtra’ : 49000)» "yea! ("Obra : 44900, "Qtr2" : 46169, ‘@tr3" : 57@0@, ‘Qtra’ : 59008} } SOLUTION {port pandas as pd Sales =(‘yet" :{ “Qtr”: 34500, “Qtra": S600, “@tra" : 47000, "Gtr" : 49800 )> ‘yra' + (Qtet" + 4900, "Qtr": 46100, Qtr3* : 57800, ‘Qtra* oe 1 "Qtr3" : 57008, “Qtna* : $9000} } rint(dfsaes) Output ger atre aer3 aera a i | (Chapter | PYTHON PANDAS — 1 35 In above example notice one thin ‘one thing as the Keys of all inne dictionaries (yr, the same in number and names the da ; marae eo the dataframe object dfsles also has the same number of Now, had there been a situation where inner dctionai inner dictionaries had non-ma in case Python would have done following things mca vinieen () There would have been total number of indexes equal o sum of uni all the inner dictionaries ee eae (i Fora key that has no matching keys in other inner dictionaries , value NaN would be used to depict the missing values. >>> Collect = { "yr" :1508, ‘yr2":2500) >>> Collect2= {"yri':2208, ‘Nil: @) >> Collect = { ‘T': Collect, ‘11' :Collect2) Staommamm aan eyes estore ‘2D dca 9» df = pd.Datafrane(Collect) : | teres | >» af 7 1 m ye 0 ‘ie rasan yel 1500.0 2200.0 fe ato ern mation neo yr2 2500.0 Wal aa EXAMPLE BERN Corfily read te folowing code import pandas as pd yrh={'Qte" : 44908, ‘Qtr2" yr2={ "A" : $4500, "B" : 51002, aisalesi = (1: yr, 2: yr2} «F3 « pd. DataFrase (diSalest) 46100, '@3" : 57208, 'QU" : $9080} ‘tra’ : 57000) (i) List the index labels ofthe DataFrame df (i) List the column names of DataFrame dB. SOLUTION () The index labels of df will include : A, B, Q3, Qs, Qer1, Qu2, Qied ‘The total number of indexes is equal to total unique inner keys, it, 7. (i) The column names of df8 will be : 1, 2 NOTE ‘Total number findeses in OstaFrame object re equal to total unique lner kas of the 20 tionary passed tot and if woud use NaN values to fil sing data Le, where the forresponding values for 3 key ar missing in any nner dictionary. ANTICS PF 36 ements (ist of dictionay, ‘Sajet such that the inner gig aries Oe a DataFrar will make TOWS. For exanpye anys values 197.5) rks: 38) ie 98.5) perc, eopperb]) —— Ts iso tong o> import pee sas, "Nate" >>> topper { *ROLLAO" 4 > et oe >>> topper = { *pollno" :387+ mane" re gale 223 eppers = Leper r=: ee) >>> topof = pa 29> top 2 307 preet 98.5 [SSesouies have made te a 7 Joma ett acutanscapi stn a aye Soorar can speay your own index bes though the index argument cor owfindex labels (see below). ec") ) >>» top = pd. DataFrane (top elim to tts Paya 97.5 Rishi 98.0 Preet 98.5 Paula 98.0 -s, index ["Sec A", "See B™, "Sec C” pers, bsg EXAMPLE BEY Writes rogram orate fame from ast containing dctnares of the sales of our onal fies, Zane nas sald ee ow labels SouuTion Auporsperds 2p Output rarget Target” ste, ‘sls ste) zoned 56000 {"Target' 78000, ‘Sales’ :689¢0) zones "Target':7500@, "Sales" :78600) zonec | 720 zoned 60000 zoned « "Target :6000, "Sates :61600) sales [zoneA, zone, zoneC,zoneo) SaueOF Pe taFranesales index «(zneh, ‘zon "zones", print( saleot "zone, “zoneD"]) ay > CChoner 1: PYTHON PANDAS — | Passing. a 2D list, ving list wil also creat a dataframe wher each inners assing a 2D list, ie, alist having lists w form the row of the dataframe, sease ie > std (25, 45, 60], [36, 67, 89), (88, 90, 56] ] e12 2 25 45 60 138 67 89 2 88 99 56 ith 2 lst, the cet dattame by eft name the columns and rested cols nd indexes 20,32. unless you specify the index and calunns agumens Yor an aneagaot 2 namevabds with index argument and column abel wih ween sg ee DaaFamet a istatedin folowing two example EXAMPLE BEB) Write « program to create dataframe from a 2D ist. Specify mam index lbs SOLUTION mport pandas as pd List2=[ [ 25, 45, 60], [34, 67, 89], (88, 90, 56) } df2 = pd.DataFrame(1ist2, index = ("rowi', "row2", ‘row3"] ) print (d¢2) Output o 1 2 rowl 254560 row? 346789 rowS 88 9056 EXAMPLE BM We «progr cates dlfame fom a Tat otning Tach nang Tp od pes asa php tasaor’ amare sownon Apart pandas as Target «(e066 7060, 75000, 6000] Sales = (58002, 68000, 78000, 61800) doneSales « [Tacget, sale] ‘zsaleOF = pd.DataFrane(ZoneSales, columns = ["ZoneA' , "Zones" ,“ZoneC" ,“ZoneD"], Index = "Target", “Sales*]) print( zsaleD# ) Output Zoned zoneB_— zone zoneo Target $6000 7000075000 60000 Sales $8000 6800078000 61000 er Of 38 7.0 ndorro¥ ° having shape as (, >» nare2 = np.array(({[11.5, 21.2, 33.8], [48, 50, 60], [212.3, 301.5, 405.2]]) op ctts = pd.0ataFrane(nar?, colums= |'First', ‘Second’, "Third! ], index = Eek ees Dee [mone us 22 3.8) Seeworproegey 40.8 50.8 60.8 ‘sequence felines 212.3 301.5 405.2 ~ Inthe above example, ; 2mples the ndarays tha are pase ineach ofthe rows. If however ea Passed to DataFrame have same number ofelemet® “rows of ndarrays differ in length, ie, if number of ele CChopler 1 = PYTHON PANDAS ~ 1 39 Leach ro der, hen Python wl eae ut omc Crete fst sng clu in te date : 1pe ofthe oan wl be considered ben elo ae and pp nares =mp-array(((303.5, 201.2], (480, 52,600, 700), (232.3, 361.5, 485.21] ) array({1ist(¥@a.5, 201.2]), 1ist({400, 50, 609, 708]), lea. 3.5.20), type = object) Ee] >>> dt a = pd. DataFrane(narr3) 3 valves doo ata e e (101.5, 201.2] a [409, 50, 600, 700) 2 [212.3, 301.5, 405.2] EXAMPLE BEM Wit wil be the output of following code ? Amportt pandas as pd Ampor't nunpy as np p-array({ (11, 12}, (23, 14}, [15, 16] }, np.int32) 1d.DataFrame(arri) print(dt#2) SOLUTION 01 one 123 4 215 16 EXAMPLE BED Write « program to create a DataFramejrom a 2D array as shown below pices ao | us | 12 [430] 140 | 200 Output [us [ae [a oa 2 ~ 0 101 3 124 1 130 140 200 2 us 26 27 SOLUTION import pandas as pd import numpy as np are2=np.array({ (101, 113, 124), (138, 148, 260 }, (115, 216, 217] ]) dt 3 = pd.DataFrame(arr2) print(dt3) a INFORMATICS POSES = 1 0s Series Objects Ina2D dictionary, you wary aS argUMERt to A 1D Dictionary with Values multiple Series objects: ‘4. Creating o DataFrome ‘Object from o 2 wees nose ae th ors mnt cena ge ae aoweng Ses HS 1 eo eS tab ere ema ten Paseg ST EOE >>> dt = pd.DataFrane( school) Now the DataFrame dtf, created above, will be like poo ate ‘amount people © 166000 20 246000 36 2563000 “4 ZRAMPLE PJ Conte tow sores eects staff and salaries that toe the number of people in various ofc tranches and salaries distributed in thes ranches, respectively Write a program to creat anther Series objet at stores average slry pe ranch and then rest @ DataFrame object fr thse Sve objets, SOLUTION import pandas as pd Snport numpy 25 staffs pd.Series([20, 36, 461) | salaries = pd.Series([279000, 396800, 563000]) ‘it will create avg series object eee avg = salaries / staff orgs {‘people' staff, ‘Anount salaries, ‘Average’ :avg } aes = pd.DataFrane(org) print (ats) Output ‘Amount average people 0 279000 13950.000000 20 4 396800 11022.222222 36 2 563000 12795.454545 44 5. Creating « DataFrame Object from another DaiaFrame Object ‘You can pass an existing DataFrame object to DataFrame( ) and it will create another dataframe object having similar data, Consider the code shown below that passes to DataFrame() another datarame object Given a Daaframe object tft >ooatfa @ 1 2 3 6 CChopter |: PYTHON PANDAS — 1 a You can create an identical data i te an identical dataframe by passing its name (dtfl) to DataFrame( ) ‘fnew = pd. ataFrane(at1) You an ply he ew Daane tet o confi ht tii ene e123 Lace When you cesta datframe using another datattame you should ad an ade ame, ou should ad an adiiona argument todstafame() method as copy= Tr, which wl ensure hat he newly ted dae oe True copy ofthe pased dataframe and not anew label ely. The sbove give ethod cee only ane abel refering to the sme datafame, To rates datarame set slnked sew copy we copy) ofthe pased daftrame sete adlonal argument cop + me Refer tothe inf losing the ston Ming snl cl, page 560 wns his Please note, there are some other methods too for creating dataframe objects but covering those will be beyond the scope of the book — __ NOTE Displaying @ DatoFrame Displaying a dataframe is the same as the way you display other variables and objects, i.e, on the console prompt, either type its name or give print( ) command with the dalaframe object as DataFrames can ao be crated rom vext/S les, whch we shall cove in Chapter 4 — Data Tansler [between Oataframes and Flt lex/MySQL cere ai When you create a DataFrame object all information relate to it (suchas, its siz its datatype ete.) i available ough its attributes, You can use these attibutes in the follwing format to get information about the dataframe object. 7 .cattribute nane> on cae Some common attributes of DatFrame obec ar listed in table below. The code examples forthe usage ofthese atsibutes follow Table 1.6 Table 16 common otitutes of Datafrome Objects Attribute Index | The index (row labels) of the DataFrame coluans | The column labels ofthe DataFrame. axes Return a list representing both the aves (axis 0 ie, index and axis 1, ic, columns) of the DataFrame. dtypes | Return the dtypes of data in the DataFrame. size Retum an int representing the number of elements in this object. shape | Return a tuple representing the dimensionality ofthe DataFrame. values | Retuen a Numpy representation of the Dataframe fenpty | Indicator whether DataFrame is empty. ndin Return an int representing the number of axesarray dimensions, 1 “Transpose index and columns. 42 putes, counting, transpose et, it display various! ing Dataframe (dfn) ‘Weare using the following! >» dfn Marketing Sales : ae x ame Neha Ronit sex Female fle ject ies of o DotoFrame Obi sg name as depic fo) Retroving Various Poperis ets ame with he daarae’s name 38 depicted in “To view the value of an atribute, just following examples boo dfn. index por dfn. size : 6 +), types ‘obfect vrgon( ("ages ‘rane’, “sex']» atyper ‘obsect’) >>> dfn. shape >> dfncolums vapyect" aD Trae aketing’, "Sales, types 'ob3ect') dp dfn.ndin >> dfnaxes 2 Treden({ rage, “name, “Sex'], type 'object") i >> éfn.empty Index({"Rarketing’,‘Sales"], dtype = ‘ebsect')} False vo» dfn. ctypes Marketing object sales object type: object (b) Getting Number of Rows in a DataFrome The lenleDF object) will return the number of rows in a dataframe e3 >>> len(dfn) 3 {c) Getting Count of non-NA Values in DotaFrame ike Series, you can use count) with dataframe too to get the count of non-NaN or non-Ni Values, but count) with dataframe is litle elaborate () Ifyou do not pass any argument or pass 0 (default is 0 only), then it returns count non-NA values for each column, ¢:. >»> dfn. count) >»> éfn.count(axis « " index) Marketing 3 Marketing 3 sales 3 sales 3 type: inte type: inte4 ‘You may also pass argument axis = dex’ to get the same result as above. (ji) If you pass argument as 1, then it retums count of non-NA values for each row, $4 >»>dfn.count (2) >»> dfn count(axis = 'coluans" age 2 a age 2 rane 2 Cain counaxis= columns’) name 2 sex 2 | produce the same resut sex 2 dtype: inte type: intea You may also pass argument axis = ‘columns’ to get the same result as above. CChopler : PYTHON PANDAS — | 43 {d) Transposing o DataFrame You can transpose a dataframe by swapping i You cn anapose ime by swapping its indexes and columns by using attribute T as sooatnct Compare te wanspoataiaT) wa ‘original dfn Marketing 25 Neha Female pain, Sales 24° Rohit Male AKCE Sales ae 25 ane Neha Rohit Sex _Fenale_yale SeAMPLE I Wes pogrom se «Dalam ods wig gy nd wn ofS PIT Daan nf nepee soLuTiON Aeport pandas as pd fr eesting the DtaFrone Src pdsmarren( (tah 42, 75, 661, ‘Wane’ :['Arnav', "Charles", ‘Guru"}, *age’ :[25, 22, 35))) print( ‘Original Datafrane" ) print (df) print Teanspose: print(4f.1) Output original oataframe ‘Age Name weight 0 15 amav 42 1 22 charles 75 —, 235 Guru 66 ne Transpose: You can also use shape{0] to see the ° ae ‘umber of rows and shape] for getting age a 2 35 ‘umber of columns, i.e, name arma charles Guru af. shape[o) weight 42 7S 66 4.shape[2] (e) Numpy Representation of DotaFrame ‘You can represent the values of a dataframe object in numpy way using values attribute >>> dfn. values arvay({{'25", '24"), [iweha’, "Rohit", [‘Fenale', ‘Male'}], dtype = object) 44 evs. Series and 20 Numpy AFraY rapaaame sec si fom Ses oe and how it works letus discuss 22D NumPy Arrays. 11. Datafram [Now that you have dear ides how a Dataframe objec is sila lata structure and Series is Series is a 1D 4 table as well as size-mutable. 1.11.1 Dotofrome vs. Series Dataframe as such is a 2D ‘value-mutable only while @ Table 17 otapome vs. eres | Series Objet | ‘SNe. Datarame obec This 1D data structure, Its value- mutable only Itstores homogeneous elements (same datatype). data structure while dataframne is value-mut 1, | feis 2 20 data structure 2, | This value mutable as wel as size-mutable 3, | ttean store heterogeneous elements ferent datatypes) 4._| Each column has a heading. Its single colurnn does not have a heading inividua cola ofa tame canbe considered equvlnto ht fei bie A aan cae como amdar oases bet wih ite diference that Tow also has size-mutability unlike Series objects.” 1.11.2. Dotaframe vs. 2D Ndarray ‘A Dataame he 20 nda two denna ata suc, However iti ferent frm 2D ndarays Lats ow Toble 1.8. ootpame vs 20 Ndarays SN. Dataframe Object 1. | tis a 20 data structure 20 Ndarrays Iti also a 2D data structure, 2 | Itean store heterogeneous elements (diferent | It sa table of homogenous elements datatypes) (usually number) all ofthe same type. 3. | Itcan have indexes as wel as labels fo rows and columns, tis indexed by a tuple of positive integers for both 18 consumes more memory than an equic ‘rae It consumes lesser memory compared to equivalent datsfame, 5 | Dataeames are expandable as you cana Pe en at Ya Num aay ae nt expandable. you add ne emer slement then anew areay wll be created Even Wilh PPend|) function anew array is created an fle. 2. rie sett ih etn (Oke enn te leche st2 Mh alton one dts, mya Sp adorns 4 CChoplor | «PYTHON PANDAS ~ 1 45 1.12. Selecting or Accessing Data From a DataFrame object, you can extract or requirement. Let us see how. For all the examples inthis section, in coming line, we are using the following DataFrame a select desired rows and columns as per your DataFrame : dtf5 Population Hospitals schools Delhi 10927986 as 716 Mumbai 12691836 208 3508 Kolkata 4631392 a9 m6 Chennai 4328063, 357 7617 1.12.1 Selecting/Accessing a Column Selecting a column is easy, just use the following syntax. [ ] «————— Using sre och <————— in do nan Now, consider the following example accessing columns Population, Schools from dataframe at. >>> dt5{ Population’) >>> dt5[ 'Schools"] eihi 110927986 Delhi 718 Mumbai 12601836, Mumbai 8508, Kolkata 4631392 Kolkata 7226, Chennai 4328063, Chennai 7617 Nane: Population, dtype: intéa Nave : Schools, dtype : int6s In the dot notation, make sure not to put any quotation marks around the column name, For example, >>> def. Population beth 10927986 SS mba 12691836 Kolkata 4631382 Chennai 4328063 Name: Population, dtype: intéa 1.12.2 Selecting/Accessing Multiple Columns ‘To select multiple columns, you can givea list having multiple column names inside the square brackets with dataframe object, ie. as follows a , , , 46 reaches ae For example, . +, Wospitals’]] poo dtfs[ ['Schools'» ae ‘schools Hospitals oan oe mumbai 8508 bi Kolkata ‘7226 we 157 chennai 7617 in the order of column names erica above result £00. (see below). Compare it with yoo atfs[ ['Hospitals’, ‘Schools'] ] Hospitals Schools ethi 189 7316 Mumbai 288 8508 Kolkata 149 7226 767 chennai 157 ‘Given a DataF ame namely aid ti EXAMPLE Toys Books Uniform Shoes ‘Ando 7916-6189 6108810 Odisha 508 8208 508 «6798 mp. 7226 «614961961 up. 76176157 457,847 Write a program to display the aid for (Books and Uniform only (i) Shoes only SOLUTION import pandas 25 pd ‘oataFrane aid created or loaded print(“Aid for books and unifora:") print(atd[[ "Books", “Uniforn') ]) print ("Aid for shoes:*) print(aid.shoes) 1.12.3 Selecting/Accessing a Subset from o DotaFram To access row(s) and/or a combi nation of rows “cecil oc <@ataFraneObject>.loe [ , : given in the list inside square brackey Torts the aid by NGOS for different states Output ‘kid for books and uni for Books Uniform andra 6189 610 odisha 8208 508 mp, 6149 GIL up. 6157 457 Aid for shoes: andhra 8810 odisha 6798, MP. 9611 up, 6457 Name: shoes, dtype: int6# fe using Row/Column Names ‘you can use following syntax !0 } ‘The above syntax us see some examples PYTHON PANDAS - | 47 '@ general syntax through which you can single/multiple rows /eolumns. Let © To access a row, just give the row name/label as this Make sure not to miss the COLON AFTER COMMA,” 95 [tow Bbsb a >>> dtf5.loc{ ‘Delhi’, 2} y sy >>> a5 2oe{"Chennat| Population 19827985 a Peat oa couation 30003 seeds mie foe 7a ane: Deh fone! Chena © To access multiple rows, use : loc [ :, :]. Make sure rot to miss the COLON AFTER COMMA. Al >»> dtf5.loc[ "Mumbai : "Kolkata" Population Hospitals Schools Mumbai 12691836 208 3508 Bicep eS Kolkata 4631392 143 7226 ‘sd clon ater comma | Please note that when you specify :, the Python will return all ows, falling between start row and end row, along with start row and endrow. (ee below) >>> dt5.loc["Mumbai :‘Chennai", :) Population Hospitals Schools Mumbai «12691836288 8508 teow Kolkata 4631392149 726 se om a cakane = =e s he ‘© To access selective columns, use EDF object>loc [ : , : }, Make sure not to miss the COLON BEFORE COMMA. Like rows, all columns falling between start and end columns, will also be listed >o> dtf5.loc[ ‘Population’ :*Schools’] Population Hospitals Schools Delhi 19927985 189 7916 mumbai «12692836208 « 3508 Kolkata 4631392149 7226 Chennai 4328063157 7617 >>> dtf5.loc[ +, ‘Population’ : Hospitals") Population Hospitals Delhi 19927986189 Mumbai «12691836208 Kolkata «4631392, 149 Chennai «4328063157 - INFORMATICS PRACTICES _ Cha PON NON fos use eee ied n> : cendcolunn>]. © Tose mg avn cnr, ear cmunbos' , ‘Population’ : Hospitals] poadt 5. Joc{ Delhi EXAMPLE BRE Consider a dataframe af es shown below population Hospitals “ang args of obrins - ____ Sample Dataframe df (Reference 1, pelhi 0927988189 “fom aange fos ‘an pe Sales ies : a eaaeanaEEEE oe ie 8 ge Saas RR ae iar au baron ay a at teres he i by NGOs or diferent Sites Ss 2 Eee roys socks Uniform Shoes . Ee = andrea yone=—«189.— «18 sexe crv galt eee oo caista 5088708508 6738 eed ae me 69611 sett See eee eat up, 76176187457 as7 eS eet Ee We ton od te ling: (Dipl rus 2 fo 4 hin) Ci) Frm rue 24 ht cs die cles, he Typ on ol aft (i) Ports 2 0 th ice, dp fer atin Ge) Dipl Sls Chana and “Oner Df an 5 ny. SOLUTION (maf 25) Write «progam to display te id for states ‘Andra’ end ‘Ovi’ for Books and Uniform ony SOLUTION core import pandas as pd i Datafrane aid created or loaded print( aid.loc{‘Andhra’ :‘Odisha',, "Books" :‘Uniform*]) andnra 6189 odisha 8208 ‘You may also specify distinct row id and column names as lists with lo, €8 ‘id.loc{[ "Andhra", “U.P."], ['Toys", "Shoe"]] will list data only from rows with row id ‘Andhra’ and ‘U.P.’ and from columns “Toys ‘Shoes’ only. You will see such a statement as part of next example. foosart 257 ten type Sales Channel Onder Date Order 10 Total Revenge Total Cost Total Profit ters Cae opting arnt “tsar ster sy oun.) akin ‘racke—«>> @F.10c [ 2:4, ["TtemType", ‘Total Profit] pea. loc [ 2:4» ["Iten Type, “Total Profit") } “OF object>.iloe[ start row index> : , : >> at #5.doc{@:2, 1:2] Hospitals a Dethi 139 Mumbai 208 ‘when ven as start: end. an 762 offline sas ‘Recall hat when you use Hoc, then >> df-Aloc[2:5,0:4] (iv) >>> df toe{(4, 5], ["Sales Channel, "Orders" >>> dts. loc[0:2, 1:3] NOTE a (0,5) t : By hospitals schools San tetas) [Sidhaadinah Causto, S| bane as 76 ahi tn sa el on eee ental |e ae eer tee| Mumbai «2888508 label are included when given I eeaaee) te coc Grencezic'sra/ youu eam 1a omine 8 \ start end, but with og ike i 5 personel Core Online 3/13/2017 12 offtine 60 1 ‘end index/position is exclude a ‘Smacks Online 2/25/2017 Is ‘online oor i INFORMATICS PRACTICES _ yy 50 s ing Individual Value 1.125 Selectina/Accesing ndvitNS Ts dataframe, you can use any ofthe following “To select/acess an individual data v4 ‘methods (@ Bither give name of tow oF oF object «column? numeric index in square brackets with, ie, as this [crow nare or row nuneric index>] Consider the following examples nninan ets. Population( ‘DeIht"} “nn oebeteaten oo at ore Clie conjure edt >o> dt¢s.Population( 1) 12681836 {i You can use ator iat atibutes with DF object as shown below se To 14 > at{ roy labels, col Label ‘Access a single value for a tow/ |) “oFobject>.at{eron label», col label>] tere ‘Access a single value for a row! column pair by integer postion , col Sndex n0.>,] Consider examples given below Youn gv ane win 9 82 803 >>> dt#5.at[ "Chennai, 'Schools") ure ow nde and nome ued lar or nih iat atte pop dtf5.iat(3, 2] ‘sess a 7617 NOTE 1.12.6 Selecting Datafrome Rows/Columns based on Boolean Conditions ‘Sometimes you need to select rows/columns from a dataframe based an a condition, just the way you filtered entries in series ‘objets in section 1.66. When you compare a dataframe with a value then Pandas wil execute that comparison condition for ‘ach element of the dataftame and give you True or False accordingly for each element, eg, ‘The a and iat attributes are used to. acces. single values. (saa values) at specif leaton in 2 atafame while at uses row and column libes, iat uses ine postion to acess the vale. oo fea >> det > 0 a « True True Tre ‘ea, conpared wach tteant iates au ‘Chops 1: PTHON PANDAS — 1 51 ‘You can apply condition to individual columns or a range of values to0 as shown below poo ates People Amount average @ 28 279008 13950. eaee00 136 39680831622. 222222 2 44 — 56300032795. asesas 3 34 49680014611. 754706 4 49° 763000 © 18573.428571 >>> dtf5[ ‘average"] > 14000 fete) se isn pt came False (cle Aap) othe date ifs e a 2 False 4 1 app te gb coon toch sme fe 4 True ‘olor aon oe te eth True alvin the cola ae Nane: Average, dtype: bool Duteppying te conden ike hae ave given youjust theres Teor Fe To ett sie es nuame a puts ane eae ec eereeeee cron idee spare bee sete asec eee ee erereetedeeaen| Now cle expe low ht py hued shown ee bt tin, i mboet ofa Go tat eutes ee acim een ee 2 ae seme sett r6aes a il Similarly, check the following statement spat arn") > 08 ete me eee the comin is end ie gts Internally Pandas checks the condition for each row and retums True or False. These truth values (True/False) act as an index for the rows and the rows with True index are returned, similar to Boolean Indexing covered later. The Boolean selection discussed above, works similar to Boolean indexing but is different from that. Their difference will become clear to you when you go through the topic Boolean indexing later in the chapter. NS gp 32 ed simple NOTE we have dso , ease rementer al Mors. Bolen ston 9S a e ndosing = condition oy conditions for extras yond the scope of the book: Gataame’s subsets nade gl advanced conditions # PSM auctions canbe refered 19 sare brackets next tg However, Arend B— Bolen datarame name viel the read more about it values that match the i yu want to read mer Pt esubees from a Series object ch the ee Inthe same way, you an &* " ‘condition. ‘posed on a condition. Consider following 0 mpl too, a.com PRAMPLE BN Fron series Ser of reas of sates in kn, find out the areas that ar ens tal stores ens of ites Li ries Ser tha 50000 bn SOLUTION ee puta a08 Snr, 90, 5, 782, UST, 788, 257, TE, 1637632, 25723, 2367, 11789, 345, 256517]) print(Sera{Sert > 50000]) EXAMPLE BY Gen «Sri ober. White program to store he squares of he Sve wales in objet Display 6 aus which ae > 1. SOLUTION {import pandas as pd Series cbjact s5 created or loaded print( "Series object s5 :°) print(s5) 562552 4 s6.created print("Values in s625 :*) print(s6[ s6>25]) 1.13. Adding/Moditying Rows'/Columns’ Values in DataFrames You can assign or modify data in a dataframe in the same way as you do with other objects. All you need to do isto specify the row name and/or column name along with the dataframe's hhame. The proces of adding and modifying rows /columas’ value is similar, as you will seein the following sub-sections 1.13.1. Adding/Modiying o Column Columns in a datatame canbe refered to in multiple ways. Assigning a value toa column: © will moi ifthe cokumn steady exists 9 will add anew column, if it des notes already To change or adda column, use syntay ‘OF object >. = cnew values ‘OF object >. columns: new values CChopter 1 PYTHON PANDAS — I the given column name does not e 53 Mie g vist in dataframe then a new column with this name is, >>> dtFS[ ‘Density’ >>> dts 219 Population Hospitals Schools Delhi 10927986 ago 7ag Mumbai 12691836208 aso, Kolkata 4631392 us 726 Chennai 4328063 577617 ‘Although the above method adds a column BUT here the catch is that all the rows of this new ‘column have the same given value. If you want to add a proper new column that has different values for all its rows, then you can assign the data values for each row of the column in the form, ‘Since haa sna conn Hf you asin something to (Fusing at oF lee and if column name does nat fet, then a new colun i created and that has the Same vale for alts ows. of alist, ie, as shown below d>> tf SL Density!) = (1599, 1219, 1636, 1050, 1180] —, po aees —— Population Hospitals ‘Schools This time Python Delhi ——10927586.0 © 189.8 7916.0 ies sed) himboi —12691836.0 208.0 «—as08.0 Dae Kolkata aesnssz@ = us.@ 7226.8 yates rat Chemnai a328063.8 157.8 7617.8 fon apc at tanglore Se78057.0 1200.8 1200.8 Same way, you can modify an existing column by assigning a new list of values to it. Thats, for existing column, it will change the data values and for non-existing column, it will add a new column, There are some other ways of adding a column to a dataframe. These are .at[ : , } = «values for colusa> Or -Joe[ : , ] « Or OF object.» =) For example, given below are some statements that will add a gore: new column if the mentioned column name does not exist in a dataframe When you asign something to a column of datarane, then for existing column, it ensi e wil change the data values tS. loc{:, “Density"] = [1500, 1219, 1630, 1050, 1109] ni ge he Ot LFS « dba, assign( Density =[1500, 1228, 1630, 1050, 1100] ) wil add a new column. You just need to make sure in above examples thatthe sequence which contains values for the ‘nev column must have values equal to number of rows inthe dataframe otherwise Python will ai re error (eeror : ValueErt0r ) CES 54 1.13.2. Adding/Modifving © Row — Tike alums, you an change or 244 rows 108 Data rame SiG at oF 106 at, explained below ‘To change or add a row, use syntax = , +] 1: ] =newvalue> ‘new value> Or «oF object» Loc{, Likewise, if there i no row with such row label, then Python adds new row with this rg and assigns given values to al its columns ooo dt#5.at{ Banglore’, :] = 1208 poo dts ie Population Hospitals Schools Density. Delhi 10927986.@ «189.8 7916.8 1219.0 Mumbai -«*12692836.8 «208.0 8508.8 1219.8 Kolkata 4631392.0 149.0 7226.0 1219.8 Chennai 4328063.0 157.0 7617.8 1219.0, sion vale fra) Tanglore 1200.0 1200.0 1200.0 1200.0 Roun ‘As you can see in the above output that ifa mentioned row label with ator loc attributes d not exist in the DataFrame, Python will create a new row for it and this is how a new added toa DataFrame. But there isa catch if you specify only a single value, then all the va in the newly added row will have the same value as it did in the above output. ‘You can add a new row by specifying individual values for each column. For this, specify all ‘values of the new row to be added as a sequence such as a list ete, >>» dtf5.at{ Bangalore’, :] = [10002980, 171, 7311, 1200], >>> dts Population Hospitals Schools Density Deahi 10927986.0 189.0 7916.8 1219.0 Mumbai —-—«12691836.0 208.8 8508.0 1219.8 Kolkata —-«4621392.0 «149.8 7226.9 1219.8 Chennai __4328063.0 «157.0 7637.0 1219.8 Banglore 10002980.0 171.0 7311.0 1208. While adding a row this way, make sure that the sequence containing values for diff columns has values forall the columns, otherwise Python will raise ValueError (se below) Ifyou try to add a row having 4 values to the above DataFrame dtf5 having 5 columns, Python will give you error his stato i yin inset >> dS. 2oc{ "Mohali, :] = (452980, 72, 281] <——— fu oa having 3 aes the dane {ur coons od this hea Value€rror: cannot copy sequence with size 3 to array axis with dimension 4 ——_ NOTE You can se at orc tirbvtes of Dataframe to adé/modty a row, cok or ini Chops 1: PYTHON PANDAS — 1 ng EXAMPLE BID Consider the flowing dataframe saleDf Target sales zoned 56000 58000 zoneB 70000 68000 zonec 75008 78000 zoned 60000 61000 Write program to adda column namely Orders having values 6000, 6700, 6200 and 6000 respectively forthe zones ‘A,B, Cand D. The program should also adda new row fora new zone ZoneE. Add some dummy values in this row. SOLUTION import pandas as pd 4# saleOF created or loaded here saleDf{'Orders'] = [5000, 6720, 6200, 6000 ] saleDf.loc["zonet",, :]= [ $2802, 45000, $200] print(saleDF) Output Target sales orders zones $600.0 $8000.0 6000.0 zone8 7000.0 6800.0 6700.0 zonec 7500.0 7800.0 6200.0 zoned 6000.0 61000.0 6000.0, zone 5000.0 4500.0 5000.0 1.13.3. Modifying o Single Cell ‘You may use iat{ ) to modify values using row and column position. (Refer Multiple Choice Question 22 given in Objective type questions atthe end ofthis chapter) ‘To change or modify a single data value, use syntax . [] = >2> dtf5.Population| "Banglore’ ) = 5678037 o> ates Population Hospitals Schools Density Delhi 10927985.0 189.0 7516.8 1219.0 Mumbai —-—«12692836.0 208.0 8598.0 1219.0 Kolkata 4631382.0 149.8 7226.0 1219.0 Chennai 4328063.0 157.8 7617.0 1219.0 Banglore ((5678097.0) 1200.0 1200.0 1200.0 ‘ee, tis tma only his oe got modted atatrame as shown below = an exist when you cee 2 daame using an est ya einige?) oun 1 55 7 58 poo df= pd. ataFrane(sf2) sd ei 2 eure 1 55 67 58 Now if you change the value ov fa Doe(, 2] = 85 one etl ing i domed fone cell of tbe dataame afl (orginal dataframe) 35 ‘Butt you check the value ofthe new dataframe vale of he given clin dataframe Saat easmicibe fi wl also be charged, oe >of me et 2 enue B Changed ae ue iattame, unaffected from the changes of af1? Why didithappen ? Wasn't df2 supposed tobe a diferent da lasDataFrame{ ) method. So far you have reed “The answer toll your questions isin the syntax ofthe pands the syntax of pandas.DataFrame( ) method as “cdatFraneObject> = pandas, DataFrane( , \ [columns = J, [index « cindex sequence>]) Bt there isan aditional optional argument, copy whichis by default set to False, ie, ye cdatFramebject» = pandas. DataFrane(, \ [columns = J, [index » cindex sequence>], [copy = False]) ‘With copy as False(s default value), the new Datarameis not created asa separate copy andis thus linked to the orignal datas ony anew labels created efersng othe orignal dataframe). Hence all changes made to the original detatame are reflected in it and ths was the reason that 2 also showed the changes above. In order to creste a datatrme as a completly different unlinked dataframe, you need to set the copy Opto orwmen co, whi Feb df argument s Tivo tht the ve dtfrare i rated as are copy (deep copy), i. as >>> df3=pd.DataFrane( #1, copy = True) 1 ene hat he alae 29> ft Doe[4, 2]= 78 spat ‘utd a Tre py an change in aan) ‘he cgi dataome ar refed init eunnp 1 55 67 78 Datoome Jf coated wi copy es Fate vee a et ae Sis al cargo ee tina detajome B Dutfame 3 cue wih copy ws Tru isnot inks othe rial dea Trae copy 155 67 as doce doesnt fet any ches of ft ‘Chopter 1: PYTHON PANDAS 57 1.14 Deleting/Renaming Columns/Rows Deleting/Renaming Columns/Rows Python Pandas provides two ‘ways to delete rows and columns — del statement function. Pandas also provides rename) function to rename rowsand cohume Inthe we shall tlk about how rows/columns can be deleted or renamed nn nse Let us now talk about how you can delete or rename columns and 1.14.1. Deleting Rows/Columns in a Dotaframe ‘To delete a column, you use del statement as this del [ ] For example, >>> del dts ‘Density"] . >>> dts ms Population Hospitals Schools 4” Delhi 10927986.¢ 19.8 7016.0 mumbai -—«12691836.0 208.0 8508.0 Kolkata 4631382.8 3.8 7226.0 Chennai 4328063.8 157.8 7617.0 Banglore §678097.6 1200.0 1200.0 ‘To delete rows from a dataframe, you can use .droplindex or sequence of indexes), e.g, Both these commands will delete the rows with indexes 2,4, 6, 8, 12 from dataframe df F -drop(range(2, 13, 2)) Argument to drop shuld be ther an AF drop( 2, 4, 6, 8, 12]) incor aagnne onaing nds You can also give axis = 1 along with indexes/labels then drop() will drop the columns, ie, the following command will drop the mentioned columns from dataframe df df.drop( (“Total Cost”, "Order 10"), axis = 1) Ths rane ied cons EXAMPLE [BBM From the aY wed above, create another DataFrame and it must not contain the column ‘Population’ and the row Bangelore SOLUTION import pandas as pd # DataFrane dtf5 created or loaded de6 = pd.DataFrane(dtF5) del dt6[ ‘Population’ } Output def 6 = dt¥6.drop(['Bangalore]) nee Cees ethi 189.0 7916.0 mumbai 208.0 8508.0 kotkata 49.0 7226.0 chennai 157.0 7617.0 oo TSACTICES: 58 youn use te ENAMEL) ft 1142 : change the name below. Sota per ee cof ronan inden" cetums = (mes HONE), pace sg, want o rename rows, wes senames wists) HOU. a pe index rg i psy ey index clams anes 704 Wan FEAMEcLMNG yy, gumen names-change dictionary st nets pect the names ad 6 For bh index and column 3 form ike (ld name new name) tening es re oo aman rns es Spat ince 2 tif om Min a new dataame is ested. wif al es yo sp ths 30m a dal eas wnarerd Lets now pata se how rename) works Consider the dataame tp shown below foitno Name Marks sock us Pai 87.5 secs 236 Rishi 98 Sec 307 Pret 98.5 seco 2 Paula 98.0 “To change the ow labels as ‘A; °C; D’, you can write "Sec "Sec D':'0'}) The cat of name ha so he change inde $6 rth had egal pf wel? ‘The above statement wil show the changed indexes but when you display the dataframe topDf after executing above statement, it will show you the original dataframe (sce below) because default enamel) rats a new dataframe with changed names. 99 topo folln Wane 45 avni 236 Rishi 387 Preet a he ts She rit uae tps wach coe 1 PHN PON xR rm age 5 oe hacia he etna ea fie A my pee ier "seco 1 'SeCC's'C', 'ec0"s"0"), inplace= True) oe gig ge Tr ergs nn ok he angi he a dof "ih npc as Tr, rename) az hg te inden oil tap Df You can use rename() to change columns names too,¢. following statement will change the columns names in the topDf dataframe. (Consider the original topDf shown earlier in the beginning of this section) You can choose to change selective columns’ names or all the columns’ names. Only the names given in the name-change dictionary will et changed. For example, >>> topo reame(columsoteine Ro Name Marks Seck 115 Pavni 97.5 Sece 236 Rishi 98.0 SecC 307 Preet 98.5 Sec 422 Paula 98.0 md tow each, ‘You can combine any of these arguments together: index, columns, inplace arguments. Following examples are illustrating the same. EXAMPLE [EI Consider te saleDf shown below. Target Sales zone $6008 58800 zone 70008 68000 zone 7580@ 78000 zoned 69000 61000 Write «program to rename indexes of ‘soneC’ and ‘zoneD’ as ‘Central and ‘Dakshin’ respectively andthe clu ames "Target and ‘Sales’ as “Targeted” and ‘Achieved’ respectively. SOLUTION import pandas as pd ft saleof created or loaded here print(saleOF.renane(index = ("zonec' : ‘Central, ‘zoneD":'Dakshin'}, \ Output columns = {"Target': Targeted’, 'Sales”:'Achieved"))) + oe Targeted Achieved zonen $6000 $8000 zones —-70000»—68000 ‘entra? 7500078000 pakshin 60000 61000 ne er inged index. it ge se te angeles nd revi pest rns. sn te pevions iy? Make cans . saleDf she changed indexes and cOlUNS' names SOLUTION gam i 14 inp = THE AEUMEn y 1e previous pr is api Mec Te rep cn wih wae) le would BE me a ramn to ma Te as sed snort ands eed or ead Ne se saleoFcreater saleof.renane( index print(saledf) ataframe Indexing {zone colums= (Taree «zoned: Dakshin’) \ "central", : 7 "saes!:"Aehieved"))» Lnplace stag ‘targeted’, BOOLEAN INDEXING i LS Mon oe ee Daaframesin varity of Ways, YOU hav alo lean Til now yo to change indexes, col sometimes] as indexes ofa dat Ww” 1 Wout isthe ef Pion Ponas ray? 2 Name the Pads bj tat an sre one dinersnl aay We eject an nha umes eed ners. 3. Can you ave dpe idee in 2 seis eject? 4. at do these atbaes of see sity? (@) se (i) temaze (0) yee 5. tin ais objet ten how wl te) aed St cum) bea ? at Nas? Hw you sta them ina deuce? 7. Tue/Fie. Snes objets aay hae indexes 6 to n= ne 8. What sth se of dl satenent 9. What oes dp ction do? Woat the rt of eo pace agen ‘ane ) funtion net ya have learnt to create and lun names, rename DataFrames - Boolean Indexing. dng, the ae uggs ES cone taframe. You mig having indexes as True or False? them etc Let us now talk about an interesting feature having Boolean Values [(True or False) or (1 or ight be wondering - Why? What is the need fap Well, your question is genuine and so is the answer ‘Adualy, in some situations, we may need to divide our data in two subsets ~ True or False, eg., your school has decided to launch online classes for you. But some. “ | the week are designated for it, So, a dataframe related to information might look like Day No. of Classes Monday 6 Tuesday Wednesday Thursday Friday “lai ‘True rows and False rows. This is useful division in situatic where you find out things like - On which days, the oll lesa held }Or Which ones are ofine classes days? A The Boolean indexes divide the dataframe in fo =| BOOLEAN INDEXING Boolean indexing "! having Boolean values iT False) or (1 or} asindeashett dataframe. Boolean indexes can either bein True/False form or in (1.070) form, 7" — ON colamiie” ame 4 chapter I = THON PANDAS — | sad indexes and col te ape ra 1.15.1 Creating Dototromes with Boolean Indexs : Lat wa teat tes Bock nega Tara ly Wale acta aay and Fle. While craig a data - make sure that True and False are not enclosed in {hence it will give you error (KeyEror) while accessing Toc, because ‘True’ and ‘False’ are string values, not Boolean quotes (ie, like ‘true’ or False’, data with Boolean indexes using values. Solved problem 32 depi 's the same problem, Let us first create above shown dataframe containing — code given below. %6 online classes’ information, through the import pandas as pd Days =['Nonday’, "Tuesday', ‘Wednesday’, "Thursday", “Friday* ] Classes =[6, 8, 3, 0, 8) eco indes provid Days’ :Days, ‘No. of Classes" :Classes} Eee lasDf = pd.DataFrane(de, index = (True, False, True, False, True] ) Belo vals rt sings ie (aot ened in gute >>> clasoe 5 Days No. of Classes Monday 6 Tuesday 8 Wednesday 3 Thursday @ 8 Friday ‘You can also provide Boolean indexing to dataframes as 1s and 0s. Let us create another dataframe (clasDff) similar to the above shown datairame (clasDf) having indexes as Is and 0s, through the code given below, “import pandas as pd ays [‘Nonday", "Tuesday", ‘wednesday’, "Thursday", ‘Friday’ ] Classes = [6, 8, 3, 8, 8] dc= (‘Days" :days, ‘No. of Classes’ :Classes} clasDfl = pd.DataFrane(de, index = [1, 8, 2, 8, 2] ) : ae ~ tides provid fo each ne end Oe hi ime >>> clasbfl Days. No. of Classes. Monday 6 Tuesday @ Wednesday 3 e 8 Thursday Friday / W ean Indexes 62 fron Datofromes ened ie, fr fining oF extracting sing for fiteing rn dataframes with Boole, =a 1.152 ACES very ust ecards from data oolean indexes ia Boolean inde ju can iter “ : a Fue inderd ed below lith True index. Joc atta atodisplay all i ith False index oo) stocisplay 21 re index as 1 oa alse all records wi attra emi) eae arecrdswithindexas® Bsn on. gro aisel® ert oie?) ap 7 svete 04] ‘ no, of Classes 6 3 8 days ‘No. oF Classes Tuesday = Thursday 5 : >>> clasDA1. 1oc{@] Tariana ‘ cane bays No, oF Classes a Tvestay ° Turstey Tals Me hivecometo te end ofthis chapter. Let us quickly revise what we have lari shape. LET Us Revise “ry fom O10 Lava. tg daa da “dope, shape, nbytes, ni, size, temsize, hasnans, em? PYTHON PANDAS — 1 hope | J 1 Aslic objets rete from Series objec ut Sg @ sma of .T 4 When you assign someting toa column ofdtframe, he for existing column, i wil change the daa vlues nd for rnon-eisting clump, wil add @rew column, 4 Acolumn canbe deleted using dl cormand. Opiective Type Questions Multiple Choice Questions f create an empty Series object, you can use (@) pa Series(empty) (©) pa Seriesinp NaN) (0 paSeries) (@)all ofthese 2, To specify datatype int16 fora Series object, you can write (@) pa Series(data = array, dtype = inti6) _(@) pa Series(data = array, dtype = numpy.int6) (6) paSeries(data = array.dtype = pandas.nt16) (@ all of the above OTQs 3. To get the number of dimensions of a Series object, _ attribute is displayed. (@) index () size (0 itemsize (0 dim 4. To get the size of the datatype ofthe items in Series object, you can display __ attribute, (0) index () size (©) itemsize (@ nin 5. To get the number of elements in a Series object, __ attribute may be used. (0) index (size (© itemsize (@ dion 6. To get the number of bytes of the Series data, _ attribute is displayed. (@) basnans (0) bytes (nim (@) dtype 7. To chuck ifthe Series object contains NaN values, ___ attribute is displayed. (w) hasnans (W) nbytes (9 ndim (@) type 8, To display third element ofa Series object S, you will write ws) wsei 881 say &

You might also like