
Bai2 (Lesson 2): Data - Pandas

Uploaded by Thanh
Copyright © All Rights Reserved
12:21, 03/01/2023 Untitled

import numpy as np
import pandas as pd

# PART: DATAFRAMES - SIMILAR TO WORKING IN EXCEL
# A DataFrame represents a table, like an Excel sheet:
# it has rows and columns, and each column is a Series.

# Creating a DataFrame
# From a dictionary
d = {
    "name": ["John", "Bob", "Jane"],
    "age": [18, 20, 30],
    "edu": ["BS", "MS", "BS"],
}
df = pd.DataFrame(d)
df

Out:
   name  age edu
0  John   18  BS
1   Bob   20  MS
2  Jane   30  BS

# If we add Jack but leave out his age and edu, the lists have different lengths and pandas raises an error.
# One fix is to use NaN (a missing value) for those entries.
d = {
    "name": ["John", "Bob", "Jane", "Jack"],
    "age": [18, 20, 30, np.nan],
    "edu": ["BS", "MS", "BS", np.nan],
}
df = pd.DataFrame(d)
df

Out:
   name   age  edu
0  John  18.0   BS
1   Bob  20.0   MS
2  Jane  30.0   BS
3  Jack   NaN  NaN

# From a 2D list (a list of rows)
rows = [("John", 18, "BS"), ("Bob", 20, "MS"), ("Jane", 30, "BS")]
df = pd.DataFrame(rows)
df

Out:
      0   1   2
0  John  18  BS
1   Bob  20  MS
2  Jane  30  BS

localhost:8888/nbconvert/html/HOC PYTHON/Untitled.ipynb?download=false

# However, the columns have no names (they are just 0, 1, 2). Name them,
# and while we are at it, set the index to 1, 2, 3:
rows = [("John", 18, "BS"), ("Bob", 20, "MS"), ("Jane", 30, "BS")]
df = pd.DataFrame(rows, columns=["Name", "Age", "Edu"], index=[1, 2, 3])
df

Out:
   Name  Age Edu
1  John   18  BS
2   Bob   20  MS
3  Jane   30  BS

# 3.2 BASIC OPERATIONS
df = pd.read_excel("invoice.xlsx")

# B / INSPECT
pd.set_option("display.max_rows", 500)
df.head(2)

Out:
   Invoice StockCode                         Description  Quantity         InvoiceDate  Price  Customer ID         Country
0   536365    85123A  WHITE HANGING HEART T-LIGHT HOLDER         6 2010-12-01 08:26:00   2.55      17850.0  United Kingdom
1   536365     71053                 WHITE METAL LANTERN         6 2010-12-01 08:26:00   3.39      17850.0  United Kingdom

df.tail(2)

Out:
        Invoice StockCode                   Description  Quantity         InvoiceDate  Price  Customer ID Country
541908   581587     22138  BAKING SET 9 PIECE RETROSPOT         3 2011-12-09 12:50:00   4.95      12680.0  France
541909   581587      POST                       POSTAGE         1 2011-12-09 12:50:00  18.00      12680.0  France

# How many rows does this file have? In Excel you would have to scroll to the bottom;
# here it is easy: .shape returns (number of rows, number of columns).
df.shape

Out: (541910, 8)
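The construction steps above can be sketched without invoice.xlsx. This minimal sketch reuses the name/age/edu sample. It shows the error pandas raises when the lists have unequal lengths. It also checks `.shape`, the float dtype that NaN forces, and `head()`/`tail()` on a frame with a custom index. The variable names `people`, `named` and `rows` are illustrative choices, not names from the notebook.

```python
import numpy as np
import pandas as pd

# Lists of unequal length cannot form columns: pandas raises ValueError
try:
    pd.DataFrame({"name": ["John", "Bob", "Jane", "Jack"], "age": [18, 20, 30]})
except ValueError as err:
    print("ValueError:", err)

# np.nan fills the gaps so every column has four entries
people = pd.DataFrame({
    "name": ["John", "Bob", "Jane", "Jack"],
    "age": [18, 20, 30, np.nan],
    "edu": ["BS", "MS", "BS", np.nan],
})

# A list of rows, with explicit column names and index labels
rows = [("John", 18, "BS"), ("Bob", 20, "MS"), ("Jane", 30, "BS")]
named = pd.DataFrame(rows, columns=["Name", "Age", "Edu"], index=[1, 2, 3])

print(people.shape)         # (4, 3): rows, columns
print(people["age"].dtype)  # float64, since NaN is a float value
print(named.head(2))        # first two rows
print(named.tail(1))        # last row
```

Because NaN is a float, pandas stores the whole age column as float64. This is why the notebook output shows 18.0 rather than 18.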
# Result: 541910 rows and 8 columns.
# Number of rows in df
df.shape[0]

Out: 541910

# Number of columns
df.shape[1]

Out: 8

# Selecting columns by name: a single column is a Series, but two or more columns form a DataFrame.
# To take several columns, pass a list of column names.
df[["Country", "Invoice"]].head()

Out:
          Country  Invoice
0  United Kingdom   536365
1  United Kingdom   536365
2  United Kingdom   536365
3  United Kingdom   536365
4  United Kingdom   536365

# Suppose the data has thousands of columns; we cannot type them all out.
# For choosing several columns, .loc is recommended.
# df.loc[a, b] has two parts: a selects rows and b selects columns. Since we are only
# selecting columns here, we put a colon (:) in the a slot, before the comma, to keep all rows.
df.loc[:, ["Price", "Invoice"]].head()

Out:
   Price  Invoice
0   2.55   536365
1   3.39   536365
2   2.75   536365
3   3.39   536365
4   3.39   536365

# Why use .loc? Because it also lets us filter rows.
# For example, keep only items priced above 5:
df.loc[df["Price"] > 5, ["Invoice", "Price"]]
# Here df["Price"] > 5 is the condition that keeps rows whose price is greater than 5.
# It is the a part, before the comma, in df.loc[a, b].

Out: [87993 rows x 2 columns], from invoice 536365 (price 7.65) down to invoice 581587 (price 18.00, index 541909)
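The selection rules can be checked on a small frame. This sketch uses hypothetical rows that only borrow the invoice column names; the values are made up and do not come from invoice.xlsx. It confirms three things:

- One column name yields a Series.
- A list of names yields a DataFrame.
- A boolean condition fills the row slot of `df.loc[rows, columns]`.

```python
import pandas as pd

# Hypothetical rows: same column names as the invoice data, made-up values
df = pd.DataFrame({
    "Invoice": [1001, 1001, 1002, 1003],
    "Quantity": [6, 2, 3, 1],
    "Price": [2.55, 7.65, 5.95, 18.00],
    "Country": ["United Kingdom", "United Kingdom", "United Kingdom", "France"],
})

one = df["Country"]                     # a single name gives a Series
many = df[["Country", "Invoice"]]       # a list of names gives a DataFrame
cols = df.loc[:, ["Price", "Invoice"]]  # ":" in the row slot keeps every row

# A boolean Series in the row slot keeps only the rows where it is True
mask = df["Price"] > 5
expensive = df.loc[mask, ["Invoice", "Price"]]

print(type(one).__name__, type(many).__name__)
print(expensive)
```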
# To pick several consecutive columns without typing each name, use a slice:
df.loc[df["Price"] > 5, "Invoice":"Price"]
# This takes every column from Invoice through Price; with .loc both ends of the slice are included.
# Note that the slice is not wrapped in a list.

Out: [87993 rows x 6 columns] with Invoice, StockCode, Description, Quantity, InvoiceDate and Price, from invoice 536365 (SET 7 BABUSHKA NESTING BOXES, 2010-12-01 08:26:00, 7.65) to invoice 581587 (POSTAGE, 2011-12-09 12:50:00, 18.00)

# So far we have only been reading the data; next we take it out and modify it (write).
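Two details from the cells above are easy to verify on made-up data:

- A label slice in `.loc` includes its end column.
- The same `df.loc[rows, columns]` expression can sit on the left of an assignment, which is one way to write values.

The sample rows and the price cap of 9.0 in this sketch are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical rows laid out with the invoice column names
df = pd.DataFrame({
    "Invoice": [1001, 1002, 1003],
    "StockCode": ["A1", "B2", "POST"],
    "Description": ["MUG", "LANTERN", "POSTAGE"],
    "Quantity": [2, 2, 1],
    "Price": [7.65, 9.95, 18.00],
    "Country": ["United Kingdom", "United Kingdom", "France"],
})

# Label slice: both endpoints are kept, so Price is included and Country is not
block = df.loc[:, "Invoice":"Price"]
print(block.columns.tolist())

# The same [rows, columns] form on the left of "=" writes into df in place
df.loc[df["Price"] > 9, "Price"] = 9.0  # hypothetical cap at 9.0
print(df["Price"].tolist())
```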
# For example, to change product prices, or to adjust the quantity of goods left in stock:
df["Quantity"] * 10

Out:
0         60
1         60
2         80
3         60
4         60
          ...
541905    60
541906    40
541907    40
541908    30
541909    10
Name: Quantity, Length: 541910, dtype: int64

# The cell above only multiplied by 10; nothing in df was updated. To update, assign the result back:
df["Quantity"] = df["Quantity"] * 10

df["Quantity"].head(5)

Out:
0    60
1    60
2    80
3    60
4    60
Name: Quantity, dtype: int64

# The column has been updated. This does not affect the original Excel file.

# Now, working with date/time types
df.columns.tolist()

Out: ['Invoice', 'StockCode', 'Description', 'Quantity', 'InvoiceDate', 'Price', 'Customer ID', 'Country']

df["InvoiceDate"].head(4)

Out:
0   2010-12-01 08:26:00
1   2010-12-01 08:26:00
2   2010-12-01 08:26:00
3   2010-12-01 08:26:00
Name: InvoiceDate, dtype: datetime64[ns]

# read_excel has already parsed InvoiceDate as datetime64[ns], so the .dt accessor is available.
# From InvoiceDate, extract the year and the month
df["InvoiceDate"].dt.year

Out:
0         2010
1         2010
2         2010
3         2010
4         2010
          ...
541905    2011
541906    2011
541907    2011
541908    2011
541909    2011
Name: InvoiceDate, Length: 541910, dtype: int64

df["InvoiceDate"].dt.month

Out:
0         12
1         12
2         12
3         12
4         12
          ...
541905    12
541906    12
541907    12
541908    12
541909    12
Name: InvoiceDate, Length: 541910, dtype: int64

# Or produce a day/month/year string. Note that the result has object dtype (strings) instead of datetime;
# df itself keeps its datetime column unless you assign the result back.
df["InvoiceDate"].dt.strftime("%d/%m/%Y")  # strftime = "string from time"

Out:
0         01/12/2010
1         01/12/2010
2         01/12/2010
3         01/12/2010
4         01/12/2010
          ...
541905    09/12/2011
541906    09/12/2011
541907    09/12/2011
541908    09/12/2011
541909    09/12/2011
Name: InvoiceDate, Length: 541910, dtype: object
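This small sketch covers the write and date steps on a hypothetical three-row frame in place of the invoice data. The sample dates start as strings, so the sketch first converts them with `pd.to_datetime`. With the real file, `read_excel` had already produced a datetime column.

The sketch shows two things:

- `df["Quantity"] * 10` on its own leaves `df` unchanged until the result is assigned back.
- `.dt.year`, `.dt.month` and `.dt.strftime` each return a new Series without altering the datetime column.

```python
import pandas as pd

# Hypothetical sample; the dates start as strings here
df = pd.DataFrame({
    "Quantity": [6, 8, 3],
    "InvoiceDate": ["2010-12-01 08:26:00", "2010-12-01 08:26:00", "2011-12-09 12:50:00"],
})
df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])  # strings -> datetime dtype

scaled = df["Quantity"] * 10      # a new Series; df is not touched yet
before = df["Quantity"].tolist()  # still the original quantities
df["Quantity"] = scaled           # assigning back updates the column

years = df["InvoiceDate"].dt.year
months = df["InvoiceDate"].dt.month
labels = df["InvoiceDate"].dt.strftime("%d/%m/%Y")  # new Series of strings

print(before, df["Quantity"].tolist())
print(years.tolist(), months.tolist(), labels.tolist())
print(df["InvoiceDate"].dtype)  # the column itself is still a datetime dtype
```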
# Adding a new column
# It is as simple as adding a key to a dictionary.
# Suppose we create a Revenue column as unit price times quantity
# (Quantity was multiplied by 10 earlier, so Revenue is based on the scaled quantities).
df["Revenue"] = df["Price"] * df["Quantity"]
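Adding a column can be sketched on hypothetical Price and Quantity values. Dictionary-style assignment modifies `df` in place. `DataFrame.assign` is an alternative that returns a new DataFrame and leaves its input untouched.

```python
import pandas as pd

# Hypothetical prices and quantities
df = pd.DataFrame({"Price": [2.55, 3.39, 18.00], "Quantity": [6, 6, 1]})

# Dictionary-style: the new column is computed element-wise and stored in df
df["Revenue"] = df["Price"] * df["Quantity"]

# assign() builds the same column on a new DataFrame and leaves its input unchanged
base = df[["Price", "Quantity"]]
with_rev = base.assign(Revenue=base["Price"] * base["Quantity"])

print(df)
print(base.columns.tolist())  # no Revenue column here
```

The in-place form matches the notebook. `assign` is handy when you want to keep the original frame unchanged.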
