Lab9

The document outlines a data warehousing project using Python, focusing on data cleaning and transformation techniques. It includes steps for handling missing values, removing duplicates, creating new columns, and visualizing data distributions. The visualizations cover age group distributions, average purchase amounts by country, and sign-ups by month, utilizing libraries such as pandas and seaborn.

Uploaded by shapparhay
Copyright © All Rights Reserved
Submitted by: Minal Fatima (21-CP-55)
Lab #09: Basics of Data Warehousing using Python

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import seaborn as sns

    # 1. Data Upload
    # Load the CSV file
    data = pd.read_csv("sample_data.csv")
    print("Initial Data:\n", data.head())
    print("\nMissing Values:\n", data.isnull().sum())

    # 2. Data Cleaning
    # Remove duplicates (if any)
    data = data.drop_duplicates()
    # Fill missing values (if any): median for Age, mean for Purchase_Amount
    data['Age'].fillna(data['Age'].median(), inplace=True)
    data['Purchase_Amount'].fillna(data['Purchase_Amount'].mean(), inplace=True)

    # 3. Data Transformation
    # Create a new column 'Age_Group' based on age
    data['Age_Group'] = pd.cut(data['Age'], bins=[17, 25, 35, 45, 60],
                               labels=['18-25', '26-35', '36-45', '46-60'])
    # Convert 'Signup_Date' to datetime and extract year and month
    data['Signup_Date'] = pd.to_datetime(data['Signup_Date'])
    data['Signup_Year'] = data['Signup_Date'].dt.year
    data['Signup_Month'] = data['Signup_Date'].dt.month
    # Display transformed data
    print("\nTransformed Data:\n", data.head())

    # 4. Visualization
    # Age Group distribution
    plt.figure(figsize=(10, 5))
    sns.countplot(x='Age_Group', data=data, palette='viridis')
    plt.title('Age Group Distribution')
    plt.show()

    # Average Purchase Amount by Country
    plt.figure(figsize=(10, 5))
    sns.barplot(x='Country', y='Purchase_Amount', data=data, estimator=np.mean, palette='coolwarm')
    plt.title('Average Purchase Amount by Country')
    plt.show()

    # Signup Month distribution
    plt.figure(figsize=(10, 5))
    sns.countplot(x='Signup_Month', data=data, palette='pastel')
    plt.title('Signup Month Distribution')
    plt.show()

Output:
(Values that are illegible in the scanned submission are marked ?.)

Initial Data:
       ID   Name  Age      Gender    Country  Purchase_Amount  Signup_Date
    0   1  User1    ?  Non-binary  Australia                ?            ?
    1   2  User2   22  Non-binary        USA                ?            ?
    2   3  User3    ?      Female    Germany            52.35   2020-05-22
    3   4  User4   57        Male          ?            52.78   2021-03-20
    4   5  User5    ?      Female    Germany                ?   2021-08-26

Missing Values:
    (one count per column, ID through Signup_Date, dtype: int64; the counts are illegible in the scan)

Transformed Data:
       ID   Name  Age      Gender    Country  Purchase_Amount  Signup_Date  \
    0   1  User1    ?  Non-binary  Australia                ?            ?
    1   2  User2   22  Non-binary        USA                ?            ?
    2   3  User3    ?      Female    Germany            52.35   2020-05-22
    3   4  User4   57        Male          ?            52.78   2021-03-20
    4   5  User5    ?      Female    Germany                ?   2021-08-26

      Age_Group  Signup_Year  Signup_Month
    0         ?            ?             ?
    1     18-25            ?             ?
    2         ?         2020             5
    3     46-60         2021             3
    4         ?         2021             8

FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through chained assignment using an inplace method.
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.
For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.
      data['Age'].fillna(data['Age'].median(), inplace=True)

The same FutureWarning is raised for the next line:
      data['Purchase_Amount'].fillna(data['Purchase_Amount'].mean(), inplace=True)

FutureWarning: Passing `palette` without assigning `hue` is deprecated and will be removed in v0.14.0. Assign the `x` variable to `hue` and set `legend=False` for the same effect.
      sns.countplot(x='Age_Group', data=data, palette='viridis')

[Figure: Age Group Distribution, a count plot with count on the y-axis and Age_Group on the x-axis]
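The pandas FutureWarning above comes from calling fillna(..., inplace=True) on data['Age'], which is a chained assignment. Below is a minimal sketch of the cleaning step written the way the warning recommends: assign the filled column back instead. The DataFrame is a hypothetical stand-in for sample_data.csv. The values are made up and only the column names come from the lab, so the snippet runs without the file.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for sample_data.csv: made-up values, lab's column names.
data = pd.DataFrame({
    "ID": [1, 2, 2, 3],
    "Name": ["User1", "User2", "User2", "User3"],
    "Age": [30.0, np.nan, np.nan, 50.0],
    "Purchase_Amount": [100.0, 200.0, 200.0, np.nan],
})

# Drop rows that repeat an earlier row in every column (the second User2 row).
# drop_duplicates treats NaN values as equal when comparing rows.
data = data.drop_duplicates()

# Assign the filled column back instead of data['Age'].fillna(..., inplace=True),
# which is the chained-assignment form pandas warns about.
data["Age"] = data["Age"].fillna(data["Age"].median())
data["Purchase_Amount"] = data["Purchase_Amount"].fillna(data["Purchase_Amount"].mean())

print(data)
print(data.isnull().sum())
```

The order of the two steps matters. Deduplicating first keeps the repeated User2 row from counting twice in the Purchase_Amount mean. The warning's other suggestion, a single data.fillna({...}) call with one fill value per column, gives the same result.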

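You can check the transformation step, and the numbers behind the three plots, without drawing anything. This sketch reuses the lab's bins, labels, and column names on a few hypothetical rows. pd.cut uses right-closed intervals by default, so the bins [17, 25, 35, 45, 60] give (17, 25], (25, 35], (35, 45], and (45, 60]. Any age outside (17, 60] becomes NaN in Age_Group. The value_counts and groupby calls below are one way to compute the bar heights that countplot and barplot(estimator=np.mean) display. They are not a claim about how seaborn works internally.

```python
import pandas as pd

# Hypothetical rows; only the column names, bins, and labels come from the lab code.
data = pd.DataFrame({
    "Age": [18, 25, 26, 45, 60],
    "Country": ["USA", "USA", "Germany", "Germany", "Australia"],
    "Purchase_Amount": [10.0, 30.0, 50.0, 70.0, 90.0],
    "Signup_Date": ["2020-05-22", "2021-03-20", "2021-08-26", "2020-05-01", "2021-03-02"],
})

# Right-closed bins: 25 falls in '18-25', 26 in '26-35', 60 in '46-60'.
data["Age_Group"] = pd.cut(data["Age"], bins=[17, 25, 35, 45, 60],
                           labels=["18-25", "26-35", "36-45", "46-60"])

# Parse the dates, then pull out the year and month as integer columns.
data["Signup_Date"] = pd.to_datetime(data["Signup_Date"])
data["Signup_Year"] = data["Signup_Date"].dt.year
data["Signup_Month"] = data["Signup_Date"].dt.month

# Bar heights of the three plots, as plain Series.
age_counts = data["Age_Group"].value_counts(sort=False)             # rows per age group
avg_by_country = data.groupby("Country")["Purchase_Amount"].mean()  # mean purchase per country
month_counts = data["Signup_Month"].value_counts().sort_index()     # sign-ups per month

print(age_counts)
print(avg_by_country)
print(month_counts)
```

Seaborn 0.13 raises a FutureWarning when palette is passed without hue. To keep the same colours without the warning, pass the x variable as hue too and set legend=False, for example sns.countplot(x='Age_Group', hue='Age_Group', data=data, palette='viridis', legend=False).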