Lesson 3.1: Data Wrangling
Data Wrangling

Agenda

In this session, we will cover the following concepts with the help of a business use case:

- Data acquisition
- Different methods for data wrangling:
  - Merge datasets
  - Concatenate datasets
  - Identify unique values
  - Drop unnecessary columns
  - Check the dimension of the dataset
  - Check the datatype of the dataset
  - Check the datatype summary
  - Treat missing values
- Validate the correctness of the data at a primary level, if applicable

What Is Data Wrangling?

Data wrangling is the process of converting and formatting data from its raw form into a usable format for the stages further down the data science pipeline.

What Is the Need for Data Wrangling?

Without feeding proper data into a model, one cannot expect a model that is dependable and gives high accuracy.

Problem Statement

You are a junior data scientist, and you are assigned a new task to perform data wrangling on a set of datasets. The datasets have many ambiguities. You have to identify them and apply different data wrangling techniques to obtain a dataset ready for further usage.

Dataset

- Download dataset_1 and dataset_2 from Course Resources and upload the datasets to the lab.

Data Dictionary

Attribute information:

- date - date of the ride
- season - 1 = spring, 2 = summer, 3 = fall, 4 = winter
- holiday - whether the day is considered a holiday
- workingday - whether the day is neither a weekend nor a holiday
- weather
  - 1: Clear, Few clouds, Partly cloudy
  - 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
  - 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
  - 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
- temp - temperature in Celsius
- atemp - "feels like" temperature in Celsius
- humidity - relative humidity
- windspeed - wind speed
- casual - number of non-registered user rentals initiated
- registered - number of registered user rentals initiated
- count - number of total rentals

Import libraries

- Pandas is a high-level data manipulation tool
- NumPy is used for working with multidimensional arrays

import pandas as pd
import numpy as np

print(pd.show_versions())

[Output trimmed: pd.show_versions() lists the installed versions of Python (3.7.6, 64-bit, Linux), pandas, and its optional dependencies.]

Load the first dataset

dataset_1 = pd.read_csv('dataset_1.csv')

Observations:

- We have to upload the dataset in the file explorer on the left panel of the lab
- We read the file into the dataset_1 variable
- The file is in CSV format
- We use the pd.read_csv() function to read a CSV file
- We provide the exact path of the file within the parentheses

Check the type of the dataset

Execute the command below to understand the type of data we have:

type(dataset_1)

pandas.core.frame.DataFrame

Observations:

- The result shows that the dataset is a DataFrame
- A DataFrame is a tabular structure consisting of rows and columns

Shape of the dataset

dataset_1.shape

(610, 10)

Observation:

- dataset_1 has 610 rows and 10 columns

Print the first 5 rows of the dataset

dataset_1.head()

[Output trimmed: the first five rows, with columns instant, dteday, season, yr, mnth, hr, holiday, weekday, weathersit, and temp.]

Observation:

- The dataset_1.head() function displays only the first five rows of the dataset
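As a side note (not part of the original lesson), pd.read_csv() can do some light wrangling at load time. A minimal sketch, assuming dataset_1.csv holds dd-mm-yyyy dteday strings as the head() output suggests; parse_dates and dayfirst are standard pandas options, and dataset_1_typed is a hypothetical variable name:

# A hedged sketch: parse the date column while loading, instead of
# fixing its dtype later. Assumes dteday holds dd-mm-yyyy strings.
import pandas as pd

dataset_1_typed = pd.read_csv(
    'dataset_1.csv',
    parse_dates=['dteday'],  # read the ride date as datetime64 up front
    dayfirst=True,           # head() showed dates like 01-01-2017 (day first)
)
print(dataset_1_typed['dteday'].dtype)  # datetime64[ns]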
Load the second dataset

- Use the read function carefully, since this one is an Excel file

dataset_2 = pd.read_excel('dataset_2.xlsx')

Shape of the dataset

dataset_2.shape

(610, 8)

Observation:

- The result shows that dataset_2 has 610 rows and 8 columns

Print the first 5 rows of the dataset

dataset_2.head()

[Output trimmed: the first five rows, with columns Unnamed: 0, instant, atemp, hum, windspeed, casual, registered, and cnt.]

Observation:

- We can see a column named Unnamed: 0, which is not in the data dictionary. Let's remove it.

Drop the column

dataset_2 = dataset_2.drop(["Unnamed: 0"], axis=1)

Let's check the shape of the dataset again after the drop:

dataset_2.shape

(610, 7)

Observations:

- We had 8 columns before the drop
- When we check the shape of the file after the drop, we see that the column Unnamed: 0 has been dropped

Top 5 rows of the dataset

- Let's check dataset_2 again

dataset_2.head()

[Output trimmed: the same five rows, now without the Unnamed: 0 column.]

Observation:

- dataset_2 no longer has the Unnamed: 0 column

Merge the datasets

- We have two datasets: dataset_1 and dataset_2
- As both datasets have one common column, instant, let's merge the datasets on that column
- We are going to save the result in combined_data, as shown below

combined_data = pd.merge(dataset_1, dataset_2, on='instant')

Check the shape of the combined dataset

combined_data.shape

(610, 16)

Observation:

- combined_data has 610 rows and 16 columns

Top 5 rows of the combined dataset

combined_data.head()

[Output trimmed: the first five rows, now carrying all 16 columns from both datasets.]
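A note on the merge above, sketched with toy frames that are not from the lesson: pd.merge() defaults to an inner join, which is why all 610 rows survive here (both datasets cover the same instant values); the how argument controls what happens when keys do not match:

# Hedged illustration of pd.merge join types on a shared key column.
import pandas as pd

left = pd.DataFrame({'instant': [1, 2, 3], 'temp': [0.24, 0.22, 0.22]})
right = pd.DataFrame({'instant': [2, 3, 4], 'cnt': [40, 32, 13]})

inner = pd.merge(left, right, on='instant')                  # keys in both: 2, 3
left_only = pd.merge(left, right, on='instant', how='left')  # all left keys; cnt is NaN for 1
outer = pd.merge(left, right, on='instant', how='outer')     # union of keys: 1, 2, 3, 4
print(inner.shape, left_only.shape, outer.shape)             # (2, 3) (3, 3) (4, 3)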
Now, load the third dataset

- The dataset is saved in an S3 bucket; we are going to download dataset_3

!pip install wget

[Output trimmed: wget 3.2 is already installed.]

import wget
url = 'https://datasciencetrack.s3.us-east-2.amazonaws.com/dataset_3.csv'
wget.download(url)

'dataset_3 (2).csv'

Observations:

- wget downloads the file directly from the main server once we import the wget module
- As shown above, we need to specify only the URL of the file
- You will see the downloaded dataset_3.csv in your lab's file explorer

Import the dataset

dataset_3 = pd.read_csv('dataset_3.csv')

Check the shape of the dataset

dataset_3.shape

(390, 16)

Top 5 rows of the dataset

dataset_3.head()

[Output trimmed: the first five rows of dataset_3.]

Bottom 15 rows of the dataset

- Just like the head function, the tail function is used to see the bottom rows of the dataset
- If you want to see a specific number of rows, specify the number inside the parentheses, as shown below

dataset_3.tail(15)

[Output trimmed: the bottom 15 rows; instant values in the 990s are followed by values in the 510s, so the rows are out of order.]

- The bottom 15 rows of dataset_3 are shown above, as we passed 15 inside the parentheses
- Here, we can see that the rows are not sorted well according to the instant number. Let's resolve that.

Sort values of a column

To sort the values as we want, we use the sort_values function; in the by argument we pass the name of the column to sort on, as shown below:

dataset_3 = dataset_3.sort_values(by=['instant'])

Let's check head and tail to verify the sort operation:

dataset_3.head()

[Output trimmed: the first five rows, now in ascending instant order.]

dataset_3.tail()

[Output trimmed: the last five rows, ending at instant 1000.]

Concatenate combined_data with dataset_3

- Let's concatenate the two DataFrames, combined_data and dataset_3, into a single DataFrame using the concat function, as shown below
- Store the final DataFrame in the final_data variable

final_data = pd.concat([combined_data, dataset_3])

Check the shape of the new dataset

final_data.shape

(1000, 16)

Observation:

- final_data now has 1000 rows and 16 columns

Let's rename the columns of the final_data DataFrame:

final_data = final_data.rename(columns={'dteday': 'date', 'yr': 'year', 'mnth': 'month',
                                        'hr': 'hour', 'weathersit': 'weather',
                                        'hum': 'humidity', 'cnt': 'count'})

final_data.head()

[Output trimmed: the first five rows with the renamed columns date, year, month, hour, weather, humidity, and count.]

Data types of different column values

final_data.dtypes

instant          int64
date            object
season           int64
year             int64
month            int64
hour             int64
holiday           bool
weekday          int64
weather          int64
temp           float64
atemp          float64
humidity       float64
windspeed      float64
casual           int64
registered       int64
count            int64
dtype: object

Observations:

- We can see that the majority of our data columns are of type int64; they are therefore 64-bit integers
- Some of the columns are of type float64, which implies that they contain decimal values
- However, only the date column has an object type, indicating that it contains strings
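Since date is the lone object column, a natural next step (sketched here on a copy; the lesson itself keeps the object dtype) is to convert it to a real datetime type, assuming the dd-mm-yyyy string format seen in the earlier head() output:

# Hedged sketch: convert the object-typed date column to datetime64 on
# a copy, assuming dd-mm-yyyy strings.
dated = final_data.copy()
dated['date'] = pd.to_datetime(dated['date'], format='%d-%m-%Y')
print(dated['date'].dtype)  # datetime64[ns]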
Check for null values

- Execute the given command to check for unknown values in the DataFrame

final_data.isna()

[Output trimmed: a 1000 x 16 DataFrame of booleans, True wherever a value is missing.]

Observations:

- The isna() function returns a DataFrame of Boolean values that are True for null values
- On a huge dataset, the code given above is not going to help much
- We do not get a good idea of the null values by looking at this tabular output
- The next line of code is more convenient in this case

final_data.isna().sum(axis=0)

instant        0
date           0
season         0
year           0
month          0
hour           0
holiday        0
weekday        0
weather        0
temp           0
atemp         11
humidity       0
windspeed      0
casual         0
registered     0
count          0
dtype: int64

Observations:

- isna().sum(axis=0) provides a clear picture of the number of null values in a DataFrame
- In the given result, we can see that the atemp column has 11 null values

Let's check the percentage of rows with missing values

- We perform this check to determine whether the rows with NA values can be dropped without deviating from our desired model

percentage_of_missing_values = (final_data['atemp'].isna().sum(axis=0) / final_data.shape[0]) * 100
percentage_of_missing_values

1.0999999999999999

Observations:

- We divide the number of null values by the number of rows in the DataFrame to get the percentage of missing values
- Since the percentage is about 1%, it is very low. Usually, industry practice allows dropping rows with missing values up to about 30% of the data, so we can drop these rows.

Drop the rows with missing values

- We will use the dropna function to drop the rows with null values

final_data = final_data.dropna(axis=0)
final_data.shape

(989, 16)

Observations:

- We can see that the shape of the DataFrame reduced from 1000 to 989 rows, which shows that the rows with missing values have been wiped off
- In further lessons of this course, we'll see different methods to treat missing values
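As a preview of those methods, here is a hedged sketch of median imputation; applied instead of dropna(), it would have filled the 11 missing atemp values and kept all 1000 rows (illustrative only, not the lesson's approach):

# Hedged sketch: median imputation as an alternative to dropping rows.
# Run before dropna(), this fills the missing atemp values in place of
# removing the affected rows.
imputed = final_data.copy()
imputed['atemp'] = imputed['atemp'].fillna(imputed['atemp'].median())
print(imputed['atemp'].isna().sum())  # 0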
Now, let's again check the missing value count after the drop:

final_data.isna().sum(axis=0)

instant       0
date          0
season        0
year          0
month         0
hour          0
holiday       0
weekday       0
weather       0
temp          0
atemp         0
humidity      0
windspeed     0
casual        0
registered    0
count         0
dtype: int64

Perform sanity checks on the dataset

- A sanity check verifies the logical correctness of the data points

Check whether casual + registered is always equal to count:

np.sum(final_data['casual'] + final_data['registered'] - final_data['count'])

0

Month values should be in the range 1-12

- We will use the unique() function to find the distinct elements of an array

np.unique(final_data.month)

array([1, 2])

Hour values should be in the range 0-23

np.unique(final_data.hour)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
       16, 17, 18, 19, 20, 21, 22, 23])

Print the statistical summary of the data

- We will use the describe() function to see the statistical summary of the dataset

print(final_data.describe())

[Output trimmed: count, mean, std, min, quartile, and max values for each numeric column over the 989 remaining rows; for example, count has a mean of about 58.6 and a maximum of 249.]

Note: We have now seen almost all the methods of data wrangling. Next, let's look explicitly at detecting and removing outliers.

Import the libraries

- Apart from Pandas and NumPy, this time we are also calling scikit-learn
- Scikit-learn (sklearn) is an open-source module that has some inbuilt datasets, like boston and iris
- Each dataset has a corresponding function used to load it
- These functions follow the same format, load_DATASET(), where DATASET refers to the name of the dataset
- We are importing two dataset loaders from sklearn.datasets in the cell below

import numpy as np
import pandas as pd
from sklearn.datasets import load_boston, load_iris
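One caveat worth adding (not in the original lesson): load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so the import above only works on older versions such as the one in this lab. A hedged workaround sketch for current versions, assuming OpenML's 'boston' dataset mirrors the original and that network access is available:

# Hedged sketch: on scikit-learn >= 1.2 the load_boston import fails;
# fetch_openml can retrieve an equivalent copy (assumes the OpenML
# 'boston' dataset matches the original columns and MEDV target).
from sklearn.datasets import fetch_openml

boston_openml = fetch_openml(name='boston', version=1, as_frame=True)
boston_frame = boston_openml.frame  # features plus the target in one DataFrame
print(boston_frame.shape)           # expected (506, 14)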
Load the Data

- Since these datasets are imported directly from scikit-learn, the load functions (such as load_boston()) do not return data in a tabular format
- The data is stored in the form of keys (words) and values (definitions), like a dictionary structure
- Let's load the dataset and store it in a variable called boston
- Now, we are going to print the keys of the boston dataset

# Find the dict keys
boston = load_boston()
print(boston.keys())

dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])

Observations:

- We get the keys data, target, feature_names, DESCR, and filename
- The first two keys, data and target, hold the actual data; the rest serve a descriptive purpose
- data holds all the input features of the dataset in a NumPy array, and target holds the output feature on which we base the prediction; target is also a NumPy array
- feature_names holds all the column names of the dataset in a NumPy array, DESCR is the description of the dataset, and filename is the path of the file in CSV format

Find the feature names

Let's see the columns in the dataset:

# Find features and target
x = boston.data
y = boston.target
columns = boston.feature_names
columns

array(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD',
       'TAX', 'PTRATIO', 'B', 'LSTAT'], dtype='<U7')
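The cell that builds boston_df did not survive extraction. A minimal reconstruction, assuming the usual pattern of wrapping the feature matrix in a DataFrame and appending the target; the (506, 14) shape printed later implies the MEDV target column is included:

# Assumed reconstruction: boston_df is the 14-column DataFrame used by
# the outlier-treatment cells below.
boston_df = pd.DataFrame(x, columns=columns)  # the 13 feature columns
boston_df['MEDV'] = y                         # append the target as the 14th column
boston_df.head()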
[Output trimmed: the first five rows of boston_df.]

IQR (interquartile range) technique for outlier treatment

def outlier_treatment(col):
    sorted(col)  # no-op: np.percentile does not require sorted input
    Q1, Q3 = np.percentile(col, [25, 75])
    IQR = Q3 - Q1
    lower_range = Q1 - (1.5 * IQR)
    upper_range = Q3 + (1.5 * IQR)
    return lower_range, upper_range

Observations:

- In this technique, we take the 25th and 75th percentiles of the sample
- We then find the IQR as the difference between these two percentiles
- To detect the outliers, we calculate the lower and upper range using the formulas above: lower = Q1 - 1.5 * IQR and upper = Q3 + 1.5 * IQR
- All the values that lie beyond these ranges are considered outliers and must be removed

lower_range, upper_range = outlier_treatment(boston_df['DIS'])
print("Lower Range:", lower_range)
print("Upper Range:", upper_range)

Lower Range: -2.5322000000000005
Upper Range: 9.820800000000002

Observation:

- We have calculated the lower and upper ranges for the DIS feature of our boston_df

lower_boston_df = boston_df[boston_df["DIS"].values < lower_range]
lower_boston_df

[Output trimmed: an empty DataFrame; no DIS value falls below the lower range.]

Let us show the values that lie beyond the upper and lower ranges in our dataset:

upper_boston_df = boston_df[boston_df["DIS"].values > upper_range]
upper_boston_df

[Output trimmed: five rows, at indices 351 through 355, whose DIS values exceed the upper range.]

Observation:

- There are no rows below the lower range, but there are five rows above our upper range

lower_outliers = lower_boston_df.value_counts().sum(axis=0)
upper_outliers = upper_boston_df.value_counts().sum(axis=0)
total_outliers = lower_outliers + upper_outliers
print("Total Number of Outliers:", total_outliers)

Total Number of Outliers: 5

Observation:

- With the given code, we sum up the total number of outlier rows

Let us list the row numbers that contain outliers:

lower_index = list(boston_df[boston_df['DIS'] < lower_range].index)
upper_index = list(boston_df[boston_df['DIS'] > upper_range].index)
total_index = list(lower_index + upper_index)
print(total_index)

[351, 352, 353, 354, 355]

Drop the outlier rows

print("Shape Before Dropping Outlier Rows:", boston_df.shape)
boston_df.drop(total_index, inplace=True)
print("Shape After Dropping Outlier Rows:", boston_df.shape)

Shape Before Dropping Outlier Rows: (506, 14)
Shape After Dropping Outlier Rows: (501, 14)

Observations:

- In the given code, we checked the shape of the dataset before and after dropping the outlier rows
- You can see that the dataset had 506 rows before dropping the outliers and 501 after. Thus, we have successfully dropped the unwanted rows.

print(boston_df.mean())

[Output trimmed: the per-column means of the remaining 501 rows, e.g. CRIM 3.648951, DIS 3.723699, TAX 408.964072.]
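As a closing aside (not from the original lesson), the same removal can be expressed in one step with a boolean mask, avoiding the index bookkeeping and the in-place drop:

# Hedged alternative: keep only the rows whose DIS value lies inside
# the IQR fences; Series.between is inclusive of both bounds.
inside = boston_df['DIS'].between(lower_range, upper_range)
boston_df_clean = boston_df[inside].copy()
print(boston_df_clean.shape)  # (501, 14) when applied to the original 506-row frame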