Lesson 3.1: Data Wrangling
Data Wrangling

Agenda

In this session, we will cover the following concepts with the help of a business use case:

- Data acquisition
- Different methods for data wrangling:
  - Merge datasets
  - Concatenate datasets
  - Identify unique values
  - Drop unnecessary columns
  - Check the dimension of the dataset
  - Check the datatype of the dataset
  - Check the datatype summary
  - Treat missing values
- Validate the correctness of the data at a primary level, if applicable

What Is Data Wrangling?

Data wrangling is the process of converting and formatting data from its raw form into a usable format for the stages further down the data science pipeline.

What Is the Need for Data Wrangling?

Without feeding proper data into a model, one cannot expect a model that is dependable and gives high accuracy.

Problem Statement

You are a junior data scientist, and you are assigned a new task to perform data wrangling on a set of datasets. The datasets have many ambiguities. You have to identify them and apply different data wrangling techniques to obtain a dataset ready for further usage.

Dataset

- Download dataset_1 and dataset_2 from Course Resources and upload the datasets to the lab.

Data Dictionary

Attribute information:

- date - date of the ride
- season - 1 = spring, 2 = summer, 3 = fall, 4 = winter
- holiday - whether the day is considered a holiday
- workingday - whether the day is neither a weekend nor a holiday
- weather
  - 1: Clear, Few clouds, Partly cloudy
  - 2: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
  - 3: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
  - 4: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
- temp - temperature in Celsius
- atemp - "feels like" temperature in Celsius
- humidity - relative humidity
- windspeed - wind speed
- casual - number of non-registered user rentals initiated
- registered - number of registered user rentals initiated
- count - number of total rentals

Import libraries

- Pandas is a high-level data manipulation tool
- NumPy is used for working with multidimensional arrays

import pandas as pd
import numpy as np

print(pd.show_versions())

[Output trimmed: pd.show_versions() lists the installed versions of Python (3.7.6, 64-bit, Linux), pandas, and its optional dependencies.]

Load the first dataset

dataset_1 = pd.read_csv('dataset_1.csv')

Observations:

- We have to upload the dataset in the file explorer on the left panel of the lab
- We read the file into the dataset_1 variable
- The file is in CSV format
- We use the pd.read_csv() function to read a CSV file
- We provide the exact path of the file within the parentheses

Check the type of the dataset

Execute the command below to understand the type of data we have:

type(dataset_1)

pandas.core.frame.DataFrame

Observations:

- The result shows that the dataset is a DataFrame
- A DataFrame is a tabular structure consisting of rows and columns

Shape of the dataset

dataset_1.shape

(610, 10)

Observation:

- dataset_1 has 610 rows and 10 columns

Print the first 5 rows of the dataset

dataset_1.head()

[Output trimmed: the first five rows, with columns instant, dteday, season, yr, mnth, hr, holiday, weekday, weathersit, and temp.]

Observation:

- The dataset_1.head() function displays only the first five rows of the dataset
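As a side note (not part of the original lesson), pd.read_csv() can do some light wrangling at load time. A minimal sketch, assuming dataset_1.csv holds dd-mm-yyyy dteday strings as the head() output suggests; parse_dates and dayfirst are standard pandas options, and dataset_1_typed is a hypothetical variable name:

# A hedged sketch: parse the date column while loading, instead of
# fixing its dtype later. Assumes dteday holds dd-mm-yyyy strings.
import pandas as pd

dataset_1_typed = pd.read_csv(
    'dataset_1.csv',
    parse_dates=['dteday'],  # read the ride date as datetime64 up front
    dayfirst=True,           # head() showed dates like 01-01-2017 (day first)
)
print(dataset_1_typed['dteday'].dtype)  # datetime64[ns]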
Load the second dataset

- Use the read function carefully, since this one is an Excel file

dataset_2 = pd.read_excel('dataset_2.xlsx')

Shape of the dataset

dataset_2.shape

(610, 8)

Observation:

- The result shows that dataset_2 has 610 rows and 8 columns

Print the first 5 rows of the dataset

dataset_2.head()

[Output trimmed: the first five rows, with columns Unnamed: 0, instant, atemp, hum, windspeed, casual, registered, and cnt.]

Observation:

- We can see a column named Unnamed: 0, which is not in the data dictionary. Let's remove it.

Drop the column

dataset_2 = dataset_2.drop(["Unnamed: 0"], axis=1)

Let's check the shape of the dataset again after the drop:

dataset_2.shape

(610, 7)

Observations:

- We had 8 columns before the drop
- When we check the shape of the file after the drop, we see that the column Unnamed: 0 has been dropped

Top 5 rows of the dataset

- Let's check dataset_2 again

dataset_2.head()

[Output trimmed: the same five rows, now without the Unnamed: 0 column.]

Observation:

- dataset_2 no longer has the Unnamed: 0 column

Merge the datasets

- We have two datasets: dataset_1 and dataset_2
- As both datasets have one common column, instant, let's merge the datasets on that column
- We are going to save the result in combined_data, as shown below

combined_data = pd.merge(dataset_1, dataset_2, on='instant')

Check the shape of the combined dataset

combined_data.shape

(610, 16)

Observation:

- combined_data has 610 rows and 16 columns

Top 5 rows of the combined dataset

combined_data.head()

[Output trimmed: the first five rows, now carrying all 16 columns from both datasets.]
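A note on the merge above, sketched with toy frames that are not from the lesson: pd.merge() defaults to an inner join, which is why all 610 rows survive here (both datasets cover the same instant values); the how argument controls what happens when keys do not match:

# Hedged illustration of pd.merge join types on a shared key column.
import pandas as pd

left = pd.DataFrame({'instant': [1, 2, 3], 'temp': [0.24, 0.22, 0.22]})
right = pd.DataFrame({'instant': [2, 3, 4], 'cnt': [40, 32, 13]})

inner = pd.merge(left, right, on='instant')                  # keys in both: 2, 3
left_only = pd.merge(left, right, on='instant', how='left')  # all left keys; cnt is NaN for 1
outer = pd.merge(left, right, on='instant', how='outer')     # union of keys: 1, 2, 3, 4
print(inner.shape, left_only.shape, outer.shape)             # (2, 3) (3, 3) (4, 3)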
Now, load the third dataset

- The dataset is saved in an S3 bucket; we are going to download dataset_3

!pip install wget

[Output trimmed: wget 3.2 is already installed.]

import wget
url = 'https://datasciencetrack.s3.us-east-2.amazonaws.com/dataset_3.csv'
wget.download(url)

'dataset_3 (2).csv'

Observations:

- wget downloads the file directly from the main server once we import the wget module
- As shown above, we need to specify only the URL of the file
- You will see the downloaded dataset_3.csv in your lab's file explorer

Import the dataset

dataset_3 = pd.read_csv('dataset_3.csv')

Check the shape of the dataset

dataset_3.shape

(390, 16)

Top 5 rows of the dataset

dataset_3.head()

[Output trimmed: the first five rows of dataset_3.]

Bottom 15 rows of the dataset

- Just like the head function, the tail function is used to see the bottom rows of the dataset
- If you want to see a specific number of rows, specify the number inside the parentheses, as shown below

dataset_3.tail(15)

[Output trimmed: the bottom 15 rows; instant values in the 990s are followed by values in the 510s, so the rows are out of order.]

- The bottom 15 rows of dataset_3 are shown above, as we passed 15 inside the parentheses
- Here, we can see that the rows are not sorted well according to the instant number. Let's resolve that.

Sort values of a column

To sort the values as we want, we use the sort_values function; in the by argument we pass the name of the column to sort on, as shown below:

dataset_3 = dataset_3.sort_values(by=['instant'])

Let's check head and tail to verify the sort operation:

dataset_3.head()

[Output trimmed: the first five rows, now in ascending instant order.]

dataset_3.tail()

[Output trimmed: the last five rows, ending at instant 1000.]

Concatenate combined_data with dataset_3

- Let's concatenate the two DataFrames, combined_data and dataset_3, into a single DataFrame using the concat function, as shown below
- Store the final DataFrame in the final_data variable

final_data = pd.concat([combined_data, dataset_3])

Check the shape of the new dataset

final_data.shape

(1000, 16)

Observation:

- final_data now has 1000 rows and 16 columns

Let's rename the columns of the final_data DataFrame:

final_data = final_data.rename(columns={'dteday': 'date', 'yr': 'year', 'mnth': 'month',
                                        'hr': 'hour', 'weathersit': 'weather',
                                        'hum': 'humidity', 'cnt': 'count'})

final_data.head()

[Output trimmed: the first five rows with the renamed columns date, year, month, hour, weather, humidity, and count.]

Data types of different column values

final_data.dtypes

instant          int64
date            object
season           int64
year             int64
month            int64
hour             int64
holiday           bool
weekday          int64
weather          int64
temp           float64
atemp          float64
humidity       float64
windspeed      float64
casual           int64
registered       int64
count            int64
dtype: object

Observations:

- We can see that the majority of our data columns are of type int64; they are therefore 64-bit integers
- Some of the columns are of type float64, which implies that they contain decimal values
- However, only the date column has an object type, indicating that it contains strings
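Since date is the lone object column, a natural next step (sketched here on a copy; the lesson itself keeps the object dtype) is to convert it to a real datetime type, assuming the dd-mm-yyyy string format seen in the earlier head() output:

# Hedged sketch: convert the object-typed date column to datetime64 on
# a copy, assuming dd-mm-yyyy strings.
dated = final_data.copy()
dated['date'] = pd.to_datetime(dated['date'], format='%d-%m-%Y')
print(dated['date'].dtype)  # datetime64[ns]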
Check for null values

- Execute the given command to check for unknown values in the DataFrame

final_data.isna()

[Output trimmed: a 1000 x 16 DataFrame of booleans, True wherever a value is missing.]

Observations:

- The isna() function returns a DataFrame of Boolean values that are True for null values
- On a huge dataset, the code given above is not going to help much
- We do not get a good idea of the null values by looking at this tabular output
- The next line of code is more convenient in this case

final_data.isna().sum(axis=0)

instant        0
date           0
season         0
year           0
month          0
hour           0
holiday        0
weekday        0
weather        0
temp           0
atemp         11
humidity       0
windspeed      0
casual         0
registered     0
count          0
dtype: int64

Observations:

- isna().sum(axis=0) provides a clear picture of the number of null values in a DataFrame
- In the given result, we can see that the atemp column has 11 null values

Let's check the percentage of rows with missing values

- We perform this check to determine whether the rows with NA values can be dropped without deviating from our desired model

percentage_of_missing_values = (final_data['atemp'].isna().sum(axis=0) / final_data.shape[0]) * 100
percentage_of_missing_values

1.0999999999999999

Observations:

- We divide the number of null values by the number of rows in the DataFrame to get the percentage of missing values
- Since the percentage is about 1%, it is very low. Usually, industry practice allows dropping rows with missing values up to about 30% of the data, so we can drop these rows.

Drop the rows with missing values

- We will use the dropna function to drop the rows with null values

final_data = final_data.dropna(axis=0)
final_data.shape

(989, 16)

Observations:

- We can see that the shape of the DataFrame reduced from 1000 to 989 rows, which shows that the rows with missing values have been wiped off
- In further lessons of this course, we'll see different methods to treat missing values
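As a preview of those methods, here is a hedged sketch of median imputation; applied instead of dropna(), it would have filled the 11 missing atemp values and kept all 1000 rows (illustrative only, not the lesson's approach):

# Hedged sketch: median imputation as an alternative to dropping rows.
# Run before dropna(), this fills the missing atemp values in place of
# removing the affected rows.
imputed = final_data.copy()
imputed['atemp'] = imputed['atemp'].fillna(imputed['atemp'].median())
print(imputed['atemp'].isna().sum())  # 0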
Now, let's again check the missing value count after the drop:

final_data.isna().sum(axis=0)

instant       0
date          0
season        0
year          0
month         0
hour          0
holiday       0
weekday       0
weather       0
temp          0
atemp         0
humidity      0
windspeed     0
casual        0
registered    0
count         0
dtype: int64

Perform sanity checks on the dataset

- A sanity check verifies the logical correctness of the data points

Check whether casual + registered is always equal to count:

np.sum(final_data['casual'] + final_data['registered'] - final_data['count'])

0

Month values should be in the range 1-12

- We will use the unique() function to find the distinct elements of an array

np.unique(final_data.month)

array([1, 2])

Hour values should be in the range 0-23

np.unique(final_data.hour)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15,
       16, 17, 18, 19, 20, 21, 22, 23])

Print the statistical summary of the data

- We will use the describe() function to see the statistical summary of the dataset

print(final_data.describe())

[Output trimmed: count, mean, std, min, quartile, and max values for each numeric column over the 989 remaining rows; for example, count has a mean of about 58.6 and a maximum of 249.]

Note: We have now seen almost all the methods of data wrangling. Next, let's look explicitly at detecting and removing outliers.

Import the libraries

- Apart from Pandas and NumPy, this time we are also calling scikit-learn
- Scikit-learn (sklearn) is an open-source module that has some inbuilt datasets, like boston and iris
- Each dataset has a corresponding function used to load it
- These functions follow the same format, load_DATASET(), where DATASET refers to the name of the dataset
- We are importing two dataset loaders from sklearn.datasets in the cell below

import numpy as np
import pandas as pd
from sklearn.datasets import load_boston, load_iris
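One caveat worth adding (not in the original lesson): load_boston was deprecated in scikit-learn 1.0 and removed in 1.2, so the import above only works on older versions such as the one in this lab. A hedged workaround sketch for current versions, assuming OpenML's 'boston' dataset mirrors the original and that network access is available:

# Hedged sketch: on scikit-learn >= 1.2 the load_boston import fails;
# fetch_openml can retrieve an equivalent copy (assumes the OpenML
# 'boston' dataset matches the original columns and MEDV target).
from sklearn.datasets import fetch_openml

boston_openml = fetch_openml(name='boston', version=1, as_frame=True)
boston_frame = boston_openml.frame  # features plus the target in one DataFrame
print(boston_frame.shape)           # expected (506, 14)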
Load the Data

- Since these datasets are imported directly from scikit-learn, the load functions (such as load_boston()) do not return data in a tabular format
- The data is stored in the form of keys (words) and values (definitions), like a dictionary structure
- Let's load the dataset and store it in a variable called boston
- Now, we are going to print the keys of the boston dataset

# Find the dict keys
boston = load_boston()
print(boston.keys())

dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])

Observations:

- We get the keys data, target, feature_names, DESCR, and filename
- The first two keys, data and target, hold the actual data; the rest serve a descriptive purpose
- data holds all the input features of the dataset in a NumPy array, and target holds the output feature on which we base the prediction; target is also a NumPy array
- feature_names holds all the column names of the dataset in a NumPy array, DESCR is the description of the dataset, and filename is the path of the file in CSV format

Find the feature names

Let's see the columns in the dataset:

# Find features and target
x = boston.data
y = boston.target
columns = boston.feature_names
columns

array(['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD',
       'TAX', 'PTRATIO', 'B', 'LSTAT'], dtype='<U7')
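The cell that builds boston_df did not survive extraction. A minimal reconstruction, assuming the usual pattern of wrapping the feature matrix in a DataFrame and appending the target; the (506, 14) shape printed later implies the MEDV target column is included:

# Assumed reconstruction: boston_df is the 14-column DataFrame used by
# the outlier-treatment cells below.
boston_df = pd.DataFrame(x, columns=columns)  # the 13 feature columns
boston_df['MEDV'] = y                         # append the target as the 14th column
boston_df.head()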
[Output trimmed: the first five rows of boston_df.]

IQR (interquartile range) technique for outlier treatment

def outlier_treatment(col):
    sorted(col)  # no-op: np.percentile does not require sorted input
    Q1, Q3 = np.percentile(col, [25, 75])
    IQR = Q3 - Q1
    lower_range = Q1 - (1.5 * IQR)
    upper_range = Q3 + (1.5 * IQR)
    return lower_range, upper_range

Observations:

- In this technique, we take the 25th and 75th percentiles of the sample
- We then find the IQR as the difference between these two percentiles
- To detect the outliers, we calculate the lower and upper range using the formulas above: lower = Q1 - 1.5 * IQR and upper = Q3 + 1.5 * IQR
- All the values that lie beyond these ranges are considered outliers and must be removed

lower_range, upper_range = outlier_treatment(boston_df['DIS'])
print("Lower Range:", lower_range)
print("Upper Range:", upper_range)

Lower Range: -2.5322000000000005
Upper Range: 9.820800000000002

Observation:

- We have calculated the lower and upper ranges for the DIS feature of our boston_df

lower_boston_df = boston_df[boston_df["DIS"].values < lower_range]
lower_boston_df

[Output trimmed: an empty DataFrame; no DIS value falls below the lower range.]

Let us show the values that lie beyond the upper and lower ranges in our dataset:

upper_boston_df = boston_df[boston_df["DIS"].values > upper_range]
upper_boston_df

[Output trimmed: five rows, at indices 351 through 355, whose DIS values exceed the upper range.]

Observation:

- There are no rows below the lower range, but there are five rows above our upper range

lower_outliers = lower_boston_df.value_counts().sum(axis=0)
upper_outliers = upper_boston_df.value_counts().sum(axis=0)
total_outliers = lower_outliers + upper_outliers
print("Total Number of Outliers:", total_outliers)

Total Number of Outliers: 5

Observation:

- With the given code, we sum up the total number of outlier rows

Let us list the row numbers that contain outliers:

lower_index = list(boston_df[boston_df['DIS'] < lower_range].index)
upper_index = list(boston_df[boston_df['DIS'] > upper_range].index)
total_index = list(lower_index + upper_index)
print(total_index)

[351, 352, 353, 354, 355]

Drop the outlier rows

print("Shape Before Dropping Outlier Rows:", boston_df.shape)
boston_df.drop(total_index, inplace=True)
print("Shape After Dropping Outlier Rows:", boston_df.shape)

Shape Before Dropping Outlier Rows: (506, 14)
Shape After Dropping Outlier Rows: (501, 14)

Observations:

- In the given code, we checked the shape of the dataset before and after dropping the outlier rows
- You can see that the dataset had 506 rows before dropping the outliers and 501 after. Thus, we have successfully dropped the unwanted rows.

print(boston_df.mean())

[Output trimmed: the per-column means of the remaining 501 rows, e.g. CRIM 3.648951, DIS 3.723699, TAX 408.964072.]
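As a closing aside (not from the original lesson), the same removal can be expressed in one step with a boolean mask, avoiding the index bookkeeping and the in-place drop:

# Hedged alternative: keep only the rows whose DIS value lies inside
# the IQR fences; Series.between is inclusive of both bounds.
inside = boston_df['DIS'].between(lower_range, upper_range)
boston_df_clean = boston_df[inside].copy()
print(boston_df_clean.shape)  # (501, 14) when applied to the original 506-row frame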