Pandas 12 Pivot Table and Drop - A

This lecture explores fundamental techniques for data analysis and model preparation. Topics covered include:
▪ Pivot Table Functionality and Construction: learn to use pivot tables for data summarization and effective visualization.
▪ Descriptive Statistics with mean: explore the mean function for calculating central tendency and its application with the axis argument.
▪ Feature Selection and Engineering: understand the importance of feature selection and various feature engineering techniques, …

Pandas 12

Working with Pivot table and Drop

Mostafa Elhosseini
Professor at Computers Engineering and Sys. Dept
Faculty of Engineering
Mansoura University
https://youtube.com/drmelhosseini
| Dataset
| Group by Country and City Vs. Year
▪ Slicing is a technique for selecting consecutive elements from objects
▪ Sort the index (with sort_index) before you slice; slicing a MultiIndex with .loc requires a sorted index
▪ Recall that you can sort rows with the sort_values method, passing in the name of the column you want to sort by
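A minimal sketch of sorting and slicing, using a small hypothetical DataFrame (the columns country, city and value are assumptions, not the lecture's dataset). It sorts rows by a column with sort_values, then sorts a two-level index with sort_index before slicing it with .loc.

```python
import pandas as pd

# Hypothetical data; the lecture's dataset is not reproduced here
df = pd.DataFrame({
    "country": ["Egypt", "Brazil", "Egypt", "Brazil", "China"],
    "city": ["Cairo", "Sao Paulo", "Alexandria", "Rio de Janeiro", "Beijing"],
    "value": [22.1, 19.8, 20.5, 24.0, 12.3],
})

# sort_values: sort the rows by the values in one column (ascending by default)
by_value = df.sort_values("value")

# Move country and city into the index, then sort it so slicing is well defined
df_ind = df.set_index(["country", "city"]).sort_index()

# Slice the outer level: every row from "Brazil" through "China", both ends inclusive
outer = df_ind.loc["Brazil":"China"]

# Slice with (country, city) tuples to start and stop inside the outer groups
inner = df_ind.loc[("Brazil", "Sao Paulo"):("Egypt", "Alexandria")]
```

Note that label slicing with .loc includes both endpoints. Without the sort_index call, pandas may raise an error or warn when slicing the MultiIndex.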
| Group by Country and City Vs. Year
▪ In essence, pivot tables are just DataFrames with sorted indexes
▪ To subset a pivot table, you can therefore use the same techniques you already know for DataFrames with sorted indexes
▪ Thus, everything you learned previously still applies
▪ Combining .loc[] with slicing is particularly helpful
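The sketch below builds a pivot table from hypothetical long-format data (the columns country, city, year and value are illustrative assumptions) and subsets it with .loc and slicing, exactly as with any DataFrame whose index is sorted.

```python
import pandas as pd

# Hypothetical long-format data: one row per (country, city, year) measurement
rows = [
    ("Egypt", "Cairo", 2010, 22.0), ("Egypt", "Cairo", 2011, 22.4), ("Egypt", "Cairo", 2012, 22.6),
    ("Egypt", "Alexandria", 2010, 20.1), ("Egypt", "Alexandria", 2011, 20.3), ("Egypt", "Alexandria", 2012, 20.6),
    ("Brazil", "Rio de Janeiro", 2010, 24.0), ("Brazil", "Rio de Janeiro", 2011, 24.2), ("Brazil", "Rio de Janeiro", 2012, 24.5),
]
df = pd.DataFrame(rows, columns=["country", "city", "year", "value"])

# Rows: country and city; columns: year; cells: mean of "value" (the default aggfunc)
pivot = df.pivot_table(values="value", index=["country", "city"], columns="year")

# The result is an ordinary DataFrame with a sorted index, so .loc works as usual
egypt = pivot.loc["Egypt"]                  # all Egyptian cities, indexed by city
cairo = pivot.loc[("Egypt", "Cairo")]       # one row, returned as a Series
# Combine a row slice (by (country, city) tuples) with a column slice (by year)
window = pivot.loc[("Brazil", "Rio de Janeiro"):("Egypt", "Alexandria"), 2011:2012]
```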
| mean method – axis argument
▪ The default value is axis="index" (equivalently, axis=0)
▪ The statistic is calculated across the rows, giving one value per column
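A minimal sketch of the default behavior on a small hypothetical pivot table with cities as rows and years as columns (all names and numbers are made up for illustration): mean() without an axis argument returns one mean per column, that is, per year.

```python
import pandas as pd

# Hypothetical pivot table: rows are cities, columns are years
pivot = pd.DataFrame(
    {2010: [20.0, 24.0, 22.0], 2011: [21.0, 25.0, 23.0]},
    index=pd.Index(["Alexandria", "Rio de Janeiro", "Cairo"], name="city"),
)
pivot.columns.name = "year"

# No axis argument: the default axis="index" (axis=0) aggregates down each column
per_year = pivot.mean()                  # 2010 -> 22.0, 2011 -> 23.0
explicit = pivot.mean(axis="index")      # same result, axis spelled out
```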
| mean method – axis argument
▪ To calculate summary statistics for each row, across the columns, set axis="columns" (equivalently, axis=1)
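Setting axis="columns" on the same kind of hypothetical pivot table (cities as rows, years as columns, made-up numbers) returns one mean per row, that is, per city averaged across the years.

```python
import pandas as pd

# Hypothetical pivot table: rows are cities, columns are years
pivot = pd.DataFrame(
    {2010: [20.0, 24.0, 22.0], 2011: [21.0, 25.0, 23.0]},
    index=pd.Index(["Alexandria", "Rio de Janeiro", "Cairo"], name="city"),
)
pivot.columns.name = "year"

# axis="columns" (axis=1) aggregates across each row: one mean per city
per_city = pivot.mean(axis="columns")    # Alexandria -> 20.5, Rio de Janeiro -> 24.5, Cairo -> 22.5
```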
| mean method – axis argument
▪ In a typical DataFrame, each column holds a different kind of data, so computing statistics across the columns (axis="columns") usually does not make sense
▪ Pivot tables are a special case: every column contains the same kind of data, so row-wise statistics are meaningful
| Feature Selection and Engineering
▪ In many datasets, attributes that provide no predictive power should be removed before modeling, for example unique identifiers such as
— phone numbers,
— social security numbers, and
— account numbers
▪ Columns can be dropped from a pandas DataFrame with the drop method
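A minimal sketch of removing identifier columns with drop, on a hypothetical customer table (the column names and values are assumptions for illustration). Passing the columns= keyword is equivalent to passing the labels together with axis="columns".

```python
import pandas as pd

# Hypothetical customer data; the identifier columns carry no predictive signal
customers = pd.DataFrame({
    "account_number": ["A-001", "A-002", "A-003"],
    "phone_number": ["555-0101", "555-0102", "555-0103"],
    "age": [34, 51, 27],
    "balance": [1200.0, 560.5, 3300.0],
})

# drop returns a new DataFrame; the original is left unchanged
features = customers.drop(columns=["account_number", "phone_number"])

# Equivalent call: labels plus axis="columns" (or axis=1)
features_alt = customers.drop(["account_number", "phone_number"], axis="columns")
```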
| Drop
| Dropping Correlated features
▪ Highly correlated features can also be removed, since they add little information beyond what the features they correlate with already provide
▪ Correlation can be explored using the corr method
▪ By default, corr computes the pairwise Pearson correlation between the columns; method="kendall" and method="spearman" are also available
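One common way to act on the corr output, sketched here as an assumption rather than the lecture's own method: take the absolute correlation matrix, keep only its upper triangle so each pair is counted once, and drop any column whose correlation with an earlier column exceeds a chosen threshold. The data and the 0.95 threshold are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric features; "height_in" is just "height_cm" in another unit
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "weight_kg": [55.0, 52.0, 70.0, 68.0, 85.0],
})
df["height_in"] = df["height_cm"] / 2.54

# Pairwise Pearson correlation matrix (method="pearson" is the default)
corr = df.corr()

# Keep only the upper triangle (k=1 excludes the diagonal) so each pair appears once
mask = np.triu(np.ones(corr.shape), k=1).astype(bool)
upper = corr.abs().where(mask)

# Hypothetical threshold: drop a column if it correlates above 0.95 with an earlier column
threshold = 0.95
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
reduced = df.drop(columns=to_drop)
```

Here height_in is dropped because it is perfectly correlated with height_cm, while weight_kg (correlation of roughly 0.91 with height) is kept. The threshold is a judgment call and depends on the dataset and model.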
