Pandas 12 Pivot Table and Drop - A
Mostafa Elhosseini
Professor at Computers Engineering and Sys. Dept
Faculty of Engineering
Mansoura University
https://youtube.com/drmelhosseini
| Dataset
| Group by Country and City Vs. Year
▪ Slicing is a technique for selecting consecutive elements from objects
▪ Sort the index before you slice, using the sort_index method
▪ Recall that you can also sort rows by the values in a column using
the sort_values method, passing in the name of that column
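A minimal sketch of the points above. The table, its column names (country, city, avg_temp_c) and all values are made up for illustration and are not the lecture dataset.

```python
import pandas as pd

# Hypothetical data: names and values are illustrative only
temps = pd.DataFrame({
    "country": ["Egypt", "Brazil", "Egypt", "Canada", "Brazil"],
    "city": ["Cairo", "Rio", "Alexandria", "Toronto", "Brasilia"],
    "avg_temp_c": [22.0, 24.5, 20.5, 9.0, 21.0],
})

# Sort rows by the values in one column
by_temp = temps.sort_values("avg_temp_c")

# Move country and city into the index, then sort the index before slicing
temps_ind = temps.set_index(["country", "city"]).sort_index()

# Slice the outer index level: label slices include both endpoints
brazil_to_canada = temps_ind.loc["Brazil":"Canada"]

# Slice on both index levels by passing tuples
subset = temps_ind.loc[("Brazil", "Rio"):("Egypt", "Alexandria")]
print(subset)
```

Unlike positional slicing, label-based slicing with .loc keeps the final label, so the last row named in the slice is part of the result.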
| Group by Country and City Vs. Year
▪ In essence, pivot tables are just DataFrames with sorted indexes
▪ To subset a pivot table, you can therefore use the techniques you
already know for DataFrames with sorted indexes
▪ The combination of .loc[] and slicing is particularly helpful
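The idea can be sketched as follows, assuming a small made-up table in long format. The names temps, avg_temp_c and temp_by_country_city_vs_year, and all values, are hypothetical.

```python
import pandas as pd

# Hypothetical long-format data: one row per city and year
temps = pd.DataFrame({
    "country": ["Egypt"] * 4 + ["Brazil"] * 2 + ["Canada"] * 2,
    "city": ["Cairo", "Cairo", "Alexandria", "Alexandria",
             "Rio", "Rio", "Toronto", "Toronto"],
    "year": [2010, 2011, 2010, 2011, 2010, 2011, 2010, 2011],
    "avg_temp_c": [22.0, 23.0, 20.0, 21.0, 24.0, 25.0, 8.0, 10.0],
})

# Country and city become the (sorted) row index, years become the columns;
# the default aggregation of pivot_table is the mean
temp_by_country_city_vs_year = temps.pivot_table(
    values="avg_temp_c", index=["country", "city"], columns="year"
)

# Subset rows on the outer index level
rows = temp_by_country_city_vs_year.loc["Brazil":"Canada"]

# Subset rows and columns at once: tuples for the rows, a year range for the columns
block = temp_by_country_city_vs_year.loc[
    ("Canada", "Toronto"):("Egypt", "Cairo"), 2011:2011
]
print(block)
```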
| mean method – axis argument
▪ The default value is "index" (axis=0)
▪ The statistic is calculated across the rows, giving one value per
column
| mean method – axis argument
▪ To calculate summary statistics for each row, that is, across the
columns, pass axis="columns"
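A short sketch of both settings of the axis argument. The small table below is built by hand as a stand-in for a pivot table, and its values are made up.

```python
import pandas as pd

# Hypothetical stand-in for a pivot table: countries as rows, years as columns
pivot = pd.DataFrame(
    {2010: [24.0, 8.0, 22.0], 2011: [25.0, 10.0, 23.0]},
    index=pd.Index(["Brazil", "Canada", "Egypt"], name="country"),
)
pivot.columns.name = "year"

# Default, axis="index": aggregate across the rows, one mean per year
mean_by_year = pivot.mean(axis="index")

# axis="columns": aggregate across the columns, one mean per country
mean_by_country = pivot.mean(axis="columns")

print(mean_by_year)
print(mean_by_country)
```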
| mean method – axis argument
▪ Most DataFrames hold a different kind of data in each column, so
calculating statistics across the columns rarely makes sense for them
▪ Pivot tables are a special case: every column contains the same kind
of data, so summaries along either axis are meaningful
| Feature Selection and Engineering
▪ In many datasets, attributes that provide no predictive power
should be removed before modeling, for example:
— unique identifiers such as phone numbers,
— social security numbers, and
— account numbers
▪ Dropping columns from Pandas DataFrames is possible via the drop
method
| Drop
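A minimal sketch of dropping identifier columns. The customers table, its column names and its values are hypothetical.

```python
import pandas as pd

# Hypothetical data: account_number and phone are identifiers, not predictors
customers = pd.DataFrame({
    "account_number": [1001, 1002, 1003],
    "phone": ["555-0100", "555-0101", "555-0102"],
    "age": [34, 51, 27],
    "balance": [1200.0, 560.5, 89.9],
})

# drop returns a new DataFrame and leaves the original unchanged
features = customers.drop(columns=["account_number", "phone"])

# Equivalent spelling: pass the labels and say which axis they belong to
features_alt = customers.drop(["account_number", "phone"], axis="columns")

# Rows are dropped by index label in the same way
without_first = customers.drop(index=0)

print(features)
```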
| Dropping Correlated features
▪ Highly correlated features can also be removed before modeling,
since they add little additional information
▪ Correlation can be explored using the corr method
▪ corr will find the Pearson correlation (default) between the columns
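One possible way to act on this, shown as a sketch rather than a fixed recipe. The data are made up, and the 0.95 threshold is an assumption that should be tuned for the problem at hand.

```python
import numpy as np
import pandas as pd

# Hypothetical data: height_in is height_cm in other units, so the two
# columns carry almost the same information
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0, 190.0],
    "height_in": [59.1, 63.0, 66.9, 70.9, 74.8],
    "age": [30, 22, 45, 28, 39],
})

# Pairwise Pearson correlation between the columns (the default method)
corr_matrix = df.corr()

# Keep only the upper triangle so that each pair is considered once
mask = np.triu(np.ones(corr_matrix.shape, dtype=bool), k=1)
upper = corr_matrix.abs().where(mask)

# Assumed threshold: drop the later column of any pair above it
threshold = 0.95
to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]

reduced = df.drop(columns=to_drop)
print(to_drop)
```

Which member of a correlated pair to keep is a judgment call; this sketch simply keeps the column that appears first.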