Data Manipulation in Python using Pandas

06-11-2024

GM KOUSHIKA PRIYADHARSHINI
Research Scholar
Data Manipulation
• Data manipulation - Organizing and refining raw data for analysis, including tasks
like cleaning, merging, and transforming data.
• In Python, the Pandas library provides efficient tools for performing these data
manipulation tasks.

Data Manipulation Techniques

• Data Cleaning
• Data Transformation
• Filtering and Selection
• Data Aggregation and Grouping
• Reshaping and Pivoting
• Sorting and Ordering
• Index Manipulations
• Exporting Data
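Why Pandas?
Pandas builds on NumPy and addresses several limitations of plain NumPy arrays:
• Categorical Data: NumPy has no dedicated categorical type, and a regular array holds a
single data type rather than a different type per column.
• DataFrames and Relational Operations: NumPy does not directly support tasks like merging
or joining tables based on specific column values.
• Lack of Labels: NumPy arrays have no row or column labels; elements are accessed by
integer position.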
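
The three points above can be illustrated with a small sketch. The student names, subjects, and cities are made-up sample data, not part of the original slides.

```python
import numpy as np
import pandas as pd

# A plain NumPy array: one dtype, addressed by integer position only.
scores = np.array([[85, 90], [70, 65]])
first_row_second_col = scores[0, 1]

# The same data as a DataFrame with row and column labels (hypothetical names).
df = pd.DataFrame(scores, index=["alice", "bob"], columns=["math", "physics"])
alice_physics = df.loc["alice", "physics"]  # label-based access

# Columns of different types, including a categorical one, in a single table.
df["grade"] = pd.Categorical(["A", "B"])

# A relational join on a column value.
students = df.reset_index().rename(columns={"index": "name"})
cities = pd.DataFrame({"name": ["alice", "bob"], "city": ["Chennai", "Madurai"]})
merged = students.merge(cities, on="name", how="left")
```

The label-based lookup and the positional lookup return the same value here; the difference is that the DataFrame version stays readable when rows or columns are reordered.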
Data Cleaning
Data cleaning involves preparing raw data by handling inconsistencies, errors, and missing
values.

• Handling Missing Values - dropna(), fillna()

• Handling Duplicates - duplicated(), drop_duplicates()
• Data Type Conversion - astype(), pd.to_datetime(), pd.to_numeric(), astype('category')
• String Cleaning and Manipulation - str.strip(), str.lower(), str.replace()
• Outlier Detection and Handling - statistical methods or conditional filtering
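
A minimal sketch of the cleaning steps listed above, applied in one pass. The table and its column names are hypothetical sample data, and the 1.5 × IQR rule is only one of several possible statistical methods for flagging outliers.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with untidy strings, a duplicate row,
# a non-numeric age, a missing date, and an extreme salary.
raw = pd.DataFrame({
    "name": ["  Alice ", "BOB", "BOB", "carol", "Dave ", "erin", "frank"],
    "age": ["25", "31", "31", "n/a", "40", "29", "35"],
    "joined": ["2024-01-05", "2024-02-10", "2024-02-10", "2024-03-15",
               None, "2024-04-01", "2024-04-20"],
    "salary": [50000.0, 52000.0, 52000.0, 51000.0, np.nan, 49000.0, 900000.0],
})

clean = raw.copy()

# String cleaning: trim whitespace and normalise case.
clean["name"] = clean["name"].str.strip().str.lower()

# Type conversion: unparseable ages become NaN instead of raising.
clean["age"] = pd.to_numeric(clean["age"], errors="coerce")
clean["joined"] = pd.to_datetime(clean["joined"])

# Duplicates: count them, then drop them.
n_dupes = int(clean.duplicated().sum())
clean = clean.drop_duplicates()

# Missing values: fill salary with the median, drop rows without an age.
clean["salary"] = clean["salary"].fillna(clean["salary"].median())
clean = clean.dropna(subset=["age"])

# Outliers: keep salaries within 1.5 * IQR of the quartiles.
q1, q3 = clean["salary"].quantile([0.25, 0.75])
iqr = q3 - q1
clean = clean[clean["salary"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]
```

The order of the steps matters: normalising strings before calling duplicated() lets rows that differ only in case or whitespace be recognised as duplicates.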
Data Transformation
Transforming data to make it more suitable for analysis, including scaling, encoding, and
feature engineering.

• Scaling and Normalization - MinMaxScaler, StandardScaler (from scikit-learn's sklearn.preprocessing)

• Encoding Categorical Variables - pd.get_dummies()
• Feature Engineering - Creating new columns based on existing ones.

Example: df['new_col'] = df['col1'] * df['col2']
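
A minimal sketch of the three transformations using pandas alone. MinMaxScaler and StandardScaler are scikit-learn classes; the arithmetic below is a hand-written stand-in for what they compute on a single column, not a call to those classes. The column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "price": [10.0, 20.0, 30.0, 40.0],
    "quantity": [1, 2, 3, 4],
    "color": ["red", "blue", "red", "green"],
})

# Feature engineering: a new column derived from existing ones.
df["revenue"] = df["price"] * df["quantity"]

# Min-max scaling to the range [0, 1].
price = df["price"]
df["price_minmax"] = (price - price.min()) / (price.max() - price.min())

# Standardization to zero mean and unit variance.
# ddof=0 uses the population standard deviation, as StandardScaler does.
df["price_std"] = (price - price.mean()) / price.std(ddof=0)

# One-hot encoding of the categorical column.
encoded = pd.get_dummies(df, columns=["color"])
```

For a train/test workflow the scikit-learn scalers are usually the better choice, since they remember the fitted minimum, maximum, mean, and standard deviation and can reapply them to new data.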

Filtering and Selection
Extracting specific data based on conditions or specific criteria.

• Row Selection - Boolean indexing: df[df['column'] > 50]

.loc[] selects rows by label; .iloc[] selects rows by integer position.

• Column Selection - Select single or multiple columns: df[['col1', 'col2']]

Select columns by data type: df.select_dtypes(include=[...])

• Conditional Filtering - Use conditions with logical operators: (df['col1'] > 50) & (df['col2'] < 20).
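
The selection techniques above can be sketched on a small table. The column names follow the slide's placeholders; the values and index labels are made up.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "col1": [10, 60, 75, 90],
        "col2": [5, 15, 30, 10],
        "label": ["a", "b", "c", "d"],
    },
    index=["w", "x", "y", "z"],
)

# Boolean indexing: rows where col1 exceeds 50.
high = df[df["col1"] > 50]

# .loc slices by label and includes both endpoints.
by_label = df.loc["x":"y", ["col1", "label"]]

# .iloc slices by integer position and excludes the end position.
by_position = df.iloc[0:2]

# Columns chosen by data type.
numeric_only = df.select_dtypes(include=["number"])

# Combined conditions: each comparison needs its own parentheses
# because & binds more tightly than > and <.
both = df[(df["col1"] > 50) & (df["col2"] < 20)]
```

Use & for "and", | for "or", and ~ for "not"; the Python keywords and, or, and not do not work element-wise on a Series.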
Data Aggregation and Grouping
Grouping data to calculate summary statistics or aggregate results.

• Grouping - groupby()
• Aggregation Functions - sum(), mean(), count(), min(), max(), std(), agg()
• Multi-level Grouping - df.groupby(['col1', 'col2']).mean()
• Custom Aggregation - Applying multiple aggregation functions with agg({'col1': 'mean', 'col2':
['sum', 'count']}).
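
A short sketch of the grouping patterns above on hypothetical sales data. The numeric columns are selected explicitly before calling mean(), because recent pandas versions raise an error when asked to average non-numeric columns.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["north", "north", "south", "south", "south"],
    "product": ["a", "b", "a", "a", "b"],
    "units": [10, 20, 5, 15, 30],
    "price": [2.0, 3.0, 2.0, 2.5, 3.0],
})

# Single-key grouping with one aggregation function.
units_by_region = sales.groupby("region")["units"].sum()

# Multi-level grouping: the result has a two-level index (region, product).
mean_by_pair = sales.groupby(["region", "product"])[["units", "price"]].mean()

# Custom aggregation: a different set of functions per column.
summary = sales.groupby("region").agg({"price": "mean", "units": ["sum", "count"]})
```

The summary table has two-level column labels such as ("units", "sum"), so individual cells are addressed with a tuple, for example summary.loc["south", ("units", "sum")].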
Reshaping and Pivoting
Rearranging the structure of data to make it easier to analyze.

• Pivoting - pivot(), pivot_table()

• Stacking and Unstacking - stack(), unstack()
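
A minimal sketch of moving between long and wide layouts. The temperature readings are invented sample data.

```python
import pandas as pd

# Long format: one row per (date, city) observation.
long = pd.DataFrame({
    "date": ["2024-01", "2024-01", "2024-02", "2024-02"],
    "city": ["chennai", "madurai", "chennai", "madurai"],
    "temp": [29, 31, 30, 33],
})

# pivot() reshapes without aggregating; each (index, column) pair must be unique.
wide = long.pivot(index="date", columns="city", values="temp")

# pivot_table() aggregates when several rows fall into the same cell.
mean_temp = long.pivot_table(index="city", values="temp", aggfunc="mean")

# stack() moves the column labels into an inner index level;
# unstack() moves that level back out to the columns.
stacked = wide.stack()
unstacked = stacked.unstack()
```

pivot() raises an error on duplicate (index, column) pairs, which is the usual signal to switch to pivot_table() with an explicit aggregation function.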
Sorting and Ordering
Sorting data to organize it based on specified criteria.

• Sorting Rows - sort_values(by='column')

Multi-column sorting with different orders: sort_values(by=['col1', 'col2'], ascending=[True, False]).

• Sorting Index - sort_index(): Sort the DataFrame by index labels.
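
The sorting calls above can be tried on a small example. The values and index labels are made up.

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [2, 1, 2, 1], "col2": [10, 20, 30, 40]},
    index=["d", "b", "a", "c"],
)

# Sort rows by a single column, largest first.
by_one = df.sort_values(by="col2", ascending=False)

# Sort by col1 ascending, breaking ties by col2 descending.
by_two = df.sort_values(by=["col1", "col2"], ascending=[True, False])

# Sort rows by their index labels.
by_index = df.sort_index()
```

All three calls return a new DataFrame and leave df unchanged.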

Index Manipulations
Working with indices to reorganize or access specific data points.

• Setting and Resetting Index - set_index(), reset_index()

• Reindexing - reindex(): Conform the DataFrame to a new index, optionally filling missing entries.
• MultiIndexing - Create multi-level index with set_index(['col1', 'col2']).
• Renaming Index or Columns - rename(): Rename specific index or column labels.
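
A brief sketch of the index operations above. The id column and its values are hypothetical; col1 and col2 follow the slide's placeholder names.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [101, 102, 103],
    "col1": ["x", "x", "y"],
    "col2": [1, 2, 1],
    "value": [10, 20, 30],
})

# Promote a column to the index, then turn it back into a column.
by_id = df.set_index("id")
restored = by_id.reset_index()

# Reindex to a new set of labels; 104 is absent, so it takes the fill value.
reindexed = by_id["value"].reindex([101, 103, 104], fill_value=0)

# Multi-level index built from two columns.
multi = df.set_index(["col1", "col2"])
x2_value = multi.loc[("x", 2), "value"]

# Rename one column label and one index label.
renamed = by_id.rename(columns={"value": "amount"}, index={101: 1})
```

reindex() requires the existing index to be free of duplicates, which is why the sketch reindexes on the unique id column rather than on col2.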
Exporting Data
Saving the processed data to various file formats.

• Export to CSV - to_csv('filename.csv')

• Export to Excel - to_excel('filename.xlsx')
• Export to JSON - to_json('filename.json')
• Export to HTML - to_html('filename.html')
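
A minimal sketch of the export calls above, writing into a temporary directory so no existing files are touched. The data are made up, and index=False is an optional choice that leaves the row index out of the output. The to_excel() call is commented out because it needs an Excel writer engine such as openpyxl to be installed.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"name": ["alice", "bob"], "score": [90, 85]})

# Write into a fresh temporary directory.
out_dir = Path(tempfile.mkdtemp())

df.to_csv(out_dir / "filename.csv", index=False)
df.to_json(out_dir / "filename.json", orient="records")
df.to_html(out_dir / "filename.html", index=False)

# Requires an Excel engine such as openpyxl:
# df.to_excel(out_dir / "filename.xlsx", index=False)

# Read the CSV back to check the round trip.
round_trip = pd.read_csv(out_dir / "filename.csv")
```

Without index=False, to_csv() writes the index as an extra unnamed first column, which then reappears as a data column when the file is read back.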
THANK YOU !!
