Data Manipulation in Python Using Pandas

Uploaded by stpmp24
Copyright © All Rights Reserved

06-11-2024

GM KOUSHIKA PRIYADHARSHINI
Research Scholar
Data Manipulation
• Data manipulation is the process of organizing and refining raw data for analysis, including tasks such as cleaning, merging, and transforming data.
• In Python, the Pandas library provides efficient tools for performing these data
manipulation tasks.

Data Manipulation Techniques

• Data Cleaning
• Data Transformation
• Filtering and Selection
• Data Aggregation and Grouping
• Reshaping and Pivoting
• Sorting and Ordering
• Index Manipulations
• Exporting Data
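Why Pandas?
• Categorical Data: a standard NumPy array holds elements of a single data type and has no built-in categorical type, so mixed or categorical data is awkward to represent.
• DataFrames and Relational Operations: NumPy does not directly support relational tasks such as merging or joining tables on specific column values.
• Lack of Labels: NumPy arrays are indexed by integer position only; they have no row or column labels.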
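To make these points concrete, here is a minimal sketch with small made-up tables (the names `people` and `teams` and all values are illustrative, not from the slides). It shows a NumPy array coercing mixed values to strings, then a labeled DataFrame, a column-based merge(), and a categorical column in pandas.

```python
import numpy as np
import pandas as pd

# A standard NumPy array has one dtype: mixing names and ages turns every value into a string.
arr = np.array([["Alice", 34], ["Bob", 28]])
print(arr.dtype)  # a Unicode string dtype; the ages are no longer numbers

# A DataFrame keeps a dtype per column and labels its rows and columns.
people = pd.DataFrame({"name": ["Alice", "Bob"], "age": [34, 28], "team": ["red", "blue"]})
print(people.loc[0, "name"], people["age"].mean())

# Relational operation: join two tables on the values of a shared column.
teams = pd.DataFrame({"team": ["red", "blue"], "lead": ["Carol", "Dave"]})
merged = people.merge(teams, on="team", how="left")

# Categorical data: store a repeated label column as the 'category' dtype.
merged["team"] = merged["team"].astype("category")
print(merged.dtypes)
```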
Data Cleaning
Data cleaning involves preparing raw data by handling inconsistencies, errors, and missing
values.

• Handling Missing Values - dropna(), fillna()
• Handling Duplicates - duplicated(), drop_duplicates()
• Data Type Conversion - astype() (including astype('category') for categorical data), pd.to_datetime(), pd.to_numeric()
• String Cleaning and Manipulation - str.strip(), str.lower(), str.replace()
• Outlier Detection and Handling - statistical methods or conditional filtering
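A minimal cleaning pipeline over a tiny invented dataset might look like the sketch below. The columns, values, and the rule that valid scores lie between 0 and 100 are assumptions made for illustration; real outlier rules depend on the data (IQR or z-score thresholds are common alternatives).

```python
import pandas as pd

raw = pd.DataFrame({
    "name": ["  Alice ", "bob", "bob", "Carol", "Dan", None],
    "joined": ["2024-01-05", "2024-02-10", "2024-02-10", "2024-03-15", "2024-04-01", "2024-04-20"],
    "score": ["85", "92", "92", "not recorded", "300", "70"],
    "grade": ["A", "A", "A", "B", "C", "B"],
})

clean = (
    # String cleaning: trim whitespace and normalise case.
    raw.assign(name=raw["name"].str.strip().str.lower())
    # Duplicates: the two identical "bob" rows collapse into one.
    .drop_duplicates()
    # Type conversion: unparseable scores become NaN instead of raising.
    .assign(
        joined=lambda d: pd.to_datetime(d["joined"]),
        score=lambda d: pd.to_numeric(d["score"], errors="coerce"),
        grade=lambda d: d["grade"].astype("category"),
    )
    # Missing values: drop rows that have no name.
    .dropna(subset=["name"])
)

# Outliers (assumed rule): keep scores in [0, 100], plus missing scores for now.
in_range = clean["score"].between(0, 100) | clean["score"].isna()
clean = clean[in_range]

# Missing values: fill the remaining gaps with the median of the valid scores.
clean = clean.assign(score=clean["score"].fillna(clean["score"].median()))
print(clean)
```

Removing outliers before computing the fill value keeps extreme values from distorting the statistic used for imputation.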
Data Transformation
Transforming data to make it more suitable for analysis, including scaling, encoding, and
feature engineering.

• Scaling and Normalization - MinMaxScaler, StandardScaler (from scikit-learn)
• Encoding Categorical Variables - pd.get_dummies()
• Feature Engineering - Creating new columns based on existing ones.

Example: df['new_col'] = df['col1'] * df['col2']
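The sketch below shows these transformations on a hypothetical three-row table. To stay within pandas, it writes min-max scaling and standardization as plain column arithmetic rather than calling scikit-learn's MinMaxScaler and StandardScaler; the formulas match what those scalers compute with default settings, but treat this as an illustration, not a replacement for a fitted scaler.

```python
import pandas as pd

df = pd.DataFrame({
    "price": [10.0, 20.0, 30.0],
    "quantity": [1, 3, 2],
    "color": ["red", "blue", "red"],
})

# Min-max scaling to [0, 1]: (x - min) / (max - min).
p = df["price"]
df["price_minmax"] = (p - p.min()) / (p.max() - p.min())

# Standardization (z-score). pandas .std() defaults to ddof=1 (sample std);
# ddof=0 (population std) matches scikit-learn's StandardScaler.
df["price_z"] = (p - p.mean()) / p.std(ddof=0)

# Feature engineering: a new column derived from existing ones.
df["revenue"] = df["price"] * df["quantity"]

# One-hot encoding: one 0/1 column per category of "color".
encoded = pd.get_dummies(df, columns=["color"], dtype=int)
print(encoded)
```

A fitted scikit-learn scaler becomes the better choice when the min/max or mean/std learned on training data must be reapplied unchanged to new data.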


Filtering and Selection
Extracting specific data based on conditions or specific criteria.

• Row Selection - Boolean indexing: df[df['column'] > 50]
  .loc[] and .iloc[]: select rows by label (.loc) or by integer position (.iloc).
• Column Selection - Select a single column or several columns: df['col1'], df[['col1', 'col2']]
  Select columns by data type: df.select_dtypes(include=[...])
• Conditional Filtering - Combine conditions with the logical operators & (and), | (or), and ~ (not), wrapping each condition in parentheses: df[(df['col1'] > 50) & (df['col2'] < 20)].
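The selection patterns above can be tried on a small hypothetical DataFrame with string row labels, which makes the difference between .loc (labels) and .iloc (positions) visible.

```python
import pandas as pd

df = pd.DataFrame(
    {"col1": [10, 60, 75, 40], "col2": [5, 25, 15, 30], "kind": ["a", "b", "a", "b"]},
    index=["w", "x", "y", "z"],
)

over_50 = df[df["col1"] > 50]                   # boolean indexing: rows x and y
by_label = df.loc[["x", "z"], "col1"]           # label-based: rows x and z, column col1
by_position = df.iloc[0:2, 0:2]                 # position-based: first two rows and columns
subset = df[["col1", "col2"]]                   # several columns by name
numeric = df.select_dtypes(include=["number"])  # only the numeric columns

# Use & / | rather than `and` / `or`; the parentheses are required because
# & binds more tightly than the comparison operators.
both = df[(df["col1"] > 50) & (df["col2"] < 20)]  # row y only
print(both)
```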
Data Aggregation and Grouping
Grouping data to calculate summary statistics or aggregate results.

• Grouping - groupby()
• Aggregation Functions - sum(), mean(), count(), min(), max(), std(), agg()
• Multi-level Grouping - df.groupby(['col1', 'col2']).mean()
• Custom Aggregation - Apply different aggregation functions to different columns by passing a dictionary to agg() on a grouped DataFrame: agg({'col1': 'mean', 'col2': ['sum', 'count']}).
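A short grouping sketch on invented sales data (the column names are illustrative). Selecting columns before .mean() keeps the aggregation to numeric data; in pandas 2.x, taking the mean of a text column raises an error instead of silently dropping it.

```python
import pandas as pd

sales = pd.DataFrame({
    "region": ["east", "east", "west", "west", "west"],
    "product": ["pen", "ink", "pen", "pen", "ink"],
    "units": [10, 4, 7, 3, 6],
    "price": [1.5, 8.0, 1.5, 1.5, 8.0],
})

# Single grouping key with one aggregation function.
totals = sales.groupby("region")["units"].sum()

# Multi-level grouping: one row per (region, product) pair.
means = sales.groupby(["region", "product"])[["units", "price"]].mean()

# Custom aggregation: a different function (or list of functions) per column.
# The result has two-level column labels such as ("units", "sum").
summary = sales.groupby("region").agg({"price": "mean", "units": ["sum", "count"]})
print(summary)
```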
Reshaping and Pivoting
Rearranging the structure of data to make it easier to analyze.

• Pivoting - pivot(), pivot_table()
• Stacking and Unstacking - stack(), unstack()
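The difference between pivot() and pivot_table(), and how stack() and unstack() undo each other, can be seen on a small long-format table of made-up temperatures.

```python
import pandas as pd

long = pd.DataFrame({
    "date": ["d1", "d1", "d2", "d2"],
    "city": ["Oslo", "Rome", "Oslo", "Rome"],
    "temp": [3, 14, 5, 16],
})

# pivot: long -> wide; every (date, city) pair must be unique.
wide = long.pivot(index="date", columns="city", values="temp")

# pivot_table: like pivot, but aggregates duplicate pairs (here with the mean).
extra = pd.DataFrame({"date": ["d1"], "city": ["Oslo"], "temp": [5]})
dup = pd.concat([long, extra], ignore_index=True)
table = dup.pivot_table(index="date", columns="city", values="temp", aggfunc="mean")

# stack: move the column labels into the row index (wide -> long Series).
stacked = wide.stack()
# unstack: the inverse, moving the innermost index level back to columns.
restored = stacked.unstack()
print(table)
```

Calling pivot() on `dup` instead would raise an error, because the pair ("d1", "Oslo") appears twice; that duplicate handling is the main reason to reach for pivot_table().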
Sorting and Ordering
Sorting data to organize it based on specified criteria.

• Sorting Rows - sort_values(by='column')
  Multi-column sorting with different orders: sort_values(by=['col1', 'col2'], ascending=[True, False]).
• Sorting Index - sort_index(): Sort the DataFrame by index labels.
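A sketch of the three sorting calls on a hypothetical four-row table with an unordered integer index:

```python
import pandas as pd

df = pd.DataFrame(
    {"team": ["b", "a", "b", "a"], "score": [70, 85, 90, 60], "name": ["Ann", "Ben", "Cy", "Dee"]},
    index=[3, 1, 4, 2],
)

# Highest score first.
by_score = df.sort_values(by="score", ascending=False)

# Team ascending, then score descending within each team.
multi = df.sort_values(by=["team", "score"], ascending=[True, False])

# Rows ordered by their index labels.
by_index = df.sort_index()
print(multi)
```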


Index Manipulations
Working with indices to reorganize or access specific data points.

• Setting and Resetting Index - set_index(), reset_index()
• Reindexing - reindex(): Conform the DataFrame to a new index, optionally filling missing values.
• MultiIndexing - Create a multi-level index with set_index(['col1', 'col2']).
• Renaming Index or Columns - rename(): Rename specific index or column labels.
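The index operations above, sketched on an illustrative table (the column names `country`, `year`, and `value` and all values are placeholders):

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["NO", "NO", "IT"],
    "year": [2022, 2023, 2023],
    "value": [5, 7, 9],
})

indexed = df.set_index("country")  # the column becomes the row labels
flat = indexed.reset_index()       # the row labels become a column again

# Two-level MultiIndex, sorted so that label lookups stay efficient.
multi = df.set_index(["country", "year"]).sort_index()
by_key = multi.loc[("NO", 2023), "value"]

# reindex: conform to a new set of labels; the new label "c" gets fill_value.
s = pd.Series([10, 20], index=["a", "b"])
reindexed = s.reindex(["a", "b", "c"], fill_value=0)

# rename: map old labels to new ones for columns and/or index.
renamed = df.rename(columns={"value": "amount"}, index={0: "first"})
print(multi)
```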
Exporting Data
Saving the processed data to various file formats.

• Export to CSV - to_csv('filename.csv')
• Export to Excel - to_excel('filename.xlsx')
• Export to JSON - to_json('filename.json')
• Export to HTML - to_html('filename.html')
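A minimal export sketch that writes to a temporary directory so it leaves no files behind. The file names are placeholders. The Excel line is commented out because to_excel() needs an optional writer engine, such as openpyxl, to be installed.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"name": ["alice", "bob"], "score": [85, 92]})

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    df.to_csv(out / "scores.csv", index=False)         # index=False omits the row-index column
    df.to_json(out / "scores.json", orient="records")  # one JSON object per row
    df.to_html(out / "scores.html", index=False)
    # df.to_excel(out / "scores.xlsx", index=False)    # requires an engine such as openpyxl

    # Read the files back before the temporary directory is deleted.
    round_trip = pd.read_csv(out / "scores.csv")
    json_text = (out / "scores.json").read_text()

print(json_text)
```

Passing index=False is usually what you want when the index is just the default 0, 1, 2, ... range; otherwise it is written out as an extra unnamed column.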
THANK YOU !!
