Python For Data Analysis: Data Wrangling

This cheat sheet provides essential methods for data wrangling in Python, including techniques for handling missing data, fixing data types, normalizing data, binning, changing column names, and creating indicator variables. Each method is accompanied by a brief description and a code example for implementation. It serves as a quick reference for data analysts working with pandas in Python.

Uploaded by

w123lucy
Data Analysis with Python


Cheat Sheet: Data Wrangling

Package/Method Description Code Example

Replace missing data with frequency
Replace the missing values of the data set attribute with the mode, the most commonly occurring entry in the column.
    MostFrequentEntry = df['attribute_name'].value_counts().idxmax()
    df['attribute_name'] = df['attribute_name'].replace(np.nan, MostFrequentEntry)
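
As a runnable illustration of the mode replacement, here is a minimal sketch on made-up data. The column name `fuel` and its values are hypothetical, and `fillna` is used in place of `replace(np.nan, ...)`; for this purpose the two should give the same result.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the 'fuel' column has one missing entry.
df = pd.DataFrame({"fuel": ["gas", "diesel", "gas", np.nan, "gas"]})

# value_counts() skips NaN by default, so idxmax() returns the most frequent entry.
most_frequent = df["fuel"].value_counts().idxmax()

# Assign the filled column back to the dataframe.
df["fuel"] = df["fuel"].fillna(most_frequent)
print(df["fuel"].tolist())
```

Assigning the result back to the column, rather than calling an in-place method on a column selection, avoids chained-assignment problems in recent pandas versions.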

Replace missing data with mean
Replace the missing values of the data set attribute with the mean of all the entries in the column.
    AverageValue = df['attribute_name'].astype(<data_type>).mean(axis=0)
    df['attribute_name'] = df['attribute_name'].replace(np.nan, AverageValue)
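
The mean replacement can be sketched the same way. The column name `horsepower` and the values are hypothetical; the numbers are stored as strings here only to show why the cast to a numeric type comes first.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: numbers read in as strings, with one missing entry.
df = pd.DataFrame({"horsepower": ["100", "150", np.nan, "110"]})

# Cast to float so that a mean can be computed; mean() skips NaN by default.
df["horsepower"] = df["horsepower"].astype("float")
average_value = df["horsepower"].mean()

# Fill the missing entry with the column mean and assign the result back.
df["horsepower"] = df["horsepower"].fillna(average_value)
print(df["horsepower"].tolist())
```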

Fix the data types
Fix the data types of the columns in the dataframe.
    df[['attribute1_name', 'attribute2_name', ...]] = df[['attribute1_name', 'attribute2_name', ...]].astype('data_type')
    # data_type is int, float, str, etc.
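
A short sketch of the type conversion on made-up data follows. The column names `bore`, `stroke`, and `price` are hypothetical; the point is only that several columns can be cast in one assignment and then checked with `dtypes`.

```python
import pandas as pd

# Hypothetical sample data: every column was read in as text.
df = pd.DataFrame({
    "bore": ["3.47", "2.68"],
    "stroke": ["2.68", "3.47"],
    "price": ["13495", "16500"],
})

# Cast two columns at once to float, and one column to int.
df[["bore", "stroke"]] = df[["bore", "stroke"]].astype("float")
df["price"] = df["price"].astype("int")

# Inspect the resulting column types.
print(df.dtypes)
```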

Data Normalization
Normalize the data in a column by dividing by its maximum, so that non-negative values are restricted between 0 and 1.
    df['attribute_name'] = df['attribute_name']/df['attribute_name'].max()
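
To make the scaling concrete, here is a minimal sketch with a hypothetical `length` column. It assumes all values are non-negative; with negative values, dividing by the maximum would not keep the results between 0 and 1.

```python
import pandas as pd

# Hypothetical sample data with non-negative values only.
df = pd.DataFrame({"length": [50.0, 100.0, 200.0]})

# Divide every value by the column maximum; the largest value becomes 1.0.
df["length"] = df["length"] / df["length"].max()
print(df["length"].tolist())  # [0.25, 0.5, 1.0]
```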

Binning
Create bins of data for better analysis and visualization.
    bins = np.linspace(min(df['attribute_name']), max(df['attribute_name']), n)
    # n is the number of bin edges, so n edges give n - 1 bins
    GroupNames = ['Group1', 'Group2', 'Group3', ...]
    df['binned_attribute_name'] = pd.cut(df['attribute_name'], bins, labels=GroupNames, include_lowest=True)
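
The relationship between bin edges and labels is easy to get wrong, so here is a small sketch on made-up data. The column name `horsepower` and the labels are hypothetical. Note that `np.linspace` returns edges, so three groups need four edges.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({"horsepower": [48, 90, 120, 160, 200, 262]})
group_names = ["Low", "Medium", "High"]

# Three equal-width bins require four edges: one more than the number of labels.
bins = np.linspace(df["horsepower"].min(), df["horsepower"].max(), len(group_names) + 1)

# include_lowest=True keeps the minimum value inside the first bin.
df["horsepower_binned"] = pd.cut(
    df["horsepower"], bins, labels=group_names, include_lowest=True
)
print(df)
```

If the number of labels does not equal the number of edges minus one, `pd.cut` raises an error, which is a useful check on the value passed to `np.linspace`.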

Change column name
Change the label name of a dataframe column.
    df.rename(columns={'old_name': 'new_name'}, inplace=True)

Indicator Variables
Create indicator variables for categorical data.
    dummy_variable = pd.get_dummies(df['attribute_name'])
    df = pd.concat([df, dummy_variable], axis=1)
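
As a sketch of what the indicator columns look like, the example below uses a hypothetical `fuel` column. Passing `dtype=int` is an optional choice made here so the indicators appear as 0 and 1; without it, recent pandas versions return boolean columns.

```python
import pandas as pd

# Hypothetical sample data with one categorical column.
df = pd.DataFrame({"fuel": ["gas", "diesel", "gas"]})

# One indicator column is created per category, named after the category value.
dummy_variable = pd.get_dummies(df["fuel"], dtype=int)

# Append the indicator columns to the original dataframe.
df = pd.concat([df, dummy_variable], axis=1)
print(df)
```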

