This cheat sheet provides methods for data wrangling in Python, including techniques for replacing missing values with the mode or mean, fixing data types, normalizing data, binning for analysis, changing column names, and creating indicator variables for categorical data. Each method is accompanied by a brief description and a code example. The document serves as a quick reference for data manipulation tasks using Python's pandas library.

Uploaded by

muhammad idrees

4/24/25, 5:33 PM about:blank

Data Analysis with Python


Cheat Sheet: Data Wrangling
Package/Method Description Code Example
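Replace missing data with frequency
Description: Replace the missing values of a data set attribute with the mode, i.e. the most commonly occurring entry in the column.
Code Example:
    MostFrequentEntry = df['attribute_name'].value_counts().idxmax()
    # Assign the result back; inplace=True on a selected column may not update df in recent pandas versions
    df['attribute_name'] = df['attribute_name'].replace(np.nan, MostFrequentEntry)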
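As a rough illustration of the mode replacement, the sketch below runs it on a toy frame. The column name `num_doors` and its values are hypothetical, and `fillna` is swapped in for `replace(np.nan, ...)` as an equivalent way to fill the gaps.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; 'num_doors' is an illustrative column name.
df = pd.DataFrame({"num_doors": ["four", "two", np.nan, "four", np.nan]})

# value_counts() skips NaN by default, so idxmax() returns the most frequent entry.
most_frequent = df["num_doors"].value_counts().idxmax()

# Fill the missing entries and assign the result back to the column.
df["num_doors"] = df["num_doors"].fillna(most_frequent)

print(df["num_doors"].tolist())
```

If several entries tie for the highest count, `idxmax()` returns only one of them, so it may be worth inspecting `value_counts()` before filling.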

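Replace missing data with mean
Description: Replace the missing values of a data set attribute with the mean of all the entries in the column.
Code Example:
    # <data_type> is a numeric type such as float
    AverageValue = df['attribute_name'].astype(<data_type>).mean(axis=0)
    # Assign the result back; inplace=True on a selected column may not update df in recent pandas versions
    df['attribute_name'] = df['attribute_name'].replace(np.nan, AverageValue)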
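A minimal sketch of the mean replacement, assuming a hypothetical column `bore` that was read in as strings. The values are made up, and `fillna` stands in for `replace(np.nan, ...)`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: numeric values stored as strings, with one gap.
df = pd.DataFrame({"bore": ["3.0", "2.0", np.nan, "4.0"]})

# Convert to float first; mean() skips NaN by default.
average_value = df["bore"].astype("float").mean()

# Store the converted, filled column back in the frame.
df["bore"] = df["bore"].astype("float").fillna(average_value)

print(average_value)
print(df["bore"].tolist())
```

Converting before averaging matters here: calling `mean()` on a column of strings would fail or give a meaningless result.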


Fix the data types
Description: Fix the data types of the columns in the dataframe.
Code Example:
    df[['attribute1_name', 'attribute2_name', ...]] = df[['attribute1_name', 'attribute2_name', ...]].astype('data_type')
    # data_type is int, float, str, etc.
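When columns need different target types, one possible variant is to pass a dictionary to `astype`. This is only a sketch; the column names `price` and `peak_rpm` and their values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: numbers that were read in as strings.
df = pd.DataFrame({"price": ["13495", "16500"], "peak_rpm": ["5000", "5500"]})

# Map each column name to its target type; astype returns a new frame.
df = df.astype({"price": "float", "peak_rpm": "int"})

print(df.dtypes)
```

Checking `df.dtypes` before and after the conversion is a quick way to confirm that each column ended up with the intended type.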

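Data Normalization
Description: Normalize the data in a column by dividing by the column maximum, so that the largest value becomes 1. For non-negative data, the values are restricted between 0 and 1.
Code Example:
    df['attribute_name'] = df['attribute_name'] / df['attribute_name'].max()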
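The sketch below applies the divide-by-maximum scaling to a hypothetical `length` column. For comparison it also shows min-max scaling, which is not part of the cheat sheet entry but maps any column with at least two distinct values onto the full range from 0 to 1.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"length": [150.0, 175.0, 200.0]})

# Simple feature scaling: divide by the column maximum.
df["length_scaled"] = df["length"] / df["length"].max()

# Min-max scaling (an alternative, shown for comparison only).
col = df["length"]
df["length_minmax"] = (col - col.min()) / (col.max() - col.min())

print(df)
```

With divide-by-maximum scaling the smallest value here stays at 0.75, whereas min-max scaling moves it to 0.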

Binning
Description: Create bins of data for better analysis and visualization.
Code Example:
    # n is the number of bins needed; n bins require n + 1 edges
    bins = np.linspace(min(df['attribute_name']), max(df['attribute_name']), n + 1)
    # one label per bin
    GroupNames = ['Group1', 'Group2', 'Group3', ...]
    df['binned_attribute_name'] = pd.cut(df['attribute_name'], bins, labels=GroupNames, include_lowest=True)
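To make the relationship between bin edges and labels concrete, here is a small sketch with three equal-width bins. The column name `horsepower`, its values, and the labels are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"horsepower": [48, 90, 120, 200, 262]})

n_bins = 3
# n_bins equal-width bins need n_bins + 1 edges.
bins = np.linspace(df["horsepower"].min(), df["horsepower"].max(), n_bins + 1)

# One label per bin.
group_names = ["Low", "Medium", "High"]

# include_lowest=True keeps the minimum value inside the first bin.
df["horsepower_binned"] = pd.cut(
    df["horsepower"], bins, labels=group_names, include_lowest=True
)

print(df)
```

`pd.cut` raises an error when the number of labels does not equal the number of edges minus one, which is why the edge count is `n_bins + 1`.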

Change column name
Description: Change the label name of a dataframe column.
Code Example:
    df.rename(columns={'old_name': 'new_name'}, inplace=True)
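A short sketch of renaming, using hypothetical column names. By default `rename` silently ignores keys that do not match any column, so the example also shows `errors="raise"` as one way to catch a misspelled name.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"city-mpg": [21, 19]})

# Rename one column; rename returns a new frame unless inplace=True is passed.
df = df.rename(columns={"city-mpg": "city_mpg"})

# With errors="raise", a key that matches no column raises KeyError
# instead of being silently skipped.
missing_detected = False
try:
    df.rename(columns={"hwy-mpg": "hwy_mpg"}, errors="raise")
except KeyError:
    missing_detected = True

print(list(df.columns), missing_detected)
```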


Indicator Variables
Description: Create indicator variables for categorical data.
Code Example:
    dummy_variable = pd.get_dummies(df['attribute_name'])
    df = pd.concat([df, dummy_variable], axis=1)
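The following sketch shows what the indicator columns look like for a hypothetical `fuel_type` column. The `prefix` and `dtype` arguments are optional additions here: `prefix` keeps the new column names self-describing, and `dtype=int` requests 0/1 values, since recent pandas versions return booleans by default.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"fuel_type": ["gas", "diesel", "gas"]})

# One indicator column per category, named fuel_<category>.
dummy_variable = pd.get_dummies(df["fuel_type"], prefix="fuel", dtype=int)

# Attach the indicator columns alongside the original data.
df = pd.concat([df, dummy_variable], axis=1)

print(df)
```

The original categorical column is still present after the concat; it can be dropped with `df.drop(columns="fuel_type")` once the indicators are in place.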
