Python For Data Analysis: Data Wrangling

Replace missing data with frequency
Replace the missing values of the data set attribute with the most frequently occurring entry in the column.
    MostFrequentEntry = df['attribute_name'].value_counts().idxmax()
    df['attribute_name'] = df['attribute_name'].replace(np.nan, MostFrequentEntry)
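
As a minimal sketch of the mode replacement, the example below builds a small made-up data set; the column name `color` and its values are assumptions standing in for `attribute_name`. It uses `fillna` in place of `replace(np.nan, ...)`, which has the same effect on missing entries, and assigns the result back to the column instead of relying on `inplace=True`.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: 'color' stands in for 'attribute_name'.
df = pd.DataFrame({"color": ["red", "blue", np.nan, "red", np.nan]})

# value_counts() skips missing values by default, so idxmax() returns
# the most common non-missing entry.
most_frequent = df["color"].value_counts().idxmax()

# Fill the gaps and assign the result back to the column.
df["color"] = df["color"].fillna(most_frequent)

print(most_frequent)
print(df["color"].tolist())
```

If two entries tie for most frequent, `idxmax()` returns only one of them, so it may be worth inspecting `value_counts()` before filling.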

Replace missing data with mean
Replace the missing values of the data set attribute with the mean of all the entries in the column.
    AverageValue = df['attribute_name'].astype(<data_type>).mean(axis=0)
    df['attribute_name'] = df['attribute_name'].replace(np.nan, AverageValue)
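
The mean replacement might look like the sketch below. The column name `horsepower` and its values are made up for illustration, and the numbers are stored as strings to show why the `astype` conversion comes first. `fillna` is used here in place of `replace(np.nan, ...)`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: numbers stored as strings, with one missing entry.
df = pd.DataFrame({"horsepower": ["100", "150", np.nan, "200"]})

# Convert to float first; missing entries stay NaN and mean() skips them.
horsepower = df["horsepower"].astype("float")
average_value = horsepower.mean()

# Fill the missing entry with the mean and store the numeric column.
df["horsepower"] = horsepower.fillna(average_value)

print(average_value)
print(df["horsepower"].tolist())
```

Because `mean()` ignores missing entries, the average is computed over the observed values only.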

Data Normalization
Normalize the data in a column by dividing by its maximum, so that the largest value becomes 1 and non-negative values fall between 0 and 1.
    df['attribute_name'] = df['attribute_name'] / df['attribute_name'].max()
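
A short sketch of this scaling on made-up data follows; the column `length` is an assumption. Dividing by the maximum keeps values in the 0 to 1 range only when the column has no negative entries, so the example also shows min-max scaling, a different technique that is not in the original sheet, as one possible alternative when that does not hold.

```python
import pandas as pd

# Hypothetical example column standing in for 'attribute_name'.
df = pd.DataFrame({"length": [2.0, 4.0, 8.0]})

# Simple feature scaling, as in the cheat sheet: divide by the maximum.
df["length_scaled"] = df["length"] / df["length"].max()

# Alternative (not from the sheet): min-max scaling maps the smallest
# value to 0 and the largest to 1, even if the column has negatives.
col = df["length"]
df["length_minmax"] = (col - col.min()) / (col.max() - col.min())

print(df)
```

Min-max scaling divides by the range, so it is undefined for a column whose values are all equal.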

Binning
Create bins of data for better analysis and visualization.
    bins = np.linspace(min(df['attribute_name']), max(df['attribute_name']), n)
    # n is the number of bin edges; n edges produce n - 1 bins,
    # so GroupNames must contain n - 1 labels
    GroupNames = ['Group1', 'Group2', 'Group3', ...]
    df['binned_attribute_name'] = pd.cut(df['attribute_name'], bins, labels=GroupNames, include_lowest=True)
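
To make the relationship between edges and labels concrete, here is a small sketch with three equal-width bins. The column `price`, its values, and the labels are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical example column standing in for 'attribute_name'.
df = pd.DataFrame({"price": [5, 12, 18, 25, 30]})

n_bins = 3
# n_bins intervals need n_bins + 1 equally spaced edges.
bins = np.linspace(df["price"].min(), df["price"].max(), n_bins + 1)
group_names = ["Low", "Medium", "High"]

# include_lowest=True keeps the minimum value inside the first bin.
df["price_binned"] = pd.cut(
    df["price"], bins, labels=group_names, include_lowest=True
)

print(df)
```

Without `include_lowest=True`, the smallest value would fall outside the first interval and be marked as missing.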

Change column name
Change the label name of a dataframe column.
    df.rename(columns={'old_name': 'new_name'}, inplace=True)
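
A minimal sketch of the rename on a made-up one-column frame is shown below. It assigns the returned frame back to `df`, which is equivalent here to passing `inplace=True`.

```python
import pandas as pd

# Hypothetical frame with a single column to rename.
df = pd.DataFrame({"old_name": [1, 2, 3]})

# rename() returns a new frame; keys that match no column are ignored.
df = df.rename(columns={"old_name": "new_name"})

print(list(df.columns))
```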

Indicator Variables
Create indicator variables for categorical data.
    dummy_variable = pd.get_dummies(df['attribute_name'])
    df = pd.concat([df, dummy_variable], axis=1)
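
The indicator-variable step can be sketched as follows. The column `fuel` and its categories are assumptions for illustration.

```python
import pandas as pd

# Hypothetical categorical column standing in for 'attribute_name'.
df = pd.DataFrame({"fuel": ["gas", "diesel", "gas"]})

# One new column per category, named after the category value.
dummy_variable = pd.get_dummies(df["fuel"])

# axis=1 appends the indicator columns alongside the existing ones.
df = pd.concat([df, dummy_variable], axis=1)

print(df)
```

The original categorical column remains in the frame after the concatenation; dropping it afterwards is a common follow-up when only the indicator columns are needed.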