Pandas - Cheat - Sheet
add_prefix() & add_suffix() Methods: (Return a new Series; the original Series labels are not modified)
>Alphabet.add_prefix('_prefix_label')
>Alphabet.add_suffix('_suffix_label')
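A minimal sketch of both methods, assuming a small hypothetical Series named `alphabet` (the data and label strings are illustrative only). It shows that each call returns a new Series and leaves the original labels alone.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
alphabet = pd.Series([1, 2, 3], index=["a", "b", "c"])

prefixed = alphabet.add_prefix("pre_")  # labels become pre_a, pre_b, pre_c
suffixed = alphabet.add_suffix("_suf")  # labels become a_suf, b_suf, c_suf

print(list(prefixed.index))
print(list(suffixed.index))
print(list(alphabet.index))  # the original labels are unchanged
```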
If the parsed data contains only one column, setting the squeeze parameter to True returns a Series. (squeeze was deprecated in pandas 1.4 and removed in 2.0; on newer versions, call .squeeze('columns') on the result of read_csv instead.)
>Alcohol = pd.read_csv('https://andybek.com/drinks', usecols = ['country','wine_servings'], index_col = ['country'],
squeeze = True)
>type(Alcohol) ---> pandas.core.series.Series
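A possible equivalent for newer pandas versions, where read_csv no longer accepts squeeze. This sketch assumes a small inline CSV (hypothetical values) in place of the drinks file so that it runs without network access.

```python
import io
import pandas as pd

# Hypothetical inline CSV standing in for the drinks dataset
csv_text = "country,beer_servings,wine_servings\nAlbania,89,54\nFrance,127,370\n"

frame = pd.read_csv(
    io.StringIO(csv_text),
    usecols=["country", "wine_servings"],
    index_col="country",
)
alcohol = frame.squeeze("columns")  # single-column DataFrame becomes a Series

print(type(frame).__name__)
print(type(alcohol).__name__)
```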
Size & Shape:
.size - number of elements in the Series
.shape - tuple of the dimensions of the Series, e.g. (n,)
Descriptive Statistics:
>Alcohol.sum() - Excludes NA’s
>Alcohol.mean()
>Alcohol.median()
>Alcohol.quantile(q=0.5)
IQR (Interquartile Range) -> Alcohol.quantile(0.75) - Alcohol.quantile(0.25)
>Alcohol.min()
>Alcohol.max()
>Alcohol.std()
>Alcohol.var()
Note: Alcohol.std()**2 = Alcohol.var() | Mode - Item with highest frequency
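A short sketch of the statistics listed above, assuming a tiny hypothetical Series with one missing value. It illustrates that NaN is excluded by default and that the IQR is the difference between two quantiles.

```python
import pandas as pd

# Hypothetical servings data; None is stored as NaN
alcohol = pd.Series([10.0, 20.0, 30.0, 40.0, None])

total = alcohol.sum()          # NaN is excluded by default
median = alcohol.median()
q50 = alcohol.quantile(q=0.5)  # the 0.5 quantile is the median
iqr = alcohol.quantile(0.75) - alcohol.quantile(0.25)

# mode() returns a Series because several values can tie
most_common = pd.Series([1, 2, 2, 3]).mode().iloc[0]

print(total, median, q50, iqr, most_common)
```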
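describe() Method:
Gives an overall statistical summary of the dataset
>Alcohol.describe(percentiles = [0.19, 0.79], include = float, exclude = object)
Note: include and exclude are ignored for a Series; they only filter columns when describing a DataFrame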
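A minimal sketch of describe() with custom percentiles, assuming hypothetical data. The second call uses a small made-up DataFrame to show where include takes effect.

```python
import pandas as pd

# Hypothetical data, for illustration only
alcohol = pd.Series([10.0, 20.0, 30.0, 40.0], name="wine_servings")
summary = alcohol.describe(percentiles=[0.19, 0.79])
print(summary)

# include / exclude only matter for a DataFrame with mixed column types
frame = pd.DataFrame({"country": ["Albania", "France"], "wine_servings": [54.0, 370.0]})
float_summary = frame.describe(include=[float])
print(float_summary)
```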
value_counts():
Returns a Series of unique values and their counts, sorted by count in descending order by default
>Alcohol.value_counts(sort = True, ascending = False, dropna = True, normalize = False)
Note: normalize = True returns relative frequencies (proportions) instead of counts
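A small sketch of value_counts(), assuming hypothetical data with one missing value. It contrasts raw counts, relative frequencies, and the effect of dropna.

```python
import pandas as pd

# Hypothetical data with repeats and one NaN
servings = pd.Series([0, 0, 5, 5, 5, 12, None])

counts = servings.value_counts()                # NaN dropped, most frequent first
shares = servings.value_counts(normalize=True)  # proportions of the non-NaN values
with_nan = servings.value_counts(dropna=False)  # NaN counted as its own entry

print(counts)
print(shares)
print(len(with_nan))
```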
Sample variance computed manually (matches Alcohol.var(), which uses ddof = 1):
>(Alcohol.subtract(Alcohol.mean())**2).sum()/(Alcohol.count() - 1)
Note: Standard deviation is the square root of the variance
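A quick check of the manual formula against the built-in methods, assuming hypothetical data. It also confirms that the standard deviation is the square root of the variance.

```python
import math
import pandas as pd

# Hypothetical data, for illustration only
alcohol = pd.Series([10.0, 20.0, 30.0, 40.0])

# Sum of squared deviations from the mean, divided by n - 1
manual_var = (alcohol.subtract(alcohol.mean()) ** 2).sum() / (alcohol.count() - 1)

var_matches = math.isclose(manual_var, alcohol.var())
std_matches = math.isclose(math.sqrt(manual_var), alcohol.std())

print(manual_var, var_matches, std_matches)
```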
Cumulative Operations:
>Alcohol.cumsum(skipna = True) - Calculates a running (cumulative) sum: each entry is the sum of that value and all values before it in the Series
>Alcohol.cumprod()
>Alcohol.cummin()
>Alcohol.cummax()
Note: NaN values are skipped (the skipna parameter is True by default), because the sum of any number with NaN is NaN; the NaN positions themselves stay NaN in the result
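A minimal sketch of the cumulative operations, assuming a hypothetical Series with one NaN. It contrasts the default skipna = True with skipna = False.

```python
import pandas as pd

# Hypothetical data with a missing value in the middle
values = pd.Series([1.0, 2.0, None, 4.0])

running_sum = values.cumsum()             # 1, 3, NaN, 7: the total continues past the NaN
strict_sum = values.cumsum(skipna=False)  # 1, 3, NaN, NaN: NaN propagates onward
running_prod = values.cumprod()           # 1, 2, NaN, 8
running_max = values.cummax()             # 1, 2, NaN, 4

print(running_sum.tolist())
print(strict_sum.tolist())
```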
>for i in Alcohol.index:
    print(i)
-Prints labels without values
>for i in Alcohol.index:
    print(i, Alcohol[i])
-Prints both label and value
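An alternative worth knowing, not taken from the loops above: Series.items() yields (label, value) pairs directly, which avoids a lookup on every iteration. The data here is hypothetical.

```python
import pandas as pd

# Hypothetical data, for illustration only
alcohol = pd.Series([54, 370], index=["Albania", "France"])

pairs = []
for label, value in alcohol.items():  # yields (label, value) tuples
    pairs.append((label, value))
    print(label, value)
```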
pandas.Series.where() - Replace values with ‘NaN / set value’ where the condition is False
>Alcohol.where(lambda x: x > 200, np.nan).dropna()
pandas.Series.mask() - Replace values with ‘NaN / set value’ where the condition is True
>Alcohol.mask(lambda x: x > 200, np.nan).dropna()
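A small sketch contrasting where() and mask() on the same condition, assuming hypothetical data. The last line swaps in a replacement value instead of NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical data, for illustration only
alcohol = pd.Series([54, 370, 212, 0], index=["Albania", "France", "Italy", "Kuwait"])

above = alcohol.where(lambda x: x > 200, np.nan).dropna()  # keeps values > 200
rest = alcohol.mask(lambda x: x > 200, np.nan).dropna()    # keeps values <= 200
capped = alcohol.mask(alcohol > 200, 200)                  # replaces values > 200 with 200

print(list(above.index))
print(list(rest.index))
print(capped.tolist())
```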
map() - Maps the values of a Series according to an input correspondence | Used for substituting each value in a Series with
another value, which may be derived from a function, a dictionary, or a Series
>Ser.map({'old value' : 'new value'})
>Alcohol.map(lambda x : x**2)
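A minimal sketch of the three kinds of input map() accepts, assuming hypothetical data. Note that with a dictionary, values that have no matching key become NaN.

```python
import pandas as pd

# Hypothetical data, for illustration only
grades = pd.Series(["a", "b", "c"])

from_dict = grades.map({"a": "excellent", "b": "good"})  # "c" has no key, so it becomes NaN
from_func = pd.Series([1, 2, 3]).map(lambda x: x ** 2)   # function applied to each value
lookup = pd.Series({"a": 1, "b": 2, "c": 3})
from_series = grades.map(lookup)                         # values looked up by the lookup's index

print(from_dict.tolist())
print(from_func.tolist())
print(from_series.tolist())
```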
DataFrame:
A table of data that contains a collection of rows and columns.
Key Aspects -
1. DataFrames have two dimensions: a labeled row index and labeled columns
2. Each column in a DataFrame is a Series, and all columns must be the same length
3. Unlike Series, DataFrames can be heterogeneous (i.e., hold multiple data types, one per column)
Specify both the row position and the column position (or name) to fetch a specific value from a DataFrame
> df.iloc[2,0] --> Output: 'Brian'
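A short sketch of fetching a single value, assuming a hypothetical DataFrame built so that row 2, column 0 holds 'Brian'. It shows position-based access with iloc and, for comparison, label-based access with loc.

```python
import pandas as pd

# Hypothetical data; column names and index labels are illustrative
df = pd.DataFrame(
    {"name": ["Alice", "Bob", "Brian"], "age": [31, 45, 27]},
    index=["x", "y", "z"],
)

by_position = df.iloc[2, 0]      # third row, first column
by_label = df.loc["z", "name"]   # same cell, addressed by labels

print(by_position, by_label)
print(df.shape)                  # (rows, columns)
print(type(df["name"]).__name__)  # each column is a Series
```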
.drop():
Removes specific rows / columns from the DataFrame (returns a new DataFrame)
>Nutrition.drop('Unnamed: 0', axis = 1)
.set_index():
Sets the specified column as the index of the DataFrame (returns a new DataFrame)
>Nutrition.set_index('Unnamed: 0')
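A minimal sketch of both calls, assuming a hypothetical `nutrition` DataFrame with a leftover 'Unnamed: 0' column (the other columns are made up). Both methods return a new DataFrame, so the original keeps the column unless the result is assigned back.

```python
import pandas as pd

# Hypothetical data, for illustration only
nutrition = pd.DataFrame(
    {"Unnamed: 0": [0, 1], "Item": ["Egg", "Oats"], "Calories": [78, 150]}
)

dropped = nutrition.drop("Unnamed: 0", axis=1)  # column removed in the copy
indexed = nutrition.set_index("Unnamed: 0")     # column becomes the row index

print(list(dropped.columns))
print(indexed.index.name)
print(list(nutrition.columns))  # the original is unchanged
```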