Pandas (import pandas as pd):
1. Reading Data:
• pd.read_csv('filename.csv'): Read a CSV file into a DataFrame.
• pd.read_excel('filename.xlsx'): Read an Excel file into a DataFrame.
2. Data Exploration:
• df.head(): Display the first 5 rows of the DataFrame (df.head(n) for the first n).
• df.describe(): Summary statistics for numerical columns.
• df.info(): Print a summary of the DataFrame, including column data types and
non-null counts.
• df.dtypes: Data type of each column (an attribute, not a method). Use
df['column_name'].dtype for a single column.
• df.shape: Get the dimensions of the DataFrame (rows, columns).
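The reading and exploration calls above can be tried together in a minimal sketch. The CSV content, the column names (name, age, city), and the use of io.StringIO in place of a file on disk are assumptions made for illustration only.

```python
import io
import pandas as pd

# Hypothetical CSV content held in memory, so no file on disk is needed.
csv_text = io.StringIO(
    "name,age,city\n"
    "Ana,34,Lisbon\n"
    "Ben,,Oslo\n"
    "Cho,29,Seoul\n"
)

df = pd.read_csv(csv_text)   # read_csv accepts a path or a file-like object

print(df.head(2))            # first 2 rows (the default is 5)
print(df.shape)              # (rows, columns)
print(df.dtypes)             # data type of each column
df.info()                    # column names, non-null counts, dtypes
print(df.describe())         # summary statistics for the numeric column 'age'
```

Because one age is missing, pandas reads that column as floating point rather than integer, which df.dtypes and df.info() both make visible.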
3. Data Selection and Filtering:
• df['column_name'] or df.column_name: Select a single column.
• df[['col1', 'col2']]: Select multiple columns.
• df.loc[row_indexer, col_indexer]: Access a group of rows and columns by
labels.
• df.iloc[row_indexer, col_indexer]: Access a group of rows and columns by
integer position.
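A short sketch of the selection methods above, using a small made-up DataFrame (the column names and index labels are hypothetical). It highlights one difference that is easy to miss: label slices with loc include the end label, while position slices with iloc exclude the stop position.

```python
import pandas as pd

# Hypothetical data with string index labels.
df = pd.DataFrame(
    {"price": [10.0, 12.5, 9.0], "qty": [3, 1, 7]},
    index=["a", "b", "c"],
)

prices = df["price"]                  # single column -> Series
subset = df[["price", "qty"]]         # list of columns -> DataFrame

by_label = df.loc["a":"b", "price"]   # label slice: includes both "a" and "b"
by_pos = df.iloc[0:2, 0]              # position slice: rows 0 and 1, column 0

# Filtering: a boolean mask as the row indexer keeps rows where it is True.
cheap = df.loc[df["price"] < 11, ["price", "qty"]]
```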
4. Data Cleaning:
• df.isnull(): Check for null values in the DataFrame.
• df.dropna(): Remove rows with null values.
• df.fillna(value): Fill null values with a specified value.
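• df.replace(old_value, new_value): Replace occurrences of a value (for example,
a placeholder for missing data) with another value.
• df['column_name'].astype(dtype): Convert a column to the specified data type.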
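The cleaning calls above might be chained as in the following sketch. The data, the "?" placeholder, and the fill values are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one missing score and one placeholder grade.
df = pd.DataFrame({"score": [1.0, np.nan, 3.0], "grade": ["A", "?", "B"]})

n_missing = df.isnull().sum()             # count of missing values per column
dropped = df.dropna()                     # drops every row that contains a NaN
filled = df.fillna({"score": 0.0})        # fill NaN in 'score' only
cleaned = filled.replace("?", "unknown")  # swap the placeholder for a label

# Converting to int only works once no NaN remains in the column.
cleaned["score"] = cleaned["score"].astype(int)
```

Note that dropna, fillna, and replace each return a new DataFrame here and leave df unchanged.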
5. Data Manipulation:
• df.groupby('column_name').agg(func): Group by a column and apply an
aggregation function.
• df['new_column'] = df['col1'] + df['col2']: Create a new column based on
existing columns.
• pd.concat([df1, df2], axis=0): Concatenate DataFrames vertically (along rows).
• pd.concat([df1, df2], axis=1): Concatenate DataFrames horizontally (along
columns).
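As a rough illustration of the manipulation calls above, the sketch below stacks two small DataFrames, derives a column, and aggregates by group. The team data and column names are hypothetical.

```python
import pandas as pd

# Two hypothetical result tables with identical columns.
df1 = pd.DataFrame({"team": ["x", "y"], "wins": [3, 5], "losses": [2, 1]})
df2 = pd.DataFrame({"team": ["x", "y"], "wins": [4, 2], "losses": [1, 3]})

# axis=0 appends rows; ignore_index=True renumbers the index 0..n-1.
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# New column computed element-wise from existing columns.
stacked["games"] = stacked["wins"] + stacked["losses"]

# A dict passed to agg applies a different function to each column.
totals = stacked.groupby("team").agg({"wins": "sum", "games": "mean"})

# axis=1 places the frames side by side, aligned on the index;
# duplicate column names are kept as-is.
side_by_side = pd.concat([df1, df2], axis=1)
```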
Matplotlib (import matplotlib.pyplot as plt):
1. Basic Plots:
• plt.plot(x, y): Line plot.
• plt.scatter(x, y): Scatter plot.
• plt.bar(x, height): Bar plot.
• plt.hist(data, bins=30): Histogram.
2. Customization:
• plt.xlabel('xlabel'), plt.ylabel('ylabel'): Set axis labels.
• plt.title('title'): Set plot title.
• plt.legend(): Display legend.
3. Saving and Showing:
• plt.savefig('filename.png'): Save the plot to a file.
• plt.show(): Display the plot.
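A minimal sketch combining the plotting, customization, and saving calls above. The data and the output location (a file named squares.png in the system temporary directory) are assumptions for illustration.

```python
import os
import tempfile

import matplotlib.pyplot as plt

x = [0, 1, 2, 3]
y = [0, 1, 4, 9]

plt.plot(x, y, label="y = x^2")   # the label is what plt.legend() displays
plt.scatter(x, y)                 # overlay markers on the same axes
plt.xlabel("x")
plt.ylabel("y")
plt.title("Squares")
plt.legend()

# Hypothetical output path in the system temp directory.
out_path = os.path.join(tempfile.gettempdir(), "squares.png")

# Save before showing: with interactive backends the figure may be
# cleared once the window opened by plt.show() is closed.
plt.savefig(out_path)
plt.show()
```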
Seaborn (import seaborn as sns):
1. Data Visualization:
• sns.scatterplot(x='col1', y='col2', data=df): Scatter plot.
• sns.lineplot(x='col1', y='col2', data=df): Line plot.
• sns.histplot(data=df, x='column_name', bins=30): Histogram.
• sns.boxplot(x='col1', y='col2', data=df): Box plot.
2. Statistical Estimations:
• sns.regplot(x='col1', y='col2', data=df): Regression plot.
• sns.lmplot(x='col1', y='col2', data=df, hue='category'): Scatter plot with a
linear fit for each category.
3. Categorical Plots:
• sns.barplot(x='col1', y='col2', data=df): Bar plot of the mean of col2 for each
value of col1, with error bars.
• sns.countplot(x='column_name', data=df): Count plot.
4. Heatmaps and Matrices:
• sns.heatmap(corr_matrix, annot=True, cmap='coolwarm'): Heatmap of a
correlation matrix.
• sns.clustermap(corr_matrix, cmap='coolwarm'): Heatmap of a correlation matrix
with rows and columns reordered by hierarchical clustering.
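The Seaborn calls above take a DataFrame plus column names, and the heatmap takes a matrix such as the output of df.corr(). The sketch below shows both under assumed, randomly generated data; the column names col1, col2, and category mirror the placeholders used in this section.

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical data: col2 depends linearly on col1, plus noise.
rng = np.random.default_rng(0)
df = pd.DataFrame({"col1": rng.normal(size=50)})
df["col2"] = 2 * df["col1"] + rng.normal(scale=0.5, size=50)
df["category"] = np.where(df["col1"] > 0, "high", "low")

# corr() on the numeric columns yields the matrix that heatmap expects.
corr_matrix = df[["col1", "col2"]].corr()
ax = sns.heatmap(corr_matrix, annot=True, cmap="coolwarm")

plt.figure()  # start a new figure so the scatter plot does not draw on the heatmap
sns.scatterplot(x="col1", y="col2", hue="category", data=df)
plt.show()
```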
NumPy (import numpy as np):
1. Creating Arrays:
• np.array([1, 2, 3]): Create a 1D array.
• np.zeros((3, 3)): Create an array of zeros with the specified shape.
• np.ones((3, 3)): Create an array of ones with the specified shape.
2. Array Operations:
• np.sum(arr): Sum of array elements.
• np.mean(arr): Mean of array elements.
• np.max(arr), np.min(arr): Maximum and minimum values in the array.
• np.arange(start, stop, step): Create an array of evenly spaced values from start
up to, but not including, stop.
3. Array Manipulation:
• arr.reshape((rows, cols)): Reshape the array.
• np.vstack((arr1, arr2)): Stack arrays vertically.
• np.hstack((arr1, arr2)): Stack arrays horizontally.
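The NumPy calls above can be combined as in this small sketch; the specific values and shapes are chosen only for illustration.

```python
import numpy as np

arr = np.arange(0, 6, 1)           # values 0 through 5; stop (6) is excluded
grid = arr.reshape((2, 3))         # 2 rows, 3 columns; element count must match

total = np.sum(grid)               # sum over all elements
col_means = np.mean(grid, axis=0)  # axis=0 averages down each column
largest, smallest = np.max(grid), np.min(grid)

# vstack needs matching column counts; hstack needs matching row counts.
v = np.vstack((grid, np.zeros((1, 3))))  # adds a row of zeros
h = np.hstack((grid, np.ones((2, 1))))   # adds a column of ones
```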
SciPy (from scipy import stats):
1. Statistical Tests and Distributions:
• stats.ttest_ind(a, b): Independent t-test.
• stats.pearsonr(x, y): Pearson correlation coefficient and p-value.
• stats.norm.pdf(x, loc, scale): Probability density function of a normal
distribution.
2. Distribution Fitting:
• params = stats.norm.fit(data): Fit a normal distribution to the data; returns the
estimated (loc, scale), i.e. the mean and standard deviation.
3. Descriptive Statistics:
• stats.describe(data): Compute several descriptive statistics.
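A brief sketch of the SciPy calls above on randomly generated samples. The sample sizes, means, and seed are assumptions for illustration, and no particular test outcome is implied.

```python
import numpy as np
from scipy import stats

# Hypothetical samples drawn from normal distributions with different means.
rng = np.random.default_rng(42)
a = rng.normal(loc=0.0, scale=1.0, size=200)
b = rng.normal(loc=0.5, scale=1.0, size=200)

t_stat, p_value = stats.ttest_ind(a, b)   # two-sample t-test
r, p_corr = stats.pearsonr(a, b)          # requires equal-length inputs

# For the normal distribution, fit returns the estimated (loc, scale).
loc, scale = stats.norm.fit(a)
density_at_mean = stats.norm.pdf(loc, loc, scale)  # peak height of the fitted curve

summary = stats.describe(a)               # nobs, minmax, mean, variance, ...
```

The fitted scale is the maximum-likelihood estimate, so it matches np.std(a) with the default ddof=0 rather than the sample standard deviation.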