Python Commands

The document provides a comprehensive guide on using Pandas, Matplotlib, Seaborn, NumPy, and SciPy for data manipulation, visualization, and statistical analysis. It includes functions for reading data, exploring data, cleaning, manipulating, and visualizing it through various types of plots and statistical tests. Each library's key functionalities and methods are outlined for effective data handling and analysis.

Uploaded by Jaio Etx

Pandas (import pandas as pd):

1. Reading Data:

• pd.read_csv('filename.csv'): Read a CSV file into a DataFrame.

• pd.read_excel('filename.xlsx'): Read an Excel file into a DataFrame.

2. Data Exploration:

• df.head(): Display the first few rows of the DataFrame.

• df.describe(): Summary statistics for numerical columns.

• df.info(): Information about the DataFrame, including column data types and non-null counts.

• df.dtypes or df['column_name'].dtype: Check the data types (these are attributes, not methods).

• df.shape: Get the dimensions of the DataFrame (rows, columns).
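As a rough illustration of the reading and exploration calls above, the sketch below loads a small CSV and inspects it. The CSV text and its column names are made up for the example, and it is read from memory via io.StringIO so that no file on disk is assumed.

```python
import io

import pandas as pd

# Hypothetical CSV content, held in memory so the example needs no file on disk.
csv_text = """name,age,city
Ana,34,Lisbon
Ben,29,Porto
Cara,41,Faro
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))     # first two rows
print(df.shape)       # (rows, columns)
print(df.dtypes)      # data type of each column (an attribute, so no parentheses)
df.info()             # prints column types and non-null counts
print(df.describe())  # summary statistics for the numeric column
```

With a real file, the io.StringIO(...) argument would simply be replaced by the file path.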

3. Data Selection and Filtering:

• df['column_name'] or df.column_name: Select a single column.

• df[['col1', 'col2']]: Select multiple columns.

• df.loc[row_indexer, col_indexer]: Access a group of rows and columns by labels.

• df.iloc[row_indexer, col_indexer]: Access a group of rows and columns by integer position.
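The difference between label-based and position-based access is easiest to see side by side. The following is a minimal sketch on a made-up DataFrame with a string index; the boolean-mask line is an added illustration of filtering, which the list above names but does not show.

```python
import pandas as pd

# Hypothetical data with string row labels.
df = pd.DataFrame(
    {"col1": [10, 20, 30], "col2": [1.5, 2.5, 3.5]},
    index=["a", "b", "c"],
)

single = df["col1"]                 # one column, returned as a Series
subset = df[["col1", "col2"]]       # several columns, returned as a DataFrame
by_label = df.loc["a":"b", "col1"]  # label slice: the end label "b" is included
by_pos = df.iloc[0:2, 0]            # position slice: the end position 2 is excluded
filtered = df[df["col1"] > 15]      # boolean mask keeps rows where the condition holds
```

Note that the two slices select the same rows here even though one names the last row and the other stops before position 2.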

4. Data Cleaning:

• df.isnull(): Check for null values in the DataFrame.

• df.dropna(): Remove rows with null values.

• df.fillna(value): Fill null values with a specified value.

• df.replace(old_value, new_value): Replace occurrences of a value (for example, a missing-value placeholder) with a new value.

• df['column_name'].astype(dtype): Convert a column to a different data type.
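The cleaning calls above can be sketched on a small made-up DataFrame. The column names and the "N/A" placeholder are assumptions for illustration; each call returns a new object and leaves df itself unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one real missing value (NaN) and one text placeholder ("N/A").
df = pd.DataFrame({"score": [1.0, np.nan, 3.0], "grade": ["A", "N/A", "C"]})

null_counts = df.isnull().sum()          # number of nulls per column
dropped = df.dropna()                    # rows containing NaN removed
filled = df.fillna({"score": 0.0})       # fill NaN in "score" only
replaced = df.replace("N/A", "unknown")  # swap the placeholder for another value
as_int = filled["score"].astype(int)     # safe to convert once NaN is gone
```

The string "N/A" is not treated as null by isnull() or dropna(), which is why it is handled with replace() instead.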


5. Data Manipulation:

• df.groupby('column_name').agg(func): Group by a column and apply an aggregation function.

• df['new_column'] = df['col1'] + df['col2']: Create a new column based on existing columns.

• pd.concat([df1, df2], axis=0): Concatenate DataFrames vertically (along rows).

• pd.concat([df1, df2], axis=1): Concatenate DataFrames horizontally (along columns).
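A short sketch of how these manipulation calls combine, using two made-up DataFrames. The "team" column and the ignore_index=True argument are choices made for the example rather than part of the list above.

```python
import pandas as pd

# Two hypothetical DataFrames with the same columns.
df1 = pd.DataFrame({"team": ["x", "y"], "col1": [1, 2], "col2": [10, 20]})
df2 = pd.DataFrame({"team": ["x", "y"], "col1": [3, 4], "col2": [30, 40]})

# Stack rows; ignore_index=True renumbers the rows 0..3 instead of repeating 0, 1.
stacked = pd.concat([df1, df2], axis=0, ignore_index=True)

# Derive a new column from existing ones.
stacked["new_column"] = stacked["col1"] + stacked["col2"]

# Group by team and sum the derived column.
totals = stacked.groupby("team").agg({"new_column": "sum"})

# Place the frames side by side; duplicate column names are kept as they are.
side_by_side = pd.concat([df1, df2], axis=1)
```

Horizontal concatenation aligns on the row index, so it is mainly useful when both frames describe the same rows.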

Matplotlib (import matplotlib.pyplot as plt):


1. Basic Plots:

• plt.plot(x, y): Line plot.

• plt.scatter(x, y): Scatter plot.

• plt.bar(x, height): Bar plot.

• plt.hist(data, bins=30): Histogram.

2. Customization:

• plt.xlabel('xlabel'), plt.ylabel('ylabel'): Set axis labels.

• plt.title('title'): Set plot title.

• plt.legend(): Display legend.

3. Saving and Showing:

• plt.savefig('filename.png'): Save the plot to a file.

• plt.show(): Display the plot.
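Putting the plotting, customization, and saving calls together might look like the sketch below. The data, the label text, and the output location in the system temp directory are assumptions for illustration, and the non-interactive "Agg" backend is selected only so the example also runs without a display.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # assumption: headless run; omit this line for interactive use
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]
y = [0, 1, 4, 9]

fig = plt.figure()
plt.plot(x, y, label="y = x^2")  # line plot; the label feeds plt.legend()
plt.scatter(x, y)                # overlay the individual points
plt.xlabel("x")
plt.ylabel("y")
plt.title("Line and scatter")
plt.legend()

# Hypothetical output path in the temp directory.
out_path = os.path.join(tempfile.gettempdir(), "example_plot.png")
plt.savefig(out_path)  # save first; with interactive backends show() may discard the figure
plt.close(fig)
```

In an interactive session, plt.show() would normally follow plt.savefig() rather than precede it.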

Seaborn (import seaborn as sns):

1. Data Visualization:

• sns.scatterplot(x='col1', y='col2', data=df): Scatter plot.

• sns.lineplot(x='col1', y='col2', data=df): Line plot.

• sns.histplot(data=df, x='column_name', bins=30): Histogram.

• sns.boxplot(x='col1', y='col2', data=df): Box plot.

2. Statistical Estimations:

• sns.regplot(x='col1', y='col2', data=df): Regression plot.

• sns.lmplot(x='col1', y='col2', data=df, hue='category'): Scatter plot with a linear fit for each category.

3. Categorical Plots:

• sns.barplot(x='col1', y='col2', data=df): Bar plot.

• sns.countplot(x='column_name', data=df): Count plot.

4. Heatmaps and Matrices:

• sns.heatmap(corr_matrix, annot=True, cmap='coolwarm'): Heatmap of a correlation matrix.

• sns.clustermap(corr_matrix, cmap='coolwarm'): Hierarchical clustering of a correlation matrix.
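The heatmap calls above take a corr_matrix that the list does not define. One plausible way to obtain it is df.corr() on the numeric columns, as in this sketch; the data and the "Agg" backend are assumptions for the example.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless run; omit this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical numeric data.
df = pd.DataFrame(
    {
        "col1": [1, 2, 3, 4, 5],
        "col2": [2, 4, 5, 4, 6],
        "col3": [5, 3, 4, 1, 0],
    }
)

corr_matrix = df.corr()  # pairwise Pearson correlations between columns

ax = sns.heatmap(corr_matrix, annot=True, cmap="coolwarm")
ax.set_title("Correlation matrix")
plt.close(ax.figure)  # release the figure once it is no longer needed
```

The same corr_matrix can be passed to sns.clustermap, which additionally reorders rows and columns by similarity.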

NumPy (import numpy as np):

1. Creating Arrays:
• np.array([1, 2, 3]): Create a 1D array.

• np.zeros((3, 3)): Create an array of zeros with the specified shape.

• np.ones((3, 3)): Create an array of ones with the specified shape.

2. Array Operations:

• np.sum(arr): Sum of array elements.

• np.mean(arr): Mean of array elements.

• np.max(arr), np.min(arr): Maximum and minimum values in the array.

• np.arange(start, stop, step): Create an array with a range of values.

3. Array Manipulation:

• arr.reshape((rows, cols)): Reshape the array.

• np.vstack((arr1, arr2)): Stack arrays vertically.

• np.hstack((arr1, arr2)): Stack arrays horizontally.
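The NumPy calls above can be chained as in the following sketch. The shapes are chosen for illustration, and the axis=0 argument to np.mean is an addition that the list does not mention.

```python
import numpy as np

arr = np.arange(0, 6, 1)    # values 0..5; the stop value 6 is excluded
grid = arr.reshape((2, 3))  # 2 rows, 3 columns

total = np.sum(grid)                # sum over all elements
col_means = np.mean(grid, axis=0)   # mean of each column
largest, smallest = np.max(grid), np.min(grid)

# Stacking requires matching shapes along the axis that is not being extended.
taller = np.vstack((grid, np.ones((1, 3))))   # adds a row -> shape (3, 3)
wider = np.hstack((grid, np.zeros((2, 2))))   # adds columns -> shape (2, 5)
```

reshape only succeeds when the new shape holds exactly as many elements as the original array, here 2 x 3 = 6.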

SciPy (from scipy import stats):

1. Statistical Tests:

• stats.ttest_ind(a, b): Independent t-test.

• stats.pearsonr(x, y): Pearson correlation coefficient and p-value.

• stats.norm.pdf(x, loc, scale): Probability density function of a normal distribution.

2. Distribution Fitting:

• params = stats.norm.fit(data): Fit a normal distribution to the data; returns the estimated (loc, scale), i.e. the mean and standard deviation.

3. Descriptive Statistics:

• stats.describe(data): Compute several descriptive statistics.
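A minimal sketch of the SciPy calls above on synthetic data. The samples are generated with a fixed seed purely for illustration, so the resulting statistics carry no meaning beyond showing what each call returns.

```python
import numpy as np
from scipy import stats

# Synthetic samples (assumption: normally distributed, fixed seed for repeatability).
rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, scale=1.0, size=200)
b = rng.normal(loc=0.5, scale=1.0, size=200)

t_stat, p_value = stats.ttest_ind(a, b)  # tests whether the two means differ
r, r_p = stats.pearsonr(a, b)            # correlation coefficient and its p-value
loc, scale = stats.norm.fit(a)           # estimated mean and standard deviation
density = stats.norm.pdf(0.0, loc=0.0, scale=1.0)  # standard normal density at 0
summary = stats.describe(a)              # nobs, minmax, mean, variance, skewness, kurtosis

print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
print(f"fitted loc = {loc:.3f}, scale = {scale:.3f}")
```

By default stats.ttest_ind assumes equal variances in the two groups; passing equal_var=False switches to Welch's t-test.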
