EDA With Pandas CheatSheet

Uploaded by kollilokesh24

Copyright © All Rights Reserved

Exploratory Data Analysis (EDA) with Pandas [CheatSheet]

### Importing Pandas

```python
import pandas as pd
```
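
To try the commands below without a data file, you can build a small DataFrame in memory. This is a minimal sketch; the column names `city`, `temp` and `sales` and their values are hypothetical, chosen only so that there is one missing value to clean later.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column names and values are illustrative only
df = pd.DataFrame({
    'city': ['Oslo', 'Lima', 'Oslo', 'Pune', 'Lima'],
    'temp': [4.5, 19.0, np.nan, 31.2, 18.4],   # np.nan marks a missing value
    'sales': [120, 340, 95, 410, 340],
})
print(df)
```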

### Loading Data

```python
df = pd.read_csv('file.csv')     # Load a CSV file
df = pd.read_excel('file.xlsx')  # Load an Excel file (.xlsx needs the openpyxl package)
df = pd.read_json('file.json')   # Load a JSON file
```
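
`read_csv` accepts any file-like object as well as a path, and it can fix column types while loading. The sketch below assumes a small hypothetical CSV held in a string; `parse_dates` and `dtype` are standard `read_csv` parameters, and the column names are illustrative.

```python
import io
import pandas as pd

# Hypothetical CSV text; io.StringIO makes it behave like an open file
csv_text = """date,region,units
2024-01-01,north,10
2024-01-02,south,
2024-01-03,north,7
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=['date'],          # parse this column as datetime64
    dtype={'region': 'category'},  # store repeated labels compactly
)
print(df.dtypes)  # 'units' becomes float64 because of the empty cell (NaN)
```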

### Basic Data Inspection

```python
df.head()      # Return the first 5 rows (df.head(n) for n rows)
df.tail()      # Return the last 5 rows
df.shape       # Tuple of (rows, columns); an attribute, so no parentheses
df.info()      # Print a concise summary: index, dtypes, non-null counts, memory use
df.describe()  # Descriptive statistics (numeric columns by default)
df.columns     # List all column labels
df.dtypes      # Data type of each column
```
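
A typical first pass combines a few of these calls with a missing-value count and a frequency table for a text column. This is a minimal sketch on hypothetical data; `isna().sum()` and `value_counts()` are standard pandas methods not listed in the table above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data for illustration
df = pd.DataFrame({
    'group': ['a', 'b', 'a', 'c'],
    'score': [10.0, np.nan, 30.0, 20.0],
})

n_rows, n_cols = df.shape            # unpack the (rows, columns) tuple
missing = df.isna().sum()            # missing values per column
summary = df.describe()              # numeric columns only ('score' here)
counts = df['group'].value_counts()  # frequency of each label in a text column

print(n_rows, n_cols)
print(missing)
print(summary.loc[['count', 'mean']])
print(counts)
```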

### Data Selection

```python
df['column_name']             # Select a single column (returns a Series)
df[['col1', 'col2']]          # Select multiple columns (returns a DataFrame)
df.iloc[0]                    # Select the first row by position
df.iloc[0:5]                  # Select the first 5 rows (end position excluded)
df.loc[df['column'] > value]  # Select rows meeting a condition ('value' is a placeholder)
```
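
Conditions can be combined, and `loc` can pick columns at the same time it filters rows. The sketch below uses hypothetical data to show the `&` / `|` operators (each condition wrapped in parentheses), `isin`, and the difference between position-based `iloc` and label-based `loc` slicing.

```python
import pandas as pd

# Hypothetical toy data for illustration
df = pd.DataFrame({
    'name': ['Ana', 'Ben', 'Cy', 'Dee'],
    'age': [23, 35, 41, 29],
    'dept': ['ops', 'dev', 'dev', 'ops'],
})

# Combine conditions with & (and) / | (or); parentheses are required
mask = (df['age'] > 25) & (df['dept'] == 'dev')
dev_over_25 = df.loc[mask, ['name', 'age']]  # rows by mask, columns by label

# isin() tests membership against a list of values
ops_or_cy = df[df['dept'].isin(['ops']) | (df['name'] == 'Cy')]

# iloc slices by position and excludes the end; loc slices by label and includes it
first_two = df.iloc[0:2]
first_three = df.loc[0:2]  # default RangeIndex labels 0, 1, 2

print(dev_over_25)
```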

### Data Cleaning

```python
# These methods return a new DataFrame; assign the result (df = df.dropna()) to keep it
df.dropna()                                  # Drop rows with any missing value
df.fillna(value)                             # Fill missing values with a specific value
df.drop(columns=['col1', 'col2'])            # Drop specific columns
df.rename(columns={'old_name': 'new_name'})  # Rename columns
df.duplicated()                              # Boolean Series marking repeats of earlier rows
df.drop_duplicates()                         # Drop duplicate rows, keeping the first occurrence
```
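
Because each cleaning method returns a new object, the steps are often chained and the result assigned once. A minimal sketch on hypothetical data: duplicates are dropped, each column gets its own fill value via a dict passed to `fillna`, and one column is renamed. Filling `price` with its median is an illustrative choice, not a rule.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one duplicated row and some missing values
df = pd.DataFrame({
    'id': [1, 2, 2, 3],
    'price': [9.5, np.nan, np.nan, 12.0],
    'note': ['ok', None, None, 'late'],
})

clean = (
    df.drop_duplicates()  # drops the second id=2 row (NaNs compare equal here)
      .fillna({'price': df['price'].median(), 'note': 'n/a'})  # per-column fills
      .rename(columns={'note': 'status'})
)

print(df.duplicated().sum())  # the original df is unchanged
print(clean)
```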

### Data Transformation

```python
df['new_column'] = df['col1'] + df['col2']                   # Create a new column
df.apply(lambda x: x + 1)                                    # Apply a function to each column (axis=1: each row)
df.groupby('column')['col1'].mean()                          # Group by a column, then aggregate
df.sort_values(by='column', ascending=False)                 # Sort by a column, descending
df.pivot_table(index='col1', columns='col2', values='col3')  # Pivot table (mean by default)
```
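
`groupby` on its own only builds a grouping; an aggregation such as `sum`, `mean` or `agg` produces the result table. The sketch below, on hypothetical data, uses named aggregation to compute several statistics at once and contrasts a row-wise `apply(axis=1)` with the equivalent vectorized expression, which is usually much faster.

```python
import pandas as pd

# Hypothetical toy data for illustration
df = pd.DataFrame({
    'region': ['n', 's', 'n', 's', 'n'],
    'units': [10, 4, 6, 8, 2],
    'price': [2.0, 3.0, 2.0, 3.0, 2.5],
})

# Named aggregation: new_column=(source_column, function)
per_region = df.groupby('region').agg(
    total_units=('units', 'sum'),
    avg_price=('price', 'mean'),
)

# apply() with axis=1 passes each row to the function as a Series
df['revenue'] = df.apply(lambda row: row['units'] * row['price'], axis=1)

# The vectorized form computes the same values without a Python-level loop
assert (df['revenue'] == df['units'] * df['price']).all()

print(per_region)
```
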
### Visualization (with Matplotlib)

```python
import matplotlib.pyplot as plt

df['column'].hist()                          # Histogram of a column
df.plot(kind='bar')                          # Bar plot of the numeric columns
df.plot(kind='line')                         # Line plot of the numeric columns
df.plot(kind='scatter', x='col1', y='col2')  # Scatter plot (x and y are required)
plt.show()                                   # Display the figures
```
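
When plotting in a script or on a server, it can help to draw onto explicit Axes and save the figure instead of calling `plt.show()`. A minimal sketch, assuming the non-interactive `Agg` backend and hypothetical data; it writes the PNG into an in-memory buffer, but a file path such as `'plots.png'` works the same way.

```python
import io

import matplotlib
matplotlib.use('Agg')  # non-interactive backend: render without opening a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data for illustration
df = pd.DataFrame({'x': [1, 2, 3, 4], 'y': [2, 4, 3, 5]})

# Two pandas plots drawn onto the Axes of one figure via the ax= argument
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
df['y'].plot(kind='hist', ax=ax1, title='Histogram of y')
df.plot(kind='scatter', x='x', y='y', ax=ax2, title='y vs x')
fig.tight_layout()

buf = io.BytesIO()
fig.savefig(buf, format='png')  # or fig.savefig('plots.png')
plt.close(fig)                  # release the figure's memory
print(len(buf.getvalue()), 'bytes of PNG')
```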

### Saving Data

```python
df.to_csv('file.csv', index=False)     # Save to a CSV file without the row index
df.to_excel('file.xlsx', index=False)  # Save to an Excel file (.xlsx needs openpyxl)
df.to_json('file.json')                # Save to a JSON file
```
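
The `index=False` argument matters when the file is read back: without it, the row index is written as an extra unnamed column. A minimal sketch on hypothetical data; calling `to_csv()` with no path returns the CSV text as a string, which keeps the example free of files.

```python
import io
import pandas as pd

# Hypothetical toy data for illustration
df = pd.DataFrame({'id': [1, 2], 'label': ['a', 'b']})

# index=False: reading the text back reproduces the original columns
round_trip = pd.read_csv(io.StringIO(df.to_csv(index=False)))
pd.testing.assert_frame_equal(round_trip, df)

# Default index=True: the index comes back as an 'Unnamed: 0' column
with_index = pd.read_csv(io.StringIO(df.to_csv()))
print(list(with_index.columns))
```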
