Interactive Data Analysis With Jupyter Cheatsheet 1731972443
This cheat sheet provides a comprehensive guide to using Jupyter Notebooks for interactive data analysis, covering basics, magic commands, data import/export, exploration, cleaning, manipulation, visualization with Matplotlib and Seaborn, statistical analysis, and machine learning with Scikit-learn. It includes essential commands and code snippets for each topic, making it a valuable resource for data analysts and scientists. The document is authored by Waleed Mousa.
[ Interactive Data Analysis with Jupyter Notebooks ] ( CheatSheet )
1. Jupyter Notebook Basics
● Start Jupyter Notebook: jupyter notebook
● Create new notebook: Click "New" > "Python 3"
● Run cell: Shift + Enter
● (The single-key shortcuts below apply in command mode; press Esc first)
● Insert cell above: A
● Insert cell below: B
● Delete cell: D, D (press twice)
● Change cell type to Markdown: M
● Change cell type to Code: Y
● Toggle line numbers: L
● Toggle output: O
● Clear cell output: Cell > Current Outputs > Clear
● Restart kernel: 0, 0 (press twice)
● Save notebook: Ctrl + S
● Convert to Python script: jupyter nbconvert --to script notebook.ipynb
● Convert to HTML: jupyter nbconvert --to html notebook.ipynb
2. Magic Commands
● List all magic commands: %lsmagic
● Run Python file: %run script.py
● Time cell execution: %%time
● Time multiple executions: %timeit function()
● Display plots inline: %matplotlib inline
● Display plots in a separate window: %matplotlib qt
● Load extension: %load_ext autoreload
● Autoreload modules: %autoreload 2
● Display all variables: %who
● Display all variables of a specific type: %who_ls str
● Delete variable: %reset_selective variable_name
● Run shell command: !ls -l
● Set environment variable: %env MY_VAR=value
● Debug with pdb: %pdb
● Profile code: %prun function()
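The %timeit magic is a thin wrapper around Python's standard timeit module, so the same measurement can be reproduced in a plain script. A minimal sketch (the build_list function is an invented example):

```python
import timeit

# Hypothetical function to benchmark, standing in for `function()` above.
def build_list():
    return [i * i for i in range(1000)]

# %timeit function() in a notebook does roughly this: run the callable
# many times and report an average time per call.
runs = 1000
total = timeit.timeit(build_list, number=runs)
print(f"{total / runs * 1e6:.2f} µs per call")
```

Unlike %timeit, timeit.timeit returns the total elapsed time, so the division by the run count recovers the per-call figure the magic prints.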
By: Waleed Mousa
3. Data Import and Export
● Import pandas: import pandas as pd
● Read CSV: df = pd.read_csv('file.csv')
● Read CSV with specific encoding: df = pd.read_csv('file.csv', encoding='utf-8')
● Read CSV with custom delimiter: df = pd.read_csv('file.csv', sep='\t')
● Read Excel: df = pd.read_excel('file.xlsx', sheet_name='Sheet1')
● Read JSON: df = pd.read_json('file.json')
● Read SQL query: df = pd.read_sql_query("SELECT * FROM table", connection)
● Read from URL: df = pd.read_csv('https://example.com/data.csv')
● Read from clipboard: df = pd.read_clipboard()
● Read multiple CSV files: df = pd.concat([pd.read_csv(f) for f in glob.glob('*.csv')]) (requires import glob)
● Write to CSV: df.to_csv('output.csv', index=False)
● Write to Excel: df.to_excel('output.xlsx', index=False)
● Write to JSON: df.to_json('output.json')
● Write to SQL: df.to_sql('table_name', connection, if_exists='replace')
● Write to clipboard: df.to_clipboard()
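The read/write pairs above round-trip cleanly. A minimal sketch using a temporary directory (the column names are invented for illustration):

```python
import os
import tempfile

import pandas as pd

# Build a small frame, write it to CSV, and read it back.
df = pd.DataFrame({"name": ["Alice", "Bob"], "score": [85, 92]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.csv")
    df.to_csv(path, index=False)   # index=False avoids a spurious index column
    restored = pd.read_csv(path)

print(restored.equals(df))  # True
```

Omitting index=False is a common pitfall: the row index is written as an unnamed column, so the frame read back no longer matches the original.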
4. Data Exploration
● Display first rows: df.head()
● Display last rows: df.tail()
● Display random sample: df.sample(n=5)
● Get dataframe info: df.info()
● Get dataframe statistics: df.describe()
● Get column names: df.columns
● Get data types: df.dtypes
● Get dimensions: df.shape
● Check for null values: df.isnull().sum()
● Get unique values: df['column'].unique()
● Get value counts: df['column'].value_counts()
● Get correlation matrix: df.corr()
● Get covariance matrix: df.cov()
● Display all rows: pd.set_option('display.max_rows', None)
● Display all columns: pd.set_option('display.max_columns', None)
● Reset display options: pd.reset_option('display')
● Get memory usage: df.memory_usage(deep=True)
● Get column data types and non-null counts: df.info(verbose=True, show_counts=True) (null_counts was renamed to show_counts in pandas 1.2)
● Get the index: df.index
● Get summary of a specific column: df['column'].describe()
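A quick pass over a small frame shows what the exploration commands above return. The frame itself is invented for illustration:

```python
import pandas as pd

# Hypothetical data with one missing value.
df = pd.DataFrame({
    "city": ["Oslo", "Oslo", "Lima", None],
    "temp": [3.0, 4.5, 22.0, 21.5],
})

print(df.shape)                   # (4, 2) — rows, columns
print(df.isnull().sum())          # city: 1, temp: 0
print(df["city"].value_counts())  # Oslo: 2, Lima: 1 (NaN excluded)
print(df["temp"].describe()["mean"])  # 12.75
```

Note that value_counts() skips missing values by default; pass dropna=False to count them too.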
5. Data Cleaning
● Drop null values: df.dropna()
● Drop null values in specific columns: df.dropna(subset=['column1', 'column2'])
● Fill null values with a specific value: df.fillna(value)
● Fill null values with column mean: df.fillna(df.mean())
● Fill null values with column median: df.fillna(df.median())
● Fill null values with forward fill: df.ffill() (df.fillna(method='ffill') is deprecated)
● Fill null values with backward fill: df.bfill() (df.fillna(method='bfill') is deprecated)
● Replace values: df.replace(old_value, new_value)
● Replace values using dictionary: df.replace({'old1': 'new1', 'old2': 'new2'})
● Remove duplicates: df.drop_duplicates()
● Remove duplicates based on specific columns: df.drop_duplicates(subset=['column1', 'column2'])
● Rename columns: df.rename(columns={'old_name': 'new_name'})
● Change data type: df['column'] = df['column'].astype('int64')
● Convert to datetime: df['date'] = pd.to_datetime(df['date'])
● Handle outliers using the IQR rule (keep values within 1.5 × IQR of the quartiles):
  q1, q3 = df['column'].quantile(0.25), df['column'].quantile(0.75)
  iqr = q3 - q1
  df = df[(df['column'] > q1 - 1.5 * iqr) & (df['column'] < q3 + 1.5 * iqr)]
● Strip whitespace from string columns: df = df.apply(lambda x: x.str.strip() if x.dtype == "object" else x)
● Replace inf and -inf with NaN: df = df.replace([np.inf, -np.inf], np.nan) (requires import numpy as np)
● Coerce errors to NaN when changing data types: df['column'] = pd.to_numeric(df['column'], errors='coerce')
● Drop columns: df = df.drop(['column1', 'column2'], axis=1)
● Reset index: df = df.reset_index(drop=True)
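Several of these commands chain naturally into one cleaning pass. A minimal sketch on invented data (the column names are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical messy data: a non-numeric string, a missing value,
# and a duplicate row.
df = pd.DataFrame({
    "age": ["25", "thirty", "40", "40"],
    "income": [50000.0, np.nan, 62000.0, 62000.0],
})

# Coerce non-numeric strings to NaN, then fill with the column median.
df["age"] = pd.to_numeric(df["age"], errors="coerce")
df["age"] = df["age"].fillna(df["age"].median())

# Fill remaining numeric gaps with the mean, then drop exact duplicates.
df["income"] = df["income"].fillna(df["income"].mean())
df = df.drop_duplicates().reset_index(drop=True)

print(df)  # 3 rows: "thirty" became 40.0 (median), NaN income became 58000.0
```

Running errors='coerce' before the fill step matters: the string "thirty" must become NaN first, or the median/mean computation would fail on a mixed-type column.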