Introduction To Pandas in Data Analytics
The pandas DataFrame is the central data structure for data analysis in
Python: a flexible table whose rows and columns both carry labels.
1 Labeled Axes
Pandas DataFrame provides a two-dimensional, size-mutable, and
potentially heterogeneous tabular data structure with labeled rows
and columns.
2 Data Analysis
pandas is commonly used alongside NumPy (numerical arrays) and Matplotlib
(plotting) for data manipulation and visualization.
python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
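The properties listed above (labeled axes, mixed column types, mutable size) can be illustrated with a minimal sketch. The table contents and the names people, row_labels, and column_labels are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical sample data: the columns hold different dtypes (heterogeneous).
people = pd.DataFrame(
    {"name": ["Ana", "Ben", "Cy"], "age": [31, 25, 47]},
    index=["r1", "r2", "r3"],  # explicit row labels
)

# Size-mutable: a new column can be added to the existing object.
people["senior"] = people["age"] > 40

# Both axes carry labels.
row_labels = list(people.index)
column_labels = list(people.columns)
print(row_labels, column_labels)
print(people.dtypes)
```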
Loading Data into a DataFrame
Methods for loading data from various sources into a DataFrame. Code Snippets:
From CSV
df_csv = pd.read_csv('file.csv')
From Excel
df_excel = pd.read_excel('file.xlsx', sheet_name='Sheet1')
From MySQL
import sqlalchemy
engine = sqlalchemy.create_engine('mysql://username:password@localhost/dbname')
df_sql = pd.read_sql('SELECT * FROM table_name', engine)
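The file-based loaders can be tried without any file on disk. The sketch below assumes an in-memory text buffer standing in for 'file.csv'; the CSV content is hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for 'file.csv'.
csv_text = "A,B\n1,4\n2,5\n3,6\n"

# read_csv accepts a path or any file-like object.
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv.shape)
print(df_csv.dtypes)
```

The first line of the text is taken as the header row, and column types are inferred from the values.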
Example DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
Example Series
s = pd.Series([1, 2, 3])
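How the two objects relate can be seen by pulling a column out of a DataFrame. This is a small sketch using the example DataFrame above; the names col and sub are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

col = df['A']      # single label: one column as a Series (1-D)
sub = df[['A']]    # list of labels: a one-column DataFrame (2-D)

print(type(col).__name__, col.ndim)
print(type(sub).__name__, sub.ndim)

# The Series keeps the column label as its name and shares the row index.
print(col.name, col.index.equals(df.index))
```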
Working with Rows and Columns
Accessing and manipulating rows and columns: selecting, adding, and deleting.
Selecting a column
df['A']
Deleting a column
df.drop(columns=['B'])
Selecting rows
Using .loc
df.loc[0:1, ['A', 'B']]
Using .iloc
df.iloc[0:1, 0:2]
Vectorized operations
df['A'] + df['B']
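The operations in this section can be combined into one runnable sketch. The column name 'C' and the result names are assumptions made for illustration; drop() is one common way to delete a column, not the only one.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Adding a column from a vectorized expression.
df['C'] = df['A'] + df['B']

# Deleting a column; drop() returns a new DataFrame by default.
without_b = df.drop(columns=['B'])

# .loc selects by label; the end of a label slice is included.
by_label = df.loc[0:1, ['A', 'C']]

# .iloc selects by integer position; the end of a slice is excluded.
by_position = df.iloc[0:1, 0:2]

print(without_b.columns.tolist())
print(by_label.shape, by_position.shape)
```

Note the difference in slice semantics: the same 0:1 covers two rows with .loc and one row with .iloc.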
Filtering and Grouping
Filtering rows by condition and grouping rows by the values in a column.
Filtering
filtered = df[df['A'] > 1]
Grouping
grouped = df.groupby('A').sum()
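Grouping is easier to follow when a key repeats, so the sketch below uses sample data with two categories. The name sales and the combined condition are assumptions added for illustration.

```python
import pandas as pd

# Sample data with a repeated key so each group has more than one row.
sales = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B'],
    'Value': [10, 20, 30, 40],
})

# Boolean mask: keep the rows where the condition holds.
filtered = sales[sales['Value'] > 10]

# Combine conditions with & and |, wrapping each one in parentheses.
both = sales[(sales['Value'] > 10) & (sales['Category'] == 'A')]

# Split by Category, then sum Value within each group.
grouped = sales.groupby('Category')['Value'].sum()
print(grouped)
```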
Merging DataFrames
Merging DataFrames using pd.merge(). Types of joins: inner, outer, left, right. Code Snippets:
Inner join
inner_merge = pd.merge(df1, df2, on='key', how='inner')
Outer join
outer_merge = pd.merge(df1, df2, on='key', how='outer')
Left join
left_merge = pd.merge(df1, df2, on='key', how='left')
Right join
right_merge = pd.merge(df1, df2, on='key', how='right')
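The snippets above assume df1 and df2 already exist. A minimal sketch with hypothetical inputs shows which keys each join type keeps; the column names left_val and right_val and the results dictionary are invented for this example.

```python
import pandas as pd

# Hypothetical inputs: keys 'b' and 'c' appear in both frames.
df1 = pd.DataFrame({'key': ['a', 'b', 'c'], 'left_val': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['b', 'c', 'd'], 'right_val': [20, 30, 40]})

# Run the same merge with each join type and collect the results.
results = {
    how: pd.merge(df1, df2, on='key', how=how)
    for how in ['inner', 'outer', 'left', 'right']
}

for how, merged in results.items():
    print(how, merged['key'].tolist())
```

Rows with no match on the other side are kept by outer, left, or right joins and filled with NaN in the missing columns.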
Concatenating DataFrames using pd.concat(). Concatenating along rows and columns. Code Snippets:
Joining DataFrames
joined_df = df1.join(df2, how='inner')
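Concatenation along each axis, and an index-based join, might look like the following sketch. All frames here (top, bottom, extra, left, right) are hypothetical examples.

```python
import pandas as pd

top = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
bottom = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# axis=0 stacks rows; ignore_index=True renumbers the result from 0.
rows = pd.concat([top, bottom], axis=0, ignore_index=True)

# axis=1 places frames side by side, aligning on the row index.
extra = pd.DataFrame({'C': [9, 10]})
cols = pd.concat([top, extra], axis=1)

# join() also aligns on the index; 'inner' keeps labels present in both.
left = pd.DataFrame({'x': [1, 2, 3]}, index=['a', 'b', 'c'])
right = pd.DataFrame({'y': [10, 20]}, index=['b', 'c'])
joined_df = left.join(right, how='inner')

print(rows.shape, cols.shape, joined_df.index.tolist())
```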
Code Snippets:
Creating a DataFrame
df = pd.DataFrame({ 'Category': ['A', 'B', 'A', 'B'], 'Value': [10, 20, 30, 40] })
Code Snippets:
Using query
filtered_query = df.query('Value > 20')
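For comparison, the sketch below places query() next to the equivalent boolean mask and shows how a local variable can be referenced. The variable threshold and the second condition are assumptions added for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B'], 'Value': [10, 20, 30, 40]})

# query() takes the condition as a string evaluated against column names.
filtered_query = df.query('Value > 20')

# The equivalent boolean-mask form.
filtered_mask = df[df['Value'] > 20]

# Local Python variables are referenced with an @ prefix.
threshold = 20
filtered_var = df.query('Value > @threshold and Category == "B"')

print(filtered_query)
print(filtered_var)
```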
Code Snippets:
Sorting by values
sorted_values = df.sort_values(by='Value')
Sorting by index
sorted_index = df.sort_index()
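Sorting is easier to see on data that starts out unordered. The sketch below uses a hypothetical frame with a shuffled index and adds descending and multi-column sorts.

```python
import pandas as pd

# Hypothetical frame with unordered values and an unordered index.
df = pd.DataFrame(
    {'Category': ['A', 'B', 'A', 'B'], 'Value': [30, 10, 40, 20]},
    index=[3, 1, 2, 0],
)

# Sort by one column, largest first.
sorted_values = df.sort_values(by='Value', ascending=False)

# Sort by several columns: Category first, then Value within each Category.
sorted_multi = df.sort_values(by=['Category', 'Value'])

# Sort by the row labels.
sorted_index = df.sort_index()

print(sorted_values)
print(sorted_index)
```

Each call returns a new DataFrame and leaves df unchanged.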
Saving and Exporting
Code Snippets:
To CSV
df.to_csv('output.csv')
To Excel
df.to_excel('output.xlsx', sheet_name='Sheet1')
To Python dictionary
df_dict = df.to_dict()
To string
df_str = df.to_string()
To MySQL
df.to_sql('table_name', engine)
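A round trip is a simple way to check an export. In the sketch below, an in-memory SQLite connection stands in for the MySQL engine; that substitution is an assumption made so the example runs without a database server.

```python
import io
import sqlite3
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# With no path, to_csv() returns the CSV text; index=False omits row labels.
csv_text = df.to_csv(index=False)
round_trip = pd.read_csv(io.StringIO(csv_text))

# orient='records' gives one dictionary per row.
records = df.to_dict(orient='records')

# Assumption: in-memory SQLite replaces the MySQL engine used above.
conn = sqlite3.connect(':memory:')
df.to_sql('table_name', conn, index=False)
from_sql = pd.read_sql('SELECT * FROM table_name', conn)
conn.close()

print(csv_text)
print(records)
```

Without index=False, the row labels are written as an extra column, which then reappears as data when the file or table is read back.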
Attributes and Methods
Non-indexing attributes
df.T
df.axes
df.dtypes
df.empty
df.ndim
df.shape
df.size
df.values
Utility methods
df_copy = df.copy()
df_ranked = df.rank()
df_sorted = df.sort_values(by='A')
df = df.astype({'A': 'float64'})
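What a few of these attributes and methods return can be checked on the small example frame. This is a sketch; the names df_copy and df_float are used only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

print(df.shape)    # (number of rows, number of columns)
print(df.size)     # rows * columns
print(df.ndim)     # number of axes
print(df.T.shape)  # the transpose swaps rows and columns

# copy() makes an independent object, so edits do not reach the original.
df_copy = df.copy()
df_copy.loc[0, 'A'] = 100

# astype() returns a new DataFrame with the requested dtypes.
df_float = df.astype({'A': 'float64'})
print(df_float.dtypes)
```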
Iterating Over DataFrames
Methods for iterating over DataFrames.
Iterating over columns
for label, content in df.items():
    print(label, content)
Iterating over rows
for index, row in df.iterrows():
    print(index, row)
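The iteration methods can be compared with a vectorized expression that gives the same result. The sketch below also uses itertuples(), which is not mentioned above and is included as a commonly used alternative.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})

# Column by column: (label, Series) pairs.
column_labels = [label for label, content in df.items()]

# Row by row: (index, Series) pairs; convenient but slow on large frames.
row_sums = [int(row['A'] + row['B']) for index, row in df.iterrows()]

# itertuples() yields namedtuples and is usually faster than iterrows().
row_products = [int(t.A * t.B) for t in df.itertuples(index=False)]

# A vectorized expression gives the same sums without a Python loop.
vector_sums = (df['A'] + df['B']).tolist()

print(column_labels, row_sums, row_products)
```

When a vectorized form exists, it is generally the better choice; explicit loops are mainly useful for row-wise logic that cannot be expressed with column operations.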
Dates and Times
Timestamps
ts = pd.Timestamp('2023-01-01')
Periods
period = pd.Period('2023-01')
Date range
date_range = pd.date_range('2023-01-01', periods=10)
Period range
period_range = pd.period_range('2023-01', periods=10, freq='M')
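The difference between a Timestamp (an instant) and a Period (a span) can be sketched as follows. The names next_day and as_period are introduced here for illustration.

```python
import pandas as pd

ts = pd.Timestamp('2023-01-01')
period = pd.Period('2023-01')

# A Timestamp is a single instant; adding a Timedelta shifts it.
next_day = ts + pd.Timedelta(days=1)

# A Period is a span of time with a start and an end.
print(period.start_time, period.end_time)

# date_range() defaults to daily frequency when only periods is given.
date_range = pd.date_range('2023-01-01', periods=10)
print(date_range[0], date_range[-1])

# Convert a Timestamp to the monthly Period that contains it.
as_period = ts.to_period('M')
print(as_period)
```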
Pivot Tables and Reshaping
Pivot table
pivot = df.pivot_table(values='A', index='B', columns='C')
Melting
melted = pd.melt(df, id_vars=['A'], value_vars=['B', 'C'])
Unstacking
unstacked = df.unstack()
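The reshaping calls above refer to columns 'A', 'B', and 'C' without showing the data. A self-contained sketch with hypothetical store and product data (all names invented for this example) shows the long-to-wide and wide-to-long directions.

```python
import pandas as pd

# Hypothetical long-format data: one row per (store, product) pair.
long_df = pd.DataFrame({
    'store': ['north', 'north', 'south', 'south'],
    'product': ['pen', 'ink', 'pen', 'ink'],
    'units': [5, 2, 7, 4],
})

# Long to wide: one row per store, one column per product.
# pivot_table aggregates duplicates with the mean by default.
pivot = long_df.pivot_table(values='units', index='store', columns='product')

# Wide back to long with melt().
wide = pivot.reset_index()
melted = pd.melt(wide, id_vars=['store'], value_vars=['ink', 'pen'],
                 var_name='product', value_name='units')

# unstack() moves the inner index level into the columns.
indexed = long_df.set_index(['store', 'product'])['units']
unstacked = indexed.unstack()

print(pivot)
print(melted)
```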
Time Series Data
Handling time series data with DatetimeIndex and PeriodIndex. Upsampling, downsampling, and resampling. Code Snippets:
DatetimeIndex
dt_index = pd.DatetimeIndex(['2023-01-01', '2023-01-02'])
PeriodIndex
period_index = pd.PeriodIndex(['2023-01', '2023-02'], freq='M')
Resampling
resampled = df.resample('M').mean()
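resample() requires a datetime-like index, which the snippet above takes for granted. The sketch below builds a hypothetical daily series and shows one downsampling and one upsampling step. It uses day-based frequencies only, since the preferred spelling of the month-end alias has changed in recent pandas releases.

```python
import pandas as pd

# Hypothetical daily series over six days.
idx = pd.date_range('2023-01-01', periods=6, freq='D')
daily = pd.DataFrame({'value': [1, 2, 3, 4, 5, 6]}, index=idx)

# Downsampling: aggregate into 2-day bins and take the mean of each bin.
downsampled = daily.resample('2D').mean()

# Upsampling: start from every other day, then fill the new daily
# slots with the last known value.
sparse = daily.iloc[::2]
upsampled = sparse.resample('D').ffill()

print(downsampled)
print(upsampled)
```

Downsampling needs an aggregation such as mean() or sum(); upsampling needs a fill rule such as ffill() or interpolation, because the new slots have no data of their own.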
Value counts
value_counts = df['A'].value_counts()
Content:
Loading Data: Methods to load data from various sources into DataFrames.
DataFrame and Series Objects: Differences and usage.
Working with Rows and Columns: Accessing, selecting, and modifying data.
Indexing and Selecting Data: Using .loc, .iloc, and vectorized operations.
Saving and Exporting: Exporting DataFrames to different formats.
Attributes and Methods: Key attributes and utility methods.
Iterating Over DataFrames: Methods to iterate through rows and columns.
Dates and Times: Handling date and time data.
Pivot Tables and Reshaping: Techniques for reshaping data.
Filtering and Grouping: Data filtering and aggregation.