Pandas

A tutorial in Pandas

Uploaded by

hikmatbaniya20
Copyright
© All Rights Reserved
Pandas

Full Pandas Tutorial


Pandas is a powerful data manipulation and analysis library for Python. It is built
on top of NumPy and provides data structures such as DataFrames, which allow
you to efficiently work with structured data.
Here’s a detailed tutorial that will help you understand Pandas from basic
concepts to advanced techniques.

1. Installation
You can install Pandas using pip:

bashCopy code
pip install pandas

2. Importing Pandas
Pandas is typically imported with the alias pd :

pythonCopy code
import pandas as pd

3. Pandas Data Structures

3.1 Series
A Series is a one-dimensional labeled array capable of holding any data type.

pythonCopy code
s = pd.Series([1, 3, 5, 7, 9])
print(s)

You can specify an index for the Series:

pythonCopy code
s = pd.Series([1, 3, 5, 7], index=['a', 'b', 'c', 'd'])
print(s)
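
Once a Series has a custom index, values can be looked up by label as well as by position. The snippet below is a minimal sketch of both styles; the variable names `by_label`, `by_position`, and `subset` are only illustrative.

```python
import pandas as pd

# Same Series as above, with string labels as the index
s = pd.Series([1, 3, 5, 7], index=['a', 'b', 'c', 'd'])

by_label = s['b']        # label-based access
by_position = s.iloc[1]  # position-based access (second element)
subset = s[['a', 'd']]   # select several labels at once, returns a Series

print(by_label, by_position)
print(subset)
```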

3.2 DataFrame
A DataFrame is a two-dimensional, size-mutable, and potentially heterogeneous
tabular data structure with labeled axes (rows and columns).
You can create a DataFrame from a dictionary:

pythonCopy code
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
'Age': [28, 24, 35, 32],
'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)
print(df)

4. DataFrame Operations

4.1 Viewing Data


First few rows:

pythonCopy code
df.head() # By default, it shows the first 5 rows

Last few rows:

pythonCopy code
df.tail(3) # Shows the last 3 rows

DataFrame shape (rows, columns):

pythonCopy code
df.shape

Column names:

pythonCopy code
df.columns

Data types of each column:

pythonCopy code
df.dtypes

4.2 Selecting Data


Select a single column:

pythonCopy code
df['Name']

Select multiple columns:

pythonCopy code
df[['Name', 'Age']]

Select rows by label using .loc[] :

pythonCopy code
df.loc[0] # Select the row with index label 0
df.loc[0:2, ['Name', 'City']] # Select rows 0 through 2 (end label included) and columns 'Name' and 'City'

Select rows by position using .iloc[] :

pythonCopy code
df.iloc[0] # First row
df.iloc[0:2, 0:2] # First two rows and first two columns

Boolean indexing (filtering rows):

pythonCopy code
df[df['Age'] > 30]
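
Filters often need more than one condition. As a rough sketch, conditions can be combined with `&` (and), `|` (or), and `~` (not), with each condition wrapped in its own parentheses. The DataFrame is rebuilt here so the example runs on its own; the result names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda'],
                   'Age': [28, 24, 35, 32],
                   'City': ['New York', 'Paris', 'Berlin', 'London']})

# Parentheses are required because & and | bind tighter than comparisons
over_30_not_berlin = df[(df['Age'] > 30) & (df['City'] != 'Berlin')]
young_or_paris = df[(df['Age'] < 25) | (df['City'] == 'Paris')]

# isin() tests membership in a list of allowed values
in_cities = df[df['City'].isin(['Paris', 'London'])]

print(over_30_not_berlin)
```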

5. Data Cleaning

5.1 Handling Missing Data


Check for missing values:

pythonCopy code
df.isnull().sum() # Sum of null values in each column

Drop missing values:

pythonCopy code
df.dropna() # Return a new DataFrame without the rows that contain missing values

Fill missing values:

pythonCopy code
df.fillna(value=0) # Return a new DataFrame with NaN replaced by 0
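
The example DataFrame used so far has no missing values, so the calls above have nothing to act on. The sketch below builds a small hypothetical frame, `df_missing`, that does contain gaps, and shows that `dropna()` and `fillna()` return new objects rather than changing the original.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value per column
df_missing = pd.DataFrame({'Age': [28, np.nan, 35],
                           'City': ['New York', 'Paris', None]})

null_counts = df_missing.isnull().sum()  # missing values per column
dropped = df_missing.dropna()            # keeps only fully populated rows

# A dict lets each column have its own fill value
filled = df_missing.fillna({'Age': df_missing['Age'].mean(),
                            'City': 'Unknown'})

print(null_counts)
print(filled)
```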

5.2 Renaming Columns

pythonCopy code
df.rename(columns={'Name': 'Full Name'}, inplace=True)

5.3 Changing Data Types

pythonCopy code
df['Age'] = df['Age'].astype(float) # Convert Age to float

6. Data Transformation

6.1 Adding a New Column

pythonCopy code
df['Country'] = ['USA', 'France', 'Germany', 'UK']

6.2 Removing Columns

pythonCopy code
df.drop(['City'], axis=1, inplace=True) # Drop the 'City' column

6.3 Sorting Data

pythonCopy code
df.sort_values(by='Age', ascending=False) # Sort by 'Age' in descending order

6.4 Applying Functions to Data


You can apply a function to each element of a column with Series.apply(), or to each column or row of a DataFrame with DataFrame.apply().

Apply a function to a column:

pythonCopy code
df['Age'] = df['Age'].apply(lambda x: x + 1)
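
To make the column-versus-row distinction concrete, here is a small sketch. The column names `AgeNextYear` and `Label` are made up for illustration. For simple arithmetic, a vectorized expression gives the same result as `apply()` and is usually the more idiomatic choice.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 24]})

# Series.apply: the function receives one value at a time
df['AgeNextYear'] = df['Age'].apply(lambda x: x + 1)

# DataFrame.apply with axis=1: the function receives one row at a time
df['Label'] = df.apply(lambda row: f"{row['Name']} ({row['Age']})", axis=1)

# Vectorized equivalent of the first apply() call
vectorized = df['Age'] + 1

print(df)
```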

7. Grouping and Aggregation


Pandas makes it easy to group data and apply aggregations to it.

7.1 Grouping Data


You can group data based on one or more columns and apply an aggregate
function (such as sum , mean , count , etc.).

Group by one column and aggregate:

pythonCopy code
grouped = df.groupby('Country')['Age'].mean()
print(grouped)

Group by multiple columns:

pythonCopy code
grouped = df.groupby(['Country', 'City'])['Age'].mean() # Assumes df still has its 'City' column
print(grouped)

7.2 Aggregating Data


You can also apply multiple aggregation functions at once.

Multiple aggregations:

pythonCopy code
# Assumes df still has a column called 'Name' (i.e. it has not been renamed)
df.groupby('Country').agg({'Age': ['mean', 'sum'], 'Name': 'count'})
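
The dictionary form above produces two-level column names. As an alternative sketch, named aggregation yields flat, readable column names. The `people` frame and the output names `mean_age`, `total_age`, and `n_people` are assumptions made for this example, including an extra row so that one group has more than one member.

```python
import pandas as pd

# Hypothetical data: two people share the country 'France'
people = pd.DataFrame({'Name': ['John', 'Anna', 'Peter', 'Linda', 'Marie'],
                       'Age': [28, 24, 35, 32, 30],
                       'Country': ['USA', 'France', 'Germany', 'UK', 'France']})

# Each keyword is an output column: (input column, aggregation function)
summary = people.groupby('Country').agg(
    mean_age=('Age', 'mean'),
    total_age=('Age', 'sum'),
    n_people=('Name', 'count'),
)

print(summary)
```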

8. Merging and Joining DataFrames

8.1 Concatenating DataFrames


You can concatenate DataFrames along a particular axis (row-wise or column-
wise).

Row-wise concatenation:

pythonCopy code
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
pd.concat([df1, df2], axis=0)

Column-wise concatenation:

pythonCopy code
pd.concat([df1, df2], axis=1)
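
One detail worth seeing in action: row-wise concatenation keeps each input's original index, so labels can repeat. A minimal sketch of the difference `ignore_index=True` makes (the names `stacked` and `renumbered` are illustrative):

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

stacked = pd.concat([df1, df2], axis=0)                        # index: 0, 1, 0, 1
renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)  # index: 0, 1, 2, 3

print(stacked)
print(renumbered)
```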

8.2 Merging DataFrames


Merge DataFrames based on a key column, similar to SQL joins.

Inner join (only matching keys):

pythonCopy code
df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value2': [4, 5, 6]})
merged_df = pd.merge(df1, df2, on='key', how='inner')

Left join:

pythonCopy code
pd.merge(df1, df2, on='key', how='left')

Outer join:

pythonCopy code
pd.merge(df1, df2, on='key', how='outer')
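
To see how the join types differ on these two frames, the sketch below compares row counts and uses the `indicator=True` option of `pd.merge`, which adds a `_merge` column recording where each key was found. The variable names are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value1': [1, 2, 3]})
df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value2': [4, 5, 6]})

inner = pd.merge(df1, df2, on='key', how='inner')  # keys B, C
left = pd.merge(df1, df2, on='key', how='left')    # keys A, B, C
outer = pd.merge(df1, df2, on='key', how='outer', indicator=True)  # keys A-D

# Map each key to its origin: 'left_only', 'right_only', or 'both'
origin = dict(zip(outer['key'], outer['_merge'].astype(str)))

print(outer)
```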

9. Input and Output

9.1 Reading from CSV

Pandas can read data from a variety of file formats, with CSV being the most
common.

pythonCopy code
df = pd.read_csv('data.csv')

9.2 Writing to CSV

pythonCopy code
df.to_csv('output.csv', index=False)
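
The two calls above can be tried without touching the disk. This sketch assumes an in-memory `io.StringIO` buffer as a stand-in for a real file such as `data.csv`; both `to_csv` and `read_csv` accept file-like objects as well as paths.

```python
import io
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Anna'], 'Age': [28, 24]})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False leaves out the row labels
csv_text = buffer.getvalue()

# Read the same text back into a new DataFrame
restored = pd.read_csv(io.StringIO(csv_text))

print(csv_text)
print(restored)
```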

9.3 Reading from Excel

pythonCopy code
df = pd.read_excel('data.xlsx', sheet_name='Sheet1')

9.4 Writing to Excel

pythonCopy code
df.to_excel('output.xlsx', sheet_name='Sheet1', index=False)

10. Time Series Data


Pandas has robust support for time series data, including date indexing,
resampling, and time-based filtering.

10.1 Creating a Date Range

pythonCopy code
dates = pd.date_range('2024-01-01', periods=6, freq='D')
df = pd.DataFrame({'Date': dates, 'Value': [1, 2, 3, 4, 5, 6]})

10.2 Indexing by Date


You can set the index of the DataFrame to be a date column.

pythonCopy code
df.set_index('Date', inplace=True)

10.3 Resampling
You can resample time series data (e.g., daily to monthly).

pythonCopy code
df.resample('M').mean() # Resample to monthly data, taking the mean of each month (pandas 2.2+ prefers the alias 'ME')

10.4 Time-based Filtering


You can filter data based on a date range.

pythonCopy code
df.loc['2024-01-01':'2024-01-03'] # Filter rows within this date range (both endpoints included)
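
The time series steps in this section can be combined into one short sketch. It resamples into two-day bins rather than months, because six days of data all fall in a single month. The names `ts`, `two_day_totals`, and `first_three` are illustrative.

```python
import pandas as pd

# Six daily observations, indexed directly by date
dates = pd.date_range('2024-01-01', periods=6, freq='D')
ts = pd.DataFrame({'Value': [1, 2, 3, 4, 5, 6]}, index=dates)

# Two-day bins: (1 + 2), (3 + 4), (5 + 6)
two_day_totals = ts.resample('2D').sum()

# Label-based slicing on a DatetimeIndex includes both endpoints
first_three = ts.loc['2024-01-01':'2024-01-03']

print(two_day_totals)
print(first_three)
```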

11. Pivot Tables


Pandas allows you to create pivot tables, which are useful for summarizing data.

11.1 Creating a Pivot Table

pythonCopy code
# Assumes df has 'Age', 'Country', and 'City' columns
pivot = df.pivot_table(values='Age', index='Country', columns='City', aggfunc='mean')
print(pivot)
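
A self-contained sketch of the same call follows, using a small hypothetical `people` frame in which one country and city pair appears twice, so the `mean` has something to average. Combinations that never occur in the data come out as NaN.

```python
import pandas as pd

# Hypothetical data: two rows share ('France', 'Paris')
people = pd.DataFrame({'Country': ['USA', 'USA', 'France', 'France'],
                       'City': ['New York', 'Boston', 'Paris', 'Paris'],
                       'Age': [28, 40, 24, 30]})

# Rows become countries, columns become cities, cells hold the mean age
pivot = people.pivot_table(values='Age', index='Country',
                           columns='City', aggfunc='mean')

print(pivot)
```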

12. Visualization with Pandas


Pandas integrates with Matplotlib for simple visualizations.

12.1 Line Plot

pythonCopy code
df.plot(x='Date', y='Value') # Requires 'Date' as a column; if it is the index (as after 10.2), use df.plot(y='Value')

12.2 Bar Plot

pythonCopy code
df.plot(kind='bar', x='Name', y='Age')

12.3 Histogram

pythonCopy code
df['Age'].plot(kind='hist')

12.4 Scatter Plot

pythonCopy code
df.plot(kind='scatter', x='Age', y='Value') # Assumes df has numeric 'Age' and 'Value' columns

13. Advanced Topics

13.1 MultiIndex
You can work with multiple levels of indexing (hierarchical indexing) in Pandas.

Creating a MultiIndex DataFrame:

pythonCopy code
arrays = [['bar', 'bar', 'baz', 'baz'], ['one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second'))
df = pd.DataFrame({'A': [1, 2, 3, 4]}, index=index)

Accessing data with MultiIndex:

pythonCopy code
df.loc[('bar', 'one')] # Pass a tuple to select the row with first='bar', second='one'
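
A few more ways to select from this MultiIndex DataFrame, as a minimal sketch. Passing the row key as a tuple keeps it separate from the column label, and `xs()` selects on an inner level. The result names are illustrative.

```python
import pandas as pd

arrays = [['bar', 'bar', 'baz', 'baz'], ['one', 'two', 'one', 'two']]
index = pd.MultiIndex.from_arrays(arrays, names=('first', 'second'))
df = pd.DataFrame({'A': [1, 2, 3, 4]}, index=index)

single = df.loc[('bar', 'one'), 'A']  # one cell: row key as a tuple, then column
outer = df.loc['bar']                 # all rows under first == 'bar'
inner = df.xs('one', level='second')  # all rows where second == 'one'

print(single)
print(outer)
print(inner)
```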

13.2 Handling Large Datasets


Reading large CSV files in chunks:

pythonCopy code
chunk_size = 1000
for chunk in pd.read_csv('large_file.csv', chunksize=chunk_size):
    process(chunk) # process() is a placeholder for your own per-chunk logic
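
As a runnable illustration of the chunked pattern, the sketch below computes a running total across chunks. It assumes a tiny in-memory CSV (`csv_text`, ten rows with a single hypothetical `value` column) in place of a genuinely large file, and a chunk size of 4 so that several chunks are produced.

```python
import io
import pandas as pd

# Stand-in for a large file: 10 rows of CSV text held in memory
csv_text = 'value\n' + '\n'.join(str(i) for i in range(10))

total = 0
n_chunks = 0
# Each chunk is an ordinary DataFrame with at most 4 rows
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    total += chunk['value'].sum()
    n_chunks += 1

print(total, n_chunks)
```

Only one chunk is held in memory at a time, so this pattern suits aggregations that can be accumulated step by step.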

13.3 Categorical Data

Categorical data is a type of data with a fixed number of possible values
(categories).

pythonCopy code
df['Gender'] = df['Gender'].astype('category') # Assumes df has a 'Gender' column
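
To show what the conversion does, here is a short sketch on a hypothetical `Gender` column. A categorical column stores each distinct value once and represents the rows as small integer codes, which is why it can save memory on columns with many repeated values.

```python
import pandas as pd

# Hypothetical column with only two distinct values
df = pd.DataFrame({'Gender': ['F', 'M', 'F', 'F']})
df['Gender'] = df['Gender'].astype('category')

categories = list(df['Gender'].cat.categories)  # the distinct values
codes = df['Gender'].cat.codes.tolist()         # integer code per row

print(categories)
print(codes)
```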
