Lab 02 - Introduction to Pandas
BESE – 13B
3rd February 2025
Lab Objectives:
End Goal:
Students will be able to manipulate datasets, perform exploratory data analysis, and create
insightful visualizations using Python.
Datasets:
1. Titanic Dataset:
○ Description: Data on passengers of the Titanic, including survival information.
○ Source: Downloadable from Kaggle (Titanic Dataset).
Pandas:
Importing Pandas:
import pandas as pd
Creating a Series:
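A Series is a one-dimensional labeled array. The sketch below is one possible illustration; the values, the index labels, and the name `ages` are made up for this example and are not taken from the lab data.

```python
import pandas as pd

# Hypothetical values: four ages labeled with an invented index
ages = pd.Series([22, 38, 26, 35], index=['a', 'b', 'c', 'd'], name='Age')

print(ages)          # the full Series together with its index
print(ages['b'])     # label-based access to a single value
print(ages.mean())   # mean of the values
```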
Creating a DataFrame:
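A DataFrame is a two-dimensional table whose columns are Series. A minimal sketch, assuming a dictionary of columns as the input; the rows and the name `passengers` are invented for illustration and only imitate the style of the Titanic dataset.

```python
import pandas as pd

# Hypothetical rows in the style of the Titanic dataset (not real records)
passengers = pd.DataFrame({
    'Name': ['Passenger A', 'Passenger B', 'Passenger C'],
    'Age': [22, 38, 26],
    'Survived': [0, 1, 1],
})

print(passengers)
print(passengers.shape)     # (number of rows, number of columns)
print(passengers['Age'])    # selecting one column returns a Series
```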
Loading a dataset:
df = pd.read_csv('data.csv')
print(df.head())
Basic operations:
df.info()  # info() prints its summary itself and returns None, so no print() is needed
print(df.describe())
print(df['Age'].mean())
scikit-learn:
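from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumption: the handout does not show how `data` is created; the Iris dataset is used here as an example
data = load_iris()
X = data['data']      # feature matrix
y = data['target']    # class labels
# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y,
    test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape)
# Train the classifier and evaluate it on the held-out rows
model = RandomForestClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy: {accuracy * 100:.2f}%')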
Matplotlib:
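Importing Matplotlib:
import matplotlib.pyplot as plt
Line Plot:
# Example values for illustration (the handout does not define x and y)
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y, marker='o')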
plt.title("Basic Line Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
Scatter Plot:
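A scatter plot draws one marker per (x, y) pair and is useful for spotting relationships between two numerical variables. The sketch below is only an illustration; the measurements and axis labels are made up.

```python
import matplotlib.pyplot as plt

# Made-up measurements for illustration only
x = [5.1, 4.9, 6.3, 5.8, 7.1]
y = [3.5, 3.0, 3.3, 2.7, 3.0]

# One marker is drawn for each (x, y) pair
points = plt.scatter(x, y, color='blue')
plt.title("Basic Scatter Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.show()
```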
Histogram:
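A histogram groups a numerical variable into bins and shows how many values fall into each one. A minimal sketch, assuming a short list of made-up ages; the choice of 5 bins is arbitrary.

```python
import matplotlib.pyplot as plt

# Made-up ages for illustration only
ages = [22, 38, 26, 35, 35, 54, 2, 27, 14, 4, 58, 20]

# hist() returns the count per bin, the bin edges, and the drawn bars
counts, bins, patches = plt.hist(ages, bins=5, edgecolor='black')
plt.title("Age Distribution")
plt.xlabel("Age")
plt.ylabel("Frequency")
plt.show()
```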
Bar Plot:
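import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Assumption: iris_df is not defined in the handout; it is built here from scikit-learn's Iris data
iris = load_iris(as_frame=True)
iris_df = iris.frame
iris_df['species'] = iris_df['target'].map(dict(enumerate(iris.target_names)))
# Count the rows per species and draw one bar per species
species_counts = iris_df['species'].value_counts()
species_counts.plot(kind='bar', color=['blue', 'orange', 'green'])
plt.title("Species Count")
plt.xlabel("Species")
plt.ylabel("Count")
plt.show()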
Lab Tasks:
1. Deliverables: Python code, visualizations, a separate self-reflection report, and a
summary report covering:
○ Data cleaning steps.
○ Insights derived from EDA and visualizations.
○ Observations regarding patterns and anomalies in the datasets.
2. Learning Outcomes:
○ Gain practical skills in data manipulation using Pandas and NumPy.
○ Conduct exploratory data analysis to extract insights.
○ Apply effective visualization techniques to support data storytelling.
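The data cleaning and EDA steps named in the deliverables could look roughly like the sketch below. It uses a small made-up table; the column names `Age`, `SibSp`, `Parch`, and `Survived` are assumed to match the Kaggle file, while the rows and the `FamilySize` feature are illustrative. Median imputation is only one of several reasonable ways to treat missing values.

```python
import numpy as np
import pandas as pd

# Made-up rows with Titanic-style columns; real data would come from pd.read_csv
df = pd.DataFrame({
    'Age': [22.0, np.nan, 26.0, 35.0, np.nan],
    'SibSp': [1, 1, 0, 1, 0],
    'Parch': [0, 0, 0, 0, 2],
    'Survived': [0, 1, 1, 1, 0],
})

# 1. Inspect the number of missing values per column
print(df.isnull().sum())

# 2. Fill missing ages with the median age (one possible choice)
df['Age'] = df['Age'].fillna(df['Age'].median())

# 3. Create a new feature: family size, counting the passenger as well
df['FamilySize'] = df['SibSp'] + df['Parch'] + 1

# 4. Summarize: survival rate for each family size
print(df.groupby('FamilySize')['Survived'].mean())
```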
Students will answer the following questions as part of their notebook submission:
1. Understanding of Concepts:
○ Summarize the key concepts of data manipulation and EDA you applied in
this lab.
○ What new skills or knowledge did you gain?
Example:
"Through this lab, I learned how to handle missing values effectively using Pandas
and how to create new features to extract meaningful insights. The correlation matrix
visualization was particularly insightful for identifying relationships among variables."
2. Challenges Faced:
○ Describe any difficulties encountered during the lab tasks.
○ How did you resolve them?
Example:
"I initially struggled with visualizing correlations using a heatmap. After referring to
the Pandas and Seaborn documentation, I realized I needed to preprocess the data
to exclude non-numerical columns."
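The preprocessing step mentioned in the example above, excluding non-numerical columns before computing correlations, can be sketched as follows. The rows are made up, and the heatmap is drawn with Matplotlib's `imshow` rather than Seaborn so that the sketch needs no extra library; Seaborn's heatmap would accept the same `corr` table.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows; 'Sex' is a non-numerical column that a correlation cannot use
df = pd.DataFrame({
    'Age': [22, 38, 26, 35, 54],
    'Fare': [7.25, 71.28, 7.93, 53.10, 51.86],
    'Survived': [0, 1, 1, 1, 0],
    'Sex': ['male', 'female', 'female', 'female', 'male'],
})

# Keep only the numerical columns before computing correlations
numeric_df = df.select_dtypes(include='number')
corr = numeric_df.corr()

# Draw the correlation matrix as a heatmap
image = plt.imshow(corr.to_numpy(), cmap='coolwarm', vmin=-1, vmax=1)
plt.colorbar(image)
plt.xticks(range(len(corr.columns)), corr.columns)
plt.yticks(range(len(corr.columns)), corr.columns)
plt.title("Correlation Matrix")
plt.show()
```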