DAVP Lab Manual

Btech python programming lab record

Uploaded by

Ajit Kumar Behuria


List of Experiments

Subject Name: Data Analysis and Visualization using Python

Subject Code:
Semester: 1st Sem

Faculty Name: Prof. Swarna Prabha Jena

Department: ECE

Experiment 1:
Load a dataset using Pandas and perform Exploratory Data Analysis (EDA).
Pseudo Code:
1. Start
2. Import the pandas library for data handling and analysis.
3. Use the pandas.read_csv() function to load a CSV file (e.g., titanic.csv) into a
DataFrame object.
4. Use the head() method to view the first 5 rows of the dataset for an initial glance at
the structure and data.
5. Call the info() method to get a summary of the dataset, including:
▪ Column names.
▪ Data types.
▪ The number of non-null entries (to identify missing values).
6. Use the describe() method to generate descriptive statistics for numerical columns
(e.g., mean, min, max, standard deviation).
7. End

Code:
import pandas as pd
# Load dataset (e.g., Titanic dataset)
df = pd.read_csv('titanic.csv')
# Display the first 5 rows of the dataset
print(df.head())
# Get a summary of the dataset (info() prints directly and returns None)
df.info()
# Describe numerical columns
print(df.describe())

Output:

• The first 5 rows of the Titanic dataset will be displayed.

• info() will give details like column names, data types, and missing values.
• describe() will provide statistical insights (mean, standard deviation, etc.) for
numerical columns.
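
The checks above can also be collected into one table. The following is a minimal sketch, assuming a small hand-made DataFrame (`sample`, a hypothetical stand-in for titanic.csv) so that it runs without the file; the `summary` table is an illustrative addition, not part of the original record.

```python
import pandas as pd

# Hypothetical stand-in for titanic.csv so the sketch runs without the file
sample = pd.DataFrame({
    "Age": [22.0, 38.0, None, 35.0],
    "Fare": [7.25, 71.28, 7.92, 53.10],
    "Embarked": ["S", "C", "S", None],
})

# One row per column: data type, non-null count, and missing count
summary = pd.DataFrame({
    "dtype": sample.dtypes.astype(str),
    "non_null": sample.notna().sum(),
    "missing": sample.isna().sum(),
})
print(summary)

# Descriptive statistics cover the numerical columns only
print(sample.describe())
```

With the real dataset, replacing `sample` by `df` gives the same kind of per-column overview that info() prints.
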
Experiment 2:
Handle missing values by filling and removing them.

Pseudo Code:
1. Start
2. Import pandas for data handling and analysis.
3. Use pandas.read_csv() to load the dataset (e.g., titanic.csv) into a DataFrame.
4. Use the isnull() method combined with sum() to count the number of missing values
in each column.
5. Check if the 'Age' column has missing values. If there are missing values, fill them
with the median of the 'Age' column using fillna().
6. Check if the 'Embarked' column has missing values
7. Drop rows where 'Embarked' has missing values using dropna().
8. Use isnull() combined with sum() to verify if the dataset handles missing values
properly.
9. End

Code:
import pandas as pd
# Load dataset (e.g., Titanic dataset)
df = pd.read_csv('titanic.csv')
# Check for missing values
print(df.isnull().sum())
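# Fill missing values in the 'Age' column with the median.
# Assign the result back: fillna(inplace=True) on a selected column may not update df in recent pandas versions.
df['Age'] = df['Age'].fillna(df['Age'].median())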
# Drop rows with missing 'Embarked' values
df.dropna(subset=['Embarked'], inplace=True)
# Verify that missing values are handled
print(df.isnull().sum())

Output:
• The number of missing values in each column is printed before and after the cleaning process.
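
The fill-then-drop sequence can be tried on a tiny frame first. This is a sketch under stated assumptions: `sample` is a hypothetical five-row stand-in for the Titanic data, and `before`, `after`, and `cleaned` are illustrative names.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data
sample = pd.DataFrame({
    "Age": [22.0, None, 30.0, 40.0, None],
    "Embarked": ["S", "C", None, "Q", "S"],
})

# Missing values per column before cleaning
before = sample.isnull().sum()

# The median ignores missing entries: median of 22, 30, 40
median_age = sample["Age"].median()

# Fill 'Age' with the median, then drop rows that lack 'Embarked'
cleaned = sample.assign(Age=sample["Age"].fillna(median_age))
cleaned = cleaned.dropna(subset=["Embarked"])

# Missing values per column after cleaning
after = cleaned.isnull().sum()
print(before, after, sep="\n")
```

Filling keeps every row at the cost of an estimated value, while dropping loses the row; the experiment uses filling for 'Age' and dropping for 'Embarked'.
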
Experiment 3:
Visualize the distribution of numerical variables using histograms.

Pseudo Code:
1. Start
2. Import pandas for data manipulation.
3. Import matplotlib.pyplot for plotting.
4. Use pandas.read_csv() to load the dataset (e.g., titanic.csv) into a DataFrame.
5. Choose the numerical column to visualize (e.g., 'Age').
6. Use plt.hist() to plot the histogram of the selected column, excluding missing values.
7. Set the number of bins (e.g., 20) to control the granularity of the histogram.
8. Use plt.xlabel() to label the x-axis (e.g., 'Age').
9. Use plt.ylabel() to label the y-axis (e.g., 'Frequency').
10. Use plt.title() to set the title of the histogram (e.g., 'Distribution of Passenger Age').
11. Use plt.show() to display the histogram.
12. End

Code:
import matplotlib.pyplot as plt
import pandas as pd
# Load dataset (e.g., Titanic dataset)
df = pd.read_csv('titanic.csv')
# Plot histogram of 'Age', excluding missing values
plt.hist(df['Age'].dropna(), bins=20, color='skyblue')
plt.title('Distribution of Passenger Age')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()

Output:

• A histogram showing the distribution of passenger ages.
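
To see what the bars of a histogram represent, the bin counts can be computed directly. This is a minimal sketch using numpy.histogram on a hypothetical `ages` series; three bins are used here only to keep the numbers easy to follow.

```python
import numpy as np
import pandas as pd

# Hypothetical ages, including two missing entries
ages = pd.Series([5, 15, 25, 25, 35, None, 45, 55, None, 65], dtype="float64")

# Exclude missing values before binning
valid = ages.dropna()

# Split the observed range into 3 equal-width bins and count values per bin
counts, edges = np.histogram(valid.to_numpy(), bins=3)

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f} to {right:.0f}: {n} passengers")
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both. More bins give a finer but noisier picture of the same data.
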

Experiment 4:
Create a Bar Plot to visualize the frequency of categorical variables.

Pseudo Code:
1. Start
2. Import pandas for data manipulation.
3. Import matplotlib.pyplot for plotting.
4. Import seaborn for visualization.
5. Use pandas.read_csv() to load the dataset (e.g., titanic.csv) into a DataFrame.
6. Choose the categorical column to visualize (e.g., 'Survived').
7. Use the value_counts() method on the selected categorical column to count the
occurrences of each category.
8. Use plt.bar() to create a bar plot with the categories on the x-axis and their counts on
the y-axis, and set the x-ticks to the category names.
9. Use plt.xlabel() to label the x-axis (e.g., 'Survived').
10. Use plt.ylabel() to label the y-axis (e.g., 'Number of Passengers').
11. Use plt.title() to set the title of the bar plot (e.g., 'Survival Count of Passengers').
12. Use plt.show() to display the bar plot.
13. End

Code:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
# Load dataset (e.g., Titanic dataset)
df = pd.read_csv('titanic.csv')
# Bar plot of 'Pclass' (Passenger Class); countplot() counts each category itself
sns.countplot(x='Pclass', data=df, palette='Set2')
plt.title('Passenger Class Distribution')
plt.show()

Output:

• A bar plot showing the count of passengers in each class.
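
The pseudo code describes a value_counts() and plt.bar() route, while the listing uses seaborn's countplot(). The sketch below follows the pseudo code instead. The helper name `plot_category_counts` and the `survived` series are hypothetical, introduced only for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_category_counts(values, xlabel, title):
    """Draw one bar per category and return the counts and the Axes."""
    # Count occurrences of each category, ordered by category value
    counts = values.value_counts().sort_index()
    fig, ax = plt.subplots()
    # Category names on the x-axis, counts on the y-axis
    ax.bar(counts.index.astype(str), counts.values, color="skyblue")
    ax.set_xlabel(xlabel)
    ax.set_ylabel("Number of Passengers")
    ax.set_title(title)
    return counts, ax


# Hypothetical stand-in for df['Survived']
survived = pd.Series([0, 1, 1, 0, 0, 1, 0])
counts, ax = plot_category_counts(
    survived, "Survived", "Survival Count of Passengers"
)
print(counts)
```

Calling plt.show() afterwards displays the figure. With the real dataset, df['Survived'] or df['Pclass'] can be passed in place of `survived`.
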

Experiment 5:
Compute the correlation matrix for numerical columns and visualize it using a
heatmap.

Pseudo Code:
1. Start
2. Import pandas for data manipulation.
3. Import seaborn for visualization.
4. Import matplotlib.pyplot for plotting.
5. Use pandas.read_csv() to load the dataset (e.g., titanic.csv) into a DataFrame.
6. Use the corr() method on the DataFrame to calculate the correlation matrix for
numerical variables.
7. Use sns.heatmap() to visualize the correlation matrix.
8. Set the annot parameter to True to display the correlation coefficients on the heatmap.
9. Set the cmap parameter to specify the color palette (e.g., 'coolwarm').
10. Use plt.title() to set the title of the heatmap (e.g., 'Correlation Matrix Heatmap').
11. Use plt.show() to display the heatmap.
12. End

Code:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
# Load dataset (e.g., Titanic dataset)
df = pd.read_csv('titanic.csv')
# Correlation matrix (numeric_only=True skips text columns such as 'Name' and 'Sex')
corr_matrix = df.corr(numeric_only=True)
# Plot heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Matrix')
plt.show()

Output:
• A heatmap visualizing correlations between numerical features.
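
The Titanic file mixes text and numeric columns, and from pandas 2.0 onward corr() raises an error on text columns unless numeric_only=True is passed. The sketch below shows the effect on a hypothetical four-row frame; it assumes pandas 1.5 or later, where the numeric_only parameter is available.

```python
import pandas as pd

# Hypothetical stand-in with one text column
sample = pd.DataFrame({
    "Age": [20, 30, 40, 50],
    "Fare": [10.0, 20.0, 30.0, 40.0],
    "Pclass": [3, 3, 2, 1],
    "Name": ["A", "B", "C", "D"],
})

# Only numeric columns enter the correlation matrix
corr_matrix = sample.corr(numeric_only=True)
print(corr_matrix.round(2))
```

Values near 1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one. These are the numbers the heatmap annotates.
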
Experiment 6:
Create a scatter plot to visualize the relationship between two numerical variables.

Pseudo Code:
1. Start
2. Import pandas for data manipulation.
3. Import matplotlib.pyplot for plotting.
4. Use pandas.read_csv() to load the dataset (e.g., titanic.csv) into a DataFrame.
5. Choose two numerical columns to visualize the relationship (e.g., 'Age' and 'Fare').
6. Use plt.scatter() to create a scatter plot with one numerical variable on the x-axis and
the other on the y-axis.
7. Set the marker style (e.g., 'o') and optionally, specify a color for the points.
8. Use plt.xlabel() to label the x-axis (e.g., 'Age').
9. Use plt.ylabel() to label the y-axis (e.g., 'Fare').
10. Use plt.title() to set the title of the scatter plot (e.g., 'Scatter Plot of Age vs Fare').
11. Use plt.show() to display the scatter plot.
12. End

Code:
import matplotlib.pyplot as plt
import pandas as pd
# Load dataset (e.g., Titanic dataset)
df = pd.read_csv('titanic.csv')
# Scatter plot between Age and Fare
plt.scatter(df['Age'], df['Fare'], color='purple')
plt.title('Scatter plot of Age vs Fare')
plt.xlabel('Age')
plt.ylabel('Fare')
plt.show()

Output:

• A scatter plot showing the relationship between passenger age and fare.
Experiment 7:
Detect outliers in a numerical column using a box plot.

Pseudo Code:
1. Start
2. Import pandas for data manipulation.
3. Import matplotlib.pyplot for plotting.
4. Import seaborn for visualization.
5. Use pandas.read_csv() to load the dataset (e.g., titanic.csv) into a DataFrame.
6. Choose the numerical column to visualize (e.g., 'Fare').
7. Use sns.boxplot() to create a box plot for the selected numerical column, and optionally set
additional parameters for aesthetics (e.g., notch, color).
8. Use plt.ylabel() to label the y-axis (e.g., 'Fare').
9. Use plt.title() to set the title of the box plot (e.g., 'Box Plot of Fare').
10. Use plt.show() to display the box plot.
11. End

Code:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Load dataset (e.g., Titanic dataset)
df = pd.read_csv('titanic.csv')
# Box plot of 'Fare'
sns.boxplot(y='Fare', data=df)
plt.title('Box Plot of Fare')
plt.ylabel('Fare')
plt.show()

Output:

• A box plot highlighting potential outliers in the 'Fare' column.
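
By default, Matplotlib and seaborn box plots draw points beyond 1.5 times the interquartile range (IQR) from the box as individual outliers. The same rule can be applied numerically. This is a minimal sketch on a hypothetical `fares` series, not the full dataset.

```python
import pandas as pd

# Hypothetical fares with one extreme value
fares = pd.Series([7.25, 8.05, 13.0, 26.0, 30.0, 31.0, 512.33])

# Quartiles and interquartile range
q1 = fares.quantile(0.25)
q3 = fares.quantile(0.75)
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the quartiles
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Values outside the fences are flagged as potential outliers
outliers = fares[(fares < lower) | (fares > upper)]
print(outliers)
```

Listing the flagged rows this way complements the plot, which shows that outliers exist but not which passengers they belong to.
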

Experiment 8:
Create a pair plot to visualize the relationship between multiple numerical variables.

Pseudo Code:

1. Start
2. Import pandas for data manipulation.
3. Import seaborn for visualization.
4. Import matplotlib.pyplot for plotting.
5. Use pandas.read_csv() to load the dataset (e.g., titanic.csv) into a DataFrame.
6. Choose the numerical columns to visualize (e.g., 'Age', 'Fare', and 'SibSp').
7. Use sns.pairplot() to create a pair plot for the selected numerical columns. And set the
hue parameter to a categorical variable (e.g., 'Survived') to color the points by
categories.
8. Use plt.suptitle() to set a title for the pair plot (e.g., 'Pair Plot of Age, Fare, and
SibSp').
9. Use plt.show() to display the pair plot.
10. End

Code:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load dataset (e.g., Titanic dataset)
df = pd.read_csv('titanic.csv')
# Pair plot of selected features, colored by survival
sns.pairplot(df[['Age', 'Fare', 'SibSp', 'Survived']], hue='Survived',
             palette='coolwarm')
# y=1.02 lifts the title above the top row of plots
plt.suptitle('Pair Plot of Age, Fare, and SibSp', y=1.02)
plt.show()

Output:

• A pair plot showing scatter plots for pairwise relationships, color-coded by survival.
Experiment 9:
Create a bubble chart to visualize multiple variables of the Titanic Dataset.

Pseudo Code:
1. Start
2. Import pandas for data manipulation.
3. Import plotly.express for creating the visualization.
4. Use pandas.read_csv() to load the Titanic dataset into a DataFrame.
5. Handle the missing values in the relevant columns (e.g., 'Age', 'Fare').
6. To set up the bubble chart, define the x-axis as 'Fare' and the y-axis as 'Age'.
7. Set the size of the bubbles based on the number of siblings/spouses aboard (column
'SibSp').
8. Use 'Pclass' (passenger class) to differentiate the colours of the bubbles.
9. Use the 'Survived' column to set the opacity of the bubbles.
10. Use plotly.express.scatter() to create the bubble chart with the specified parameters.
11. Add the chart title 'Bubble Chart of Titanic Passengers'.
12. Label the x-axis as 'Fare' and the y-axis as 'Age'.
13. Use show() to display the chart.
14. End

Code:
# Import necessary libraries
import pandas as pd
import plotly.express as px

# Load the Titanic dataset

df = pd.read_csv('titanic.csv') # Ensure you have 'titanic.csv' in the same directory

# Clean the dataset (optional)

df.dropna(subset=['Age', 'Fare'], inplace=True) # Remove rows with missing Age or Fare

# Set up the bubble chart parameters

x = 'Fare'
y = 'Age'
size = 'SibSp' # Bubble size based on number of siblings/spouses
color = 'Pclass' # Different colors for each passenger class
# Opacity based on survival: 0.5 for passengers who did not survive, 1.0 for survivors
opacity = df['Survived'] * 0.5 + 0.5

# Create the bubble chart
bubble_chart = px.scatter(
    df,
    x=x,
    y=y,
    size=size,
    color=color,
    opacity=opacity,
    title='Bubble Chart of Titanic Passengers',
    hover_name='Name',  # Show passenger name on hover
    labels={'Fare': 'Fare ($)', 'Age': 'Age (Years)'},
)

# Set chart labels
bubble_chart.update_layout(xaxis_title='Fare ($)',
                           yaxis_title='Age (Years)')

# Display the bubble chart
bubble_chart.show()

Output:
• An interactive bubble chart is displayed, with the colours of the bubbles
differentiating the passenger classes.
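
The per-passenger values that drive the bubbles can be inspected before plotting. This is a minimal sketch on a hypothetical three-row frame. The `+ 1` size offset is an assumption added for illustration, not part of the original listing: a 'SibSp' of 0 maps to a zero-size bubble, which is common for passengers travelling without siblings or a spouse.

```python
import pandas as pd

# Hypothetical stand-in for the relevant Titanic columns
sample = pd.DataFrame({
    "Survived": [0, 1, 0],
    "SibSp": [0, 1, 3],
})

# Map Survived 0 -> 0.5 (semi-transparent) and 1 -> 1.0 (fully opaque)
opacity = sample["Survived"] * 0.5 + 0.5

# Assumed tweak: offset by 1 so passengers with SibSp == 0 still get a visible bubble
bubble_size = sample["SibSp"] + 1

print(opacity.tolist())
print(bubble_size.tolist())
```

If the offset is used, the new column can be stored in the DataFrame and its name passed as the size argument.
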
Experiment 10:
Create an animated bubble chart to visualize multiple variables in the Titanic Dataset.

Pseudo Code:
1. Start
2. Import pandas for data manipulation.
3. Import plotly.express for creating the visualization.
4. Use pandas.read_csv() to load the Titanic dataset into a DataFrame.
5. Handle the missing values in the relevant columns (e.g., 'Age', 'Fare').
6. Use pandas.cut() to bin 'Age' into an 'Age_Group' column that drives the animation.
7. To set up the bubble chart, define the x-axis as 'Fare' and the y-axis as 'Age'.
8. Set the size of the bubbles based on the number of siblings/spouses aboard (column
'SibSp').
9. Use 'Pclass' (passenger class) to differentiate the colours of the bubbles.
10. Use plotly.express.scatter() with animation_frame set to 'Age_Group' to create the
animated bubble chart.
11. Add the chart title 'Animated Bubble Chart of Titanic Passengers'.
12. Label the x-axis as 'Fare' and the y-axis as 'Age'.
13. Use show() to display the chart.
14. End

Code:
import pandas as pd
import plotly.express as px

# Load the Titanic dataset

df = pd.read_csv('titanic.csv') # Ensure you have 'titanic.csv' in the same directory

# Clean the dataset (optional)

df.dropna(subset=['Age', 'Fare'], inplace=True) # Remove rows with missing Age or Fare

# Create a new column for Age Group for animation (e.g., binning ages)
df['Age_Group'] = pd.cut(df['Age'], bins=[0, 10, 20, 30, 40, 50, 60, 70, 80, 90],
labels=['0-10', '11-20', '21-30', '31-40', '41-50',
'51-60', '61-70', '71-80', '81-90'])

# Set up the bubble chart parameters

x = 'Fare'
y = 'Age'
size = 'SibSp' # Bubble size based on number of siblings/spouses
color = 'Pclass' # Different colors for each passenger class

# Create the bubble chart
bubble_chart = px.scatter(
    df,
    x=x,
    y=y,
    size=size,
    color=color,
    animation_frame='Age_Group',  # Use Age Group for animation
    title='Animated Bubble Chart of Titanic Passengers',
    hover_name='Name',  # Show passenger name on hover
    labels={'Fare': 'Fare ($)', 'Age': 'Age (Years)'},
    size_max=60,  # Maximum size of the bubbles
)

# Set chart labels
bubble_chart.update_layout(xaxis_title='Fare ($)',
                           yaxis_title='Age (Years)')

# Display the bubble chart
bubble_chart.show()

Output:
• An interactive bubble chart is displayed in the web browser. Hovering over
a bubble displays additional information about the passenger, such as their
name.
• The play button and slider step through the age groups, showing one group of
passengers per frame.
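
The age binning that produces the animation frames can be checked on a few values. This is a minimal sketch with a hypothetical `ages` series; it uses the same bin edges and labels as the listing.

```python
import pandas as pd

# Hypothetical ages, chosen to sit on and around the bin edges
ages = pd.Series([0.42, 10, 11, 35, 80])

bins = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
labels = ['0-10', '11-20', '21-30', '31-40', '41-50',
          '51-60', '61-70', '71-80', '81-90']

# Each bin excludes its left edge and includes its right edge, e.g. (0, 10]
groups = pd.cut(ages, bins=bins, labels=labels)
print(groups.tolist())
```

An age of exactly 10 falls in '0-10' and 11 falls in '11-20'. An age of exactly 0 would fall outside every bin and become missing.
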
