DAVP Lab Manual

B.Tech Python Programming Lab Record

List of Experiments

Subject Name: Data Analysis and Visualization using Python


Subject Code:
Semester: 1st Sem

Faculty Name: Prof. Swarna Prabha Jena


Department: ECE

Experiment 1:
Load a dataset using Pandas and perform Exploratory Data Analysis (EDA).
Pseudo Code:
1. Start
2. Import the pandas library for data handling and analysis.
3. Use the pandas.read_csv() function to load a CSV file (e.g., titanic.csv) into a
DataFrame object.
4. Use the head() method to view the first 5 rows of the dataset for an initial glance at
the structure and data.
5. Call the info() method to get a summary of the dataset, including:
▪ Column names.
▪ Data types.
▪ The number of non-null entries (to identify missing values).
6. Use the describe() method to generate descriptive statistics for numerical columns
(e.g., mean, min, max, standard deviation).
7. End

Code:
import pandas as pd
# Load dataset (e.g., Titanic dataset)
df = pd.read_csv('titanic.csv')
# Display the first 5 rows of the dataset
print(df.head())
# Print a summary of the dataset (info() prints directly and returns None)
df.info()
# Describe numerical columns
print(df.describe())

Output:

• The first 5 rows of the Titanic dataset will be displayed.


• info() will give details like column names, data types, and missing values.
• describe() will provide statistical insights (mean, standard deviation, etc.) for
numerical columns.
Experiment 2:
Handle missing values by filling and removing them.

Pseudo Code:
1. Start
2. Import pandas for data handling and analysis.
3. Use pandas.read_csv() to load the dataset (e.g., titanic.csv) into a DataFrame.
4. Use the isnull() method combined with sum() to count the number of missing values
in each column.
5. Check if the 'Age' column has missing values. If there are missing values, fill them
with the median of the 'Age' column using fillna().
6. Check if the 'Embarked' column has missing values.
7. Drop rows where 'Embarked' has missing values using dropna().
8. Use isnull() combined with sum() again to verify that 'Age' and 'Embarked' no longer
contain missing values.
9. End

Code:
import pandas as pd
# Load dataset (e.g., Titanic dataset)
df = pd.read_csv('titanic.csv')
# Check for missing values
print(df.isnull().sum())
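# Fill missing values in the 'Age' column with the median
# (assign the result back; calling fillna(inplace=True) on a selected column is unreliable in recent pandas)
df['Age'] = df['Age'].fillna(df['Age'].median())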
# Drop rows with missing 'Embarked' values
df.dropna(subset=['Embarked'], inplace=True)
# Verify that missing values are handled
print(df.isnull().sum())

Output:
• The number of missing values in each column is printed before and after cleaning;
'Age' and 'Embarked' show 0 after cleaning.
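
The fill-and-drop steps can also be tried without the CSV file. The following is a minimal sketch on a small, made-up DataFrame; the values are hypothetical and only the column names 'Age' and 'Embarked' come from the experiment.

```python
import pandas as pd

# Hypothetical miniature of the two Titanic columns used above (made-up values)
df = pd.DataFrame({
    'Age': [22.0, None, 30.0, None, 40.0],
    'Embarked': ['S', 'C', None, 'Q', 'S'],
})

# Count missing values per column before cleaning
missing_before = df.isnull().sum()

# Fill missing ages with the median of the observed ages (22, 30, 40 -> 30)
df['Age'] = df['Age'].fillna(df['Age'].median())

# Drop the row whose 'Embarked' value is missing
df = df.dropna(subset=['Embarked'])

# Count missing values per column after cleaning
missing_after = df.isnull().sum()

print(missing_before)
print(missing_after)
```

Filling keeps every row but replaces unknown ages with a typical value, while dropping shrinks the dataset; here one of the five rows is removed.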
Experiment 3:
Visualize the distribution of numerical variables using histograms.

Pseudo Code:
1. Start
2. Import pandas for data manipulation.
3. Import matplotlib.pyplot for plotting.
4. Use pandas.read_csv() to load the dataset (e.g., titanic.csv) into a DataFrame.
5. Choose the numerical column to visualize (e.g., 'Age').
6. Use plt.hist() to plot the histogram of the selected column, excluding missing values.
7. Set the number of bins (e.g., 20) to control the granularity of the histogram.
8. Use plt.xlabel() to label the x-axis (e.g., 'Age').
9. Use plt.ylabel() to label the y-axis (e.g., 'Frequency').
10. Use plt.title() to set the title of the histogram (e.g., 'Distribution of Passenger Age').
11. Use plt.show() to display the histogram.
12. End

Code:
import matplotlib.pyplot as plt
import pandas as pd
# Load dataset (e.g., Titanic dataset)
df = pd.read_csv('titanic.csv')
# Plot histogram of 'Age', excluding missing values
plt.hist(df['Age'].dropna(), bins=20, color='skyblue')
plt.title('Distribution of Passenger Age')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.show()

Output:

• A histogram showing the distribution of passenger ages.


Experiment 4:
Create a Bar Plot to visualize the frequency of categorical variables.

Pseudo Code:
1. Start
2. Import pandas for data manipulation.
3. Import matplotlib.pyplot for plotting.
4. Import seaborn for visualization.
5. Use pandas.read_csv() to load the dataset (e.g., titanic.csv) into a DataFrame.
6. Choose the categorical column to visualize (e.g., 'Pclass').
7. Use sns.countplot() to count the occurrences of each category and draw one bar per
category, with the categories on the x-axis and their counts on the y-axis.
8. Set the palette parameter (e.g., 'Set2') to choose the bar colors.
9. Use plt.title() to set the title of the bar plot (e.g., 'Passenger Class Distribution').
10. Use plt.show() to display the bar plot.
11. End

Code:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
# Load dataset (e.g., Titanic dataset)
df = pd.read_csv('titanic.csv')
# Bar plot of 'Pclass' (Passenger Class)
sns.countplot(x='Pclass', data=df, palette='Set2')
plt.title('Passenger Class Distribution')
plt.show()

Output:

• A bar plot showing the count of passengers in each class.
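
The same kind of plot can be built without seaborn by counting the categories with value_counts() and drawing them with plt.bar(). The following is a minimal sketch on a made-up 'Survived' column; the values are hypothetical stand-ins for the real dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the 'Survived' column (0 = did not survive, 1 = survived)
df = pd.DataFrame({'Survived': [0, 1, 0, 0, 1, 0]})

# Count the occurrences of each category, ordered by category value
counts = df['Survived'].value_counts().sort_index()

# Categories on the x-axis (as text labels), counts on the y-axis
plt.bar([str(category) for category in counts.index], counts.values, color='skyblue')
plt.xlabel('Survived')
plt.ylabel('Number of Passengers')
plt.title('Survival Count of Passengers')
plt.show()
```

Passing the categories as strings makes matplotlib treat the x-axis as categorical, so the tick labels are the category names rather than a numeric scale.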


Experiment 5:
Compute the correlation matrix for numerical columns and visualize it using a
heatmap.

Pseudo Code:
1. Start
2. Import pandas for data manipulation.
3. Import seaborn for visualization.
4. Import matplotlib.pyplot for plotting.
5. Use pandas.read_csv() to load the dataset (e.g., titanic.csv) into a DataFrame.
6. Select the numerical columns and use the corr() method to calculate their correlation
matrix (in recent pandas versions corr() raises an error on text columns such as 'Name').
7. Use sns.heatmap() to visualize the correlation matrix.
8. Set the annot parameter to True to display the correlation coefficients on the heatmap.
9. Set the cmap parameter to specify the color palette (e.g., 'coolwarm').
10. Use plt.title() to set the title of the heatmap (e.g., 'Correlation Matrix Heatmap').
11. Use plt.show() to display the heatmap.
12. End

Code:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
# Load dataset (e.g., Titanic dataset)
df = pd.read_csv('titanic.csv')
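# Correlation matrix of the numerical columns only (text columns such as 'Name' cannot be correlated)
corr_matrix = df.select_dtypes(include='number').corr()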
# Plot heatmap
plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Correlation Matrix')
plt.show()

Output:
• A heatmap visualizing correlations between numerical features.
Experiment 6:
Create a scatter plot to visualize the relationship between two numerical variables.

Pseudo Code:
1. Start
2. Import pandas for data manipulation.
3. Import matplotlib.pyplot for plotting.
4. Use pandas.read_csv() to load the dataset (e.g., titanic.csv) into a DataFrame.
5. Choose two numerical columns to visualize the relationship (e.g., 'Age' and 'Fare').
6. Use plt.scatter() to create a scatter plot with one numerical variable on the x-axis and
the other on the y-axis.
7. Set the marker style (e.g., 'o') and optionally, specify a color for the points.
8. Use plt.xlabel() to label the x-axis (e.g., 'Age').
9. Use plt.ylabel() to label the y-axis (e.g., 'Fare').
10. Use plt.title() to set the title of the scatter plot (e.g., 'Scatter Plot of Age vs Fare').
11. Use plt.show() to display the scatter plot.
12. End

Code:
import matplotlib.pyplot as plt
import pandas as pd
# Load dataset (e.g., Titanic dataset)
df = pd.read_csv('titanic.csv')
# Scatter plot between Age and Fare
plt.scatter(df['Age'], df['Fare'], color='purple')
plt.title('Scatter plot of Age vs Fare')
plt.xlabel('Age')
plt.ylabel('Fare')
plt.show()

Output:

• A scatter plot showing the relationship between passenger age and fare.
Experiment 7:
Detect outliers in a numerical column using a box plot.

Pseudo Code:
1. Start
2. Import pandas for data manipulation.
3. Import matplotlib.pyplot for plotting.
4. Import seaborn for visualization.
5. Use pandas.read_csv() to load the dataset (e.g., titanic.csv) into a DataFrame.
6. Choose the numerical column to visualize (e.g., 'Fare').
7. Use sns.boxplot() to create a box plot for the selected numerical column; points drawn
beyond the whiskers are potential outliers.
8. Use plt.ylabel() to label the y-axis (e.g., 'Fare').
9. Use plt.title() to set the title of the box plot (e.g., 'Box Plot of Fare').
10. Use plt.show() to display the box plot.
11. End

Code:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# Load dataset (e.g., Titanic dataset)
df = pd.read_csv('titanic.csv')
# Box plot of 'Fare'
sns.boxplot(y='Fare', data=df)
plt.title('Box Plot of Fare')
plt.ylabel('Fare')
plt.show()

Output:

• A box plot highlighting potential outliers in the 'Fare' column.
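
By default, matplotlib and seaborn draw the whiskers out to the farthest points within 1.5 times the interquartile range (IQR) of the quartiles and plot anything beyond them individually. The following is a minimal sketch of that rule applied numerically; the fare values are made up for illustration.

```python
import pandas as pd

# Made-up fares; the last value is deliberately extreme
fares = pd.Series([7.25, 8.05, 13.0, 26.0, 30.0, 512.33], name='Fare')

# Quartiles and interquartile range
q1 = fares.quantile(0.25)
q3 = fares.quantile(0.75)
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the quartiles
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Values outside the fences are flagged as potential outliers
outliers = fares[(fares < lower) | (fares > upper)]
print(outliers)
```

Listing the flagged rows this way complements the plot when the outliers need to be inspected or removed rather than only seen.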


Experiment 8:
Create a pair plot to visualize the relationship between multiple numerical variables.

Pseudo Code:

1. Start
2. Import pandas for data manipulation.
3. Import seaborn for visualization.
4. Import matplotlib.pyplot for plotting.
5. Use pandas.read_csv() to load the dataset (e.g., titanic.csv) into a DataFrame.
6. Choose the numerical columns to visualize (e.g., 'Age', 'Fare', and 'SibSp').
7. Use sns.pairplot() to create a pair plot for the selected numerical columns, setting the
hue parameter to a categorical variable (e.g., 'Survived') to color the points by
category.
8. Use plt.suptitle() to set a title for the pair plot (e.g., 'Pair Plot of Age, Fare, and
SibSp').
9. Use plt.show() to display the pair plot.
10. End

Code:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load dataset (e.g., Titanic dataset)
df = pd.read_csv('titanic.csv')
# Pair plot of selected features, colored by survival
sns.pairplot(df[['Age', 'Fare', 'SibSp', 'Survived']], hue='Survived',
             palette='coolwarm')
# y=1.02 places the title just above the grid of plots
plt.suptitle('Pair Plot of Age, Fare, and SibSp', y=1.02)
plt.show()

Output:

• A pair plot showing scatter plots for pairwise relationships, color-coded by survival.
Experiment 9:
Create a bubble chart to visualize multiple variables of the Titanic Dataset.

Pseudo Code:
1. Start
2. Import pandas for data manipulation.
3. Import plotly.express for creating the visualization.
4. Use pandas.read_csv() to load the Titanic dataset into a DataFrame.
5. Handle the missing values in the relevant columns (e.g., 'Age', 'Fare').
6. To set up the bubble chart, define the x-axis as 'Fare' and the y-axis as 'Age'.
7. Set the size of the bubbles based on the number of siblings/spouses aboard (column
'SibSp').
8. Use 'Pclass' (passenger class) to differentiate the colours of the bubbles.
9. Use the 'Survived' column to set the opacity of the bubbles.
10. Use plotly.express.scatter() to create the bubble chart with the specified parameters.
11. Add the chart title 'Bubble Chart of Titanic Passengers'.
12. Label the x-axis as 'Fare' and the y-axis as 'Age'.
13. Use show() to display the chart.
14. End

Code:
# Import necessary libraries
import pandas as pd
import plotly.express as px

# Load the Titanic dataset ('titanic.csv' must be in the working directory)
df = pd.read_csv('titanic.csv')

# Clean the dataset: remove rows with missing Age or Fare
df = df.dropna(subset=['Age', 'Fare'])

# Per-passenger opacity: 0.5 for non-survivors, 1.0 for survivors
opacity = (df['Survived'] * 0.5 + 0.5).tolist()

# Create the bubble chart
bubble_chart = px.scatter(
    df,
    x='Fare',
    y='Age',
    size='SibSp',    # Bubble size based on number of siblings/spouses
    color='Pclass',  # Numeric column: continuous colour scale, single trace
    title='Bubble Chart of Titanic Passengers',
    hover_name='Name',  # Show passenger name on hover
    labels={'Fare': 'Fare ($)', 'Age': 'Age (Years)'},
)

# px.scatter() documents its opacity argument as a single number, so the
# per-passenger values are applied to the markers of the single trace instead
bubble_chart.update_traces(marker_opacity=opacity)

# Set chart labels
bubble_chart.update_layout(xaxis_title='Fare ($)',
                           yaxis_title='Age (Years)')

# Display the bubble chart
bubble_chart.show()

Output:
• An interactive bubble chart in which bubble colour indicates the passenger class, bubble
size indicates the number of siblings/spouses aboard, and survivors appear more opaque
than non-survivors.
Experiment 10:
Create an animated bubble chart to visualize multiple variables in the Titanic Dataset.

Pseudo Code:
1. Start
2. Import pandas for data manipulation.
3. Import plotly.express for creating the visualization.
4. Use pandas.read_csv() to load the Titanic dataset into a DataFrame.
5. Handle the missing values in the relevant columns (e.g., 'Age', 'Fare').
6. Use pandas.cut() to bin 'Age' into age groups and store them in a new column
'Age_Group'.
7. To set up the bubble chart, define the x-axis as 'Fare' and the y-axis as 'Age'.
8. Set the size of the bubbles based on the number of siblings/spouses aboard (column
'SibSp').
9. Use 'Pclass' (passenger class) to differentiate the colours of the bubbles.
10. Use plotly.express.scatter() with animation_frame set to 'Age_Group' to create the
animated bubble chart.
11. Add the chart title 'Animated Bubble Chart of Titanic Passengers'.
12. Label the x-axis as 'Fare' and the y-axis as 'Age'.
13. Use show() to display the chart.
14. End

Code:
import pandas as pd
import plotly.express as px

# Load the Titanic dataset ('titanic.csv' must be in the working directory)
df = pd.read_csv('titanic.csv')

# Clean the dataset: remove rows with missing Age or Fare
df = df.dropna(subset=['Age', 'Fare'])

# Create a new column for Age Group for animation (binning ages)
df['Age_Group'] = pd.cut(df['Age'], bins=[0, 10, 20, 30, 40, 50, 60, 70, 80, 90],
                         labels=['0-10', '11-20', '21-30', '31-40', '41-50',
                                 '51-60', '61-70', '71-80', '81-90'])

# Sort by age group so the animation frames run from youngest to oldest
df = df.sort_values('Age_Group')

# Create the bubble chart
bubble_chart = px.scatter(
    df,
    x='Fare',
    y='Age',
    size='SibSp',    # Bubble size based on number of siblings/spouses
    color='Pclass',  # Different colors for each passenger class
    animation_frame='Age_Group',  # Use Age Group for animation
    title='Animated Bubble Chart of Titanic Passengers',
    hover_name='Name',  # Show passenger name on hover
    labels={'Fare': 'Fare ($)', 'Age': 'Age (Years)'},
    size_max=60,  # Maximum size of the bubbles
    # Fix the axis ranges so they stay the same across animation frames
    range_x=[0, df['Fare'].max() * 1.05],
    range_y=[0, df['Age'].max() * 1.05],
)

# Set chart labels
bubble_chart.update_layout(xaxis_title='Fare ($)',
                           yaxis_title='Age (Years)')

# Display the bubble chart
bubble_chart.show()

Output:
• An interactive animated bubble chart is displayed in the web browser, with one
animation frame per age group. Hovering over a bubble displays additional information
about the passenger, such as the name.
• Bubble size reflects the number of siblings/spouses aboard, and bubble colour reflects
the passenger class.
