ADS Exp3

The document outlines an experiment focused on data visualization techniques using the Iris and Titanic datasets, emphasizing the importance of visualizations in understanding data patterns and distributions. It details a step-by-step implementation for loading datasets, creating histograms and boxplots, and identifying outliers. The conclusions drawn highlight the insights gained from the visualizations, showcasing the role of data visualization in informed decision-making.

Uploaded by

abhijaysingh66
Experiment 3: Data Visualization Techniques

Aim: To explore data visualization techniques using the Iris and Titanic datasets. This
includes identifying feature types, creating histograms and boxplots, comparing distributions,
and identifying outliers.

Theory:
Data visualization is a crucial step in data analysis as it helps in understanding patterns, trends, and
distributions. Some common types of visualizations include:
• Univariate Visualization: Examines one variable at a time (e.g., histograms, boxplots showing
quartiles).
• Multivariate Visualization: Displays relationships between multiple variables (e.g., scatter
plots, density charts).
• High-Dimensional Data Visualization: Projects multiple variables onto a two-dimensional
space using techniques like parallel coordinates.
Using visualization, we can:
1. Understand data distribution.
2. Identify outliers and anomalies.
3. Detect patterns and relationships between variables.
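
As a minimal sketch of the high-dimensional case mentioned above, the snippet below draws a parallel-coordinates plot with pandas' plotting helper. The six rows are a small sample copied from the Iris data purely for illustration; the variable names (sample, feature_cols) are assumptions and not part of the original experiment.

```python
import matplotlib.pyplot as plt
import pandas as pd
from pandas.plotting import parallel_coordinates

# Six Iris rows (two per class), used here only as a small illustrative sample
feature_cols = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
sample = pd.DataFrame(
    [
        [5.1, 3.5, 1.4, 0.2, "Iris-setosa"],
        [4.9, 3.0, 1.4, 0.2, "Iris-setosa"],
        [7.0, 3.2, 4.7, 1.4, "Iris-versicolor"],
        [6.4, 3.2, 4.5, 1.5, "Iris-versicolor"],
        [6.3, 3.3, 6.0, 2.5, "Iris-virginica"],
        [5.8, 2.7, 5.1, 1.9, "Iris-virginica"],
    ],
    columns=feature_cols + ["class"],
)

# Each row becomes one polyline across the four feature axes, coloured by class
fig, ax = plt.subplots(figsize=(8, 4))
parallel_coordinates(sample, class_column="class", ax=ax)
ax.set_title("Parallel coordinates (six sample rows)")
# plt.show()  # uncomment when running interactively
```

Lines that stay close together across all axes suggest rows that are similar in every feature, which is harder to see from four separate histograms.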

Step-wise Implementation:

Step 1: Load the Iris Dataset


• Download the Iris dataset from the given URL.
• Load it into a Pandas DataFrame.

Step 2: List Features and Their Types


• Identify feature names and check whether they are numeric or categorical.
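
One possible way to separate numeric from categorical features is pandas' select_dtypes. The sketch below uses a tiny hand-built frame with the same column names as the Iris data; the frame and the variable names are illustrative assumptions.

```python
import pandas as pd

# Tiny stand-in frame with the same columns as the Iris data
df = pd.DataFrame(
    {
        "sepal_length": [5.1, 7.0, 6.3],
        "sepal_width": [3.5, 3.2, 3.3],
        "petal_length": [1.4, 4.7, 6.0],
        "petal_width": [0.2, 1.4, 2.5],
        "class": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"],
    }
)

# Numeric columns are those with an integer or float dtype; the rest are treated as categorical
numeric_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

print("Numeric features:", numeric_cols)
print("Categorical features:", categorical_cols)
```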

Step 3: Create Histograms


• Plot histograms for each feature to analyze their distributions.

Step 4: Create Boxplots


• Generate boxplots to visualize the spread and detect potential outliers.

Step 5: Compare Distributions and Identify Outliers


• Use the IQR method to detect and highlight outliers in the dataset.
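
The IQR rule flags a value x as an outlier when x < Q1 - 1.5 × IQR or x > Q3 + 1.5 × IQR, where IQR = Q3 - Q1. A minimal sketch of that rule follows; the helper name iqr_outliers and the sample values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import pandas as pd

def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the entries of `values` lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]

# Made-up measurements: Q1 = 2.9, Q3 = 3.2, so the bounds are roughly 2.45 and 3.65
widths = pd.Series([2.0, 2.8, 2.9, 3.0, 3.0, 3.1, 3.2, 3.3, 4.4])
outliers = iqr_outliers(widths)
print(outliers.tolist())
```

The factor 1.5 is the conventional choice and matches the whiskers seaborn draws by default; a larger k flags only more extreme points.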

Step 6: Load the Titanic Dataset


• Load the inbuilt Titanic dataset using Seaborn.

Step 7: Plot Ticket Price Distribution


• Create a histogram to analyze the distribution of ticket fares.
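
A histogram can be backed up with a few summary numbers. The sketch below compares the mean, the median, and the sample skewness of a fare-like column; the values are made up for illustration and are not taken from the Titanic data.

```python
import pandas as pd

# Hypothetical fares: many low values and a couple of very high ones
fares = pd.Series([7.25, 7.9, 8.05, 13.0, 26.0, 71.3, 263.0])

mean_fare = fares.mean()
median_fare = fares.median()
skewness = fares.skew()  # positive values indicate a long right tail

print(f"mean={mean_fare:.2f}, median={median_fare:.2f}, skew={skewness:.2f}")
```

A mean well above the median together with positive skewness is the numeric counterpart of a histogram whose bars pile up on the left and trail off to the right.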

Step 8: Analyze Age Distribution by Gender and Survival


• Generate a boxplot comparing age distributions across gender and survival status.
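
The medians a grouped boxplot displays can also be tabulated directly. The sketch below groups a small hypothetical frame (same column names as the Seaborn Titanic data, invented values) by sex and survival; the count column shows how many ages are actually present in each group, since rows with a missing age are left out of the statistics.

```python
import pandas as pd

# Hypothetical passengers; None marks a missing age
passengers = pd.DataFrame(
    {
        "sex": ["male", "male", "male", "male", "female", "female", "female", "female"],
        "survived": [0, 0, 1, 1, 0, 0, 1, 1],
        "age": [30, 40, 20, None, 25, 35, 22, 28],
    }
)

# One row per (sex, survived) group: non-missing age count and median age
summary = passengers.groupby(["sex", "survived"])["age"].agg(n="count", median="median")
print(summary)
```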

Step 9: Draw Inferences


• Summarize insights gained from the visualizations.

CODE :
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Step 1: Load the Iris dataset (the file has no header row, so column names are supplied)
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data"
column_names = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'class']
iris_df = pd.read_csv(url, names=column_names)

# Step 2: List features and their types
print("Features and their types:")
print(iris_df.dtypes)

# Step 3: Create histograms (only the numeric columns are plotted)
iris_df.hist(bins=20, figsize=(12, 8))
plt.suptitle("Histograms of Iris Dataset Features")
plt.show()

# Step 4: Create boxplots of the four numeric features
plt.figure(figsize=(12, 8))
sns.boxplot(data=iris_df.drop(columns='class'))
plt.title("Boxplots of Iris Dataset Features")
plt.show()

# Step 5: Compare distributions and identify outliers using IQR method
for feature in iris_df.columns[:-1]:  # skip the last column, 'class'
    Q1 = iris_df[feature].quantile(0.25)
    Q3 = iris_df[feature].quantile(0.75)
    IQR = Q3 - Q1
    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR
    # Rows whose value lies outside the 1.5 * IQR bounds
    outliers = iris_df[(iris_df[feature] < lower_bound) | (iris_df[feature] > upper_bound)]
    print(f"Outliers in {feature}:", outliers)

# Step 6: Load Titanic dataset
titanic = sns.load_dataset('titanic')

# Step 7: Plot histogram for ticket prices
plt.figure(figsize=(10, 6))
sns.histplot(titanic['fare'], bins=30, kde=True)
plt.title("Distribution of Ticket Prices on Titanic")
plt.xlabel("Fare")
plt.show()

# Step 8: Boxplot for age distribution by gender and survival
plt.figure(figsize=(12, 8))
sns.boxplot(x='sex', y='age', hue='survived', data=titanic)
plt.title("Age Distribution by Gender and Survival on Titanic")
plt.show()
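
OUTPUT :
[Figures not reproduced: Histograms of Iris Dataset Features; Boxplots of Iris Dataset Features; Distribution of Ticket Prices on Titanic; Age Distribution by Gender and Survival on Titanic]

Observations: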
1. The Iris dataset consists of four numeric features (sepal length, sepal width, petal length, and
petal width) and one categorical class label.
2. Histograms show that sepal length and sepal width are roughly unimodal and bell-shaped, while
petal length and petal width have two distinct peaks.
3. Boxplots reveal that petal length has the widest spread of the four features, while sepal width
has the narrowest.
4. Using the IQR method, only a few sepal width values fall outside the 1.5 × IQR bounds; the
other three features show no outliers.
5. The Titanic fare histogram is strongly right-skewed: most passengers paid low fares, while a
small number paid far higher ones.
6. The boxplot analysis of the Titanic dataset shows:
• Female passengers generally had a younger age distribution.
• There are differences in age distributions between survivors and non-survivors,
particularly among males.

Conclusion:
Data visualization plays a crucial role in understanding datasets. By analyzing the Iris and Titanic
datasets, we explored different visualization techniques such as histograms, boxplots, and outlier
detection methods. This experiment demonstrates how visual representations help in identifying
patterns and making informed decisions in data analysis.
