Data Visualization Lab: Experiment 1

Date: 03-01-25

DATA VISUALIZATION LAB


Mohammad Afrin

24PDD0021

SCOPE

Experiment 1:
Create a dataset of 20 rows and 10 columns of data associated with any domain of your interest. The dataset shall include data of types: Qualitative and Quantitative (ordinal, nominal, interval and ratio – continuous / discrete).

For the created dataset, perform the following visualization of:

a. Precise Comparison of Two or more categorical data


b. Numerical data across more than one categorical data
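
The measurement scales named in the task map onto pandas column types only loosely, so it can help to make them explicit. The following is a minimal sketch under stated assumptions: the column names and values are hypothetical and are not taken from the dataset used in this report.

```python
import pandas as pd

# Hypothetical mini-dataset: one column per measurement scale.
scales = pd.DataFrame({
    # Nominal: categories with no inherent order.
    "department": pd.Categorical(["Sales", "HR", "IT"]),
    # Ordinal: categories with a meaningful order but no fixed spacing.
    "satisfaction": pd.Categorical(
        ["low", "high", "medium"],
        categories=["low", "medium", "high"],
        ordered=True,
    ),
    # Interval: differences are meaningful, but zero is arbitrary (calendar dates).
    "joined_on": pd.to_datetime(["2023-01-01", "2023-01-15", "2023-02-01"]),
    # Ratio, discrete: counts with a true zero.
    "items_bought": [1, 4, 2],
    # Ratio, continuous: measurements with a true zero.
    "amount_usd": [19.99, 250.0, 75.5],
})

print(scales.dtypes)
# Ordered categoricals support comparisons such as max(); unordered ones do not.
print(scales["satisfaction"].max())
```

Declaring an ordinal column as an ordered categorical also makes plotting libraries keep the categories in their logical order rather than in order of appearance.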

Code:
import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

# Set random seed for reproducibility

np.random.seed(42)

# Create a dataset with 20 rows and 10 columns

data = {
    'Employee ID': [f'EMP{1000+i}' for i in range(1, 21)],
    'Department': np.random.choice(['Sales', 'HR', 'IT', 'Finance'], 20),
    'Job Level': np.random.randint(1, 6, size=20),
    'Gender': np.random.choice(['Male', 'Female'], 20),
    'Age': np.random.randint(22, 60, size=20),
    'Salary': np.random.randint(40000, 120000, size=20),
    'Years of Experience': np.random.randint(1, 35, size=20),
    'Performance Score': np.random.randint(50, 101, size=20),
    'Work Hours per Week': np.random.randint(30, 60, size=20),
    'Promoted': np.random.choice(['Yes', 'No'], 20)
}

# Create DataFrame

df = pd.DataFrame(data)

# a. Precise Comparison of Two or More Categorical Data (e.g., Gender vs Department)

plt.figure(figsize=(10, 6))

sns.countplot(data=df, x='Department', hue='Gender', palette='Set1')

plt.title('Comparison of Gender across Departments')

plt.xlabel('Department')

plt.ylabel('Count')

plt.show()
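
A count plot shows bar heights, but a precise comparison may also call for the exact numbers behind the bars. One way to get them, sketched here with a few hypothetical rows rather than the random dataset above, is `pd.crosstab`:

```python
import pandas as pd

# Hypothetical sample rows; the report's data comes from the random dataset.
sample = pd.DataFrame({
    "Department": ["Sales", "HR", "IT", "Sales", "IT", "HR", "Sales"],
    "Gender": ["Male", "Female", "Female", "Female", "Male", "Female", "Male"],
})

# Rows are departments, columns are genders, cells are row counts.
counts = pd.crosstab(sample["Department"], sample["Gender"])
print(counts)
```

Each cell of the table corresponds to one bar of the grouped count plot, so the two can be checked against each other.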

# b. Numerical Data across more than one categorical data (e.g., Salary vs Department with Job Level as Hue)

plt.figure(figsize=(10, 6))

sns.boxplot(data=df, x='Department', y='Salary', hue='Job Level', palette='Set2')

plt.title('Salary Distribution across Departments and Job Levels')

plt.xlabel('Department')

plt.ylabel('Salary (USD)')
plt.show()
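
With only 20 rows spread over four departments and five job levels, many of the boxes above will rest on one or two observations. A quick way to check group sizes before reading too much into a box plot is a grouped summary. This is a minimal sketch on hypothetical values, not the report's actual data:

```python
import pandas as pd

# Hypothetical rows standing in for the employee dataset.
sample = pd.DataFrame({
    "Department": ["Sales", "Sales", "Sales", "IT", "IT", "HR"],
    "Job Level": [1, 1, 2, 3, 3, 2],
    "Salary": [50000, 60000, 80000, 90000, 110000, 70000],
})

# Number of rows and median salary for each Department / Job Level pair.
summary = (
    sample.groupby(["Department", "Job Level"])["Salary"]
    .agg(["count", "median"])
    .reset_index()
)
print(summary)
```

Groups with a count of 1 appear in a box plot as a single flat line, which is worth keeping in mind when comparing them with larger groups.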

# Bonus Visualization: Performance Score by Age Group

df['Age Group'] = pd.cut(df['Age'], bins=[20, 30, 40, 50, 60],
                         labels=['20-30', '30-40', '40-50', '50-60'])

plt.figure(figsize=(10, 6))

sns.boxplot(data=df, x='Age Group', y='Performance Score', palette='coolwarm')

plt.title('Performance Score by Age Group')

plt.xlabel('Age Group')

plt.ylabel('Performance Score')

plt.show()
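
The age-group labels used above overlap at their boundaries ('20-30' and '30-40' both mention 30). By default `pd.cut` builds right-closed intervals, so a boundary age falls into the lower group. The sketch below illustrates this on a few hypothetical ages; the alternative labels are an assumption for illustration, not the author's choice.

```python
import pandas as pd

ages = pd.Series([22, 30, 31, 40, 59])

# Default right=True: intervals are (20, 30], (30, 40], (40, 50], (50, 60].
overlapping = pd.cut(ages, bins=[20, 30, 40, 50, 60],
                     labels=["20-30", "30-40", "40-50", "50-60"])

# Alternative labels (hypothetical) that state which bin a boundary
# age such as 30 or 40 belongs to.
explicit = pd.cut(ages, bins=[20, 30, 40, 50, 60],
                  labels=["21-30", "31-40", "41-50", "51-60"])

print(list(overlapping))
print(list(explicit))
```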

Output:
Experiment 5:

Create a dataset of 20 rows and 10 columns of data associated with any domain of your interest. The dataset shall include data of types: Qualitative and Quantitative (ordinal, nominal, interval and ratio – continuous / discrete).

For the created dataset, perform the following visualization of:

a. Two or more Continuous Data over a period of time


b. Relative Proportion of one or more categorical data

Code:
import pandas as pd

import numpy as np

import matplotlib.pyplot as plt

import seaborn as sns

# Set random seed for reproducibility

np.random.seed(42)

# Generate the dataset

data = {
    'Transaction ID': [f'TR{1000+i}' for i in range(1, 21)],
    'Product Category': np.random.choice(['Electronics', 'Clothing', 'Books', 'Home Appliances'], 20),
    'Customer Region': np.random.choice(['North', 'South', 'East', 'West'], 20),
    'Customer Age': np.random.randint(18, 60, size=20),
    'Transaction Amount': np.random.randint(50, 500, size=20),
    'Discount Applied': np.random.uniform(5, 30, size=20),
    'Payment Method': np.random.choice(['Credit Card', 'PayPal', 'Bank Transfer'], 20),
    'Transaction Date': pd.date_range(start='2023-01-01', periods=20, freq='D'),
    'Quantity Purchased': np.random.randint(1, 5, size=20),
    'Customer Satisfaction': np.random.randint(1, 6, size=20)
}

# Create DataFrame

df = pd.DataFrame(data)

# a. Two or more Continuous Data over a period of time

# We'll plot Transaction Amount and Discount Applied over the Transaction Date.

plt.figure(figsize=(10, 6))

plt.plot(df['Transaction Date'], df['Transaction Amount'],
         label='Transaction Amount (USD)', marker='o', color='blue')

plt.plot(df['Transaction Date'], df['Discount Applied'],
         label='Discount Applied (%)', marker='o', color='green')

plt.title('Transaction Amount and Discount Applied Over Time')

plt.xlabel('Date')

plt.ylabel('Value')

plt.legend()

plt.xticks(rotation=45)

plt.tight_layout()

plt.show()
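
In the plot above the two series share one y-axis, although the amounts range from 50 to 500 and the discounts only from 5 to 30, so the discount line is compressed near the bottom. One possible alternative, sketched here with freshly generated hypothetical values and a secondary axis from `twinx()`, gives each series its own scale:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
dates = pd.date_range(start="2023-01-01", periods=20, freq="D")
amount = rng.integers(50, 500, size=20)   # hypothetical amounts in USD
discount = rng.uniform(5, 30, size=20)    # hypothetical discounts in percent

fig, ax_amount = plt.subplots(figsize=(10, 6))
ax_discount = ax_amount.twinx()  # second y-axis sharing the same x-axis

ax_amount.plot(dates, amount, marker="o", color="blue",
               label="Transaction Amount (USD)")
ax_discount.plot(dates, discount, marker="o", color="green",
                 label="Discount Applied (%)")

ax_amount.set_xlabel("Date")
ax_amount.set_ylabel("Transaction Amount (USD)")
ax_discount.set_ylabel("Discount Applied (%)")
ax_amount.tick_params(axis="x", labelrotation=45)

# Merge the legend entries from both axes into a single legend.
lines = ax_amount.get_lines() + ax_discount.get_lines()
ax_amount.legend(lines, [line.get_label() for line in lines])
fig.tight_layout()
```

Dual axes make both trends visible, at the cost of making the two lines look directly comparable when they are not; labelling each axis with its unit helps to limit that risk.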

# b. Relative Proportion of one or more categorical data

# We'll plot the proportion of each product category sold.


plt.figure(figsize=(10, 6))

sns.countplot(data=df, x='Product Category', palette='Set2')

plt.title('Relative Proportion of Product Categories Sold')

plt.xlabel('Product Category')

plt.ylabel('Count')

plt.show()
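
The count plot above shows absolute counts; the relative proportion of each category is the count divided by the total number of rows. A minimal sketch of that calculation, using a short hypothetical list of categories instead of the random dataset:

```python
import pandas as pd

# Hypothetical category values standing in for the 'Product Category' column.
categories = pd.Series(
    ["Electronics", "Clothing", "Books", "Electronics", "Home Appliances",
     "Clothing", "Electronics", "Books"],
    name="Product Category",
)

# normalize=True returns each category's share of all rows (shares sum to 1).
shares = categories.value_counts(normalize=True)
print((shares * 100).round(1))
```

These shares could then be drawn as a pie chart, for example with `plt.pie(shares, labels=shares.index, autopct='%1.1f%%')`, which displays proportions directly rather than counts.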

# Bonus Visualization: Customer Satisfaction by Payment Method

# The box plot shows the distribution (median and quartiles) of satisfaction scores for each Payment Method.

plt.figure(figsize=(10, 6))

sns.boxplot(data=df, x='Payment Method', y='Customer Satisfaction', palette='coolwarm')

plt.title('Customer Satisfaction by Payment Method')

plt.xlabel('Payment Method')

plt.ylabel('Customer Satisfaction')

plt.show()

Output:
