Module-6

The document discusses various data visualization techniques used in exploratory data analysis across different fields, including finance, bankruptcy analysis, eBay auctions, insurance risk analysis, medical imaging, and genetic network reconstruction. It highlights the importance of visualizations like histograms, box plots, scatter plots, and heatmaps in understanding data patterns, relationships, and trends. Additionally, it covers specific applications and tools for each domain, emphasizing the role of visualizations in enhancing data interpretation and decision-making.

Uploaded by

ragebhanukiran

1. Financial Dataset:

Exploratory graphics are visualizations used in Exploratory Data Analysis (EDA) to understand
the patterns, trends, and relationships in a dataset before applying any machine learning or
statistical models.

As a simple example, consider a financial dataset containing monthly financial information
for a group of customers:

Customer_ID   Monthly_Income   Credit_Score   Age   Loan_Amount   Defaulted
001           50000            720            35    200000        No
002           30000            680            28    150000        Yes
003           70000            750            45    250000        No
...           ...              ...            ...   ...           ...

Common Exploratory Graphics and What They Show:

1. Histogram

Used for: Distribution of a single variable

Example: Plot a histogram of Monthly_Income

• Shows how income is distributed (skewed, roughly normal, etc.).
• Answers questions such as: do most customers earn between $30K and $60K?

import matplotlib.pyplot as plt
import seaborn as sns

# `data` is assumed to be a pandas DataFrame holding the columns shown in the table above
sns.histplot(data['Monthly_Income'], bins=10)
plt.title("Distribution of Monthly Income")
plt.xlabel("Monthly Income")
plt.ylabel("Number of Customers")
plt.show()

2. Box Plot

Used for: Detecting outliers and comparing distributions

Example: Boxplot of Loan_Amount by Defaulted

• Do customers who defaulted typically have higher loan amounts?

sns.boxplot(x='Defaulted', y='Loan_Amount', data=data)
plt.title("Loan Amount by Default Status")
plt.show()

3. Scatter Plot

Used for: Relationship between two numeric variables

Example: Credit_Score vs. Loan_Amount

• Shows whether higher credit scores are associated with lower loan amounts.

sns.scatterplot(x='Credit_Score', y='Loan_Amount', hue='Defaulted', data=data)
plt.title("Credit Score vs Loan Amount")
plt.show()

4. Pair Plot

Used for: Multiple scatter plots to see pairwise relationships

sns.pairplot(data[['Monthly_Income', 'Credit_Score', 'Age', 'Loan_Amount']],
             diag_kind='kde')
plt.suptitle("Pairwise Relationships", y=1.02)
plt.show()

5. Correlation Heatmap

Used for: Showing correlation between variables

corr = data[['Monthly_Income', 'Credit_Score', 'Age', 'Loan_Amount']].corr()
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.title("Correlation Matrix")
plt.show()

Exploratory graphics help you:

• Detect patterns or trends (e.g., income vs. default rate)
• Spot outliers
• Understand relationships (e.g., age and credit score)
• Inform feature selection or preprocessing steps

2. Bankruptcy Analysis:

Bankruptcy analysis involves examining financial indicators to determine whether a company
is likely to go bankrupt. Common variables used might include:

• Debt Ratio
• Current Ratio
• Net Profit
• Total Assets
• Bankrupt Status (Yes/No)

Company_ID   Debt_Ratio   Current_Ratio   Net_Profit   Total_Assets   Bankrupt
C001         0.85         0.7             -50000       200000         Yes
C002         0.4          1.8             60000        500000         No
C003         0.65         1               -10000       300000         Yes
C004         0.3          2.1             70000        800000         No

Graphical Representations

1. Histogram – Distribution of Debt Ratio

sns.histplot(data['Debt_Ratio'], bins=10, kde=True)
plt.title("Distribution of Debt Ratios")
plt.xlabel("Debt Ratio")
plt.ylabel("Number of Companies")
plt.show()
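
To make the binning behind the histogram concrete, the following minimal sketch counts the four Debt_Ratio values from the sample table into equal-width bins without any plotting library. It assumes equal-width bins spanning the minimum to the maximum value, with the largest value placed in the last bin; the helper name `bin_counts` is hypothetical and not part of seaborn.

```python
def bin_counts(values, bins=10):
    """Count values into equal-width bins between min(values) and max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    counts = [0] * bins
    if width == 0:
        # All values are identical: put everything in the first bin.
        counts[0] = len(values)
        return counts
    for v in values:
        # The maximum value falls into the last bin (right edge included).
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


# Debt_Ratio values copied from the four sample companies
debt_ratios = [0.85, 0.4, 0.65, 0.3]
counts = bin_counts(debt_ratios, bins=10)
print(counts)
```

With only four companies most bins stay empty, which is itself a useful reminder that a 10-bin histogram needs considerably more data to show a meaningful shape.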

2. Box Plot – Debt Ratio by Bankruptcy Status

Helps compare the debt levels of bankrupt vs. healthy companies.

sns.boxplot(x='Bankrupt', y='Debt_Ratio', data=data)
plt.title("Debt Ratio by Bankruptcy Status")
plt.show()
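
As a rough illustration of what the box plot compares, the sketch below groups the sample Debt_Ratio values by bankruptcy status and prints the minimum, median, and maximum of each group. It uses only the four rows shown in the table and plain Python; the variable names are illustrative assumptions.

```python
from statistics import median

# (Debt_Ratio, Bankrupt) pairs copied from the sample table
rows = [(0.85, "Yes"), (0.4, "No"), (0.65, "Yes"), (0.3, "No")]

# Collect the debt ratios of each bankruptcy status
groups = {}
for debt_ratio, bankrupt in rows:
    groups.setdefault(bankrupt, []).append(debt_ratio)

# Summary statistics that a box plot encodes visually
summary = {
    status: {"min": min(vals), "median": median(vals), "max": max(vals)}
    for status, vals in groups.items()
}
for status, stats in summary.items():
    print(status, stats)
```

In this tiny sample the bankrupt companies have the higher median debt ratio, which is the kind of group difference the box plot is meant to surface.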

3. Scatter Plot – Current Ratio vs. Net Profit

Visualizes relationship between liquidity and profitability.

sns.scatterplot(x='Current_Ratio', y='Net_Profit', hue='Bankrupt', data=data)
plt.title("Current Ratio vs Net Profit")
plt.show()

4. Correlation Heatmap

Reveals how financial variables relate to each other.

corr = data[['Debt_Ratio', 'Current_Ratio', 'Net_Profit', 'Total_Assets']].corr()
sns.heatmap(corr, annot=True, cmap='RdBu', center=0)
plt.title("Financial Features Correlation Matrix")
plt.show()
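
The numbers annotated on such a heatmap are Pearson correlation coefficients. The following sketch computes them by hand for the four sample companies, so it is only meant to show the calculation, not to give reliable estimates. The helper `pearson` is a hypothetical name introduced here.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Columns copied from the sample bankruptcy table
columns = {
    "Debt_Ratio": [0.85, 0.4, 0.65, 0.3],
    "Current_Ratio": [0.7, 1.8, 1.0, 2.1],
    "Net_Profit": [-50000, 60000, -10000, 70000],
    "Total_Assets": [200000, 500000, 300000, 800000],
}
names = list(columns)

# Full correlation matrix as a nested dict: corr[a][b]
corr = {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}
for a in names:
    print(a, [round(corr[a][b], 2) for b in names])
```

In this sample, Debt_Ratio and Current_Ratio are strongly negatively correlated, which would appear as a saturated cell at one end of the 'RdBu' color scale.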
5. Bar Chart – Count of Bankrupt vs. Non-Bankrupt

sns.countplot(x='Bankrupt', data=data)
plt.title("Count of Bankrupt vs. Non-Bankrupt Companies")
plt.show()

3. eBay:

Functional data refers to information collected over a continuum, usually time. Instead of
individual data points, we think of curves, for example the price of a product over time
during an auction.

eBay Auction Prices

Suppose we track the price evolution over 7 days for 3 items in eBay auctions.

Day   Item A   Item B   Item C
1     10       15       5
2     12       16       7
3     15       20       10
4     18       22       15
5     21       25       17
6     25       28       18
7     30       32       20

1. Line Plot – Price Evolution Over Time

Show how each item's price changes throughout the auction.

import pandas as pd
import matplotlib.pyplot as plt

# Sample data
days = [1, 2, 3, 4, 5, 6, 7]
item_a = [10, 12, 15, 18, 21, 25, 30]
item_b = [15, 16, 20, 22, 25, 28, 32]
item_c = [5, 7, 10, 15, 17, 18, 20]

# Plot
plt.plot(days, item_a, label='Item A', marker='o')
plt.plot(days, item_b, label='Item B', marker='o')
plt.plot(days, item_c, label='Item C', marker='o')

plt.title("Price Evolution in eBay Auctions")
plt.xlabel("Day")
plt.ylabel("Price ($)")
plt.legend()
plt.grid(True)
plt.show()

2. Functional Boxplot (Advanced)

In larger datasets, you can use Functional Boxplots to summarize curves:

• Median price curve
• Range (min–max)
• Outliers

Tools: fda in R or scikit-fda in Python (for functional data analysis)
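
The idea of a median curve and a min–max range can be sketched for the three sample items using only pointwise statistics. Note that this is a simplification: it takes the median, minimum, and maximum separately at each day, whereas functional boxplots in dedicated packages rank whole curves by depth. It is a toy illustration under that stated assumption, not a replacement for those tools.

```python
from statistics import median

days = [1, 2, 3, 4, 5, 6, 7]
curves = {
    "Item A": [10, 12, 15, 18, 21, 25, 30],
    "Item B": [15, 16, 20, 22, 25, 28, 32],
    "Item C": [5, 7, 10, 15, 17, 18, 20],
}

# Regroup the prices by day: one tuple of item prices per day
by_day = list(zip(*curves.values()))

# Pointwise summaries across the three curves
median_curve = [median(day_prices) for day_prices in by_day]
lower_envelope = [min(day_prices) for day_prices in by_day]
upper_envelope = [max(day_prices) for day_prices in by_day]

for day, lo, med, hi in zip(days, lower_envelope, median_curve, upper_envelope):
    print(f"Day {day}: min={lo}, median={med}, max={hi}")
```

With only three curves the pointwise median coincides with Item A on every day; with many auctions the envelope and median would summarize the bundle of curves far more compactly than plotting each one.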

3. Heatmap or Intensity Plot

If there are many items, show bid activity over time:

• X-axis: Time
• Y-axis: Items
• Color intensity: Bid amount or frequency
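
A minimal sketch of the items-by-time layout follows. The sample table records prices rather than individual bids, so this example uses the day-over-day price increase as a stand-in for bid activity; that proxy is an assumption made here for illustration. The matrix is rendered as text, with denser characters marking larger increases.

```python
prices = {
    "Item A": [10, 12, 15, 18, 21, 25, 30],
    "Item B": [15, 16, 20, 22, 25, 28, 32],
    "Item C": [5, 7, 10, 15, 17, 18, 20],
}

# Day-over-day price increase per item (columns are days 2..7)
increments = {
    item: [later - earlier for earlier, later in zip(series, series[1:])]
    for item, series in prices.items()
}

# Map each increase to a character: later characters mean higher intensity
shades = " .:-=+*#"
max_inc = max(max(row) for row in increments.values())

for item, row in increments.items():
    cells = "".join(
        shades[round(value / max_inc * (len(shades) - 1))] for value in row
    )
    print(f"{item} |{cells}|")
```

The same `increments` matrix could be passed to a plotting library's heatmap function, with items on the Y-axis and days on the X-axis.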

4. Insurance Risk Analysis:

Visualization tools for insurance risk processes help actuaries and analysts understand and
communicate how risks (like claims, losses, or premiums) behave over time, across
categories, or under uncertainty.

Policy_ID   Age   Premium   Claims   Claim_Amount   Risk_Score
P001        45    1000      1        2500           High
P002        30    800       0        0              Low
P003        60    1200      2        4000           Medium
P004        35    950       1        1000           Low

Visualization Tools & Examples

1. Histogram – Distribution of Claim Amounts

Understand the overall distribution of claims.

import seaborn as sns
import matplotlib.pyplot as plt

sns.histplot(data['Claim_Amount'], bins=10, kde=True)
plt.title("Distribution of Claim Amounts")
plt.xlabel("Claim Amount ($)")
plt.ylabel("Number of Policies")
plt.show()

2. Box Plot – Claim Amounts by Risk Category

Visualize spread and outliers across different risk levels.

sns.boxplot(x='Risk_Score', y='Claim_Amount', data=data)
plt.title("Claim Amounts by Risk Category")
plt.show()

3. Bar Chart – Average Claim Amount by Age Group

Shows how the average claim amount varies across age groups.

import pandas as pd

# pd.cut uses right-closed bins by default: (20, 40], (40, 60], (60, 80]
data['Age_Group'] = pd.cut(data['Age'], bins=[20, 40, 60, 80],
                           labels=['20–40', '40–60', '60–80'])
# observed=True keeps only the age groups that actually occur in the data
avg_claim = data.groupby('Age_Group', observed=True)['Claim_Amount'].mean().reset_index()

sns.barplot(x='Age_Group', y='Claim_Amount', data=avg_claim)
plt.title("Average Claim Amount by Age Group")
plt.show()
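
To show what the grouping step computes, here is a dependency-free sketch applied to the four sample policies. It assumes right-closed age intervals such as (20, 40], which matches the pandas default for `pd.cut`; the helper `age_group` and the ASCII labels are illustrative names chosen here.

```python
# (Age, Claim_Amount) pairs copied from the sample policy table
policies = [(45, 2500), (30, 0), (60, 4000), (35, 1000)]

bins = [20, 40, 60, 80]
labels = ["20-40", "40-60", "60-80"]


def age_group(age):
    """Return the label of the right-closed interval (lo, hi] containing age."""
    for lo, hi, label in zip(bins, bins[1:], labels):
        if lo < age <= hi:
            return label
    return None  # age falls outside all bins


# Collect claim amounts per age group, then average them
amounts_by_group = {}
for age, amount in policies:
    amounts_by_group.setdefault(age_group(age), []).append(amount)

avg_claim = {
    label: sum(amounts) / len(amounts)
    for label, amounts in amounts_by_group.items()
}
print(avg_claim)
```

Because the intervals are right-closed, the 60-year-old policyholder lands in the 40-60 group, and the 60-80 group stays empty for this sample.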

4. Line Plot – Claims Over Time

If you have time data (e.g., monthly claims), show trends over time.

import pandas as pd

# Sample synthetic time data
time_series = pd.DataFrame({
    # freq='MS' generates one timestamp per month (month start)
    'Month': pd.date_range(start='2023-01', periods=6, freq='MS'),
    'Total_Claims': [100, 120, 150, 110, 130, 160]
})

sns.lineplot(x='Month', y='Total_Claims', data=time_series)
plt.title("Monthly Claims Trend")
plt.xticks(rotation=45)
plt.show()
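
Monthly totals can be noisy, so a moving average is often overlaid on such a line plot to make the trend easier to read. The sketch below computes a 3-month moving average for the synthetic totals above; the window length of 3 is an arbitrary assumption for illustration.

```python
# Synthetic monthly totals from the example above
total_claims = [100, 120, 150, 110, 130, 160]
window = 3  # assumed window length, in months

# Average of each run of `window` consecutive months
rolling_mean = [
    sum(total_claims[i - window + 1 : i + 1]) / window
    for i in range(window - 1, len(total_claims))
]

for month_index, value in enumerate(rolling_mean, start=window):
    print(f"Month {month_index}: 3-month average = {value:.1f}")
```

Although the raw totals dip in the fourth month, the smoothed values rise steadily, which is the kind of underlying trend the line plot is meant to reveal.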

5. Heatmap – Correlation Between Risk Variables

Shows relationships between numeric variables like premium, claims, age.

corr = data[['Age', 'Premium', 'Claims', 'Claim_Amount']].corr()
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.title("Correlation Matrix of Risk Variables")
plt.show()

Summary of Tools

Tool        Best For
Histogram   Understanding distributions
Box Plot    Spotting outliers & category comparisons
Line Plot   Visualizing time-based claim trends
Bar Chart   Comparing group-wise statistics
Heatmap     Seeing correlations between risk factors

5. Medical Images:

Visualization and analysis of medical images is a crucial area in medical diagnostics,
research, and treatment planning. It involves the use of software tools and algorithms to
interpret data from imaging modalities such as MRI, CT, X-ray, ultrasound, and PET scans.
Here’s a breakdown of key elements and concepts in this domain.

1. Medical Imaging Modalities

Each modality provides different types of information:

• X-ray: Well suited to imaging bones and detecting fractures.
• CT (Computed Tomography): Cross-sectional imaging with high spatial resolution.
• MRI (Magnetic Resonance Imaging): Excellent for soft tissue contrast (e.g., brain,
  muscles).
• Ultrasound: Real-time imaging; used in obstetrics and cardiology.
• PET (Positron Emission Tomography): Functional imaging, often combined with CT.

2. Visualization Techniques

• 2D Slicing: Viewing image slices in axial, sagittal, or coronal planes.
• 3D Rendering: Volume rendering to visualize structures in 3D.
• Surface Rendering: Uses segmentation to display surfaces (e.g., of organs or tumors).
• Multiplanar Reconstruction (MPR): Reformats volumetric data so it can be viewed in
  multiple planes.

3. Image Analysis

• Segmentation: Identifying structures like tumors, organs, or vessels.
  o Manual, semi-automated, or fully automated.
  o Tools: Thresholding, region growing, U-Net (deep learning).
• Registration: Aligning images from different times/modalities.
• Classification: Diagnosing disease types using machine learning.
• Quantification: Measuring volumes, areas, densities, etc.
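
Two of the steps above, thresholding-based segmentation and quantification, can be sketched on a toy example. The 6x6 intensity grid below is made up for illustration and is not real scan data; the threshold value and the pixel size are likewise assumptions.

```python
# Synthetic 6x6 "image": made-up intensities with a bright 3x3 region
image = [
    [10, 12, 11, 13, 10, 12],
    [11, 80, 85, 82, 12, 11],
    [12, 83, 90, 88, 13, 10],
    [10, 81, 86, 84, 11, 12],
    [13, 12, 11, 10, 12, 13],
    [11, 10, 12, 13, 11, 10],
]

threshold = 50          # assumed intensity cut-off separating structure from background
pixel_area_mm2 = 0.25   # assumed pixel size of 0.5 mm x 0.5 mm

# Segmentation: binary mask with 1 where the intensity exceeds the threshold
mask = [[1 if value > threshold else 0 for value in row] for row in image]

# Quantification: count segmented pixels and convert the count to an area
pixel_count = sum(sum(row) for row in mask)
area_mm2 = pixel_count * pixel_area_mm2

for row in mask:
    print("".join("#" if cell else "." for cell in row))
print(f"Segmented pixels: {pixel_count}, area: {area_mm2} mm^2")
```

Real images rarely separate this cleanly, which is why region growing or learned models such as U-Net are used when a single global threshold is not enough.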

4. Software & Tools

• ITK-SNAP: Manual and semi-automated segmentation.
• 3D Slicer: Open-source software for visualization and analysis.
• Fiji (ImageJ): Versatile platform for bio-image analysis.
• MATLAB: Custom image processing scripts.
• Python Libraries:
  o SimpleITK, Pydicom, Nibabel: for image handling.
  o OpenCV, scikit-image, MedPy: for processing.
  o MONAI, nnU-Net: deep learning frameworks for medical imaging.

5. Applications

• Tumor detection and monitoring.
• Surgical planning (e.g., neurosurgery).
• Disease progression analysis (e.g., Alzheimer’s).
• Radiomics: Extracting features from images to predict prognosis or therapy response.

6. Emerging Trends

• AI/Deep Learning: For automated diagnosis and predictive modeling.
• Radiogenomics: Linking imaging features with genetic data.
• Federated Learning: Collaborative model training across hospitals without sharing
  patient data.

Basic MRI Visualization Program:
import numpy as np
import nibabel as nib
import matplotlib.pyplot as plt

# Load MRI image
file_path = 'your_mri_image.nii'  # Replace with the path to your NIfTI MRI file
img = nib.load(file_path)

# Get the image data as a floating-point NumPy array
img_data = img.get_fdata()

# Print basic info
print(f"Shape of the image: {img_data.shape}")
print(f"Data type: {img_data.dtype}")

# Show a middle slice in all three planes.
# The plane labels assume a 3D volume whose axes are ordered (sagittal, coronal, axial),
# as in the common RAS orientation; files stored differently will show other planes.
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

# Axial slice (top view): fix the third axis
axes[0].imshow(img_data[:, :, img_data.shape[2]//2], cmap='gray')
axes[0].set_title('Axial Slice')

# Coronal slice (front view): fix the second axis
axes[1].imshow(img_data[:, img_data.shape[1]//2, :], cmap='gray')
axes[1].set_title('Coronal Slice')

# Sagittal slice (side view): fix the first axis
axes[2].imshow(img_data[img_data.shape[0]//2, :, :], cmap='gray')
axes[2].set_title('Sagittal Slice')

plt.tight_layout()
plt.show()

6. What is Genetic Network Reconstruction?

Genetic Network Reconstruction is the process of building a network (graph) that shows
relationships or interactions between genes. Each gene is a node, and each interaction
(like co-expression or regulation) is an edge connecting two nodes.

This helps scientists:

• Understand gene functions.
• Discover disease-related gene clusters.
• Visualize how genes influence each other.

Advantages of Genetic Network Reconstruction Visualization

1. Intuitive Understanding of Complex Interactions

• Helps you see relationships between genes that might be hard to spot in raw data
  (like correlation matrices or tables).
• Makes it easier to identify gene hubs, clusters, or isolated genes.

Example: Spotting that GeneD is central and connected to many others.

2. Detect Functional Modules

• Genes involved in the same biological process often cluster together.
• You can use community detection algorithms to find these modules.

Useful in understanding pathways like cell cycle regulation or immune response.
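
As a very rough stand-in for module detection, the sketch below groups genes into connected components of a small network. Connected components are the crudest possible notion of a module and are swapped in here for simplicity; dedicated community detection algorithms can split densely connected networks further. The edge list and the gene names beyond those used elsewhere in these notes are hypothetical.

```python
def connected_components(nodes, edges):
    """Group nodes into components where every node is reachable from the others."""
    adjacency = {node: set() for node in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from `start`
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            component.add(node)
            stack.extend(adjacency[node] - seen)
        components.append(component)
    return components


# Hypothetical co-expression edges, for illustration only
genes = ["GeneA", "GeneB", "GeneC", "GeneD", "GeneE", "GeneF"]
edges = [("GeneA", "GeneB"), ("GeneB", "GeneC"), ("GeneD", "GeneE")]

modules = connected_components(genes, edges)
for index, module in enumerate(modules, start=1):
    print(f"Module {index}: {sorted(module)}")
```

Here GeneF has no edges and ends up as its own single-gene component, which corresponds to an isolated gene in the network drawing.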

3. Discover Key Regulatory Genes (Hubs)

• Some genes regulate many others. These appear as central nodes (hubs).
• Hub genes are often critical and can be targets for drugs or therapies.

4. Compare Healthy vs. Diseased Networks

• You can build separate networks for healthy and diseased samples, and compare
  their structures.
• Helps identify dysregulated pathways in diseases like cancer, diabetes, etc.
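
Hub detection and network comparison both reduce to simple operations on edge lists. The sketch below uses two hypothetical networks (the edges are invented for illustration) to find the highest-degree gene in the healthy network and to list the edges that differ between the two conditions.

```python
from collections import Counter

# Hypothetical edge sets; each edge is written with its gene names in sorted order
# so that the same interaction compares equal across the two networks.
healthy_edges = {
    ("GeneA", "GeneD"),
    ("GeneB", "GeneD"),
    ("GeneC", "GeneD"),
    ("GeneD", "GeneE"),
}
diseased_edges = {
    ("GeneA", "GeneD"),
    ("GeneB", "GeneD"),
    ("GeneA", "GeneB"),
}


def degrees(edges):
    """Count how many edges touch each gene."""
    counts = Counter()
    for u, v in edges:
        counts[u] += 1
        counts[v] += 1
    return counts


# Hub: the gene with the most connections in the healthy network
healthy_degrees = degrees(healthy_edges)
hub, hub_degree = healthy_degrees.most_common(1)[0]

# Structural differences between the two conditions
lost_edges = healthy_edges - diseased_edges
gained_edges = diseased_edges - healthy_edges

print(f"Hub in healthy network: {hub} (degree {hub_degree})")
print("Edges lost in disease:", sorted(lost_edges))
print("Edges gained in disease:", sorted(gained_edges))
```

In this toy case GeneD is the hub, and it loses half of its connections in the diseased network, which is the sort of change that would flag a pathway for closer study.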

5. Data Reduction and Feature Selection

• Instead of analyzing all genes, you can focus on important sub-networks.
• Ideal for feature selection in machine learning on genomics data.

6. Supports Hypothesis Generation

• Suggests new hypotheses, like:
  o "What happens if we knock out GeneX?"
  o "Why are these genes always co-expressed?"

7. Publication-Ready Visuals

• Networks make great visual aids in scientific papers, presentations, or grant proposals.
• Easier for stakeholders to grasp complex ideas.

8. Interoperable with Tools

• Graphs created with tools like networkx can be exported to:
  o Cytoscape (for advanced biology-based visualizations),
  o Gephi, or
  o GraphML format for further analysis

From Gene Correlation Matrix to Network Visualization

Step 1: Sample Correlation Matrix
import pandas as pd
import numpy as np

# Simulate a random matrix for 5 genes.
# The values are random numbers in [0, 1) standing in for real correlations.
np.random.seed(0)
genes = ['GeneA', 'GeneB', 'GeneC', 'GeneD', 'GeneE']
values = np.random.rand(5, 5)

# Symmetrize it (copy the lower triangle into the upper) and set diagonal to 1
for i in range(5):
    for j in range(i+1, 5):
        values[i, j] = values[j, i]
np.fill_diagonal(values, 1)
corr_matrix = pd.DataFrame(values, index=genes, columns=genes)

print("Correlation Matrix:")
print(corr_matrix)

Step 2: Select High-Correlation Gene Pairs
threshold = 0.8

# Find high-correlation pairs (each unordered pair is checked once)
edges = []
for i in range(len(genes)):
    for j in range(i+1, len(genes)):
        if corr_matrix.iloc[i, j] > threshold:
            edges.append((genes[i], genes[j]))

print("High-correlation gene pairs (edges):", edges)


Step 3: Build and Visualize the Genetic Network
import networkx as nx
import matplotlib.pyplot as plt

# Create an undirected graph
G = nx.Graph()
G.add_nodes_from(genes)
G.add_edges_from(edges)

# Draw network
plt.figure(figsize=(8, 6))
pos = nx.spring_layout(G, seed=42)
nx.draw(G, pos, with_labels=True, node_color='lightgreen', edge_color='gray',
        node_size=1500, font_size=10, font_weight='bold')
plt.title("Genetic Co-expression Network (Threshold > 0.8)")
plt.show()
