Code

The document outlines a data analysis workflow in Python for a dataset of economic indicators. The script standardizes the numeric columns, groups observations with K-Means clustering, ranks predictors of unemployment with a Random Forest regressor, and produces several visualizations: a correlation table and heatmap, a box plot, a histogram, and scatterplots. Randomly generated sample data stands in for a real dataset. The analysis aims to explore relationships between personal consumption expenditure, unemployment, and other economic metrics.

Uploaded by

Ramesh Vankara

# Import necessary libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.preprocessing import StandardScaler
import warnings

warnings.filterwarnings("ignore", category=FutureWarning)

# Sample Data (replace with your dataset)

np.random.seed(42)  # fix the seed so the random sample data is reproducible
data = {
    # Month-end dates; pandas >= 2.2 prefers the alias 'ME' over 'M'
    'date': pd.date_range(start='1967-07-31', periods=10, freq='M'),
    'pce': np.random.rand(10) * 1000,  # Personal Consumption Expenditure
    'pop': np.random.randint(100000, 500000, 10),  # Population
    'psavert': np.random.rand(10) * 10,  # Personal Saving Rate
    'uempmed': np.random.rand(10) * 5,  # Median Duration of Unemployment
    'unemploy': np.random.randint(2000, 5000, 10),  # Number of unemployed people
    'contributors': np.random.randint(50, 500, 10),  # Contributor activity
    'article_density': np.random.rand(10) * 100,  # Articles per capita
    'gdp': np.random.randint(50000, 200000, 10)  # GDP
}
df = pd.DataFrame(data)

# Standardize numerical columns to z-scores (mean 0, standard deviation 1)

num_cols = ['pce', 'pop', 'psavert', 'uempmed', 'unemploy', 'contributors',
            'article_density', 'gdp']
scaler = StandardScaler()
df[num_cols] = scaler.fit_transform(df[num_cols])

# DISPLAY TABLE DATA FIRST

# Display first few rows of the dataset
print("🔹 First 5 Rows of the Dataset:")
print(df.head())

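# Show summary statistics of the numerical columns. Because the columns are
# already standardized, means are ~0; describe() uses the sample standard
# deviation (ddof=1), so std shows ~1.05 rather than exactly 1 for 10 rows.

print("\n🔹 Summary Statistics:")
print(df.describe())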

# Display correlation matrix as a table (useful before heatmap)

print("\n🔹 Correlation Matrix Table:")
print(df.drop(columns=['date']).corr())

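# 1. K-Means Clustering Plot

# Cluster on five standardized indicators; n_init is set explicitly because its
# default changed between scikit-learn versions.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
df['cluster'] = kmeans.fit_predict(df[['pce', 'pop', 'psavert', 'uempmed',
                                       'unemploy']])
# The clusters use all five columns but are plotted on just two of them
plt.figure(figsize=(8, 6))
sns.scatterplot(x='pce', y='unemploy', hue='cluster', data=df, palette='Set2',
                s=100)
plt.title('K-Means Clustering: PCE vs Unemployment')
plt.show()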

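# 2. Feature Importance (Random Forest)

# Predict unemployment from four indicators. With 10 rows, test_size=0.2
# leaves only 2 test rows, so the scores below are very noisy.
X = df[['pce', 'pop', 'psavert', 'uempmed']]
y = df['unemploy']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=42)
rf_model = RandomForestRegressor(random_state=42)
rf_model.fit(X_train, y_train)

# Evaluate on the held-out rows using the metrics imported above
y_pred = rf_model.predict(X_test)
print("\n🔹 Random Forest MSE:", mean_squared_error(y_test, y_pred))
print("🔹 Random Forest R²:", r2_score(y_test, y_pred))

importance = rf_model.feature_importances_
plt.figure(figsize=(8, 6))
sns.barplot(x=importance, y=X.columns, color='skyblue')
plt.title('Feature Importance (Random Forest)')
plt.xlabel('Impurity-based importance')
plt.show()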

# 3. Correlation Matrix Heatmap

plt.figure(figsize=(8, 6))
sns.heatmap(df.drop(columns=['date', 'cluster']).corr(), annot=True,
            cmap='coolwarm', fmt=".2f", linewidths=0.5)
plt.title('Correlation Matrix Heatmap')
plt.show()

# 4. Box-and-Whisker Plot for Economic Indicators

plt.figure(figsize=(8, 6))
sns.boxplot(data=df[['pce', 'pop', 'psavert', 'uempmed', 'unemploy']],
            palette="Set3")
plt.title('Box-and-Whisker Plot for Economic Indicators (standardized)')
plt.xticks(rotation=45)
plt.show()

# 5. Histogram of Contributor Activity

plt.figure(figsize=(8, 6))
sns.histplot(df['contributors'], bins=10, kde=True, color='purple')
plt.title('Histogram of Contributor Activity')
plt.xlabel('Contributor Activity (standardized)')
plt.ylabel('Frequency')
plt.show()

# 6. Scatterplot: Article Density vs. GDP

plt.figure(figsize=(8, 6))
sns.scatterplot(x='article_density', y='gdp', data=df, color='red')
plt.title('Scatterplot: Article Density vs. GDP')
plt.xlabel('Article Density (standardized)')
plt.ylabel('GDP (standardized)')
plt.show()
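
The script fixes `n_clusters=3` without justification. One common way to sanity-check that choice is the silhouette score, available as `sklearn.metrics.silhouette_score`. The sketch below is an illustration under stated assumptions, not part of the original workflow: it regenerates random sample data shaped like the script's clustering columns (the seed and the helper name `silhouette_by_k` are hypothetical), standardizes it, and scores k = 2 to 5. Higher scores suggest better-separated clusters; with only 10 rows the scores will be noisy.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def silhouette_by_k(X, k_values, random_state=42):
    """Return {k: silhouette score} for K-Means fitted with each k (hypothetical helper)."""
    scores = {}
    for k in k_values:
        # n_init is explicit so results do not depend on the library default
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=random_state).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores


# Hypothetical random sample data mirroring the script's clustering columns
rng = np.random.default_rng(0)
sample = pd.DataFrame({
    'pce': rng.random(10) * 1000,
    'pop': rng.integers(100000, 500000, 10),
    'psavert': rng.random(10) * 10,
    'uempmed': rng.random(10) * 5,
    'unemploy': rng.integers(2000, 5000, 10),
})
X = StandardScaler().fit_transform(sample)

scores = silhouette_by_k(X, range(2, 6))
for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
best_k = max(scores, key=scores.get)
print("Best k by silhouette:", best_k)
```

The silhouette score ranges from -1 to 1; it is only one heuristic, and the elbow method on `KMeans.inertia_` is a common alternative.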

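The bar chart in step 2 uses `feature_importances_`, the Random Forest's impurity-based importance, which is computed from the training data. Permutation importance (`sklearn.inspection.permutation_importance`) is a different technique: it measures how much the held-out score drops when one feature's values are shuffled. The sketch below swaps in that technique on synthetic data; the data-generating rule (unemploy driven mainly by uempmed) and the 200-row size are assumptions chosen so the effect is visible, not properties of the original dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data (assumption): the target depends mostly on uempmed plus noise
rng = np.random.default_rng(42)
n = 200
X = pd.DataFrame({
    'pce': rng.normal(size=n),
    'pop': rng.normal(size=n),
    'psavert': rng.normal(size=n),
    'uempmed': rng.normal(size=n),
})
y = 3.0 * X['uempmed'] + 0.1 * rng.normal(size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)
rf = RandomForestRegressor(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

# Shuffle each test-set feature n_repeats times; the default scorer for a
# regressor is R², so each value is the mean drop in R²
result = permutation_importance(rf, X_test, y_test, n_repeats=10,
                                random_state=42)
perm_importance = pd.Series(result.importances_mean, index=X.columns)
perm_importance = perm_importance.sort_values(ascending=False)
print(perm_importance)
```

On the original 10-row sample, a 2-row test set would make permutation importance just as noisy as the R² printed in step 2, so this check is most useful once a real dataset replaces the sample data.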