EDA Report

The document provides an analysis of a COVID-19 dataset from the US, detailing demographic information such as gender and age across different counties. It includes code snippets for data manipulation and visualization using Python libraries like Pandas and Matplotlib. The dataset consists of 3220 rows and 11 columns, and aims to identify patterns to predict patient survival rates during the pandemic.

Uploaded by

Santhiya S


COVID-19 US County JHU Data & Demographics

Introduction:
The United States of America has recently had the most reported COVID-19 cases. The dataset
I have taken gives detailed information about each county: state, male and female population,
age, and location information such as latitude and longitude. I used this dataset to perform
this research.

DATASET LINK:
https://drive.google.com/drive/folders/1RfLhJVOK45x9oGBmOKyZEpBAaHuITYaw
US_COUNTY.CSV

The main objective of this analysis is to find patterns within the dataset and gain a
better understanding of the data. I also wanted to use it to choose a machine learning
algorithm for predicting the survival rate of patients during the COVID-19 period.

The dataset consists of demographic information: population information (such as male and
female counts) and age information.

Data attributes: fips, county, state, state code, male, female, median age, population,
female_percentage, lat, long.
In total, the dataset has 3220 rows × 11 columns with no null values. Each column has a
title/heading, which makes the data readable.
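The row, column, and null counts above can be checked directly in Pandas. The following is a minimal sketch that uses a small made-up stand-in for the real file; the frame name `sample`, its values, and the lowercase column names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for us_county.csv: values are invented and the
# column names are assumed from the attribute list above.
sample = pd.DataFrame({
    "fips": [1001, 1003, 1005],
    "county": ["A", "B", "C"],
    "state": ["X", "X", "Y"],
    "male": [100, 200, 300],
    "female": [110, 190, 310],
    "population": [210, 390, 610],
})

n_rows, n_cols = sample.shape               # (number of rows, number of columns)
nulls_per_column = sample.isnull().sum()    # missing values in each column
total_nulls = int(nulls_per_column.sum())   # missing values in the whole frame

print(f"{n_rows} rows x {n_cols} columns, {total_nulls} null values")
```

Running the same three calls on the full data_frame should report the shape and null count stated above, provided the file loads as described.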

Observations of the dataset:

• It has all the states in the United States of America.
• The data includes patients whose ages range from 30 to 60.
• The data also contains fips code, latitude, and longitude details for easy understanding
of the location details.
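Observations like these can be verified rather than read off by eye. The sketch below counts distinct states and finds the age range on a tiny invented frame; the column names `state` and `median_age` are assumptions based on the attribute list, and the values are not taken from the real dataset.

```python
import pandas as pd

# Hypothetical rows for illustration only.
sample = pd.DataFrame({
    "state": ["Alabama", "Alabama", "Alaska", "Arizona"],
    "median_age": [37.8, 42.6, 33.1, 45.0],
})

n_states = sample["state"].nunique()    # number of distinct states present
age_min = sample["median_age"].min()    # lowest median age in the sample
age_max = sample["median_age"].max()    # highest median age in the sample

print(f"{n_states} states, median age from {age_min} to {age_max}")
```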

Dataset and Code Description:

This data contains the total, male, and female population of each county.
Explanation 1: This code lists the male population values in the dataset and how often each
value occurs. Note that value_counts is a method, so it must be called with parentheses;
without them, Python prints the method itself instead of the counts.

Code:
print(data_frame["male"].value_counts())

Explanation 2: This code does the same for the female population values.

Code:
print(data_frame["female"].value_counts())

Explanation 3: This code does the same for the total population values.

Code:
print(data_frame['population'].value_counts())
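If the goal is the total number of males, females, and people in each state, grouping the county rows by state and summing is one way to get it. This is a minimal sketch on invented data; the column name `state` is an assumption based on the attribute list, and `sample` is a hypothetical stand-in for data_frame.

```python
import pandas as pd

# Hypothetical county rows for two made-up states.
sample = pd.DataFrame({
    "state": ["X", "X", "Y"],
    "male": [100, 200, 300],
    "female": [110, 190, 310],
    "population": [210, 390, 610],
})

# Sum the county-level counts within each state.
state_totals = sample.groupby("state")[["male", "female", "population"]].sum()

print(state_totals)
```

The result has one row per state, which is usually easier to read than a list of individual county values.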

Important note:
Before running this code, we need to download the dataset and upload it to the Google Colab
environment.
Code: This code helps me read a CSV or Excel file in order to do EDA.

import pandas as pd
import matplotlib.pyplot as plt

def read_csv_or_excel(file_path):
    """
    Reads a CSV or Excel file based on the file extension.

    Args:
        file_path (str): The path to the CSV or Excel file.

    Returns:
        pd.DataFrame: A Pandas DataFrame containing the data from the file.

    Raises:
        ValueError: If the file is neither a .csv nor an .xlsx file.
    """
    if file_path.endswith('.csv'):
        # This is the part where it reads a CSV file
        df = pd.read_csv(file_path)
    elif file_path.endswith('.xlsx'):
        # This is the part where it reads an Excel file
        df = pd.read_excel(file_path)
    else:
        # This is the exception handling that I have kept
        raise ValueError("This file format is incorrect. Please provide "
                         "a CSV or Excel file.")

    return df

file_path = '/content/us_county.csv'
data_frame = read_csv_or_excel(file_path)
print(data_frame)

Output: (the printed DataFrame is not reproduced here)
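Both branches of the reader can be tried without the Colab file. The sketch below repeats the same extension check so that it runs on its own, writes a tiny temporary CSV, and then passes an unsupported file name; `tiny.csv` and `notes.txt` are hypothetical names used only for this illustration.

```python
import os
import tempfile

import pandas as pd


def read_csv_or_excel(file_path):
    """Same extension check as the reader above, repeated so this runs alone."""
    if file_path.endswith(".csv"):
        return pd.read_csv(file_path)
    if file_path.endswith(".xlsx"):
        return pd.read_excel(file_path)
    raise ValueError(
        "This file format is incorrect. Please provide a CSV or Excel file."
    )


# Write a two-row CSV to a temporary folder and read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "tiny.csv")
    pd.DataFrame({"male": [1, 2], "female": [3, 4]}).to_csv(csv_path, index=False)
    loaded = read_csv_or_excel(csv_path)

# An unsupported extension should raise before any file access happens.
try:
    read_csv_or_excel("notes.txt")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(loaded)
print(error_message)
```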

Boxplot Graph:
This graph shows the spread of the county population values, including the median, the
quartiles, and the outliers.

import matplotlib.pyplot as plt

# Here I select the specific column for the boxplot

data_to_plot = data_frame['population']

# Here I create the boxplot

plt.boxplot(data_to_plot)

# Here I add labels and a title that match the plotted column

plt.xlabel('population')
plt.ylabel('Number of people')
plt.title('Boxplot for ' + 'population')

# output
plt.show()
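To compare males and females directly, the two columns can be drawn as side-by-side boxes in one figure. This is a minimal sketch on invented values; `sample` is a hypothetical stand-in for data_frame, and the Agg backend line is only there so the code also runs without a display (it can be dropped in a notebook).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical county counts for illustration only.
sample = pd.DataFrame({
    "male": [100, 200, 300, 400, 5000],
    "female": [110, 190, 310, 420, 5200],
})

fig, ax = plt.subplots()

# One box per column; boxes are placed at x positions 1 and 2 by default.
box_artists = ax.boxplot([sample["male"].to_numpy(), sample["female"].to_numpy()])

ax.set_xticks([1, 2])
ax.set_xticklabels(["male", "female"])
ax.set_ylabel("Number of people")
ax.set_title("Boxplot of male and female population")

plt.close(fig)  # in a notebook, call plt.show() instead
```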
Scatterplot:
This graph shows the relationship between the male and female population of each county.

import matplotlib.pyplot as plt

# Replace with the path to your CSV or Excel file
file_path = '/content/us_county.csv'
data_frame = read_csv_or_excel(file_path)

# The two columns used as X and Y from the DataFrame

x = data_frame['male']
y = data_frame['female']

# Here I create the scatter plot

plt.scatter(x, y)

# Here I add labels and a title

plt.xlabel('Male population')
plt.ylabel('Female population')
plt.title('Scatter Plot of Male vs. Female Population')

# output
plt.show()
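A scatter plot suggests how closely two columns move together, and a correlation coefficient puts a number on that impression. The sketch below computes the Pearson correlation with Pandas on invented values; `sample` is a hypothetical stand-in for data_frame.

```python
import pandas as pd

# Hypothetical county counts for illustration only.
sample = pd.DataFrame({
    "male": [100, 200, 300, 400],
    "female": [110, 190, 310, 420],
})

# Pearson correlation between the two columns (Series.corr default method).
r = sample["male"].corr(sample["female"])

print(f"Pearson correlation between male and female: {r:.3f}")
```

A value close to 1 means the points lie near a rising straight line, which is what one would expect when larger counties have more of both males and females.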
Histogram:
This graph shows how the county population values are distributed.

import matplotlib.pyplot as plt

data_to_plot = data_frame['population']

# Here I create a histogram for the population data
# (the number of bins can be adjusted as needed)

plt.hist(data_to_plot, bins=100)

# Here I add labels and a title

plt.xlabel('Population')
plt.ylabel('Number of counties')
plt.title('Histogram of Population Data')

# output
plt.show()
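When a few very large counties dominate, most bars of a plain histogram can end up squeezed into the first bins. One common option is to bin the base-10 logarithm of the population instead. The sketch below does this with NumPy on invented values; it is an illustration of the idea, not a result from the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical populations spanning several orders of magnitude.
sample = pd.Series(
    [500, 5_000, 20_000, 50_000, 1_000_000, 10_000_000], name="population"
)

# Work on log10(population) so small and large counties share one scale.
log_population = np.log10(sample)

# counts[i] is the number of values falling between edges[i] and edges[i + 1].
counts, edges = np.histogram(log_population, bins=4)

print(counts)
print(edges)
```

The same transformed values can be passed to plt.hist in place of the raw column, with the x-axis labelled as log10 of the population.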

Important Links:
Dataset Link:
https://drive.google.com/drive/folders/1RfLhJVOK45x9oGBmOKyZEpBAaHuITYaw
https://docs.google.com/spreadsheets/d/1OVgcN0T2npE5nRc9RTND8tUP9znStHVZJwMrOthtqDo/edit#gid=1650272371
GitHub Link:
https://github.com/santhiya-hds5210/ORES-5160-EDA
Drive Link:
https://drive.google.com/drive/folders/1W8AiXxbgTYK-HOXSPKjee9qGdj_Ari1O
