
EDA Report

The document provides an analysis of a COVID-19 dataset from the US, detailing demographic information such as gender and age across different counties. It includes code snippets for data manipulation and visualization using Python libraries like Pandas and Matplotlib. The dataset consists of 3220 rows and 11 columns, and aims to identify patterns to predict patient survival rates during the pandemic.

Uploaded by

Santhiya S

COVID-19 US County JHU Data & Demographics

Introduction:
The United States of America has recently had the most reported COVID-19 cases. The dataset
I have chosen gives detailed information about the county, state, male and female population,
age group, and location details such as latitude and longitude. I used this dataset to perform
this research.

DATASET LINK:
https://drive.google.com/drive/folders/1RfLhJVOK45x9oGBmOKyZEpBAaHuITYaw
US_COUNTY.CSV

The main objective of this analysis is to find patterns within the dataset and gain a
further understanding of the data. I also want to use it to choose a machine learning
algorithm for predicting the survival rate of patients during the COVID-19 period.

The dataset consists of demographic information: population information (such as the male
and female figures) and age information.

Data attributes: Fips, County, State, State code, male, female, median age, population,
female_percentage, lat, long.
In total, the dataset has 3220 rows × 11 columns with no null values. Every column has a
title/heading, which makes the data readable.
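
The row, column, and null-value figures above can be checked directly with Pandas. The following is a minimal sketch, assuming a small hypothetical stand-in table (the values and county names are made up; only the column names follow the attribute list above):

```python
import pandas as pd

# Hypothetical three-row stand-in for us_county.csv; the values are invented
# and only the column names follow the attribute list in this report.
sample = pd.DataFrame({
    "fips": [1001, 1003, 1005],
    "county": ["County A", "County B", "County C"],
    "state": ["Alabama", "Alabama", "Alabama"],
    "male": [100, 200, 300],
    "female": [110, 190, 310],
    "population": [210, 390, 610],
})

n_rows, n_cols = sample.shape          # (number of rows, number of columns)
null_counts = sample.isna().sum()      # missing values per column
total_nulls = int(null_counts.sum())   # 0 means there are no null values

print(f"{n_rows} rows x {n_cols} columns, {total_nulls} null values")
```

Running the same three lines on the real data_frame should report the shape and null count stated above, if the file matches this description.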

Observations of the dataset:

• It has all the states in the United States of America.
• The data includes patients whose ages range from 30 to 60.
• The data also contains the FIPS code, latitude, and longitude details for easy
understanding of the location details.
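
Observations like these can be backed by a quick check. This is only a sketch on hypothetical data: the column names state and median_age are assumptions based on the attribute list, and the values are invented.

```python
import pandas as pd

# Hypothetical rows; "state" and "median_age" are assumed column names.
sample = pd.DataFrame({
    "state": ["Alabama", "Alabama", "Texas", "Ohio"],
    "median_age": [38.2, 42.5, 31.0, 59.4],
})

n_states = sample["state"].nunique()   # number of distinct states present
age_min = sample["median_age"].min()   # lowest age value in the column
age_max = sample["median_age"].max()   # highest age value in the column

print(f"{n_states} states, ages from {age_min} to {age_max}")
```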

Dataset and Code Description:

This data contains the total population and the male and female figures.

Explanation 1: This code shows how often each value occurs in the male column across the
different states.

Code:
print(data_frame["male"].value_counts())

Explanation 2: This code shows how often each value occurs in the female column across the
different states.

Code:
print(data_frame["female"].value_counts())

Explanation 3: This code shows how often each value occurs in the population column across
the different states.

Code:
print(data_frame["population"].value_counts())
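
Note that value_counts reports how often each distinct value appears in a column, not a total per state. If per-state totals are what is wanted, one possible approach is to group by state and sum. The sketch below uses invented numbers and assumes the state column is named state:

```python
import pandas as pd

# Hypothetical rows; the numbers are made up for illustration only.
sample = pd.DataFrame({
    "state": ["Alabama", "Alabama", "Texas"],
    "male": [100, 200, 300],
    "female": [110, 190, 310],
    "population": [210, 390, 610],
})

# Sum the county-level figures within each state.
totals = sample.groupby("state")[["male", "female", "population"]].sum()

print(totals)
```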

Important note:
Before running this code, we need to download the dataset and upload it to the Google Colab
environment.
Code: This code reads a CSV or Excel file so that EDA can be performed on it.

import pandas as pd
import matplotlib.pyplot as plt

def read_csv_or_excel(file_path):
    """
    Reads a CSV or Excel file based on the file extension.

    Args:
        file_path (str): The path to the CSV or Excel file.

    Returns:
        pd.DataFrame: A Pandas DataFrame containing the data from the file.

    Raises:
        ValueError: If the file is neither a CSV nor an Excel (.xlsx) file.
    """
    if file_path.endswith('.csv'):
        # Read a CSV file
        df = pd.read_csv(file_path)
    elif file_path.endswith('.xlsx'):
        # Read an Excel file
        df = pd.read_excel(file_path)
    else:
        # Reject any other file format
        raise ValueError("This file format is incorrect. Please provide a CSV or Excel file.")

    return df

file_path = '/content/us_county.csv'
data_frame = read_csv_or_excel(file_path)
print(data_frame)

Output:

Boxplot Graph:
This graph shows the distribution of the population column.

import matplotlib.pyplot as plt

# Select the column to plot
data_to_plot = data_frame['population']

# Create the boxplot
plt.boxplot(data_to_plot)

# Add labels and a title that match the plotted column
plt.xlabel('population')
plt.ylabel('Number of people')
plt.title('Boxplot for ' + 'population')

# Show the plot
plt.show()
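
The box in a boxplot is drawn from the quartiles, and by Matplotlib's default the whiskers extend up to 1.5 times the interquartile range. As a rough illustration, the same numbers can be computed with Pandas. The values below are hypothetical and are not taken from the dataset:

```python
import pandas as pd

# Hypothetical population values; one value is deliberately extreme.
population = pd.Series([10, 20, 30, 40, 1000], name="population")

q1 = population.quantile(0.25)   # lower edge of the box
q3 = population.quantile(0.75)   # upper edge of the box
iqr = q3 - q1                    # interquartile range (height of the box)

# Points beyond 1.5 * IQR from the box are drawn as individual outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = population[(population < lower) | (population > upper)]

print(q1, q3, iqr, outliers.tolist())
```
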

Scatterplot:
This graph shows the relationship between the male and female figures.

import matplotlib.pyplot as plt

file_path = '/content/us_county.csv'  # Replace with the path to your CSV or Excel file
data_frame = read_csv_or_excel(file_path)

# Use the 'male' and 'female' columns of the DataFrame
x = data_frame['male']
y = data_frame['female']

# Create the scatter plot
plt.scatter(x, y)

# Add labels and a title
plt.xlabel('male')
plt.ylabel('female')
plt.title('Scatter Plot of male vs. female')

# Show the plot
plt.show()
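
A scatter plot of two columns can be summarized with a single number, the Pearson correlation coefficient. The following is a small sketch on invented values, not a result from the dataset:

```python
import pandas as pd

# Hypothetical male and female counts for four counties.
male = pd.Series([100, 200, 300, 400])
female = pd.Series([110, 190, 310, 405])

# Pearson correlation: values near 1 mean the points lie close to a rising line.
r = male.corr(female)

print(f"correlation between male and female: {r:.3f}")
```
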

Histogram:
This graph shows how the population values are distributed.

import matplotlib.pyplot as plt

data_to_plot = data_frame['population']

# Create a histogram for the population data
plt.hist(data_to_plot, bins=100)  # You can adjust the number of bins as needed

# Add labels and a title that match the plotted column
plt.xlabel('population')
plt.ylabel('Number of rows')
plt.title('Histogram of Population Data')

# Show the plot
plt.show()
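
To see what the bins argument does, the bin counts can be computed without drawing anything. This sketch uses NumPy's histogram function on a few hypothetical values and 3 bins instead of 100:

```python
import numpy as np

# Hypothetical population values, not taken from the dataset.
population = np.array([5, 15, 25, 35, 95])

# Split the range from min to max into 3 equal-width bins and count the values
# that fall into each bin (the last bin includes its right edge).
counts, edges = np.histogram(population, bins=3)

print(counts.tolist())   # number of values per bin
print(edges.tolist())    # the bin boundaries
```

With more bins each bar covers a narrower range, so the choice of bins changes how smooth or jagged the histogram looks.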

Important Links:
Dataset Link:
https://drive.google.com/drive/folders/1RfLhJVOK45x9oGBmOKyZEpBAaHuITYaw
https://docs.google.com/spreadsheets/d/1OVgcN0T2npE5nRc9RTND8tUP9znStHVZJwMrOthtqDo/edit#gid=1650272371
GitHub Link:
https://github.com/santhiya-hds5210/ORES-5160-EDA
Drive Link:
https://drive.google.com/drive/folders/1W8AiXxbgTYK-HOXSPKjee9qGdj_Ari1O
