Introduction R For DS

This document details a project analyzing global COVID-19 testing data using R, focusing on data extraction, preprocessing, and analysis. It includes tasks such as fetching data from Wikipedia, cleaning the dataset, and calculating the worldwide positive testing ratio. The final output is a CSV file containing the processed data and insights for a news feature story.

Global COVID-19 Testing Analysis Using R

Yuni Astari

2025-04-15

Introduction

The COVID-19 pandemic has significantly impacted countries worldwide, influencing public health systems,
economies, and daily life. One of the most important strategies in managing the spread of the virus has
been large-scale testing. Accurate and accessible testing data is essential to understanding the scope of the
outbreak and implementing timely interventions.
In this project, I take on the role of a data analyst for a news channel’s data science team. The team is
preparing a feature story on global COVID-19 testing efforts, and I have been assigned to gather and analyze
real-world testing data to support the story with data-driven insights.

#install.packages("httr")
#install.packages("rvest")
library(httr)

## Warning: package ’httr’ was built under R version 4.4.3

library(rvest)

## Warning: package ’rvest’ was built under R version 4.4.3

Task 1: Get a COVID-19 pandemic Wiki page using an HTTP request

get_wiki_covid19_page <- function() {

wiki_url <- "https://en.wikipedia.org/w/index.php"
response <- GET(wiki_url, query = list(title = "Template:COVID-19_testing_by_country"))
return(response)
}

get_wiki_covid19_page()

## Response [https://en.wikipedia.org/w/index.php?title=Template%3ACOVID-19_testing_by_country]
## Date: 2025-04-15 04:15
## Status: 200
## Content-Type: text/html; charset=UTF-8
## Size: 452 kB
## <!DOCTYPE html>
## <html class="client-nojs vector-feature-language-in-header-enabled vector-fea...

## <head>
## <meta charset="UTF-8">
## <title>Template:COVID-19 testing by country - Wikipedia</title>
## <script>(function(){var className="client-js vector-feature-language-in-heade...
## RLSTATE={"ext.globalCssJs.user.styles":"ready","site.styles":"ready","user.st...
## <script>(RLQ=window.RLQ||[]).push(function(){mw.loader.impl(function(){return...
## }];});});</script>
## <link rel="stylesheet" href="/w/load.php?lang=en&modules=ext.cite.styles%...
## ...
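
Before parsing, it can help to confirm that the request actually succeeded. The sketch below is one possible variation on get_wiki_covid19_page(), not the function used in this report: the name fetch_wiki_page and its arguments are hypothetical, while GET(), stop_for_status(), and modify_url() are existing httr functions. Only the URL-building step is executed, so the example runs without a network connection.

```r
library(httr)

# Hypothetical helper (not part of the original report): fetch a Wikipedia
# page by title and stop with an error if the server reports a failure.
fetch_wiki_page <- function(title,
                            base_url = "https://en.wikipedia.org/w/index.php") {
  response <- GET(base_url, query = list(title = title))
  stop_for_status(response)  # raises an R error for 4xx/5xx responses
  response
}

# Preview the URL that the query list produces, without sending a request.
preview_url <- modify_url(
  "https://en.wikipedia.org/w/index.php",
  query = list(title = "Template:COVID-19_testing_by_country")
)
print(preview_url)
```

Calling fetch_wiki_page("Template:COVID-19_testing_by_country") would then return the response only when the status indicates success.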

Task 2: Extract COVID-19 testing data table from the wiki HTML page

wiki_extr <- read_html(get_wiki_covid19_page())

wiki_extr

## {html_document}
## <html class="client-nojs vector-feature-language-in-header-enabled vector-feature-language-in-main-pa
## [1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8 ...
## [2] <body class="skin--responsive skin-vector skin-vector-search-vue mediawik ...

table <- html_nodes(wiki_extr, "table")

table

## {xml_nodeset (4)}
## [1] <table class="box-Update plainlinks ombox ombox-content ambox-Update" rol ...
## [2] <table class="wikitable plainrowheaders sortable collapsible autocollapse ...
## [3] <table class="plainlinks ombox mbox-small ombox-notice" role="presentatio ...
## [4] <table class="wikitable mw-templatedata-doc-params">\n<caption><p class=" ...

data_covid <- as.data.frame(html_table(table[2]))

head(data_covid)

## Country.or.region Date.a. Tested Units.b. Confirmed.cases.

## 1 Afghanistan 17 Dec 2020 154,767 samples 49,621
## 2 Albania 18 Feb 2021 428,654 samples 96,838
## 3 Algeria 2 Nov 2020 230,553 samples 58,574
## 4 Andorra 23 Feb 2022 300,307 samples 37,958
## 5 Angola 2 Feb 2021 399,228 samples 20,981
## 6 Antigua and Barbuda 6 Mar 2021 15,268 samples 832
## Confirmed..tested.. Tested..population.. Confirmed..population.. Ref.
## 1 32.1 0.40 0.13 [1]
## 2 22.6 15.0 3.4 [2]
## 3 25.4 0.53 0.13 [3][4]
## 4 12.6 387 49.0 [5]
## 5 5.3 1.3 0.067 [6]
## 6 5.4 15.9 0.86 [7]
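
Picking the table by position (table[2]) depends on the page layout staying the same. As a sketch of an alternative, the example below selects tables by their CSS class instead. The HTML string is a small stand-in invented for illustration, not the real Wikipedia page, and it assumes the data table keeps the "wikitable" class seen in the node listing above.

```r
library(rvest)

# Invented miniature page: one notice box and one data table.
html <- '<html><body>
<table class="ombox"><tr><td>Notice</td></tr></table>
<table class="wikitable sortable">
<tr><th>Country</th><th>Tested</th></tr>
<tr><td>Afghanistan</td><td>154,767</td></tr>
<tr><td>Albania</td><td>428,654</td></tr>
</table>
</body></html>'

page <- read_html(html)

# Keep only tables carrying the "wikitable" class, then parse the first one.
wikitables <- html_nodes(page, "table.wikitable")
testing_table <- as.data.frame(html_table(wikitables[[1]]))

print(testing_table)
```

On the real page more than one table carries the "wikitable" class, so the result of the selector should still be inspected before choosing an element.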

Task 3: Pre-process and export the extracted data frame

summary(data_covid)

## Country.or.region Date.a. Tested Units.b.

## Length:173 Length:173 Length:173 Length:173
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
## Confirmed.cases. Confirmed..tested.. Tested..population..
## Length:173 Length:173 Length:173
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
## Confirmed..population.. Ref.
## Length:173 Length:173
## Class :character Class :character
## Mode :character Mode :character

preprocess_covid_data <- function(data_frame) {

shape <- dim(data_frame)

# Remove the "World" row

data_frame <- data_frame[!(data_frame$`Country.or.region` == "World"),]

# Remove the last row

data_frame <- data_frame[1:172, ]

# Remove unnecessary columns

data_frame["Ref."] <- NULL
data_frame["Units.b."] <- NULL

# Renaming the columns

names(data_frame) <- c("country", "date", "tested", "confirmed", "confirmed.tested.ratio",
"tested.population.ratio", "confirmed.population.ratio")

# Convert column data types

data_frame$country <- as.factor(data_frame$country)
data_frame$date <- as.factor(data_frame$date)
data_frame$tested <- as.numeric(gsub(",", "", data_frame$tested))
data_frame$confirmed <- as.numeric(gsub(",", "", data_frame$confirmed))
data_frame$confirmed.tested.ratio <- as.numeric(gsub(",", "", data_frame$confirmed.tested.ratio))
data_frame$tested.population.ratio <- as.numeric(gsub(",", "", data_frame$tested.population.ratio))
data_frame$confirmed.population.ratio <- as.numeric(gsub(",", "", data_frame$confirmed.population.ratio))

return(data_frame)
}

proper_data_covid<- preprocess_covid_data(data_covid)
head(proper_data_covid)

## country date tested confirmed confirmed.tested.ratio

## 1 Afghanistan 17 Dec 2020 154767 49621 32.1
## 2 Albania 18 Feb 2021 428654 96838 22.6
## 3 Algeria 2 Nov 2020 230553 58574 25.4

## 4 Andorra 23 Feb 2022 300307 37958 12.6
## 5 Angola 2 Feb 2021 399228 20981 5.3
## 6 Antigua and Barbuda 6 Mar 2021 15268 832 5.4
## tested.population.ratio confirmed.population.ratio
## 1 0.40 0.130
## 2 15.00 3.400
## 3 0.53 0.130
## 4 387.00 49.000
## 5 1.30 0.067
## 6 15.90 0.860
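
The cleaning steps in preprocess_covid_data() can be tried on a small example. The sketch below uses a toy data frame invented for illustration and drops the last row with head(x, -1) rather than a fixed row count, which is an assumption about how one might avoid hard-coding 172; it is not the function used in this report.

```r
# Toy data standing in for the scraped table (values partly invented).
raw <- data.frame(
  country = c("Afghanistan", "Albania", "World", "Footnote row"),
  tested  = c("154,767", "428,654", "5,000,000", ""),
  stringsAsFactors = FALSE
)

# Remove the aggregate "World" row.
cleaned <- raw[raw$country != "World", ]

# Drop the last row without hard-coding the number of rows.
cleaned <- head(cleaned, -1)

# Strip thousands separators before converting to numeric;
# as.numeric("154,767") on its own would give NA with a warning.
cleaned$tested <- as.numeric(gsub(",", "", cleaned$tested, fixed = TRUE))

print(cleaned)
```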

summary(proper_data_covid)

## country date tested

## Afghanistan : 1 2 Feb 2023 : 6 Min. : 3880
## Albania : 1 1 Feb 2023 : 4 1st Qu.: 512037
## Algeria : 1 31 Jan 2023: 4 Median : 3029859
## Andorra : 1 1 Mar 2021 : 3 Mean : 31377219
## Angola : 1 23 Jul 2021: 3 3rd Qu.: 12386725
## Antigua and Barbuda: 1 29 Jan 2023: 3 Max. :929349291
## (Other) :166 (Other) :149
## confirmed confirmed.tested.ratio tested.population.ratio
## Min. : 0 Min. : 0.00 Min. : 0.006
## 1st Qu.: 37839 1st Qu.: 5.00 1st Qu.: 9.475
## Median : 281196 Median :10.05 Median : 46.950
## Mean : 2508340 Mean :11.25 Mean : 175.504
## 3rd Qu.: 1278105 3rd Qu.:15.25 3rd Qu.: 156.500
## Max. :90749469 Max. :46.80 Max. :3223.000
##
## confirmed.population.ratio
## Min. : 0.000
## 1st Qu.: 0.425
## Median : 6.100
## Mean :12.769
## 3rd Qu.:16.250
## Max. :74.400
##

write.csv(proper_data_covid,file='covid-19(2023).csv',row.names=FALSE)

# Get the working directory

wd <- getwd()
# Build the path for the default file name "covid.csv" (not the name used in write.csv above)
file_path <- paste(wd, sep="", "/covid.csv")
# Print the file path
print(file_path)

## [1] "C:/Users/LENOVO/OneDrive/Documents/Coursera/IBM/covid.csv"

file.exists(file_path)

## [1] FALSE

# My saved file with new name
file_path <- paste(wd, sep="", "/covid-19(2023).csv")
print(file_path)

## [1] "C:/Users/LENOVO/OneDrive/Documents/Coursera/IBM/covid-19(2023).csv"

file.exists(file_path)

## [1] TRUE
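
The first check returned FALSE only because the path used a different file name from the one passed to write.csv(). One way to avoid that kind of mismatch is to build the path once and reuse it for writing, checking, and reading. The sketch below does this in a temporary directory with an invented two-row data frame; the file name covid-demo.csv is hypothetical.

```r
# Build the path once with file.path(), which inserts the separator itself.
csv_path <- file.path(tempdir(), "covid-demo.csv")

# Invented sample rows for illustration.
demo <- data.frame(
  country = c("Afghanistan", "Albania"),
  tested  = c(154767, 428654),
  stringsAsFactors = FALSE
)

# Write, check, and read back using the same path variable.
write.csv(demo, file = csv_path, row.names = FALSE)
print(file.exists(csv_path))

round_trip <- read.csv(csv_path)
print(round_trip)
```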

Task 4: Get a subset of the extracted data frame

# Read covid_data_frame_csv from the csv file

#read.csv("covid-19(2023).csv")
covid_data <- read.csv("covid-19(2023).csv")
# Get the 5th to 10th rows, with two "country" "confirmed" columns
covid_data[5:10,c('country','confirmed')]

## country confirmed
## 5 Angola 20981
## 6 Antigua and Barbuda 832
## 7 Argentina 9060495
## 8 Armenia 422963
## 9 Australia 10112229
## 10 Austria 5789991

Task 5: Calculate worldwide COVID testing positive ratio

# Get the total confirmed cases worldwide

tot_confirmed <- sum(covid_data[,'confirmed'])
tot_confirmed

## [1] 431434555

# Get the total tested cases worldwide

tot_tested <- sum(covid_data[,'tested'])
tot_tested

## [1] 5396881644

# Get the positive ratio (confirmed / tested)

positive_ratio <- tot_confirmed/tot_tested
round(positive_ratio,2)

## [1] 0.08
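
The ratio above is a pooled figure: total confirmed cases divided by total tests. It is not the same as averaging each country's own ratio, which is why it differs from the mean of confirmed.tested.ratio shown in the summary (11.25). The sketch below reuses the two totals from this task and then contrasts the two calculations on a toy data frame invented for illustration.

```r
# Totals reported in Task 5 of this report.
tot_confirmed <- 431434555
tot_tested    <- 5396881644

pooled_ratio   <- tot_confirmed / tot_tested
pooled_percent <- round(100 * pooled_ratio, 2)
print(pooled_percent)  # 7.99

# Toy data (invented): one country with many tests, one with few.
toy <- data.frame(tested = c(1000, 100), confirmed = c(10, 50))

# Pooled ratio weights each country by its test volume.
toy_pooled <- sum(toy$confirmed, na.rm = TRUE) / sum(toy$tested, na.rm = TRUE)

# Mean of per-country ratios gives every country equal weight.
toy_mean <- mean(toy$confirmed / toy$tested)

print(c(pooled = toy_pooled, mean_of_ratios = toy_mean))
```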

Task 6: Get a sorted name list of countries that reported their testing data

covid_data$country <- as.character(covid_data$country)

# Sort A to Z
sort(covid_data$country)

## [1] "Afghanistan" "Albania" "Algeria"

## [4] "Andorra" "Angola" "Antigua and Barbuda"
## [7] "Argentina" "Armenia" "Australia"
## [10] "Austria" "Azerbaijan" "Bahamas"
## [13] "Bahrain" "Bangladesh" "Barbados"
## [16] "Belarus" "Belgium" "Belize"
## [19] "Benin" "Bhutan" "Bolivia"
## [22] "Bosnia and Herzegovina" "Botswana" "Brazil"
## [25] "Brunei" "Bulgaria" "Burkina Faso"
## [28] "Burundi" "Cambodia" "Cameroon"
## [31] "Canada" "Chad" "Chile"
## [34] "China[c]" "Colombia" "Costa Rica"
## [37] "Croatia" "Cuba" "Cyprus[d]"
## [40] "Czechia" "Denmark[e]" "Djibouti"
## [43] "Dominica" "Dominican Republic" "DR Congo"
## [46] "Ecuador" "Egypt" "El Salvador"
## [49] "Equatorial Guinea" "Estonia" "Eswatini"
## [52] "Ethiopia" "Faroe Islands" "Fiji"
## [55] "Finland" "France[f][g]" "Gabon"
## [58] "Gambia" "Georgia[h]" "Germany"
## [61] "Ghana" "Greece" "Greenland"
## [64] "Grenada" "Guatemala" "Guinea"
## [67] "Guinea-Bissau" "Guyana" "Haiti"
## [70] "Honduras" "Hungary" "Iceland"
## [73] "India" "Indonesia" "Iran"
## [76] "Iraq" "Ireland" "Israel"
## [79] "Italy" "Ivory Coast" "Jamaica"
## [82] "Japan" "Jordan" "Kazakhstan"
## [85] "Kenya" "Kosovo" "Kuwait"
## [88] "Kyrgyzstan" "Laos" "Latvia"
## [91] "Lebanon" "Lesotho" "Liberia"
## [94] "Libya" "Lithuania" "Luxembourg[i]"
## [97] "Madagascar" "Malawi" "Malaysia"
## [100] "Maldives" "Mali" "Malta"
## [103] "Mauritania" "Mauritius" "Mexico"
## [106] "Moldova[j]" "Mongolia" "Montenegro"
## [109] "Morocco" "Mozambique" "Myanmar"
## [112] "Namibia" "Nepal" "Netherlands"
## [115] "New Caledonia" "New Zealand" "Niger"
## [118] "Nigeria" "North Korea" "North Macedonia"
## [121] "Northern Cyprus[k]" "Norway" "Oman"
## [124] "Pakistan" "Palestine" "Panama"
## [127] "Papua New Guinea" "Paraguay" "Peru"
## [130] "Philippines" "Poland" "Portugal"
## [133] "Qatar" "Romania" "Russia"
## [136] "Rwanda" "Saint Kitts and Nevis" "Saint Lucia"
## [139] "Saint Vincent" "San Marino" "Saudi Arabia"
## [142] "Senegal" "Serbia" "Singapore"

## [145] "Slovakia" "Slovenia" "South Africa"
## [148] "South Korea" "South Sudan" "Spain"
## [151] "Sri Lanka" "Sudan" "Sweden"
## [154] "Switzerland[l]" "Taiwan[m]" "Tanzania"
## [157] "Thailand" "Togo" "Trinidad and Tobago"
## [160] "Tunisia" "Turkey" "Uganda"
## [163] "Ukraine" "United Arab Emirates" "United Kingdom"
## [166] "United States" "Uruguay" "Uzbekistan"
## [169] "Venezuela" "Vietnam" "Zambia"
## [172] "Zimbabwe"

# Sort Z to A
ztoa_country <- sort(covid_data$country, decreasing = TRUE)
print(ztoa_country)

## [1] "Zimbabwe" "Zambia" "Vietnam"

## [4] "Venezuela" "Uzbekistan" "Uruguay"
## [7] "United States" "United Kingdom" "United Arab Emirates"
## [10] "Ukraine" "Uganda" "Turkey"
## [13] "Tunisia" "Trinidad and Tobago" "Togo"
## [16] "Thailand" "Tanzania" "Taiwan[m]"
## [19] "Switzerland[l]" "Sweden" "Sudan"
## [22] "Sri Lanka" "Spain" "South Sudan"
## [25] "South Korea" "South Africa" "Slovenia"
## [28] "Slovakia" "Singapore" "Serbia"
## [31] "Senegal" "Saudi Arabia" "San Marino"
## [34] "Saint Vincent" "Saint Lucia" "Saint Kitts and Nevis"
## [37] "Rwanda" "Russia" "Romania"
## [40] "Qatar" "Portugal" "Poland"
## [43] "Philippines" "Peru" "Paraguay"
## [46] "Papua New Guinea" "Panama" "Palestine"
## [49] "Pakistan" "Oman" "Norway"
## [52] "Northern Cyprus[k]" "North Macedonia" "North Korea"
## [55] "Nigeria" "Niger" "New Zealand"
## [58] "New Caledonia" "Netherlands" "Nepal"
## [61] "Namibia" "Myanmar" "Mozambique"
## [64] "Morocco" "Montenegro" "Mongolia"
## [67] "Moldova[j]" "Mexico" "Mauritius"
## [70] "Mauritania" "Malta" "Mali"
## [73] "Maldives" "Malaysia" "Malawi"
## [76] "Madagascar" "Luxembourg[i]" "Lithuania"
## [79] "Libya" "Liberia" "Lesotho"
## [82] "Lebanon" "Latvia" "Laos"
## [85] "Kyrgyzstan" "Kuwait" "Kosovo"
## [88] "Kenya" "Kazakhstan" "Jordan"
## [91] "Japan" "Jamaica" "Ivory Coast"
## [94] "Italy" "Israel" "Ireland"
## [97] "Iraq" "Iran" "Indonesia"
## [100] "India" "Iceland" "Hungary"
## [103] "Honduras" "Haiti" "Guyana"
## [106] "Guinea-Bissau" "Guinea" "Guatemala"
## [109] "Grenada" "Greenland" "Greece"
## [112] "Ghana" "Germany" "Georgia[h]"
## [115] "Gambia" "Gabon" "France[f][g]"

## [118] "Finland" "Fiji" "Faroe Islands"
## [121] "Ethiopia" "Eswatini" "Estonia"
## [124] "Equatorial Guinea" "El Salvador" "Egypt"
## [127] "Ecuador" "DR Congo" "Dominican Republic"
## [130] "Dominica" "Djibouti" "Denmark[e]"
## [133] "Czechia" "Cyprus[d]" "Cuba"
## [136] "Croatia" "Costa Rica" "Colombia"
## [139] "China[c]" "Chile" "Chad"
## [142] "Canada" "Cameroon" "Cambodia"
## [145] "Burundi" "Burkina Faso" "Bulgaria"
## [148] "Brunei" "Brazil" "Botswana"
## [151] "Bosnia and Herzegovina" "Bolivia" "Bhutan"
## [154] "Benin" "Belize" "Belgium"
## [157] "Belarus" "Barbados" "Bangladesh"
## [160] "Bahrain" "Bahamas" "Azerbaijan"
## [163] "Austria" "Australia" "Armenia"
## [166] "Argentina" "Antigua and Barbuda" "Angola"
## [169] "Andorra" "Algeria" "Albania"
## [172] "Afghanistan"
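
Several names in the sorted lists still carry Wikipedia footnote markers, such as "China[c]" and "France[f][g]". A possible cleanup step, which this report does not apply, is to strip any bracketed marker with gsub() before sorting. The sketch below shows the idea on a few names taken from the output above.

```r
countries <- c("France[f][g]", "Cyprus[d]", "China[c]", "Germany")

# Remove every "[...]" group: an opening bracket, any characters that are
# not a closing bracket, then the closing bracket.
clean_names <- gsub("\\[[^]]*\\]", "", countries)

print(sort(clean_names))
```

Cleaning the names this way would also make exact matches such as covid_data$country == "China" work as expected.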

Task 7: Identify country names with a specific pattern

# Find country names that contain a space (i.e., countries with multiple words in their name)
space_matches <- grep(" ", covid_data$country, value = TRUE)
print(space_matches)

## [1] "Antigua and Barbuda" "Bosnia and Herzegovina" "Burkina Faso"

## [4] "Costa Rica" "Dominican Republic" "DR Congo"
## [7] "El Salvador" "Equatorial Guinea" "Faroe Islands"
## [10] "Ivory Coast" "New Caledonia" "New Zealand"
## [13] "North Korea" "North Macedonia" "Northern Cyprus[k]"
## [16] "Papua New Guinea" "Saint Kitts and Nevis" "Saint Lucia"
## [19] "Saint Vincent" "San Marino" "Saudi Arabia"
## [22] "South Africa" "South Korea" "South Sudan"
## [25] "Sri Lanka" "Trinidad and Tobago" "United Arab Emirates"
## [28] "United Kingdom" "United States"
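
The same grep() call works with other patterns. As a brief sketch on a handful of names from the list above, the example below uses anchors to match the start or end of a name, and grepl() to get a logical vector instead of the matching values.

```r
countries <- c("United Kingdom", "United States", "Uruguay",
               "South Korea", "North Korea", "Kuwait")

# "^" anchors the pattern at the start of the name.
starts_united <- grep("^United", countries, value = TRUE)

# "$" anchors the pattern at the end of the name.
ends_korea <- grep("Korea$", countries, value = TRUE)

# grepl() returns TRUE/FALSE per element; fixed = TRUE treats the
# pattern as a literal string rather than a regular expression.
has_space <- grepl(" ", countries, fixed = TRUE)

print(starts_united)
print(ends_korea)
print(sum(has_space))
```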

Task 8: Pick two countries you are interested in, and then review their testing data

india <- covid_data[covid_data$country == "India",
                    c("country", "tested", "confirmed", "confirmed.population.ratio")]
germany <- covid_data[covid_data$country == "Germany",
                      c("country", "tested", "confirmed", "confirmed.population.ratio")]
india

## country tested confirmed confirmed.population.ratio

## 73 India 866177937 43585554 31.7

germany

## country tested confirmed confirmed.population.ratio

## 60 Germany 65247345 3733519 4.5

Task 9: Compare which of the selected countries has the larger ratio of confirmed cases to population

# Use if-else statement

if (germany$confirmed.population.ratio > india$confirmed.population.ratio) {
print("Germany has a higher COVID-19 infection rate per population than India.")
} else {
print("India has a higher COVID-19 infection rate per population than Germany.")
}

## [1] "India has a higher COVID-19 infection rate per population than Germany."
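
The if-else comparison can also be wrapped in a small function so that any pair of countries can be compared, and so that a misspelled or missing country name produces a clear error instead of a confusing one inside if(). The sketch below is an illustration only: higher_ratio_country is a hypothetical helper, and the two rows reuse the ratios printed in Task 8.

```r
# Ratios for the two selected countries, as printed in Task 8.
selected <- data.frame(
  country = c("India", "Germany"),
  confirmed.population.ratio = c(31.7, 4.5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the name of the country with the larger
# confirmed-to-population ratio, or "tie" if the two ratios are equal.
higher_ratio_country <- function(data, first, second) {
  a <- data$confirmed.population.ratio[data$country == first]
  b <- data$confirmed.population.ratio[data$country == second]
  if (length(a) != 1 || length(b) != 1) {
    stop("each country must match exactly one row")
  }
  if (a > b) first else if (b > a) second else "tie"
}

print(higher_ratio_country(selected, "Germany", "India"))
```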

Task 10: Find countries with a confirmed-cases-to-population ratio below a threshold

# Get a subset of any countries with `confirmed.population.ratio` less than the threshold
low_risk_countries <- covid_data[(covid_data$`confirmed.population.ratio` <1), ]
head(low_risk_countries)

## country date tested confirmed confirmed.tested.ratio

## 1 Afghanistan 17 Dec 2020 154767 49621 32.1
## 3 Algeria 2 Nov 2020 230553 58574 25.4
## 5 Angola 2 Feb 2021 399228 20981 5.3
## 6 Antigua and Barbuda 6 Mar 2021 15268 832 5.4
## 14 Bangladesh 24 Jul 2021 7417714 1151644 15.5
## 19 Benin 4 May 2021 595112 7884 1.3
## tested.population.ratio confirmed.population.ratio
## 1 0.40 0.130
## 3 0.53 0.130
## 5 1.30 0.067
## 6 15.90 0.860
## 14 4.50 0.700
## 19 5.10 0.067
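
One detail to watch with this kind of filter is missing values: a logical index that contains NA produces an all-NA row in the result. The sketch below, on a toy data frame invented for illustration, shows the difference between indexing with the comparison directly and wrapping it in which(), which keeps only the rows where the condition is TRUE.

```r
# Toy data (invented), including one missing ratio.
toy <- data.frame(
  country = c("A", "B", "C", "D"),
  confirmed.population.ratio = c(0.13, 3.4, NA, 0.86),
  stringsAsFactors = FALSE
)

threshold <- 1

# Direct logical indexing: the NA comparison yields an extra all-NA row.
bracket_subset <- toy[toy$confirmed.population.ratio < threshold, ]

# which() drops NA comparisons and keeps only the TRUE positions.
which_subset <- toy[which(toy$confirmed.population.ratio < threshold), ]

print(nrow(bracket_subset))
print(which_subset)
```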
