
A Comparison of Time Series Models To Predict COVID-19 Cases

This document compares different time series models to predict COVID-19 cases. It describes using linear regression, ARIMA, Bayesian ridge regression, support vector regression, and Holt's linear trend on COVID case data from the 10 most populated countries. The models were trained on 96% of the data and tested on the remaining 4%. Mean squared error was calculated for each country and model, and the results were visualized. Bayesian ridge regression had the lowest error overall, but performance varied between countries, showing predicting COVID cases is complex.

Uploaded by

Manoj Kumar
A Comparison of Time Series Models to Predict

COVID-19 Cases

Kenny Lau, Eddie Aung, Elaine Pranadjaya, Wittawat Chailiab


Devastating Effects of COVID-19
● As of November 29th, 2020 there are...
○ 63,066,168 cases worldwide
○ 1,465,048 deaths worldwide
● It has completely changed our daily lives and how we interact with people
● Is it possible to accurately predict the number of future cases using
statistical/machine learning models?
○ If so, countries could optimize their use of national resources, leading to a
quicker recovery for global health and the economy
Project Overview
● Using a COVID-19 dataset and different time series models, we will apply our
knowledge of supervised learning regressions to try to accurately predict new cases
● We will be using the following models:
○ Linear Regression
○ Autoregressive Integrated Moving Average (ARIMA)
○ Bayesian Ridge Regression
○ Support Vector Regressor
○ Holt’s Linear Trend
● Visualize and compare the different models
COVID-19 Data Attributes
● Lots of attributes with general information about country
○ Continent
○ Location
○ Population
○ Population Density
○ Median Age
○ % of Population Over 65
○ % of Population Over 70
○ GDP per capita
○ And many more attributes
● Each country also had an attribute with JSON formatted data of the number of cases
for the corresponding date
What We First Tried
● Maybe all the attributes for each country can be used in a ML model to predict the
average number of new cases
● If this is true, we can...
○ Clean up the data
○ Find the correlations between all the attributes and average new cases
○ Apply different ML models (KNN, Neural Network, Regression) to data
○ Compare results
What We Ended Up Doing
● We learned that there was no meaningful correlation between the average new cases
and the attributes in the dataset
● We can still use Time Series Forecasting to predict future new cases
○ Time series forecasting: The use of a model to predict future values based on
previously observed values
○ Picked the 5 most popular time series methods
● The goal was to find the model that provided the most accurate results for the 10
most populated countries
What We Ended Up Doing (continued)
● Trained on 96% of the data and tested on the remaining 4%
● Calculated the Mean Squared Error for each country with each model
● Visualized the data to compare actual and predicted values
● Ran a final comparison across all models to find the most accurate one
First Column (0) = 12-31-2019

Last Column (331) = 11-26-2020
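The split and error metric described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the placeholder series are assumptions, and the split is assumed to be chronological (earliest days for training, latest days for testing), which is the usual choice for time series.

```python
import math


def chronological_split(series, train_fraction=0.96):
    """Split a time-ordered list into (train, test) without shuffling."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the MSE, expressed in the same units as the data."""
    return math.sqrt(mean_squared_error(actual, predicted))


# Placeholder values only: one entry per day for columns 0..331.
daily_cases = list(range(332))
train, test = chronological_split(daily_cases)
```

With 332 daily columns, a 96% cut leaves roughly the last two weeks as the test window, so each model is judged on a short forecast horizon.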


Bayesian Ridge Regression
● Type of linear regression.
● Reflects the Bayesian framework: forming an initial
estimate and then improving the estimate as more data is
gathered.
● Bayesian regression is able to deal with insufficient or
poorly distributed data by formulating linear regression
using probability distributions rather than point estimates.
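To make "distributions rather than point estimates" concrete, here is a minimal NumPy sketch of Bayesian linear regression. It is not the authors' implementation: it assumes a zero-mean Gaussian prior on the weights with fixed precision `alpha` and Gaussian noise with fixed precision `beta`, whereas library implementations typically estimate these from the data. All names are hypothetical.

```python
import numpy as np


def design_matrix(days):
    """Build [1, day] rows so the model learns an intercept and a slope."""
    days = np.asarray(days, dtype=float)
    return np.column_stack([np.ones_like(days), days])


def weight_posterior(X, y, alpha=1.0, beta=25.0):
    """Return the posterior mean and covariance of the regression weights.

    Assumes prior w ~ N(0, I / alpha) and noise precision beta (both fixed).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    precision = alpha * np.eye(X.shape[1]) + beta * X.T @ X
    covariance = np.linalg.inv(precision)
    mean = beta * covariance @ X.T @ y
    return mean, covariance


# Synthetic data for illustration: cases = 2 + 3 * day.
days = np.arange(10)
cases = 2.0 + 3.0 * days

mean_few, cov_few = weight_posterior(design_matrix(days[:3]), cases[:3])
mean_all, cov_all = weight_posterior(design_matrix(days), cases)
```

The covariance shrinks as more days are observed, which is the "improving the estimate as more data is gathered" behaviour described in the bullets.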
Support Vector Regression
● Developed from Support Vector Machine (SVM) for regression analysis.
● Chooses a hyperplane (best-fit line) and draws boundary lines on either side of it; errors inside the boundaries are ignored.
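The boundary around the hyperplane corresponds to the epsilon-insensitive loss used in support vector regression: predictions within `epsilon` of the actual value cost nothing. The sketch below shows only that loss, not a full SVR solver, and the function name and default `epsilon` are assumptions for illustration.

```python
def epsilon_insensitive_losses(actual, predicted, epsilon=0.1):
    """Per-point SVR loss: zero inside the epsilon tube, linear outside it."""
    return [max(0.0, abs(a - p) - epsilon) for a, p in zip(actual, predicted)]


# The first prediction falls inside the tube; the other two fall outside it.
losses = epsilon_insensitive_losses([10.0, 10.0, 10.0], [10.05, 10.5, 9.0])
```

Only the points outside the tube contribute to the loss, so they are the ones that determine where the fitted line ends up.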
Linear Regression (Polynomial)
● Statistical method for predictive analysis.
● Shows the linear relationship between a dependent variable and one or more
independent variables.
● Polynomial terms are added for better accuracy; the model is still linear in its coefficients.
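A polynomial fit of cases against the day index can be sketched with NumPy's least-squares polynomial fitting. The degree, the synthetic data, and the variable names are assumptions; the slides do not state which degree the authors used.

```python
import numpy as np

# Synthetic data for illustration: cases = 1 + 2*day + 3*day^2.
days = np.arange(8, dtype=float)
cases = 1.0 + 2.0 * days + 3.0 * days ** 2

# Least-squares fit of a degree-2 polynomial; coefficients come highest power first.
coefficients = np.polyfit(days, cases, deg=2)

# Evaluate the fitted polynomial at a chosen day.
predicted_day_5 = np.polyval(coefficients, 5.0)
```

Higher degrees fit the training window more closely but can extrapolate badly, which matters when the test set is the most recent days.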
Holt’s Linear Trend
● AKA double exponential smoothing (Holt-Winters extends it with a seasonal component)
● Uses exponential smoothing to encode values from the past
○ Uses the past data to predict “typical” values for present data
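Holt's method keeps two smoothed quantities, a level and a trend, and extrapolates them forward. The following is a minimal sketch assuming fixed smoothing parameters `alpha` and `beta` and a simple initialization from the first two observations; the parameter values and function name are illustrative, not taken from the slides.

```python
def holt_linear_forecast(series, alpha=0.8, beta=0.2, horizon=1):
    """Forecast `horizon` steps ahead with Holt's linear trend method."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    level = series[0]
    trend = series[1] - series[0]
    for value in series[1:]:
        previous_level = level
        # Blend the new observation with the previous one-step-ahead forecast.
        level = alpha * value + (1 - alpha) * (previous_level + trend)
        # Blend the latest change in level with the previous trend estimate.
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + step * trend for step in range(1, horizon + 1)]


# A perfectly linear series should simply be continued.
forecast = holt_linear_forecast([10, 12, 14, 16], horizon=2)
```

Larger values of `alpha` and `beta` weight recent observations more heavily, so the forecast reacts faster to changes in case growth.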
Autoregressive Integrated Moving Average
● Combines two components, applied after differencing the series (the "Integrated" step)
○ Autoregression (AR)
○ Moving average (MA)
● AR regresses the series on its own lagged (prior) values, while MA models each
observation's dependence on residual errors from earlier time steps
● Effective in predicting value changes over intervals
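The differencing and autoregression steps can be illustrated with a deliberately simplified sketch: difference the series once, fit a single AR coefficient by least squares, forecast the next change, and add it back. This corresponds roughly to an ARIMA(1,1,0) model without an intercept; it omits the MA term entirely and is not the authors' model. All function names are hypothetical.

```python
def difference(series):
    """First-order differencing: the change between consecutive observations."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(diffs):
    """Least-squares AR(1) coefficient (no intercept) on the differenced series."""
    numerator = sum(cur * prev for prev, cur in zip(diffs, diffs[1:]))
    denominator = sum(prev * prev for prev in diffs[:-1])
    return numerator / denominator if denominator else 0.0


def forecast_next(series):
    """One-step forecast: predict the next change, then undo the differencing."""
    if len(series) < 3:
        raise ValueError("need at least three observations")
    diffs = difference(series)
    phi = fit_ar1(diffs)
    return series[-1] + phi * diffs[-1]


# Daily changes of 8, 4, 2, 1 halve each day, so the next change is about 0.5.
next_value = forecast_next([0, 8, 12, 14, 15])
```

A full ARIMA fit would also estimate the moving-average coefficients and choose the orders (p, d, q), typically with a statistics library rather than by hand.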
Results
RMSE
Obstacles We Ran Into
● It was hard to schedule times to collaborate virtually
● Had a hard time choosing the dataset and deciding what we wanted to predict
○ Wanted to find a relevant dataset that would be interesting to analyze
● Were not able to apply the machine learning models we learned in class
○ Considered switching datasets but we found COVID-19 data to be the most
interesting
What We Learned
● 4 different types of regression models and their behavior
● COVID-19 cases surprisingly have no correlation with many of the attributes for
each country (GDP, population density, etc.)
○ Showcases the importance of having strict public health guidelines and
regulations
● No single model predicts cases consistently and accurately across countries
○ Estimating the number of new cases per day is highly complex
● Some models work better on specific countries
○ For example, ARIMA doesn’t work at all on the Russia dataset
● Wear a mask
Thank you!
