Data Analytics Assignment 1

- The document is an assignment submission for a data analytics course containing an R code solution.
- The code collects COVID-19 case data from various countries and applies time series analysis and forecasting methods such as linear regression, growth rates, and SIR modeling.
- Visualizations include time series plots, maps, and comparisons of cases between countries. The code analyzes COVID-19 data from India and the US in detail.

Uploaded by

RADHIKA CHANDAK

Assignment – I

BFT - 6
(Deepening Specialisation 2: Apparel Production Management)

Name of the Subject : Data Analytics & R
Name : Radhika Chandak
Subject Code : BFT603DS2
Roll No. : BFT/19/21
Subject Id : 15250
Date of Submission : 27.04.2022

Assignment:
Using data collection methods and applying principles of statistics, carry out the following:
● Identify problems faced in industrial engineering and collect appropriate data.
● Use any one of the following methods in Principle of Forecasting:
  - Time Series
Solution:

We start by installing covid19.analytics, an existing R package for working with COVID-19 case data.

To begin, we load the aggregated data and the time series of confirmed cases, then print a summary report that covers both confirmed and death cases.
The code is as follows:

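# install.packages('covid19.analytics')   # run once if the package is missing
library(covid19.analytics)

#aggregated data
ag <- covid19.data(case = 'aggregated')

#time series of confirmed cases
tsc <- covid19.data(case = 'ts-confirmed')

#summary (set graphical.output = TRUE to also draw the pie and bar charts)
report.summary(Nentries = 10, graphical.output = FALSE)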

- The graphs and charts appear on the right side under Plots. On zooming in, we observe:
● The dates range from January 2020 to April 2022, and the report covers the top 10 countries.
● The pie chart and bar graph show the countries by confirmed cases and death cases, respectively.
● Among these, the US has the highest number of confirmed cases and Turkey has the lowest.
● For death cases, the US is again the highest and France is the lowest.

TIME SERIES - CONFIRMED CASES


TIME SERIES - DEATH CASES

**** Time Series Worldwide TOTS ****
ts-confirmed    ts-deaths    ts-recovered
511748975       6228621      0
                1.22%        0%
**** Time Series Worldwide AVGS ****
ts-confirmed    ts-deaths    ts-recovered
1801933.01      21931.76     0
                1.22%        0%
**** Time Series Worldwide SDS ****
ts-confirmed    ts-deaths    ts-recovered
6617130.29      86526.01     0
                1.31%        0%
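
The percentages printed under ts-deaths appear to be deaths expressed as a share of confirmed cases. That reading is an assumption, since the report output does not label them. A quick check in base R, using the worldwide totals above (the variable names are made up for this example):

```r
# Worldwide totals copied from the report output above
confirmed <- 511748975
deaths    <- 6228621

# Assumption: the printed percentage is deaths relative to confirmed cases
death_pct <- 100 * deaths / confirmed
print(round(death_pct, 2))
```

This gives 1.22, which agrees with the value shown in the TOTS block.
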
- Next, we compute the totals per location for our own country, India, and for the country with the most cases, i.e., the US.

#total per location
tots.per.location(tsc, geo.loc = c('us', 'india'))

Running this fits a linear regression model to the totals for each location.

● The top plot shows the number of cases on a log scale, with the number of days on the x-axis. Each line in the plot represents a fitted linear regression model. The values are cumulative, and the curve is concave: a steep increasing trend at first, followed by a smaller concave stretch where the rate of increase falls.
● The bottom plot is a bar chart, again with a log scale on the y-axis. We get the same pair of plots for the US.

LINEAR REGRESSION MODEL - India and US
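
To illustrate why a linear regression is paired with a log scale, here is a minimal base R sketch. It uses synthetic counts rather than the downloaded data, and the variable names are hypothetical. It is not meant to reproduce what tots.per.location does internally. It only shows that exponential growth becomes a straight line once the counts are log-transformed.

```r
# Synthetic cumulative case counts (illustrative only, not real data):
# start at 100 cases and grow by 10% per day for 30 days
days  <- 0:29
cases <- 100 * 1.10^days

# Linear regression on the log scale: log(cases) = a + b * days
fit <- lm(log(cases) ~ days)

# exp(b) - 1 recovers the daily growth rate used to build the data
daily_growth <- unname(exp(coef(fit)["days"]) - 1)
print(round(daily_growth, 3))
```

When the real curve bends below the fitted line on the log scale, as in the concave pattern described above, the growth is slower than exponential.
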


- To see the growth rate for a specific country (India here), we can run:

#growth rate
growth.rate(tsc, geo.loc = 'india')

We get two plots. The top plot has two y-axes, one on a regular scale and the other on a log scale. From it we can observe that cases were rising more rapidly during the second lockdown than before the first lockdown.
The bottom plot shows the growth rate on a log scale.
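
As a rough sketch of what a growth rate measures, the base R example below starts from a short cumulative series. The numbers are hypothetical. The definition used here, the ratio between the daily changes of two consecutive days, is an assumption for illustration and may differ in detail from what growth.rate computes.

```r
# Hypothetical cumulative confirmed counts for six consecutive days
cumulative <- c(100, 120, 150, 195, 240, 276)

# Daily changes: new cases reported on each day
daily_change <- diff(cumulative)              # 20 30 45 45 36

# Growth rate (assumed definition): today's change / yesterday's change
growth_rate <- daily_change[-1] / head(daily_change, -1)
print(growth_rate)                            # 1.5 1.5 1.0 0.8
```

Under this definition, values above 1 mean the daily increase is accelerating, and values below 1 mean it is slowing down.
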
- Now let us extract one more time series dataset, covering all the case types, and save it into a data frame named tsa.

tsa <- covid19.data(case = 'ts-ALL')

Then, using

#totals plot
totals.plt(tsa)

we can create an interactive plot of the time series cases.


In the linear and log graphs, we can see that there are around 511.79 million confirmed cases, 505.52 million active cases, and so on.
- To see the COVID-19 cases across the globe, we can use the live.map function with the data frame tsa.

#live map
live.map(tsa)

By opening the Viewer and moving to a particular country, we can see its number of cases.
- One model that is popular among researchers working on COVID-19 data is the SIR model. It groups people into three categories:

● S - people who are healthy but susceptible to the disease
● I - people who are infected
● R - people who have recovered
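
As a rough illustration of how the three groups interact, the sketch below steps the standard SIR equations forward one day at a time in base R. The population size, the rates beta and gamma, and all variable names are hypothetical values chosen for illustration. It is not meant to reproduce what covid19.analytics does when it fits the model to data.

```r
# Hypothetical parameters, chosen only for illustration
N      <- 1e6     # total population
beta   <- 0.30    # transmission rate per day
gamma  <- 0.10    # recovery rate per day
n_days <- 200

susceptible <- numeric(n_days + 1)
infected    <- numeric(n_days + 1)
recovered   <- numeric(n_days + 1)
susceptible[1] <- N - 1   # everyone but one person starts healthy
infected[1]    <- 1
recovered[1]   <- 0

# Euler steps of one day for the SIR equations
#   dS/dt = -beta*S*I/N,  dI/dt = beta*S*I/N - gamma*I,  dR/dt = gamma*I
for (d in seq_len(n_days)) {
  new_infections <- beta * susceptible[d] * infected[d] / N
  new_recoveries <- gamma * infected[d]
  susceptible[d + 1] <- susceptible[d] - new_infections
  infected[d + 1]    <- infected[d] + new_infections - new_recoveries
  recovered[d + 1]   <- recovered[d] + new_recoveries
}

# Day (counted from 0) on which the number of infected people peaks
peak_day <- which.max(infected) - 1
print(peak_day)
```

The infected curve rises, peaks, and falls, while the recovered curve keeps growing until it levels off. The three groups always sum to the total population.
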

To fit the model to the data, we use the function generate.SIR.model:

#sir model
generate.SIR.model(tsc, 'india', tot.population = 1383000000)

At the top we have two plots:

● On the left, the y-axis shows the number of infected people on a regular scale and the x-axis shows the number of days, for the first 25 days.
● On the right, the y-axis shows the number of infected people on a log scale and the x-axis again shows the number of days, for the first 25 days.
● At the bottom, the number of subjects is shown on a log scale. The three lines are the three groups of the model: blue shows susceptible people, red shows infected people, and green shows recovered people.
● We can observe that between day 0 and roughly day 90, the number of infected people rises to its peak, and the number of recovered people also reaches its peak.
This is a screenshot of the code.
