Capstone Presentation
Time Series
Andy Peng
1
Overview
● Images
● Modeling
We will show images from our data exploration and discuss the results of our
modeling. Before we do that, we will first discuss the air quality index PM2.5.
2
What is PM2.5?
⚫ Particles with diameter less than 2.5 micrometers
⚫ Examples
⚫ Burning Fuel
⚫ Chemical reactions
What is the air quality index PM2.5? PM2.5 refers to particles with a diameter of
less than 2.5 micrometers, which is more than 10 times thinner than a human hair.
These particles form as a result of burning fuel and of chemical reactions that
take place in the atmosphere. But what PM2.5 level is considered normal, and what
level is considered unhealthy?
3
PM2.5 Cutoffs
4
PM2.5 Cutoffs
These are the four levels that we care about, because this is where PM2.5 becomes
harmful to us. The recommendations range from individuals with certain respiratory
or heart diseases limiting prolonged exertion, to everyone avoiding outdoor
exertion, with certain people remaining indoors for safety reasons. But how do
Gucheng’s PM2.5 values compare?
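To make the cutoffs concrete, here is a minimal sketch that maps a PM2.5 reading to a category name. The breakpoints are an assumption: they follow the US EPA table for 24-hour PM2.5 as published in 2012, and the table on our slide may use different values. The names `LOWER_BOUNDS`, `CATEGORIES`, and `pm25_category` are made up for this illustration.

```python
from bisect import bisect_right

# ASSUMPTION: lower bound of each category in micrograms per cubic meter
# (24-hour average), taken from the 2012 US EPA table, not from the slide.
LOWER_BOUNDS = [0.0, 12.1, 35.5, 55.5, 150.5, 250.5]
CATEGORIES = [
    "Good",
    "Moderate",
    "Unhealthy for Sensitive Groups",
    "Unhealthy",
    "Very Unhealthy",
    "Hazardous",
]


def pm25_category(value):
    """Return the assumed category name for a PM2.5 concentration."""
    if value < 0:
        raise ValueError("PM2.5 concentration cannot be negative")
    # bisect_right counts how many lower bounds are <= value.
    return CATEGORIES[bisect_right(LOWER_BOUNDS, value) - 1]
```

The last four entries of `CATEGORIES` correspond to the four levels discussed above.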
5
Gucheng’s PM2.5 Values
This is a graph of the PM2.5 values measured at Gucheng’s air quality monitoring site.
The data ranges from March 1, 2013 to February 28, 2017. As you can see in this
picture, the majority of days in each year have values between the unhealthy for
sensitive groups level and the very unhealthy level. PM2.5 values increase when
fuel is burned or when chemical reactions happen in the atmosphere. What do you
think would cause this in Gucheng?
6
Chinese New Year
It would be Chinese New Year. During Chinese New Year, people drive their cars to
visit family and light fireworks to celebrate the holiday.
7
Do holidays affect PM2.5 levels?
Do holidays affect PM2.5 levels? We graphed the daily data together with the
holiday date ranges to see whether there is a relationship between public holidays
in China and PM2.5 levels.
8
Do holidays affect PM2.5 levels?
Based on this picture, we can see a slight increase during and after the New Year
holiday. For all the other public holidays, we can’t really tell whether there is
any increase or decrease in PM2.5 levels. Now let’s move on to the modeling part
of our presentation.
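One way to back up the visual comparison with a number is to compare average PM2.5 on holiday days with the average on all other days. This is only a sketch of that idea, not the analysis we actually ran. The function name, the sample readings, and the holiday window are all made up for illustration.

```python
from datetime import date
from statistics import mean


def holiday_vs_other(daily, holiday_ranges):
    """Return (mean on holiday days, mean on other days).

    daily: dict mapping date -> PM2.5 value
    holiday_ranges: list of (start, end) date pairs, both ends inclusive
    """
    def is_holiday(day):
        return any(start <= day <= end for start, end in holiday_ranges)

    holiday = [v for d, v in daily.items() if is_holiday(d)]
    other = [v for d, v in daily.items() if not is_holiday(d)]
    # statistics.mean raises StatisticsError if either group is empty.
    return mean(holiday), mean(other)


# Made-up readings, NOT values from the Gucheng data.
daily = {
    date(2016, 2, 5): 60.0,
    date(2016, 2, 6): 70.0,
    date(2016, 2, 7): 150.0,
    date(2016, 2, 8): 170.0,
    date(2016, 2, 9): 80.0,
}
# Hypothetical holiday window.
ranges = [(date(2016, 2, 7), date(2016, 2, 8))]
holiday_mean, other_mean = holiday_vs_other(daily, ranges)
```

A gap between the two means would still need a proper significance test before drawing conclusions.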
9
Modeling
● Naive Forecasting
● Linear Regression
● SARIMA Model
For modeling, we used naive forecasting, linear regression, and a SARIMA model.
Here are the results.
10
Naive Forecasting
Using naive forecasting, we were able to predict the PM2.5 values to within 51.65.
This graph shows how our predictions compare with the actual data.
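For reference, here is a minimal sketch of naive forecasting, assuming the one-step version where each prediction is simply the previous observation. The slides do not name the error metric behind the 51.65 figure; root mean squared error (RMSE) is used here purely as an assumption. The sample values are made up.

```python
from math import sqrt


def naive_forecast(series):
    """Predict each value as the observation before it.

    The first observation has no forecast, so the result is one item shorter.
    """
    return series[:-1]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted):
        raise ValueError("sequences must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared) / len(squared))


# Made-up daily readings, NOT values from the Gucheng data.
values = [80.0, 95.0, 60.0, 120.0, 110.0]
preds = naive_forecast(values)
# Compare forecasts against the observations they were meant to predict.
error = rmse(values[1:], preds)
```

Because the forecast needs no fitting at all, it makes a useful baseline for the other two models.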
11
Linear Regression
Using linear regression, we were able to predict the PM2.5 values to within 113.71.
The graph here shows how well our linear regression predictions match the actual
data. As you can see, the linear regression prediction is a horizontal line, so it
performs worse than naive forecasting.
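The horizontal line is what a straight-line fit tends to produce when the data swings up and down without an overall trend. Here is a small sketch of that effect, assuming the only predictor is the time index (the slides do not state which features were used). The function name and sample values are made up.

```python
def fit_line(ys):
    """Ordinary least squares fit of y = intercept + slope * t, for t = 0, 1, 2, ...

    Requires at least two observations.
    """
    n = len(ys)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    return slope, intercept


# Made-up readings that oscillate with no overall trend.
values = [50.0, 150.0, 50.0, 150.0, 50.0]
slope, intercept = fit_line(values)
```

With these values the fitted slope is zero and the intercept equals the mean, so every prediction is the same number and none of the swings are captured.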
12
SARIMA Model
Using the SARIMA model, we were able to predict the PM2.5 values to within 56.67.
This graph shows how our predictions compare with the actual data. As you can see,
this model is better than the linear regression model, but it performs slightly
worse than naive forecasting.
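SARIMA adds seasonal terms to an ARIMA model, and one of those terms is seasonal differencing: subtracting the value from one season earlier. The sketch below shows only that single step, not a full SARIMA fit, and the period of 3 is a toy value chosen for the example. The slides do not state the seasonal period or model orders we used.

```python
def seasonal_difference(series, period):
    """Return series[t] - series[t - period] for every t >= period."""
    if period <= 0 or period >= len(series):
        raise ValueError("period must be between 1 and len(series) - 1")
    return [series[t] - series[t - period] for t in range(period, len(series))]


# Made-up series with a repeating pattern of length 3 plus a slow rise.
values = [10.0, 20.0, 30.0, 12.0, 22.0, 32.0, 14.0, 24.0, 34.0]
diffed = seasonal_difference(values, period=3)
```

After differencing, the repeating pattern is gone and only the constant rise per season remains, which is easier for the remaining ARIMA terms to model.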
13
Recommendations
● Minimize fireworks on New Year's Day
● Naive Forecasting
To summarize, we found that there seems to be an increase in PM2.5 levels during
and after New Year's Day. We could minimize the number of fireworks lit on New
Year's Day so that PM2.5 levels do not increase as much. For modeling, I would use
naive forecasting because it yielded the best result in predicting PM2.5 levels.
14
Next Steps
● Holidays
For our next steps, we can investigate holiday effects on PM2.5 further. I only
plotted public holidays in China on the graph we saw earlier, but there might be
holidays specific to Gucheng that we do not know of.
15
Next Steps
● Holidays
● Day/Night
We could also investigate whether PM2.5 values differ between daytime and
nighttime. Is more fuel burned by cars during the day, when people are at work, or
by night workers at night?
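A first pass at the day/night question could be as simple as the sketch below. It assumes hourly readings with timestamps are available, and it treats 06:00 to 17:59 as daytime, which is an arbitrary choice for illustration. The function name and sample readings are made up.

```python
from datetime import datetime
from statistics import mean


def day_night_means(readings, day_start=6, day_end=18):
    """Return (daytime mean, nighttime mean).

    readings: list of (datetime, PM2.5 value) pairs
    Hours in [day_start, day_end) count as daytime; all others as nighttime.
    """
    day = [v for ts, v in readings if day_start <= ts.hour < day_end]
    night = [v for ts, v in readings if not day_start <= ts.hour < day_end]
    # statistics.mean raises StatisticsError if either group is empty.
    return mean(day), mean(night)


# Made-up readings, NOT values from the Gucheng data.
readings = [
    (datetime(2016, 1, 1, 3), 90.0),
    (datetime(2016, 1, 1, 9), 70.0),
    (datetime(2016, 1, 1, 15), 50.0),
    (datetime(2016, 1, 1, 22), 110.0),
]
day_mean, night_mean = day_night_means(readings)
```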
16
Next Steps
● Holidays
● Day/Night
● Spikes in our data
Besides exploring day and night PM2.5 values, we could also explore why there are
certain huge spikes in our data. Could an event near the air quality monitoring
site have caused these spikes?
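Before looking for causes, we would need a list of the spike dates. One simple way to get it is to flag readings that sit far above the average, as sketched below. The threshold rule (mean plus `k` standard deviations) and the value of `k` are assumptions for illustration, and the sample values are made up.

```python
from statistics import mean, pstdev


def find_spikes(series, k=3.0):
    """Return the indices of values greater than mean + k * standard deviation."""
    threshold = mean(series) + k * pstdev(series)
    return [i for i, v in enumerate(series) if v > threshold]


# Made-up readings with one obvious spike at the end.
values = [50.0] * 9 + [500.0]
spike_indices = find_spikes(values, k=2.0)
```

Note that a large spike inflates the standard deviation itself, so a rule based on the median may be more robust on real data.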
17
Next Steps
● Holidays
● Day/Night
● Spikes in our data
● Cross Validation
Our models can also be improved by performing cross-validation on the data set.
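With time series, cross-validation has to respect time order, so the training data always comes before the test data. Below is a minimal sketch of one common scheme, an expanding window, where each fold trains on everything before its test block. The function name and parameters are made up for illustration, and this is not something we have run on our data.

```python
def expanding_window_splits(n_samples, n_splits, test_size):
    """Yield (train_indices, test_indices) pairs in time order.

    Each fold trains on all observations before its test block, so no
    future information leaks into training.
    """
    first_test_start = n_samples - n_splits * test_size
    if first_test_start <= 0:
        raise ValueError("not enough samples for the requested splits")
    for i in range(n_splits):
        start = first_test_start + i * test_size
        yield list(range(start)), list(range(start, start + test_size))


splits = list(expanding_window_splits(n_samples=10, n_splits=3, test_size=2))
```

Averaging a model's error over the folds gives a steadier estimate than a single train/test split.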
18
Next Steps
● Holidays
● Day/Night
● Spikes in our data
● Cross Validation
● Neural Networks
We could also try other time series modeling techniques that we didn’t mention in
this presentation, such as building a neural network to predict PM2.5 values.
19
Next Steps
● Holidays
● Day/Night
● Spikes in our data
● Cross Validation
● Neural Networks
● Gather More Data
Lastly, our models can be further improved by gathering more data. Our data only
goes up to February 28, 2017, so we are missing everything from March 1, 2017 to
the present day. However, all of these next steps require more money and time,
both to collect more data and to train our models.
20
Thank You
21