Individual Assignment: Technology Park Malaysia

This document provides instructions for an individual assignment on programming for data analysis. Students are asked to submit their assignment by March 3rd, 2022, which is worth 75% of the grade. The assignment must be properly bound and include both a soft and hard copy. Late submissions will not be accepted without an approved extenuating circumstances form. Plagiarism will be penalized. The goal is to analyze hourly weather data from two New York airports to compile useful information and insights through various data visualization and manipulation techniques in R. Fifteen analyses are described exploring relationships between variables like temperature, wind speed, pressure, and humidity over time.


INDIVIDUAL ASSIGNMENT

TECHNOLOGY PARK MALAYSIA


CT127-3-2-PFDA

PROGRAMMING FOR DATA ANALYSIS


NP2F2009IT

HAND OUT DATE: 24 OCTOBER 2021

HAND IN DATE: 03 MARCH 2022

WEIGHTAGE: 75%

INSTRUCTIONS TO CANDIDATES:

1 Submit your assignment at the administrative counter.

2 Students are advised to underpin their answers with the use of references (cited
using the Harvard Name System of Referencing).

3 Late submission will be awarded zero (0) unless Extenuating Circumstances
(EC) are upheld.

4 Cases of plagiarism will be penalized.

5 The assignment should be bound in an appropriate style (comb bound or
stapled).

6 Where the assignment should be submitted in both hardcopy and softcopy, the
softcopy of the written assignment and source code (where appropriate) should
be on a CD in an envelope / CD cover and attached to the hardcopy.

7 You must obtain 50% overall to pass this module.


Acknowledgement
This project has been prepared in partial fulfilment of the requirements for the BSc.
IT IV Semester at Asia Pacific University (APU). We would like to express our
deepest appreciation to all those who made it possible for us to complete this report.
Special gratitude goes to the college, which provided our second-year project, and to our
classmates, whose stimulating suggestions and encouragement helped us
to coordinate our project, especially in writing this report.
Furthermore, we would also like to acknowledge with much appreciation Mr. R N
Thakur, who gave his precious time for the completion of the project.

Sincerely,

Harishchandra Yadav (NP000394)

IV Semester, B. Sc.IT
Executive summary
In the given scenario, we plotted the data in order to turn it into meaningful
information. Different plots were produced by applying data manipulation, data
visualization, and data exploration to the given data. The data were read from a file in
CSV format. We worked with the 15 columns of the data set to present them in a
meaningful format.
Table of Contents
1. Introduction
2. Aims
3. Objectives
4. Assumptions
5. Analysis Descriptions
5.1 Analysis 1
5.2 Analysis 2
5.3 Analysis 3
5.4 Analysis 4
5.5 Analysis 5
5.6 Analysis 6
5.7 Analysis 7
5.8 Analysis 8
5.9 Analysis 9
5.10 Analysis 10
5.11 Analysis 11
5.12 Analysis 12
5.13 Analysis 13
5.14 Analysis 14
5.15 Extra Feature 1
6. Conclusion
Table of Figures
Figure 1: Install Package
Figure 2: Source code of Temperature
Figure 3: Histogram Of Temperature
Figure 4: Source code of Visible
Figure 5: Histogram of visible
Figure 6: source code of precipitation against Humid
Figure 7: Scatter plot of Precipitate against Humid
Figure 8: source code of wind speed
Figure 9: Histogram of wind speed
Figure 10: source code of precipitate
Figure 11: Histogram of Precipitate
Figure 12: source code of Dew Point against Temperature
Figure 13: Scatter plot of Dew Point against Temperature
Figure 14: source code of wind Gust against Wind Speed
Figure 15: Scatter Plot of Wind Gust against Wind Speed
Figure 16: source code of Pressure against Temperature
Figure 17: Scatter Plot of Pressure against Temperature
Figure 18: source code of Dew Point against Humid
Figure 19: Scatter plot of Dew point against humid
Figure 20: source code of Pressure against month
Figure 21: Boxplot of Pressure against Month
Figure 22: source code of Humid against visible
Figure 23: Boxplot of Humid against visible
Figure 24: Source code of Temperature against Month
Figure 25: Boxplot of Temperature against month
Figure 26: source code of Wind Direction
Figure 27: Polar Bar Plot of Wind Direction
Figure 28: source code of Matrix of Data Set
Figure 29: Scatter Plot of Matrix of Data Set
Figure 30: Source code of Weight density graph of wind speed
Figure 31: Weight density graph of Wind Speed (Mph)
1. Introduction
For our assignment, "A DATA ANALYSIS PROJECT USING HOURLY WEATHER DATA," I have
developed ways to study hourly weather data sets in order to compile the information
needed to make a decision. This assignment's dataset contains hourly meteorological data
for both LaGuardia Airport (LGA) and John F. Kennedy International Airport (JFK) in the
United States. In total, there are 15 columns and 17,412 rows. Below, I review the hourly
weather data and analyze it using multiple methods so that it yields the key information
that aids decision-making. I loaded the data and then pre-processed it using the necessary
commands to convert it to a suitable format. For this project, I used data visualization,
exploration, and manipulation techniques. I have explained and justified the approaches
that were adopted, and I used appropriate graphics to illustrate, analyze, and justify the
results. The sections below use R programming techniques to present the graphs and
code.

2. Aims
The primary aim of the project is to examine the collection of hourly weather data and
analyze it using a variety of approaches in order to compile the information needed to
make a decision.

3. Objectives
• To analyze the data, produce visualizations, and suggest a result.
• To analyze the facts in order to make a decision about the weather conditions.

4. Assumptions
I have used R programming throughout the assignment to display and interpret a
collection of hourly weather conditions using different methods and many data fields.
I used packages such as ggplot2 to help plot different types of charts for different
periods of the year.

5. Analysis Descriptions

Figure 1: Install Package


To run the program, I first installed the ggplot2 library, and then installed the magrittr
and dplyr libraries so that the program runs smoothly.
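The package setup described above can be sketched as follows. This is a minimal illustration rather than the code from Figure 1: it checks which of the three named packages are already installed and installs the missing ones only in an interactive session. The variable names `required`, `is_installed`, and `to_install` are introduced here for illustration.

```r
# Packages named in the text above.
required <- c("ggplot2", "magrittr", "dplyr")

# requireNamespace() returns TRUE when a package is installed and loadable.
is_installed <- vapply(required, requireNamespace, logical(1), quietly = TRUE)
to_install <- required[!is_installed]

# Install only what is missing, and only in an interactive session,
# so that a batch run never triggers an unexpected download.
if (length(to_install) > 0 && interactive()) {
  install.packages(to_install)
}

# Named logical vector: one TRUE/FALSE entry per package.
print(is_installed)
```

Once the packages are available, each one is attached with `library()`, for example `library(ggplot2)`.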

5.1 Analysis 1

Figure 2: Source code of Temperature


This program uses data visualization and manipulation. A histogram is used to show how
the values in the data set are distributed. I used the geom_histogram() function to
create a histogram of the data I was provided. The colour and outline of the bars in the
chart can be changed through the fill and colour settings. In the accompanying code,
binwidth=2 means that each histogram bar covers a range of 2 units on the x-axis (here,
2 degrees Fahrenheit); it is not a physical width.

Figure 3: Histogram Of Temperature
The x-axis shows the temperature in degrees Fahrenheit, and the y-axis shows the count
of observations. As this histogram shows, the temperature varies from day to day.
From the above histogram of temperature, we can see that the maximum
temperature is.

5.2 Analysis 2

Figure 4: Source code of Visible


The code above produces the histogram shown below. The ggplot2 package is used to
create the graphical representation. To draw the data as histogram bars, the
geom_histogram() function is used, which combines geom_bar() with stat_bin(). The fill
property changes the colour of the bars in the graph; here, the fill parameter in
geom_histogram() is set to brown. The x-axis shows visibility in miles and the y-axis
shows the count.

Output

Figure 5: Histogram of visible


In the histogram of visibility, we can see how often certain distances are observed
throughout the year. The highest frequency in the histogram occurs at ten miles, which
is therefore the most common visibility. The histogram is also left-skewed, with a long
tail towards low values. As a result, the chances of a cloudy day or a sky with poor
visibility are relatively low, which indicates that the weather is clear most of the time.

5.3 Analysis 3

Code

Figure 6: source code of precipitation against Humid


The diagram below shows precipitation against humidity. In the code, the x-axis
displays humidity, while the y-axis shows precipitation in inches. The geom_point()
function draws each observation as a point in the scatter plot. The alpha parameter in
geom_point() sets the opacity of the points.
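The scatter plot described above can be sketched as follows. This is an illustration under assumptions, not the code from Figure 6: the data frame is a small hypothetical sample, the column names `origin`, `humid`, and `precip` are assumed, and the brown and red colours follow the description of the output figure.

```r
library(ggplot2)

# Hypothetical sample; column names are assumptions.
weather <- data.frame(
  origin = c("LGA", "LGA", "LGA", "JFK", "JFK", "JFK"),
  humid  = c(45, 70, 95, 50, 80, 98),
  precip = c(0, 0, 0.12, 0, 0.02, 0.20)
)

# alpha = 0.5 makes points half transparent, so overlapping
# observations show up as darker areas.
p_scatter <- ggplot(weather, aes(x = humid, y = precip, colour = origin)) +
  geom_point(alpha = 0.5) +
  scale_colour_manual(values = c(LGA = "brown", JFK = "red")) +
  labs(x = "Humidity", y = "Precipitation (inches)", colour = "Airport")

print(p_scatter)
```

With many thousands of hourly observations, a low alpha value is what keeps dense regions readable.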
Output

Figure 7: Scatter plot of Precipitate against Humid

In the above scatter plot of precipitation against humidity, humidity is shown on the
x-axis and the corresponding amount of precipitation on the y-axis. Observations from
the LGA weather station are shown in brown, while observations collected from JFK are
highlighted in red. Precipitation is slightly higher when humidity is high, meaning that
there is a weak positive relationship between humidity and precipitation. Humidity may
also be driven by other factors.

5.4 Analysis 4

Figure 8: source code of wind speed


In the example above, I used a histogram to show the number of data points within each
range of values. For the provided data, I used the geom_histogram() function to create
the histogram. In the graph, the bar colour and outline are changed by the fill and
colour settings. Similarly, binwidth=2 in the code above indicates that each histogram
bar covers a range of 2 units on the x-axis (here, 2 mph).

Figure 9: Histogram of wind speed


The x-axis in the above histogram shows wind speed in mph, and the y-axis shows
frequency. As seen in the graph, there is a peak at about 10 mph, which is the most
frequently observed wind speed (the mode).
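The tallest bar of a histogram marks the most frequent bin, which is not necessarily the same as the mean. A small sketch that computes both is shown here; the wind speeds are hypothetical values chosen for illustration, not taken from the assignment data.

```r
# Hypothetical wind speeds in mph.
wind_speed <- c(4.6, 5.8, 8.1, 9.2, 10.4, 10.4, 10.4, 11.5, 12.7, 16.1, 23.0)

# Group the values into 2 mph bins, mirroring binwidth = 2 in the histogram.
bins <- cut(wind_speed, breaks = seq(0, 24, by = 2), include.lowest = TRUE)
counts <- table(bins)

# The bin with the highest count is the peak of the histogram.
peak_bin <- names(counts)[which.max(counts)]

# The arithmetic mean is computed separately from the raw values.
mean_speed <- mean(wind_speed)

print(peak_bin)
print(mean_speed)
```

For a right-skewed variable such as wind speed, a few high values pull the mean above the peak, so the two numbers are best reported separately.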

5.5 Analysis 5

Figure 10: source code of precipitate


I used a histogram to show the number of data points within each range of values, just
as in the previous case. The histogram was produced from the input data using the
geom_histogram() function. The fill property changes the colour of the bars in the
graph; here, the fill parameter in geom_histogram() is set to brown. The x-axis shows
precipitation in inches and the y-axis shows the count.

Output

Figure 11: Histogram of Precipitate

The histogram shows how frequently each precipitation value occurs throughout the
year. The most frequent value is 0.0 inches of precipitation, so it does not rain the
majority of the time. Additionally, the histogram is skewed to the right, which means
that heavy rainfall is uncommon.

5.6 Analysis 6

Figure 12: source code of Dew Point against Temperature


In the graphic below, the dew point is plotted against temperature. The x-axis shows
temperature, while the y-axis shows dew point in degrees Fahrenheit. The geom_point()
function draws each observation as a point in the scatter plot.
Output

Figure 13: Scatter plot of Dew Point against Temperature

The scatter plot above shows the relationship between temperature and dew point,
with temperature on the x-axis and dew point on the y-axis. In the chart, the
observations obtained from LGA are brown dots, while those obtained from JFK are red
dots. The regression line has a positive slope. As a result, whenever the
temperature rises, the dew point rises with it.
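A regression line like the one described can be added to a scatter plot with geom_smooth(). The sketch below is an assumption about how such a line might be drawn, since the source code is only available as a figure; the data values and the column names `origin`, `temp`, and `dewp` are hypothetical.

```r
library(ggplot2)

# Hypothetical sample; column names are assumptions.
weather <- data.frame(
  origin = rep(c("LGA", "JFK"), each = 4),
  temp   = c(30, 45, 60, 75, 35, 50, 65, 80),
  dewp   = c(18, 30, 47, 61, 22, 37, 50, 66)
)

# Points coloured by airport, plus one least-squares line over all points.
p_dew <- ggplot(weather, aes(x = temp, y = dewp)) +
  geom_point(aes(colour = origin), alpha = 0.5) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, colour = "black") +
  scale_colour_manual(values = c(LGA = "brown", JFK = "red")) +
  labs(x = "Temperature (degrees F)", y = "Dew point (degrees F)")

# The same slope, obtained numerically from a linear model.
slope <- coef(lm(dewp ~ temp, data = weather))[["temp"]]

print(p_dew)
print(slope)
```

A positive `slope` confirms numerically what the upward-sloping line shows visually.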

5.7 Analysis 7

Figure 14: source code of wind Gust against Wind Speed


The diagram below shows wind gust against wind speed. The x-axis in the graph
represents wind speed, while the y-axis represents wind gust, both measured in miles
per hour. The geom_point() function draws each observation as a point in the scatter
plot. The alpha option in geom_point() sets the opacity of the points.

Output

Figure 15: Scatter Plot of Wind Gust against Wind Speed


The scatter plot above shows the relationship between wind gust and wind speed, with
wind speed on the x-axis and wind gust on the y-axis. The regression line has a
positive slope. As a result, whenever the wind speed rises, the wind gust rises
with it.

5.8 Analysis 8

Figure 16: source code of Pressure against Temperature


In the graphic below, pressure is plotted against temperature. In the graph, the x-axis
shows the temperature, while the y-axis shows the pressure in millibars. The
geom_point() function draws each observation as a point in the scatter plot. The alpha
parameter in geom_point() sets the opacity of the points.
Output

Figure 17: Scatter Plot of Pressure against Temperature


Pressure and temperature are shown as a scatter plot on the graph. Temperature is
mapped along the x-axis and pressure along the y-axis. In the chart, the
observations obtained from LGA are brown dots, while those obtained from JFK are red
dots. With a slight negative slope, the regression line runs nearly parallel to the
temperature axis. As a result, pressure and temperature do not seem to be related.
Changes in one do not have a significant impact on the other.

5.9 Analysis 9

Figure 18: source code of Dew Point against Humid


In the graphic below, dew point is plotted against humidity. The x-axis shows humidity,
while the y-axis shows dew point in degrees Fahrenheit. The geom_point() function
draws each observation as a point in the scatter plot. The alpha parameter in
geom_point() sets the opacity of the points.

Output

Figure 19: Scatter plot of Dew point against humid

The relationship between humidity and dew point can be seen above, with humidity on
the x-axis and dew point on the y-axis. Observations from the LGA weather station are
shown in brown, while observations collected from JFK are highlighted in red. The
regression line has a positive slope. As a result, whenever the humidity rises, the dew
point rises with it.

5.10 Analysis 10

Code

Figure 20: source code of Pressure against month


Pressure is plotted against month in the following chart. The x-axis shows the
month, while the y-axis shows pressure in millibars. The geom_boxplot() function
is used to display month and pressure along the x and y axes in the box plot.
Missing data entries are removed silently, without a warning, when the na.rm option is
set to TRUE.
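A minimal sketch of a monthly box plot with missing values is shown here. It is an illustration under assumptions rather than the code from Figure 20: the data frame is hypothetical and the column names `month` and `pressure` are assumed.

```r
library(ggplot2)

# Hypothetical sample with one missing pressure reading.
weather <- data.frame(
  month    = c(1, 1, 1, 1, 7, 7, 7, 7),
  pressure = c(1012, 1019, 1024, NA, 1014, 1016, 1018, 1020)
)

# factor(month) gives one box per month instead of one box overall.
# na.rm = TRUE drops the NA row without emitting a warning.
p_box <- ggplot(weather, aes(x = factor(month), y = pressure)) +
  geom_boxplot(na.rm = TRUE, fill = "brown") +
  labs(x = "Month", y = "Pressure (millibars)")

print(p_box)
```

Each box summarizes one month by its median, quartiles, and whiskers, which is what makes the months easy to compare side by side.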

Figure 21: Boxplot of Pressure against Month

The pressure in millibars for several months is shown in box plots with their
five-number summaries. Pressure is usually around 1020 millibars during most of the
months, but the minimum and maximum values vary depending on the season.

5.11 Analysis 11

Code

Figure 22: source code of Humid against visible


The diagram below shows humidity against visibility. The x-axis shows visibility in
miles, while the y-axis shows humidity. In the box plot below, visibility and humidity
are presented on the x and y axes, respectively, using the geom_boxplot()
function.

Output

Figure 23: Boxplot of Humid against visible


The humidity values are represented by box plots with their five-number summaries.
As visibility rises, the median humidity drops somewhat.
Visibility therefore tends to be higher when humidity is lower.

5.12 Analysis 12

Code

Figure 24: Source code of Temperature against Month


In the graphic below, temperature is plotted against month. The x-axis shows the month,
while the y-axis shows temperature in degrees Fahrenheit. In the box plot below, month
and temperature are presented on the x and y axes, respectively, using the
geom_boxplot() function. The fill parameter in geom_boxplot() is mapped to origin, so
that the boxes are coloured by airport.

Output

Figure 25: Boxplot of Temperature against month


Temperatures in degrees Fahrenheit for several months are shown in box plots with their
five-number summaries. In the chart, the observations obtained from
LGA are shown in brown, while those obtained from JFK are shown in red. During the
beginning and end of the year, the temperature is lower, while it rises in the middle.

5.13 Analysis 13

Figure 26: source code of Wind Direction


The program above displays a polar bar plot of wind direction. The ggplot2 library
is loaded and its ggplot() function is used to build the graphic. To present the data
as histogram bars, the geom_histogram() function is used, which combines geom_bar()
with stat_bin(). The fill property changes the colour of the chart's bars; here, the
fill parameter in geom_histogram() is set to brown. The x-axis represents the wind
direction in degrees, while the y-axis shows the count.
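One common way to turn such a histogram into a polar bar plot is to add coord_polar(). The sketch below assumes that approach, because the source code is only available as a figure; the data values and the column name `wind_dir` are hypothetical.

```r
library(ggplot2)

# Hypothetical wind directions in degrees; the column name is an assumption.
weather <- data.frame(
  wind_dir = c(10, 40, 180, 270, 280, 290, 300, 300, 310, 320, 330, 350)
)

# 30-degree bins starting at 0, drawn as brown bars.
# coord_polar() wraps the x-axis around a circle, and the fixed
# 0-360 limits make a full turn correspond to the full compass.
p_wind <- ggplot(weather, aes(x = wind_dir)) +
  geom_histogram(binwidth = 30, boundary = 0, fill = "brown", colour = "black") +
  scale_x_continuous(limits = c(0, 360), breaks = seq(0, 330, by = 30)) +
  coord_polar() +
  labs(x = "Wind direction (degrees)", y = "Count")

print(p_wind)
```

In this form the length of each wedge shows how often the wind comes from that direction.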

Figure 27: Polar Bar Plot of Wind Direction


The polar plot shows the wind direction in polar form, with the size of each bar
reflecting how frequently the wind blows from a certain direction.
The wind usually blows at around 300 and 275 degrees.

5.14 Analysis 14

Figure 28: source code of Matrix of Data Set


The above code draws a scatter plot matrix of the data set. R's pairs() function plots
every pairwise combination of the variables in a data frame as a matrix of scatter
plots. The code above uses the basic syntax of the pairs() command, with the point
colour set to black.
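A minimal sketch of the basic pairs() syntax is given here, assuming a small hypothetical data frame of numeric weather variables; the column names and values are not taken from the assignment data.

```r
# Hypothetical numeric weather variables.
weather <- data.frame(
  temp     = c(30, 42, 55, 63, 71, 80),
  dewp     = c(18, 28, 40, 51, 60, 66),
  humid    = c(61, 57, 58, 65, 69, 62),
  pressure = c(1024, 1019, 1016, 1018, 1013, 1015)
)

# One scatter plot per pair of columns, with solid black points.
pairs(weather, col = "black", pch = 19, main = "Scatter plot matrix")

# A correlation matrix gives a numeric companion to the plot.
cor_matrix <- cor(weather)
print(round(cor_matrix, 2))
```

Only numeric columns are meaningful in such a matrix, so identifier or date columns are usually dropped first.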

Figure 29: Scatter Plot of Matrix of Data Set

The scatter plot matrix shows all the weather variables, such as humidity, wind gust,
dew point, pressure, wind speed, and temperature, in a compact way across several months.

5.15 Extra Feature 1

Code

Figure 30: Source code of Weight density graph of wind speed


The above code produces a density graph. I used the geom_density() function from the
ggplot2 package to build this graph. The data property is set to the data from the csv
file, and the x-axis value is set to wind speed. The plot is then created using
geom_density(). The fill and colour properties are used to give the two airports
different colours. The graph's opacity level is set with the alpha property.
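The density plot described above can be sketched as follows. This is an illustration under assumptions, not the code from Figure 30: the data frame is hypothetical, and the column names `origin` and `wind_speed` are assumed.

```r
library(ggplot2)

# Hypothetical wind speeds in mph for the two airports.
weather <- data.frame(
  origin     = rep(c("JFK", "LGA"), each = 6),
  wind_speed = c(9, 11, 12, 13, 14, 16, 5, 7, 8, 9, 10, 12)
)

# Mapping fill and colour to origin draws one curve per airport.
# alpha = 0.4 keeps both curves visible where they overlap.
p_density <- ggplot(weather, aes(x = wind_speed, fill = origin, colour = origin)) +
  geom_density(alpha = 0.4) +
  labs(x = "Wind speed (mph)", y = "Density", fill = "Airport", colour = "Airport")

# Group means give a numeric check of what the curves suggest.
mean_by_airport <- tapply(weather$wind_speed, weather$origin, mean)

print(p_density)
print(mean_by_airport)
```

Comparing the group means with the peaks of the curves helps to separate the typical value from the average.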
Output

Figure 31:Weight density graph of Wind Speed(Mph)


This graph shows the density of wind speed. We can see in this graph that the typical
wind speed at JFK airport is about 12-15 mph, while it is around 8-10 mph at LGA. This
shows that, on average, JFK has higher wind speeds than LGA.

6. Conclusion
After finishing the hourly weather data analysis for this R project, I have
visualized, explored, and manipulated the data through the various graphs shown
above. To help plot different kinds of charts for different periods of the year,
I used tools such as ggplot2. I produced plots of the meteorological data across
14 analyses of data visualization, exploration, and manipulation to gather the
information I needed to draw a conclusion. In addition, I followed good programming
practices such as comments, function naming conventions, and indentation. I also added
one extra feature that goes beyond what was covered in the syllabus.
