Module 3 - Time Oriented Data-1

Uploaded by

sanaaaaaa0.2
Copyright
© © All Rights Reserved

Module 3

• Visual analysis of data from various domains [12 Hrs] [Bloom’s Level Selected: Apply]

• Time-oriented data visualization – Spatial data visualization – Text data visualization –
Multivariate data visualization and case studies: finance, marketing, insurance, healthcare, etc.
Time Oriented Data Visualisation
What is Time Series Data
• A time series is a collection of data points gathered over a period of time and ordered
chronologically.
• The primary characteristic of a time series is that it’s indexed or listed in time order, which is a
critical distinction from other types of data sets.

• Time series analysis is increasingly important in many industries, including finance,
pharmaceuticals, social media, web services, and research.
Examples of Time Series Data:
• Electrical activity in the brain
• Rainfall measurements
• Stock prices
• Number of sunspots
• Annual retail sales
• Monthly subscribers
• Heartbeats per minute
How is Time measured?
• Scale
• When are the data measurements/samples taken?
• Ordinal -- before, during, after
• Discrete -- clear intervals (seconds, minutes, hours.....)
• Continuous -- mapping to the real numbers. Discrete values can be interpolated
• Scope
• The range of time associated with a measurement/sample
• Point -- the sample is from a point in time that has no duration
• Interval-based -- there is a duration; a start and end
• These time primitives can be anchored (absolute) or unanchored (relative)
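The scope distinctions above can be sketched with Python's standard datetime module. This is a minimal illustration, not part of the original slides; the variable names and dates are made up.

```python
from datetime import datetime, timedelta

# Anchored point: a specific instant with no duration.
sample_time = datetime(2024, 3, 1, 9, 30)

# Unanchored interval: a duration not tied to any particular start.
duration = timedelta(hours=2)

# Anchored interval: a duration fixed to an absolute start and end.
start = sample_time
end = start + duration

print(end)                      # 2024-03-01 11:30:00
print(end - start == duration)  # True
```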
How is Time measured?
• Arrangement
• Time often has a cyclical nature, compared to the linear nature described above
• Hourly Cycle
• 24 hour cycle in a daily cycle
• 7 days in a Weekly Cycle (Mon->Tues....Sun->Mon)
• 30 days in a Monthly Cycle
• Lunar cycle
• Quarterly/Seasonal Cycle (financial, astronomical, meteorological)
• 365 days, 52 weeks, 12 months in a Yearly Cycle
• Decades
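A small sketch of the cyclical arrangement, assuming we only care about the daily and weekly cycles. The helper names cycle_position and next_weekday are hypothetical, chosen for this example.

```python
from datetime import datetime

def cycle_position(ts):
    """Return where a timestamp falls within the daily and weekly cycles."""
    return {"hour_of_day": ts.hour, "day_of_week": ts.weekday()}  # Monday == 0

def next_weekday(day_of_week):
    """Advance one step in the weekly cycle; Sunday (6) wraps to Monday (0)."""
    return (day_of_week + 1) % 7

sunday_night = datetime(2024, 1, 7, 23, 0)  # 7 January 2024 was a Sunday
pos = cycle_position(sunday_night)
print(pos)                               # {'hour_of_day': 23, 'day_of_week': 6}
print(next_weekday(pos["day_of_week"]))  # 0
```

The modulo step is what makes the arrangement cyclic rather than linear: the value after Sunday is Monday again, not a new eighth day.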
How is Time measured?
• Viewpoint

• Ordered -- Samples occur one after the other

• Branching -- There may be a partial ordering; events may occur in parallel streams.

• Multiple Perspectives -- the same event is observed at different times by different observers,
e.g., news reports filed on the same event by different reporters, or repeated experiments whose
steps or events have skewed time intervals
Characteristics of Time Oriented Data
• Scale:
• Quantitative -- numeric, ordinal, discrete, continuous
• Qualitative -- nominal, categorical
• Frame of reference
• abstract -- the data has a time component but no spatial component
• spatial -- there is a location associated with the data
• Kind of data -- scope revisited
• events -- point of time that the data represents
• states -- data represents information between events, in an interval
Characteristics of Time Oriented data
• Number of variables
• Univariate: a univariate time series consists of the values taken by a single variable at
periodic time instances over a period.
• Multivariate: a multivariate time series consists of the values taken by multiple variables at
the same periodic time instances over a period.
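In pandas terms, one plausible mapping is a Series for a univariate series and a DataFrame for a multivariate one. The sketch below uses made-up temperature and humidity readings.

```python
import pandas as pd

index = pd.date_range("2024-01-01", periods=4, freq="D")

# Univariate: one variable observed at each time instance.
temperature = pd.Series([21.5, 22.1, 19.8, 20.4], index=index, name="temperature")

# Multivariate: several variables observed at the same time instances.
weather = pd.DataFrame(
    {"temperature": [21.5, 22.1, 19.8, 20.4], "humidity": [60, 58, 71, 65]},
    index=index,
)

print(temperature.ndim, weather.shape)  # 1 (4, 2)
```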
Time Oriented Data Analysis and Visualization
• The following libraries can be used to analyse and visualize time series data:
1. NumPy
2. pandas
3. SciPy
4. scikit-learn
5. statsmodels
6. Matplotlib
7. seaborn
8. datetime (Python standard library)
Date and Time Data Types and Tools
The Python standard library includes data types for date and time data, as well as
calendar-related functionality.
Data Types in Date Time Module
Converting Between String and Datetime
You can format datetime objects and pandas Timestamp objects as strings using str or the strftime
method, passing a format specification.
You can use these same format codes to convert strings to dates using datetime.strptime.
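A minimal round trip between datetime and string, using the standard format codes. The date itself is arbitrary.

```python
from datetime import datetime

stamp = datetime(2011, 1, 3, 14, 30)

# datetime -> string, using format codes such as %Y (4-digit year) and %m (month)
text = stamp.strftime("%Y-%m-%d %H:%M")
print(text)  # 2011-01-03 14:30

# string -> datetime, using the same format codes
parsed = datetime.strptime("2011-01-03 14:30", "%Y-%m-%d %H:%M")
print(parsed == stamp)  # True
```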
Time Series Basics
A basic kind of time series object in pandas is a Series indexed by timestamps
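For example, a Series built from a list of datetime objects gets a DatetimeIndex automatically. The values here are arbitrary.

```python
from datetime import datetime
import pandas as pd

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5), datetime(2011, 1, 7)]
ts = pd.Series([0.5, -1.2, 0.3], index=dates)

# pandas converts the datetime objects into a DatetimeIndex
print(type(ts.index).__name__)  # DatetimeIndex
```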
Indexing, Selection, Subsetting
Time series behave like any other pandas.Series when you are indexing and selecting data based on label.

Slicing with datetime objects works as well.

Because most time series data is ordered chronologically, you can slice with timestamps not contained in a
time series to perform a range query.
The truncate method also slices a Series between two dates.
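The selection styles described above can be sketched on a small series sampled every two days. The data is made up, and .loc is used throughout as one reasonable way to express label-based access.

```python
import pandas as pd

# Index: 1, 3, 5, 7, 9 and 11 January 2011
ts = pd.Series(range(6), index=pd.date_range("2011-01-01", periods=6, freq="2D"))

# Label-based lookup with a Timestamp or an equivalent date string
by_label = ts.loc[pd.Timestamp("2011-01-05")]
by_string = ts.loc["2011-01-05"]

# Slicing with timestamps that are in the index (both ends are included)
sliced = ts.loc[pd.Timestamp("2011-01-03"):pd.Timestamp("2011-01-07")]

# Range query: the endpoints themselves are not in the index
range_query = ts.loc["2011-01-04":"2011-01-10"]

# truncate gives the same result as the range query
truncated = ts.truncate(before="2011-01-04", after="2011-01-10")

print(by_label, by_string)   # 2 2
print(sliced.tolist())       # [1, 2, 3]
print(range_query.tolist())  # [2, 3, 4]
```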

Time Series with Duplicate Indices


We can tell that the index is not unique by checking its is_unique property.
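A short sketch with one duplicated date. Grouping on the index (level=0) is shown as one possible way to aggregate the duplicates, not the only one.

```python
import pandas as pd

dates = pd.DatetimeIndex(["2000-01-01", "2000-01-02", "2000-01-02", "2000-01-03"])
dup_ts = pd.Series([0, 1, 2, 3], index=dates)

print(dup_ts.index.is_unique)  # False

# Selecting a duplicated date returns every matching row
print(len(dup_ts.loc["2000-01-02"]))  # 2

# Aggregate the duplicates by grouping on the index itself
deduplicated = dup_ts.groupby(level=0).mean()
print(deduplicated.tolist())  # [0.0, 1.5, 3.0]
```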
Date Ranges, Frequencies, and Shifting
Generating Date Ranges
Resampling and Frequency Conversion
• Resampling refers to the process of converting a time series from one frequency to another.
Aggregating higher frequency data to lower frequency is called downsampling, while converting
lower frequency to higher frequency is called upsampling.
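Both directions can be sketched with Series.resample. The data and the chosen frequencies (one minute to five minutes, weekly to daily) are illustrative assumptions.

```python
import pandas as pd

# Downsampling: 12 one-minute readings aggregated into 5-minute bins
minute_ts = pd.Series(
    range(12), index=pd.date_range("2024-01-01", periods=12, freq="min")
)
five_min = minute_ts.resample("5min").sum()
print(five_min.tolist())  # [10, 35, 21]

# Upsampling: 2 weekly readings expanded to daily, forward-filling the gaps
weekly_ts = pd.Series(
    [100, 200], index=pd.date_range("2024-01-07", periods=2, freq="W")
)
daily = weekly_ts.resample("D").ffill()
print(len(daily))  # 8
```

Downsampling needs an aggregation (sum, mean, and so on) because several observations fall into each bin; upsampling needs a fill rule because the new time points have no observations.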
Spatial Data Visualisation
What is Spatial Data?
• Data or information that identifies the geographic location of features and boundaries on
Earth, both natural and constructed (e.g., oceans, lakes, ponds).
• Spatial data is the information about the shape, location and relationship of geographic
features.
• Two basic types of spatial data models have evolved for storing geographic data digitally
• These are referred as:
• Vector data
• Raster data
Types of Spatial Data
Vector Data
• Vector data provide a way to represent real world features within the GIS environment.
• A vector feature has its shape represented using geometry.
• The geometry is made up of one or more interconnected vertices.
• A vertex describes a position in space using an x, a y and optionally a z coordinate.
• In the vector data model, features on the earth are represented as:
• points
• lines / routes
• polygons / regions
• TINs (Triangulated Irregular Networks)
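As a rough sketch, vector features can be held as lists of (x, y) vertices in plain Python. This assumes planar (Cartesian) coordinates; the helper names line_length and polygon_area are hypothetical, and a GIS library would normally provide such operations.

```python
# A vertex is an (x, y) pair; a z value could be added as a third element.
point = (2.0, 3.0)
line = [(0.0, 0.0), (3.0, 4.0)]
polygon = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]  # ring closes implicitly

def line_length(vertices):
    """Sum of straight-line distances between consecutive vertices."""
    return sum(
        ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
        for (x1, y1), (x2, y2) in zip(vertices, vertices[1:])
    )

def polygon_area(vertices):
    """Area of a simple polygon via the shoelace formula."""
    closed = vertices + vertices[:1]
    twice_area = sum(
        x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(closed, closed[1:])
    )
    return abs(twice_area) / 2

print(line_length(line))      # 5.0
print(polygon_area(polygon))  # 12.0
```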
Vector Data
• This system of recording features is based on the interaction between arcs and nodes,
represented by points, lines and polygons.
• A point is a single node, a line is two nodes with an arc between them.
• A polygon is a closed group of three or more arcs.
• With these three elements, it is possible to record almost all necessary information.
Vector Data
• Advantages:
• Accurately represents true shape and size
• Represents non-continuous data (e.g., rivers, political boundaries, road lines)
• Can store information about topology
• A vector data model uses points stored by their real (earth) coordinates and so requires a
precise coordinate system:
• Geographic coordinate system: latitude/longitude
• Cartesian coordinate system: x, y coordinates
Vector Data
• Disadvantages:
• The location of each vertex needs to be stored explicitly.
• Vector data must be converted into a topological structure.
• This is often processing intensive and usually requires extensive data cleaning.
• Updating or editing of the vector data requires re-building of the topology.
Raster Data
• Raster data is cell-based data such as aerial imagery and digital elevation models.
• Raster data is characterized by pixel values.
• Basically, a raster file is a giant table in which each pixel is assigned a specific value, for
example from 0 to 255 in 8-bit imagery.
• The meaning behind these values is specified by the user: they can represent elevation,
temperature, hydrology, etc.
Impact of Resolution

• Portraying large areas at high precision is problematic
• Storage space increases by the square of the resolution
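The square relationship can be checked with simple arithmetic. The sketch below assumes a 10 km by 10 km area and a hypothetical helper cell_count; halving the cell size doubles the resolution and quadruples the number of cells to store.

```python
def cell_count(width_m, height_m, cell_size_m):
    """Number of cells needed to cover a rectangular area at a given cell size."""
    columns = int(width_m // cell_size_m)
    rows = int(height_m // cell_size_m)
    return rows * columns

coarse = cell_count(10_000, 10_000, 100)  # 100 m cells
fine = cell_count(10_000, 10_000, 50)     # 50 m cells: twice the resolution

print(coarse, fine, fine / coarse)  # 10000 40000 4.0
```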
Raster Data
• Advantages:
• Raster is well suited to storing continuously varying values such as elevation and slope.
• Analysis is faster and more flexible than with vector data for many applications.
• Rapid computations ("map algebra") in which raster layers are treated as elements in
mathematical expressions
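Map algebra can be sketched with NumPy arrays standing in for raster layers. The two tiny 2x2 layers and their values are made up for illustration.

```python
import numpy as np

elevation = np.array([[120, 135], [150, 180]], dtype=float)  # metres
rainfall = np.array([[800, 950], [1100, 700]], dtype=float)  # mm per year

# Cell-by-cell expression: flag cells that are both high and wet.
high_and_wet = (elevation > 130) & (rainfall > 900)
print(high_and_wet.tolist())  # [[False, True], [True, False]]

# Arithmetic works the same way: convert every elevation cell to feet at once.
elevation_ft = elevation * 3.28084
```

Each layer is treated as a single operand, so the expression is evaluated for every cell without an explicit loop.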
Raster Data
• Disadvantages:
• It is especially difficult to adequately represent linear features depending on the cell resolution.
• Network linkages are difficult to establish.
• Processing of associated attribute data may be cumbersome if large amounts of data exist.
• Raster maps inherently reflect only one attribute or characteristic for an area.
• Most output maps from grid-cell systems do not conform to high-quality cartographic needs.
Vector Vs Raster model

Vector Model Raster Model


Spatial data Visualization
TEXT DATA VISUALIZATION
Text Data Visualization
• Text visualization is a visual way of presenting information: word clouds, graphs, maps,
timelines, networks and more can all be used to visualize text data.
• It gives a quick overview of the most important keywords, and sums up and communicates trends
and frameworks within a specific text.
Text Visualization is Useful for:

• Condensing a lot of content. Cut down on reading time by emphasizing central phrases across
multiple texts and grouping content by topic, sentiment and more.
• Simplifying text data. Visual summaries are often quicker to scan than long passages of
written text.
• Determining insights in qualitative data. Customer feedback is packed with practical insights.
A visualization outlines the products, features and subjects that matter most to your clientele,
and shows both their pain points and where you are succeeding with them.
• Discovering hidden trends. Use text analysis and visualize the results to spot
inconsistencies and identify their leading causes.
Why Text Visualizations are Necessary

• Makes Text Data Easy to Grasp


• Communicates What’s on Your Audience’s Mind
• Condenses Big Volumes of Text
• It is Simple and Direct
Text Data Visualization Examples

• Word Cloud
• Slope chart
• Sankey chart
• Collocate Clouds
• Word Art
Text Data Visualization Examples
• Word Cloud
A word cloud is a grouping of keywords or tags in which font size (and often color) reflects how
frequent or important each word is. The words can also be arranged into an easily recognizable
shape or figure.
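The data behind a word cloud is a table of word frequencies. A minimal sketch using only the standard library follows; the sample sentence and the tiny stop-word list are made up, and a plotting library would then map each count to a font size.

```python
from collections import Counter
import re

text = "Time series data is data indexed by time. Spatial data has location."
stopwords = {"is", "by", "has"}  # tiny illustrative stop list

words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stopwords]
frequencies = Counter(words)

print(frequencies.most_common(2))  # [('data', 3), ('time', 2)]
```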
Slope chart
• If you want to highlight transitions, absolute values, rankings and long-term variations,
slope charts (slope graphs) are a suitable text data visualization.
• Slope charts work well when you compare two time periods or other points of reference and
want to underline rises and drops across diverse categories between the two data points.
Sankey Chart
• With a Sankey chart, you can visualize how one group of values flows to the next group.
The interconnected points are called ‘nodes’ and the connections between them are ‘links’.
MULTIVARIATE DATA VISUALIZATION
Introduction
• Multivariate data visualization is a way to display and analyze data with more
than two variables. It allows you to see relationships between multiple
variables at once and can help identify patterns and trends in the data.
• There are several types of multivariate data visualizations, including scatter
plots, heatmaps, parallel coordinate plots, and treemaps. Each type of
visualization has its strengths and weaknesses, and the choice of which one to
use depends on the data being analyzed and the questions being asked.
Types of Multivariate Data Visualization
1. Scatter Plots
2. Parallel coordinate plots
3. Tree maps
4. Line Graphs
5. Region Based Techniques
1. Scatter Plots
• Scatterplots and scatterplot matrices -- take pairs of attributes (generally ordinal, but not
necessarily) and plot the values so that relationships can be seen at a glance.
• Selected data is shown in red. A raw data sample from the iris dataset:

Sepal Len   Sepal Width   Petal Len   Petal Width
5.1         3.5           1.4         0.2
4.9         3.0           1.4         0.2
4.7         3.2           1.3         0.2
4.6         3.1           1.5         0.2
5.0         3.6           1.4         0.2
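A scatterplot matrix of the sample rows above can be sketched with pandas.plotting.scatter_matrix. The column names are assumptions, and petal width is left out here only because it is constant (0.2) in these five rows.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import pandas as pd
from pandas.plotting import scatter_matrix

iris_sample = pd.DataFrame(
    {
        "sepal_len": [5.1, 4.9, 4.7, 4.6, 5.0],
        "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
        "petal_len": [1.4, 1.4, 1.3, 1.5, 1.4],
    }
)

# One panel per pair of attributes; the diagonal shows each attribute's histogram
axes = scatter_matrix(iris_sample, figsize=(6, 6))
print(axes.shape)  # (3, 3)
```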

2. Parallel coordinate plots
• A parallel coordinate plot maps each row in the data table as a line, or profile. Each
attribute of a row is represented by a point on the line. This makes parallel coordinate
plots similar in appearance to line charts, but the way data is translated into a plot is
substantially different.
• Consider, for example, a data table where a laboratory has measured the amount of
various carbohydrates contained in various fruit and vegetables.
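A sketch of that carbohydrate example using pandas.plotting.parallel_coordinates. The foods, column names and all numbers below are made-up placeholder values, not laboratory measurements.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import pandas as pd
from pandas.plotting import parallel_coordinates

# Placeholder values for illustration only
carbs = pd.DataFrame(
    {
        "food": ["apple", "banana", "carrot"],
        "fructose": [5.9, 4.9, 0.6],
        "glucose": [2.4, 5.0, 0.6],
        "sucrose": [2.1, 2.4, 3.6],
    }
)

# Each row becomes one line (profile) crossing the three attribute axes
ax = parallel_coordinates(carbs, class_column="food", colormap="viridis")
ax.set_ylabel("g per 100 g (made-up values)")
```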
4. Line Graphs
• A line plot comprises dots connected by a line that shows the relationship between the x
and y variables. The x-axis usually contains time intervals, while the y-axis holds a
numeric variable whose changes we want to track over time
5. Region Based Techniques
• Bar charts and histograms. Often the width of the bar is not significant.
• Stacked bar charts, clustered bar charts, 3D bar charts, pie charts.
• Cityscape charts on geospatial plots.
Case Study 1: Causes of Death in the United States (1999–2015)
This case study will try to answer the following questions:

• What is the total number of records in the dataset?

• What were the causes of death in this data set?

• What was the total number of deaths in the United States from 1999 to 2015?

• What is the number of deaths in each year from 1999 to 2015?

• Which ten states had the highest number of deaths overall?

• What were the top causes of deaths in the United States during this period?
Data Gathering

The data set in this case study comes from open data from the U.S. government,
which can be accessed through
https://data.gov
You can download it from here:

https://catalog.data.gov/dataset/age-adjusted-death-rates-for-the-top-10-leading-causes-of-death-united-states-2013
Data Analysis
• What is the total number of recorded death cases?
• What were the causes of death in this dataset?
Unique States in the Study
• What was the total number of deaths in the United States from 1999 to 2015?
• What is the number of deaths for each year from 1999 to 2015?
• Which ten states had the highest number of deaths?
• What were the top causes of death in the United States during this period?
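The questions above map onto a handful of pandas operations. The sketch below runs on a tiny stand-in table with made-up numbers; the column names (Year, Cause Name, State, Deaths) are assumptions about the downloaded file, which would normally be loaded with pd.read_csv.

```python
import pandas as pd

# Hypothetical stand-in rows, not real figures from the dataset
deaths = pd.DataFrame(
    {
        "Year": [1999, 1999, 2000, 2000, 2000],
        "Cause Name": ["Cancer", "Stroke", "Cancer", "Cancer", "Stroke"],
        "State": ["Texas", "Texas", "Texas", "Ohio", "Ohio"],
        "Deaths": [100, 40, 110, 90, 30],
    }
)

total_records = len(deaths)                                    # number of records
causes = deaths["Cause Name"].unique()                         # causes of death
total_deaths = deaths["Deaths"].sum()                          # overall total
deaths_per_year = deaths.groupby("Year")["Deaths"].sum()       # total per year
top_states = deaths.groupby("State")["Deaths"].sum().nlargest(10)
top_causes = (
    deaths.groupby("Cause Name")["Deaths"].sum().sort_values(ascending=False)
)

print(total_records, total_deaths)  # 5 370
print(deaths_per_year.to_dict())    # {1999: 140, 2000: 230}
print(top_states.index.tolist())    # ['Texas', 'Ohio']
```

If the real file contains aggregate rows (for example a national total or an all-causes category), they would need to be filtered out before summing to avoid double counting.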
Findings
