1152cs191 Data Visualization Unit V
CO5
Engineering Knowledge
12/10/2024
Problem Analysis
Ethics
Visualization
Individual & Team Work
Communication
Mathematical Concepts
Software Development
Level of learning domain (based on revised Bloom’s taxonomy): K2
Transferring Skills
Correlation of COs with Student Outcomes ABET
EAC and CAC
CO5 3 2 2 - - 2 3
CO5 3 2 2 - - 2
This social sharing feature is a helpful factor for the successful relay
of information between team members.
These are the latest trends in the global market for data visualization
tools.
• Data wrangling involves processing data in various ways, such as merging,
grouping, and concatenating, in order to analyse it or get it ready
to be used with another set of data.
• Python, through libraries such as pandas, provides features to apply these
wrangling methods to various data sets to achieve the analytical goal.
• Data Acquisition
• Joining Data
• Data Cleansing
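The three steps listed above can be sketched with pandas. This is a minimal illustration, not a prescribed workflow: the tables, column names, and values are invented, and filling the missing revenue with 0 is just one possible cleansing choice.

```python
import pandas as pd

# Hypothetical sample tables (invented for illustration)
sales_q1 = pd.DataFrame({"store": ["A", "B"], "revenue": [100, 150]})
sales_q2 = pd.DataFrame({"store": ["A", "B"], "revenue": [120, None]})
stores = pd.DataFrame({"store": ["A", "B"], "city": ["Chennai", "Delhi"]})

# Data acquisition: concatenate the two quarterly tables into one
sales = pd.concat([sales_q1, sales_q2], ignore_index=True)

# Data cleansing: replace the missing revenue value with 0
sales["revenue"] = sales["revenue"].fillna(0)

# Joining data: merge on the shared "store" column
joined = sales.merge(stores, on="store", how="left")

# Grouping: total revenue per city
totals = joined.groupby("city")["revenue"].sum()
print(totals)
```

Here concat stacks rows, merge adds columns from a second table, and groupby summarises; these correspond to the concatenating, merging, and grouping operations named earlier.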
Extremely flexible.
Easy to use and fast.
Supports large datasets.
Declarative programming.
Code reusability.
Has a wide variety of curve-generating functions.
Associates data to an element or group of elements in the HTML
page.
It requires very little code and comes with the following benefits −
Great data visualization.
It is modular. You can download only the piece of D3.js that
you want to use. No need to load the whole library every time.
Easy to build a charting component.
DOM manipulation.
<!DOCTYPE html>
<html lang = "en">
<head>
<script src = "/path/to/d3.min.js"></script>
</head>
<body>
<script>
// write your d3 code here
</script>
</body>
</html>
Tableau can connect to files, relational and Big Data sources to acquire
and process data. The software allows data blending and real-time
collaboration, which sets it apart from many other tools.
01 Architecture Agnostic − Tableau works in all kinds of devices where data
flows. Hence, the user need not worry about specific hardware or software
requirements to use Tableau.
02 Real-Time Collaboration − Tableau can filter, sort, and discuss data on
the fly and embed a live dashboard in portals like SharePoint site or
Salesforce.
02 NumPy: This package is essential for any data science project. It has
a lot of mathematical functions that operate on multi-dimensional
arrays and matrices.
03 Matplotlib & Seaborn: They are plotting and graphing libraries that
visualize data in an intuitive way.
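As a small illustration of NumPy functions operating on a multi-dimensional array, the sketch below uses an invented 2 x 3 array. The variable names are placeholders chosen for this example.

```python
import numpy as np

# A 2-D array: 2 rows (samples) by 3 columns (variables); values are invented
a = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

squared = a ** 2             # element-wise power (** is exponentiation in Python)
col_means = a.mean(axis=0)   # mean of each column
row_sums = a.sum(axis=1)     # sum of each row

print(squared)
print(col_means)
print(row_sums)
```

The axis argument selects the direction of the reduction: axis=0 collapses the rows (one result per column) and axis=1 collapses the columns (one result per row).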
Creating a Chart
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(0,10)
# ** is exponentiation; ^ is bitwise XOR in Python and would not square x
y = x**2
#Simple Plot
plt.plot(x,y)
plt.show()
import numpy as np
import matplotlib.pyplot as plt
x = np.arange(0,10)
# ** is exponentiation; ^ is bitwise XOR in Python and would not square x
y = x**2
#Labeling the Axes and Title
plt.title("Graph Drawing")
plt.xlabel("Time")
plt.ylabel("Distance")
#Simple Plot
plt.plot(x,y)
plt.show()
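from pandas import DataFrame
import matplotlib.pyplot as plt
# Heat map: each row is a list (sets are unordered and drop duplicate values)
data=[[2,3,4,1],[6,3,5,2],[6,3,5,4],[3,7,5,4],
[2,8,1,5]]
Index= ['I1', 'I2','I3','I4','I5']
Cols = ['C1', 'C2', 'C3','C4']
df = DataFrame(data, index=Index,
columns=Cols)
plt.pcolor(df)
plt.show()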
12/10/2024 Department of Computer Science & Engineering Data Visualization
Scatterplots
Scatterplots show many points plotted in the Cartesian plane. Each point
represents the values of two variables. One variable is chosen for the
horizontal axis and another for the vertical axis.
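import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# 50 random points in four columns; plot column a against column b
df = pd.DataFrame(np.random.rand(50, 4),
columns=['a', 'b', 'c', 'd'])
df.plot.scatter(x='a', y='b')
plt.show()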
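import numpy as np
import matplotlib.pyplot as plt
# create data
x = np.random.rand(40)
y = np.random.rand(40)
z = np.random.rand(40)
colors = np.random.rand(40)
# use the scatter function: z sets the marker size, colors sets the marker colour
plt.scatter(x, y, s=z*1000,c=colors)
plt.show()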
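import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d
# Sample surface shipped with Matplotlib (an assumption: the slide does not define X, Y, Z)
X, Y, Z = axes3d.get_test_data(0.05)
chart = plt.figure()
chart3d = chart.add_subplot(111,
projection='3d')
# Plot a wireframe.
chart3d.plot_wireframe(X, Y, Z,
color='r',rstride=15, cstride=10)
plt.show()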
Time Series
A time series is a series of data points in which each data point is associated
with a timestamp. A simple example is the price of a stock in the stock market
at different points of time on a given day. Another example is the amount of
rainfall in a region in different months of the year.
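import pandas as pd
import matplotlib.pyplot as plt
data = pd.read_csv('path_to_file/stock.csv')
df = pd.DataFrame(data, columns =
['ValueDate', 'Price'])
# Parse the dates and use them as the index so the x-axis is time
df['ValueDate'] = pd.to_datetime(df['ValueDate'])
df = df.set_index('ValueDate')
df.plot(figsize=(15, 6))
plt.show()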
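Because the CSV file above is not available here, the following sketch uses synthetic daily prices to show typical time series handling: a datetime index, monthly resampling, and a rolling mean. The data, the seed, and the 7-day window are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily prices (invented data, not a real stock)
dates = pd.date_range("2024-01-01", periods=90, freq="D")
rng = np.random.default_rng(0)
prices = pd.Series(100 + rng.normal(0, 1, size=90).cumsum(),
                   index=dates, name="Price")

# Aggregate to monthly means and smooth with a 7-day rolling average
monthly_mean = prices.resample("MS").mean()
rolling = prices.rolling(window=7).mean()

# Plot the raw series and the smoothed series on the same axes
ax = prices.plot(label="daily")
rolling.plot(ax=ax, label="7-day mean")
ax.legend()
```

In an interactive session, calling plt.show() from matplotlib.pyplot displays the figure. The first six rolling values are missing because a full 7-day window is not yet available.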
2. Server.R: This file contains the series of steps to convert the input
given by the user into the desired output to be displayed.
Advantages :
Disadvantages :
These modalities also generate large quantities of noisy data that need
modern techniques of computational statistics for image
reconstruction, visualization and analysis.
Prediction error rates for a sequence of MR images are reported, where Obj
represents the object, Bg denotes the background, and Total refers to the
average error of the whole image
Parallel coordinate plot of financial ratios with skew distributions. Seven outliers
have been selected
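A parallel coordinate plot with a highlighted selection can be sketched with pandas. The data below are synthetic and right-skewed, the column names r1 to r4 are placeholders, and "outlier" is defined here simply as the seven rows with the largest maximum value, which is an assumption and not the rule used for the financial data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

rng = np.random.default_rng(1)
# Synthetic, right-skewed "ratios" (invented data; column names are placeholders)
df = pd.DataFrame(rng.lognormal(mean=0.0, sigma=1.0, size=(200, 4)),
                  columns=["r1", "r2", "r3", "r4"])

# Flag the seven rows with the largest maximum value as "outlier"
top7 = df.max(axis=1).nlargest(7).index
df["group"] = "regular"
df.loc[top7, "group"] = "outlier"

# Put regular cases first so the outliers are drawn on top of them
ordered = df.sort_values("group", ascending=False)
ax = parallel_coordinates(ordered, "group",
                          color=["lightgray", "red"], alpha=0.6)
```

Each row becomes one polyline across the four vertical axes. With skewed data most lines crowd near the bottom, which is why the selected outliers stand out.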
Outliers
Scatterplots of TL.TA, the ratio of total liabilities to total assets, plotted against
Total Assets. The left-hand plot presents all of the data, and it shows that all high
values of liabilities are associated with low asset values. The right-hand plot
presents a zoom of about 10^-2 on the x-axis by 10^-3 on the y-axis, along with
some α-blending.
Outliers
Parallel coordinate plot of the financial ratios with skewed distributions. The
seven outliers selected have been removed. The plot’s (red) border is a sign that
not all data are displayed.
Scatterplot of Sales vs. Total Assets with the seven outlying companies
highlighted (the lighter blob in the lower left corner)
A histogram of the current assets ratio on the left and a weighted histogram of the
same variable, weighted by Total Assets, on the right
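The difference between an ordinary and a weighted histogram can be sketched with NumPy. The ratio and the stand-in for Total Assets are synthetic, and the ten equal-width bins are an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic stand-ins for a ratio and for Total Assets (invented data)
ratio = rng.uniform(0.0, 1.0, size=1000)
total_assets = rng.lognormal(mean=5.0, sigma=2.0, size=1000)

bins = np.linspace(0.0, 1.0, 11)

# Ordinary histogram: every company contributes 1 to its bin
counts, _ = np.histogram(ratio, bins=bins)

# Weighted histogram: every company contributes its Total Assets to its bin
weighted, _ = np.histogram(ratio, bins=bins, weights=total_assets)

print(counts)
print(weighted.round(0))
```

The weighted bars sum to the total of the weights rather than to the number of companies, so a few very large companies can dominate a bin. Matplotlib's plt.hist accepts the same weights argument for drawing.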
Scatterplots of Cash.TA and Inv.TA, the ratios of cash and inventories to total
assets (left), and of CA.TA and Kap.TA, current assets and property to total assets
(right)
Scatterplots
The scatterplot with a smaller point size and α-blending to better display the
bivariate structures
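The effect of a smaller point size and α-blending can be sketched with Matplotlib. The 20000 correlated points are synthetic, and the marker size and alpha value are illustrative settings, not the ones used for the figures described here.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)
# Synthetic correlated data with heavy overplotting (invented)
x = rng.normal(size=20000)
y = 0.5 * x + rng.normal(size=20000)

fig, (left, right) = plt.subplots(1, 2, figsize=(8, 4))

# Default markers: dense regions merge into a solid blob
left.scatter(x, y)
left.set_title("default")

# Small, mostly transparent markers: darkness now reflects point density
right.scatter(x, y, s=2, alpha=0.05)
right.set_title("small points, alpha = 0.05")
```

With alpha = 0.05 roughly twenty overlapping points are needed to reach near-full opacity, so sparse regions fade and dense regions remain visible.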
Scatterplots of inventories against fixed assets (left) and of cash against current
assets (right)
Parallel coordinate plot of financial ratios and logTA, excluding 55 outliers, with
bankrupt companies highlighted
Parallel Coordinate Plot
Parallel coordinate plot of financial ratios and logTA, excluding 55 outliers, with
bankrupt companies highlighted, α-blending = 0.1 only for unselected data
Parallel Coordinate Plot
Parallel coordinate plot of financial ratios and logTA, excluding 55 outliers, with
bankrupt companies highlighted, α-blending = 0.1 for selected and
unselected data
Parallel Coordinate Plot
Scatterplots of the ratios of intangibles and cash to total assets, with companies
that went bankrupt selected. More α-blending has been used in the right-hand
plot
Parallel Coordinate Plot
Parallel boxplots of logTA by year, all on the same scale
Parallel boxplots of financial ratios and logTA for all companies. The background boxplots are
for all of the data, and the superimposed standard boxplots are for the selected cases,
companies with Total Assets 1000
Parallel Coordinate Plot
Spinograms of the ratios Cash.TA, Inv.TA, Kap.TA, and Intg.TA. The companies with Total
Assets 1000 have been selected
Parallel Coordinate Plot
Parallel boxplots of financial ratios and logTA for the 18610 companies with Total
Assets 1000. Companies that went bankrupt have been selected
Graphical Data Representation
in Bankruptcy Analysis
A classification example. The boundary between the classes of solvent (black triangles)
and insolvent (white squares) companies was estimated using DA, the logit regression
(two indistinguishable linear boundaries) and an SVM (a nonlinear boundary) for a
subsample of the Bundesbank data. The background corresponds to the PDs computed
with an SVM
Parallel Coordinate Plot
One-year cumulative PDs evaluated for several financial ratios from the German
Bundesbank data. The ratios are net income change (K21), net interest ratio (K24),
interest coverage ratio (K29), and logarithm of total assets (K33). The k nearest
neighbors procedure was used with a window size of around 8% of all of the
observations. The total number of observations is 553500
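One way to picture a nearest-neighbour estimate of the PD along a single ratio is a moving average of the default indicator over the sorted observations. The sketch below is a rank-based stand-in for that idea, not the procedure used for the Bundesbank data: the function name knn_default_rate, the synthetic ratio, and the assumed logistic relation between ratio and default are all invented for illustration; only the 8% window share comes from the caption.

```python
import numpy as np

def knn_default_rate(ratio, default, window_share=0.08):
    """Moving-average default rate along a sorted financial ratio."""
    order = np.argsort(ratio)
    x_sorted = np.asarray(ratio, dtype=float)[order]
    d_sorted = np.asarray(default, dtype=float)[order]
    # Window size k as a share of all observations
    k = max(1, int(round(window_share * len(x_sorted))))
    kernel = np.ones(k) / k
    # mode="valid" keeps only windows containing exactly k observations
    pd_hat = np.convolve(d_sorted, kernel, mode="valid")
    x_mid = np.convolve(x_sorted, kernel, mode="valid")
    return x_mid, pd_hat

rng = np.random.default_rng(4)
ratio = rng.normal(size=5000)                        # synthetic ratio
true_pd = 1.0 / (1.0 + np.exp(3.0 + 1.5 * ratio))    # assumed: lower ratio, higher PD
default = rng.random(5000) < true_pd                 # simulated default flags

x_mid, pd_hat = knn_default_rate(ratio, default)
```

Plotting pd_hat against x_mid gives a smooth PD curve; a larger window share gives a smoother but less local estimate.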
SVM Approach
Accuracy ratios for univariate SVM models. Box plots are estimated based on
100 random subsamples. The AR for the model containing only the random
variable K10 is zero
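The statement that a purely random variable has an AR of zero can be checked numerically. The sketch below uses the standard relation AR = 2 * AUC - 1 and computes the AUC from ranks. It assumes that a higher score means higher risk and that scores have no ties; the data are synthetic and the function name accuracy_ratio is chosen for this example.

```python
import numpy as np

def accuracy_ratio(score, default):
    """AR = 2 * AUC - 1, with AUC from the rank-sum (Mann-Whitney) statistic.

    Assumes higher score means higher risk and that scores have no ties.
    """
    score = np.asarray(score, dtype=float)
    default = np.asarray(default, dtype=bool)
    # Rank 1 for the lowest score, rank n for the highest
    ranks = np.empty(len(score))
    ranks[np.argsort(score)] = np.arange(1, len(score) + 1)
    n_bad = default.sum()
    n_good = len(score) - n_bad
    auc = (ranks[default].sum() - n_bad * (n_bad + 1) / 2) / (n_bad * n_good)
    return 2.0 * auc - 1.0

rng = np.random.default_rng(5)
default = rng.random(20000) < 0.05               # synthetic default flags
informative = default + rng.normal(size=20000)   # score shifted upward for defaulters
noise = rng.normal(size=20000)                   # score unrelated to default

ar_informative = accuracy_ratio(informative, default)
ar_noise = accuracy_ratio(noise, default)
print(round(ar_informative, 3), round(ar_noise, 3))
```

An AR of 1 corresponds to a score that ranks every defaulter above every non-defaulter, and a value near 0 to a score with no discriminating power.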
AR Model
Accuracy ratios for SVM models with eight variables. Each model includes the
variables K5, K29, K7, K33, K18, K21, K24, and one of the remaining
variables. Box plots are estimated based on 100 random subsamples
AR Model
Shapes of the mean excess function e(x) for the log-normal (dashed line),
gamma with α < 1 (dotted line), gamma with α > 1 (solid line) and a mixture of
two exponential distributions (long-dashed line).
Log-normal, Pareto, and Burr distributions
Shapes of the mean excess function e(x) for the Pareto (dashed line), Burr
(long-dashed line), Weibull with τ < 1 (solid line) and Weibull with τ > 1 (dotted
line) distributions. From XploRe
Log-normal, Pareto, and Burr distributions
The empirical mean excess function ê_n(x) for the PCS catastrophe loss
amounts in billions of USD (left panel) and waiting times in years (right
panel).
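The empirical mean excess function at a threshold x is the average amount by which the observations above x exceed it. A minimal sketch follows, using synthetic exponential losses rather than the PCS data; the function name mean_excess and the threshold grid are illustrative choices.

```python
import numpy as np

def mean_excess(sample, thresholds):
    """Empirical mean excess: average of (x_i - x) over observations with x_i > x."""
    sample = np.asarray(sample, dtype=float)
    out = []
    for x in thresholds:
        excess = sample[sample > x] - x
        # No observation above x: the estimate is undefined
        out.append(excess.mean() if excess.size else np.nan)
    return np.array(out)

rng = np.random.default_rng(6)
losses = rng.exponential(scale=2.0, size=100000)   # synthetic, not the PCS data
grid = np.array([0.0, 1.0, 2.0, 4.0])
e_hat = mean_excess(losses, grid)
print(e_hat.round(2))
```

For the exponential distribution the mean excess function is constant and equal to the mean, so the estimates stay close to 2. An increasing curve points to a heavier tail, such as Pareto, and a decreasing curve to a lighter one.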
Pareto Probability Plot
Pareto probability plot of the PCS loss data. Apart from the two very extreme
observations (Hurricane Andrew and Northridge Earthquake), the points (crosses)
more or less constitute a straight line, validating the choice of the Pareto
distribution. The inset is a magnification of the bottom left part of the original plot.
From the Ruin Probabilities Toolbox
Log-normal Probability Plot
Log-normal probability plot of the PCS loss data. The x-axis corresponds to
logarithms of the losses. The deviations from the straight line at both ends
question the adequacy of the log-normal law. From the Ruin Probabilities
Toolbox
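The idea behind a log-normal probability plot can be sketched as follows: if the losses are log-normal, their logarithms are normal, so the sorted logarithms plotted against theoretical normal quantiles should lie close to a straight line. The losses below are synthetic, the plotting positions (i - 0.5)/n are one common convention, and the function name is chosen for this example. The axis orientation may differ from the figure described here.

```python
import numpy as np
from statistics import NormalDist

def normal_probability_points(values):
    """Return (theoretical normal quantiles, sorted values) for a probability plot."""
    ordered = np.sort(np.asarray(values, dtype=float))
    n = len(ordered)
    # Plotting positions strictly between 0 and 1
    probs = (np.arange(1, n + 1) - 0.5) / n
    theo = np.array([NormalDist().inv_cdf(float(p)) for p in probs])
    return theo, ordered

rng = np.random.default_rng(7)
losses = rng.lognormal(mean=1.0, sigma=0.8, size=2000)   # synthetic losses

# Log-normal check: the logarithms of the losses should look normal
theo, log_sorted = normal_probability_points(np.log(losses))
r_log = np.corrcoef(theo, log_sorted)[0, 1]

# Same check on the raw losses, which are not normal
_, raw_sorted = normal_probability_points(losses)
r_raw = np.corrcoef(theo, raw_sorted)[0, 1]
print(round(r_log, 4), round(r_raw, 4))
```

A correlation close to 1 corresponds to points near a straight line. Systematic bending at the ends, as described for the PCS loss data, lowers it and questions the assumed distribution.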
Log-normal probability plot of the PCS waiting time data. The x-axis
corresponds to logarithms of the losses. From the Ruin Probabilities Toolbox
Exponential probability plot of the PCS waiting time data. The plot deviates
from a straight line at the far end. From the Ruin Probabilities Toolbox
Ruin probability plot with respect to the time horizon T (left axis, in months)
and the initial capital u (right axis, in million DKK).
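How a ruin probability depends on the initial capital u and the time horizon T can be explored by simulation. The sketch below assumes the classical risk model (Poisson claim arrivals, exponential claim sizes, constant premium rate), which the slide does not specify. All parameter values and the function name ruin_probability are invented for illustration and are not calibrated to the DKK data.

```python
import numpy as np

def ruin_probability(u, T, premium_rate, claim_rate, claim_mean,
                     n_paths=2000, seed=0):
    """Monte Carlo estimate of the probability of ruin before time T."""
    rng = np.random.default_rng(seed)
    ruined = 0
    for _ in range(n_paths):
        t, claims = 0.0, 0.0
        while True:
            # Waiting time until the next claim (Poisson arrivals)
            t += rng.exponential(1.0 / claim_rate)
            if t > T:
                break                      # survived the whole horizon
            claims += rng.exponential(claim_mean)
            # Surplus can only drop below zero at a claim time
            if u + premium_rate * t - claims < 0:
                ruined += 1
                break
    return ruined / n_paths

# Hypothetical parameters: premium income 20% above expected claims
p_low_capital = ruin_probability(u=0.0, T=50.0, premium_rate=1.2,
                                 claim_rate=1.0, claim_mean=1.0)
p_high_capital = ruin_probability(u=10.0, T=50.0, premium_rate=1.2,
                                  claim_rate=1.0, claim_mean=1.0)
print(p_low_capital, p_high_capital)
```

Evaluating the function over a grid of u and T values gives a surface of the kind described in the caption: the ruin probability falls as the initial capital grows and rises as the horizon lengthens.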