Module 4 - Data Exploration and Visualization
• Exploring Data
• Visualizing Data
Exploring Data
– Map to and from Azure ML tables
1 ABC … 12.2
2 XYZ … 13.1
5 ABC … 3.75
• Common Tasks:
– Subsetting by rows and columns
– Logical filtering of rows and columns
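The common tasks listed above can be sketched in pandas as follows. This is a minimal illustration that assumes the four-row Col1/Col2/Col3 frame used on the later slides; the thresholds (2012 and 50) are arbitrary values chosen for the example.

```python
import pandas as pd

# Illustrative frame with the same values as the tables in this module.
frame1 = pd.DataFrame({
    "Col1": [2012, 2013, 2013, 2014],
    "Col2": [14, 13, 34, 23],
    "Col3": [45, 76, 65, 47],
})

# Subsetting by columns: pass a list of column names.
cols = frame1[["Col1", "Col3"]]

# Subsetting by rows: positional slice (rows at positions 1 and 2).
rows = frame1.iloc[1:3]

# Logical filtering of rows: keep rows where the Boolean mask is True.
recent = frame1[frame1["Col1"] > 2012]

# Logical filtering of columns: keep columns whose mean exceeds 50.
wide = frame1.loc[:, frame1.mean() > 50]

print(recent)
print(wide.columns.tolist())
```

A Boolean mask has one entry per row (or per column), so the same pattern works for any condition that can be evaluated element by element.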
dplyr
library(dplyr)
Col1 Col2 Col3
2012 14 45
2013 13 76
2013 34 65
2014 23 47
import pandas as pd
import os
data_dir = r"c:\data"  # raw string keeps the backslash literal
file_name = "values.csv"
path = os.path.join(data_dir, file_name)
frame1 = pd.read_csv(path)
Col1 Col2 Col3
2012 14 45
2013 13 76
2013 34 65
2014 23 47
frame1 = frame1["Col2"]
Col1 Col2 Col3
2012 14 45
2013 13 76
2013 34 65
2014 23 47
frame1 = frame1[1:3:1]
Col1 Col2 Col3
2012 14 45
2013 13 76
2013 34 65
2014 23 47
frame1 = frame1[:3]
Col1 Col2 Col3
2012 14 45
2013 13 76
2013 34 65
2014 23 47
frame1 = frame1["Col2"][1:2]
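The four indexing expressions on the preceding slides each return a different piece of the frame. The sketch below applies them to a fresh copy of the example table and stores each result under its own name (the names col2, middle, first3 and cell are hypothetical) instead of overwriting frame1, so the results can be compared side by side.

```python
import pandas as pd

# Illustrative frame with the same values as the slide tables.
frame1 = pd.DataFrame({
    "Col1": [2012, 2013, 2013, 2014],
    "Col2": [14, 13, 34, 23],
    "Col3": [45, 76, 65, 47],
})

col2 = frame1["Col2"]        # a single column, returned as a Series
middle = frame1[1:3:1]       # rows at positions 1 and 2 (start:stop:step)
first3 = frame1[:3]          # rows at positions 0, 1 and 2
cell = frame1["Col2"][1:2]   # Col2 first, then the row at position 1

print(type(col2).__name__, type(middle).__name__)
print(cell)
```

A string inside the brackets selects a column, while a slice selects rows by position; the stop value of the slice is excluded.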
Col1 Col2 Col3
2012 14 45
2013 13 76
2013 34 65
2014 23 47
frame1 = frame1.describe()
      Col1     Col2     Col3
count 4        4        4
mean  2013     21       58.25
std   0.816497 9.763879 14.863266
min   2012     13       45
25%   2012.75  13.75    46.5
…
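A minimal sketch of the describe() call, assuming the same four-row example frame. The result is itself a data frame whose index holds the statistic names, so individual statistics can be read back with .loc.

```python
import pandas as pd

# Illustrative frame with the same values as the slide table.
frame1 = pd.DataFrame({
    "Col1": [2012, 2013, 2013, 2014],
    "Col2": [14, 13, 34, 23],
    "Col3": [45, 76, 65, 47],
})

# One row per summary statistic, one column per numeric column.
summary = frame1.describe()

print(summary.index.tolist())
print(summary.loc["mean", "Col3"])
```

Storing the summary under a new name, as here, keeps the original data available; the slide version overwrites frame1 with the summary.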
Col1 Col2 Col3
2012 14 45
2013 13 76
2013 34 65
2014 23 47
Col2 Col3 Col4
14 45 59
13 76 89
34 65 99
23 47 70
Other Operations
isnull()
groupby(key|expression, axis)
copy()
where(Boolean)
pandas.DataFrame.apply(function, axis)
pandas.Series.map(function | dictionary | series)
pandas.DataFrame.applymap(function) (renamed DataFrame.map in pandas 2.1)
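The operations listed above can be tried on the example frame as sketched below. The lambda functions, the label dictionary and the threshold of 20 are hypothetical choices made for illustration. Because applymap was renamed to DataFrame.map in pandas 2.1, the sketch picks whichever method the installed version provides.

```python
import pandas as pd

# Illustrative frame with the same values as the slide tables.
frame1 = pd.DataFrame({
    "Col1": [2012, 2013, 2013, 2014],
    "Col2": [14, 13, 34, 23],
    "Col3": [45, 76, 65, 47],
})

# apply: call a function once per column (axis=0) or once per row (axis=1).
col_range = frame1.apply(lambda col: col.max() - col.min(), axis=0)
row_total = frame1.apply(lambda row: row["Col2"] + row["Col3"], axis=1)

# Series.map: transform one column element by element, here with a dictionary.
labels = frame1["Col1"].map({2012: "old", 2013: "mid", 2014: "new"})

# Element-wise function over the whole frame (applymap, or map in newer pandas).
elementwise = frame1.map if hasattr(frame1, "map") else frame1.applymap
halved = elementwise(lambda v: v / 2)

# where: keep values where the condition is True, NaN elsewhere.
masked = frame1.where(frame1 > 20)

# isnull: Boolean frame marking missing values; sum() counts them per column.
missing_per_column = masked.isnull().sum()

# copy: an independent copy, so later edits do not touch frame1.
backup = frame1.copy()
backup["Col2"] = 0

print(row_total.tolist())
print(missing_per_column.to_dict())
```

The row-wise apply reproduces the Col4 = Col2 + Col3 values shown in the table before this slide.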
Col1 Col2 Col3
2012 14 45
2013 13 76
2013 34 65
2014 23 47
frame1 = frame1.groupby("Col1").sum()
Col1 Col2 Col3
2012 14 45
2013 47 141
2014 23 47
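A short sketch of the groupby call, again assuming the four-row example frame. After grouping, Col1 becomes the index of the result; reset_index() turns it back into an ordinary column if that is more convenient.

```python
import pandas as pd

# Illustrative frame with the same values as the slide table.
frame1 = pd.DataFrame({
    "Col1": [2012, 2013, 2013, 2014],
    "Col2": [14, 13, 34, 23],
    "Col3": [45, 76, 65, 47],
})

# Sum Col2 and Col3 within each distinct value of Col1.
totals = frame1.groupby("Col1").sum()

# Col1 is now the index; move it back to a regular column.
flat = totals.reset_index()

print(totals)
```

The two 2013 rows collapse into one, which is why the result has three rows instead of four.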
R Data Frames in Azure ML
[Diagram: Azure ML Dataset, Azure ML Table, Execute R Script, Data Frame, ports 1 and 2, R Device Port]
Python Data Frames in Azure ML
[Diagram: Azure ML Dataset, Azure ML Table, Data Frame, ports 1 and 2, Device Port]
Data Types and Metadata
• Data types
• Continuous and discrete values
• Categorical variables
• Azure ML tools
• Quantization of categorical variables
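Outside the Azure ML tools, the same ideas can be tried in pandas. The sketch below assumes that "quantization" here means binning a continuous column into a small set of categories; the column name label, the bin edges and the bin names are hypothetical.

```python
import pandas as pd

# Small illustrative frame: one numeric column and one string column.
frame = pd.DataFrame({
    "Col2": [14, 13, 34, 23],
    "label": ["ABC", "XYZ", "ABC", "ABC"],
})

# Mark a string column as categorical (a fixed set of levels).
frame["label"] = frame["label"].astype("category")

# Quantize a continuous column into three ordered bins:
# (0, 15] -> low, (15, 25] -> mid, (25, 40] -> high.
frame["Col2_bin"] = pd.cut(
    frame["Col2"],
    bins=[0, 15, 25, 40],
    labels=["low", "mid", "high"],
)

print(frame.dtypes)
print(frame["Col2_bin"].tolist())
```

Inspecting dtypes after each conversion is a quick way to confirm that the metadata matches what a later modeling step expects.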
Azure ML Table Data Types
• Numeric: Floating Point
• Numeric: Integer
• Boolean
• String
• Categorical
• Date-time
• Time-Span
• Image
• Scatter
• Scatter plot matrix
• Line plots
• Bar plots
• Histograms
• Box plots
• Violin plots
• Q-Q plots
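Of the plot types listed above, the Q-Q plot is probably the least self-explanatory, so a sketch of how its points are computed may help. A Q-Q normal plot pairs each sorted sample value with the standard normal quantile at the matching probability. The helper name qq_normal_points and the plotting position (i + 0.5) / n are assumptions; other conventions for the plotting position exist.

```python
from statistics import NormalDist


def qq_normal_points(sample):
    """Return (theoretical quantile, sample value) pairs for a Q-Q normal plot."""
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Probability assigned to the i-th smallest value (one common convention).
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


# Col3 values from the example table.
points = qq_normal_points([45, 76, 65, 47])
for q, value in points:
    print(round(q, 3), value)
```

If the sample is close to normally distributed, these points fall near a straight line when plotted.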
Aesthetics for visualization
• Allow projection of additional dimensions
• But don't overdo it!
• Color
• Shape
• Size
• Transparency
• Aesthetics specific to plot type
Scatter plot
Scatter plot (larger point size)
Scatter plot (+ color by category)
Scatter plot (+ shape by category)
Scatter plot (+ alpha = 0.3)
Scatter plot matrix
Line plot
Bar Plot - unordered
Bar Plot - ordered
Histogram
Box Plot (group by category)
Violin Plot (group by category)
Q-Q Normal Plot
Conditioned Plots
1. Import library
library(ggplot2)
geom_bar
geom_boxplot
geom_histogram
geom_line
geom_point
stat_smooth
stat_hexbin
ggplot2 Options and Aesthetics
facet_grid()
xlab(), ylab()
ggtitle()
shape
color
alpha
size
Execute R Script
Azure ML Tables, zip file
source("src/myScript.R")
maml.mapOutputPort("myFrame")
plots
5. Save figure
fig1.savefig('scatter2.png')
Python Plotting in Azure ML
def azureml_main(frame1):
• ax – pyplot axis
• x, y – coordinates
• color – line or symbol color
• s – size by value
• shape
• alpha – transparency
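The parameters listed above map onto a matplotlib scatter call roughly as sketched below. This assumes matplotlib is the plotting library in use and that "shape" corresponds to matplotlib's marker argument; the color, marker and size scaling are arbitrary example values.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative frame with the same values as the slide tables.
frame1 = pd.DataFrame({
    "Col1": [2012, 2013, 2013, 2014],
    "Col2": [14, 13, 34, 23],
    "Col3": [45, 76, 65, 47],
})

fig, ax = plt.subplots()          # ax: the pyplot axis to draw on
points = ax.scatter(
    frame1["Col2"].to_numpy(),    # x coordinates
    frame1["Col3"].to_numpy(),    # y coordinates
    color="steelblue",            # symbol color
    s=frame1["Col2"].to_numpy() * 5,  # size by value
    marker="^",                   # shape
    alpha=0.3,                    # transparency
)
ax.set_xlabel("Col2")
ax.set_ylabel("Col3")
ax.set_title("Scatter plot (+ alpha = 0.3)")
```

Each aesthetic projects one more dimension onto the plot, so using size and color together on four points is already close to the limit of what stays readable.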
Execute Python Script
Azure ML Tables, zip file
import my_package
return myFrame
fig.savefig('fig.png')
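Putting the pieces from this slide together, an Execute Python Script body might look like the sketch below. The function name azureml_main, the argument frame1 and the file name fig.png come from the slides; the column names, the plot and the summary step are assumptions made for illustration. Some versions of the module may expect the returned frame wrapped in a sequence, so check the module documentation before relying on the return form shown here.

```python
import matplotlib
matplotlib.use("Agg")  # render to a file, no display needed
import matplotlib.pyplot as plt
import pandas as pd


def azureml_main(frame1):
    """Plot two columns of the input frame and return a summary frame."""
    fig, ax = plt.subplots()
    ax.scatter(frame1["Col2"], frame1["Col3"], alpha=0.3)
    ax.set_xlabel("Col2")
    ax.set_ylabel("Col3")

    # Save the figure to the working directory as a PNG file.
    fig.savefig("fig.png")
    plt.close(fig)

    # Return a data frame (here, summary statistics of the input).
    my_frame = frame1.describe()
    return my_frame


# Local check with the example table; in Azure ML the module supplies frame1.
sample = pd.DataFrame({
    "Col1": [2012, 2013, 2013, 2014],
    "Col2": [14, 13, 34, 23],
    "Col3": [45, 76, 65, 47],
})
result = azureml_main(sample)
print(result.loc["mean"])
```

Running the function locally on a small frame, as in the last lines, is a cheap way to catch errors before pasting the script into the module.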