Machine Learning
Why use Orange
Sieve Diagram
A sieve diagram shows the frequencies in a two-way contingency table in relation to the expected frequencies under independence.
It highlights the pattern of association between the row and column variables.
A unit square is divided into rectangles, one for each cell in the contingency table.
The height of each rectangle in row i is proportional to the marginal frequency of that row, f_{i+}.
The width of each rectangle in column j is proportional to the marginal frequency of that column, f_{+j}.
The area of each rectangle is therefore proportional to the expected frequency under independence, e_{ij} = f_{i+} · f_{+j} / n.
The observed frequency in each cell is shown by the number of squares drawn in that rectangle.
The difference between observed and expected frequency appears as the density of shading, using color to indicate whether the deviation from independence is positive (blue) or negative (red); a small numeric sketch of these quantities follows below.
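As a rough numeric sketch of the quantities behind the sieve diagram, the snippet below computes the marginal totals, the expected frequencies under independence, and the observed-minus-expected deviations for a small made-up contingency table (the counts, the 2x3 shape, and the variable roles are illustrative assumptions, not data from these slides).

```python
import numpy as np

# Hypothetical 2x3 contingency table of observed counts f_ij
# (rows and columns stand for two categorical variables)
observed = np.array([[20, 15,  5],
                     [10, 25, 25]])

n = observed.sum()                    # grand total
row_marginals = observed.sum(axis=1)  # f_{i+}, sets the rectangle heights
col_marginals = observed.sum(axis=0)  # f_{+j}, sets the rectangle widths

# Expected frequency under independence: e_{ij} = f_{i+} * f_{+j} / n
expected = np.outer(row_marginals, col_marginals) / n

# Positive deviations would be shaded blue, negative ones red
deviation = observed - expected
print(expected)
print(deviation)
```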
What do we see?
Performance
Performance discretized into four bins: <2709.258257, 2709.258257–2783.99236, 2783.99236–2857.613382, >2857.613382
Using a scatter plot
Low performance C1
Low performance C2
There are more low-performance instances in C2, say below 2400.
Finding Informative Projections
Scoring the plots for C2
Observe that location 73 seems to feature in all of the top-scoring plots, and that most of the lowest-performance instances occur at tools 3 and 8 (a simple way to score such two-variable projections is sketched below).
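Orange ranks projections with its own VizRank-style scoring inside the Scatter Plot widget; as a rough stand-in, the sketch below ranks feature pairs by how well they separate the performance classes using a silhouette score. The file name, the feature columns, and the performance_class column are all assumptions made for the example.

```python
from itertools import combinations

import pandas as pd
from sklearn.metrics import silhouette_score

# Hypothetical data set with numeric features and a discretized performance class
df = pd.read_csv("c2_performance.csv")       # assumed file name
features = ["tool", "location", "shift"]     # assumed numeric columns

scores = []
for fx, fy in combinations(features, 2):
    # Higher silhouette -> the classes separate better in this 2-D projection
    s = silhouette_score(df[[fx, fy]], df["performance_class"])
    scores.append(((fx, fy), s))

# Print the projections from most to least informative
for pair, s in sorted(scores, key=lambda item: item[1], reverse=True):
    print(pair, round(s, 3))
```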
High Performance C2
Location 15 seems to feature in most of the top-scoring plots, in particular with tool 4.
High performance C1
Tool 3 at location 73 also seems to produce high-performance results.
So can we conclude that tool 3 is erratic?
Tool 7 at location 15
Performance Analysis: Beach Ball
Soccer Asian Championship
Data Collection using apps on Android
StatWatch
StatMine in Play Store
Data
Importing Data
Check the data in the Data Table widget
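In the workflow this is done with the File and Data Table widgets; the same check in an Orange script, assuming a hypothetical CSV of the collected statistics, might look like this:

```python
import Orange

# Load the collected match statistics; the file name is hypothetical
data = Orange.data.Table("soccer_stats.csv")

print(data.domain)   # attributes, class variable and metas
print(data[:5])      # first few rows, as the Data Table widget would show them
```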
Hierarchical Clustering
One of the simplest techniques for clustering data.
Find the nearest neighbor by measuring distance, for example using the Euclidean distance, calculated as
d(x, y) = sqrt( Σ_i (x_i - y_i)^2 ).
The shorter the distance, the more similar the two instances are.
In the beginning, every instance is in its own cluster.
Then we look for the closest pair of instances in the plot.
We merge that closest pair into a single new cluster.
Now we repeat the process: find whatever is closest to the new cluster, add it to the cluster, and look for the next closest instance.
We repeat this procedure until all the instances are grouped into one single cluster (see the sketch after this list).
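A minimal sketch of this bottom-up procedure, assuming SciPy is available and using made-up toy data, builds the clustering from pairwise Euclidean distances and then cuts the result into two clusters:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

# Toy data: each row is an instance, each column a numeric feature
X = np.array([[1.0, 2.0],
              [1.2, 1.9],
              [5.0, 6.1],
              [5.3, 5.8],
              [9.0, 0.5]])

# Pairwise Euclidean distances d(x, y) = sqrt( sum_i (x_i - y_i)^2 )
dists = pdist(X, metric="euclidean")

# Agglomerative clustering: start with singletons and repeatedly
# merge the two closest clusters until only one cluster remains
Z = linkage(dists, method="single")

# Cut the dendrogram into two clusters, as in the Orange workflow
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```

Single linkage merges clusters on their closest pair of points, which matches the nearest-neighbor description above; Orange's Hierarchical Clustering widget also offers average, complete, and Ward linkage.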
Distances Widget
Hierarchical Clustering
Two Clusters
Comparison
Machine Vision Inspection at Attach Work Center
Data Description
The data obtained consists of images classified into 10 different categories:
6310
6311
6314
6316
6323
6325
6326
6930
6983
6994
The images taken were inconsistent in terms of orientation, illumination, and homogeneity.
Each category consisted of only 4 to 6 images.
Example images: categories 6314 and 6316
Methodology
Image embedding was first applied to all images, generating 4096 features for each image (a stand-in embedding sketch is given below).
Four images were separated as test samples
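The slides do not name the embedder, but a 4096-dimensional output is consistent with the fully connected layers of a VGG16-style network, so the sketch below uses torchvision's VGG16 as a stand-in; in the Orange workflow itself this step corresponds to the Image Embedding widget from the Image Analytics add-on. The image path is hypothetical.

```python
import torch
from PIL import Image
from torchvision import models, transforms

# VGG16 as a stand-in embedder: drop the final classification layer and
# keep the 4096-dimensional activations of the last hidden layer
model = models.vgg16(weights=models.VGG16_Weights.DEFAULT)
model.classifier = torch.nn.Sequential(*list(model.classifier.children())[:-1])
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def embed(path):
    """Return a 4096-dimensional feature vector for one image file."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return model(img).squeeze(0).numpy()

features = embed("images/6310/sample_01.png")   # hypothetical path
print(features.shape)                           # (4096,)
```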
Image features after embedding
Image Training
The images of 6326 are confused with some of the images of 6325 and 6323.
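The training and evaluation shown here use Orange's Test & Score and Confusion Matrix widgets; a scripted equivalent on the embedded features, with assumed file names and a 3-fold cross-validation, might look like this sketch with scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB

# Assumed files: X holds the 4096-dim embeddings, y the category labels
X = np.load("train_embeddings.npy")   # shape (n_images, 4096)
y = np.load("train_labels.npy")       # e.g. "6310", "6325", "6326", ...

models = [("Logistic Regression", LogisticRegression(max_iter=1000)),
          ("Naive Bayes", GaussianNB())]

for name, model in models:
    # Cross-validated predictions; with 4-6 images per class, folds stay small
    pred = cross_val_predict(model, X, y, cv=3)
    print(name)
    print(confusion_matrix(y, pred))  # rows: actual category, columns: predicted
```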
Image prediction
It can be seen that logistic regression and naïve Bayes predicted all the images correctly. From the confusion matrix, these categories have the highest proportion of actual images predicted correctly.
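A matching prediction step for the four held-out test images might look like the sketch below, again with assumed file names.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

X_train = np.load("train_embeddings.npy")   # hypothetical files, as above
y_train = np.load("train_labels.npy")
X_test = np.load("test_embeddings.npy")     # embeddings of the four test images

for name, model in [("Logistic Regression", LogisticRegression(max_iter=1000)),
                    ("Naive Bayes", GaussianNB())]:
    model.fit(X_train, y_train)
    print(name, model.predict(X_test))      # predicted category per test image
```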