CS215 LectureSlidesSet2 IntroductionToMachineLearning AI
Social Issues in the
Information Age
Jacob Levman, PhD
Associate Professor
Department of Computer Science
St. Francis Xavier University
Winter, 2024
Instructor
• Dr. Jacob Levman
Associate Professor, Department of Computer Science, St. Francis Xavier University
Applications
Learning
[Diagram: examples of interest and examples not of interest are fed to Machine Learning; for a new sample, the result is that the sample is either of interest or not.]
Feedback critical for future tech
MACHINE LEARNING BEHAVIOUR CHANGES WITH TRAINING DATA!
NEEDS RE-VALIDATION
Except on USS Voyager?
HOW TO HANDLE THIS PROBLEM?
IN CURRENT PRACTICE WE REMOVE THE ‘LIVE’ FEEDBACK
VALIDATION OCCURS ONCE
AI behaviour doesn’t change while in operation
Updates require extensive re-validation, new version release
Implications of removing live feedback
Data-driven approach
• Collect a dataset with example measurements (could be an
image for example) and labels
• Use machine learning to train a classifier
• Evaluate classifier on withheld set of test images
• Simple example of what API code will look like:
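The code shown on the original slide did not survive extraction. As a minimal sketch of the kind of two-function API the steps above imply, assuming hypothetical function names `train` and `predict`, made-up toy measurements, and a deliberately simple nearest-class-mean classifier standing in for "machine learning":

```python
from collections import defaultdict


def train(measurements, labels):
    """Hypothetical training step: store the mean measurement vector per class."""
    sums = {}
    counts = defaultdict(int)
    for x, y in zip(measurements, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    # The "model" is just a dictionary: label -> mean vector.
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(model, test_measurements):
    """Assign each test sample the label of the closest class mean."""
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return [
        min(model, key=lambda y: squared_distance(model[y], x))
        for x in test_measurements
    ]


# Toy data (invented for illustration only).
train_x = [[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [3.8, 4.0]]
train_y = ["not of interest", "not of interest", "of interest", "of interest"]

# Withheld test set: never seen during training.
test_x = [[0.9, 1.1], [4.1, 3.9]]
test_y = ["not of interest", "of interest"]

model = train(train_x, train_y)
predicted = predict(model, test_x)
accuracy = sum(p == t for p, t in zip(predicted, test_y)) / len(test_y)
print(predicted, accuracy)
```

The point of the split is that `predict` is only ever scored on samples that `train` never saw.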
We package all of that up with fair statistical
comparisons, feature selection, and comparison
between many prominent ML technologies,
with a spreadsheet as the input to the
program (no programming required).
https://www.medcalc.org/manual/roc-curves.php
• Area under the ROC curve (AUC): a robust metric for the separation between
two groups; it assesses diagnostic (Dx) potential without knowing the operating
point in advance
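One way to see why AUC needs no operating point: it equals the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as half. A minimal sketch under that definition, assuming a hypothetical helper name `auc`, labels coded as 1 and 0, and made-up scores:

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for the positive group, 0 for the negative group.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one sample from each group")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Toy classifier outputs (invented for illustration only).
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
example_auc = auc(scores, labels)
print(example_auc)
```

No threshold appears anywhere in the calculation: only the ranking of the scores matters. An AUC of 1.0 means perfect separation and 0.5 means no better than chance.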
[Figure: scatter plot of loan amount (Loan$, $0 to $200,000) against Age (15 to 65), with points labelled Non-Default or Default.]
K-NN
[Figure: scatter plot of loan amount (Loan$, $0 to $250,000) against Age (15 to 65), with points labelled Non-Default or Default.]
• Strengths
• Simple and intuitive
• Effective (in a basic way!)
• Flexible decision boundary
• Weaknesses
• Easily misled by noise
• Easily misled by irrelevant features
• Must choose a distance function (Euclidean is often too simplistic)
• Vulnerable to high dimensionality problems
• Computation costs can be high
• Many irrelevant distances to distant training samples are computed
though unused
• How to handle unbalanced distributions (more of one group than the
other)?
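A minimal K-NN sketch, assuming Euclidean distance, a majority vote among the k nearest training samples, a hypothetical function name `knn_predict`, and made-up (age, loan in $1000s) values that are not taken from the figures:

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Label a query by majority vote among its k nearest training samples."""
    # A distance to every training sample is computed, even the far ones
    # that never take part in the vote.
    distances = sorted(
        ((math.dist(p, query), label) for p, label in zip(train_points, train_labels)),
        key=lambda pair: pair[0],
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy training data: (age, loan amount in $1000s), invented for illustration.
points = [(25, 150), (30, 180), (28, 160), (50, 40), (55, 60), (60, 50)]
labels = ["Default", "Default", "Default",
          "Non-Default", "Non-Default", "Non-Default"]

young_large_loan = knn_predict(points, labels, (27, 170))
older_small_loan = knn_predict(points, labels, (52, 45))
print(young_large_loan, older_small_loan)
```

Note that the two features are on different scales, so the loan amount can dominate a plain Euclidean distance. This is one reason the choice of distance function (or rescaling the features first) matters.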
• Results:
• Computer Engineering
• Electrical and Computer Engineering
• Medical Biophysics (Physics Stream)
• Imaging Research Postdoc
• Biomedical Engineering Postdoc
• Neuroscience Postdoc
Bringing Together Disparate Fields
Research Outline
• Physics Research
• Computational Neuroscience Research
• Machine Learning Research
• Neuroscience Research
Active and Passive fMRI for Presurgical Mapping of Motor and Language Cortex
By Bradley Goodyear, Einat Liebenthal and Victoria Mosher
DOI: 10.5772/58269
Research - fMRI
Research – Diffusion MRI
• Diffusion MRI based on Doppler effect
• Inherently less signal when based on phase shift
• Diffusion measurements acquired in many directions
• Reliability can be a challenge
• Particularly for making pretty tractograms
I Fragata, et al., Early Prediction of Delayed Ischemia …, Stroke 2017, 48(8): 2091-2097.
Research - Diffusion MRI
Computational Neuroscience Research
Machine Learning Applications:
PICUs
• Predict cardiac arrest before it happens
• Predict renal failure before it happens
• Predict any actionable circumstances in the clinic
Machine Learning Research
Modelling the Radiologist
• Reliably predict when machines can equal or outperform the radiologist
• Report triage statistics, such as percentiles describing where a given sample/patient falls
relative to the training data (e.g. the machine can report that 100% of patients this extreme have
autism, that 94% of patients presenting like this turn out to have multiple sclerosis, or that 100% of
patients with this kind of image profile are healthy or would be called normal by a radiologist)
• Eventually machines will be handling more and more of the radiologist's workload
Machine Learning Research
Video Processing
Machine Learning Research
COVID-19 Detection from Lung CT
Machine Learning Research
Brain MRI
• FreeSurfer measurements
• Talk about how future work will entail looking into fMRI and
diffusion difference abnormalities in a variety of medical
conditions
Acknowledgements
• Clustering: a technique demanded by many real-world tasks
– Bank/Internet Security: fraud/spam pattern discovery
– Biology: taxonomy of living things such as kingdom, phylum, class, order, family,
genus and species
– City-planning: Identifying groups of houses according to their house type, value, and
geographical location
– Climate change: understanding earth’s climate, finding atmospheric and oceanic
weather change patterns
– Finance: stock clustering analysis to uncover correlation underlying shares
– Image Compression/segmentation: coherent pixels grouped
– Information retrieval/organisation: Google search, topic-based news
– Land use: Identification of areas of similar land use in an earth observation database
– Marketing: Help marketers discover distinct groups in their customer bases, and
then use this knowledge to develop targeted marketing programs
– Social network mining: special interest group automatic discovery
• Imaging: in imaging, unsupervised learning is often called image segmentation
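To make the idea of grouping coherent pixels concrete, here is a minimal sketch of unsupervised clustering on pixel intensities. It assumes k-means as the clustering method (the slides do not name one), a hypothetical function name `kmeans_1d`, fixed starting centres, and made-up intensity values:

```python
def kmeans_1d(values, centers, iterations=10):
    """Cluster 1-D values (e.g. pixel intensities) around the given centres."""
    centers = list(centers)

    def nearest_center(v):
        return min(range(len(centers)), key=lambda i: abs(v - centers[i]))

    for _ in range(iterations):
        # Assignment step: each value joins its nearest centre.
        groups = [[] for _ in centers]
        for v in values:
            groups[nearest_center(v)].append(v)
        # Update step: each centre moves to the mean of its group
        # (an empty group keeps its previous centre).
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]

    assignments = [nearest_center(v) for v in values]
    return centers, assignments


# Toy greyscale intensities (0 to 255), invented for illustration:
# dark background pixels mixed with bright foreground pixels.
pixels = [12, 15, 10, 200, 210, 205, 14, 198]
centers, segments = kmeans_1d(pixels, centers=[0, 255])
print(centers, segments)
```

No labels are supplied at any point: the two segments emerge from the intensities alone, which is what makes the method unsupervised.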
https://www.mathworks.com/discovery/image-segmentation.html
Data courtesy of Boston
Children’s Hospital, Harvard
Medical School
Intro to Validation
• Ideal validation:
• Assessment on many independently acquired datasets
• Challenges: independent dataset often not available to researcher
• Alternative: Publish on a single dataset, validation by other researchers comparing
their work to your publication
• How to have confidence in self assessed ML performance on a single dataset?
• Validation!
• K-fold validation
• Randomized trials
• Leave one out
• Efron’s bootstrap
• Metrics to assess performance
• AUC
• OA
• Sensitivity
• Specificity
• PPV
• NPV
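Most of the metrics listed above follow directly from the four cells of a binary confusion matrix. A minimal sketch, assuming labels coded as 1 (positive) and 0 (negative), reading OA as overall accuracy, and using a hypothetical function name `binary_metrics` with made-up predictions:

```python
def binary_metrics(true_labels, predicted_labels):
    """Compute confusion-matrix metrics for labels coded 1 (positive) / 0 (negative)."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(numerator, denominator):
        # Return None when a metric is undefined (empty denominator).
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # positives correctly found
        "specificity": ratio(tn, tn + fp),  # negatives correctly found
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
        "oa": ratio(tp + tn, len(pairs)),   # overall accuracy
    }


# Toy example (invented): 4 positives and 6 negatives.
truth     = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
metrics = binary_metrics(truth, predicted)
print(metrics)
```

Unlike AUC, all of these depend on the chosen operating point, because they are computed from hard predictions rather than from scores.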
Background – Validation in Supervised
Learning: Leave-one-out validation
• Example on whiteboard
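The whiteboard example itself is not part of these slides. As a stand-in sketch of leave-one-out validation, assuming a 1-nearest-neighbour classifier, hypothetical function names, and made-up 1-D data: each sample is withheld in turn, the classifier is built from the remaining samples, and the withheld sample is used as the test case.

```python
def nearest_neighbour_label(train_x, train_y, query):
    """1-NN: return the label of the closest training value."""
    best = min(range(len(train_x)), key=lambda i: abs(train_x[i] - query))
    return train_y[best]


def leave_one_out_accuracy(xs, ys):
    """Withhold each sample once, train on the rest, test on the withheld one."""
    correct = 0
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]  # everything except sample i
        train_y = ys[:i] + ys[i + 1:]
        if nearest_neighbour_label(train_x, train_y, xs[i]) == ys[i]:
            correct += 1
    return correct / len(xs)


# Toy data (invented): the last "B" sample sits close to the "A" group.
xs = [1.0, 1.5, 2.0, 6.0, 6.5, 7.0, 3.0]
ys = ["A", "A", "A", "B", "B", "B", "B"]
loo_accuracy = leave_one_out_accuracy(xs, ys)
print(loo_accuracy)
```

With n samples this is the same as K-fold validation with K = n: every sample is tested exactly once, and no sample is ever used to train the classifier that tests it.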
Supervised Learning Example
Revisited