Fall2024 W4995 Lecture1
Fall 2024
Lecture 1
Dr. Vijay Pappu
A little about me...
B.Tech Ph.D.
● Course grading:
○ 5 programming assignments - 50%
○ 1 in-class midterm - 20%
○ 1 project - 30%
https://en.wikipedia.org/wiki/Machine_learning
I like this definition better…
https://en.wikipedia.org/wiki/Machine_learning
Heuristics vs. ML system
Siamese - angular-looking
with black and tan coloring.
Calico
Heuristics ML system
The short answer is…
EVERYWHERE
(Obvious) Examples of Machine Learning
(Not so obvious) Examples of Machine Learning
Computer vision identifies crop health issues on a tray of arugula.
Example of the various recipes being tested for one crop over time.
https://boweryfarming.com/artificial-intelligence/
(Not so obvious) Examples of Machine Learning
https://www.nature.com/articles/d41586-020-03348-4
Until the 1990s...
Recently...
One of the reasons...
Supervised Learning
● Supervised learning algorithms learn a function that maps inputs to an output
from a set of labeled training data.
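As a minimal sketch of this idea, the snippet below learns a straight-line function from a handful of labeled pairs using ordinary least squares. The data and the names `fit_line` and `predict` are hypothetical and chosen only for illustration; least squares is just one of many supervised methods.

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept to labeled pairs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# Hypothetical labeled training data: each input x comes with a known output y.
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]  # generated by y = 2x + 1

slope, intercept = fit_line(train_x, train_y)

def predict(x):
    """Apply the learned function to a new, unlabeled input."""
    return slope * x + intercept

print(slope, intercept, predict(5.0))
```

The labels in `train_y` are what make this supervised: the fit is judged against known outputs.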
Unsupervised Learning
● Unsupervised learning algorithms learn patterns from unlabeled data samples.
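One way to make this concrete is clustering. The sketch below runs a bare-bones one-dimensional k-means on made-up points with no labels attached. The function name `kmeans_1d`, the data, and the starting centers are all assumptions for illustration.

```python
def kmeans_1d(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and re-averaging."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

# Hypothetical unlabeled samples that visibly form two groups.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
centers, clusters = kmeans_1d(points, centers=[0.0, 6.0])
print(centers, clusters)
```

No target values are supplied; the grouping comes only from the structure of the samples.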
Reinforcement Learning
Deep Learning
● Deep learning is a class of ML algorithms that uses multiple layers to
progressively extract higher-level features/abstractions from raw inputs.
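The layering can be sketched as a tiny two-layer forward pass. The weights below are hand-picked, hypothetical values rather than learned ones, and `dense` and `relu` are illustrative helper names; a real network would learn its weights from data and use many more units.

```python
def relu(values):
    """Element-wise rectified linear activation."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """Fully connected layer: each output is a weighted sum of all inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

# Hypothetical weights for illustration only.
W1 = [[1.0, -1.0], [0.5, 0.5]]
b1 = [0.0, 0.0]
W2 = [[1.0, 2.0]]
b2 = [0.5]

raw_input = [2.0, 1.0]
hidden = relu(dense(raw_input, W1, b1))  # first layer: features of the raw input
output = dense(hidden, W2, b2)           # second layer: built on the hidden features
print(hidden, output)
```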
What about others?
● Active learning
● Self-supervised learning
● Transfer learning
● Generative AI
● …
Large Language Models (LLMs)
● Large Language Models (LLMs) are a subset of deep learning models trained on a massive corpus of
text data.
● LLMs perform extremely well on a wide range of natural language tasks:
○ Natural language understanding - excel at tasks like sentiment analysis, NER & Q&A
○ Text generation - generate human-like text for chatbots and other content generation tasks
● LLMs typically consist of billions of parameters and are built on the Transformer architecture
Computational power [1]
Big data
Breakthrough in Deep Learning
[1] - https://www.offgridweb.com/preparation/infographic-the-growth-of-computer-processing-power/
One example...
Computers have become powerful and accessible...
Data is publicly available…
https://datasetsearch.research.google.com/
https://www.kaggle.com/datasets
Access to ML is being democratized…
Ethics
Explainability
Python is the de-facto language for ML
http://r4stats.com/articles/popularity/
Great suite of mature libraries for ML tasks
Course schedule
Lecture 5
1. Model evaluation
2. Calibration
3. Automatic machine learning
Midterm
Lecture 12
1. ML in production
2. Course Recap
Questions?
Let’s take a 10 min break!
Exploratory Data Analysis
&
Visualization
Exploratory Data Analysis (EDA) is an approach of analyzing
datasets to summarize their main characteristics, often using
statistical graphics and other data visualization methods.
https://en.wikipedia.org/wiki/Exploratory_data_analysis
Why do we do EDA?
● Explore
● Inform
● Communicate
Data types
● Quantitative/numerical continuous - 1, 3.5, 100, 10^10, 3.14
● Quantitative/numerical discrete - 1, 2, 3, 4
● Qualitative/categorical unordered - cat, dog, whale
● Qualitative/categorical ordered - good, better, best
● Date or time - 09/15/2021, Jan 8th 2020 15:00:00
● Text - The quick brown fox jumps over the lazy dog
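The data type of a column determines which summaries make sense during EDA. The sketch below, using only the standard library, reports mean, median and range for numerical values and counts for categorical values. The helper `summarize` and the two example columns are hypothetical.

```python
from collections import Counter
from statistics import mean, median

def summarize(values):
    """Numerical columns get mean/median/range; anything else gets category counts."""
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if is_numeric:
        return {
            "kind": "numerical",
            "mean": mean(values),
            "median": median(values),
            "min": min(values),
            "max": max(values),
        }
    return {"kind": "categorical", "counts": dict(Counter(values))}

# Hypothetical columns: one quantitative, one qualitative (unordered).
weights_kg = [4.0, 5.5, 3.5, 6.0]
species = ["cat", "dog", "cat", "whale"]

numeric_summary = summarize(weights_kg)
category_summary = summarize(species)
print(numeric_summary)
print(category_summary)
```

Ordered categories, dates and free text would each need their own handling; this sketch deliberately covers only the two simplest cases.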
Data Visualization
Ugly, Bad & Wrong figures
● Ugly
○ A figure that has aesthetic problems but otherwise is clear and informative
● Bad
○ A figure that has problems related to perception; it may be unclear, confusing, overly
complicated, or deceiving
● Wrong
○ A figure that has problems related to mathematics; it is objectively incorrect
Aesthetics in data visualization
● Aesthetics refer to a quantifiable set of features that are mapped to the data in
a graphic.
● Aesthetics describe every aspect of a given graphical element.
● Some aesthetics, such as position, size, color and line width, work for both
continuous & discrete data, while others (shape & line type) work only for
discrete data.
Scales
● Scales are the mappings between data values and aesthetic values.
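A scale can be thought of as a plain function from data values to aesthetic values. The sketch below shows a continuous position scale and a discrete color scale. The names `linear_scale` and `ordinal_scale`, the pixel range, and the color choices are assumptions made for illustration, not any plotting library's API.

```python
def linear_scale(domain, value_range):
    """Map a continuous data interval onto a continuous aesthetic interval."""
    d0, d1 = domain
    r0, r1 = value_range

    def scale(value):
        return r0 + (value - d0) / (d1 - d0) * (r1 - r0)

    return scale

def ordinal_scale(categories, aesthetic_values):
    """Map each discrete category to one aesthetic value (e.g. a color or shape)."""
    mapping = dict(zip(categories, aesthetic_values))

    def scale(category):
        return mapping[category]

    return scale

# Hypothetical position scale: data values 0..100 drawn across 400 pixels.
x_scale = linear_scale(domain=(0, 100), value_range=(0, 400))
# Hypothetical color scale for an unordered categorical variable.
color_scale = ordinal_scale(["cat", "dog", "whale"], ["orange", "blue", "gray"])

print(x_scale(25), color_scale("dog"))
```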
Data
Aesthetics
Scales
A typical data visualization chart
● A typical data visualization chart uses three scales.
Ridgeline plot
Visualizing proportions
Visualizing proportions - pie charts, stacked & side-by-side bars
● Pie charts help visually emphasize simple fractions, such as 1/2, 1/3, 1/4, etc.
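All three chart types start from the same step: turning counts into proportions. The sketch below computes the shares and the corresponding pie wedge angles. The counts and the function names are hypothetical.

```python
def proportions(counts):
    """Convert raw counts into shares of the total."""
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}

def pie_angles(counts):
    """Wedge angle in degrees for each category in a pie chart."""
    return {label: 360.0 * share for label, share in proportions(counts).items()}

# Hypothetical counts chosen so the shares are simple fractions.
seats = {"A": 50, "B": 25, "C": 25}

shares = proportions(seats)
angles = pie_angles(seats)
print(shares)
print(angles)
```

With these counts, category A takes exactly half the pie, which is the kind of simple fraction a pie chart makes easy to see.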
Correlation coefficient (formula annotated with the sample means)
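The correlation coefficient can be computed directly from the deviations of each variable around its sample mean. The sketch below assumes the Pearson correlation coefficient is the one intended; `pearson_r` and the example data are hypothetical.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y, normalized by their spreads."""
    x_bar, y_bar = mean(xs), mean(ys)  # the sample means
    co_dev = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    dev_x = sum((x - x_bar) ** 2 for x in xs)
    dev_y = sum((y - y_bar) ** 2 for y in ys)
    return co_dev / sqrt(dev_x * dev_y)

# Hypothetical data: a perfectly increasing and a perfectly decreasing relationship.
r_up = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(r_up, r_down)
```

A correlogram displays one such coefficient for every pair of variables in the dataset.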
Visualizing X-Y relationships - correlograms
Correlations between mineral content obtained from 214 glass samples during forensic work
Visualizing uncertainty
Visualizing uncertainty - probability distribution
The blue party is predicted to win over the yellow party by ~1 percentage point with
a margin of error of 1.76 percentage points.
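A prediction like this can be turned into a probability of winning. The sketch below assumes the outcome is normally distributed around the predicted lead and that the margin of error is the half-width of a 95% interval. Neither assumption is stated on the slide.

```python
from statistics import NormalDist

lead = 1.0              # predicted lead of blue over yellow, in percentage points
margin_of_error = 1.76  # assumed to be the half-width of a 95% interval

# Under the normal assumption, a 95% half-width is about 1.96 standard errors.
z_95 = NormalDist().inv_cdf(0.975)
std_error = margin_of_error / z_95

# Distribution of the actual lead; blue wins whenever the lead is positive.
outcome = NormalDist(mu=lead, sigma=std_error)
p_blue_wins = 1.0 - outcome.cdf(0.0)
print(round(p_blue_wins, 2))
```

Under these assumptions the blue party's chance of winning comes out at roughly 0.87, so a yellow win remains plausible despite the predicted lead.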
Visualizing uncertainty - population & sample
Visualizing uncertainty - confidence intervals
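A confidence interval for a sample mean can be sketched as follows. This uses a normal approximation (a z-value rather than a t-value), which is an assumption that is only reasonable for larger samples. The sample and the name `mean_confidence_interval` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    center = mean(sample)
    std_error = stdev(sample) / sqrt(n)  # sample standard deviation over sqrt(n)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return center - z * std_error, center + z * std_error

# Hypothetical sample drawn from some larger population.
sample = [4.8, 5.1, 5.0, 4.9, 5.2]

low_95, high_95 = mean_confidence_interval(sample, level=0.95)
low_99, high_99 = mean_confidence_interval(sample, level=0.99)
print((low_95, high_95), (low_99, high_99))
```

A higher confidence level gives a wider interval, which is why charts that show uncertainty should state the level they use.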