CSE445 NSU Week - 1
- 400 data for PhD students and 300 for the MS candidates
The ML algorithm iteratively makes predictions on the training data and is corrected by the
known outputs, which act as the supervision of a "teacher".
Each training example is an input-output pair <X, y>.
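The predict-then-correct loop described above can be sketched in a few lines. This is only an illustration under assumptions: the model is a single weight w with prediction w * x, the pairs in TRAINING_DATA are made up, and the learning rate and epoch count are arbitrary choices rather than anything from the slides.

```python
# Hypothetical <X, y> pairs that follow y = 2 * x.
TRAINING_DATA = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]


def train(pairs, learning_rate=0.05, epochs=50):
    """Fit y ~ w * x by repeatedly predicting and correcting the weight."""
    w = 0.0
    for _ in range(epochs):
        for x, y in pairs:
            prediction = w * x
            error = y - prediction            # the "teacher" signal: known output minus guess
            w += learning_rate * error * x    # nudge w to reduce the error
    return w


learned_w = train(TRAINING_DATA)
print(round(learned_w, 4))
```

Each pass shrinks the gap between prediction and label, which is the sense in which the labels supervise the learner.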
Supervised Learning: Regression Problem
● Below is the table containing home prices in Monroe Township, NJ. Here the price
depends on area (square feet), number of bedrooms, and age of the home (in years).
Given these prices, we have to predict the prices of new homes based on
area, bedrooms, and age.
● Given these home prices, find the price of a home that has:
● 3000 sq ft area, 3 bedrooms, 40 years old
● 2500 sq ft area, 4 bedrooms, 5 years old
● Regression problem: continuous value output
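As a rough illustration of the regression setup above, the sketch below fits a linear model (price as a weighted sum of area, bedrooms, and age plus an intercept) by ordinary least squares. The rows in HOMES and PRICES are made-up placeholders, not the table from the slide, and the helper names features, solve, fit, and predict are hypothetical. A linear model is itself an assumption; the slide does not say which model is used.

```python
# Placeholder data: (area in sq ft, bedrooms, age in years) and price.
HOMES = [(2600, 3, 20), (3000, 4, 15), (3200, 3, 18),
         (3600, 3, 30), (4000, 5, 8), (4100, 6, 8)]
PRICES = [320000, 375000, 382000, 410000, 492000, 512000]


def features(area, bedrooms, age):
    # Intercept term plus rescaled inputs; rescaling keeps the solve well conditioned.
    return [1.0, area / 1000.0, float(bedrooms), age / 10.0]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - known) / M[i][i]
    return x


def fit(rows, prices):
    """Least squares via the normal equations (X^T X) w = X^T y."""
    X = [features(*row) for row in rows]
    k = len(X[0])
    XtX = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    Xty = [sum(x[i] * y for x, y in zip(X, prices)) for i in range(k)]
    return solve(XtX, Xty)


def predict(weights, area, bedrooms, age):
    return sum(w * f for w, f in zip(weights, features(area, bedrooms, age)))


weights = fit(HOMES, PRICES)
print(round(predict(weights, 3000, 3, 40)))
print(round(predict(weights, 2500, 4, 5)))
```

The output is a continuous number (a price), which is what makes this a regression problem rather than a classification problem.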
Supervised Learning - Classification
Regression
Unsupervised Learning
- Unlabeled dataset
- System learns without direct human supervision
- Unsupervised learning algorithms take a set of data that contains only inputs, and
find structure/patterns in the data, like grouping or clustering of data points.
- The algorithms therefore learn from data that has not been labeled, classified or
categorized
- Instead of responding to feedback, unsupervised learning algorithms identify
commonalities in the data and react based on the presence or absence of such
commonalities in each new piece of data.
- Useful for business intelligence
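To make the idea of finding groups in unlabeled inputs concrete, here is a minimal k-means sketch. K-means is just one common clustering method chosen for illustration; the slides do not name an algorithm. The points, the choice of two clusters, the starting centers, and the function names are all assumptions.

```python
# Hypothetical unlabeled 2-D points: inputs only, no target values.
POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, centers, iterations=10):
    """Alternate between assigning points to the nearest center and re-averaging."""
    labels = []
    for _ in range(iterations):
        labels = [min(range(len(centers)),
                      key=lambda i: squared_distance(p, centers[i]))
                  for p in points]
        new_centers = []
        for i, old_center in enumerate(centers):
            members = [p for p, label in zip(points, labels) if label == i]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(mean_point(members) if members else old_center)
        centers = new_centers
    return labels, centers


# Start from two of the points as initial centers (an arbitrary choice).
cluster_labels, cluster_centers = kmeans(POINTS, [POINTS[0], POINTS[3]])
print(cluster_labels)
```

No labels are supplied at any step; the grouping comes only from how close the points are to one another.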
The Cocktail Party Problem
● Encountered when sounds from different sources in the real-world mix in the air
before arriving at the ear
○ One solution could be to label a sample of the dataset and train on the labeled portion to create a model. This would however mean
that we are not fully utilizing the larger dataset we have, and thus the model that we create may be less robust
○ A potential solution: (1) Label a sample of the large dataset, (2) train a model using this labeled portion, (3) use the model to
predict the unlabeled portion (pseudo-labeling), (4) train using the entire dataset
○ e.g., Google Photos will cluster similar faces, and ask the user if they are the same person
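The four pseudo-labeling steps listed above can be sketched as follows. This is a toy illustration under assumptions: the data are made-up one-dimensional values, the model is a simple nearest-centroid classifier chosen for brevity, and the function names are hypothetical.

```python
def train_centroids(xs, ys):
    """Return the mean value of each class (a nearest-centroid model)."""
    sums, counts = {}, {}
    for x, y in zip(xs, ys):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict_label(centroids, x):
    """Pick the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


# (1) A small labeled sample and a larger unlabeled remainder (both hypothetical).
labeled_x = [1.0, 2.0, 9.0, 10.0]
labeled_y = ["a", "a", "b", "b"]
unlabeled_x = [1.5, 2.5, 8.5, 11.0]

# (2) Train a model on the labeled portion only.
initial_model = train_centroids(labeled_x, labeled_y)

# (3) Use that model to pseudo-label the unlabeled portion.
pseudo_y = [predict_label(initial_model, x) for x in unlabeled_x]

# (4) Retrain on the labeled data together with the pseudo-labeled data.
final_model = train_centroids(labeled_x + unlabeled_x, labeled_y + pseudo_y)
print(pseudo_y, final_model)
```

In practice the pseudo-labels can be wrong, so a common refinement is to keep only the predictions the model is confident about before retraining.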
Reinforcement Learning
● The learning system (agent) can:
the word “professor,” Genderify predicted a 98.4 percent probability for males.
Meanwhile, “stupid” returned a 61.7 percent female prediction
ML and Ethics (More Complex case)
- When applied to people, data mining is frequently used to discriminate - who
gets the loan, who gets the special offer, and so on.
- Certain kinds of discrimination (racial, sexual, religious and so on) are not only
unethical but also illegal.
- Using sexual and racial information for medical diagnosis is certainly ethical,
but using the same information when mining loan payment behavior is not.
ML and Ethics (Difficulty to solve it)
- Reidentification techniques have provided sobering insights into the difficulty
of anonymizing data.
- 85% of Americans can be identified from just three pieces of information: zip code, birth
date and sex.
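One way to see why removing names does not anonymize a dataset is to count how many records are unique on the three fields mentioned above. The sketch below does this for a handful of made-up records; the data, the field order, and the function name are assumptions for illustration only and say nothing about real population figures.

```python
from collections import Counter

# Hypothetical "anonymized" records: (zip code, birth date, sex), names removed.
RECORDS = [
    ("08831", "1980-01-01", "F"),
    ("08831", "1980-01-01", "F"),
    ("08831", "1975-05-12", "M"),
    ("10001", "1990-07-30", "F"),
    ("10001", "1962-11-03", "M"),
]


def unique_fraction(records):
    """Fraction of records whose field combination appears exactly once."""
    counts = Counter(records)
    unique = sum(1 for record in records if counts[record] == 1)
    return unique / len(records)


print(unique_fraction(RECORDS))
```

Any record that is unique on these fields can be matched against another source that lists the same fields alongside a name, which is the core of a reidentification attack.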