Intro To ML
Raphael Cóbe
[email protected]
Machine Learning
Links and References
Definitions
The science (or art) of programming computers so that they can learn from data;
“Field of study that gives computers the ability to learn without being explicitly
programmed.” Arthur Samuel, 1959
A deterministic algorithm follows explicit rules that map each provided input to a
result.
If the input can vary widely, the set of rules becomes very large, and the execution
time becomes infeasible.
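To make the contrast concrete, here is a minimal sketch that compares a hand-written rule with a rule whose cutoff is chosen from labeled examples. The feature (a count of suspicious words), the cutoff of 5, and the toy data are all assumptions invented for illustration.

```python
# Hand-written rule: the programmer picks the cutoff.
def is_spam_by_rule(suspicious_words):
    return suspicious_words >= 5  # hypothetical, hand-picked cutoff


# Learned rule: the cutoff is chosen from labeled examples.
def learn_threshold(counts, labels):
    """Return the cutoff that misclassifies the fewest training examples."""
    best_cutoff, best_errors = None, None
    for cutoff in sorted(set(counts)):
        # An example is predicted as spam when its count reaches the cutoff.
        errors = sum((c >= cutoff) != is_spam for c, is_spam in zip(counts, labels))
        if best_errors is None or errors < best_errors:
            best_cutoff, best_errors = cutoff, errors
    return best_cutoff


# Toy labeled data (made up): suspicious-word counts and whether the email was spam.
counts = [0, 1, 2, 3, 7, 9]
labels = [False, False, False, True, True, True]
learned_cutoff = learn_threshold(counts, labels)
```

With this toy data the hand-picked cutoff misses the spam email that has only 3 suspicious words, while the learned cutoff adapts to the examples it was given.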
Supervised Learning
Involves modeling the relationship between measured features of the data and an
associated label
Once fitted, the model can be used to assign labels to new data
Types of supervised algorithms
• Classification: labels are discrete categories
• Example: a spam filter. Emails are labeled as spam or not spam, and the model classifies new emails
• Regression: labels are continuous quantities
• Example: predicting the price of a car considering a set of predictor variables (mileage, age,
brand)
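The two task types above can be sketched in plain Python. This is an illustrative sketch only: the nearest-neighbor classifier and the least-squares line are just two simple choices of algorithm, and all data values are made up.

```python
def nearest_neighbor_label(train_points, train_labels, query):
    """Classification: return the label of the closest training point."""
    distances = [abs(p - query) for p in train_points]
    return train_labels[distances.index(min(distances))]


def fit_line(xs, ys):
    """Regression: ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Classification: suspicious-word count of an email -> discrete label.
suspicious_word_counts = [0, 1, 2, 8, 9, 12]
email_labels = ["not spam", "not spam", "not spam", "spam", "spam", "spam"]
predicted_label = nearest_neighbor_label(suspicious_word_counts, email_labels, 10)

# Regression: car age in years -> continuous price (hypothetical numbers).
car_ages = [1, 2, 3, 4]
car_prices = [18000, 16000, 14000, 12000]
slope, intercept = fit_line(car_ages, car_prices)
predicted_price = slope * 5 + intercept  # price estimate for a 5-year-old car
```

The classifier returns one of a fixed set of labels, while the regressor returns a number that can take any value on a continuous scale.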
Classification vs Regression
In a nutshell:
• Classification is the task of predicting a discrete class label.
• Regression is the task of predicting a continuous quantity.
There’s some overlap between classification and regression algorithms; for example:
• A classification algorithm can predict a continuous value, but the continuous value is in the
form of a probability for a class label.
• A regression algorithm can predict a discrete value, but the discrete value is in the form of an
integer quantity.
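The overlap described above can be illustrated with a short sketch. It assumes a logistic model whose `weight` and `bias` are hand-picked, hypothetical values (not fitted to any data), and a hypothetical regression output of 3.7.

```python
import math


def spam_probability(suspicious_words, weight=0.8, bias=-3.0):
    """Logistic model: maps a linear score to a probability in (0, 1).

    weight and bias are hypothetical, hand-picked values for illustration.
    """
    score = weight * suspicious_words + bias
    return 1.0 / (1.0 + math.exp(-score))


def classify(probability, threshold=0.5):
    """Turn the continuous probability into a discrete class label."""
    return "spam" if probability >= threshold else "not spam"


p_low = spam_probability(1)   # few suspicious words, low probability
p_high = spam_probability(8)  # many suspicious words, high probability
labels = [classify(p_low), classify(p_high)]

# A regression output can be rounded when the target is a count.
predicted_visits = 3.7  # hypothetical continuous prediction
predicted_visits_as_count = round(predicted_visits)
```

The classifier's continuous output is still interpreted as a class probability, and the regressor's discrete output is still a quantity rather than a category.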
Key Characteristics
Any problem framed as a Machine Learning problem shares some common
elements:
• Samples: rows in the dataset
• Features: columns in the dataset
• Feature Matrix: the table of samples (rows) by features (columns)
• Target vector: column to be predicted
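As a minimal sketch of these terms, the snippet below splits a small table into a feature matrix `X` and a target vector `y`. The column names and values are hypothetical, and the names `X` and `y` are only a common naming convention.

```python
# Hypothetical car dataset: each row is one sample (one car).
column_names = ["mileage_km", "age_years", "price"]
rows = [
    [30000, 2, 18000],
    [60000, 4, 14000],
    [90000, 7, 9000],
]

target_name = "price"
target_index = column_names.index(target_name)

# Feature matrix X: every column except the target, one row per sample.
X = [[value for i, value in enumerate(row) if i != target_index] for row in rows]

# Target vector y: the column to be predicted, one entry per sample.
y = [row[target_index] for row in rows]

n_samples = len(X)      # number of rows
n_features = len(X[0])  # number of feature columns
```

Here `X` has shape 3 by 2 (three samples, two features) and `y` has one value per sample.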
Preparing the data for the chosen model involves the following
activities:
• Encode categorical or ordinal data as numbers
• Change the scale of the data
• Eliminate missing values or replace them with another value
• Separate predictor variables and target variables
• Split the dataset into training and test sets
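The preparation steps listed above can be sketched end to end using only the Python standard library. The records, the integer encoding, mean imputation, min-max scaling, and the 80/20 split ratio are all assumptions chosen for illustration; other choices are equally valid.

```python
import random
from statistics import mean

# Hypothetical raw records: (brand, mileage_km, price); None marks a missing value.
records = [
    ("ford", 30000, 18000),
    ("fiat", None, 12000),
    ("ford", 90000, 9000),
    ("audi", 60000, 25000),
    ("fiat", 40000, 13000),
]

# 1. Encode the categorical column as integer codes.
brands = sorted({brand for brand, _, _ in records})
brand_code = {brand: code for code, brand in enumerate(brands)}

# 2. Replace missing mileage values with the mean of the observed ones.
observed = [m for _, m, _ in records if m is not None]
mileage_mean = mean(observed)
mileages = [m if m is not None else mileage_mean for _, m, _ in records]

# 3. Rescale mileage to the [0, 1] range (min-max scaling).
lowest, highest = min(mileages), max(mileages)
scaled = [(m - lowest) / (highest - lowest) for m in mileages]

# 4. Separate predictor variables (X) from the target variable (y).
X = [[brand_code[brand], s] for (brand, _, _), s in zip(records, scaled)]
y = [price for _, _, price in records]

# 5. Shuffle with a fixed seed, then split into training and test sets (80% / 20%).
indices = list(range(len(records)))
random.Random(0).shuffle(indices)
cut = int(0.8 * len(indices))
train_idx, test_idx = indices[:cut], indices[cut:]
X_train, y_train = [X[i] for i in train_idx], [y[i] for i in train_idx]
X_test, y_test = [X[i] for i in test_idx], [y[i] for i in test_idx]
```

The sketch follows the order of the list for readability. In practice, imputation and scaling statistics are often computed from the training split only, so that no information from the test set influences the model.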