The Machine Learning Landscape
• Common Misconception: Machine Learning = Robots (helpful or harmful)
• Reality: Machine Learning is already here (decades old)
• Examples of Existing Machine Learning Applications:
  – Optical Character Recognition (OCR)
  – Spam Filters (1990s)
• Machine Learning in Everyday Products & Features:
  – Improved Recommendations (e.g., online shopping)
  – Voice Search

Introduction
• What is Machine Learning?
  – Not simply downloading data
• Machine Learning Exploration:
  – Understanding Core Concepts
• Key Regions & Landmarks:
  – Supervised vs Unsupervised Learning
  – Online vs Batch Learning
  – Instance-based vs Model-based Learning

What Is Machine Learning?
• The science (and art) of programming computers so they can learn from data
• General Definition: Give computers the ability to learn without being explicitly programmed (Arthur Samuel, 1959)
• Engineering Definition: A program learns from experience E with respect to a task T and a performance measure P if its performance on T, as measured by P, improves with experience E (Tom Mitchell, 1997)
• Machine Learning in Action - Spam Filter Example
  – Task (T): Flag spam emails
  – Experience (E): Training data (examples of spam & non-spam emails)
  – Performance Measure (P): Accuracy (ratio of correctly classified emails)
• Machine Learning vs. Just Downloading Data
  – Downloading data (e.g., Wikipedia) doesn't make a computer learn or improve at tasks
  – Machine Learning requires using data to improve performance on a specific task

Why Use Machine Learning?
The traditional approach: a spam filter using traditional programming techniques
1. First you would look at what spam typically looks like. You might notice that some words or phrases (such as “4U,” “credit card,” “free,” and “amazing”) tend to come up a lot in the subject. Perhaps you would also notice a few other patterns in the sender’s name, the email’s body, and so on.
2. You would write a detection algorithm for each of the patterns that you noticed, and your program would flag emails as spam if a number of these patterns are detected.
3. You would test your program, and repeat steps 1 and 2 until it is good enough.
Problem: your program will likely become a long list of complex rules, which is pretty hard to maintain.

Traditional Programming:
• Pros:
  – Precise control over program logic and functionality
  – Easier to understand and interpret the code
  – More efficient for well-defined tasks with clear rules
  – Often faster for simpler tasks
• Cons:
  – Requires manual coding for every specific task
  – Can be inflexible for adapting to new data or situations
  – Difficulty in handling complex or large datasets
  – Time-consuming to modify or update code for changing requirements

Machine Learning approach
The program is much shorter, easier to maintain, and most likely more accurate.
A spam filter based on Machine Learning techniques automatically learns which words and phrases are good predictors of spam by detecting unusually frequent patterns of words in the spam examples.
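As a rough illustration of the idea, the sketch below learns a set of spam-predicting words from labeled examples and then measures accuracy, the performance measure named in the spam filter example. The training emails, the function names, and the `min_ratio` threshold are all made up for illustration; real spam filters use far more data and more careful statistics.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def learn_spam_words(emails, min_ratio=2.0):
    """Return words that are unusually frequent in spam compared with ham.

    `emails` is a list of (text, label) pairs where label is True for spam.
    `min_ratio` is an illustrative threshold, not a tuned value.
    """
    spam_counts, ham_counts = Counter(), Counter()
    for text, label in emails:
        # Count each word at most once per email.
        (spam_counts if label else ham_counts).update(set(tokenize(text)))
    n_spam = sum(1 for _, label in emails if label) or 1
    n_ham = sum(1 for _, label in emails if not label) or 1

    predictors = set()
    for word, count in spam_counts.items():
        spam_rate = count / n_spam
        ham_rate = (ham_counts[word] + 1) / (n_ham + 1)  # add-one smoothing
        if spam_rate / ham_rate >= min_ratio:
            predictors.add(word)
    return predictors


def is_spam(text, predictors, min_hits=1):
    """Flag an email if it contains at least `min_hits` learned predictors."""
    return len(predictors & set(tokenize(text))) >= min_hits


def accuracy(emails, predictors):
    """Performance measure P: ratio of correctly classified emails."""
    correct = sum(is_spam(text, predictors) == label for text, label in emails)
    return correct / len(emails)


# Experience E: a tiny, invented training set of (text, is_spam) pairs.
train = [
    ("free credit card offer 4u", True),
    ("amazing free prize 4u", True),
    ("free amazing credit card deal", True),
    ("meeting notes for monday", False),
    ("lunch on monday", False),
    ("project notes attached", False),
]

predictors = learn_spam_words(train)
print(sorted(predictors))
print(accuracy(train, predictors))
```

Note that no rule mentioning “4U” or “free” was written by hand: the words come out of the data. Accuracy is measured here on the training emails only to keep the sketch short; in practice it would be measured on emails the filter has not seen.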
If spammers notice that all their emails containing “4U” are blocked, they might start writing “For U” instead. A spam filter using traditional programming techniques would need to be updated to flag “For U” emails. If spammers keep working around your spam filter, you will need to keep writing new rules forever.

Machine Learning Approach:
• Pros:
  – Learns from data, improving performance over time
  – Can identify patterns and make predictions in complex data
  – Adapts to new data and situations without explicit programming
  – Efficient for handling large and evolving datasets
• Cons:
  – Can be less interpretable ("black box"): understanding how the model arrives at a decision can be difficult
  – Requires expertise in Machine Learning and data preparation
  – Training data can be time-consuming and expensive to collect and label
  – Performance can be unpredictable and may require ongoing fine-tuning

Automatically adapting to change
• A spam filter based on Machine Learning techniques automatically notices that “For U” has become unusually frequent in spam flagged by users, and it starts flagging such emails without your intervention.
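One way to picture this adaptation is a filter that keeps running counts of what users flag and re-derives its predictors from those counts. The class below is a hypothetical sketch: the class name, the thresholds `min_spam_share` and `min_count`, and the example emails are assumptions made for illustration, not part of any real spam filter.

```python
from collections import Counter


def features(text):
    """Words plus adjacent word pairs, so phrases like "for u" are captured."""
    words = text.lower().split()
    pairs = [" ".join(pair) for pair in zip(words, words[1:])]
    return set(words + pairs)


class AdaptiveSpamFilter:
    """Keeps running counts of user reports and derives predictors from them."""

    def __init__(self, min_spam_share=0.8, min_count=2):
        self.spam = Counter()
        self.ham = Counter()
        self.min_spam_share = min_spam_share  # illustrative threshold
        self.min_count = min_count            # illustrative threshold

    def report(self, text, flagged_as_spam):
        """Record one email that a user flagged as spam or kept as ham."""
        (self.spam if flagged_as_spam else self.ham).update(features(text))

    def is_predictor(self, feature):
        """A feature predicts spam if it is seen often and mostly in spam."""
        s, h = self.spam[feature], self.ham[feature]
        return s >= self.min_count and s / (s + h) >= self.min_spam_share

    def looks_like_spam(self, text):
        return any(self.is_predictor(f) for f in features(text))


spam_filter = AdaptiveSpamFilter()
for text in ["deal 4u today", "prize 4u now"]:
    spam_filter.report(text, flagged_as_spam=True)
for text in ["notes for monday", "lunch for two"]:
    spam_filter.report(text, flagged_as_spam=False)

# Spammers switch from "4U" to "For U": the filter misses it at first.
before = spam_filter.looks_like_spam("special deal for u")

# Users flag the new wording; no new rule is written by hand.
for text in ["special deal for u", "big prize for u"]:
    spam_filter.report(text, flagged_as_spam=True)

after = spam_filter.looks_like_spam("cheap pills for u")
print(before, after)
```

The word “for” on its own does not become a predictor, because it also appears in the ham examples; only the phrase “for u” does.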
• Speech recognition: for example, telling the spoken words “one” and “two” apart
• No simple hand-coded algorithm works well; the problem is too complex
• So Machine Learning can be used instead, by providing numerous example recordings of each word

Machine Learning can help humans learn
• ML algorithms can be inspected to see what they have learned (although for some algorithms this can be tricky).
• For instance, once the spam filter has been trained on enough spam, it can easily be inspected to reveal the list of words and combinations of words that it believes are the best predictors of spam. Sometimes this will reveal unsuspected correlations or new trends, and thereby lead to a better understanding of the problem.
• Applying ML techniques to dig into large amounts of data can help discover patterns that were not immediately apparent. This is called data mining.

Summary:
• Traditional programming is ideal for well-defined tasks with clear rules and where precise control is needed.
• Machine Learning is a powerful tool for complex problems with large datasets, where the ability to learn and adapt is crucial.

Machine Learning is great for:
• Problems for which existing solutions require a lot of hand-tuning or long lists of rules: one Machine Learning algorithm can often simplify code and perform better.
• Complex problems for which there is no good solution at all using a traditional approach: the best Machine Learning techniques can find a solution.
• Fluctuating environments: a Machine Learning system can adapt to new data.
• Getting insights about complex problems and large amounts of data.

Types of Machine Learning Systems
• Classify them in broad categories based on:
  – Whether or not they are trained with human supervision (supervised, unsupervised, semisupervised, and Reinforcement Learning)
  – Whether or not they can learn incrementally on the fly (online versus batch learning)
  – Whether they work by simply comparing new data points to known data points, or instead detect patterns in the training data and build a predictive model, much like scientists do (instance-based versus model-based learning)

Supervised/Unsupervised Learning
• There are four major categories, according to the amount and type of supervision they get during training:
  – supervised learning,
  – unsupervised learning,
  – semisupervised learning, and
  – Reinforcement Learning

Supervised learning
A typical supervised learning task is classification. The spam filter is a good example of this: it is trained with many example emails along with their class (spam or ham), and it must learn how to classify new emails.

Supervised learning: Regression
Another typical task is to predict a target numeric value, such as the price of a car, given a set of features (mileage, age, brand, etc.) called predictors. This sort of task is called regression. To train the system, you need to give it many examples of cars, including both their predictors and their labels (i.e., their prices).

Supervised learning: Types
• Classification problems ask the algorithm to predict a discrete value that identifies the input data as a member of a particular class or group. In an animal photos dataset, for example, each photo has been labeled as a dog, a cat, etc., and the algorithm has to classify new images into one of these labeled categories.
• Regression problems deal with continuous values, e.g., predicting the price of a piece of land in a city, given the area, location, etc. Here, the input is sent to the machine, which predicts the price based on previous instances. The machine determines a function that maps the inputs to the outputs. If the predictions are not accurate enough, the parameters of that function are adjusted iteratively (in neural networks, via backpropagation) until the results are satisfactory.

Supervised learning algorithms
• Important supervised learning algorithms:
  – k-Nearest Neighbors
  – Linear Regression
  – Logistic Regression
  – Support Vector Machines (SVMs)
  – Decision Trees and Random Forests
  – Neural networks

Unsupervised learning
In unsupervised learning, as you might guess, the training data is unlabeled. The system tries to learn without a teacher.

Unsupervised learning: types of unsupervised learning
There are four types of unsupervised learning tasks:
– clustering,
– anomaly detection and novelty detection,
– association rules, and
– visualization and dimensionality reduction.
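To make the first of these tasks concrete, here is a minimal sketch of clustering on unlabeled data, using plain k-means on one-dimensional values. The algorithm choice, the function name `k_means_1d`, and the sample numbers (imagined weekly visit counts) are assumptions for illustration; the slides do not name a specific clustering algorithm.

```python
def k_means_1d(values, k=2, iterations=10):
    """Group numbers into k clusters using plain k-means (Lloyd's algorithm)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers across the sorted data.
    step = max(k - 1, 1)
    centers = [ordered[i * (len(ordered) - 1) // step] for i in range(k)]

    clusters = []
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest center.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters


# Unlabeled data: nobody has said which visitors are "casual" or "frequent".
visits_per_week = [1, 2, 1, 3, 20, 22, 19, 2, 21]
centers, clusters = k_means_1d(visits_per_week, k=2)
print(centers)
print(clusters)
```

The algorithm is never told what the groups mean; it only finds that the values fall into a low group and a high group. Interpreting those groups is left to a human.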
• Clustering
• Visualization
• Anomaly Detection
• Association rules

Reinforcement Learning

Comparison

Semi-Supervised Learning

Batch and Online Learning
• Another criterion used to classify Machine Learning systems is whether or not the system can learn incrementally from a stream of incoming data.
• Types:
  – Batch learning
  – Online learning

Instance-Based Versus Model-Based Learning
• Instance-based learning
• Model-based learning

Main Challenges of Machine Learning
• Two things can go wrong: “bad algorithm” and “bad data.”
• BAD DATA
  – Insufficient Quantity of Training Data
  – Nonrepresentative Training Data
  – Poor-Quality Data
  – Irrelevant Features
• BAD ALGORITHM
  – Overfitting the Training Data
  – Underfitting the Training Data

Insufficient Quantity of Training Data

The Unreasonable Effectiveness of Data

Nonrepresentative Training Data

Overfitting the Training Data