CS3244 (2120) - Project Discussion 1 - Overview
Content Overview
1. The Application
2. The Model
3. The Evaluation
Designing a Machine Learning (ML) Application
The Machine Learning Application
Designing Applications
▪ Software Development Life Cycle (SDLC)
– Planning → Analysis → Design → Implementation → Maintenance → Planning → ...
Main Issues
1. What objectives?
– Model accuracy?
– Performance measures
▪ Quantifying the objectives
2. What data?
– Use an existing dataset or collect new data?
– What do I know about the domain?
▪ Features?
▪ Hypothesis representation/space?
Objectives Apart from Accuracy
Gathering Data
Constructing a Good Predictor
Consistency with Training Data versus Generalisation
– Example dataset: https://archive.ics.uci.edu/ml/datasets/Mushroom
– Bias
▪ Error from erroneous or overly simple assumptions in the hypothesis
– Variance
▪ Error from sensitivity to small fluctuations in the training data
– More general hypotheses have lower variance (at the cost of higher bias)
Overfitting & Underfitting
▪ Examples
– Decision trees
▪ Larger/deeper tree ⇒ lower bias; higher variance
– Neural Networks
▪ More hidden units ⇒ lower bias; higher variance
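A minimal sketch of this tradeoff, assuming scikit-learn and a synthetic dataset (both illustrative choices, not from the slides): as the tree's maximum depth grows, training accuracy rises while test accuracy eventually falls.

# Illustrative: deeper trees fit the training data better (lower bias)
# but generalise worse once they start fitting noise (higher variance).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (1, 3, 5, 10, None):  # None = grow until leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(depth, tree.score(X_tr, y_tr), tree.score(X_te, y_te))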
Simple Ideas to Improve Generalisation Performance
Feature Selection
– Wrapper approach
▪ ML algorithm used to assess value of attribute sets
– Embedded approach
▪ Feature selection is part of the ML algorithm
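A minimal sketch of the wrapper idea, assuming scikit-learn (an illustrative choice, not named on the slides): recursive feature elimination repeatedly fits the learner and drops the weakest attributes, so the ML algorithm itself assesses attribute sets.

# Wrapper-style feature selection with RFE: the estimator scores
# candidate attribute subsets by being retrained on them.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=15, n_informative=5,
                           random_state=0)
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)
selector.fit(X, y)
print(selector.support_)  # boolean mask over the retained attributes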
Validation Using Cross-Validation
▪ k-Fold Cross-Validation
– Divide the training set, S, into k folds, s1, ..., sk
– For each fold si:
    train the model using S \ si
    test the model using si
– Take the mean performance over the k folds
▪ Wrapper-based approach
– Selecting hyperparameters
– Selecting attributes
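A minimal sketch of k-fold cross-validation used as a wrapper for hyperparameter selection, assuming scikit-learn and a decision tree (illustrative choices; the slides do not prescribe a library or model):

# 5-fold CV: each fold is held out once; the model trains on the rest,
# is tested on the held-out fold, and the five scores are averaged.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, random_state=0)
for depth in (2, 4, 8):  # candidate hyperparameter values
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    scores = cross_val_score(model, X, y, cv=5)
    print(depth, np.mean(scores))  # keep the depth with the best mean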
Ensembles
▪ General idea
– Aggregate predictions of multiple hypotheses to generate an overall classification that is more accurate
▪ General motivation
– Assume k independent (i.e., uncorrelated) hypotheses
– Assume each hypothesis's generalisation accuracy p > 0.5
– With k = 100 such hypotheses, the majority vote is correct with probability
P = Σ_{i=51}^{100} C(100, i) p^i (1 − p)^(100 − i)
(the sum of the binomial terms with 51 or more successes), and this probability climbs toward 1 much faster than p does
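A short sketch computing that majority-vote probability (assuming k = 100 independent hypotheses, as the 51-or-more figure on the slide suggests):

# P(majority of k independent hypotheses is correct) when each is
# correct with probability p: the upper tail of Binomial(k, p).
from math import comb

def majority_correct(p, k=100):
    return sum(comb(k, i) * p**i * (1 - p)**(k - i)
               for i in range(k // 2 + 1, k + 1))

for p in (0.55, 0.6, 0.7):
    print(p, majority_correct(p))  # climbs rapidly toward 1 as p grows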
Ensemble Framework
– Example: Random Forest
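A minimal Random Forest sketch, assuming scikit-learn (an illustrative choice): each tree is trained on a bootstrap sample with a random feature subset considered at each split, and the forest aggregates the trees' votes.

# Random Forest: bagged decision trees with random feature subsets,
# combined by majority vote across the ensemble.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
print(cross_val_score(forest, X, y, cv=5).mean())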
Evaluation
Model Evaluation
▪ Form a hypothesis test to check whether your model is significantly better than the benchmark
– m × k-Fold Cross-Validation Performance
– Each value is a mean (central limit theorem applies)
– Apply a paired t-test
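A minimal sketch of the t-test step, assuming SciPy; the two score arrays below stand in for m = 5 repetitions of k-fold CV and are hypothetical numbers for illustration only.

# Paired t-test on per-repetition mean CV scores: each entry is a mean
# over k folds, so the central limit theorem makes a t-test reasonable.
import numpy as np
from scipy import stats

model_scores = np.array([0.84, 0.86, 0.85, 0.87, 0.85])      # hypothetical
benchmark_scores = np.array([0.80, 0.82, 0.81, 0.83, 0.80])  # hypothetical
t_stat, p_value = stats.ttest_rel(model_scores, benchmark_scores)
print(t_stat, p_value)  # small p-value => significantly better than benchmark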
Experimental Setup for Empirical Evaluation
▪ Example Walkthrough
Summary
Questions?