07 - Model Selection & Building

The document discusses natural language processing and text preprocessing. It covers selecting and building machine learning models using algorithms like random forest. Random forest creates multiple decision trees and aggregates their predictions to improve accuracy. The document provides examples of how random forest works in areas like online shopping, finance, and selecting the best machine learning algorithm for a project.

Uploaded by

Omar Ben


Natural Language Processing (NLP)

Ahmad Shhadeh
5 - Text Data Preprocessing: Practical Application
SMS Spam Filtering
Model Selection & Building: Machine Learning Algorithm

Data Science Project life cycle

• Business Understanding
• Data Collection
• Data Preparation
• Exploratory Data Analysis (EDA)
• Model Building
• Model Evaluation
• Model Deployment
ML algorithm selection

➢ There are many machine learning algorithms (MLAs). Which one should we use? There is no straightforward, sure-shot way to choose the right MLA; the choice depends on many factors, such as:
➢ The problem statement and the kind of output we are looking for
➢ The type and size of the data
➢ The available computational time and resources (memory, type of processor)
➢ The number of features and observations in the data
➢ …and so on
➢ Key skills that can help:
➢ Machine learning types: supervised and unsupervised
➢ Domain knowledge to narrow down the choices (computer vision, NLP, anomaly detection, etc.)
➢ The data science project pipeline
Types of machine learning

https://scikit-learn.org/stable/tutorial/machine_learning_map/
Source: https://blogs.sas.com/content/subconsciousmusings/2020/12/09/machine-learning-algorithm-use/
Introduction to Ensemble Learning

• Ensemble: a technique that creates multiple models and then combines them to produce a better result.
• Why learn one classifier when you can learn many?
• Example: I have made a short movie about machine learning and need feedback before making it public.
▪ 1st model: ask two of my friends.
▪ 2nd model: ask 5 colleagues in the machine learning domain.
▪ 3rd model: create a short survey and get feedback from 20 people.
• The survey responses would be more generalized and diversified, since they come from people with different skill sets and different relationships to me. This is a better approach for getting honest ratings.
• From these examples, you can infer that a diverse group of people is likely to make better decisions than individuals.
Random Forest in real life

❖ You want to purchase a new car!

• Will you walk into the first car shop and buy a car based on the dealer's advice? Or would you:
▪ browse a few websites and mobile apps (Haraj, Souq, etc.),
▪ check the posted reviews,
▪ compare different car models, checking their features and prices, and
▪ ask some of your friends for their opinion?
In summary, you wouldn't jump directly to a conclusion; instead, you would make a decision that also considers the opinions of other people.

❖ Who Wants to Be a Millionaire?

• The "Ask the Audience" lifeline
Random Forest in real life

• In the world of finance and investments:

▪ A basic concept in investing is to build a set of uncorrelated models,
▪ each with a positive expected return,
▪ and then put them together in a portfolio to earn alpha (alpha = market-beating returns).

• A last example: online shopping

▪ We rely on multiple sources (never trust a solitary Amazon review). So not only is a decision tree intuitive, but so is the idea of combining many trees in a random forest.

In short: a large number of relatively uncorrelated models (trees) operating as a committee tends to outperform any of its individual constituent models.
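The committee claim can be checked with a short calculation. The Python sketch below rests on an idealized assumption that is not from the slides: every model is correct with the same probability p on a two-class decision, and the models make their errors independently. Under that assumption, the probability that a strict majority is right follows the binomial distribution. Real trees are only partly decorrelated, so the actual gain is smaller; the function name majority_vote_accuracy is illustrative.

```python
from math import comb


def majority_vote_accuracy(n_voters: int, p_correct: float) -> float:
    """Probability that a strict majority of n independent voters picks the
    correct class in a two-class problem, when each voter is right with
    probability p_correct. Ties (possible when n is even) count as wrong."""
    needed = n_voters // 2 + 1  # smallest number of correct votes forming a strict majority
    return sum(
        comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        for k in range(needed, n_voters + 1)
    )


# Each "model" is only modestly better than a coin flip (p = 0.6).
for n in (1, 5, 21, 101):
    print(f"{n:>3} independent models -> P(majority correct) = {majority_vote_accuracy(n, 0.6):.3f}")
```

With p = 0.6, a single model is right 60% of the time and a committee of 5 about 68% of the time (exactly 0.68256), and the probability keeps rising as voters are added. If p were below 0.5, or if all models made the same mistakes, voting would not help, which is why a random forest works to keep its trees uncorrelated.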
What is Random Forest?

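• Random forest is an ensemble learning method that constructs a collection of decision trees and then aggregates the predictions of the individual trees to determine the final prediction.
• A decision tree is the building block of a random forest and is an intuitive model.
• Each tree is trained on a random bootstrap sample of the training rows and considers only a random subset of the features when splitting, which keeps the trees relatively uncorrelated.
• Random forest combines the relatively weak models produced by individual decision trees into a strong model.
• In general, a higher number of trees in the forest gives more accurate and more stable results, although the improvement levels off once the forest is large enough.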
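To make the description concrete, here is a deliberately tiny from-scratch sketch in Python (not the scikit-learn implementation). Assumptions: each "tree" is a depth-1 decision stump, whereas real random forests grow much deeper trees and draw a fresh random feature subset at every split; the two numeric features and the eight labelled rows are hypothetical; fit_stump, fit_forest and predict_forest are illustrative names. The sketch shows the two sources of randomness (bootstrap rows and a random feature subset per tree) and the aggregation of the trees' predictions by max voting.

```python
import random
from collections import Counter


def fit_stump(X, y, feature_ids):
    """Fit a depth-1 decision tree (a "stump"): among the allowed features, pick
    the threshold split that misclassifies the fewest training rows."""
    best = None
    for f in feature_ids:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            left_lab = Counter(left).most_common(1)[0][0]  # never empty: t is a value in X
            right_lab = Counter(right).most_common(1)[0][0] if right else left_lab
            errors = sum(lab != left_lab for lab in left) + sum(lab != right_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_lab, right_lab)
    _, f, t, left_lab, right_lab = best
    return lambda row: left_lab if row[f] <= t else right_lab


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample of the rows and a random
    subset of max_features features."""
    rng = random.Random(seed)
    n_rows, n_features = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n_rows) for _ in range(n_rows)]  # sample rows with replacement
        feats = rng.sample(range(n_features), max_features)     # random feature subset
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], feats))
    return forest


def predict_forest(forest, row):
    """Each tree casts one vote; the most common label wins (max voting)."""
    return Counter(tree(row) for tree in forest).most_common(1)[0][0]


# Hypothetical features per SMS: [number of links, number of "!" characters].
X = [[0, 0], [0, 1], [1, 0], [0, 0], [3, 4], [4, 2], [5, 5], [3, 3]]
y = ["ham", "ham", "ham", "ham", "spam", "spam", "spam", "spam"]

forest = fit_forest(X, y)
print(predict_forest(forest, [4, 3]))  # a link-heavy, excited message
print(predict_forest(forest, [0, 0]))  # a plain message
```

Because every stump sees a different bootstrap sample and possibly a different feature, the stumps disagree on borderline rows, and the vote smooths out those individual errors.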
Random Forest Characteristics
• Advantages:
▪ A very versatile and powerful machine learning algorithm.
▪ Can solve both types of problems, i.e., classification and regression, and accepts various types of input, whether ordinal or continuous.
▪ Is robust to outliers and skewed data, and the features don't even have to be on the same scale, because tree splits depend only on the ordering of feature values. Some implementations can also handle missing values.
▪ Less likely to overfit than a single decision tree and some other machine learning models.
▪ Provides feature importances that identify the most significant variables, which can be used for feature selection and dimensionality reduction.
• Disadvantages:
▪ It does a good job at classification but is usually less strong for regression problems; for example, its regression predictions are averages of training targets, so it cannot extrapolate beyond the range seen during training.
▪ Random forest can feel like a black-box approach to statistical modelers: you have very little control over what the model does.
Random Forest for Our Project (SMS Spam Filtering)

• For our project, SMS spam filtering:

• We can build a random forest with 100 decision trees.
• Each decision tree is built independently and predicts either spam or ham.
• Assume 70 of those trees predict spam and 30 predict ham.
• We then combine the trees' predictions with a simple aggregation method, such as:
▪ Max voting
▪ Averaging
▪ Weighted averaging

• If we apply max voting, the final prediction of the random forest is spam (70 votes vs. 30).
• In our project, we will use the simplest technique, max voting.
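The three aggregation methods listed above can be compared on the 70/30 scenario with a few lines of Python. Assumptions: the 100 tree outputs are hypothetical; for averaging, each tree's spam probability is taken as 1.0 or 0.0 from its vote (real trees usually output fractional probabilities); the weights are made up purely to show that weighting can change the outcome. Note that scikit-learn's RandomForestClassifier combines its trees by averaging their predicted class probabilities rather than counting hard votes, so its output can differ from max voting.

```python
from collections import Counter

# Hypothetical outputs of 100 trees for one SMS message (70 spam, 30 ham).
votes = ["spam"] * 70 + ["ham"] * 30

# 1) Max voting: the label with the most votes wins.
max_vote_label = Counter(votes).most_common(1)[0][0]

# 2) Averaging: average each tree's P(spam), then threshold at 0.5.
#    Assumption: a tree's P(spam) is 1.0 if it voted spam, else 0.0.
p_spam = [1.0 if v == "spam" else 0.0 for v in votes]
avg_p_spam = sum(p_spam) / len(p_spam)
avg_label = "spam" if avg_p_spam >= 0.5 else "ham"

# 3) Weighted averaging: more trusted trees get larger weights.
#    Hypothetical weights: the 30 ham-voting trees count three times as much.
weights = [1.0] * 70 + [3.0] * 30
weighted_p_spam = sum(w * p for w, p in zip(weights, p_spam)) / sum(weights)
weighted_label = "spam" if weighted_p_spam >= 0.5 else "ham"

print(max_vote_label, avg_label, weighted_label)
print(avg_p_spam, weighted_p_spam)
```

Max voting and plain averaging both return spam here (average P(spam) = 0.7), while the hypothetical weights pull the weighted average down to 70 / 160 = 0.4375, so the weighted decision flips to ham. Max voting is the easiest of the three to explain, which makes it a natural first choice.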
Max voting method

• The max voting method is generally used for classification problems.
• In this technique, multiple models are used to make predictions for each data point. The prediction of each model is considered a "vote".
• The prediction that we get from the majority of the models is used as the final prediction.
• For example, suppose you asked 5 of your colleagues to rate your movie (out of 5): three of them rated it 4, while two of them gave it 5. Since the majority gave a rating of 4, the final rating is 4.
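Max voting is not limited to two classes. Below, a minimal Python helper (max_vote is a hypothetical name) is applied to the five movie ratings from the example. Because it relies on collections.Counter, ties are broken in favour of the label encountered first, which is an arbitrary choice worth knowing about.

```python
from collections import Counter


def max_vote(predictions):
    """Return the most frequent prediction. Counter.most_common orders equal
    counts by first appearance, so ties go to the earliest-seen label."""
    if not predictions:
        raise ValueError("need at least one prediction")
    return Counter(predictions).most_common(1)[0][0]


ratings = [4, 5, 4, 4, 5]  # five colleagues rate the movie out of 5
print(max_vote(ratings))            # max voting
print(sum(ratings) / len(ratings))  # averaging, for comparison
```

Max voting gives 4, whereas averaging the same ratings gives 4.4; averaging is the usual choice when the outputs are numeric (regression).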
sklearn.ensemble.RandomForestClassifier
• https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html

Parameter / Attribute / Method: Description

• n_estimators (parameter): The number of trees in the forest; default=100.
• n_jobs (parameter): The number of jobs to run in parallel; default=None. fit, predict, decision_path and apply are all parallelized over the trees. None means 1 unless in a joblib.parallel_backend context; -1 means using all processors.
• fit(X_train, y_train) (method): Builds a forest of trees from the training set (X, y).
• predict(X_test) (method): Predicts the class for X.
• feature_importances_ (attribute): Indicates which predictor variables the random forest considers most important. It can be used for feature engineering, by building additional features from the most important ones, and for feature selection, by removing low-importance features.
Split Training and Testing Data Sets

• Machine learning is training an algorithm on a set of known examples with the clear goal of generalizing to unseen examples.
• The train-test split is a technique for evaluating the performance of a machine learning algorithm. It can be used for classification or regression problems and with any supervised learning algorithm.
• The procedure involves taking a dataset and dividing it into two subsets:
▪ Train dataset: used to fit the machine learning model.
▪ Test dataset: used to evaluate the fitted model; it is not used to train the model.
• The objective is to estimate the performance of the machine learning model on new data, i.e., data not used to train the model.
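The split itself is simple enough to write by hand. The sketch below (train_test_split_simple is a hypothetical helper, not a library function) shuffles the row indices with a fixed seed and holds out a fraction as the test set. It returns the four lists in the same order as scikit-learn's sklearn.model_selection.train_test_split, which is what you would normally use in practice.

```python
import random


def train_test_split_simple(X, y, test_size=0.2, seed=42):
    """Shuffle row indices once with a fixed seed, hold out the first
    test_size fraction as the test set, and keep the rest for training."""
    if not 0 < test_size < 1:
        raise ValueError("test_size must be between 0 and 1")
    if len(X) != len(y):
        raise ValueError("X and y must have the same length")
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    n_test = max(1, round(len(X) * test_size))
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])


messages = [f"message {i}" for i in range(10)]  # hypothetical SMS texts
labels = ["ham"] * 7 + ["spam"] * 3
X_train, X_test, y_train, y_test = train_test_split_simple(messages, labels, test_size=0.2)
print(len(X_train), len(X_test))
```

Because the shuffle is seeded, the same split is produced every time, which keeps experiments reproducible. With an imbalanced spam/ham dataset, a purely random split can leave the test set with very few spam messages; scikit-learn's train_test_split offers a stratify argument (for example stratify=y) that preserves the class proportions in both subsets.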
Thank You
