07 - Model Selection & Building
Ahmad Shhadeh
5 – Text Data Preparation: Practical Application
SMS SPAM Filtering
Model Selection and Building: The Machine Learning Algorithm
Data Science Project life cycle
Business Understanding
Data Collection
Data Preparation
Exploratory Data Analysis (EDA)
Model Building
Model Evaluation
Model Deployment
ML algorithm selection
➢ There are many machine learning algorithms (MLAs). Which one should we use? There is no straightforward,
sure-fire way to choose the right MLA. The choice depends on many factors, such as:
➢ The problem statement and the kind of output we are looking for
➢ The type and size of the data
➢ The available computational time and resources (memory, type of processors)
➢ The number of features and observations in the data
➢ …etc
➢ Key skills that can help:
➢ Machine learning types: supervised and unsupervised
➢ Domain knowledge to narrow down the candidates (CV, NLP, anomaly detection, etc.)
➢ The data science project pipeline
Types of machine learning
https://scikit-learn.org/stable/tutorial/machine_learning_map/
Source: https://blogs.sas.com/content/subconsciousmusings/2020/12/09/machine-learning-algorithm-use/
Introduction to Ensemble Learning
Ensemble: a technique that creates multiple models and then combines them to produce a
better result.
Why learn one classifier when you can learn many?
Example: I have created a short movie about machine learning and need feedback before
making it public.
1st model: asking two of my friends
2nd model: asking 5 colleagues in the machine learning domain
3rd model: creating a small survey and getting feedback from 20 people
The responses would be more generalized and diversified, since we have people with different skill sets
and different relationships to the author. This is a better way to get honest ratings.
From these examples, you can infer that a diverse group of people is likely to make better decisions than
individuals.
Random Forest in real life
In short: a large number of relatively uncorrelated models (trees) operating as a committee will
outperform any of the individual constituent models.
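The committee effect can be illustrated with a small calculation. This is only a sketch under a strong assumption that is not in the slides: every model votes independently and is correct with the same probability p. The function name majority_accuracy is hypothetical. Real trees are never fully independent, so the gain in practice is smaller.

```python
from math import comb

def majority_accuracy(p, n):
    """Probability that more than half of n independent voters are right,
    when each voter is right with probability p (n is assumed to be odd)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )

# Assumed per-model accuracy of 0.6, with committees of growing size
for n in (1, 3, 11, 51):
    print(n, round(majority_accuracy(0.6, n), 3))
```

Under this assumption, the accuracy of the majority vote rises as more models are added, as long as each model is better than a coin flip.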
What is Random Forest?
If most of the trees predict spam, then the final prediction of the random forest model will be spam under the Max Voting method.
In our project we will use the simplest voting technique, which is Max Voting.
Max voting method
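Max Voting can be sketched in a few lines of plain Python. The helper name max_vote and the five tree votes below are hypothetical, chosen only to illustrate the idea on a single SMS message.

```python
from collections import Counter

def max_vote(predictions):
    """Return the label predicted by the largest number of models."""
    label, _count = Counter(predictions).most_common(1)[0]
    return label

# Hypothetical votes from five trees for one SMS message
tree_votes = ["spam", "ham", "spam", "spam", "ham"]
final_prediction = max_vote(tree_votes)
print(final_prediction)  # spam
```

Three of the five trees vote spam, so the committee's answer is spam. In this sketch a tie goes to the label that appears first in the list, so an odd number of voters avoids ties in a two-class problem.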
fit(X_train, y_train) (method): builds a forest of trees from the training set (X_train, y_train).
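A minimal sketch of calling fit, assuming scikit-learn's RandomForestClassifier is the model in use. The toy features (message length, number of digits), the labels, and the parameter values are made up for illustration and are not the project's real data.

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy features per message: [message length, number of digits]
X_train = [[120, 12], [30, 0], [150, 20], [25, 1], [140, 15], [40, 0]]
y_train = ["spam", "ham", "spam", "ham", "spam", "ham"]

# n_estimators is the number of trees; random_state makes the run repeatable
model = RandomForestClassifier(n_estimators=50, random_state=42)
model.fit(X_train, y_train)  # build the forest from the training set

# Predict the label of a new, unseen message
prediction = model.predict([[135, 18]])
print(prediction[0])
```

Note that scikit-learn's implementation combines the trees by averaging their predicted class probabilities rather than by counting one hard vote per tree, so it is close to, but not exactly, Max Voting.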
Machine learning is training an algorithm on a set of known examples with the clear goal of generalizing to unseen
examples.
The train-test split is a technique for evaluating the performance of a machine learning algorithm. It can be used
for classification or regression problems, and with any supervised learning algorithm.
The procedure involves taking a dataset and dividing it into two subsets:
• Train dataset: used to fit the machine learning model.
• Test dataset: used to evaluate the fitted machine learning model; it is not used to train the model.
The objective is to estimate the performance of the machine learning model on new data, that is, data not used to train the
model.
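The split described above can be sketched with scikit-learn's train_test_split. The ten messages, their labels, and the 80/20 ratio are assumptions made for illustration only.

```python
from sklearn.model_selection import train_test_split

# Hypothetical mini dataset of SMS messages and their labels
messages = [
    "Win a free prize now", "Are we meeting today?",
    "Claim your reward", "See you at lunch",
    "Free entry in a contest", "Call me when you can",
    "Urgent: your account won", "Happy birthday!",
    "Cheap loans available", "Thanks for your help",
]
labels = ["spam", "ham"] * 5

# Hold out 20% of the data for testing; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    messages, labels, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))  # 8 2
```

The model is fitted on X_train and y_train only. X_test and y_test stay untouched until evaluation, so the score reflects performance on data the model has not seen.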
Thank You