Machine Learning - Brief
ML vs DL vs AI?
Artificial Intelligence (AI):
The study or process that enables machines
to mimic human behavior
through particular algorithms.
• This technology will redefine many industries, such as healthcare, financial services, energy,
and many other fields that directly impact our lives.
Data Mining:
• The process of extracting useful information from a huge amount of data
• It’s a tool used by humans to discover new, accurate, and useful patterns in the data.
You first identify the business goals and the data mining goals, then you start this process:
Steps:
1. Duplicate & irrelevant observations
Remove duplicated and irrelevant observations to avoid inaccuracy and skew.
2. Structural errors (normalize & standardize)
Fix strange naming conventions, typos,
incorrect capitalization, and inconsistent data formats.
3. Filter unwanted outliers
Handle outliers, for example by removing them or replacing them
with more reasonable values.
4. Handle missing data
Impute the missing values, drop the affected records,
or leave the data as-is.
5. Verify data quality
Check for consistency between different parts of the
dataset, and validate the accuracy of the data against external sources.
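The five steps above can be sketched in a few lines of plain Python. This is only an illustration: the records, the field names (city, price), and the outlier rule (drop prices above 10x the median) are assumptions chosen for the example, not part of the original notes.

```python
from statistics import median

# Hypothetical raw records; field names and values are made up for illustration.
raw = [
    {"city": "Cairo", "price": 100.0},
    {"city": "Cairo", "price": 100.0},   # exact duplicate
    {"city": "GIZA ", "price": 120.0},   # inconsistent capitalization and spacing
    {"city": "giza", "price": None},     # missing value
    {"city": "Alex", "price": 110.0},
    {"city": "Alex", "price": 9000.0},   # implausible outlier
    {"city": "Luxor", "price": 90.0},
]

# 1. Remove exact duplicates while preserving order.
seen, rows = set(), []
for record in raw:
    key = (record["city"], record["price"])
    if key not in seen:
        seen.add(key)
        rows.append(dict(record))

# 2. Fix structural errors: trim whitespace and standardize capitalization.
for record in rows:
    record["city"] = record["city"].strip().title()

# 3. Filter outliers. Assumed rule: drop prices above 10x the median known price.
known = [r["price"] for r in rows if r["price"] is not None]
limit = 10 * median(known)
rows = [r for r in rows if r["price"] is None or r["price"] <= limit]

# 4. Handle missing data by imputing the median of the remaining known prices.
fill = median(r["price"] for r in rows if r["price"] is not None)
for record in rows:
    if record["price"] is None:
        record["price"] = fill

# 5. Verify data quality with simple consistency checks.
assert all(r["price"] > 0 for r in rows)
assert all(r["city"] == r["city"].strip().title() for r in rows)
```

In practice the outlier rule and the imputation strategy depend on the dataset; the median is used here only because it is less sensitive to extreme values than the mean.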
Types of Machine Learning:
Basic Terminology:
Labels
A label is the thing we're predicting—the dependent y variable in simple linear regression.
The label could be the future price of wheat, the kind of animal shown in a picture, etc.
Features
A feature is an input variable—the independent x variable in simple linear regression.
Features could either be:
Examples
An example is a particular instance of data, x (feature vector). We break examples into two categories:
Regression
Used to predict continuous values such as sales, salary, weight, or temperature.
For example: given a dataset containing the features of houses (size, number of bedrooms, number of
baths, neighborhood, etc.) together with the price of each house, a regression algorithm can be trained
to learn the relationship between the features and the price.
• Linear Regression
• Polynomial Regression
• Decision Tree Regressor
• Random Forest Regressor
• Neural Networks
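As a minimal sketch of the simplest algorithm in this list, the following fits a one-feature linear regression with the closed-form least-squares solution. The house sizes and prices are invented for illustration, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: house size in square meters and price in thousands.
sizes = [50, 80, 100, 120, 150]
prices = [70, 100, 125, 140, 170]

slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 110 + intercept  # prediction for an unseen 110 m^2 house
```

With several features (bedrooms, baths, neighborhood) the same idea generalizes to multiple linear regression, which is usually handled by a library rather than written by hand.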
Classification
Used for predicting discrete outcomes. If the outcome can take only two possible values, such as
[True or False], [Dog or Cat], [Yes or No], it is known as Binary Classification. When the outcome can take
more than two possible values [dog, cat, lizard…], it is known as Multiclass Classification.
Evaluation Metrics:
Confusion Matrix
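For a binary classifier, a confusion matrix counts true positives, false positives, false negatives, and true negatives. The sketch below computes those four counts and three metrics commonly derived from them; the labels and predictions are made up, and `confusion_counts` is a hypothetical helper name.

```python
def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, fn, tn), treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1          # predicted positive, actually positive
        elif pred == positive:
            fp += 1          # predicted positive, actually negative
        elif truth == positive:
            fn += 1          # predicted negative, actually positive
        else:
            tn += 1          # predicted negative, actually negative
    return tp, fp, fn, tn

# Hypothetical ground truth and model predictions.
y_true = ["dog", "dog", "cat", "cat", "dog", "cat"]
y_pred = ["dog", "cat", "cat", "dog", "dog", "cat"]

tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive="dog")
accuracy = (tp + tn) / (tp + fp + fn + tn)
precision = tp / (tp + fp)   # of everything predicted "dog", how much was right
recall = tp / (tp + fn)      # of all real dogs, how many were found
```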
When a new object arrives, you look at the proportion of each of these features it has,
and then you calculate the "probability" of it being a dog or a cat by combining the proportions of those features.
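This description resembles the idea behind a Naive Bayes classifier, although the notes do not name it here, so treat that reading as an assumption. Under that assumption, a minimal sketch for yes/no features looks like this; the feature names, the training examples, and the add-one smoothing are illustrative choices.

```python
from collections import Counter, defaultdict

def train(examples):
    """examples: list of (features_dict, label) pairs."""
    class_counts = Counter(label for _, label in examples)
    # feature_counts[label][(feature, value)] = occurrences within that class
    feature_counts = defaultdict(Counter)
    for features, label in examples:
        for item in features.items():
            feature_counts[label][item] += 1
    return class_counts, feature_counts

def predict_proba(features, class_counts, feature_counts):
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior probability of the class
        for item in features.items():
            # Add-one smoothing, assuming every feature has two possible values.
            score *= (feature_counts[label][item] + 1) / (count + 2)
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}

# Hypothetical training data.
examples = (
    [({"barks": True, "retractable_claws": False}, "dog")] * 3
    + [({"barks": False, "retractable_claws": True}, "cat")] * 3
)
class_counts, feature_counts = train(examples)
proba = predict_proba({"barks": True, "retractable_claws": False},
                      class_counts, feature_counts)
```

Note that the per-feature proportions are multiplied together with the class prior and then normalized, which is how the proportions get combined into one probability per class.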
Our tree generally has a starting point that we call the Root Node.
Branches (or Decision Nodes) descend from it,
and at the end we have the Leaves, which point to a particular class.
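The root node, decision nodes, and leaves can be pictured as a nested structure. The sketch below hand-builds a tiny tree and walks it for one example; the features and classes are invented, and a real decision tree would be learned from data rather than written by hand.

```python
# A hand-built tree: the outer dict is the root node.
tree = {
    "feature": "barks",                      # root node tests this feature
    "branches": {
        True: {"leaf": "dog"},               # leaf: final class
        False: {
            "feature": "has_whiskers",       # decision node
            "branches": {
                True: {"leaf": "cat"},
                False: {"leaf": "lizard"},
            },
        },
    },
}

def classify(node, example):
    """Follow branches from the root until a leaf is reached."""
    while "leaf" not in node:
        value = example[node["feature"]]
        node = node["branches"][value]
    return node["leaf"]

label = classify(tree, {"barks": False, "has_whiskers": True})
```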
● The first step that differs here is that you draw a number of subsets from the training data using a method called Bootstrap Sampling (random sampling with replacement).
Each subset is fed to its own tree, which trains on it and gives you a result.
● The last step, as we said, is that you combine all of these results and make a final decision based on them.
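The two steps, bootstrap sampling and combining the results, can be sketched as follows. To keep the example short, each "tree" is replaced by a one-threshold stump (an assumption made for brevity, not what a real random forest uses), the data is invented, and exactly two classes are assumed.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]

def train_stump(sample):
    """Stand-in for a tree: split halfway between the two class means."""
    by_label = {}
    for x, label in sample:
        by_label.setdefault(label, []).append(x)
    if len(by_label) < 2:                      # sample contained one class only
        only = next(iter(by_label))
        return lambda x: only
    means = {label: sum(v) / len(v) for label, v in by_label.items()}
    (lo_label, lo), (hi_label, hi) = sorted(means.items(), key=lambda kv: kv[1])
    threshold = (lo + hi) / 2
    return lambda x: hi_label if x > threshold else lo_label

def train_forest(data, n_trees, seed=0):
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_trees)]

def forest_predict(forest, x):
    """Final decision: majority vote over all individual predictions."""
    votes = Counter(tree(x) for tree in forest)
    return votes.most_common(1)[0][0]

# Hypothetical data: (weight in kg, label).
data = [(3, "cat"), (4, "cat"), (5, "cat"), (20, "dog"), (25, "dog"), (30, "dog")]
forest = train_forest(data, n_trees=25)
prediction = forest_predict(forest, 28)
```

For regression, the majority vote would typically be replaced by averaging the individual predictions.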
● The first and most important step in building the model, after plotting the data on a scatter plot
for example, is choosing the position of the regression line, which shows us
the strength and direction of the relationship between the input feature x and the value to be predicted y.
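One common way to put a number on the "strength and direction" of a linear relationship is the Pearson correlation coefficient r: its sign gives the direction and its magnitude (between 0 and 1) gives the strength. The notes do not name a specific measure, so this choice is an assumption, and the data below is invented.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]
r = pearson_r(hours, scores)  # positive sign: y tends to rise with x
```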
- Organizing display shelves so that products that are sold together are placed close to each other, which increases sales.
- Marketing certain products together based on the strength of the association between them (such as children's products and cars, etc.).
- Targeting customer segments with special offers and discounts based on their purchasing patterns.
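The "strength of the association" between products is often measured with support, confidence, and lift, as used in association-rule mining. The notes do not say which measure they have in mind, so this is an assumption; the transactions below are invented.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= basket for basket in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the baskets with `antecedent`, the fraction that also contain `consequent`."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Values above 1 suggest the items appear together more often than by chance."""
    return confidence(transactions, antecedent, consequent) / support(transactions, consequent)

# Hypothetical shopping baskets.
transactions = [
    {"diapers", "wipes"},
    {"diapers", "wipes", "milk"},
    {"milk", "bread"},
    {"diapers", "bread"},
    {"wipes", "diapers"},
]

s = support(transactions, {"diapers", "wipes"})
c = confidence(transactions, {"diapers"}, {"wipes"})
l = lift(transactions, {"diapers"}, {"wipes"})
```

A pair with high confidence and a lift above 1 would be a candidate for shelf placement or a joint promotion.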
----------------------------------------------------------------------------------------------------------------------------------------
Variance: the variability of the model's predictions across different training sets.
High variance results in overfitting: the model is too complex and performs well on the training data
but poorly on new, unseen test data.
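A small way to see this gap is a 1-nearest-neighbor model, which memorizes its training set. In the sketch below (invented data, with 1-nearest-neighbor chosen only as a convenient example of a high-variance model), the training error is zero while the error on unseen points is not.

```python
def nearest_neighbor_predict(train, x):
    """Return the target of the closest training point (1-nearest-neighbor)."""
    return min(train, key=lambda pair: abs(pair[0] - x))[1]

def mse(train, points):
    """Mean squared error of the 1-NN model on the given (x, y) points."""
    errors = [(nearest_neighbor_predict(train, x) - y) ** 2 for x, y in points]
    return sum(errors) / len(errors)

# Hypothetical noisy training data where the underlying trend is roughly y = x.
train = [(1, 1.4), (2, 1.7), (3, 3.5), (4, 3.6), (5, 5.4)]
# Unseen test points that follow the underlying trend.
test = [(1.6, 1.6), (2.4, 2.4), (3.6, 3.6), (4.4, 4.4)]

train_mse = mse(train, train)  # the model reproduces every training point, noise included
test_mse = mse(train, test)    # the memorized noise hurts predictions on new data
```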