Machine Learning
Machine Learning is the process of enabling a machine to learn how to make decisions logically from data.
Example: Netflix and Amazon have built machine learning models using tons of data in order to identify profitable opportunities and avoid risk.
The term machine learning was first coined by Arthur Samuel in the year 1959.
The first formal definition of ML was given by Tom M. Mitchell: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E."
Algorithm: A machine learning algorithm is a set of rules and statistical techniques used to learn patterns from data and draw significant information from it. It is the logic behind the ML model.
Model: A model is the main component of ML. A model is trained by using a machine learning algorithm.
The algorithm maps all the decisions that the model is supposed to take on the given input in order to produce the correct output.
Predictor variable: a feature of the data that can be used to predict the output.
Response variable: the output variable that needs to be predicted by using the predictor variables.
Training data: the ML model is built using training data, which helps it identify the key trends and patterns needed to predict the output.
Testing data: after the model is trained, it must be tested to evaluate how accurately it can predict the outcome.
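A minimal sketch of a train/test split, assuming scikit-learn and pandas are available; the DataFrame, feature name and target column here are made-up examples, not from the notes:

```python
# Splitting made-up data into training data and testing data.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "hours_watched": [1, 4, 7, 2, 9, 5, 3, 8],   # predictor variable
    "target":        [0, 1, 1, 0, 1, 1, 0, 1],   # response variable
})

X = df[["hours_watched"]]   # predictor variables (features)
y = df["target"]            # response variable (label)

# 80% of the rows are used for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```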
Supervised learning: we teach or train the machine using data that is well labelled.
Unsupervised learning: we train the machine using unlabelled data, without any guidance.
Reinforcement learning: a part of machine learning where an agent is placed in an environment and learns how to behave in that environment by observing the rewards it gets from its actions.
Unsupervised ML
Unsupervised learning mainly uses clustering techniques.
Before building a model, the data must be prepared: missing values are handled, outliers are treated, and features are scaled.
1. Handling missing values. Methods (a code sketch follows after this list):
1. Mean/Median/Mode imputation: the mean can be used when the variable is numeric and has a normal distribution.
The median can be used to fill missing values when the variable is numeric and skewed.
2. Random sample imputation: a random value from the existing set of values is taken and used to fill the missing value. It is easy to implement, and the variance stays close to that of the original dataset.
3. Capturing NaN values with a new feature: used when data is missing due to some cause; we create a new feature in the dataframe that flags where the value is null for that particular feature. It is easy to implement and makes it easy to identify where values are missing.
4. End-of-distribution imputation: we fill the missing value with an extreme value of the feature.
5. Arbitrary value imputation: we fill the missing value with any chosen value. It is a purely judgement-based decision.
6. Frequent category imputation: when the variable is categorical, the best way to fill missing values is with the most frequent class (the mode).
7. KNN imputation (K-nearest neighbours): we fill the missing value using the values of its nearest neighbours; for example, a value missing at the 6th place can be estimated from the 5th and 7th values.
8. Dropping NaN values: if 60% or more of a particular feature is missing, it is advised to drop that feature.
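A minimal sketch of the imputation methods listed above, assuming pandas and scikit-learn; the DataFrame and the column names age, income and city are made-up examples:

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

df = pd.DataFrame({
    "age":    [25, 30, np.nan, 40, 35, np.nan, 28],
    "income": [30, 41, 38, 61, 52, 45, 33],
    "city":   ["Delhi", "Pune", None, "Delhi", None, "Pune", "Delhi"],
})

# 1. Mean/median imputation for a numeric variable.
df["age_mean"]   = df["age"].fillna(df["age"].mean())
df["age_median"] = df["age"].fillna(df["age"].median())

# 2. Random sample imputation: fill NaNs with random existing values.
sample = df["age"].dropna().sample(df["age"].isna().sum(), random_state=0)
sample.index = df.index[df["age"].isna()]
df["age_random"] = df["age"].fillna(sample)

# 3. Capture NaN with a new indicator feature.
df["age_missing"] = df["age"].isna().astype(int)

# 4. End-of-distribution imputation: fill with an extreme value.
df["age_eod"] = df["age"].fillna(df["age"].mean() + 3 * df["age"].std())

# 5. Arbitrary value imputation (purely judgement based).
df["age_arbitrary"] = df["age"].fillna(-999)

# 6. Frequent-category (mode) imputation for a categorical variable.
df["city_mode"] = df["city"].fillna(df["city"].mode()[0])

# 7. KNN imputation: estimate the missing ages from the nearest rows.
df["age_knn"] = KNNImputer(n_neighbors=2).fit_transform(df[["age", "income"]])[:, 0]

# 8. Drop any feature where 60% or more of the values are missing.
df = df.drop(columns=[c for c in df.columns if df[c].isna().mean() >= 0.6])
print(df)
```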
2. Handling outliers: values that differ significantly from the rest of the data are called outliers. They can be detected using the following (a code sketch follows after the list):
Boxplot
Scatterplot
IQR
Z-score
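Boxplots and scatterplots show outliers visually; the IQR and Z-score rules can also be applied in code. A minimal sketch with made-up numbers, where 95 plays the role of the outlier:

```python
import pandas as pd

values = pd.Series([10, 12, 11, 13, 12, 95, 11, 10, 12, 14])

# IQR method: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
iqr_outliers = values[(values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)]
print(iqr_outliers)          # the IQR rule flags 95

# Z-score method: flag points far from the mean in standard deviations.
z = (values - values.mean()) / values.std()
print(values[z.abs() > 2])   # with a threshold of 2, 95 is flagged here too
```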
3. Feature scaling: when different features vary over different ranges, it is difficult to use them together in a model.
The process of bringing all features into the same range is called scaling. Common methods (a code sketch follows after this list):
1. Absolute maximum scaling: every value is divided by the maximum absolute value of the feature, which converts all values to the range -1 to +1; it is prone to outliers. Scaled value = x / max(|x|)
2. Min-max scaling: scaled value = (x - x_min) / (x_max - x_min)
3. Normalisation: scaled value = (x - x_mean) / (x_max - x_min)
4. Standardisation: converts each value of a feature to a Z-score; best suited when the data is normally distributed. Z = (x - x_mean) / std
5. Robust scaling: used when the dataset is skewed. Scaled value = (x - x_median) / IQR
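A minimal sketch of these scaling methods, assuming scikit-learn; the salary feature is a made-up example, and normalisation (which scikit-learn does not provide directly) is computed by hand:

```python
import pandas as pd
from sklearn.preprocessing import (
    MaxAbsScaler, MinMaxScaler, RobustScaler, StandardScaler,
)

X = pd.DataFrame({"salary": [20_000, 35_000, 50_000, 80_000, 400_000]})

print(MaxAbsScaler().fit_transform(X))       # 1. absolute max scaling: x / max(|x|)
print(MinMaxScaler().fit_transform(X))       # 2. min-max scaling: (x - x_min) / (x_max - x_min)
print((X - X.mean()) / (X.max() - X.min()))  # 3. normalisation: (x - x_mean) / (x_max - x_min)
print(StandardScaler().fit_transform(X))     # 4. standardisation: (x - x_mean) / std
print(RobustScaler().fit_transform(X))       # 5. robust scaling: (x - x_median) / IQR
```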
A machine learning model may fail to incorporate categorical variables directly, which is why it is important to convert them into simple numeric codes. This process is called encoding. There are two types:
1. Nominal encoding
2. Ordinal encoding
Nominal encoding: used for categories that have no inherent order; it increases the number of features in the dataset.
One-hot encoding: each category becomes its own column, so a feature with 10 categories produces 10 new features.
This can lead to an excessive number of features in the dataset, which decreases model performance: in a higher-dimensional space the model takes more processing time, noise and error increase, and the accuracy ultimately decreases.
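A minimal sketch of one-hot encoding with pandas get_dummies; the state values are taken from the ordinal-encoding example further below:

```python
import pandas as pd

df = pd.DataFrame({"state": ["Maharashtra", "Delhi", "Karnataka", "Gujarat", "TamilNadu"]})

# Each category becomes its own 0/1 column, so a feature with 10
# categories would add 10 new columns.
one_hot = pd.get_dummies(df["state"], prefix="state")
print(one_hot)
```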
Ordinal encoding: here label encoding is used, which encodes each category of a feature as a number in the same column, as in the table below (a code sketch follows after the table).
state        stateonly
Maharashtra  3
Delhi        0
Karnataka    2
Gujarat      1
TamilNadu    4
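One way to reproduce the encoding in the table above, assuming scikit-learn's LabelEncoder, which numbers the categories alphabetically:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"state": ["Maharashtra", "Delhi", "Karnataka", "Gujarat", "TamilNadu"]})
df["stateonly"] = LabelEncoder().fit_transform(df["state"])
print(df)   # Delhi -> 0, Gujarat -> 1, Karnataka -> 2, Maharashtra -> 3, TamilNadu -> 4
```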