Great Learning Notes
The part of statistics used to establish whether the patterns identified in data are reliable
(rather than due to chance) is called Inferential Statistics.
SUPERVISED LEARNING:
In a data set of patient information, attributes such as age, BP level and sugar level are
called independent attributes (values we already have), and the ‘thing’ we try to predict
is called the target attribute (known for past records but not for new patients).
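To make the distinction concrete, here is a minimal sketch that separates each record into its independent attributes and its target attribute. The column names (`age`, `bp`, `sugar`, `diabetic`) and all values are hypothetical, made up only for illustration.

```python
# Hypothetical patient records: column names and values are illustrative only
records = [
    {"age": 45, "bp": 130, "sugar": 110, "diabetic": 0},
    {"age": 62, "bp": 150, "sugar": 180, "diabetic": 1},
    {"age": 37, "bp": 120, "sugar": 95, "diabetic": 0},
]

TARGET = "diabetic"  # hypothetical target attribute we want to predict


def split_features_target(rows, target):
    """Return (X, y): the independent attributes of each row, and the target column."""
    X = [{k: v for k, v in row.items() if k != target} for row in rows]
    y = [row[target] for row in rows]
    return X, y


X, y = split_features_target(records, TARGET)
print(X[0])  # {'age': 45, 'bp': 130, 'sugar': 110}
print(y)     # [0, 1, 0]
```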
Supervised learning algorithms work in 2 stages: training and testing. The data set is split into 2 parts,
a training set and a testing set. Only the training set is used to build the model; the testing set is
held back to evaluate it.
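A minimal sketch of the split, assuming a 70/30 ratio and a fixed random seed (both are illustrative choices, not values from these notes). The helper name `split_train_test` is hypothetical. The rows are shuffled first so that any ordering in the original data does not end up entirely in one set.

```python
import random


def split_train_test(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of `rows` and split it into (train, test) lists."""
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                  # leave the caller's data untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a reproducible split
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


records = list(range(10))  # stand-in for 10 patient records
train, test = split_train_test(records)
print(len(train), len(test))  # 7 3
```

In practice, scikit-learn's `sklearn.model_selection.train_test_split` does the same job.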
To check whether a model reflects real-life conditions or is just an artifact or a
statistical fluke, hypothesis testing is done: the Alternate Hypothesis (a relationship exists) is
tested against the Null Hypothesis. The Null Hypothesis, aka the Law of Inertia, states that there is no
relationship between the attributes, i.e. that the pattern the model found is due to chance.
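One way to test the Null Hypothesis "no relationship between two attributes" is a permutation test. This is a swapped-in illustration, not necessarily the method used in the course. If the Null Hypothesis holds, the pairing of x values with y values is arbitrary, so shuffling y shows how strong a correlation chance alone produces. The data and the helper names (`pearson_r`, `permutation_p_value`) are hypothetical.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def permutation_p_value(xs, ys, n_perm=5000, seed=0):
    """Estimate a two-sided p-value for H0: 'no relationship between xs and ys'."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any real pairing between x and y
        if abs(pearson_r(xs, shuffled)) >= observed:
            extreme += 1
    # +1 in numerator and denominator counts the observed arrangement itself
    return (extreme + 1) / (n_perm + 1)


# Hypothetical ages and sugar levels for 8 patients (illustrative only)
ages = [25, 32, 40, 47, 53, 60, 66, 72]
sugar = [88, 95, 99, 110, 118, 121, 135, 140]
print(permutation_p_value(ages, sugar))
```

A small result (commonly below 0.05) means a correlation this strong rarely arises by chance, so the Null Hypothesis is rejected.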
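Inferential statistics gives us statistical parameters such as p-values. A p-value is the
probability of seeing a relationship at least as strong as the observed one if the Null Hypothesis
were true; a small p-value (commonly below 0.05) is evidence against the Null Hypothesis. Once the
inferential statistics support the model, the next step is to check the model's accuracy on data
it has not seen (the testing set), as an estimate of how it will perform in the real world.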
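For a classification model, the simplest accuracy measure is the fraction of test records predicted correctly. The labels below are hypothetical. For regression models, an error measure such as root mean squared error is used instead.

```python
def accuracy(y_true, y_pred):
    """Fraction of test records whose predicted label matches the actual label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical actual labels for 5 test records and a model's predictions for them
actual = [1, 0, 1, 1, 0]
predicted = [1, 0, 0, 1, 0]
print(accuracy(actual, predicted))  # 0.8
```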
Predicting numerical values – Regression Modelling
Predicting categories (class labels) – Classification Modelling
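To contrast the two, here is a minimal sketch: a least-squares line predicts a number (regression), and a nearest-centroid rule predicts a category (classification). Both are simple stand-ins chosen for illustration, not the specific algorithms taught in the course, and the data are hypothetical.

```python
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y ≈ a + b*x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def fit_centroids(xs, labels):
    """Mean feature value for each class label (a nearest-centroid classifier)."""
    groups = {}
    for x, label in zip(xs, labels):
        groups.setdefault(label, []).append(x)
    return {label: statistics.fmean(vals) for label, vals in groups.items()}


def predict_class(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))


# Regression: predict a number (hypothetical sugar level from age)
ages = [30, 40, 50, 60]
sugar = [90, 100, 110, 120]
a, b = fit_line(ages, sugar)
print(a + b * 45)  # 105.0

# Classification: predict a category (hypothetical "normal"/"high" label)
levels = [90, 95, 150, 160]
labels = ["normal", "normal", "high", "high"]
centroids = fit_centroids(levels, labels)
print(predict_class(centroids, 140))  # high
```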
UNSUPERVISED LEARNING:
There are no training and testing stages, and no split into independent and dependent (target)
attributes. All the attributes are fed into the algorithm, which tries to find hidden patterns in
the form of clusters and associations reflecting some kind of commonality or togetherness.
It is the responsibility of the data scientist to analyze the clusters and give meaning to them.
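These notes do not name a specific clustering algorithm. k-means is one common choice, sketched here for 2-D points under the assumption that the number of clusters `k` is chosen by the user. Note that only the points go in, with no target attribute, and the output is just a cluster number per point; interpreting what each cluster means is still left to the data scientist. The points are hypothetical.

```python
import math
import random


def k_means(points, k, n_iter=100, seed=0):
    """Group 2-D points into k clusters with Lloyd's k-means algorithm.

    Returns (centroids, assignments), where assignments[i] is the cluster
    index of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen input points
    assignments = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        # Update step: move each centroid to the mean of its member points
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return centroids, assignments


# Hypothetical points forming two well-separated groups
points = [(1, 1), (1.5, 2), (2, 1.5), (8, 8), (8.5, 9), (9, 8.5)]
centroids, labels = k_means(points, k=2)
print(labels)
```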