Subject: Advanced Business Analytics
By: Prof. Pranjal Mule sir
Date: 5/03/2023
Topic: Data Mining and Machine Learning
Data mining: extracting useful data (patterns) from a data warehouse for
further use.
Data: raw facts
Information: processed data.
Knowledge: application of information
Three main levels built on top of data:
1) Information
2) Knowledge
3) Wisdom
Machine learning is programming computers to
optimize a performance criterion using example
data or past experience.
Machine learning types:
1) Supervised learning: classification and regression.
2) Unsupervised learning: association and clustering.
3) Reinforcement learning.
Independent variables in data mining are called features
or attributes.
Dependent variables are called targets or labels.
They are also often called X and Y, respectively.
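A minimal pandas sketch of the X/Y split, assuming a small made-up table (the column names `age`, `income` and `bought` are hypothetical). Many scikit-learn examples write the target as lowercase `y`, which is the convention followed here.

```python
import pandas as pd

# Hypothetical example data: columns and values are made up for illustration.
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "income": [30000, 45000, 80000, 62000],
    "bought": ["no", "yes", "yes", "no"],  # dependent variable (target/label)
})

# Features (independent variables / attributes) go into X ...
X = df[["age", "income"]]
# ... and the target/label (dependent variable) goes into y.
y = df["bought"]

print(X.shape)     # (4, 2): 4 rows, 2 features
print(y.tolist())  # ['no', 'yes', 'yes', 'no']
```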
Supervised learning is based on supervision, i.e.
historical (labeled) data.
Whenever your label or output is binary/categorical,
it is classification.
Whenever your label/target is numerical, it is
regression.
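The classification-vs-regression rule above can be sketched as a small helper. `task_type` is a hypothetical name, and treating a numeric target with only two distinct values (such as 0/1) as classification is an assumed rule of thumb, not a library feature: integer class codes are numeric but still denote categories, so this is only a rough heuristic.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def task_type(y: pd.Series) -> str:
    """Hypothetical helper: numeric target -> regression,
    binary/categorical target -> classification (rough heuristic only)."""
    # Assumption: a numeric target with just two distinct values (e.g. 0/1)
    # is a binary label, so it is treated as classification.
    if is_numeric_dtype(y) and y.nunique() > 2:
        return "regression"
    return "classification"


print(task_type(pd.Series(["yes", "no", "yes"])))      # classification
print(task_type(pd.Series([0, 1, 1, 0])))              # classification
print(task_type(pd.Series([12.5, 30.1, 18.0, 22.4])))  # regression
```

A target like 0/1/2 class codes would be misread as regression by this heuristic, so in practice the decision is made from what the label means, not only from its data type.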
Unsupervised learning:
Clustering: there are no labels or output; the algorithm makes
groups/clusters of similar records.
Association: e.g. Market Basket Analysis.
In unsupervised learning there are no labels, so there is no target for a model to predict.
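Both unsupervised tasks can be sketched in a few lines. The points and shopping baskets below are made up. Clustering uses scikit-learn's `KMeans` (one common clustering algorithm, chosen here as an example); for association, support and confidence are the standard measures of a market basket rule such as bread -> milk, counted by hand instead of with a dedicated library.

```python
from collections import Counter
from itertools import combinations

import numpy as np
from sklearn.cluster import KMeans

# --- Clustering: group unlabeled points (no y is given) ---
# Hypothetical 2-D points forming two obvious groups.
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                   [8.0, 8.0], [8.2, 7.9], [7.8, 8.1]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(points)  # one cluster id per point
print(cluster_ids)

# --- Association: market basket analysis ---
# Hypothetical shopping baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "eggs"},
]
n = len(baskets)

# Count baskets containing each item and each (alphabetically sorted) item pair.
item_counts = Counter(item for b in baskets for item in b)
pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))

# Support(bread, milk): share of all baskets containing both items.
support = pair_counts[("bread", "milk")] / n
# Confidence(bread -> milk): share of bread baskets that also contain milk.
confidence = pair_counts[("bread", "milk")] / item_counts["bread"]
print(support, confidence)  # 0.5 and 2/3
```

Which number KMeans assigns to each group is arbitrary; what matters is that points in the same group share an id.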
Reinforcement learning:
1) Example application: robotics
Performance Evaluation:
Supervised learning
Classification or regression model: in order to create a
supervised learning model, we need to split the data into two
parts, namely:
1) Training data set
2) Testing data set
The training data set is used to create the model, whereas the testing data set is
used to validate the model for its accuracy.
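A minimal sketch of the training/testing split with scikit-learn's `train_test_split`, assuming made-up data of 10 rows. The 70/30 ratio (`test_size=0.3`) and `random_state=42` are illustrative choices, not values from the class.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 rows with 2 features each, and a binary label.
X = np.arange(20).reshape(10, 2)
y = np.array([0, 1] * 5)

# Hold out 30% of the rows as the testing data set; the rest is for training.
# random_state fixes the shuffle so the split is reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(X_train.shape, X_test.shape)  # (7, 2) (3, 2)
```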
In supervised learning, models can be created by
applying algorithms like decision tree, naive
Bayes, support vector machine, random forest, logistic
regression and linear regression, to name a few.
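The algorithms listed above can be tried side by side on the same split. This sketch assumes scikit-learn's built-in iris dataset as a stand-in for class data, and uses `GaussianNB` as one variant of naive Bayes and `SVC` for the support vector machine. Linear regression is left out because it predicts a numerical target, while iris labels are categories.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Iris is a small built-in example dataset, used here only for illustration.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# Classification algorithms from the list (linear regression is for numeric targets).
models = {
    "Decision tree": DecisionTreeClassifier(random_state=42),
    "Naive Bayes": GaussianNB(),
    "Support vector machine": SVC(),
    "Random forest": RandomForestClassifier(random_state=42),
    "Logistic regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)                 # learn from the training data set
    scores[name] = model.score(X_test, y_test)  # accuracy on the testing data set
    print(f"{name}: {scores[name]:.2f}")
```

Every model exposes the same `fit`/`predict`/`score` methods, which is why swapping algorithms only changes one line.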
Once a model is created by applying any of these
algorithms on the training data set, we need to measure its
performance. For classification models, this is done using a confusion matrix.
True positive (TP): a value which was actually positive (yes) and
predicted as positive.
True negative (TN): a value which was actually negative
and predicted as negative.
False positive (FP): a value which was actually negative
but predicted as positive.
False negative (FN): a value which was actually positive
but predicted as negative.
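The four definitions above can be checked by counting them directly from made-up actual/predicted labels (1 = positive/yes, 0 = negative/no) and comparing the result with scikit-learn's `confusion_matrix`, which puts actual classes in rows and predicted classes in columns.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical actual vs predicted labels (1 = positive/"yes", 0 = negative/"no").
actual = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]

# Count each cell by hand, following the definitions.
pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)
tn = sum(1 for a, p in pairs if a == 0 and p == 0)
fp = sum(1 for a, p in pairs if a == 0 and p == 1)
fn = sum(1 for a, p in pairs if a == 1 and p == 0)
print(tp, tn, fp, fn)  # 3 3 1 1

# With labels=[0, 1] the matrix layout is [[TN, FP], [FN, TP]].
cm = confusion_matrix(actual, predicted, labels=[0, 1])
print(cm)
```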
Performance evaluation formulas:
Precision = TP/(TP+FP)
1) Percentage of positive predictions that are correct.
2) Precision is the ratio of correctly predicted
positive observations to the total predicted positive
observations.
Recall / Sensitivity = TP/(TP+FN)
1) Percentage of positively labeled instances that are also
predicted as positive.
2) Recall is the ratio of correctly predicted positive
observations to all observations in the actual class
"yes".
Specificity = TN/(TN+FP)
1) Percentage of negatively labeled instances that are
also predicted as negative.
Accuracy = (TP+TN)/(TP+TN+FP+FN)
1) Percentage of correct predictions.
2) Classification accuracy is the total number of
correct predictions divided by the total number
of predictions made for a dataset.
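A worked sketch of the four formulas, assuming made-up counts TP=40, TN=45, FP=5, FN=10. The results are cross-checked with scikit-learn's metric functions; specificity is obtained there as the recall of the negative class (`pos_label=0`), which equals TN/(TN+FP).

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical confusion-matrix counts (not from any real model).
tp, tn, fp, fn = 40, 45, 5, 10

precision = tp / (tp + fp)                   # 40/45
recall = tp / (tp + fn)                      # 40/50 = 0.8 (sensitivity)
specificity = tn / (tn + fp)                 # 45/50 = 0.9
accuracy = (tp + tn) / (tp + tn + fp + fn)   # 85/100 = 0.85

# Rebuild label lists that produce exactly these counts (1 = positive, 0 = negative).
actual = [1] * tp + [0] * tn + [0] * fp + [1] * fn
predicted = [1] * tp + [0] * tn + [1] * fp + [0] * fn

# Cross-check with scikit-learn; recall of class 0 is TN / (TN + FP), i.e. specificity.
sk_precision = precision_score(actual, predicted)
sk_recall = recall_score(actual, predicted)
sk_specificity = recall_score(actual, predicted, pos_label=0)
sk_accuracy = accuracy_score(actual, predicted)

for name, mine, sk in [
    ("Precision", precision, sk_precision),
    ("Recall", recall, sk_recall),
    ("Specificity", specificity, sk_specificity),
    ("Accuracy", accuracy, sk_accuracy),
]:
    print(f"{name}: {mine:.3f} (sklearn: {sk:.3f})")
```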
Subject: Advanced Business Analytics
By: Prof. Pranjal Mule sir
Date: 12/03/2023
This class is a recap of the previous class, along with coding.
Some websites for practice: Python.org, Towards Data
Science, W3Schools, GeeksforGeeks, W3 research, etc.
What is the difference between data mining and data
analysis? (To be discussed in a later class; research it
until then.)
Data mining is a process, and there are many
software tools available for performing data mining.
Please read the definition of data mining.
Explanation of the diagram
drawn in the book given by sir:
Database: capturing the data from all the
sources.
Data cleaning: includes removing duplicate and
redundant data.
Missing value treatment: handling missing values in the data.
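A minimal pandas sketch of the cleaning steps, assuming a made-up employee table (`emp_id`, `dept` and `salary` are hypothetical columns). Filling a missing number with the column mean and a missing category with a placeholder is just one common choice; dropping such rows is another.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data combined from several sources (row 102 is duplicated).
raw = pd.DataFrame({
    "emp_id": [101, 102, 102, 103, 104],
    "dept": ["HR", "IT", "IT", "Sales", None],
    "salary": [50000, 60000, 60000, np.nan, 55000],
})

# Data cleaning: drop rows that are exact duplicates.
clean = raw.drop_duplicates()

# Missing value treatment: fill the missing salary with the column mean
# and the missing department with a placeholder label.
clean = clean.assign(
    salary=clean["salary"].fillna(clean["salary"].mean()),
    dept=clean["dept"].fillna("Unknown"),
)
print(clean)
```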
Select task-relevant data: data mining is a process,
and to find the relevant data we require an algorithm.
We apply the data mining process with the help of
algorithms. For example, if you want to calculate the happiness
index of employees, you need only the HR data;
in that case you set aside the other data and select only the relevant
data.
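Selecting task-relevant data often amounts to keeping only the columns a task needs. This sketch assumes a made-up company table; which columns count as "HR data" for a happiness index is a hypothetical choice for illustration.

```python
import pandas as pd

# Hypothetical company data mixing HR, sales and finance fields.
data = pd.DataFrame({
    "emp_id": [1, 2, 3, 4],
    "department": ["HR", "Sales", "IT", "Sales"],
    "satisfaction": [4, 3, 5, 2],                # survey score, HR data
    "work_hours": [40, 50, 45, 55],              # HR data
    "quarterly_revenue": [0, 120000, 0, 95000],  # not needed for this task
    "office_rent": [2000, 2000, 2000, 2000],     # not needed for this task
})

# For an employee happiness index, keep only the (assumed) HR-related columns.
hr_columns = ["emp_id", "department", "satisfaction", "work_hours"]
relevant = data[hr_columns]
print(relevant.columns.tolist())
```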
What is machine learning?
Machine learning is the process of making a machine
learn from data.
Machine learning:
1) Supervised learning: works under the supervision
of data.
Supervised learning must have labeled data.
Algorithms for supervised learning:
1) Decision tree
2) Naive Bayes
3) Random forest
4) Support vector machine
5) Logistic regression
6) Linear regression
1) Decision tree:
pandas: library that provides data structures (DataFrame) to hold the data.
Matplotlib: library for creating graphs.
%matplotlib inline: displays charts within the
notebook.
scikit-learn (sklearn): library which is used for machine
learning.
For dividing your data into training and testing sets:
(from sklearn.model_selection import train_test_split)
To evaluate the model, we use a classification report in the code:
(from sklearn.metrics import classification_report)
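Putting the decision tree steps together, a minimal end-to-end sketch. It assumes scikit-learn's built-in iris dataset (loaded as a pandas DataFrame) in place of the class dataset, and the 70/30 split and `random_state=42` are illustrative choices. Plotting with Matplotlib is left out so the sketch also runs outside a notebook.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Load a built-in example dataset as a pandas DataFrame (stand-in for class data).
iris = load_iris(as_frame=True)
df = iris.frame                  # feature columns plus a "target" column
X = df.drop(columns="target")    # features
y = df["target"]                 # labels

# Divide the data into training and testing sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)      # create the model from the training set
y_pred = model.predict(X_test)   # predict on the testing set

# Precision, recall, F1-score and support for each class.
report = classification_report(y_test, y_pred, target_names=iris.target_names)
print(report)
```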