BA Notes From Lecture

This document discusses advance business analytics and machine learning. It defines key concepts like data mining, data analysis, machine learning, supervised learning, and performance evaluation. It also provides examples of supervised learning algorithms like decision trees and discusses coding practices in Python for tasks like data splitting, model creation using scikit-learn, and model evaluation. The recap discusses differences between data mining and analysis and explains a diagram on the data mining process.

Subject: Advance Business Analytics
By: Prof Pranjal Mule sir
Date: 5/03/2023

Data mining: Machine learning

Data mining: extracting data from a data warehouse for further use.
Data: raw facts.
Information: processed data.
Knowledge: application of information.
3 main aspects in a data source:
1) Information
2) Knowledge
3) Wisdom

Machine learning is programming computers to optimize a performance criterion using example data or past experience.
Machine learning types:
• Supervised learning: classification and regression.
• Unsupervised learning: association and clustering.
• Reinforcement learning.

Independent variables in data mining are called features or attributes.
Dependent variables are called targets or labels.
They are also often called X and Y, respectively.

Supervised learning is based on supervision, i.e. on historical (labeled) data.
• Whenever your label or output is binary/categorical, it is classification.
• Whenever your label/target is numerical, it is regression.
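
The distinction between the two cases can be sketched in a few lines. This is only an illustration: the feature (years of experience), both label columns, and the choice of LogisticRegression and LinearRegression are assumptions made for the example, not something taken from the lecture.

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Hypothetical feature: years of experience (one column, so X is 2-D).
X = [[1], [2], [3], [4], [5], [6]]

# Categorical label (promoted: "no"/"yes") -> classification.
y_class = ["no", "no", "no", "yes", "yes", "yes"]
classifier = LogisticRegression().fit(X, y_class)
predicted_label = classifier.predict([[7]])[0]

# Numerical label (salary in thousands) -> regression.
y_number = [30.0, 35.0, 40.0, 45.0, 50.0, 55.0]
regressor = LinearRegression().fit(X, y_number)
predicted_value = regressor.predict([[7]])[0]

print(predicted_label)  # one of the two class names
print(predicted_value)  # a continuous number
```

The same feature column X is used in both cases; only the type of the label decides whether the task is classification or regression.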
Unsupervised learning:
• Clustering: there are no labels or outputs; the algorithm makes groups/clusters of similar information.
• Association: e.g. market basket analysis.
• In unsupervised learning there are no labels, so nothing is trained against a known target.
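
A minimal clustering sketch, assuming scikit-learn's KMeans (the notes do not name a clustering algorithm) and a made-up table of customers. No labels are supplied; the algorithm only groups similar rows together.

```python
from sklearn.cluster import KMeans

# Hypothetical customers: [monthly visits, average spend]; no labels given.
points = [[1, 10], [2, 12], [1, 11], [20, 200], [21, 210], [19, 205]]

# Ask for two groups; random_state makes the run repeatable.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
groups = kmeans.fit_predict(points)

print(groups)  # cluster ids are arbitrary; only the grouping matters
```

The first three customers should end up in one group and the last three in the other, because the two sets of points are far apart.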

Reinforcement learning:
1)Robotics

Performance Evaluation:

Supervised learning
Classification or regression model: in order to create a supervised learning model we need to split the data into two parts, namely:
1) Training data set
2) Testing data set
The training data set is used to create the model, whereas the testing data set is used to validate the model for its accuracy.
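
The split described above can be sketched with scikit-learn's train_test_split. The ten rows of data, the 70/30 ratio, and the random_state value are assumptions chosen for illustration.

```python
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 rows of features (X) and a binary label (y).
X = [[i, i * 2] for i in range(10)]
y = [0, 1] * 5

# Hold out 30% of the rows for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(len(X_train), len(X_test))  # 7 training rows, 3 testing rows
```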
In supervised learning, models can be created by applying algorithms like decision tree, naive Bayes, support vector machine, random forest, logistic regression, and linear regression, to name a few.
Once a model is created by applying any of these algorithms on the training data set, we need to measure its performance. For a classification model this is achieved with a confusion matrix.

True positive (TP): the value was actually positive (yes) and was predicted as positive.
True negative (TN): the value was actually negative and was predicted as negative.
False positive (FP): the value was actually negative but was predicted as positive.
False negative (FN): the value was actually positive but was predicted as negative.
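
As a small worked example, the four counts can be tallied by comparing actual and predicted labels pair by pair. The eight records below are invented for illustration.

```python
# Hypothetical actual and predicted labels for eight records.
actual    = ["yes", "yes", "yes", "no", "no",  "no", "yes", "no"]
predicted = ["yes", "no",  "yes", "no", "yes", "no", "yes", "no"]

pairs = list(zip(actual, predicted))

# Each comparison is True/False, and sum() counts the True values.
tp = sum(a == "yes" and p == "yes" for a, p in pairs)  # actual yes, predicted yes
tn = sum(a == "no" and p == "no" for a, p in pairs)    # actual no, predicted no
fp = sum(a == "no" and p == "yes" for a, p in pairs)   # actual no, predicted yes
fn = sum(a == "yes" and p == "no" for a, p in pairs)   # actual yes, predicted no

print(tp, tn, fp, fn)  # 3 3 1 1
```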

Performance evaluation formulas:

• Precision = TP/(TP+FP)
1) Percentage of positive predictions that are correct.
2) Precision is the ratio of correctly predicted positive observations to the total predicted positive observations.
• Recall / Sensitivity = TP/(TP+FN)
1) Percentage of positively labeled instances that are also predicted as positive.
2) Recall is the ratio of correctly predicted positive observations to all observations in the actual class "yes".
• Specificity = TN/(TN+FP)
1) Percentage of negatively labeled instances that are also predicted as negative.
• Accuracy = (TP+TN)/(TP+TN+FP+FN)
1) Percentage of correct predictions.
2) Classification accuracy is the total number of correct predictions divided by the total number of predictions made for a dataset.
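
The four formulas can be checked with a short function. The function name and the counts passed to it are hypothetical; the sketch also assumes no denominator is zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return the four metrics computed from confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

# Hypothetical counts for 100 test records.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, precision is 40/45, recall is 40/50, specificity is 45/50, and accuracy is 85/100.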

Subject: Advance Business Analytics
By: Prof Pranjal Mule sir
Date: 12/03/2023

This is a recap of the previous class, along with coding.

Some websites to practice on: Python.org, Towards Data Science, W3Schools, GeeksforGeeks, W3 research, etc.

• What is the difference between data mining and data analysis? (Will be discussed in a later class; research it until then.)
• Data mining is a process, and there are many software tools for performing data mining.
• Please read the definition of data mining.

Explanation of the diagram drawn in the book given by sir:
• Database: capturing the data from all the sources.
• Data cleaning: includes removing duplicate and redundant data, and missing value treatment.
• Select task-relevant data: data mining is a process, and finding the relevant data requires an algorithm. We apply the data mining process with the help of an algorithm. For example, if you want to calculate the happiness index of the employees, you need only HR data; in that case you leave out the other data and select only the relevant data.
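
The cleaning and selection steps above can be sketched with pandas. The table, its column names, and the choice to fill missing values with the column mean are all assumptions for illustration; the notes do not say which missing value treatment to use.

```python
import pandas as pd

# Hypothetical raw table mixing HR and sales columns.
raw = pd.DataFrame({
    "employee_id": [1, 2, 2, 3],
    "satisfaction": [4.0, None, None, 5.0],
    "sales_region": ["N", "S", "S", "E"],
})

# Data cleaning: drop exact duplicate rows.
clean = raw.drop_duplicates().copy()

# Missing value treatment (assumed here: replace with the column mean).
clean["satisfaction"] = clean["satisfaction"].fillna(clean["satisfaction"].mean())

# Select task-relevant data: keep only the HR columns.
hr_data = clean[["employee_id", "satisfaction"]]

print(hr_data)
```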

What is Machine Learning:

• Machine learning is the process of making a machine learn.
• Machine learning types:
1) Supervised learning: works under the supervision of data. Supervised learning must have labeled data.
Algorithms for supervised learning:
1) Decision tree
2) Naive Bayes
3) Random forest
4) Support vector machine
5) Logistic regression
6) Linear regression

1) Decision Tree:

• The pandas library provides the data structures (e.g. DataFrame) used to hold the data.
• Matplotlib is for graph creation.
• %matplotlib inline displays charts within the notebook.
• scikit-learn (sklearn) is the library used for machine learning.
• For dividing your data into training and testing sets:
(from sklearn.model_selection import train_test_split)
• To evaluate the model, we use the classification report in code:
(from sklearn.metrics import classification_report)
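
Putting the pieces together, a decision tree workflow might look like the sketch below. The class code is not reproduced in these notes, so the iris dataset bundled with scikit-learn, the 70/30 split, and the random_state values are assumptions standing in for whatever data was used in the lecture.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset: features in iris.data, labels in iris.target.
iris = load_iris()

# Split into training and testing data sets.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42
)

# Create the model on the training data set only.
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)

# Validate on the testing data set.
y_pred = model.predict(X_test)
report = classification_report(y_test, y_pred, target_names=iris.target_names)
print(report)
```

The printed report lists precision, recall, and F1-score per class, plus overall accuracy, which ties back to the performance evaluation formulas from the previous class.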
