DA Unit - IV
Features of Python
● Scripting language
● Portable
● Provides a vast range of libraries for various fields such as machine learning, web development, and scripting.
Introduction UNIT - IV
Advantages of Python
● Ease of programming

Python Libraries
● A library is a collection of files (called modules) that contains functions for other programs.
● A Python library is a reusable chunk of code that you may want to include in your programs.
Essential Python Libraries
01 NumPy
02 Pandas
03 SciPy
04 SciKit-Learn
Pandas
● Pandas has two core data structures: Series and DataFrame.
● Series: a one-dimensional, array-like labeled structure.
● DataFrame: a two-dimensional labeled table with rows and columns.
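A minimal sketch of the two core structures (the values here are illustrative only):

```python
import pandas as pd

# Series: one-dimensional labeled array
s = pd.Series([10, 20, 30], index=["a", "b", "c"])
print(s["b"])               # 20

# DataFrame: two-dimensional labeled table
df = pd.DataFrame({"item": ["Bread", "Milk"], "price": [40, 25]})
print(df["price"].mean())   # 32.5
```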
SciPy
● A library built on NumPy that provides modules for scientific computing, such as optimization, integration, interpolation, linear algebra and statistics.

Matplotlib
● A 2D plotting library that can be used in Python scripts, the Python and IPython shells, web application servers, and more.

Jupyter / IPython
● An interactive console that runs your code like the Python shell, but gives you even more features, like support for data visualizations.
SciKit-Learn
● Scikit-learn is probably the most useful library for machine learning in Python.
● Scikit-learn supports classification, clustering, regression, and dimensionality reduction.
● Data preprocessing is a data mining technique that involves transforming raw data
into an understandable format.
● It aims to reduce the data size, find the relations between data items, and normalize them.
Data Preprocessing UNIT - IV
1. Data Cleaning
2. Data Integration
● Data with different representations are put together and conflicts within the data are resolved.
3. Data Transformation
4. Data Reduction
5. Data Discretization
Removing Duplicates
● With large scales of data, this will often be done using tools that find and merge
duplicate records in an existing database and prevent new ones from entering it
based on similarities in specific fields.
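A small sketch of this with pandas; the customer records below are hypothetical:

```python
import pandas as pd

# Hypothetical customer records containing one exact duplicate row
df = pd.DataFrame({
    "name":  ["Asha", "Ravi", "Asha", "Meena"],
    "email": ["a@x.com", "r@x.com", "a@x.com", "m@x.com"],
})

# Find and merge duplicate records based on specific fields
deduped = df.drop_duplicates(subset=["name", "email"])
print(len(deduped))   # 3
```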
Handling Missing Values
● Ignore the tuple: this is usually done when the class label is missing.
● Fill in the missing value using a measure of central tendency for the attribute, such as the mean, the median, or the mode.
● Using the attribute mean for numeric values, or the attribute mode for nominal values, for all samples belonging to the same class as the given tuple.
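A sketch of both imputation strategies; the column names and values are hypothetical:

```python
import pandas as pd
import numpy as np

# Hypothetical data with one missing numeric value
df = pd.DataFrame({
    "class":  ["A", "A", "B", "B"],
    "income": [100.0, np.nan, 50.0, 70.0],
})

# Overall-mean imputation: fill with the mean of the whole attribute
overall = df["income"].fillna(df["income"].mean())

# Class-wise imputation: fill with the mean of the same class as the tuple
by_class = df.groupby("class")["income"].transform(lambda s: s.fillna(s.mean()))
print(by_class.tolist())   # [100.0, 100.0, 50.0, 70.0]
```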
Data Integration
❏ Users want to join unstructured data or streaming data with structured data so they can analyze the data together.
❏ Users want to add information to data to enrich it, such as performing lookups, adding geolocation data, or adding timestamps.
On-premise ETL tools
❏ ETL (Extract, Transform, Load) tools can take much of the pain out of scripting the transformations by automating the process.
❏ These tools are typically hosted on your company's site, and may require extensive expertise and infrastructure cost.
Cloud-based ETL tools
❏ These ETL tools are hosted in the cloud.
Analytics Types
❏ Business analytics utilizes big data, statistical analysis and data visualization to implement organizational changes.
Data Preprocessing UNIT - IV
Key considerations in business analytics projects:
● Executive ownership
● IT involvement
● Project Management Office (PMO)
● Available production data vs. cleansed modeling data
● End-user involvement and buy-in
● Change management
3. Pre-process the data for issues such as missing and incorrect data. Generate derived variables and transform the data if necessary. Prepare the data for analytics model building.
4. Divide the data into training and validation subsets.
5. Build analytical models and identify the best model(s) using model performance on the validation data.
Analytics Types: Predictive, Descriptive, and Prescriptive.
Predictive
❏ Predictive analytics helps your organization predict with confidence what will happen next so that you can make smarter decisions and improve business outcomes.
❏ The purpose of the predictive model is to find the likelihood that different samples will perform in a specific way.
❏ The variability of the component data will have a relationship with what it is likely to predict.
Predictive Analytics Process:
1. Project definition
2. Data collection
3. Analysis
4. Statistics
5. Modelling
6. Deployment
7. Monitoring
Project definition
❏ Identify the intended outcome of the project, the deliverables and the business objectives, and based on that gather the data sets that are to be used.
Data collection
❏ This is the big basket where all data from various sources is collected for usage.
❏ This gives a picture of the various customer interactions as a single-view item.
Analysis

Statistics
❏ This enables validating whether the findings, assumptions and hypotheses are fine to go ahead with, and testing them using a statistical model.
Modelling
❏ Through this, accurate predictive models about the future can be provided.
❏ From the options available, the best option can be chosen as the required solution through multi-model evaluation.
Deployment
❏ This way the results, reports and other metrics can be obtained based on the modelling.
Monitoring

Examples of Predictive Analytics
❏ Retail: probably the largest sector to use predictive analytics, retail is always looking to improve its sales position and build better relations with customers.
❏ Healthcare: usage of predictive analytics in the healthcare domain can help determine and prevent cases and risks of developing certain health-related complications like diabetes, asthma and other life-threatening ailments.
Descriptive
❏ The descriptive model shows relationships between the product/service and the acquired data.
❏ Descriptive statistics are useful to show things like total stock in inventory, average dollars spent per customer, and year-over-year change in sales.
❏ While business intelligence tries to make sense of all the data that's collected each and every day by organizations of all types, communicating the data in a way that people can easily grasp often becomes an issue.
Example
❏ Reports that provide financial, inventory, and production information.
Prescriptive

Example of Prescriptive Analytics: Market Basket Analysis (it can also be called Association Analysis)
https://blog.rsquaredacademy.com/market-basket-analysis-in-r/
Use Cases (Applications) of Association Rule Mining
Simple Example
Simple Example - Transaction Data
Simple Example - Frequent Item Set
Simple Example - Association Rule
Simple Example - Association Rule Support
Simple Example - Association Rule Confidence
Simple Example - Association Rule Lift
Simple Example - Association Rule Lift - Interpretation
● Lift = 1: implies no relationship between mobile phone and screen guard (i.e., mobile phone and screen guard occur together only by chance)
● Lift > 1: implies a positive relationship between mobile phone and screen guard (i.e., mobile phone and screen guard occur together more often than random)
● Lift < 1: implies a negative relationship between mobile phone and screen guard (i.e., mobile phone and screen guard occur together less often than random)
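These three measures can be computed directly from a transaction list. The five transactions below are hypothetical, chosen only to illustrate the calculation:

```python
# Hypothetical transactions; item names follow the example above
transactions = [
    {"mobile phone", "screen guard"},
    {"mobile phone", "screen guard", "charger"},
    {"mobile phone"},
    {"screen guard"},
    {"charger"},
]
n = len(transactions)

def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions) / n

s_x = support({"mobile phone"})                      # 3/5
s_y = support({"screen guard"})                      # 3/5
s_xy = support({"mobile phone", "screen guard"})     # 2/5
confidence = s_xy / s_x
lift = s_xy / (s_x * s_y)
print(round(confidence, 2), round(lift, 2))          # 0.67 1.11
```

Here lift > 1, so with this data the two items occur together more often than random.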
X → Y

Appropriateness of Candidate Rule
Example: Minimum Support = 0.5 or 50% (with 9 transactions, 9/2 = 4.5, taken as a minimum support count of 4)

TID    List of Item IDs
T100   I1, I2, I5
T101   I2, I4
T102   I2, I5
T103   I1, I2, I4
T104   I1, I2, I3
T105   I2, I3
T106   I1, I2, I3, I4
T107   I1, I2, I3
T108   I1, I3, I5

Candidate 1-itemsets and their frequencies:

Item Set    Frequency
{I1}        6
{I2}        8
{I3}        5
{I4}        3
{I5}        3

After pruning (itemsets below the minimum support count of 4 are removed):

Item Set    Frequency
{I1}        6
{I2}        8
{I3}        5

Candidate generation (2-itemsets), followed by pruning:

Item Set    Frequency
{I1, I2}    5
{I1, I3}    4
{I2, I3}    4

All three 2-itemsets meet the minimum support count, so none are pruned.
From the frequent 2-itemsets we have 3 rules:
1. I1 => I2
2. I1 => I3
3. I2 => I3
Example - Support

Rule       No. of Transactions   Support = Freq(X ∪ Y) / 9   Value
I1 => I2   5                     5/9                          0.55
I1 => I3   4                     4/9                          0.44
I2 => I3   4                     4/9                          0.44
Example - Confidence

Rule       Freq(X)   Freq(X ∪ Y)   Confidence = Freq(X ∪ Y) / Freq(X)   Value
I1 => I2   6         5             5/6                                   0.83
I1 => I3   6         4             4/6                                   0.66
I2 => I3   8         4             4/8                                   0.50
Example - Lift

Rule       Support(X ∪ Y)   Support(X)   Support(Y)   Lift = Support(X ∪ Y) / (Support(X) × Support(Y))   Value
I1 => I2   0.55             6/9 = 0.66   8/9 = 0.88   0.55 / (0.66 × 0.88)                                 0.94
I1 => I3   0.44             6/9 = 0.66   5/9 = 0.55   0.44 / (0.66 × 0.55)                                 1.21
I2 => I3   0.44             8/9 = 0.88   5/9 = 0.55   0.44 / (0.88 × 0.55)                                 0.90
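The tables above can be re-checked with a few lines of Python over the nine transactions (values agree up to rounding):

```python
# The nine transactions from the worked example above
transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I5"}, {"I1", "I2", "I4"},
    {"I1", "I2", "I3"}, {"I2", "I3"}, {"I1", "I2", "I3", "I4"},
    {"I1", "I2", "I3"}, {"I1", "I3", "I5"},
]
n = len(transactions)

def support(items):
    """Fraction of transactions containing every item in items."""
    return sum(items <= t for t in transactions) / n

for x, y in [("I1", "I2"), ("I1", "I3"), ("I2", "I3")]:
    s_xy = support({x, y})
    conf = s_xy / support({x})
    lift = s_xy / (support({x}) * support({y}))
    print(f"{x} => {y}: support={s_xy:.2f} confidence={conf:.2f} lift={lift:.2f}")
```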
Example: If Item A is purchased, then Item B is likely to be purchased.
● Antecedent (Item A): the condition.
● Consequent (Item B): the result.
Market Basket Analysis UNIT - IV
Association Rule Algorithm: Apriori

Association rule measures: Support, Confidence, and Lift.
Support
● Support is the number of transactions that include the items in both the {A} and {B} parts of the rule, as a percentage of the total number of transactions.

Confidence
● Confidence of the rule is the ratio of the number of transactions that include all items in {A} as well as {B} to the number of transactions that include all items in {A}:

Confidence = Support(A ∪ B) / Support(A)
Association Rules UNIT - IV
❏ The uncovered relationships can be represented in the form of association rules or sets of frequent items.
❏ Association rules are if/then statements that help uncover relationships between seemingly unrelated data in a transactional database, relational database or other information repository.
Market basket transactions:

ID   Items
1    {Bread, Milk}
2    {Bread, Milk, Cola, Sugar}
…    …
Apriori Algorithm
Solution
● Find the frequent itemsets and generate association rules on the given dataset.
● Assume that the minimum support threshold is s = 33.33% and the minimum confidence threshold is c = 60%.
Example of Apriori Algorithm → Table P. 4.4.3: transactions with 8 items
Rule confidence (= Freq(X ∪ Y) / Freq(X) × 100):
● [Hot Dogs^Coke]=>[Chips] = 2/2 × 100 = 100% (Selected)
● [Hot Dogs^Chips]=>[Coke] = 2/2 × 100 = 100% (Selected)
● [Coke^Chips]=>[Hot Dogs] = 2/3 × 100 = 66.67% (Selected)
● [Coke]=>[Hot Dogs^Chips] = 2/3 × 100 = 66.67% (Selected)
● [Hot Dogs]=>[Coke^Chips] = 2/4 × 100 = 50% (Rejected)

There are four strong rules (minimum confidence greater than 60%):
● [Hot Dogs^Coke]=>[Chips]
● [Hot Dogs^Chips]=>[Coke]
● [Coke^Chips]=>[Hot Dogs]
● [Coke]=>[Hot Dogs^Chips]
Apriori Algorithm
Drawback
The two primary drawbacks of the Apriori Algorithm are:
1. At each step, candidate sets have to be built.
2. To build the candidate sets, the algorithm has to repeatedly scan the database.
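A minimal pure-Python sketch of the level-wise procedure, which makes both drawbacks visible in the code; the transactions below are hypothetical (Table P. 4.4.3 is not reproduced in this text):

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return frequent itemsets mapped to their support counts."""
    items = sorted({i for t in transactions for i in t})
    frequent, k = {}, 1
    current = [frozenset([i]) for i in items]
    while current:
        # Drawback 2: one full scan of the database per level
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        survivors = {c: cnt for c, cnt in counts.items() if cnt >= min_count}
        frequent.update(survivors)
        # Drawback 1: candidate sets must be rebuilt at each step
        keys = list(survivors)
        current = list({a | b for a, b in combinations(keys, 2) if len(a | b) == k + 1})
        k += 1
    return frequent

# Hypothetical transactions for illustration only
tx = [frozenset(t) for t in (
    {"Hot Dogs", "Coke", "Chips"},
    {"Hot Dogs", "Coke"},
    {"Hot Dogs"},
    {"Coke", "Chips"},
    {"Chips"},
)]
freq = apriori(tx, min_count=2)
print(sorted(tuple(sorted(s)) for s in freq))
```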
FP Growth Algorithm
FP Growth
● Let the minimum support be 3.
● Frequent items are stored in descending order of their respective frequencies.
● After insertion of the relevant items, the set L looks like this: L = {K : 5, E : 4, O : 4, M : 3, Y : 3}
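The first FP-Growth step (count item frequencies, drop infrequent items, and order the survivors by descending frequency) can be sketched as follows. The transactions below are assumed for illustration, since the slide's own transaction table is not shown here, so the exact counts in L depend on them:

```python
from collections import Counter

# Hypothetical transactions (assumed; not the slide's original table)
transactions = [
    ["E", "K", "M", "N", "O", "Y"],
    ["D", "E", "K", "N", "O", "Y"],
    ["A", "E", "K", "M"],
    ["C", "K", "M", "U", "Y"],
    ["C", "E", "I", "K", "O", "O"],
]
min_support = 3

# Count each item once per transaction (set(t) removes in-transaction repeats)
counts = Counter(i for t in transactions for i in set(t))

# Keep only frequent items, in descending order of frequency
L = {i: c for i, c in counts.most_common() if c >= min_support}
print(L)
```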
● The conditional frequent pattern tree is built by taking the set of elements common to all the paths in the Conditional Pattern Base of an item, and calculating its support count by summing the support counts of all the paths in the Conditional Pattern Base.
Regression UNIT - IV
● A regression task begins with a data set in which the target values are known.
● For an input x, if the output is continuous, this is called a regression problem.
● Linear regression is the oldest and most widely used predictive model in the field of machine learning.
● The goal is to minimize the sum of the squared errors to fit a straight line to a set of data points.
Regression Line
Least squares:
● The least squares regression line is the line that makes the sum of squared residuals as small as possible.
● Linear means "straight line".
Regression Line:
● For two variables X and Y, there are always two lines of regression.

Regression line of X on Y:
Gives the best estimate for the value of X for any specific given value of Y:
X = a + bY
where,
a = X-intercept
b = slope of the line
X = dependent variable
Y = independent variable
Regression line of Y on X:
Gives the best estimate for the value of Y for any specific given value of X:
Y = a + bX
where,
a = Y-intercept
b = slope of the line
Y = dependent variable
X = independent variable
Regression Line
❏ The simplest form of regression to visualize is linear regression with a single predictor.

Linear Regression Example:
(i) Find the values of b0 and b1 for the linear regression model that best fits the given data.

Observation   X   Y
1st           4   3
2nd           2   4
3rd           3   2
4th           5   5
5th           1   3
6th           3   1
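The least-squares coefficients for the six points above can be computed directly:

```python
# Least-squares fit for the six (X, Y) points in the table above
xs = [4, 2, 3, 5, 1, 3]
ys = [3, 4, 2, 5, 3, 1]
n = len(xs)

x_bar = sum(xs) / n            # 3.0
y_bar = sum(ys) / n            # 3.0

# b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar        # intercept

print(round(b0, 2), round(b1, 2))   # 2.1 0.3
```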
Average of X values = (4 + 2 + 3 + 5 + 1 + 3) / 6 = 3
Average of Y values = (3 + 4 + 2 + 5 + 3 + 1) / 6 = 3

b1 = Σ(Xi - mean X)(Yi - mean Y) / Σ(Xi - mean X)² = 3 / 10 = 0.3
b0 = mean Y - b1 × mean X = 3 - 0.3 × 3 = 2.1

The fitted regression line is Y = 2.1 + 0.3 X.
Regression Line
Interpretation
For an increase in the value of x by one unit, there is an increase in the value of y by 0.3 units.
Logistic Regression
❏ Logistic component: instead of modeling the outcome, Y, directly, the method models the log odds of Y using the logistic function.

ln [P / (1 - P)] = a0 + a1X1 + a2X2 + ... + akXk
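Inverting the log-odds equation gives the predicted probability P. A small sketch with assumed coefficients a0 and a1 (chosen only for illustration):

```python
import math

# Hypothetical coefficients, assumed for illustration
a0, a1 = -1.5, 0.8

def probability(x1):
    """Invert ln[P/(1-P)] = a0 + a1*x1, i.e. P = 1 / (1 + e^-(a0 + a1*x1))."""
    z = a0 + a1 * x1
    return 1 / (1 + math.exp(-z))

p = probability(3)              # z = -1.5 + 0.8*3 = 0.9
print(round(p, 3))              # 0.711

# Recover the log odds from the probability
print(round(math.log(p / (1 - p)), 1))   # 0.9
```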
Classification UNIT - IV
❏ Preprocessing of the data in preparation for classification and prediction can involve data cleaning to reduce noise or handle missing values.

New example: drawing one card from a standard deck of 52 playing cards.

            Color
Type        Red    Black    Total
King        2      2        4
Non-King    24     24       48
Total       26     26       52
Marginal Probability Example
P(King) = 4 / 52 = 1 / 13
Conditional Probability Example UNIT - IV
From the face cards, the probability of selecting the Jack of Hearts is 1/12: the total number of face cards is 12, and only one of them is the Jack of Hearts.
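Both probabilities can be checked exactly with Python's `fractions` module:

```python
from fractions import Fraction

# Marginal probability from the 52-card table: P(King)
p_king = Fraction(4, 52)
print(p_king)                        # 1/13

# Conditional probability: P(Jack of Hearts | face card)
face_cards = 12                      # J, Q, K in each of the 4 suits
p_jack_of_hearts_given_face = Fraction(1, face_cards)
print(p_jack_of_hearts_given_face)   # 1/12
```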
Naïve Bayes Classification UNIT - IV
Finally, we classify X as RED since its class membership achieves the largest
posterior probability.
Naïve Bayes Solved Example UNIT - IV
Conditional Probability
Example
In this example we have 4 inputs (predictors). The final posterior probabilities can be standardized between 0 and 1.
https://www.analyticsvidhya.com/blog/2021/09/naive-bayes-algorithm-a-complete-guide-for-data-science-enthusiasts/
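A minimal sketch of computing standardized posteriors for 4 categorical predictors; the training rows below are hypothetical, not the slide's dataset:

```python
from collections import Counter

# Hypothetical training data: 4 categorical predictors and a class label
rows = [
    ("Sunny",    "Hot",  "High",   "Weak",   "No"),
    ("Sunny",    "Hot",  "High",   "Strong", "No"),
    ("Overcast", "Hot",  "High",   "Weak",   "Yes"),
    ("Rain",     "Mild", "High",   "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Weak",   "Yes"),
    ("Rain",     "Cool", "Normal", "Strong", "No"),
]

def posterior(x):
    """P(class | x) under the naive independence assumption, standardized to sum to 1."""
    classes = Counter(r[-1] for r in rows)
    scores = {}
    for c, nc in classes.items():
        p = nc / len(rows)                      # prior P(c)
        for j, v in enumerate(x):               # multiply P(feature_j = v | c)
            p *= sum(1 for r in rows if r[-1] == c and r[j] == v) / nc
        scores[c] = p
    total = sum(scores.values())                # standardize between 0 and 1
    return {c: s / total for c, s in scores.items()}

post = posterior(("Sunny", "Cool", "High", "Strong"))
print(max(post, key=post.get))                  # the class with the largest posterior
```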
Decision Tree UNIT - IV
• The goal is to create a training model that can be used to predict the class or value of the target variable by learning simple decision rules inferred from prior data (training data).
• To classify a record, start from the root of the tree.
• Compare the values of the root attribute with the record's attribute.
• On the basis of the comparison, follow the branch corresponding to that value and jump to the next node.
• Continue this process until a leaf node is reached; the process involves no backtracking.
Decision Trees UNIT - IV
Decision Trees - Information Gain UNIT - IV
Information gain measures the amount of information gained about the class at a node before splitting it for making further decisions:

Entropy(S) = - Σ P(xi) log2 P(xi)

where xi = possible outcomes and P(xi) is the probability of outcome xi.
● If a node contains only one class (i.e., the node is pure), the entropy of the data in that node is zero; by the information gain formula, the information gained for such a node is higher, and its purity is higher.
● If the entropy is higher, the information gain is lower, and the node can be considered less pure.
Gain(S, A) is the expected reduction in entropy caused by knowing the value of attribute A:

Gain(S, A) = Entropy(S) - Σv (|Sv| / |S|) · Entropy(Sv)

where the sum runs over the values v of attribute A, and Sv is the subset of S with A = v.
❑ Play Tennis Example
❑ Feature Vector = (Outlook, Temperature, Humidity, Wind)
Outlook
├── Sunny → Humidity
│     ├── High → No
│     └── Normal → Yes
├── Overcast → Yes
└── Rain → Wind
      ├── Strong → No
      └── Weak → Yes
Each internal node (Outlook, Humidity, Wind) is associated with a feature; the leaf nodes specify the classes (Yes / No).
Example UNIT - IV

Humidity:  High → 3+, 4-  (E = .985)    Normal → 6+, 1-  (E = .592)
Wind:      Weak → 6+, 2-  (E = .811)    Strong → 3+, 3-  (E = 1.0)
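These entropies and the resulting gains (for the 9+, 5- Play Tennis data) can be verified directly; the code uses exact entropies rather than the rounded values above, so the last digit of a gain may differ slightly from slide figures:

```python
import math

def entropy(pos, neg):
    """Entropy of a node holding pos positive and neg negative examples."""
    total = pos + neg
    e = 0.0
    for k in (pos, neg):
        if k:
            p = k / total
            e -= p * math.log2(p)
    return e

e_s = entropy(9, 5)                                   # overall set S: 9+, 5-

# Gain(S, A) = Entropy(S) - sum over branches of (|Sv|/|S|) * Entropy(Sv)
gain_humidity = e_s - (7/14) * entropy(3, 4) - (7/14) * entropy(6, 1)
gain_wind     = e_s - (8/14) * entropy(6, 2) - (6/14) * entropy(3, 3)

print(round(e_s, 3))                                  # 0.94
print(round(gain_humidity, 3), round(gain_wind, 3))   # 0.152 0.048
```

Humidity has the larger gain, so it is the better split of the two.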
Pick Outlook as the root:

Outlook
├── Sunny: examples 1, 2, 8, 9, 11 (2+, 3-) → ?
├── Overcast: examples 3, 7, 12, 13 (4+, 0-) → Yes
└── Rain: examples 4, 5, 6, 10, 14 (3+, 2-) → ?

Continue until: every attribute is included in the path, or all examples in the leaf have the same label.
Example

For the Sunny branch (2+, 3-; E = .97), splitting on Humidity gives High → No and Normal → Yes:
Gain(Ssunny, Humidity) = .97 - (3/5) · 0 - (2/5) · 0 = .97
Gain(Ssunny, Temp) = .97 - 0 - (2/5) · 1 = .57
Gain(Ssunny, Wind) = .97 - (2/5) · 1 - (3/5) · .92 = .02
Example

For the Rain branch, compute the gains similarly:
Gain(Srain, Humidity) =
Gain(Srain, Temp) =
Gain(Srain, Wind) =
Example

The completed tree assigns a Yes or No class at each leaf.
https://www.saedsayad.com/decision_tree.htm