
1. Introduction

2. Introduction to Business Analytics

3. Evolution of Business Analytics

4. Business Analytics Applications

5. Importance of Business Analytics

6. A Categorization of Analytical Methods and Models

1. Descriptive Analytics

2. Predictive Analytics

3. Prescriptive Analytics

4. Inferential Analytics

5. Decision Analytics

7. Big Data Analytics


[Figure: Data, Information, Analysis, Opinions]
Introduction to Business Analytics
• Definition and meaning of Business Analytics

• Evolution of Business Analytics

• Steps involved in Business Analytics

• Scope and Importance of Business analytics

• Limitations of Business analytics

• Business Analytics vs Big Data analytics

• Business Analytics vs Data Science


Definition
Analytics is the use of:

Data,

Information technology,

Statistical analysis,

Quantitative methods, and

Mathematical or computer-based models

to help managers gain improved insight about their business operations and

make better, fact-based decisions.


Definition -

“Business analytics refers to the skills, technologies, and

practices for continuous iterative exploration and investigation of past

business performance to gain insight and drive business planning.”


Definition
• Business analytics (BA) is the iterative, methodical exploration of an organization's data, with an

emphasis on statistical analysis.

• Business analytics is used by companies that are committed to making data-driven decisions.

• Data-driven companies treat their data as a corporate asset and actively look for ways to turn it

into a competitive advantage.

• Successful business analytics depends on data quality, skilled analysts who understand the

technologies and the business, and

• An organizational commitment to using data to gain insights that inform business decisions.
Definition of Business Analytics
• Business analytics is a data management solution.

• It is a subset of business intelligence and refers to the use of methodologies such as

• Data mining

• Predictive analytics, and

• Statistical analysis

• in order to analyze and transform data into useful information,

• identify and anticipate trends and outcomes, and

• ultimately make smarter, data-driven business decisions.


Evolution of Business Analytics
BA in the 1800s - The need to stay ahead
BA in the late 1800s - The advent of scientific management
BA in the early 1900s - The transformation of the manufacturing industry
BA in the 1950s - The first hard disk drive by IBM
BA in the late 1900s - The emergence of business intelligence
BA in the new millennium - Availability of different analytical solutions
BA in 2005 - Accessibility of data for the common people
BA from 2005 to 2020 - The bread and butter for companies globally
Steps involved in Business Analytics
Data Aggregation

Data Mining

Association and Sequence Identification

Text Mining

Exploration

Forecasting

Predictive Analytics

Data Visualization
1. Data Aggregation: the combining of data prior to analysis. Data must first be gathered, organized, and filtered, either through volunteered data or transactional records.
2. Data Mining: Data mining for business analytics sorts through large datasets using databases, statistics, and machine learning to identify trends and establish relationships.
3. Association and Sequence Identification: the identification of predictable

actions that are performed in association with other actions or sequentially

4. Text Mining: explores and organizes large, unstructured text datasets for the

purpose of qualitative and quantitative analysis

5. Forecasting: Analyzes historical data from a specific period in order to make

informed estimates that are predictive in determining future events or behaviors


6. Predictive Analytics: Predictive business analytics uses a variety of statistical techniques to create predictive models, which extract information from datasets, identify patterns, and provide a predictive score for an array of organizational outcomes.

7. Optimization: Once trends have been identified and predictions have been made, businesses can use simulation techniques to test out best-case scenarios.

8. Data Visualization: Provides visual representations such as charts and graphs for easy and quick data analysis.
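The association identification step can be illustrated with a small market-basket sketch. It computes support and confidence, two common measures for association rules; the slides do not name a specific measure, so this choice is an assumption, and the transactions are hypothetical.

```python
# Hypothetical shopping baskets; each set is one transaction.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"butter", "milk"},
    {"bread", "butter", "jam"},
]

def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset):
    """Share of all transactions that contain the itemset."""
    return count(itemset) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing antecedent, the share that also contain consequent."""
    return count(antecedent | consequent) / count(antecedent)

bread_butter_support = support({"bread", "butter"})
bread_to_butter = confidence({"bread"}, {"butter"})
print(f"support={bread_butter_support:.2f}, confidence={bread_to_butter:.2f}")
```

A rule such as "customers who buy bread also buy butter" is usually considered interesting only when both values exceed thresholds chosen by the analyst.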

Scope of Business Analytics
Descriptive Analytics
- uses data to understand past and present
Predictive analytics
- analyzes past performance
Prescriptive analytics
- uses optimization techniques
Scope of Business Analytics
Example 1.1 Retail Markdown Decisions

• Most department stores clear seasonal inventory by reducing prices.

• The question is: when to reduce the price, and by how much?

• Descriptive analytics: examine historical data for similar products (prices, units sold, advertising, …)

• Predictive analytics: predict sales based on price

• Prescriptive analytics: find the best sets of pricing and advertising to maximize sales revenue
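The three stages of the markdown example can be sketched in a few lines. This is a minimal illustration under stated assumptions: the price and sales figures are hypothetical, demand is assumed to be linear in price, and only price (not advertising or timing) is optimized.

```python
from statistics import mean

# Hypothetical history for similar products: (price, units sold).
history = [(50.0, 100), (45.0, 120), (40.0, 140), (35.0, 160), (30.0, 180)]

# Descriptive: summarize what happened.
avg_price = mean(p for p, _ in history)
avg_units = mean(u for _, u in history)

# Predictive: fit units = a + b * price by ordinary least squares.
sxy = sum((p - avg_price) * (u - avg_units) for p, u in history)
sxx = sum((p - avg_price) ** 2 for p, _ in history)
b = sxy / sxx
a = avg_units - b * avg_price

def predict_units(price):
    """Predicted unit sales at a given price under the fitted linear model."""
    return a + b * price

# Prescriptive: among candidate markdown prices, pick the one with the
# highest predicted revenue (price times predicted units).
candidate_prices = [30.0, 34.0, 38.0, 42.0, 46.0]
best_price = max(candidate_prices, key=lambda p: p * predict_units(p))
print(f"slope={b:.1f}, best price={best_price}")
```

In practice the prescriptive step would search over price and advertising jointly and respect inventory constraints; the grid search here only shows where optimization enters.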
Scope of Business Analytics
Analytics in Practice: OYO Rooms
•OYO owns numerous hotels and casinos
•Uses analytics to:
- Forecast demand for rooms
- Segment customers by gaming activities
•Uses prescriptive models to:
- Set room rates
- Allocate rooms
- Offer perks and rewards to customers
Reasons Why Business Analytics is Important
• Enhance customer experience
• Make informed decisions
• Reduce employee turnover
• Improve efficiency
• Identify frauds
• Cut manufacturing costs
• Make the most of your investment
• Improved advertising
• Accelerate through uncertainty
• Conduct a competitor analysis
• Better product management
• Tackle problems
Limitations of Business Analytics
• Lack of alignment, availability and trust
• In most organizations, analysts are organized according to business domains.
• Unfortunately, the analysis is typically shared only with top executives, so the results are not easily communicated to the business users for whom they provide the greatest value.

• Lack of commitment
• Although prefabricated solutions from analysts are not particularly difficult to implement, they can be very costly, and the ROI is not immediate.
• By nature, analytics models are designed to improve in accuracy over time, but they are complex and require dedication to implement.
• Because business users do not see the promised results immediately, they lose interest and trust, and as a result the models fail.
• Low quality of underlying transactional data
• Implementations of the solutions provided by business analysts fail because either the data is not available, or the data sources are too complex or poorly constructed.
• Need for domain
• Business analytics requires a dedicated and coherent approach and a good level of maturity.
Types of Business Analytics
• Planning Analytics: What is the plan?
• Descriptive Analytics: What happened in the past?
• Diagnostic Analytics: What was the reason for the occurrence?
• Predictive Analytics: What is going to happen in the future?
• Prescriptive Analytics: What should we do about it?
What is Descriptive Analytics?

• Descriptive analytics is a field of statistics that focuses on gathering and

summarizing raw data to be easily interpreted.

• Descriptive analytics concentrates on:

1. Historical data,

2. Providing the context,

3. Understanding information, and

4. Numbers.
• It is a Preliminary step in the Business management process

• It Creates a foundation for further analysis and understanding.

• Descriptive analytics seeks answers about “What Happened”

• Descriptive analytics is usually the First Step, and

• This will result in visualizations like


• Pie charts,

• line graphs,

• Bar charts, and

• Other simpler graphical displays.


• Descriptive analytics summarizes data from individual respondents, etc.

• It helps to make sense of large numbers of individual responses and to communicate the essence of those responses to others.

• It focuses on typical or average scores, the dispersion of scores over the available responses, and the shape of the response curve.

• It is used for observations, case studies, and surveys.


How Can we Use Descriptive Analytics?
• Even without knowing it, many organizations use descriptive analytics
extensively in their everyday operations.
• For most businesses, descriptive analytics form the core of their everyday
reporting.
• This includes simpler reports such as inventory, workflow, warehousing, and
sales.
• Which can be aggregated easily and provide a clear picture of a company’s
operations.
• A common operational use revolves around annual revenue reports.
Understanding the Different Types of Descriptive Statistics
• Frequency Distribution
• Used for both quantitative and qualitative data (spreadsheet functions: COUNTA, COUNTBLANK, COUNTIF, COUNTIFS)
• The frequency distribution is normally presented in a table or a graph.
• It is a summary of grouped data that has been categorized into mutually exclusive classes, with the number of occurrences in each class.

• Central Tendency
• Mean, Median, Mode

• Dispersion
• Two data sets can have similar means but differ in dispersion.
• For example: Data sets A and B
Understanding the Different Types of Descriptive Statistics

• Variability
• A measure of variability is a summary statistic reflecting the degree of dispersion in a
sample.

• The measures of variability determine how far apart the data points appear to fall from
the center.

• Dispersion, spread, and variability all refer to and denote the range and width of the
distribution of values in a data set.

• The range, standard deviation, and variance are used respectively, to depict different
components and aspects of the spread.
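The measures above can be computed with Python's standard library. The sketch below uses two hypothetical data sets, A and B, chosen to have the same mean but very different dispersion, plus a small categorical sample for the frequency distribution; none of the values come from the slides.

```python
from collections import Counter
from statistics import mean, median, mode, pstdev, pvariance

# Hypothetical data sets with identical means but different spread.
data_a = [48, 49, 50, 51, 52]
data_b = [10, 30, 50, 70, 90]

def describe(values):
    """Central tendency and dispersion for one numeric data set."""
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "variance": pvariance(values),  # population variance
        "std_dev": pstdev(values),      # population standard deviation
    }

summary_a = describe(data_a)
summary_b = describe(data_b)

# Frequency distribution of a categorical variable (hypothetical survey answers).
responses = ["yes", "no", "yes", "yes", "no", "maybe"]
frequency = Counter(responses)
most_common_response = mode(responses)

print(summary_a)
print(summary_b)
print(frequency)
```

Both sets report a mean of 50, so only the range, variance, and standard deviation reveal how differently the values are spread.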
[Figure: Descriptive Statistics Forensics]
• Continuous variables
  • Measures of central tendency: Mean, Median, Mode
  • Measures of dispersion: Range, Variance, Standard Deviation
  • Distribution: Skewness, Kurtosis
• Discrete (categorical) variables
  • Frequency analysis: Charts, Graphs, Crosstabs
Descriptive Analytics
• If we wanted to characterize the students in this class we would find that they are:

• Young

• From Vijayawada

• Fit

• Female

• How young?

• How fit is this class?

• What is the distribution of males and females?


Reference cases for Descriptive Analytics
• Variability
• https://corporatefinanceinstitute.com/resources/knowledge/other/variability/
Diagnostic Analytics?

• Diagnostic analytics is a form of advanced analytics that examines data or

content to answer the question, “Why did it Happen?”

• It is characterized by techniques such as Drill-Down, Data Discovery, Data

Mining and Correlations.


Application of Diagnostic Analytics
It’s doing a deep-dive into your data to search for valuable insights.

Diagnostic analytics takes it a step further to uncover the reasoning behind certain results.

Diagnostic analytics is usually performed using such techniques as data discovery, drill-down,

data mining, and correlations.

In the discovery process, analysts identify the data sources that will help them interpret the

results.

Drilling down involves focusing on a certain facet of the data or particular widget.
Importance of Diagnostic Analytics?
• It translates complex data into visualizations and insights that everyone can take advantage of.

• Diagnostic analytics helps you get value out of your data by asking the right questions and making deep dives for the answers.

• It points to the decisions with the best chance of success.

• It not only helps company heads make more accurate decisions but can also foster a more data-driven culture.

• A data-driven culture leads to a refinement of the collection, curation, analysis and diagnosis of data, which creates greater awareness of how the company operates.


A Complete Picture for Business Leaders:

With judicious use of diagnostic analytics, company heads can make decisions that lead to year-on-year growth while cutting costs at the same time.

It gives a comprehensive picture of the situation to make a well-informed decision.

Integrates Internal and External Sources

• Diagnostic analytics draws data from both internal and external sources to illustrate a series of connections and correlations between variables.
• For example, a retail store can analyze sales based on location, weather, traffic, parking and other variables, something companies found challenging to accomplish without a data analytics tool.
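One way to look for such connections is to compute a correlation coefficient between sales and each candidate factor. The sketch below implements Pearson correlation by hand; the daily store records are hypothetical, and a strong correlation only suggests where to drill down, it does not prove a cause.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical daily records for one retail store.
sales = [200, 220, 250, 270, 300]
factors = {
    "temperature": [20, 22, 25, 27, 30],         # external source
    "free_parking_spaces": [50, 40, 45, 30, 42],  # internal source
}

correlations = {name: pearson(values, sales) for name, values in factors.items()}

# The factor with the largest absolute correlation is the first candidate to investigate.
strongest = max(correlations, key=lambda name: abs(correlations[name]))
print(correlations, strongest)
```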
What is Predictive Analytics?
• Predictive analytics uses statistical analysis and machine learning to predict the probability of a certain event occurring in the future, based on a set of historical data points.
• Predictive analytics takes historical data and feeds it into a machine
learning model that considers key trends and patterns.
• The model is then applied to current data to predict what will happen next.
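The fit-then-apply pattern described above can be shown with a deliberately simple model. Instead of a machine learning library, this sketch estimates the probability of an event from historical frequencies per customer segment and then applies it to current customers; the segments, the churn event, and all records are hypothetical.

```python
from collections import defaultdict

# Hypothetical historical records: (contract type, did the customer churn?).
historical = [
    ("monthly", True), ("monthly", True), ("monthly", False), ("monthly", True),
    ("annual", False), ("annual", False), ("annual", True), ("annual", False),
]

def fit(records):
    """Learn the observed churn rate for each segment from historical data."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [events, total]
    for segment, event in records:
        counts[segment][0] += int(event)
        counts[segment][1] += 1
    return {segment: events / total for segment, (events, total) in counts.items()}

model = fit(historical)

# Apply the fitted model to current data to predict what may happen next.
current_customers = ["annual", "monthly"]
predictions = [model[segment] for segment in current_customers]
print(predictions)
```

A real predictive model would use many more features and be validated on held-out data, but the two phases, fitting on history and scoring current records, stay the same.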
Prescriptive Analytics
Prescriptive analytics focuses on finding the best course of action in a scenario, given the available data. It’s related to both descriptive analytics and predictive analytics, but emphasizes actionable insights instead of data monitoring.

• Predictive analytics focuses on forecasting possible outcomes.

• Prescriptive analytics aims to find the best solution given a variety of choices.

• The field also empowers companies to make decisions based on optimizing the result of future events or risks, and provides a model to study them.


Importance and functions Prescriptive analytics
• This includes combining existing conditions and considering the consequences of each
decision to determine how the future would be impacted.

• It can measure the repercussions (Consequences) of a decision based on different


possible future scenarios.

• The field borrows heavily from mathematics and computer science, using a variety of
statistical methods.

• The process creates and re-creates possible decision patterns that could affect an
organization in different ways.

• Prescriptive analytics is the final step of business analytics.


Prescriptive analysis has benefits such as:
• Optimization of processes, campaigns, and strategies.
• Minimizes maintenance needs and interconnects them for better conditions.
• Reduce costs without affecting performance.
• It increases the likelihood that companies will approach and plan for internal growth
properly.
• Production optimization.
• Efficient supply chain management.
• Improved customer service and experience.
Diagnostic
• Uses historical data
• Identifies data anomalies
• Highlights data trends
• Investigates underlying issues
• Answers "Why" questions

Descriptive
• Uses historical data
• Reconfigures data into easy-to-read formats
• Describes the state of your business operations
• Learns from the past
• Answers "What" questions

Predictive
• Uses historical data
• Fills gaps in available data
• Creates data models
• Forecasts potential future outcomes
• Answers "What Might Happen" questions

Prescriptive
• Uses historical data
• Estimates outcomes based on variables
• Offers suggestions about outcomes
• Uses algorithms, AI, and machine learning
• Answers "if the" questions
BIG DATA
• Farnam Jahanian (NSF, National Science Foundation)

“Big Data is characterized not only by the enormous volume of data but also by the diversity and heterogeneity of the data and the velocity of its generation.”

• Nuala O’Connor Kelly (GE)

“it’s the volume and velocity and variety of data… to achieve new results for …”

• Nick Combs (EMC: Egan, Marino Corporation; Dell EMC)

“It’s needle in a haystack or connecting the dots.”

• Arvind Krishna (IBM) added the fourth V:

Veracity: data in doubt, describing “contradictory data” or noisy data

• Farnam Jahanian kicked off a May 1, 2012 briefing by calling data “a transformative new currency for science, engineering, education, and commerce.”
Scale of Data

[Figure: scale of data growing from MB through GB and TB to petabytes across four layers: ERP, CRM, Web, and Big Data. Items shown include payment, contacts, sales, expenditure, profits, losses, assets, employees, departments, products, and portfolio; segmentation, targeting, customer position and behaviour, purchase preferences, and pricing methods; web logs, customer history, dynamic pricing, affiliate networks, and user-generated content; sensors, RFID, mobile web, sentiments, social interaction and feeds, GPS coordinates, SMS and MMS, and system logs.]
UNIT II
Business Intelligence & Data Visualization
Overview of Business Intelligence

Data Visualization

Effective Design Techniques

Principles of Effective Data Dashboards

Popular BI Tools

ETL (Extract-Transform-Load)
Definition and meaning of Business Intelligence
Business intelligence (BI) leverages software and services to transform data into actionable insights that inform an organization’s strategic and tactical business decisions.


What Is Business Intelligence?
• Howard Dresner, a Gartner analyst, coined the BI term in the early 1990s

• Today there is much discussion of analytics

• There are many BI definitions, but the following is useful

Business intelligence (BI) is a broad category of applications,

technologies, and processes for gathering, storing, accessing, and analyzing

data to help business users make better decisions.


BI has the following prerequisites
• Data is available in many forms, shapes and formats.
• Broadly, data can be either structured or unstructured.
• Data that's properly organized, with well-defined constraints and relationships
among its different parts, can be considered as structured.
What Is Business Intelligence
Business Intelligence Procedure

Extract
Transform
Load
Advantages of Business Intelligence

[Figure: six advantages of BI: boosts productivity, improves visibility, fixes accountability, gives a bird's eye view, streamlines business processes, allows for easy analytics]
Advantages of Business Intelligence

Here are some of the advantages of using Business Intelligence System:

1. Boost productivity

With a BI program, businesses can create reports with a single click, which saves a lot of time and resources. It also allows employees to be more productive in their tasks.

2. To improve visibility

BI also helps to improve the visibility of these processes and make it possible to identify any

areas which need attention.

3. Fix Accountability

BI system assigns accountability in the organization as there must be someone who should own
4. It gives a bird's eye view

BI systems give decision makers an overall bird's eye view through typical BI features like dashboards and scorecards.

5. It streamlines business processes

BI takes out the complexity associated with business processes. It also automates analytics by offering predictive analysis, computer modeling, benchmarking and other methodologies.

6. It allows for easy analytics

BI software has democratized its usage, allowing even nontechnical or non-analyst users to collect and process data quickly. This puts the power of analytics into the hands of many people.
Disadvantages of BI
• Cost
• Complexity
• Limited use
• Time-consuming implementation
BI System Disadvantages
Cost:

• Business intelligence can prove costly for small and medium-sized enterprises. Using such a system for routine business transactions may be expensive.

Complexity:

• Another drawback of BI is the complexity of implementing a data warehouse. It can be so complex that it makes business techniques rigid to deal with.

Limited use:

• Like many advanced technologies, BI was first established with the buying power of rich firms in mind. Therefore, BI systems are still not affordable for many small and medium-sized companies.

Time-consuming implementation:

• It takes almost one and a half years for a data warehousing system to be completely implemented. Therefore, it is a time-consuming process.
Data Visualization
Data Visualization

Data visualization is a graphic representation that expresses the

significance of data.

It reveals insights and patterns that are not immediately visible in the raw

data.

It is an art through which information, numbers, and measurements can be

made more understandable.


• Here are a few additional statistics highlighting the importance of data

visualization over text when presenting information:

• 90% of the information transmitted to the Brain is visual

• Humans process images 60,000 times faster than text

• 70% of our sensory receptors are in our eyes

• 65% of people are visual learners


Effective Design Techniques
• Informative
• Efficient
• Appealing
• Interactive
• Predictive (Optional)

Importance of Data Visualization
• Is more digestible
• Is more shareable
• Makes for better decision making
• Saves time
• Intuitive
• Fast
• Flexible
• Insight
Why data visualization is such a powerful tool:

• Intuitive: Presenting a graph as a node-link structure instantly makes sense, even to people
who have never worked with graphs before.

• Fast: It is fast because our brains are great at identifying patterns, but only when data is
presented in a tangible format. Armed with visualization, we can spot trends and outliers
very effectively.

• Flexible: The world is densely connected, so as long as there is an interesting relationship


in your data somewhere, you will find value in graph visualization.

• Insightful: Exploring graph data interactively allows users to gain more in-depth
knowledge, understand the context and ask more questions, compared to static
visualization or raw data.
Effective Design Techniques

• Informative: The visualization should be able to convey the desired information from the data to the reader.
• Efficient: The visualization should not be ambiguous.
• Appealing: The visualization should be captivating and visually pleasing.
• Interactive and Predictive (Optional): The visualizations can contain variables and filters with which the users may interact to predict results of different scenarios.
Principles of Effective Data Dashboards
Principles of Good Visualization

Proximity, Similarity, Enclosure, Symmetry, Closure, Continuity, Connection, Figure and ground

Principle: Description

• Proximity: White space can be used to group elements together and separate others.

• Similarity: Objects that look similar are instinctively grouped together in our minds.

• Enclosure: Helps distinguish between groups.

• Symmetry: Objects should not be out of balance, or missing, or wrong.

• Closure: We tend to complete shapes and paths even if part of them is missing.

• Continuity: We tend to continue shapes beyond their ending points (similar to closure).

• Connection: Helps group elements together.

• Figure and ground: We typically notice only one of several main visual aspects of a graph.
Chart types:
• Column Chart
• Stacked Column Chart
• Line Chart
• Pie Chart
• Doughnut Chart
• Bar Chart
• Area Chart
• X Y Scatter Chart
• Histogram
• Box & Whisker Chart
• Funnel Chart
• Bubble Chart
• Surface Chart
• Radar Chart
• Tree Map Chart
• Gantt Chart
• Control Charts
Popular BI Tools
WHAT ARE BI TOOLS?

BI Tools are types of software used to gather, process, analyze,

and visualize large volumes of past, current, and future data in

order to generate actionable business insights, create interactive

reports, and simplify the decision-making processes.


They bring together all relevant data
Their true self-service analytics approaches unlock data access
Users can take advantage of predictions

They eliminate manual tasks

They reduce business costs

They’re constantly at your service, 24/7/365


• THE BENEFITS OF BUSINESS INTELLIGENCE TOOLS
• 1. They bring together all relevant data:
• Whether you work in a small company or large enterprise, you probably
collect data from various portals, ERPs, CRMs, flat files, databases,
APIs, and much more.
• You need to obtain a high level of data intelligence to be able to manage all
these sources and develop a better understanding of the collected
information.
• That’s why utilizing modern data connectors will help you in centralizing the
disparate sources and provide you with a single point of view on all your
business processes.
• That way, identifying issues, trends, and taking action are closely connected and
based solely on data.
2. Their true self-service analytics approaches unlock data access:
When each person in the company is equipped with modern business
intelligence software that will enable him/her to explore the data on their own,
the need to request reports from the IT department is significantly reduced.
This self-service BI approach gives organizations a competitive advantage
because each employee will be equipped with the right amount of data analytics
skills that will, ultimately, save the company’s time and resources while
unburdening the IT department, hence, enabling them to focus on other critical
tasks.
• 3. Users can take advantage of predictions:

• Predictive analytics doesn’t need to be a specialty of data scientists or analysts. With the integration of forecast engines, business users can generate insights for future scenarios that will help them adjust current strategies to deliver the best possible results.

• On the other hand, if a business condition changes, intelligent data alerts flag the anomalies that can occur while you manage huge amounts of data, and surface new trends and patterns that will enable you to react immediately.
• 4. They eliminate manual tasks:
• While traditional means of business management encourage the use of
spreadsheets and static presentations, modern software eliminates endless
amounts of rows and columns and facilitates the automation of processes.
• The tool updates your KPI (Key Performance Indicator) dashboard itself with real-time data.
• You can automate the reporting process with specified time intervals and purely
look at the results.
• Simply Drag-and-drop Your Values and see how you can easily create a
powerful interactive dashboard that enables you to directly interact with
your screen.
5. They reduce business costs:

• From sales planning and customer behavior analysis to real-time process monitoring and offer optimization, BI platforms enable faster planning, analysis, and reporting processes.

• If you can work fast and accurately, you can achieve far better business results and make profitable adjustments.


6. They’re constantly at your service, 24/7/365:

• Different organizations have different needs, and the Software-as-a-Service model can accommodate them.

• The software can scale up or down to match the specific needs of a company.

• Since the data is stored in the cloud, you have non-stop access to the software.

• You can fully explore various self-service analytics features whether you’re a manager, data scientist, analyst or consultant.
Various BI Tools
• SAP (Systems, Applications & Products in Data Processing)
• SAS (Statistical Analysis System)
ETL (Extract-Transform-Load)
WHAT IS ETL?

•EXTRACT …
•TRANSFORM …
•LOAD
WHY ETL?

• Companies need a way to analyze their data for critical business decisions.

• A transactional database can’t answer complex business questions.

• A data warehouse provides a common data repository.

• ETL provides a method of moving data from various sources into a data warehouse.
ETL CONCEPT

• A company's data may be scattered in different locations and in different formats.

• ETL allows you to migrate the data into a data warehouse and convert the various formats and types to adhere to one consistent system.

• ETL is a predefined process for accessing and manipulating source data and loading it into a target database.


ETL REQUIREMENTS
Any ETL Architecture must meet the following requirements:
Business Requirement
Compliance Requirement
Data Profiling
Data Security
Data Integration
Right Data at Right Time
Archiving & Lineage
Final End User Delivery Interface
Available Skills
Legacy License
Alignment with overall Enterprise Architecture
EXTRACT
• Gathering the data

• Raw data that was written directly into the disk

• Data written to flat files or relational tables from structured source systems

• Data can be read multiple times, if needed.

• Cleansing the data

• Eliminate duplicates or fragmented data (Uneven, Split)

• Exclude unwanted / unneeded information


TRANSFORM

• Preparing the data to be housed in the data warehouse.

• Converting the extracted data

• Using rules and lookup tables

• Combining data

• Verification / Validity checks

• Standardization
LOAD

• Storing the transformed data in the data warehouse.

• It could be batch or real-time processing

• Can follow one of several schemas
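The three ETL stages can be sketched end to end with the standard library. In this illustration the source is an in-memory CSV string and the "warehouse" is an in-memory SQLite database; the table name, column names, and rows are all hypothetical, and a production pipeline would use dedicated ETL tooling.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system: untidy names, a duplicate, a missing amount.
RAW_CSV = """order_id,customer,amount,order_date
1, alice ,10.50,2023-01-05
2,BOB,20.00,2023-01-06
2,BOB,20.00,2023-01-06
3,carol,,2023-01-07
"""

def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: remove duplicates, validate, convert types, standardize names."""
    seen_ids = set()
    clean = []
    for row in rows:
        order_id = int(row["order_id"])
        if order_id in seen_ids:        # eliminate duplicates
            continue
        if not row["amount"].strip():   # validity check: amount is required
            continue
        seen_ids.add(order_id)
        clean.append((order_id, row["customer"].strip().title(),
                      float(row["amount"]), row["order_date"]))
    return clean

def load(rows, conn):
    """Load: store the transformed rows in the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

This sketch runs as a single batch; a real-time variant would apply the same transform to each record as it arrives.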


UNIT III: Data Mining
Data Sampling
Data Preparation
Treatment of Missing Data
Identification of Outliers and Inaccurate Data
Variable Representation
Definition and meaning of Data Mining
• Data mining (knowledge discovery from data)

• Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data

• Alternative name

• Knowledge discovery in databases (KDD)

• Watch out: Is everything “data mining”? The following, for example, are not:

• Query processing

• Expert systems or statistical programs


Why Is Data Mining Required?
• Data analysis and Decision Support
• Market analysis and management
• Target marketing, customer relationship management (CRM)
• Market basket analysis, Market Segmentation
• Risk analysis and management
• Forecasting, customer retention, quality control, competitive analysis
• Fraud detection and detection of unusual patterns (outliers)
• Other Applications
• Text mining (news group, email, documents) and Web mining
• Stream data mining
• Bioinformatics and bio-data analysis
Where Does Data Come from Business
• Where does the data come from?
• Credit card transactions, discount coupons, customer complaint calls
• Target marketing
• Find clusters of “model” customers who share the same characteristics: interest, income level,
spending habits, etc.
• Determine customer purchasing patterns over time
• Cross-market analysis
• Associations/correlations between product sales, and prediction based on such associations
• Customer profiling
• What types of customers buy what products
• Customer requirement analysis
• Identifying the best products for different customers
• Predict what factors will attract new customers
• Approaches: Clustering & model construction for frauds, outlier analysis
• Applications: Health care, retail, credit card service, telecomm.
• Medical insurance
• Professional patients, and ring of doctors
• Unnecessary or correlated screening tests
• Telecommunications:
• Phone call model: destination of the call, duration, time of day or week. Analyze patterns
that deviate from an expected norm
• Retail industry
• Analysts estimate that 38% of retail shrink is due to dishonest employees
• Internet Web Surf-Aid
• IBM Surf-Aid applies data mining algorithms to Web access logs for market-related pages to
discover customer preference and behavior pages, analyzing effectiveness of Web marketing,
improving Web site organization, etc.
Why use data mining? Some of the most important reasons for using data mining are:

• Establish relevance and relationships amongst data, and use this information to generate profitable insights.
• Businesses can make informed decisions quickly.
• Helps to find unusual shopping patterns in grocery stores.
• Optimize website business by providing customized offers to each visitor.
• Helps to measure customers' response rates in business marketing.
• Create and maintain new customer groups for marketing purposes.
• Predict customer defections, such as which customers are more likely to switch to another supplier in the near future.
• Differentiate between profitable and unprofitable customers.
• Identify all kinds of suspicious behavior, as part of a fraud detection process.


Data Sampling
Sampling is a technique of selecting individual members or a subset of the population to make statistical inferences from them and estimate characteristics of the whole population.


Population

In statistics, a population is the pool of individuals from which a statistical sample is drawn for a study. Thus, any selection of individuals grouped together by a common feature can be said to be a population.


Sampling
Target Population → Study Population → Sample

Types of Samples
• Probability (random) samples: simple random sample, systematic random sample, stratified random sample, multistage sample, multiphase sample, cluster sample
• Non-probability samples: convenience sample, purposive sample, quota sample
Probability Sampling

Probability sampling is defined as a sampling technique in which the researcher chooses samples from a larger population using a method based on the theory of probability.

An example of a simple random sample would be the names of 25 employees being chosen out of a hat from a company of 250 employees.


Systematic Random Sampling
If a local NGO is seeking to form a systematic sample of 500 volunteers from a population of 5000, it can select every 10th person in the population.
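The two examples above (25 names out of 250, and every 10th person out of 5000) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the populations are stand-in lists of ID numbers, and the function names `simple_random_sample` and `systematic_sample` are hypothetical, not part of the course material.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k members without replacement; every member is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, k)

def systematic_sample(population, k, seed=None):
    """Pick a random starting point, then take every (N // k)-th member."""
    step = len(population) // k
    start = random.Random(seed).randrange(step)
    return population[start::step][:k]

employees = list(range(1, 251))     # stand-in IDs for 250 employees
volunteers = list(range(1, 5001))   # stand-in IDs for a population of 5000

drawn = simple_random_sample(employees, 25, seed=1)
every_tenth = systematic_sample(volunteers, 500, seed=1)
print(len(drawn), len(every_tenth), every_tenth[1] - every_tenth[0])
```

The fixed `seed` is there only to make the sketch repeatable; in a real study the starting point would be left random.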
Stratified Random Sampling
Stratified sampling is a sampling technique in which the population is subdivided into homogeneous groups, called 'strata', from which the samples are selected on a random basis.
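A minimal sketch of stratified sampling, assuming proportional allocation (the same fraction is drawn from every stratum). The staff list, the department names and the helper `stratified_sample` are all hypothetical and used only for illustration.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(records, stratum_of, fraction, seed=None):
    """Group records into strata, then draw the same fraction from each at random."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[stratum_of(rec)].append(rec)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical staff list: (employee id, department)
staff = [(i, "sales" if i <= 150 else "finance" if i <= 220 else "hr")
         for i in range(1, 251)]

picked = stratified_sample(staff, stratum_of=lambda rec: rec[1], fraction=0.1, seed=7)
per_department = Counter(dept for _, dept in picked)
print(per_department)
```

With 150, 70 and 30 members per department and a 10% fraction, each stratum contributes in proportion to its size, which a simple random sample does not guarantee.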
Multi Stage Sampling
Multiphase Sample ( Time Based)
Cluster Sampling
Non-Probability Sampling

In non-probability sampling, the sample is selected based on non-random criteria, and not every member of the population has a chance of being included.


Convenience Sampling
Data Preparation
Treatment of Missing Data

1. Missing data, or missing values, occur when no data value is stored for the variable in an observation.

2. Missing data are a common occurrence and can have a significant effect on the conclusions that can be drawn from the data.


D. B. Rubin (1976) classified missing data problems into three categories:

• Missing Completely At Random (MCAR):
• When missing values are randomly distributed across all observations, we consider the data to be missing completely at random.
• A quick check for this is to compare two parts of the data: one with missing observations and the other without missing observations.

• Missing At Random (MAR): The key difference between MCAR and MAR is that under MAR the data is not missing randomly across all observations but is missing randomly only within sub-samples of data.
• For example, if high school GPA data is missing randomly across all schools in a district, that data will be considered MCAR. However, if data is randomly missing for students in specific schools of the district, then the data is MAR.

• Not Missing At Random (NMAR): When the missing data has a structure to it, we cannot treat it as missing at random.
• In the above example, if the data was missing for all students from specific schools, then the data cannot be treated as MAR.
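The "quick check" mentioned under MCAR can be sketched as follows: split the rows by whether a value is missing and compare another, fully observed variable across the two parts. The records, the `age` column and the helper name are hypothetical; a large gap only hints that the data may not be MCAR, and this informal comparison is not a formal statistical test.

```python
from statistics import mean

# Hypothetical records: (income, age); None marks a missing income.
records = [(13, 34), (None, 36), (15, 41), (13, 29),
           (None, 33), (11, 38), (10, 35), (14, 40)]

def split_by_missing(rows):
    """Return ages of rows with missing income and ages of rows with observed income."""
    missing = [age for income, age in rows if income is None]
    observed = [age for income, age in rows if income is not None]
    return missing, observed

missing_age, observed_age = split_by_missing(records)
gap = mean(missing_age) - mean(observed_age)
print(f"mean age (income missing)  = {mean(missing_age):.2f}")
print(f"mean age (income observed) = {mean(observed_age):.2f}")
print(f"gap = {gap:.2f}")
```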
How to Handle Missing Values
Q1: What kind of house do you stay in?
Q2: What type of house do you stay in?
Q3: What is your income?

Complete data:

S No   Income   House (KIND)   House (TYPE)
1      13       Individual     1
2      11       Group House    3
3      15       Apartment      1
4      13       Group House    2
5      11       Apartment      3
6      11       Apartment      1
7      10       Individual     3
8      14       Group House    2
9      12       Apartment      2
10     10       Individual     3
11     14       Group House    3
12     15       Individual     2
13     12       Apartment      1
14     11       Individual     3
15     13       Apartment      1

Same data with missing values (NA marks a missing cell):

S No   Income   House (KIND)   House (TYPE)
1      13       Individual     1
2      NA       Group House    NA
3      15       Apartment      1
4      13       NA             2
5      NA       Apartment      3
6      11       Apartment      1
7      10       Individual     3
8      14       Group House    NA
9      NA       NA             2
10     10       Individual     3
11     14       Group House    3
12     15       Individual     NA
13     12       NA             1
14     11       Individual     3
15     13       Apartment      1
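One simple way to treat the missing cells in the table above is to fill numeric gaps with the mean and categorical gaps with the most frequent value. The slides do not say which treatment is intended, so mean/mode imputation here is an assumption chosen for illustration, and the helper `impute` is hypothetical.

```python
from statistics import mean, mode

# Income and house kind from the table above; None marks a missing cell.
income = [13, None, 15, 13, None, 11, 10, 14, None, 10, 14, 15, 12, 11, 13]
kind = ["Individual", "Group House", "Apartment", None, "Apartment",
        "Apartment", "Individual", "Group House", None, "Individual",
        "Group House", "Individual", None, "Individual", "Apartment"]

def impute(values, fill):
    """Replace every missing entry with the given fill value."""
    return [fill if v is None else v for v in values]

income_mean = mean(v for v in income if v is not None)   # numeric: mean
kind_mode = mode(v for v in kind if v is not None)       # categorical: mode

income_filled = impute(income, round(income_mean, 2))
kind_filled = impute(kind, kind_mode)
print(income_filled)
print(kind_filled)
```

Mean and mode imputation are easy to apply but shrink the variability of the data, so they are most defensible when the values are missing completely at random.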
Identification of Outliers and Inaccurate Data

An outlier is an observation that lies an abnormal distance from other values in a random sample from a population.
Identifying outliers means examining the data for unusual observations that are far removed from the mass of data. These points are often referred to as outliers.
What Are Outliers?
• Outlier: A data object that deviates significantly from the normal objects as if it were
generated by a different mechanism
• Ex.: Unusual credit card purchase
• Outliers are different from noise
• Noise is random error or variance in a measured variable
• Noise should be removed before outlier detection

Types of Outliers (I)
• Three kinds: global, contextual and collective outliers
• Global outlier (or point anomaly)
• An object Og is a global outlier if it deviates significantly from the rest of the data set
• Contextual outlier (or conditional outlier)
• An object Oc is a contextual outlier if it deviates significantly based on a selected context
• Ex. 80°F in NYC: outlier? (It depends on whether it is summer or winter.)
• Attributes of data objects should be divided into two groups
• Contextual attributes: defines the context, e.g., time &
location
• Behavioral attributes: characteristics of the object, used in
outlier evaluation, e.g., temperature, pressure, humidity
• Issue: How to define or formulate meaningful context?
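A global outlier can be flagged with a simple rule of thumb. The sketch below uses the 1.5 × IQR fence, which is a common convention and an assumption here (the slides do not prescribe a detection method); the purchase amounts are made up.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Flag values lying more than k * IQR outside the first or third quartile."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical credit card purchase amounts; 95 is the unusual one.
purchases = [12, 13, 11, 14, 12, 13, 95]
flagged = iqr_outliers(purchases)
print(flagged)  # -> [95]
```

A contextual outlier would need the same test applied within each context (for example, per season), which is why the choice of contextual attributes matters.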
Types of Outliers (II)
• Collective Outliers
• A subset of data objects collectively deviate significantly from
the whole data set, even if the individual data objects may not
be outliers
• Ex.: intrusion detection, when a number of computers keep sending denial-of-service packets to each other
• Detection of collective outliers
• Consider not only the behavior of individual objects, but also that of groups of objects
• Need to have background knowledge on the relationship among data objects, such as a distance or similarity measure on objects

Unit IV
• Unit IV: Analytics in Business functions
• 4.1 Financial Analytics
• 4.2 Human Resource (HR) Analytics
• 4.3 Marketing Analytics
• 4.4 Health Care Analytics
• 4.5 Supply Chain Analytics
Financial Analytics
• Financial analytics is the creation of ad hoc analysis to answer specific business questions and forecast possible future financial scenarios.

• The goal of financial analytics is to shape the strategy for business through reliable, factual insight rather than intuition.

• By offering detailed views of companies' financial data, financial analytics provides the tools for firms to gain deep knowledge of key trends and take action to improve their performance.
Importance of Financial Analytics
• Today's businesses require timely information for decision-making purposes.

• Every company needs prudent financial planning and forecasting.

• The diverse needs of the traditional financial department, and advancements in technology, all point to the need for financial analytics.

• Financial analytics can help shape the business's future goals.

• It can help you improve the decision-making strategies for your business.

• Financial analytics can help you focus on measuring and managing your business's tangible assets such as cash and equipment.

• It provides in-depth insight into the organization's financial status and improves cash flow, profitability, and business value.
Types of Financial Analysis
Horizontal analysis refers to the side-by-side comparison of an organization's financial performance for consecutive reporting periods.

The aim is to determine major shifts in the data. Later, this information could be applied to a more detailed analysis of financial results.

Vertical analysis pertains to the proportional analysis of a financial statement.

Each line item on a financial statement is listed as a percentage of another item. For example, every line item on an income statement is provided as a percentage of gross sales, while every line item on a balance sheet is given as a percentage of total assets.
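Both ideas reduce to simple percentage calculations. The sketch below computes the period-over-period change (horizontal) and each line item as a share of gross sales (vertical); all figures and the function names `horizontal` and `vertical` are hypothetical.

```python
# Hypothetical income statement figures for two consecutive periods.
income_statement = {
    "Gross sales":        {"2022": 1000.0, "2023": 1200.0},
    "Cost of goods sold": {"2022": 600.0,  "2023": 690.0},
    "Operating expenses": {"2022": 250.0,  "2023": 300.0},
}

def horizontal(statement, prev, curr):
    """Percentage change of each line item between two periods."""
    return {item: (v[curr] - v[prev]) / v[prev] * 100
            for item, v in statement.items()}

def vertical(statement, period, base="Gross sales"):
    """Each line item as a percentage of the base item in one period."""
    base_value = statement[base][period]
    return {item: v[period] / base_value * 100
            for item, v in statement.items()}

change = horizontal(income_statement, "2022", "2023")
share = vertical(income_statement, "2023")
for item in income_statement:
    print(f"{item:20s} change {change[item]:6.1f}%   share of sales {share[item]:6.1f}%")
```

For a balance sheet, the same `vertical` call would use total assets as the base item.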


• Short-term Analysis provides a detailed review of working capital, involving the calculation of turnover rates for accounts receivable, inventory and accounts payable.
• Any differences from the long-term average turnover rate should be studied further because working capital is a significant user of cash.

• Multi-Company Comparison entails tallying and comparing major financial ratios of two organizations, usually in the same industry sector.
• The aim is to determine the companies' relative financial strengths and weaknesses.

• Industry Comparison contrasts the results of a specific business and the average results of an entire industry.
• The purpose is to determine any unusual results in comparison to the industry average.
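The turnover rates used in short-term analysis follow one pattern: a flow for the period divided by the average balance. The sketch assumes the usual textbook convention (for example, net credit sales over average accounts receivable); the figures and function names are hypothetical.

```python
def turnover(flow, opening_balance, closing_balance):
    """Turnover rate = flow for the period / average balance over the period."""
    average_balance = (opening_balance + closing_balance) / 2
    return flow / average_balance

def days_outstanding(turnover_rate, days_in_year=365):
    """Average number of days one 'turn' takes."""
    return days_in_year / turnover_rate

# Hypothetical figures: net credit sales vs. accounts receivable balances.
receivables_turnover = turnover(flow=1200.0, opening_balance=180.0, closing_balance=220.0)
collection_days = days_outstanding(receivables_turnover)
print(f"receivables turnover = {receivables_turnover:.1f}x, "
      f"about {collection_days:.0f} days to collect")
```

Inventory and payables turnover reuse the same function with cost of goods sold or purchases as the flow; comparing the result with the long-term average shows whether working capital is absorbing more cash than usual.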
Key types of Financial Analytics
• Examining financial and other relevant information, financial analytics offers various views of companies' past, present and future performance.
• Predictive sales analytics: This includes the use of correlation analysis or past trends to forecast corporate sales.
• Client profitability analytics: This helps differentiate between clients who make money for a company and those who don't.
• Product profitability analytics: This entails assessing each product individually, rather than establishing profitability overall at a company.
• Cash-flow analytics: This employs real-time indicators, including the working capital ratio and cash conversion cycle, and may include tools such as regression analysis to predict cash flow.
• Value-driven analytics: This assesses a business' value drivers, or the key "levers" the organization needs to pull to achieve its goals.
• Shareholder value analytics: This, which is used to tally the value of a company by examining the returns it provides to shareholders, is used concurrently with profit and revenue analytics.
Various Financial Models in Realtime
Three Statement Model

Discounted Cash Flow (DCF) Model

Merger Model (M&A)

Initial Public Offering (IPO) Model

Leveraged Buyout (LBO) Model

Sum of the Parts Model

Consolidation Model

Budget Model

Forecasting Model

Option Pricing Model


Three Statement Model
• The 3-Statement Model is the most basic setup for financial modeling. The three statements
• Income statement,
• Balance sheet, and
• Cash flow statement
are all dynamically linked with formulas in Excel.

The objective is to set it up so all the accounts are connected and a set of assumptions can drive changes in the entire model.

It's important to know how to link the 3 financial statements, which requires a solid foundation of accounting, finance, and Excel skills.


Discounted Cash Flow (DCF) Model
• The DCF model builds on the 3-Statement model to value a company based on the Net Present Value (NPV) of the business' future cash flow.
• The DCF model takes the cash flows from the 3-statement model,
• makes some adjustments where necessary, and
• uses the XNPV function in Excel to discount them back to today at the company's Weighted Average Cost of Capital (WACC).
• These types of financial models are used in equity research and other areas of the capital markets.
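The discounting step can be sketched outside Excel. The function below mirrors the XNPV idea: each cash flow is divided by (1 + rate) raised to the number of days since the first cash flow, divided by 365. The WACC value and the cash flows are hypothetical, and this is a simplified sketch rather than a full DCF model.

```python
from datetime import date

def xnpv(rate, cash_flows):
    """Discount dated cash flows back to the date of the first one.

    cash_flows: list of (date, amount) pairs, first entry is the valuation date.
    """
    start = cash_flows[0][0]
    return sum(amount / (1 + rate) ** ((when - start).days / 365)
               for when, amount in cash_flows)

wacc = 0.10  # hypothetical Weighted Average Cost of Capital
flows = [(date(2023, 1, 1), -1000.0),   # outlay today
         (date(2024, 1, 1), 1100.0)]    # cash received 365 days later
value = xnpv(wacc, flows)
print(round(value, 6))
```

In this example the single future cash flow is exactly offset by the 10% discount rate, so the net present value comes out at approximately zero.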
Merger Model (M&A)
• The M&A model is a more advanced model used to evaluate the pro forma accretion/dilution of a merger or acquisition.
• It's common to use a single tab model for each company, where the consolidation of Company A + Company B = Merged Co.
• The level of complexity can vary widely. This model is most commonly used in investment banking and/or corporate development.
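At its simplest, accretion/dilution compares the acquirer's earnings per share (EPS) before the deal with the pro forma EPS of the merged company. The sketch below is deliberately stripped down and rests on loud assumptions: an all-stock deal, no synergies, no financing costs and no accounting adjustments. All figures and function names are hypothetical.

```python
def pro_forma_eps(acquirer_income, target_income, acquirer_shares, new_shares):
    """EPS of the merged company: combined income over the enlarged share count."""
    return (acquirer_income + target_income) / (acquirer_shares + new_shares)

def accretion(acquirer_income, target_income, acquirer_shares, new_shares):
    """Relative change in EPS; positive is accretive, negative is dilutive."""
    standalone = acquirer_income / acquirer_shares
    combined = pro_forma_eps(acquirer_income, target_income,
                             acquirer_shares, new_shares)
    return combined / standalone - 1

# Hypothetical deal: acquirer earns 500 on 100 shares, target earns 100,
# and 25 new shares are issued to pay for the target.
effect = accretion(acquirer_income=500.0, target_income=100.0,
                   acquirer_shares=100.0, new_shares=25.0)
print(f"EPS change: {effect:+.1%}")
```

Here standalone EPS is 5.00 and pro forma EPS is 4.80, so the deal would be dilutive under these assumptions.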
Initial Public Offering (IPO) Model
• Investment bankers and corporate development professionals also build IPO models in Excel to value their business in advance of going public.
• These models involve looking at comparable company analysis in conjunction with an assumption about how much investors would be willing to pay for the company in question.
• The valuation in an IPO model includes "an IPO discount" to ensure the stock trades well in the secondary market.
Leveraged Buyout (LBO) Model
• A leveraged buyout transaction typically requires modeling complicated debt schedules and is an advanced form of financial modeling.
• An LBO is often one of the most detailed and challenging of all types of financial models, as the many layers of financing create circular references and require cash flow waterfalls.
• These types of models are not very common outside of private equity or investment banking.
Consolidation Model
• This type of model includes multiple business units added into one single model.
• Typically, each business unit has its own tab, with a consolidation tab that simply sums up the other business units.
• This is like a Sum of the Parts exercise where Division A and Division B are added together and a new, consolidated worksheet is created.
Budget Model (Financial Planning Model)
• This model is used by professionals in financial planning & analysis (FP&A) to get the budget together for the coming year(s).
• Budget models are typically designed to be based on monthly or quarterly figures and focus heavily on the income statement.


Forecasting Model
This type is also used in financial planning and analysis (FP&A) to build a
forecast that compares to the budget model.
Sometimes the budget and forecast models are one combined workbook and
sometimes they are totally separate.
• Option Pricing Model
• The two main types of option pricing models are
1. Binomial tree and
2. Black-Scholes.

• These models are based purely on mathematical formulas rather than subjective criteria and, therefore, are a straightforward calculator built into Excel.
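Because both models are pure formulas, they translate directly into code. The sketch below prices a European call option on a non-dividend-paying stock with the Black-Scholes formula and with a Cox-Ross-Rubinstein binomial tree; the choice of a European call, the tree parameterization and all input values are assumptions for illustration.

```python
from math import erf, exp, log, sqrt

def norm_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def black_scholes_call(spot, strike, rate, sigma, years):
    """Black-Scholes price of a European call, no dividends."""
    d1 = (log(spot / strike) + (rate + 0.5 * sigma ** 2) * years) / (sigma * sqrt(years))
    d2 = d1 - sigma * sqrt(years)
    return spot * norm_cdf(d1) - strike * exp(-rate * years) * norm_cdf(d2)

def binomial_call(spot, strike, rate, sigma, years, steps):
    """Cox-Ross-Rubinstein binomial tree price of the same option."""
    dt = years / steps
    up = exp(sigma * sqrt(dt))
    down = 1.0 / up
    p = (exp(rate * dt) - down) / (up - down)   # risk-neutral up probability
    discount = exp(-rate * dt)
    # Payoffs at expiry; j is the number of up moves.
    values = [max(spot * up ** j * down ** (steps - j) - strike, 0.0)
              for j in range(steps + 1)]
    # Roll back through the tree one step at a time.
    for _ in range(steps):
        values = [discount * (p * values[j + 1] + (1 - p) * values[j])
                  for j in range(len(values) - 1)]
    return values[0]

bs = black_scholes_call(spot=100, strike=100, rate=0.05, sigma=0.2, years=1)
tree = binomial_call(spot=100, strike=100, rate=0.05, sigma=0.2, years=1, steps=200)
print(f"Black-Scholes: {bs:.4f}   binomial (200 steps): {tree:.4f}")
```

As the number of steps grows, the binomial price approaches the Black-Scholes price, which is a useful sanity check when building either model.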
Binomial tree
Role of Financial Analytic manager
Role of a Financial Analyst:
• Gather data
• Organize information
• Analyze results
• Make forecasts
• Develop recommendations
• Build models
• Make presentations
• Generate reports
HR Analytics
What is HR Analytics?

• Analytics is based on data.

• HR analytics is the science of gathering, organizing and analyzing the data related to HR functions like recruitment, talent management, employee engagement, performance and retention, to ensure better decision making in all these areas.

• By using various types of HR software and technology, HR departments are creating a large amount of data every day. The objective of HR analytics is to make sense of this data and turn it into valuable insight.


Importance of HR Analytics
Improved Hiring Decision
Reduced Talent Scarcity
Process Improvement
Good Training
Better Insights
Attrition
Improved Employee Experience
More Productive Workforce
Transform role of HR as Strategic Partner
Identify Best Performing Talent
Improves HR Performance
Predict demand & Skills
The Need of HR Analytics in Organization

1. Increased need for data and analytics tools in HR to make better HR decisions

2. Better Quality of Hire is one of the HR data analytics benefits

3. A vital benefit of HR metrics and analytics is Employee Retention

4. Transformation of HR as a strategic partner is one of the benefits of Workforce analytics

5. Business analytics in HR can help predict the hiring needs of an organization


Increased need for data and analytics tools in HR to make better HR decisions

• An important role of HR analytics is to provide access to critical data and insights about the workforce, which can then be analysed for making better decisions.

• Not only does it improve HR performance, it also provides a better understanding of what motivates employees to work productively, and how the organizational culture affects employee performance.
• Better Quality of Hire is one of the HR data analytics benefits
• Running machine learning algorithms on jobseeker’s data allows companies to identify
the best matching talent for a vacant position, thus improving the quality of hire.

• A vital benefit of HR metrics and analytics is Employee Retention


• Similarly, using employee data, recruiters can recognize a pattern of high performing
employees and accordingly modify their employee hiring and retention strategy.

• HR analytics helps identify the departments suffering from the maximum attrition and
the reasons causing it.

• It can also help HR in identifying the activities which have the maximum impact on
employee engagement and thus allow organizations to invest in such activities.
• Transformation of HR as a strategic partner is one of the benefits of Workforce analytics
• The application of HR analytics can provide a unique vantage point to the HR department to validate its importance and its role as a strategic partner in a business' performance.
• HR professionals can provide business leaders with verifiable data to back their talent hiring, retention and engagement policies.

• Business analytics in HR can help predict the hiring needs of an organization
• HR analytics can help predict the changes that may be in the organization's future. Using HR analytics, one can predict the skills and positions which are needed to improve business performance.
• HR analytics can play a role in ensuring better HR performance and improving business performance as a whole.
HR Models
•The Standard Causal Model of HRM
•The 8-box model by Paul Boselie
•The HR value chain
•The HR Value Chain Advanced
•The Harvard Framework for HR
The Standard Causal Model of HRM
The best-known HR model is the Standard Causal Model of HRM.
The model is derived from many similar models published throughout the 90’s and early 2000’s.
The model shows a causal chain that starts with the business strategy and ends, through the HR
processes, with (improved) financial performance.
The 8-box model by Paul Boselie :
A different HR model that’s often used to model what we do in HR, is the 8-box model by Paul
Boselie.
The 8-box model shows different external and internal factors that influence the effectiveness of
what we do in HR.
The HR value chain :
The HR value chain is one of the best-known models in HR. It is based on the work of Paauwe
and Richardson (1997) and creates a nuance on the models above in regards to how HR
operates.
According to the HR value chain, everything we do (and measure) in HR can be divided into
two categories: HRM activities and HRM outcomes.
The HR Value Chain Advanced
The Harvard Framework for HRM
Marketing Analytics

• Marketing analytics is the process of identifying meaningful patterns in data to inform marketing decisions.

• In simpler terms, marketing analytics helps you make data-informed decisions to optimize your marketing spend on activities with the highest impact.
The Need of Marketing Analytics
• Which audience segment(s) should we target?

• Which marketing channels should we use (to reach each segment)?

• What messaging, creatives, and copy should we use (to convince each segment to
buy from us)?

• How much money are we currently spending on acquiring new customers? And
how much money can we afford to spend on acquiring new customers?

• What’s our return on marketing investment across different channels?
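The last two questions are usually answered with two simple ratios: customer acquisition cost (spend divided by new customers) and return on marketing investment (revenue minus spend, divided by spend). The channel names and figures below are hypothetical, and the sketch assumes revenue has already been attributed to each channel, which in practice is the hard part.

```python
def customer_acquisition_cost(spend, new_customers):
    """Average marketing spend needed to win one new customer."""
    return spend / new_customers

def marketing_roi(revenue, spend):
    """Return on marketing investment as a ratio (1.0 means 100%)."""
    return (revenue - spend) / spend

# Hypothetical per-channel results for one quarter.
channels = {
    "email":  {"spend": 2000.0, "revenue": 9000.0,  "new_customers": 80},
    "search": {"spend": 5000.0, "revenue": 12000.0, "new_customers": 100},
}

report = {
    name: {
        "cac": customer_acquisition_cost(c["spend"], c["new_customers"]),
        "roi": marketing_roi(c["revenue"], c["spend"]),
    }
    for name, c in channels.items()
}
for name, r in report.items():
    print(f"{name:7s} CAC = {r['cac']:.2f}   ROI = {r['roi']:.0%}")
```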


Importance of Marketing Analytics

1. Understanding the customer and market trends is really important today.

2. It allows understanding of big-picture trends while still focusing on every single detail.

3. It provides a clear picture of the efforts and the returns.

4. It shows which programs worked and the reasons why they failed or succeeded.

5. Market study is also an important part of the business. Marketing analytics allows monitoring of trends over time.

6. Marketing analytics allows understanding the return on investment by providing a clear picture of the workings and the reports of each programme.

7. By helping to study market trends, marketing analytics makes it easier to forecast future results.


Supply Chain Analytics
According to Capgemini Analytics, "Supply Chain Analytics brings data-driven intelligence to your business, reducing the overall cost to serve and improving service levels."

Supply chains typically generate massive amounts of data. Supply chain analytics helps to make sense of all this data, uncovering patterns and generating insights.


[Figure: supply chain of Mining, Iron Ore, Vendors and Sales. This model is fixed in nature.]
Evolution of Supply Chain Analytics

Year   Chain                                      Nature
1900   Iron ore and mining → Ford → Customers     Fixed
1950   Vendors → Toyota → Customers               Flexible
1960   Customers → Dell → Vendors → Company       Complete


Supply Chain Management Process
Analytics in Supply Chain
Data is used extensively in supply chain for the following:
• Planning, from product launches to replenishment planning
• Scheduling of resources and assets
• Landed costing, transportation analysis
• Demand planning
• Fulfillment process analysis
• Vendor analysis
• Purchase order analysis
• SKU (stock keeping unit) rationalization
• Supply chain network design
• Facility design, simulation and layout planning
Unit V
Machine Learning is…

Machine learning, a branch of artificial intelligence, concerns the construction and study of systems that can learn from data.
Machine Learning is…
Machine learning is programming computers to optimize a performance criterion using example data or past experience.
-- Ethem Alpaydin

The goal of machine learning is to develop methods that can automatically detect patterns in data, and then to use the uncovered patterns to predict future data or other outcomes of interest.
-- Kevin P. Murphy

The field of pattern recognition is concerned with the automatic discovery of regularities in data through the use of computer algorithms and with the use of these regularities to take actions.
-- Christopher M. Bishop
Machine Learning is…
Machine learning is about predicting the future based on the past.
-- Hal Daume III

• Past: Training Data → model/predictor
• Future: Testing Data → model/predictor
Machine Learning
1. Data mining: machine learning applied to “databases”, i.e. collections
of data

2. Inference and/or estimation in statistics

3. Pattern recognition in engineering

4. Signal processing in electrical engineering

5. Induction

6. Optimization
Goals of the course: Learn about…
Different machine learning problems

Common techniques/tools used

• theoretical understanding
• practical implementation

Proper experimentation and evaluation

Dealing with large (huge) data sets

• Parallelization frameworks
• Programming tools
Data
examples

Data
Supervised learning

Supervised learning: given labeled examples (each example comes with a label, e.g. label1, label3, label4, label5), learn a model/predictor.

Supervised learning: learn to predict the label of a new example (new example → model/predictor → predicted label).


Supervised learning: classification

Classification: the label comes from a finite set of labels (e.g. apple, apple, banana, banana).

Supervised learning: given labeled examples


Classification Example

Differentiate between low-risk and high-risk customers from their income and savings.
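A minimal sketch of this classification task, using a 1-nearest-neighbor rule: a new customer receives the label of the most similar labeled customer. The training data (income and savings in thousands) is made up, and nearest neighbor is only one of several classifiers that could be used here; the slide itself does not prescribe a method.

```python
from math import dist

# Hypothetical labeled examples: (income, savings) -> risk label
training = [
    ((20, 2), "high-risk"), ((25, 1), "high-risk"), ((30, 5), "high-risk"),
    ((60, 30), "low-risk"), ((75, 40), "low-risk"), ((90, 55), "low-risk"),
]

def predict(point, examples):
    """Return the label of the training example closest to the new point."""
    nearest = min(examples, key=lambda ex: dist(point, ex[0]))
    return nearest[1]

print(predict((70, 35), training))  # -> low-risk
print(predict((22, 3), training))   # -> high-risk
```

In practice the two features would be scaled to comparable ranges first, since the distance is otherwise dominated by whichever feature has the larger numbers.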


Classification Applications
Face recognition

Character recognition

Spam detection

Medical diagnosis: From symptoms to illnesses

Biometrics: Recognition/authentication using physical and/or behavioral characteristics: face, iris, signature, etc.
Supervised learning: regression

Regression: the label is real-valued (e.g. -4.5, 10.1, 3.2, 4.3).

Supervised learning: given labeled examples


Regression Example

Price of a used car

y = wx+w0
x : car attributes
(e.g. mileage)
y : price
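The line y = wx + w0 can be fitted to labeled examples by ordinary least squares. The sketch below uses the closed-form solution for a single attribute; the mileage and price figures are made up and chosen to lie exactly on a line so the result is easy to check.

```python
def fit_line(xs, ys):
    """Least-squares estimates of slope w and intercept w0 for y = w*x + w0."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    w = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    w0 = mean_y - w * mean_x
    return w, w0

mileage = [20, 40, 60, 80, 100]   # x: thousand km (hypothetical)
price = [18, 16, 14, 12, 10]      # y: price in thousands (hypothetical)

w, w0 = fit_line(mileage, price)
predicted = w * 70 + w0           # predicted price of a car with 70 thousand km
print(f"w = {w:.2f}, w0 = {w0:.2f}, predicted price = {predicted:.2f}")
```

The negative slope says that each additional thousand kilometres lowers the predicted price, which matches the intuition behind the used-car example.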

Regression Applications
Economics/Finance: predict the value of a stock

Epidemiology

Car/plane navigation: angle of the steering wheel, acceleration, …

Temporal trends: weather over time


Supervised learning: ranking

Ranking: the label is a ranking (e.g. 1, 4).

Supervised learning: given labeled examples


Ranking example

Given a query and a set of web pages, rank them according to relevance.
Ranking Applications
User preference, e.g. Netflix "My List" -- movie queue ranking

iTunes

flight search (search in general)

reranking N-best output lists


Unsupervised learning

Unsupervised learning: given data, i.e. examples, but no labels


Unsupervised learning applications
learn clusters/groups without any label

customer segmentation (i.e. grouping)

image compression

bioinformatics: learn motifs


Reinforcement learning
left, right, straight, left, left, left, straight GOOD

left, straight, straight, left, right, straight, straight BAD

left, right, straight, left, left, left, straight 18.5

left, straight, straight, left, right, straight, straight -3

Given a sequence of examples/states and a reward after completing that sequence, learn to predict the action to take for an individual example/state.
Reinforcement learning example
Backgammon

… WIN!

… LOSE!

Given sequences of moves and whether or not the player won at the end, learn to make good moves.
Reinforcement learning example

http://www.youtube.com/watch?v=VCdxqn0fcnE
Other learning variations
What data is available:
• Supervised, unsupervised, reinforcement learning
• semi-supervised, active learning, …

How are we getting the data:


• online vs. offline learning

Type of model:
• generative vs. discriminative
• parametric vs. non-parametric
Introduction to Machine Learning
• Machine learning is a broad field, but it is commonly classified into three classes:
• Supervised learning
• Unsupervised learning
• Reinforcement learning
Machine learning (ML) techniques:
• Supervised learning, classification: Support Vector Machines, Discriminant Analysis, Naive Bayes, Nearest Neighbor, Neural Networks
• Supervised learning, regression: Generalized Linear Model (GLM), Support Vector Regression (SVR), Gaussian Process Regression (GPR), Ensemble Methods, Decision Trees, Neural Networks
• Unsupervised learning: K-Means, Hierarchical, Gaussian Mixture, Hidden Markov Model, Neural Networks
UNSUPERVISED LEARNING

Unsupervised learning is a machine learning technique in which the users do not need to supervise the model. Instead, it allows the model to work on its own to discover patterns and information that were previously undetected. It mainly deals with unlabelled data.
SUPERVISED vs UNSUPERVISED

Supervised Learning - An outcome variable is available to guide the learning process.

● There must be a training dataset in which the target variable is already known
● Algorithms are trained against known labels

Unsupervised Learning - The target labels are not known.

● We have to cluster the data to reveal meaningful partitions and hierarchy
● Algorithms are trained without known labels
WHY UNSUPERVISED LEARNING ?

● Unsupervised machine learning finds all kinds of unknown patterns in data.
● Unsupervised methods help you to find features which can be useful for categorization.
● It takes place in real time, so all the input data is analyzed and labelled in the presence of learners.
● It is easier to get unlabelled data from a computer than labelled data, which needs manual intervention.
APPLICATIONS
● Customer Segmentation
● Document Segmentation
● Image Segmentation
● Anomaly Detection
● Pattern Recognition
TYPES OF UNSUPERVISED LEARNING

● Hierarchical Clustering
● K-Means Clustering
● Principal Component Analysis (PCA)
● Association Rules
● Singular Value Decomposition (SVD)
CLUSTERING
The method of identifying similar groups of data in a data set is called clustering.

It deals with finding structure in a collection of unlabelled data.

We look at the data and then try to club similar observations together and form different groups.
TYPES OF CLUSTERING
● Hierarchical Clustering: It builds a hierarchy of clusters. Initially each individual data point is assigned to its own cluster; we then merge the closest clusters into one cluster and keep doing so until we have one big cluster.
● K-Means Clustering: In K-means clustering we first select the desired number of clusters (K), then we divide our data points into K groups.
● Dimensionality reduction techniques (PCA): Sometimes the number of variables in a dataset becomes very large, which causes problems like high training time, less accurate models and high computational complexity. We use dimensionality reduction techniques to reduce the dimensionality of the dataset by transforming a large set of variables into a smaller one that still contains most of the information.
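The K-means idea described above (choose K, then divide the data points into K groups) can be sketched as an alternation of two steps: assign each point to its nearest centroid, then move each centroid to the mean of its points. The points and starting centroids below are hypothetical; real implementations usually pick starting centroids at random and stop when the assignments no longer change.

```python
from math import dist

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = k_means(points, centroids=[(0, 0), (10, 10)])  # K = 2
print(centroids)
print(clusters)
```

The two resulting groups illustrate the cluster properties stated on the next slide: points within a cluster are close together, and the two clusters are far apart.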
PROPERTIES OF CLUSTERS
● All the data points in a cluster should be similar to each other.

● The data points from different clusters should be as different as possible.
ASSOCIATION RULES
• Association rules analysis is a technique to uncover how items are associated with each other.

• Association rules work on the basis of if/then statements. These statements help to reveal associations between independent data in a database, relational database or other information repositories. These rules are used to identify the relationships between objects which are usually used together.

• Association rules allow us to establish associations amongst data objects. They help us to discover interesting relationships between variables. For example, people who have bought bread are most likely to buy butter as well.
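The strength of an if/then rule such as "if bread then butter" is usually measured with support (how often the items appear together) and confidence (how often the "then" part holds when the "if" part does). These two measures are standard in association rule mining but are not named on the slide, and the shopping baskets below are made up.

```python
# Hypothetical shopping baskets (one set of items per transaction).
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def support(itemset, transactions):
    """Share of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the share that also contain the consequent."""
    return (support(antecedent | consequent, transactions)
            / support(antecedent, transactions))

rule_support = support({"bread", "butter"}, baskets)
rule_confidence = confidence({"bread"}, {"butter"}, baskets)
print(f"bread -> butter: support = {rule_support:.2f}, confidence = {rule_confidence:.2f}")
```

In this toy data bread and butter appear together in 3 of 5 baskets, and 3 of the 4 baskets with bread also contain butter.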


Disadvantages of Unsupervised Learning

● You cannot get precise information regarding data sorting and the output, because the data used in unsupervised learning is unlabelled and not known in advance.

● The results are less accurate because the input data is not known and not labelled by people in advance. This means that the machine has to do this itself.

● The user needs to spend time interpreting and labelling the classes that result from the grouping.
