
Dr. Gaurav Dixit: Department of Management Studies

The document provides an introduction to business analytics, including definitions and classifications of descriptive, predictive, and prescriptive analytics. It discusses data mining and where it is used, covering common business questions and goals. Key terms like algorithms, models, variables, and data types are also introduced.


INTRODUCTION

LECTURE 01
DR. GAURAV DIXIT
DEPARTMENT OF MANAGEMENT STUDIES


• What is Business Analytics?
– “Business analytics is comprised of solutions used to build analysis
models and simulations to create scenarios, understand realities and
predict future states.” - Gartner IT Glossary
– Includes data mining, predictive analytics, applied analytics and
statistics, and is delivered as an application suitable for a business user
• Analytics can be classified as:
– Descriptive analytics
– Predictive analytics
– Prescriptive analytics


• Descriptive analytics involve gathering, organizing, tabulating,
and depicting data and then describing the characteristics of
what you are studying.
– Also called reporting in managerial lingo
– First phase of analytics
– Though useful, it does not tell you why the results occurred or
what may happen in the future.
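
As a minimal sketch of descriptive analytics, the following Python snippet organizes and tabulates a handful of sales records and then describes them with counts, totals, and averages. The records, field names, and figures are hypothetical and only illustrate reporting; nothing in the output explains why the figures arose or what will happen next.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical daily sales records: (region, units_sold)
sales = [
    ("North", 120), ("South", 95), ("North", 130),
    ("East", 80), ("South", 105), ("East", 90),
]

# Organize: group units sold by region
by_region = defaultdict(list)
for region, units in sales:
    by_region[region].append(units)

# Tabulate and describe: count, total, and average per region
summary = {
    region: {"count": len(units), "total": sum(units), "mean": mean(units)}
    for region, units in by_region.items()
}

# Depict: a plain-text report, one line per region
for region, stats in sorted(summary.items()):
    print(f"{region:<6} count={stats['count']} total={stats['total']} mean={stats['mean']:.1f}")
```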


• Predictive analytics use the past to predict the future.
– Identify associations among different variables and predict the
likelihood of a phenomenon reoccurring on the basis of those
relationships
• Correlation vs. causation: an association between two variables does
not by itself establish that one causes the other
• Prescriptive analytics suggest a course of action.
– Recommends decisions using mathematical and computational
models
– Final phase of analytics
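
The idea of identifying an association in past data and using it to predict can be sketched as follows. This is an illustrative example only: the advertising and sales figures are hypothetical, and the least-squares line is just one simple way to turn an association into a prediction. A high correlation here would not show that spending causes sales.

```python
from math import sqrt

# Hypothetical past observations: advertising spend and resulting sales
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(ad_spend)
mean_x = sum(ad_spend) / n
mean_y = sum(sales) / n

# Sums of squares and cross-products around the means
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ad_spend, sales))
sxx = sum((x - mean_x) ** 2 for x in ad_spend)
syy = sum((y - mean_y) ** 2 for y in sales)

# Pearson correlation: strength of the linear association
r = sxy / sqrt(sxx * syy)

# Least-squares line fitted to the past data
slope = sxy / sxx
intercept = mean_y - slope * mean_x

def predict(x):
    """Predict sales for a new level of advertising spend."""
    return intercept + slope * x

print(f"correlation r = {r:.3f}")
print(f"predicted sales at spend 6.0 = {predict(6.0):.2f}")
```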


• Methods from statistics, forecasting, data mining, and
experimental design are used in Business Analytics
• What is Data Mining?
– “Extracting useful information from large datasets” - Hand et al. 2001
– “The process of discovering meaningful correlations, patterns and
trends by sifting through large amounts of data stored in repositories.
Data mining employs pattern recognition technologies, as well as
statistical and mathematical techniques.” - Gartner IT Glossary


• Where is Data Mining Used?
– Variety of domains:
• Predicting the response of a patient suffering from a serious disease
to a drug or a medical treatment
• Predicting whether an intercepted communication is about a potential
terror attack
• Predicting whether a packet of network data can pose a cybersecurity
threat
– Common business questions:
• Which customers are most likely to respond to a marketing or
promotional offer?
• Which customers are most likely to default on a loan?
• Which customers are most likely to subscribe to a magazine?


• Data Mining Genesis
– An interdisciplinary subfield of computer science
– Originates from the fields of machine learning and statistics
– Data mining as “statistics at scale and speed” – Pregibon 1999
– Extension: “statistics at scale, speed, and simplicity” – Shmueli et al.
2010


Classical Statistical Setting
• Data scarcity and computational difficulty
• Same sample is used to compute an estimate and to check its
reliability
• Logic of inference: confidence intervals and hypothesis tests
(inference is determining whether a pattern or result might
have happened by chance)

Data Mining Paradigm
• Large datasets and fast computing power
• Fitting a model with one sample and evaluating the performance
using another sample
• Machine learning techniques, such as trees and neural networks, are
less structured and more computationally intensive in comparison
to statistical techniques
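
The practice of fitting a model on one sample and evaluating it on another can be sketched as below. The data are hypothetical, the 7/3 split and the seed are arbitrary choices, and a one-nearest-neighbour predictor is used only because it makes the point clearly: it reproduces its training sample perfectly, so its error is visible only on the held-out sample.

```python
import random
from math import sqrt

# Hypothetical records (x, y) following y = 2x + 1 plus small disturbances
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4, -0.3, 0.1]
records = [(float(x), 2.0 * x + 1.0 + e) for x, e in zip(range(1, 11), noise)]

# Partition: one sample to fit the model, another to evaluate it
rng = random.Random(42)  # fixed seed so the split is repeatable
shuffled = records[:]
rng.shuffle(shuffled)
train, valid = shuffled[:7], shuffled[7:]

def fit_one_nearest_neighbour(sample):
    """Return a predictor that copies the y of the closest training x."""
    def predict(x):
        nearest_x, nearest_y = min(sample, key=lambda rec: abs(rec[0] - x))
        return nearest_y
    return predict

def rmse(predict, sample):
    """Root mean squared prediction error on a sample."""
    return sqrt(sum((y - predict(x)) ** 2 for x, y in sample) / len(sample))

model = fit_one_nearest_neighbour(train)
train_rmse = rmse(model, train)  # error on the sample used for fitting
valid_rmse = rmse(model, valid)  # error on the sample held back

print("training RMSE:  ", round(train_rmse, 3))
print("validation RMSE:", round(valid_rmse, 3))
```

The training error is zero because the model has memorized its own sample, which is why the validation sample gives the more honest picture of performance.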

• Rapid Growth of Data
– Millions of transactions on a daily basis
• Organized retailers such as Shoppers Stop, Big Bazaar, and Pantaloons
• E-commerce retailers such as Flipkart, Amazon, and Snapdeal
– Growing economy and Internet growth
– Decreasing cost and increasing availability of automatic data capture
mechanisms, e.g., bar codes, POS devices, click-stream data, GPS data
– Movement from operational databases to data warehouses and data marts
– Constantly declining cost of data storage and improving processing
capabilities


• The core of this course focuses on
– Predictive analytics, consisting of the tasks of
• Prediction
• Classification
• Association rules

• In data mining, several different methods are typically applied
for a particular goal, and the most useful one is selected
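
Applying several methods to the same goal and keeping the most useful one can be sketched as follows. The two methods (a majority-class baseline and a single-threshold rule), the data, and the use of validation accuracy as the measure of usefulness are all assumptions made for illustration, not methods prescribed by the lecture.

```python
# Hypothetical labelled data: (income, owns_sedan), already split into two samples
train = [(3.0, 0), (4.0, 0), (4.5, 0), (5.0, 0), (6.5, 1), (7.0, 1), (8.0, 1)]
valid = [(3.5, 0), (5.5, 0), (6.8, 1), (7.5, 1), (9.0, 1)]

def accuracy(predict, sample):
    """Share of records in the sample that the predictor labels correctly."""
    return sum(predict(x) == y for x, y in sample) / len(sample)

def fit_majority(sample):
    """Method 1: always predict the most frequent class in the training sample."""
    labels = [y for _, y in sample]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority

def fit_threshold(sample):
    """Method 2: predict class 1 above the cut point that best fits the training sample."""
    xs = sorted(x for x, _ in sample)
    candidates = [(a + b) / 2 for a, b in zip(xs, xs[1:])]
    best = max(candidates, key=lambda t: accuracy(lambda x: int(x > t), sample))
    return lambda x: int(x > best)

# Fit every method on the training sample, score each on the validation sample
methods = {"majority class": fit_majority, "threshold rule": fit_threshold}
scores = {name: accuracy(fit(train), valid) for name, fit in methods.items()}
best_method = max(scores, key=scores.get)

print(scores)
print("selected:", best_method)
```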


• Usefulness of a method depends on
– Goal of the analysis
– Underlying assumptions of the method
– Size of the dataset
– Types of patterns in the dataset
• Dataset Example: Sedan Car ownership
– Goal: use Income level and Household Area to classify whether a
household owns a sedan car


• Dataset format
– Tabular or matrix format: variables in columns and observations in
rows
– Each row represents a household (unit of analysis) in the SedanCar dataset
• R and RStudio
– R is a programming language and software environment for statistical
computing and graphics.
– It is widely used by statisticians and data miners
– RStudio is the most commonly used integrated development
environment (IDE) for R.
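
The tabular format described above, with variables in columns and observations in rows, can be pictured with a small sketch. Python is used here purely for illustration; the column names and all values are hypothetical and are not taken from the actual SedanCar dataset.

```python
# Hypothetical columns and rows in the style of the SedanCar example
columns = ["annual_income", "household_area", "ownership"]
rows = [
    (4.3, 14.0, "Non-owner"),
    (8.2, 20.0, "Owner"),
    (5.1, 17.5, "Non-owner"),
    (9.6, 22.3, "Owner"),
]

# Each row is one observation (a household); each column is one variable
n_observations = len(rows)
n_variables = len(columns)

# Reading one variable means taking one column across all rows
income = [row[columns.index("annual_income")] for row in rows]

print(" | ".join(columns))
for row in rows:
    print(" | ".join(str(value) for value in row))
```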


• Key Terms
– Algorithm
• A specific sequence of actions or set of rules to be followed to perform a task.
• Algorithms are used to implement data mining techniques such as trees and neural
networks.

– Model
• Here, "model" means a data mining model
• A data mining model is an application of a data mining technique to a dataset
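
The distinction between an algorithm and a model can be sketched as follows. The nearest-class-mean classifier is a deliberately simple stand-in chosen for this illustration, and both datasets are hypothetical. The same algorithm, applied to two different datasets, yields two different models.

```python
def nearest_mean_algorithm(dataset):
    """Algorithm: a fixed sequence of steps that works on any labelled dataset.

    1. Group the input values by class label.
    2. Compute the mean input value of each class.
    3. Return a model that assigns a new value to the class with the closest mean.
    """
    groups = {}
    for x, label in dataset:
        groups.setdefault(label, []).append(x)
    class_means = {label: sum(xs) / len(xs) for label, xs in groups.items()}

    def model(x):
        return min(class_means, key=lambda label: abs(x - class_means[label]))

    model.class_means = class_means  # what this particular model learned
    return model

# Two hypothetical datasets of (input value, class label)
dataset_a = [(2.0, "low"), (4.0, "low"), (8.0, "high"), (10.0, "high")]
dataset_b = [(20.0, "low"), (40.0, "low"), (80.0, "high"), (100.0, "high")]

# One algorithm, two datasets, two models
model_a = nearest_mean_algorithm(dataset_a)
model_b = nearest_mean_algorithm(dataset_b)

print(model_a(7.0))  # classified relative to the class means of dataset_a
print(model_b(7.0))  # classified relative to the class means of dataset_b
```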

– Variable
• Operationalized way of representing a characteristic of an object, event, or
phenomenon
• A variable can take different values in different situations.

– Input variable, Independent variable, Feature, Field, Attribute, or
Predictor
• An input variable is an input to the model

– Output variable, Outcome variable, Dependent variable, Target
variable, or Response
• An output variable is an output of the model

– Record, Observation, Case, or Row
• An observation is the unit of analysis on which the variable measurements are taken,
such as a customer, a household, an organization, or an industry


• In Data Mining and related domains, generally two types of
variables are used:
– Categorical
• Nominal
• Ordinal
– Continuous
• Interval
• Ratio


• Understanding the types of variables in a dataset is important
– To identify an appropriate statistical or data mining technique
– To interpret the data analysis results properly
• Data of these variable types are either quantitative or
qualitative in nature
– Quantitative data measure numeric values and are expressed as
numbers
– Qualitative data measure types and are expressed by a label or a
numeric code


• Structure of these variable types increases from nominal to
ratio in a hierarchical fashion

• Nominal
– Values indicate distinct types, e.g., gender, nationality, religion, PIN
code, employee ID
– Only two operations = and ≠ are supported


• Ordinal
– Values indicate a natural order or sequence, e.g., academic grades,
Likert scale, quality of a food item
– Four additional operations <, ≤, >, ≥ are supported
• Interval
– Difference between two values is also meaningful
– Values may be in reference to a somewhat arbitrary zero point
– Examples: Celsius temperature, Fahrenheit temperature, calendar dates,
and location variables such as distance from landmarks and
geographical coordinates (latitude & longitude)
– Two additional operations +, - are supported

• Ratio
– Ratio of two values is also meaningful. Values are in reference to an
absolute zero point
– Examples: Kelvin temperature, age, length, weight, height, income
– Two additional operations ×, ÷ are supported
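
The hierarchy of supported operations can be sketched as a small lookup in which each level inherits the operations of the levels below it. The function and variable names are hypothetical. The temperature comparison at the end illustrates why a ratio is not meaningful on an interval scale: the same two temperatures give different ratios in Celsius and in Kelvin.

```python
# Operations introduced at each level, in order of increasing structure
LEVELS = ["nominal", "ordinal", "interval", "ratio"]
NEW_OPERATIONS = {
    "nominal": ["=", "≠"],
    "ordinal": ["<", "≤", ">", "≥"],
    "interval": ["+", "-"],
    "ratio": ["×", "÷"],
}

def supported_operations(level):
    """Operations meaningful at a level: its own plus those of every lower level."""
    position = LEVELS.index(level)
    operations = []
    for name in LEVELS[: position + 1]:
        operations.extend(NEW_OPERATIONS[name])
    return operations

def is_meaningful(operation, level):
    """Check whether an operation is supported for a variable of the given level."""
    return operation in supported_operations(level)

for level in LEVELS:
    print(f"{level:<9} {' '.join(supported_operations(level))}")

# Interval vs. ratio: 20 and 10 degrees Celsius, and the same temperatures in Kelvin
celsius_ratio = 20.0 / 10.0
kelvin_ratio = (20.0 + 273.15) / (10.0 + 273.15)
print(celsius_ratio, round(kelvin_ratio, 3))
```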


• Conversion from one variable type to another
– A high-structure variable type can be converted into a low-structure
variable type
– For example, a ratio variable ‘age’ can be converted into an ordinal
variable ‘age group’
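
The conversion of the ratio variable 'age' into the ordinal variable 'age group' can be sketched as follows. The cut points and group labels are hypothetical; the lecture does not prescribe specific age groups.

```python
from bisect import bisect_right

# Hypothetical cut points: lower edges of the second, third, and fourth groups
BOUNDARIES = [18, 40, 60]
LABELS = ["under 18", "18-39", "40-59", "60 and over"]  # listed in their natural order

def age_group(age):
    """Map a ratio-scale age to an ordinal age group."""
    if age < 0:
        raise ValueError("age cannot be negative")
    return LABELS[bisect_right(BOUNDARIES, age)]

ages = [12, 18, 35.5, 40, 59, 73]
groups = [age_group(a) for a in ages]
print(groups)
```

The groups keep their order but lose the exact ages, so the original ratio values cannot be recovered from the ordinal ones.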


• Course Roadmap
– Module I: General Overview of Data Mining and its Components
– Module II: Data Preparation and Exploration
– Module III: Performance Metrics and Assessment
– Module IV: Supervised Learning Methods
– Module V: Unsupervised Learning Methods
– Module VI: Time Series Forecasting
– Module VII: Conclusion


• Supplementary Lectures
– Introduction to R
– Basic Statistical Methods

Key References

• HBR Video (Business Analytics Defined by Thomas H.
Davenport)
• Gartner IT Glossary
• Data Mining for Business Intelligence: Concepts, Techniques,
and Applications in Microsoft Office Excel with XLMiner by
Shmueli, G., Patel, N. R., & Bruce, P. C. (2010)

Thanks…
