Dr. Gaurav Dixit: Department of Management Studies
LECTURE 01
DR. GAURAV DIXIT
DEPARTMENT OF MANAGEMENT STUDIES
INTRODUCTION
• Usefulness of a method
– Goal of the analysis
– Underlying assumptions of the method
– Size of the dataset
– Types of pattern in the dataset
• Dataset Example: Sedan Car owner
– Goal: use income level and household area to classify whether a
household owns a sedan car
• Dataset format
– Tabular or matrix format: variables in columns and observations in
rows
– Each row represents a household (the unit of analysis) in the SedanCar dataset
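The tabular layout described above can be sketched in a few lines. This is a minimal illustration in Python rather than R; the column names and every value are invented and are not taken from the actual SedanCar dataset.

```python
# Hypothetical SedanCar-style table. All values are invented for illustration.
COLUMNS = ["income", "household_area", "owns_sedan"]  # variables (columns)

ROWS = [  # each row is one household, the unit of analysis
    (4.5, 14.0, False),
    (6.2, 18.5, False),
    (8.9, 21.0, True),
    (10.4, 24.5, True),
]


def column(name):
    """Return all observed values of one variable, in row order."""
    idx = COLUMNS.index(name)
    return [row[idx] for row in ROWS]


n_observations = len(ROWS)    # number of rows
n_variables = len(COLUMNS)    # number of columns

print(n_observations, "observations x", n_variables, "variables")
print("income:", column("income"))
```

Reading down a column gives every observed value of one variable; reading across a row gives everything recorded for one household.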
• R and RStudio
– R is a programming language and software environment for statistical
computing and graphics.
– It is widely used by statisticians and data miners
– RStudio is the most commonly used integrated development
environment (IDE) for R.
• Key Terms
– Algorithm
• A specific sequence of actions or set of rules to be followed to perform a task.
• Algorithms are used to implement data mining techniques such as decision trees
and neural networks.
– Model
• Here, "model" means a data mining model
• A data mining model is the application of a data mining technique to a dataset
– Variable
• Operationalized way of representing a characteristic of an object, event, or
phenomenon
• A variable can take different values in different situations.
– Output variable (also called outcome variable, dependent variable, target
variable, or response)
• The output variable is the output of the model, e.g., whether a household owns a
sedan car in the SedanCar dataset
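The key terms above can be tied together with a small sketch. It assumes a deliberately simple technique, a single-threshold rule on income, which is a stand-in chosen for illustration and not a method named in the lecture. The code is Python rather than R, and the data values are invented. The function `fit_threshold_rule` plays the role of the algorithm, the threshold it returns is the model, and `owns_sedan` is the output variable.

```python
# Hypothetical data: (income, owns_sedan). All values are invented.
DATA = [
    (4.5, False), (6.2, False), (7.1, False),
    (8.9, True), (9.5, True), (10.4, True),
]


def fit_threshold_rule(data):
    """Algorithm: a fixed sequence of steps.

    1. Sort the distinct income values.
    2. Take the midpoint between each adjacent pair as a candidate threshold.
    3. Keep the candidate that misclassifies the fewest rows.
    """
    values = sorted({income for income, _ in data})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]

    def errors(threshold):
        return sum((income > threshold) != owns for income, owns in data)

    return min(candidates, key=errors)


def predict(threshold, income):
    """Apply the fitted model: predict ownership when income exceeds the threshold."""
    return income > threshold


# Model: the result of applying the technique to this particular dataset.
model = fit_threshold_rule(DATA)
print("fitted threshold:", model)
print("prediction for income 9.0:", predict(model, 9.0))
```

Running the same algorithm on a different dataset would generally yield a different threshold, which is why the algorithm and the model are treated as separate terms.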
• Nominal
– Values indicate distinct types, e.g., gender, nationality, religion, PIN
code, employee ID
– Only two operations = and ≠ are supported
• Ordinal
– Values indicate a natural order or sequence, e.g., academic grades,
Likert scale, quality of a food item
– Four additional operations <, ≤, >, ≥ are supported
• Interval
– Difference between two values is also meaningful
– Values may be in reference to a somewhat arbitrary zero point
– Examples: Celsius temperature, Fahrenheit temperature, location variables
(distance from landmarks, geographical coordinates such as latitude and
longitude), calendar dates
– Two additional operations +, - are supported
• Ratio
– Ratio of two values is also meaningful. Values are in reference to an
absolute zero point
– Examples: Kelvin temperature, age, length, weight, height, income
– Two additional operations ×, ÷ are supported
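Because each scale adds operations to those of the scale before it, the supported operations accumulate from nominal up to ratio. A minimal Python sketch of that bookkeeping follows; the names `SCALES`, `NEW_OPERATIONS` and `supported_operations` are hypothetical and introduced only for this illustration.

```python
# Measurement scales ordered from weakest to strongest.
SCALES = ["nominal", "ordinal", "interval", "ratio"]

# Operations that each scale adds on top of the weaker scales.
NEW_OPERATIONS = {
    "nominal": ["=", "≠"],
    "ordinal": ["<", "≤", ">", "≥"],
    "interval": ["+", "-"],
    "ratio": ["×", "÷"],
}


def supported_operations(scale):
    """Return the scale's own operations plus those of every weaker scale."""
    position = SCALES.index(scale)
    operations = []
    for name in SCALES[: position + 1]:
        operations.extend(NEW_OPERATIONS[name])
    return operations


for name in SCALES:
    print(f"{name:<8} {' '.join(supported_operations(name))}")
```

For example, the difference between 20 °C and 10 °C is meaningful, but 20 °C is not "twice as hot" as 10 °C, because the Celsius zero point is arbitrary. The same two temperatures in Kelvin (293.15 K and 283.15 K) can be compared as a ratio.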
• Course Roadmap
– Module I: General Overview of Data Mining and its Components
– Module II: Data Preparation and Exploration
– Module III: Performance Metrics and Assessment
– Module IV: Supervised Learning Methods
– Module V: Unsupervised Learning Methods
– Module VI: Time Series Forecasting
– Module VII: Conclusion
• Supplementary Lectures
– Introduction to R
Thanks…