Lecture 2 Introduction
MACHINE LEARNING
Pooja Vashisth
What Is Machine Learning?
Data availability
Timing
Longevity
Different Types of Data
Numeric: True numeric values that allow arithmetic
operations (e.g., price, age)
Interval: Values that allow ordering and subtraction, but do
not allow other arithmetic operations (e.g., date, time)
Ordinal: Values that allow ordering but do not permit
arithmetic (e.g., size measured as small, medium, or large)
Categorical: A finite set of values that cannot be ordered
and allow no arithmetic (e.g., country, product type)
Binary: A set of just two values (e.g., gender)
Textual: Free-form, usually short, text data (e.g., name,
address)
We often reduce this categorization to just two data
types: continuous (encompassing the numeric and
interval types), and categorical (encompassing the
categorical, ordinal, binary, and textual types).
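The reduction from six data types to two can be written down as a simple lookup. The sketch below is illustrative only: the feature names (price, purchase_date, is_member, and so on) are made up for the example and are not from any dataset in the lecture.

```python
# Lookup from the six fine-grained data types to the two coarse ones,
# following the grouping described in the text above.
COARSE_TYPE = {
    "numeric": "continuous",
    "interval": "continuous",
    "ordinal": "categorical",
    "categorical": "categorical",
    "binary": "categorical",
    "textual": "categorical",
}

# Hypothetical features, each tagged with its fine-grained type.
feature_types = {
    "price": "numeric",
    "purchase_date": "interval",
    "size": "ordinal",
    "country": "categorical",
    "is_member": "binary",
    "name": "textual",
}


def coarse_types(features):
    """Return a dict mapping each feature name to 'continuous' or 'categorical'."""
    return {name: COARSE_TYPE[fine] for name, fine in features.items()}


reduced = coarse_types(feature_types)
for name, kind in reduced.items():
    print(f"{name}: {kind}")
```

Note that ordinal features end up in the categorical group here, so their ordering is lost unless it is encoded separately.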
Different Types of Features
The features in an analytics base table (ABT) can be of two types:
raw features or
derived features
Common kinds of derived features:
Aggregates
Flags
Ratios
Mappings
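As a rough illustration, the four kinds of derived feature named above (aggregate, flag, ratio, mapping) can each be computed from a handful of raw values. Everything in this sketch is an assumption made for the example: the call records, the bill amount, the age bands, and the function name derive_features are hypothetical.

```python
# Hypothetical raw data for a single customer; values are illustrative only.
calls = [
    {"minutes": 12.0, "international": False},
    {"minutes": 30.0, "international": True},
    {"minutes": 18.0, "international": False},
]
monthly_bill = 40.0
age = 37


def derive_features(calls, monthly_bill, age):
    """Compute one example of each kind of derived feature."""
    # Aggregate: summarize many raw rows into one value.
    total_minutes = sum(c["minutes"] for c in calls)

    # Flag: a binary indicator for the presence of some characteristic.
    made_international_call = any(c["international"] for c in calls)

    # Ratio: relationship between two raw values (guard against dividing by zero).
    cost_per_minute = monthly_bill / total_minutes if total_minutes else None

    # Mapping: convert a continuous value into a categorical one.
    # The band boundaries are arbitrary choices for this sketch.
    if age < 25:
        age_band = "young"
    elif age < 60:
        age_band = "middle"
    else:
        age_band = "senior"

    return {
        "total_minutes": total_minutes,
        "made_international_call": made_international_call,
        "cost_per_minute": cost_per_minute,
        "age_band": age_band,
    }


features = derive_features(calls, monthly_bill, age)
print(features)
```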
For propensity modeling, there are two key
periods: the observation period, over which
descriptive features are calculated, and the
outcome period, over which the target feature is
calculated.
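One way to picture the two periods is to split a customer's event history at a cutoff date: events before the cutoff feed the descriptive features, and events after it determine the target. The sketch below assumes a hypothetical event log, cutoff date, and churn target; none of these come from the lecture.

```python
from datetime import date


def split_periods(events, cutoff, outcome_end):
    """Split events into the observation period (before cutoff) and
    the outcome period (from cutoff up to, but not including, outcome_end)."""
    observation = [e for e in events if e["date"] < cutoff]
    outcome = [e for e in events if cutoff <= e["date"] < outcome_end]
    return observation, outcome


# Hypothetical event log for one customer.
events = [
    {"date": date(2023, 1, 10), "type": "purchase"},
    {"date": date(2023, 2, 20), "type": "purchase"},
    {"date": date(2023, 4, 5), "type": "churn"},
]

cutoff = date(2023, 4, 1)       # end of observation, start of outcome (assumed)
outcome_end = date(2023, 7, 1)  # end of outcome period (assumed)

observation, outcome = split_periods(events, cutoff, outcome_end)

# Descriptive feature: computed only from the observation period.
n_purchases = sum(1 for e in observation if e["type"] == "purchase")

# Target feature: computed only from the outcome period.
churned = any(e["type"] == "churn" for e in outcome)

print(n_purchases, churned)
```

Keeping the two computations on disjoint date ranges is what prevents information from the outcome period leaking into the descriptive features.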
Legal Issues
Anti-discrimination legislation
Personal data