INF30036 DataTypes Lecture2-1
• Types of data
• Working with data in RStudio
• Intro to data exploration
Part 1
Types of data
Types of Data for Analytics
• Categorical (qualitative)
> Ordinal: categories with an implied order
• Numerical (quantitative)
Qualitative
Quantitative
Quantitative data
Exercise 1
Qualitative or Quantitative?
• Colors of automobiles in a dealer’s showroom.
• Number of seats in movie theaters.
• Classification of patients based on nursing care
needed (complete, partial, or self-care)
• Lengths of newborn cats of a certain species.
• Number of complaint letters received by an airline
per month.
Quantitative data
Discrete data
Discrete
Continuous data
Continuous
Exercise 2
Discrete or continuous?
Nominal data
Ordinal data
Exercise 3
Nominal or Ordinal
R Data types
• Character
> Strings
> Ex: “Survived” or “3.14”
• Numeric
> Integer/double (R has no separate float type)
> Ex: 3L/3.14 (3+14i is complex, a separate type)
• Factor
> Factor is a class for categorical variables
> Factors have different levels of categories
> Ex: Survived has two levels – “Survived” and “Not Survived”
> Factors can have numeric levels too – Ex: Survived – “0” for Not
Survived and “1” for Survived
• Logical
> TRUE/FALSE
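A minimal sketch of how the four types listed above show up in an R session. The variable names are illustrative assumptions, not names used in the lecture.

```r
# Character: anything in quotes is a string, even if it looks like a number
pi_text <- "3.14"
class(pi_text)      # "character"

# Numeric: numbers are doubles by default; the L suffix asks for an integer
x_double <- 3.14
x_int <- 3L
class(x_double)     # "numeric"
class(x_int)        # "integer"

# Factor: a categorical variable with a fixed set of levels
survived <- factor(c("Survived", "Not Survived", "Survived"))
levels(survived)    # "Not Survived" "Survived"

# Logical: TRUE or FALSE
is_survivor <- survived == "Survived"
class(is_survivor)  # "logical"
```

Note that `levels()` returns the levels in alphabetical order unless an order is given explicitly.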
Data Structures
• Vector
• List
• Factor
• Matrix
• Data frame
R is Vectorized
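A short sketch of what "vectorized" usually means in practice: arithmetic applies to every element at once, so no explicit loop is needed. The price and quantity values are made up for illustration.

```r
prices <- c(10, 20, 30)
quantities <- c(1, 2, 3)

# Element-wise multiplication, no loop required
revenue <- prices * quantities
revenue             # 10 40 90

# A single number is recycled across the whole vector
discounted <- prices * 0.5
discounted          # 5 10 15

sum(revenue)        # 140
```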
Vector
Vectors
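As a rough illustration, a vector holds values of a single type and is typically built with `c()`. The data below are invented for the example.

```r
ages <- c(23, 35, 41, 29)

length(ages)               # 4
ages[2]                    # 35 (indexing starts at 1)
ages[ages > 30]            # 35 41 (logical indexing)

# seq() generates regular sequences
steps <- seq(1, 10, by = 3)
steps                      # 1 4 7 10
```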
Exercise 4
List
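Unlike a vector, a list can mix types and lengths. The sketch below assumes a hypothetical patient record to show the two common ways of accessing elements.

```r
patient <- list(name = "Ann", age = 23, scores = c(7.5, 8.0))

patient$name             # "Ann"
patient[["scores"]][2]   # 8
length(patient)          # 3 (number of elements, not total values)

# str() gives a compact overview of the structure
str(patient)
```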
Factor
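One way to connect factors to the nominal/ordinal distinction from Part 1 is sketched below, reusing the nursing care categories from Exercise 1. The ordering of the levels is an assumption made for the example.

```r
care <- c("partial", "self", "complete", "partial")

# Nominal: levels have no order (stored alphabetically by default)
care_nominal <- factor(care)
levels(care_nominal)     # "complete" "partial" "self"

# Ordinal: state the order explicitly and mark the factor as ordered
care_ordinal <- factor(care,
                       levels = c("self", "partial", "complete"),
                       ordered = TRUE)

# Comparisons are meaningful only for ordered factors
care_ordinal[1] < care_ordinal[3]   # TRUE: partial < complete

table(care_ordinal)      # counts per level, in the stated order
```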
Matrix
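A minimal sketch of a matrix: a two-dimensional structure whose elements all share one type. The values are arbitrary.

```r
# Values are filled column by column unless byrow = TRUE is given
m <- matrix(1:6, nrow = 2, ncol = 3)

dim(m)       # 2 3
m[2, 3]      # 6 (row 2, column 3)
m[1, ]       # 1 3 5 (entire first row)

# t() transposes rows and columns
dim(t(m))    # 3 2
```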
33
Data frames
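A data frame can be thought of as a table whose columns are equal-length vectors, each with its own type. The passenger data below are invented for illustration.

```r
passengers <- data.frame(
  name = c("Ann", "Bob", "Cai"),
  age = c(23, 35, 41),
  survived = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE   # keep name as character on any R version
)

nrow(passengers)             # 3
ncol(passengers)             # 3
passengers$age               # 23 35 41

# Select rows with a logical condition and one column by name
survivor_names <- passengers[passengers$survived, "name"]
survivor_names               # "Ann" "Cai"

str(passengers)              # column names and types at a glance
```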
Missing values
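R marks missing values with `NA`. The sketch below shows how they propagate through calculations and one common way to handle them; the data are made up.

```r
ages <- c(23, NA, 41)

is.na(ages)                 # FALSE TRUE FALSE
sum(is.na(ages))            # 1 missing value

# Most summaries return NA if any input is missing
mean(ages)                  # NA

# na.rm = TRUE drops missing values before computing
mean(ages, na.rm = TRUE)    # 32
```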
Coercion
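Because a vector holds a single type, R silently converts mixed inputs to the most general type among them. A small sketch of this implicit coercion:

```r
# Number + string + logical: everything becomes character
mixed <- c(1, "a", TRUE)
class(mixed)                # "character"
mixed                       # "1" "a" "TRUE"

# Number + logical: logicals become 1 and 0
nums <- c(1, TRUE, FALSE)
nums                        # 1 1 0

# This is why summing a logical vector counts the TRUE values
sum(c(TRUE, FALSE, TRUE))   # 2
```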
Coercion – explicit coercion
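Explicit coercion uses the `as.*` family of functions. The sketch below includes one well-known pitfall with factors; the values are illustrative.

```r
as.numeric("3.14")      # 3.14
as.integer(3.9)         # 3 (truncates, does not round)
as.character(42)        # "42"
as.logical("TRUE")      # TRUE

# Values that cannot be converted become NA (with a warning)
bad <- suppressWarnings(as.numeric("abc"))
bad                     # NA

# Pitfall: converting a factor directly returns level codes
f <- factor(c("10", "20", "10"))
as.numeric(f)                  # 1 2 1
as.numeric(as.character(f))    # 10 20 10
```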
Reading/Writing Data
There are other file formats that you can read into R but in
this course we will primarily use .csv
• For .xls files
> install.packages("xlsx")
> library(xlsx)
> {Variable name} <- read.xlsx("filename.xls", sheetIndex = 1)
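For the .csv files used in this course, base R's `read.csv()` and `write.csv()` need no extra packages. The sketch below writes a small invented data frame to a temporary file and reads it back; the file location and column names are assumptions for the example.

```r
scores <- data.frame(id = 1:3, score = c(7.5, 8.0, 6.5))

# tempfile() stands in for a real path such as "scores.csv"
path <- tempfile(fileext = ".csv")

# row.names = FALSE avoids writing an extra index column
write.csv(scores, path, row.names = FALSE)

scores_back <- read.csv(path)
str(scores_back)   # check that column types were read as expected
```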
Data wrangling with dplyr
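A rough sketch of a typical dplyr pipeline, assuming the dplyr package is installed. It uses R's built-in `mtcars` dataset; the choice of verbs and columns is an illustration, not necessarily what the lecture covered.

```r
library(dplyr)

summary_by_cyl <- mtcars %>%
  filter(hp > 100) %>%                  # keep rows matching a condition
  select(mpg, cyl, hp) %>%              # keep only these columns
  mutate(hp_per_mpg = hp / mpg) %>%     # add a derived column
  group_by(cyl) %>%                     # split by number of cylinders
  summarise(mean_mpg = mean(mpg),       # one summary row per group
            n = n()) %>%
  arrange(desc(mean_mpg))               # sort by the summary

summary_by_cyl
```

Each verb takes a data frame and returns a data frame, which is what lets the steps chain with `%>%`.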
R for Business analytics
• Advantages
> Designed for Statistical Analysis
– Many built-in functions
> Large number of libraries
> Mature open source project
• Disadvantages
> Overhead (does not scale well to very large data)
• Use R as a “sandbox” to play with a sample
Part 3
Data exploration (Intro)
Data exploration?
Data reduction
Sample Dataset
csv or excel file
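One possible way to reduce a dataset to a sample in R is sketched below. It uses the built-in `iris` data as a stand-in for a real dataset; the sample size and seed are arbitrary assumptions.

```r
set.seed(42)   # makes the random sample reproducible

# Draw 30 row numbers without replacement, then subset
rows <- sample(nrow(iris), size = 30)
iris_sample <- iris[rows, ]

nrow(iris_sample)   # 30
```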
The fundamental data problem
• Incomplete data
• Inaccurate data
• Inconsistent data
• Unobtainable data
[Figure: programs, databases, temporary databases, and interfaces]
Data exploration
Common exploration tools
• Drawing plots
• Using visualization tools (e.g., Tableau, Cognos)
• Programming in RStudio / Python
• Rattle package in RStudio
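A first pass at exploring a dataset in R might look like the sketch below. It uses the built-in `iris` dataset as a stand-in; the particular functions and plots are one reasonable selection, not a prescribed workflow.

```r
data(iris)

str(iris)                     # column names, types, and sizes
head(iris, 3)                 # first few rows
summary(iris$Sepal.Length)    # min, quartiles, mean, max

# Frequency table for a categorical column
species_counts <- table(iris$Species)
species_counts

# Simple plots: distribution of one variable, then a comparison by group
hist(iris$Sepal.Length,
     main = "Sepal length",
     xlab = "Length (cm)")
boxplot(Sepal.Length ~ Species, data = iris)
```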
Thank you for your attention