Lect01 2
BTBI30081
Applied Statistical Methods
2025/2/19
Data analysis process
1. Data import (tidying data)
2. Data transformation (data manipulation)
3. Data visualization
4. Modeling
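The four steps can be sketched end to end in a few lines of base R. This is only an illustration under stated assumptions: the built-in mtcars data set stands in for an imported file, and the choice of columns and of a simple linear model are made up for the example.

```r
# 1. Import: a built-in data set stands in for read.csv(filename)
dat <- mtcars
# 2. Transform: keep only the two columns we need
dat <- dat[, c("wt", "mpg")]
# 3. Visualize: fuel economy against car weight
plot(mpg ~ wt, data = dat)
# 4. Model: a simple linear regression of mpg on weight
fit <- lm(mpg ~ wt, data = dat)
coef(fit)
```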
Bottlenecks in data analysis
One of the most time-consuming parts of the data analysis process is “data wrangling” (or “data munging”):
Importing, cleaning and transforming messy data into a format that is useful for data visualization and modeling
This corresponds to the first two steps of the data analysis process (data import and data transformation)
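A minimal wrangling sketch in base R, assuming a hypothetical poll in which answers were typed in free form (the data below are invented for illustration): category labels are made consistent with trimws() and tolower(), and height entries that cannot be read as a number become NA after as.numeric().

```r
# Hypothetical messy poll answers, read in as text
raw <- data.frame(sex    = c("Male", "female ", "Female"),
                  height = c("70", "64", "5'4\""))

clean <- raw
# Unify the labels: drop stray spaces and use lower case
clean$sex <- tolower(trimws(clean$sex))
# Convert to numbers; "5'4\"" cannot be parsed and becomes NA
clean$height <- suppressWarnings(as.numeric(clean$height))
clean
```

The NA flags the entry for a closer look instead of silently guessing a value.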
Package: tidyverse
The tidyverse is a collection of R packages designed
for data science
The core tidyverse includes the packages ggplot2, dplyr, tidyr, readr, purrr, tibble, stringr, forcats and (since tidyverse 2.0) lubridate
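Loading the tidyverse: a single library() call attaches all the core packages. The install line is commented out because it only needs to run once per machine.

```r
# install.packages("tidyverse")   # run once per machine
library(tidyverse)   # attaches the core packages, e.g. dplyr, readr, ggplot2
```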
Data import
The first step in data analysis is importing the data
into the R environment
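There are several functions in base R (in the utils package, attached by default) for reading data:
read.table – sep = "" (white space)
read.csv – sep = "," (comma)
read.delim – sep = "\t" (tab)
These functions are essentially identical: read.csv and read.delim are wrappers around read.table that differ mainly in the default field separator character (sep), plus a few other defaults (for example, read.table uses header = FALSE, while read.csv and read.delim use header = TRUE).
If the file name is not an absolute path, it is interpreted relative to the current working directory, which getwd() returns.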
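A small check that the three readers agree once the separator matches the file. The two temporary files and their contents are invented for the example; only the sep and header defaults differ between the calls.

```r
# Two small files with the same content but different separators
csv_file <- tempfile(fileext = ".csv")
tsv_file <- tempfile(fileext = ".txt")
writeLines(c("sex,height", "Male,75", "Female,64"), csv_file)
writeLines(c("sex\theight", "Male\t75", "Female\t64"), tsv_file)

dat_csv   <- read.csv(csv_file)     # default sep = ","
dat_delim <- read.delim(tsv_file)   # default sep = "\t"
# read.table defaults to sep = "" and header = FALSE, so set both
dat_table <- read.table(csv_file, sep = ",", header = TRUE)

identical(dat_csv, dat_delim)
identical(dat_csv, dat_table)

# A relative file name is resolved against this directory
getwd()
```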
Example
We took a poll of our students to obtain (self-reported) height and gender. Our task is to describe this list of heights.
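One way to start describing a list of heights is with a few numerical summaries. The values below are hypothetical (not the class poll results) and assume heights in inches; the functions summary(), mean(), sd() and tapply() are standard base R.

```r
# Hypothetical self-reported heights in inches
heights <- data.frame(
  sex    = c("Male", "Female", "Male", "Female", "Male"),
  height = c(70, 64, 72, 66, 68)
)
summary(heights$height)   # min, quartiles, mean, max
mean(heights$height)
sd(heights$height)
# Mean height for each sex
by_sex <- tapply(heights$height, heights$sex, mean)
by_sex
```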
Different ways to import data into R
Data types
dat <- read.csv(filename)
In R, we make assignments with “<-”
This stores the output of read.csv in an object named “dat”
The class of dat is “data.frame”, one of the most widely used data types in R
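To see this in practice without the original poll file, the sketch below writes a tiny stand-in CSV to a temporary file (its contents are made up) and inspects the object that read.csv returns.

```r
# A temporary file stands in for the real filename
filename <- tempfile(fileext = ".csv")
writeLines(c("sex,height", "Male,75", "Female,64"), filename)

dat <- read.csv(filename)   # assign the result to the object dat
class(dat)                  # "data.frame"
str(dat)                    # one row per student, one column per variable
```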
Tidy data type: tibble
tibble (or tbl_df) is a modern reimagining of the data.frame, keeping what time has proven to be effective and throwing out what is not
Tidyverse functions are designed around tibbles (readr's import functions, for example, return tibbles); this is one of the unifying features of the tidyverse
Creating a tibble: see RMD_example 01-2.2
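A short sketch of creating a tibble and converting between tibble and data.frame with the tibble package. The data are invented; the last two lines illustrate one known difference: a data.frame allows partial matching of column names with $, while a tibble does not.

```r
library(tibble)

tb  <- tibble(sex = c("Male", "Female"), height = c(75, 64))
df  <- as.data.frame(tb)   # tibble -> plain data.frame
tb2 <- as_tibble(df)       # data.frame -> tibble

class(tb)   # "tbl_df" "tbl" "data.frame"

# Partial matching: the data.frame returns the height column,
# the tibble returns NULL with an "unknown column" warning
df$hei
tb$hei
```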
Data import with readr
The functions in the readr package (part of the tidyverse) import data as tibbles:
read_csv – comma-separated files
read_tsv – tab-separated files
read_delim – files with any delimiter, given by the delim argument
RMD_example 01-2.2
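The readr equivalents of the earlier base R example, as a sketch with made-up temporary files. The show_col_types = FALSE argument only silences the message that reports the guessed column types.

```r
library(readr)

csv_file <- tempfile(fileext = ".csv")
tsv_file <- tempfile(fileext = ".tsv")
writeLines(c("sex,height", "Male,75", "Female,64"), csv_file)
writeLines(c("sex\theight", "Male\t75", "Female\t64"), tsv_file)

tb_csv   <- read_csv(csv_file, show_col_types = FALSE)
tb_tsv   <- read_tsv(tsv_file, show_col_types = FALSE)
tb_delim <- read_delim(csv_file, delim = ",", show_col_types = FALSE)

class(tb_csv)   # includes "tbl_df": the result is a tibble
```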
Data manipulation with base functions
Extracting columns, a quick review of vectors, and coercion
RMD_example 01-2.3
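A compact base R sketch of the three topics, using a small invented data frame: three equivalent ways to extract a column as a vector, basic vector indexing, and what happens when values of different types are combined or converted.

```r
dat <- data.frame(sex = c("Male", "Female", "Female"),
                  height = c(75, 64, 66))

# Extracting a column: each of these returns the same vector
h1 <- dat$height
h2 <- dat[["height"]]
h3 <- dat[, "height"]

# Vectors: c() combines values; indexing starts at 1
x <- c(75, 64, 66)
x[2]        # 64
length(x)   # 3

# Coercion: a vector holds one type, so mixed input is coerced
y <- c(1, "a", TRUE)   # everything becomes character
class(y)               # "character"
as.numeric(c("1", "2", "three"))   # 1 2 NA, with a coercion warning
```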
Data manipulation with dplyr
Important dplyr functions to remember
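A sketch of the commonly used dplyr verbs chained with R's native pipe |> (R 4.1 or later). The height values are hypothetical, and this selection of verbs is an assumption about what such a list typically covers: filter() keeps rows, mutate() adds columns, select() keeps columns, arrange() sorts, and group_by() with summarize() computes per-group summaries.

```r
library(dplyr)

# Hypothetical heights in inches
dat <- data.frame(sex = c("Male", "Female", "Male", "Female", "Male"),
                  height = c(70, 64, 72, 66, 68))

tall <- dat |>
  filter(height > 65) |>                 # keep rows that match
  mutate(height_cm = height * 2.54) |>   # add a new column
  select(sex, height_cm) |>              # keep only these columns
  arrange(desc(height_cm))               # sort, tallest first

by_sex <- dat |>
  group_by(sex) |>
  summarize(avg = mean(height), n = n())   # one row per group
```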
More data transformation with dplyr
https://rstudio.github.io/cheatsheets/html/data-transformation.html