
Lect01 2

The document outlines the data analysis process, emphasizing the importance of data wrangling, which includes data import and transformation. It introduces the tidyverse, a collection of R packages designed for data science, and highlights key functions for data manipulation. Additionally, it provides examples of importing data into R and using dplyr for data manipulation tasks.

Uploaded by

bear.c
Copyright © All Rights Reserved

Lecture 1-2: Data wrangling

BTBI30081
Applied Statistical Methods (統計應用方法)

2025/2/19

Data analysis process
1. Data import (tidying data)
2. Data transformation (data manipulation)
3. Data visualization
4. Modeling

Each of these steps needs its own tools and software.

Bottlenecks in data analysis
- One of the most time-consuming aspects of the data analysis process is "data wrangling" (also called "data munging")
  - Importing, cleaning, and transforming messy data into a format that is useful for data visualization and modeling
  - Corresponds to the first two steps of the data analysis process

Package: tidyverse
- The tidyverse is a collection of R packages designed for data science
- The core tidyverse includes, among others, the following packages:
  - tibble: simple data frames
  - readr: read rectangular text data
  - dplyr: a grammar of data manipulation
  - tidyr: easily tidy data
  - ggplot2: grammar of graphics
  - purrr: functional programming tools
- Of these, tibble, readr, tidyr, and dplyr are the tidyverse packages used for data wrangling
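As a small illustration of what "tidy data" means in tidyr, the sketch below reshapes the `table4a` example dataset that ships with tidyr from a wide layout (one column per year) into a long, tidy layout (one row per country-year) using `pivot_longer()`. It assumes tidyr 1.0 or later, where `pivot_longer()` is available; the dataset and its column names come from tidyr, not from this lecture.

```r
library(tidyr)

# table4a (bundled with tidyr) is "wide": the years 1999 and 2000 are
# column names, and the cells hold case counts.
table4a

# Reshape to a tidy "long" form: one row per country-year observation.
tidy_cases <- pivot_longer(
  table4a,
  cols = c(`1999`, `2000`),  # columns whose names are really values
  names_to = "year",         # old column names go into a new "year" column
  values_to = "cases"        # old cell values go into a new "cases" column
)
tidy_cases
```

Each row of `tidy_cases` is one observation and each column one variable, which is the layout dplyr and ggplot2 expect.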

Data import
- The first step in data analysis is importing the data into the R environment

- There are several functions in base R (the utils package, which is attached by default) for reading data:
  - read.table: sep = "" (white space)
  - read.csv: sep = "," (comma)
  - read.delim: sep = "\t" (tab)
- These functions differ mainly in their default field separator character (sep); read.csv and read.delim also default to header = TRUE, whereas read.table defaults to header = FALSE.
- If the file name is not an absolute path, it is interpreted relative to the current working directory, which getwd() returns.
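A minimal sketch of the three base readers, assuming a small hypothetical two-column table written to temporary files so the example is self-contained. It shows that, once the separator and header settings match the file, all three return the same data frame.

```r
# Write the same hypothetical table three times, once per separator.
csv_file <- tempfile(fileext = ".csv")
tsv_file <- tempfile(fileext = ".txt")
txt_file <- tempfile(fileext = ".txt")
writeLines(c("id,score", "1,90", "2,85"), csv_file)      # comma-separated
writeLines(c("id\tscore", "1\t90", "2\t85"), tsv_file)   # tab-separated
writeLines(c("id score", "1 90", "2 85"), txt_file)      # space-separated

# read.csv and read.delim assume a header row by default.
d_csv   <- read.csv(csv_file)     # default sep = ","
d_delim <- read.delim(tsv_file)   # default sep = "\t"

# read.table defaults to header = FALSE, so request the header explicitly.
d_table <- read.table(txt_file, header = TRUE)  # default sep = "" (white space)

identical(d_csv, d_delim)
identical(d_csv, d_table)

# A relative file name would be resolved against this directory.
getwd()
```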

Example
- We took a poll of our students to obtain (self-reported) height and gender. Our task is to describe this list of heights.

Different ways to import data into R

- Option 1: Download the file with your browser to your working directory
- Option 2: Read it from within R
- Option 3: Download it from within R
- RMD_example 01-2.1
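Options 2 and 3 can be sketched as follows. This is not the course's RMD example: the real data URL is not given here, so the sketch writes a few rows of the built-in `mtcars` data to a temporary CSV and points a `file://` URL at it so it runs offline. With real data, `data_url` (a hypothetical name) would be an http(s) address; the `file://` form shown assumes a Unix-like path.

```r
# Stand-in for a file on the web (assumption: local file:// URL, Unix-like path).
local_csv <- tempfile(fileext = ".csv")
write.csv(mtcars[1:3, 1:3], local_csv, row.names = FALSE)
data_url <- paste0("file://", normalizePath(local_csv, winslash = "/"))

# Option 2: read directly from the URL; read.csv() accepts URLs as well as paths.
dat_read <- read.csv(data_url)

# Option 3: download a local copy first, then read the copy.
dest <- file.path(tempdir(), "downloaded.csv")
download.file(data_url, destfile = dest, quiet = TRUE)
dat_downloaded <- read.csv(dest)

dat_downloaded
```

Option 3 keeps a local copy, so later runs do not depend on the remote file staying available.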

Data types
- dat <- read.csv(filename)
- We make assignments in R with "<-"
- The output of read.csv is stored in an object named dat
- The data type (class) of dat is "data.frame", one of the most widely used data types in R
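A short sketch of the assignment above, assuming the built-in `mtcars` data is first written to a temporary CSV to stand in for `filename`. It then inspects the resulting object.

```r
# Stand-in for `filename`: a temporary CSV made from built-in data.
filename <- tempfile(fileext = ".csv")
write.csv(mtcars, filename, row.names = FALSE)

dat <- read.csv(filename)   # "<-" stores the result in the object `dat`

class(dat)   # "data.frame"
str(dat)     # column names, column types, and the first few values
dim(dat)     # number of rows and columns
```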

Tidy data type: tibble
- tibble (or tbl_df) is a modern reimagining of the data.frame, keeping what time has proven to be effective and throwing out what is not
- Tidyverse functions are designed to work with tibbles, and many of them (for example, readr's import functions) return tibbles; this is one of the unifying features of the tidyverse
- Creating a tibble: RMD_example 01-2.2
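A minimal sketch of two common ways to get a tibble: building one column by column with `tibble()` and converting an existing data frame with `as_tibble()`. The column names `x`, `y`, `label`, and `model` are hypothetical; `mtcars` is used only as convenient built-in data.

```r
library(tibble)

# Build a tibble column by column; later columns can use earlier ones.
tb <- tibble(
  x = 1:3,
  y = x * 2,
  label = c("a", "b", "c")
)
tb

# Convert a data.frame; keep its row names as a regular column.
cars_tb <- as_tibble(mtcars, rownames = "model")
class(cars_tb)   # "tbl_df" "tbl" "data.frame"
cars_tb          # prints only the first rows and the columns that fit
```

Because a tibble is still a `data.frame` underneath, base functions that expect a data frame continue to work on it.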

Data import with readr
- We can use the functions in the readr package (part of the tidyverse) to import data; they return tibbles:
  - read_csv
  - read_tsv
  - read_delim
- RMD_example 01-2.2
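A sketch of the three readr functions, assuming small hypothetical files written to temporary paths. `col_types = cols()` asks readr to guess every column type without printing the column-specification message; `read_delim()` takes the separator through its `delim` argument.

```r
library(readr)

# Hypothetical toy files with the same content and different separators.
csv_path  <- tempfile(fileext = ".csv")
tsv_path  <- tempfile(fileext = ".tsv")
semi_path <- tempfile(fileext = ".txt")
writeLines(c("id,score", "1,90", "2,85"), csv_path)
writeLines(c("id\tscore", "1\t90", "2\t85"), tsv_path)
writeLines(c("id;score", "1;90", "2;85"), semi_path)

d1 <- read_csv(csv_path, col_types = cols())                  # comma
d2 <- read_tsv(tsv_path, col_types = cols())                  # tab
d3 <- read_delim(semi_path, delim = ";", col_types = cols())  # any delimiter

class(d1)   # a tibble, unlike base read.csv()
d1
```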

Data manipulation with base functions
- Extracting columns, a quick review of vectors, coercion
- RMD_example 01-2.3
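A compact sketch of the three topics on this slide, using the built-in `mtcars` data frame as a stand-in for the lecture's data; the vector values are hypothetical.

```r
dat <- mtcars   # built-in data frame used as a stand-in

# Extracting a column: these three forms return the same numeric vector.
mpg1 <- dat$mpg
mpg2 <- dat[["mpg"]]
mpg3 <- dat[, "mpg"]

# Vectors: create with c(), index with [ ], apply vectorised functions.
v <- c(3, 1, 4, 1, 5)
v[2]          # second element
v[v > 2]      # logical indexing keeps elements where the condition is TRUE
length(v)
mean(v)

# Implicit coercion: a vector holds one type, so mixed values become character.
mixed <- c(1, "a", TRUE)
class(mixed)

# Explicit coercion with as.numeric(); unconvertible values become NA
# (R warns about this; the warning is silenced here).
nums <- suppressWarnings(as.numeric(c("1", "2", "three")))
nums
```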

Data manipulation with dplyr
- Important dplyr functions to remember:
  - select(): select columns
  - mutate(): create new columns
  - filter(): filter rows
  - arrange(): arrange (re-order) rows
  - group_by(): grouping operations
  - summarise(): summarise values
- RMD_example 01-2.4
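The six verbs can be chained with the pipe `%>%`. This sketch is not RMD_example 01-2.4; it uses the built-in `mtcars` data (where `wt` is weight in 1000 lb), and the new column names `wt_kg`, `n`, and `mean_mpg` are hypothetical.

```r
library(dplyr)

car_tbl <- as_tibble(mtcars, rownames = "model")

result <- car_tbl %>%
  select(model, mpg, cyl, wt) %>%   # keep four columns
  mutate(wt_kg = wt * 453.592) %>%  # 1000 lb = 453.592 kg
  filter(mpg > 20) %>%              # keep cars with mpg above 20
  arrange(desc(mpg))                # highest mpg first
result

by_cyl <- car_tbl %>%
  group_by(cyl) %>%                 # one group per cylinder count
  summarise(n = n(), mean_mpg = mean(mpg), .groups = "drop")
by_cyl
```

`group_by()` changes nothing visible on its own; it tells the next verb (here `summarise()`) to work group by group.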
Joining two data frames in dplyr
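The join diagrams from the original slides did not survive extraction, so here is a hedged sketch of dplyr's main join functions instead. It uses `band_members` and `band_instruments`, two small example tables bundled with dplyr that share a `name` key column.

```r
library(dplyr)

band_members       # name, band
band_instruments   # name, plays

# Mutating joins: add columns from the second table.
inner <- inner_join(band_members, band_instruments, by = "name")  # names in both
left  <- left_join(band_members, band_instruments, by = "name")   # all members; NA if no match
full  <- full_join(band_members, band_instruments, by = "name")   # all names from either

# Filtering join: keep members with no match in band_instruments.
anti  <- anti_join(band_members, band_instruments, by = "name")

inner
left
full
anti
```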

More data transformation with dplyr

- https://rstudio.github.io/cheatsheets/html/data-transformation.html
