Introduction To R2

The document is an introduction to using R for data science, covering package installation, loading libraries, and reading data from CSV and Excel files. It includes exercises for practical application and assignments to explore datasets and perform data manipulation. The learning objectives emphasize understanding package management and data handling in R.

Uploaded by

jeremybravo4229
Copyright
© All Rights Reserved

SDS3102: Introduction to Data Science

Introduction to R

Kalekye

2025-05-3

Contents
Package Installation from CRAN
Loading Libraries
Finding functions specific to a package
   Other way to install and load packages
Reading data to R
   CSV files
   Excel files
   Viewing data in R
Assignment
Learning Objectives
• Explain different ways to install external R packages
• Demonstrate how to load a library and how to find functions specific to a package
• Read data in Excel and CSV format into R

Package Installation from CRAN


CRAN (the Comprehensive R Archive Network) is a repository that hosts the latest release of R (and legacy
versions), in addition to the source code for thousands of user-contributed R packages.
• Packages for R can be installed from the CRAN package repository using the install.packages() function.
• Note that the package name must be in quotation marks here.
install.packages("packagename")   # general form: replace packagename with the package you want

install.packages("readxl")

• Some packages are bundled into larger collections; e.g., ggplot2 is part of the tidyverse. The tidyverse
groups related packages this way to make common data science operations more user-friendly.
Exercise: Install the package tidyverse
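Because install.packages() accepts a character vector, one call can install several packages. The sketch below, assuming you only want to install packages that are not already on your computer, compares a wish list against installed.packages(); the names wanted, installed and to_install are illustrative, not part of any package. The actual install line is commented out because it needs an internet connection.

```r
# Packages we want (names are quoted strings, as install.packages() requires)
wanted <- c("readxl", "tidyverse")

# Names of all packages already installed in this R library
installed <- rownames(installed.packages())

# Keep only the packages that are not installed yet
to_install <- setdiff(wanted, installed)
print(to_install)

# install.packages() accepts a vector, so one call installs them all.
# Uncomment to run (needs an internet connection):
# if (length(to_install) > 0) install.packages(to_install)
```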

Loading Libraries
• Once a package is installed, you can load it into your R session with library().

• Any of the functions that are specific to that package will be available for you to use by simply calling
the function as you would for any of the base functions.
• Note that quotation marks are not required here.
library(packagename)   # general form: replace packagename with the package you want

library(readxl)

• We only need to install a package once on our computer.


• However, to use the package, we need to load the library every time we start a new R/RStudio
environment.
You can think of this as installing a light bulb (done once) versus turning on the light (done every session).
Exercise: Load the package tidyverse
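To see what loading does without installing anything new, this sketch uses tools, a package that ships with R but, in a default session, is not attached at startup; it stands in for readxl or tidyverse here. search() lists the packages currently attached, and the pkg::fun form calls one function from an installed package without calling library() first (the same syntax as pacman::p_load()).

```r
# Before loading: is tools on the search path (the list of attached packages)?
"package:tools" %in% search()

# Load (attach) the package for this session; no quotation marks needed
library(tools)

# Now it is attached...
"package:tools" %in% search()

# ...and its functions can be called directly, without the tools:: prefix
file_ext("data_file_name.csv")   # returns "csv"

# Alternatively, call a single function without library(), using pkg::fun
tools::file_path_sans_ext("data_file_name.csv")   # returns "data_file_name"
```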

Finding functions specific to a package


• Suppose this is your first time using ggplot2: how do you know where to start and which functions are
available to you?
• One way is to use the Packages tab in RStudio.
• Clicking the tab lists all the packages you have installed.
Scroll down the Packages tab to ggplot2 in your list; clicking its name opens its help index, which lists the
package's functions.

• Packages that are currently loaded have a blue checkmark in the box next to their name.
• You can also search Google for a package by describing the task you want to do, e.g.,
what package can I use to plot in R?
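The same information is available from code. This sketch uses tools again as a stand-in because it ships with R; for ggplot2 you would run ls("package:ggplot2") after library(ggplot2). ls() on an attached package lists its exported objects in alphabetical order, and help(package = ...) and ?? are base R help functions (shown as comments because they open the Help pane).

```r
# Load the package, then list everything it exports
library(tools)
tools_funs <- ls("package:tools")
length(tools_funs)     # how many objects the package exports
head(tools_funs, 10)   # the first ten names, in alphabetical order

# Search the names for a keyword, e.g. functions dealing with files
grep("file", tools_funs, value = TRUE)

# Interactive help (opens in RStudio's Help pane):
# help(package = "ggplot2")   # index of all documented functions in ggplot2
# ??"plot"                     # search all installed help pages for "plot"
```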

Other way to install and load packages


• pacman::p_load() installs any of the listed packages that are missing and then loads them, all in one step
pacman::p_load()

Example:
pacman::p_load(
ggplot2,
readxl
)

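• Make sure the pacman package is installed first, using install.packages("pacman"). Because the call uses the
pacman:: prefix, you do not need to load pacman with library().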
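To show the idea behind p_load(), here is a rough base-R approximation. load_packages() is a hypothetical helper written for this handout, not part of pacman: it installs whatever is missing, then loads each package with library(..., character.only = TRUE), which lets library() take a package name stored in a variable. The usage line uses packages that ship with R, so nothing is installed.

```r
# Hypothetical helper (not from pacman): roughly what p_load() does, in base R
load_packages <- function(pkgs) {
  # Install any packages that are missing (needs an internet connection)
  to_install <- setdiff(pkgs, rownames(installed.packages()))
  if (length(to_install) > 0) install.packages(to_install)

  # Load each one; character.only = TRUE lets library() take a name in a variable
  for (p in pkgs) library(p, character.only = TRUE)

  # Report which of the requested packages are now attached
  invisible(pkgs %in% .packages())
}

# Usage with packages that ship with R, so nothing needs to be installed
status <- load_packages(c("tools", "utils"))
status   # [1] TRUE TRUE
```

For real work, pacman::p_load() or plain library() calls are simpler; the sketch only makes the install-then-load steps visible.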

Reading data to R
CSV files
• Base R can read CSV files on its own
• No need to install any package for this
• Import a CSV file named “data_file_name.csv” into R
d1 <- read.csv("data_file_name.csv")
d1 <- read.csv("path/to/data_file_name.csv")   # a full or relative file path also works

• The object d1 is stored as a data frame.
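If you do not have a CSV file at hand, this sketch creates one: it writes a small made-up data frame (the values are illustrative only) to a temporary file with write.csv(), then reads it back with read.csv(). tempfile() is used so the example leaves nothing in your working directory.

```r
# Made-up example data, for illustration only
example <- data.frame(
  region = c("North", "South", "East"),
  facilities = c(12L, 7L, 9L)
)

# Write it to a temporary CSV file so the example runs anywhere
csv_path <- tempfile(fileext = ".csv")
write.csv(example, csv_path, row.names = FALSE)

# Read it back with base R; no extra package is needed
d1 <- read.csv(csv_path)
class(d1)   # "data.frame"
str(d1)
```

row.names = FALSE stops write.csv() from adding an unnamed column of row numbers to the file.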

Excel files
• Base R cannot read Excel files, so we need an extra package
• Several packages can read Excel files into R, e.g., openxlsx, readxl (more robust) and
xlsx.
• Start by loading the package you need
• Here we use readxl
library(readxl)

• Import an Excel file named “data_file_name.xlsx” into R


d2 <- read_excel("data_file_name.xlsx")
d2 <- read_excel("path/to/data_file_name.xlsx", sheet = "Sheet2")   # pick a sheet by name or number if there are several
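To try readxl without a file of your own: readxl bundles small example workbooks, and readxl_example() returns the path to one. excel_sheets() lists the sheet names, and the sheet argument of read_excel() accepts either a name or a position. This sketch assumes readxl is already installed.

```r
library(readxl)

# readxl ships with example workbooks; readxl_example() returns the path to one
path <- readxl_example("datasets.xlsx")

# List the sheet names before choosing one
excel_sheets(path)

# Read a sheet by name...
d2 <- read_excel(path, sheet = "mtcars")
# ...or by position (2 = the second sheet in the workbook)
d2_second <- read_excel(path, sheet = 2)

class(d2)   # a tibble, which is also a data frame
```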

Exercise: Create a sample Excel/CSV file or use any existing file on your computer and try loading it into
R

Viewing data in R
• If the dataset is small, type the object name in your script or console to print it
d2

• Use head() to look at the first few rows of data (six by default)
head(d2)

• Use tail() to look at the last few rows of data (six by default)
tail(d2)

• In RStudio, click d2 in the Environment pane; RStudio opens a new tab in the script-editor pane that
displays the object.
• Use str() to view the structure of the data
str(d2)

• Use View() to open the data in a spreadsheet-style viewer


View(d2)
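The viewing functions above can be tried on mtcars, a data set built into R, which stands in for d2 here. The sketch also shows the n argument of head() and tail(), plus dim() and summary(), which report the size of the data and basic statistics for each numeric column. View() is left as a comment because it only works in an interactive session.

```r
# Built-in example data set, standing in for d2
d2 <- mtcars

head(d2)          # first 6 rows
tail(d2)          # last 6 rows
head(d2, n = 3)   # n changes how many rows are shown
dim(d2)           # number of rows and columns
str(d2)           # each column's name, type and first values
summary(d2)       # min, quartiles, mean and max of each numeric column

# View(d2)        # spreadsheet-style viewer (interactive sessions only)
```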

Assignment
1. Load the required packages: use either library() or pacman::p_load() to load the following:
• dplyr (for data manipulation)
2. Download the data from the links below and load it into R:
https://data.cdc.gov/api/views/qvzb-qs6p/rows.csv?accessType=DOWNLOAD

https://data.humdata.org/dataset/7b9c2851-dc37-4a88-9dcb-62e55eb91baf/resource/df6bfc55-3b25-4309-a1b4-74afba434956/download/kenya-health-facilities-2017_08_02.xlsx

3. Explore the Data


• Display the first 6 rows of each dataset

• Display the last 6 rows of each dataset
• Check the structure of both datasets
• Summarize the numerical variables
4. Merge the two datasets by region (hint: use left_join()), then save the result as a new dataset (hint: use write_csv()).
5. Count the number of health facilities per region
6. Detect missing data in both datasets
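As a starting point for steps 4 to 6, this sketch applies the same operations to two tiny made-up data frames (hypothetical facilities and region_data tables, not the assignment data) using base R only: merge() with all.x = TRUE keeps every row of the first table, like dplyr's left_join(); table() counts rows per group; and is.na() flags missing values. The column name region is an assumption; check the real column names with str() first.

```r
# Two small made-up tables, for illustration only
facilities <- data.frame(
  name   = c("F1", "F2", "F3", "F4"),
  region = c("North", "North", "South", NA)
)
region_data <- data.frame(
  region = c("North", "South", "East"),
  cases  = c(10, NA, 4)
)

# Step 4: left join (keep every row of facilities), like dplyr::left_join()
merged <- merge(facilities, region_data, by = "region", all.x = TRUE)

# Save the merged data as a new CSV file (readr::write_csv() does the same job)
out_path <- tempfile(fileext = ".csv")
write.csv(merged, out_path, row.names = FALSE)

# Step 5: number of facilities per region (useNA also counts missing regions)
counts <- table(facilities$region, useNA = "ifany")
counts

# Step 6: number of missing values in each column
missing_fac <- colSums(is.na(facilities))
missing_reg <- colSums(is.na(region_data))
missing_fac
missing_reg
```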
Happy Week!
HAPPY WEEKEND
