Tutorial 1 - Answers.

Uploaded by

bhattibaba118
Copyright © All Rights Reserved

ICT583

3 March 2023

ICT583 Data Science Applications


Tutorial 1

1. Introduction to RStudio
https://education.rstudio.com/learn/beginner/
https://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf

RStudio Layout

(Source: https://datacarpentry.org/genomics-r-intro/00-introduction/index.html)


Task 1.1
Create your first R script file, plot a histogram of a built-in dataset, then save and run the code.

# list the names of the built-in datasets
data()

# Load the mtcars dataset
data(mtcars)
# Print the first 6 rows
head(mtcars, 6)

# show the help page describing the dataset
?mtcars

# you can also pass the dataset name to data() as a character string
data("mtcars")
# Print the first few rows (six by default)
head(mtcars)
# Number of rows (observations)
nrow(mtcars)
# Number of columns (variables)
ncol(mtcars)
# Show the structure: the type and first values of each variable
str(mtcars)
#> try another dataset
data("iris")
head(iris)
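Besides head(), nrow() and ncol(), a few other base R functions give a quick overview of a dataset. A minimal sketch using the same built-in mtcars and iris datasets (the object names dims_mtcars and dims_iris are just illustrative):

```r
# Load two built-in datasets
data("mtcars")
data("iris")

# dim() returns the number of rows and columns in one call
dims_mtcars <- dim(mtcars)
dims_iris <- dim(iris)
print(dims_mtcars)  # 32 11
print(dims_iris)    # 150 5

# summary() gives min, quartiles, mean and max for numeric columns,
# and the count of each level for a factor column such as iris$Species
summary(iris)

# tail() shows the last rows (six by default), the counterpart of head()
tail(mtcars)
```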


#> try using the iris data to generate a histogram

# load the data
data(iris)

# store sepal length as object i
i = iris$Sepal.Length

# pass i as the argument of hist(), and store the result as object h
h <- hist(i)
# print the values computed for the plot (breaks, counts, density, mids)
h
# you can also pass the sepal length directly to hist()
hist(iris$Sepal.Length)
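The object h returned by hist() is a list, so its parts can be read directly with $. A short sketch in base R only; setting plot = FALSE computes the bins without drawing them:

```r
# Sepal length from the built-in iris data
i <- iris$Sepal.Length

# plot = FALSE computes the bins without drawing the histogram
h <- hist(i, plot = FALSE)

# The result is a list; these are its main components
h$breaks  # bin edges
h$counts  # number of observations in each bin
h$mids    # bin midpoints

# There is one count per bin, i.e. one fewer than the number of edges
length(h$counts) == length(h$breaks) - 1

# Every observation falls into exactly one bin, so the counts add up to 150
sum(h$counts)
```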

# you can specify other arguments of hist()
hist(i, main = "my iris", xlab = "iris sepal length",
     xlim = c(3, 9), ylim = c(0, 35), col = "blue", freq = TRUE)
# if you are unsure about an R function, type ?hist in the console, or check
# https://www.rdocumentation.org/ or https://rdrr.io/
# https://www.rdocumentation.org/packages/graphics/versions/3.6.2/topics/hist
# useful sites
# https://www.tutorialspoint.com/r/r_histograms.htm
# http://www.sthda.com/english/wiki/ggplot2-histogram-plot-quick-start-guide-r-software-and-data-visualization
# https://www.datacamp.com/tutorial/make-histogram-basic-r
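Another hist() argument worth trying is breaks, which controls the binning. A minimal sketch that passes an explicit vector of bin edges; the 0.25 bin width and the colour are arbitrary choices for illustration:

```r
i <- iris$Sepal.Length

# Explicit bin edges from 4 to 8 in steps of 0.25;
# they must cover the whole range of the data (4.3 to 7.9 here)
edges <- seq(4, 8, by = 0.25)

# Same style as before, but with narrower bins
h2 <- hist(i, breaks = edges, main = "my iris (narrower bins)",
           xlab = "iris sepal length", col = "lightblue")

# 17 edges define 16 bins
length(h2$counts)
```

If the edges do not span the data, hist() stops with an error, which is why the range is checked first.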


Task 1.2: R Markdown
Alternatively, you can try creating an R Markdown file.
https://rmarkdown.rstudio.com/articles_intro.html

It does essentially the same thing, but it also generates a report that embeds your code together with text and outputs, as HTML or another file type (such as PDF or Word) for reporting.

#>
---
title: "tut1.2"
output: html_document
date: "2023-02-27"
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

## R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML,
PDF, and MS Word documents. For more details on using R Markdown see
<http://rmarkdown.rstudio.com>.

When you click the **Knit** button a document will be generated that includes both the content
and the output of any embedded R code chunks within the document. You can embed an
R code chunk like this:


```{r cars}
summary(cars)
str(cars)
```

## Including Plots

You can also embed plots, for example:

```{r pressure, echo=TRUE}
plot(pressure)
str(pressure)
```

Note that this code chunk sets `echo = TRUE`, so the R code that generated the plot is printed in the report;
setting `echo = FALSE` instead would hide the code and show only its output.
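The Knit button can also be reproduced from the R console with rmarkdown::render(). A minimal sketch, assuming the rmarkdown package and Pandoc are installed (RStudio bundles Pandoc); the file name tut1_2.Rmd and the use of a temporary folder are illustrative choices, not part of the tutorial:

```r
# Three backticks, built with strrep() so the chunk fence can be written as text
fence <- strrep("`", 3)

# A tiny R Markdown document: YAML header, one line of text, one R chunk
rmd_lines <- c(
  "---",
  "title: \"tut1.2\"",
  "output: html_document",
  "---",
  "",
  "Summary of the built-in cars data:",
  "",
  paste0(fence, "{r cars}"),
  "summary(cars)",
  fence
)

# Write it to a temporary folder (hypothetical file name)
rmd_file <- file.path(tempdir(), "tut1_2.Rmd")
writeLines(rmd_lines, rmd_file)

# Render to HTML, as the Knit button would; skipped if rmarkdown or Pandoc is missing
out <- NA
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  out <- rmarkdown::render(rmd_file, output_format = "html_document", quiet = TRUE)
  print(out)  # path to the generated .html file
}
```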

2. What data science applications can R support? Explore the R Shiny gallery!



Try the following apps

# Radiant
# https://shiny.rstudio.com/gallery/radiant.html
# https://github.com/radiant-rstats/radiant
# the diamonds data comes with the ggplot2 package, which is part of the tidyverse
# install tidyverse (only needs to be done once)
install.packages("tidyverse")
# load tidyverse (this also attaches ggplot2)
library(tidyverse)
# read basic info about diamonds
str(diamonds)
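Once the tidyverse is loaded, diamonds can be explored with the same base R functions used in Task 1.1. A sketch, assuming ggplot2 (installed as part of the tidyverse) is available; it counts diamonds per cut grade and compares their average prices:

```r
# diamonds ships with ggplot2, which library(tidyverse) also attaches
library(ggplot2)

# Number of rows and columns
dim(diamonds)

# How many diamonds there are of each cut grade (cut is an ordered factor)
cut_counts <- table(diamonds$cut)
cut_counts

# Mean price in US dollars for each cut grade
mean_price <- tapply(diamonds$price, diamonds$cut, mean)
round(mean_price, 1)

# Base R bar chart of the counts
barplot(cut_counts, main = "Diamonds per cut", ylab = "count")
```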

# health spending and life expectancy
https://shiny.rstudio.com/gallery/google-charts.html
https://github.com/rstudio/shiny-examples/tree/main/182-google-charts
https://databank.worldbank.org/

Discussion:
Can you describe their application task?
- What are the aims of the project?
What data were presented?
- What are the variables, R functions and results?
Can you summarize any new insights after observing the generated results?
- Did the original authors achieve their goals? How accurate were their results?

3. Where to find publicly available datasets for analysis: explore Kaggle and UCI!


Visit the following websites, which host two of the most popular data repositories:
https://archive.ics.uci.edu/ml/datasets.php
https://www.kaggle.com/

# Choose the dataset you are most interested in.
# After downloading it to your drive, you can read it with an R function, e.g., read.csv()
# the first argument of read.csv() should be the path to your file

my_data = read.csv("D:/Users/SK/Downloads/abalone.data", header = FALSE)

# my_data is the object name,
# "D:/Users/SK/Downloads/abalone.data" is the file path,
# header = FALSE (header is the second argument of read.csv()) tells R that the first line
# of the file holds data rather than column names, so the columns are named V1, V2, ...

# read more about read.csv() and related functions
# https://www.rdocumentation.org/packages/utils/versions/3.6.2/topics/read.table
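The effect of header = FALSE can be checked without downloading anything. A minimal sketch: it writes two made-up lines in the same comma-separated, header-less layout as abalone.data to a temporary file and reads them back. The col.names values are an assumption based on the attribute list in the UCI abalone description (with underscores instead of spaces); check them against the dataset page before relying on them.

```r
# Two made-up rows in the comma-separated, header-less layout of abalone.data
# (the values are illustrative, not taken from the real dataset)
toy_lines <- c("M,0.45,0.36,0.09,0.51,0.22,0.10,0.15,15",
               "F,0.53,0.42,0.13,0.67,0.26,0.14,0.21,9")
toy_file <- tempfile(fileext = ".data")
writeLines(toy_lines, toy_file)

# With header = FALSE, read.csv() keeps both lines as data and names the columns V1 to V9
no_names <- read.csv(toy_file, header = FALSE)
names(no_names)

# col.names supplies meaningful names instead (assumed from the UCI description)
abalone_cols <- c("Sex", "Length", "Diameter", "Height", "Whole_weight",
                  "Shucked_weight", "Viscera_weight", "Shell_weight", "Rings")
named <- read.csv(toy_file, header = FALSE, col.names = abalone_cols)
str(named)
```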

# Inspect the first and last few rows of the data frame.
head(my_data)
tail(my_data)

# Understand the dataset variables and their data types.
str(my_data)
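str() can be complemented with a few other checks that are useful on any newly loaded data frame. A sketch that uses the built-in iris data as a stand-in for my_data, so it runs without a downloaded file; replace iris with my_data to apply it to your own dataset:

```r
# Stand-in for the data frame read from your own file; replace iris with my_data
df <- iris

# Number of rows and columns
dim(df)

# Class of each column (numeric, integer, character, factor, ...)
col_types <- sapply(df, class)
col_types

# Number of missing values (NA) in each column
missing_per_col <- colSums(is.na(df))
missing_per_col

# Quartiles, mean, min and max for numeric columns; level counts for factors
summary(df)
```

If a column has more than one class (for example an ordered factor), sapply() returns a list instead of a named character vector, so the printed output may look different for other datasets.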
