Tutorial 2 - Histogram

1. The document discusses making histograms in R by analyzing lung capacity data. It shows how to import data, generate a histogram with default settings, and customize histograms by changing axes, bins, titles, and labels, and by adding density curves.
2. Key steps include importing data, generating a histogram with the hist() function, and customizing aspects like changing from frequencies to probability density, setting axis limits, varying the number of bins, and adding titles/labels.
3. Advanced customization allows rotating y-axis labels and overlaying density curves estimated from the data using the lines() and density() functions.

Uploaded by

Anwar Zainuddin

Big Data

Big-Data-Analytics-with-R-and-Hadoop

Trainer: Ts. Dr. Ahmad Anwar Zainuddin

Performing data modelling in R

Data modelling is a machine learning technique that identifies hidden patterns in a historical
dataset; these patterns then help predict future values for the same kind of data. The technique
focuses heavily on past user actions and learns users' tastes. Many popular organizations have
adopted data modelling techniques to understand the behaviour of their customers based
on their past transactions. The techniques analyse the data and predict what customers
are looking for. Amazon, Google, Facebook, eBay, LinkedIn, Twitter, and many other organizations
use data mining to change how their applications work.

Tutorial 2: Making a Histogram in R: www.youtube.com/watch?v=Hj1pgap4UOY

Objective: A histogram is a visual representation of the distribution of a dataset. Its shape
is its most evident and informative characteristic: it lets you see at a glance where a
relatively large amount of the data is situated and where there is very little. In other
words, you can see where the middle of the distribution lies, how closely the data cluster around
that middle, and where possible outliers are to be found. This makes histograms a great way to
get to know your data.
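To make the idea of a distribution concrete, the short sketch below counts how many values fall into each bin. It uses simulated numbers as a stand-in (an assumption for illustration, not the LungCapData values) and asks hist() for its bin table instead of a picture.

```r
# Simulated stand-in data (NOT the LungCapData values): 200 draws from a normal distribution.
set.seed(1)
x <- rnorm(200, mean = 8, sd = 2.5)

# plot = FALSE returns the histogram object without drawing anything.
h <- hist(x, plot = FALSE)

# Each bin is the interval between two neighbouring break points.
bins <- data.frame(from  = head(h$breaks, -1),
                   to    = tail(h$breaks, -1),
                   count = h$counts)
print(bins)
```

Bins with large counts show where most of the data sit; bins with small counts at either end are where possible outliers would appear.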

Method:
1. Before you start coding, install the required packages: go to Package > Install, as
shown in Figure 1.

Figure 1 : Install Packages

2. Go to the Packages section, type the following package names, and install them as shown in
Figure 2. Package names are case-sensitive.
i. ggplot2
ii. plyr
iii. shiny
iv. Rpubs
v. devtools

Figure 2: ggplot2 Installing Package
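The same installation can also be done from the console. The sketch below is one possible way, not part of the original tutorial: missing_packages() is a hypothetical helper defined here, and "Rpubs" is left out because its exact package name is not confirmed by the tutorial.

```r
# Packages from step 2 under their case-sensitive names.
# "Rpubs" is omitted because its exact package name is not confirmed.
wanted <- c("ggplot2", "plyr", "shiny", "devtools")

# Hypothetical helper: return the subset of `pkgs` that is not installed yet.
missing_packages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

to_install <- missing_packages(wanted)
print(to_install)

# Uncomment to install whatever is missing (needs an internet connection).
# if (length(to_install) > 0) install.packages(to_install)
```

Checking first avoids reinstalling packages that are already present.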

3. Import the dataset from this link:

http://www.mediafire.com/file/nayf5x3fz208wm8/BigData-with-R-LungCapData.zip/file

Go to File > Import Dataset > From Excel as shown in Figure 3. The dataset of
“LungCapData” is imported and displayed on the screen as shown in Figure 4.

Figure 3: Import Dataset


Figure 4: The LungCapData is imported

4. Type the following code:

> LungCapData <- read.table(file.choose(), header=T, sep="\t")

Warning message:
In read.table(file.choose(), header = T, sep = "\t") :
  incomplete final line found by readTableHeader on 'D:\Big Data Practices\SLIDES\Big Data with SQL\R\Dataset\LungCapData\LungCapData.xls'

This warning appears because read.table() expects a plain-text, tab-delimited file, whereas
LungCapData.xls is an Excel workbook. If it appears, import the dataset again as shown in
step 3, which issues commands like these:

> library(readxl)
> LungCapData <- read_excel("D:/Big Data Practices/SLIDES/Big Data with SQL/R/Dataset/LungCapData/LungCapData.xls")
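For comparison, here is a minimal sketch of the kind of file read.table() with sep="\t" is meant for. The three-row table is made up for illustration, and the file is written to a temporary path so the example is self-contained.

```r
# A tiny made-up table, written out as tab-delimited text.
demo <- data.frame(LungCap = c(6.5, 10.1, 9.6),
                   Age     = c(6, 18, 16),
                   Smoke   = c("no", "yes", "no"))
path <- tempfile(fileext = ".txt")
write.table(demo, path, sep = "\t", row.names = FALSE, quote = FALSE)

# header = TRUE: the first line holds the column names.
# sep = "\t": the columns are separated by tab characters.
imported <- read.table(path, header = TRUE, sep = "\t")
str(imported)
```

A plain-text file like this reads cleanly, which is why the Excel workbook needs read_excel() instead.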

# To view the dataset, attach it, and list its column names

> View(LungCapData)
> attach(LungCapData)
> names(LungCapData)
[1] "LungCap" "Age" "Height" "Smoke" "Gender" "Caesarean"

> head(LungCapData)

# A tibble: 6 x 6
  LungCap   Age Height Smoke Gender Caesarean
    <dbl> <dbl>  <dbl> <chr> <chr>  <chr>
1    6.48     6   62.1 no    male   no
2   10.1     18   74.7 yes   female no
3    9.55    16   69.7 no    female yes
4   11.1     14   71   no    male   no
5    4.8      5   56.9 no    male   no
6    6.22    11   58.7 no    female no
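The tutorial uses attach() so that columns such as LungCap can be typed without the data frame name. As a hedged aside, the sketch below shows two common alternatives that reach the same column without attaching anything; the small data frame is made up for illustration.

```r
# Made-up stand-in for a data frame with LungCap and Age columns.
demo <- data.frame(LungCap = c(6.5, 10.1, 9.6, 11.1),
                   Age     = c(6, 18, 16, 14))

# Option 1: name the data frame explicitly with `$`.
m1 <- mean(demo$LungCap)

# Option 2: with() evaluates an expression using the columns of `demo`.
m2 <- with(demo, mean(LungCap))

print(c(m1, m2))
```

Both forms make it explicit which dataset a column comes from, which helps when several datasets are loaded at once.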
# To get help, put the name of the command inside the brackets of help()
> help(hist)
# Or simply put a question mark (?) in front of the command.
> ?hist

# To produce a histogram of Lung Capacity

> hist(LungCap)
# Notice that by default R reports "frequencies" on the y-axis, adds a default "title", and
# chooses the "bin width" itself.
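The defaults mentioned above can be inspected directly. This sketch uses simulated stand-in data (an assumption, not the LungCapData values) and relies on base R's documented default of Sturges' rule for suggesting the number of bins.

```r
# Simulated stand-in data, not the LungCapData values.
set.seed(1)
x <- rnorm(200, mean = 8, sd = 2.5)

h <- hist(x, plot = FALSE)

# The default title is built from the name of the variable passed in.
default_title <- paste("Histogram of", h$xname)
print(default_title)

# Sturges' rule suggests the number of bins from the sample size alone.
suggested <- nclass.Sturges(x)
print(suggested)

# The break points R actually chose, rounded to "pretty" values.
print(h$breaks)
```

Because the break points are rounded to convenient values, the number of bins drawn need not equal the suggested number.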

# Now we change the default values:

# 1. Change the y-axis to show a "probability density" rather than "frequencies". To do
# so, use the "freq" argument and set it to FALSE.
> hist(LungCap, freq=FALSE)
# or abbreviate FALSE to a capital F.
> hist(LungCap, freq=F)

# Alternatively, use the "prob" argument (short for "probability") and set it to TRUE, which
# can likewise be abbreviated to a capital T.
> hist(LungCap, prob=TRUE)
> hist(LungCap, prob=T)
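The relationship between the two y-axis scales can be checked numerically. The sketch below, on simulated stand-in data, recomputes the density of each bar as count / (n * bin width) and confirms that the bars together cover an area of 1.

```r
# Simulated stand-in data, not the LungCapData values.
set.seed(1)
x <- rnorm(200, mean = 8, sd = 2.5)

h <- hist(x, plot = FALSE)
widths <- diff(h$breaks)

# Density of a bar = its count divided by (total count * bin width).
manual_density <- h$counts / (sum(h$counts) * widths)
print(all.equal(manual_density, h$density))

# Total area under the bars on the density scale.
total_area <- sum(h$density * widths)
print(total_area)
```

This is why a density curve can be drawn over a histogram plotted with freq=FALSE: both are on a scale where the total area is 1.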

# We can change the x and y limits using the "xlim" and "ylim" arguments. Here, set the y
# limits to run from 0 up to 0.2.

> hist(LungCap, prob=T, ylim=c(0, 0.2))
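The upper limit of 0.2 was chosen by eye. One possible way to pick a limit from the data, sketched here on simulated stand-in values, is to take the tallest bar and round up to the next multiple of 0.05; the rounding step is an arbitrary choice for illustration.

```r
# Simulated stand-in data, not the LungCapData values.
set.seed(1)
x <- rnorm(200, mean = 8, sd = 2.5)

h <- hist(x, plot = FALSE)
tallest <- max(h$density)

# Round the tallest bar up to the next multiple of 0.05 (arbitrary step).
y_top <- ceiling(tallest * 20) / 20
print(c(0, y_top))

# In an interactive session the limits would be used like this:
# hist(x, freq = FALSE, ylim = c(0, y_top))
```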


# Next, change the bin width using the "breaks" argument of the
# hist() command.
> hist(LungCap, prob=T, ylim=c(0, 0.2), breaks=7)
# A single number asks for about 7 bins. R treats it as a suggestion and rounds the
# break points to "pretty" values, so the number of bins drawn may differ.

> hist(LungCap, prob=T, ylim=c(0, 0.2), breaks=14)

# Asks for about 14 bins.

> hist(LungCap, prob=T, ylim=c(0, 0.2), breaks=24)

# Asks for about 24 bins.
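To see how a single number passed to "breaks" behaves, the sketch below compares the number requested with the number of bins hist() actually produces. It runs on simulated stand-in data, so the counts will not match those for LungCap.

```r
# Simulated stand-in data, not the LungCapData values.
set.seed(1)
x <- rnorm(200, mean = 8, sd = 2.5)

requested <- c(7, 14, 24)

# Count the bins hist() produces for each requested number.
produced <- sapply(requested, function(k) {
  length(hist(x, breaks = k, plot = FALSE)$counts)
})

print(data.frame(requested = requested, produced = produced))
```

If an exact set of bins is needed, supplying the break points as a vector is the reliable route.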

# We can also specify every break point individually.

> hist(LungCap, prob=T, ylim=c(0, 0.2), breaks=c(0,2,4,6,8,10,12,14,16))

# Or generate the same break points with the seq() command.
> hist(LungCap, prob=T, ylim=c(0, 0.2), breaks=seq(from=0, to=16, by=2))
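The two ways of writing the break points are interchangeable, and an explicit vector fixes the number of bins exactly. The sketch below checks both points on simulated stand-in data and also shows that the breaks must span the whole range of the data.

```r
manual    <- c(0, 2, 4, 6, 8, 10, 12, 14, 16)
generated <- seq(from = 0, to = 16, by = 2)
print(all.equal(manual, generated))

# Simulated stand-in data lying inside the 0 to 16 range.
set.seed(1)
x <- runif(100, min = 0.5, max = 15)

# Nine break points give exactly eight bins.
h <- hist(x, breaks = generated, plot = FALSE)
print(length(h$counts))

# A value outside the breaks (here 20) makes hist() stop with an error.
result <- tryCatch(
  hist(c(x, 20), breaks = generated, plot = FALSE),
  error = function(e) "error"
)
print(result)
```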

# Now change the title using the "main" argument, and label the x-axis and
# y-axis using "xlab" and "ylab".
> hist(LungCap, prob=T, ylim=c(0, 0.2), breaks=seq(from=0, to=16, by=2), main="Histogram of Lung Capacity", xlab="Lung Capacity")

# Next, rotate the values on the y-axis so they read horizontally by setting the "las" argument to 1.
> hist(LungCap, prob=T, ylim=c(0, 0.2), breaks=seq(from=0, to=16, by=2), main="Histogram of Lung Capacity", xlab="Lung Capacity", las=1)

# Finally, add a "density curve" on top of this plot using the "lines"
# command.
> lines(density(LungCap))

# We can also change the colour of the line with col="red" and its thickness with lwd=3.
> lines(density(LungCap), col="red", lwd=3)
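Putting the steps together, the sketch below draws a density-scale histogram with a title, axis label, horizontal y-axis values, and a red density curve. It uses simulated stand-in data and a null graphics device so it can run without opening a window; in RStudio the pdf() and dev.off() lines can be dropped.

```r
# Simulated stand-in data, not the LungCapData values.
set.seed(1)
x <- rnorm(200, mean = 8, sd = 2.5)

pdf(file = NULL)  # null device: nothing is written to disk
hist(x, freq = FALSE, main = "Histogram of simulated data",
     xlab = "Value", las = 1)

# density() returns a kernel density estimate; lines() adds it to the open plot.
d <- density(x)
lines(d, col = "red", lwd = 3)
invisible(dev.off())

# Approximate the area under the curve with the trapezoid rule.
area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)
print(area)
```

The area under the estimated curve is close to 1, matching the density scale of the histogram, which is why the curve only lines up with the bars when freq=FALSE (or prob=TRUE) is used.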
