Exploratory Data Analysis

This document walks through an exploratory data analysis of a cancer dataset using a set of R helper functions for correlation analysis. Several subsets are extracted from the original cancer data and analyzed with functions for pairwise correlation, scatter plots, normality testing, and simulating raw data from given correlation coefficients. Correlations between the variables in each subset are computed as Pearson and Spearman coefficients. The normality of each variable is also assessed using Q-Q plots and Shapiro-Wilk tests.

Uploaded by

Cyd Duque

# Mindanao State University

# General Santos City

# Exploratory Data Analysis

# Prepared by: Prof. Carlito O. Daarol
# Math Department
# March 16, 2023

# -------------------------------------------------------------------
# Activate the file containing all functions
# You should modify the file location because it refers to my laptop
# -------------------------------------------------------------------

(drive <- "D:/")

folder_functions <- "Research/thesisfunctions"
filename <- "fn_More_Correlations.R"
source(paste0(drive, folder_functions,"/",filename))

# --------------------------------------------------------
# Using function 1: Read the dataset using function call
# --------------------------------------------------------

# Set pointer to the location of my data (do not use the setwd command for data
# retrieval)

folder_data <- "C:/Users/Admin/Desktop/Class Lectures/BLecture 0 Graphics in R"

filename <- "Cancer.csv"
data <- readcsv(folder_data,filename)

# Check contents
dim(data)
colnames(data)
head(data)

# ------------------------------------------
# Using an R package to display a large dataset
# Output is visible only if the output format is HTML
# ------------------------------------------
library(DT)
datatable(data)

# --------------------------------------------------------
# Using function 3: Compute correlation using two columns
# from the dataset
# --------------------------------------------------------

X <- data$breastcancer
Y <- data$co2emissions
corXY <- correlation(X,Y)
corXY
# Result: NA
# This means the correlation cannot be computed because
# of the presence of missing values

# A possible solution is to omit the NA values from X and Y separately,
# but this is not good because, at the end,
# X and Y may not have the same length

# Using function 4: put X and Y into 1 dataframe

dataXY <- as.data.frame(cbind(X,Y))

dim(dataXY)

# select only rows with no missing value

dataXY <- na.omit(dataXY)
dim(dataXY)

corXY <- Correcorre(dataXY)

corXY
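
# For comparison, base R can do the same row-wise filtering without a helper.
# This is a minimal sketch, not the code behind correlation() or Correcorre(),
# whose definitions live in the sourced file and are not shown here.
# x_demo and y_demo are made-up vectors, not columns of the Cancer data.

```r
# Hypothetical toy vectors with missing values in different positions
x_demo <- c(1, 2, NA, 4, 5, 6)
y_demo <- c(2, NA, 3, 5, 4, 7)

# By default cor() returns NA when either vector contains a missing value
cor(x_demo, y_demo)

# Keep only the positions where both values are present, so the
# two vectors stay aligned and have the same length
keep <- complete.cases(x_demo, y_demo)
r_manual <- cor(x_demo[keep], y_demo[keep])

# Equivalent shortcut: let cor() drop the incomplete pairs itself
r_builtin <- cor(x_demo, y_demo, use = "complete.obs")

r_manual
r_builtin
```

# Both approaches drop a whole pair whenever either value is missing,
# which is the same idea as calling na.omit() on the combined data frame.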

# Using function 5 and 2: Construct two sets of variables from the data

str(data) # we need to lookup first on the type of variables we have

# select three columns from the data

Set1 <- data[,c("breastcancer", "alcconsumption","internetuserate")]
anyNA(Set1)

# select another three columns from the data

Set2 <- data[,c("co2emissions", "employrate","lifeexpectancy")]
anyNA(Set2)

# If we delete NA values from Set1 and Set2 separately, the two sets
# could end up with unequal numbers of rows.
# The solution is to combine them into one data frame first

tmpdat <- cbind(Set1,Set2)

tmpdat <- na.omit(tmpdat)
dim(tmpdat)

# pull out again Set1 and Set2

Set1 <- tmpdat[,1:3]
Set2 <- tmpdat[,4:6]

# Process pairwise correlations by feeding the two sets to the 5th function
Pearsonr <- pairwiseCor(Set1,Set2,"pearson")
Spearmanr <- pairwiseCor(Set1,Set2,"spearman")

Pearsonr
Spearmanr
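
# pairwiseCor() comes from the sourced helper file, so its internals are not
# shown here. As a rough base R counterpart (an assumption about what the
# helper computes, not its actual code), cor() accepts two data frames and
# returns the correlations between every column of the first and every
# column of the second. demo1 and demo2 are simulated, hypothetical data.

```r
set.seed(1)

# Two hypothetical sets of three numeric variables with the same row count
demo1 <- data.frame(a = rnorm(20), b = rnorm(20), c = rnorm(20))
demo2 <- data.frame(d = rnorm(20), e = rnorm(20), f = rnorm(20))

# Rows of the result are the columns of demo1, columns are those of demo2
pearson_demo  <- cor(demo1, demo2, method = "pearson")
spearman_demo <- cor(demo1, demo2, method = "spearman")

round(pearson_demo, 3)
round(spearman_demo, 3)
```

# Calling cor() on a single data frame, for example cor(demo1), gives the
# square matrix of correlations among its own columns instead.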

# The raw table is not polished enough for distribution

# Call function #2, NiceTable, to enhance its appearance
NiceTable(Pearsonr,"Pearson Correlation Analysis")
NiceTable(Spearmanr,"Spearman Correlation Analysis")

# Using function 6: Compute correlation using only 1 set of data

Pearson1 <- singlesetCor(tmpdat,"pearson")

Spearman1 <- singlesetCor(tmpdat,"spearman")

#display unformatted table

Pearson1
Spearman1

# Display a better table

NiceTable(Pearson1, "Pearson Correlation Analysis")
NiceTable(Spearman1, "Spearman Correlation Analysis")

# Using function 7: Correlation coefficients in table format

CorrsjPlot(Set1,"pearson","Pearson Correlation Coefficients")

CorrsjPlot(Set2,"pearson","Pearson Correlation Coefficients")

# Using function 8: Scatter Plot

Set1name <- colnames(Set1)

CorrePlotXY(Set1,Set1name[1],Set1name[2],"blue", "XAxis", "YAxis","pearson")

Set2name <- colnames(Set2)

CorrePlotXY(Set2,Set2name[1],Set2name[2],"blue", "XAxis", "YAxis","pearson")

# Use double for loop to generate all plots for Set 1

for (i in 1:(ncol(Set1)-1)) {
  for (j in (i+1):ncol(Set1)) {
    print(CorrePlotXY(Set1, Set1name[i], Set1name[j], "blue", Set1name[i],
                      Set1name[j], "pearson"))
  }
}

# Use double for loop to generate all plots for Set 2

for (i in 1:(ncol(Set2)-1)) {
  for (j in (i+1):ncol(Set2)) {
    print(CorrePlotXY(Set2, Set2name[i], Set2name[j], "blue", Set2name[i],
                      Set2name[j], "pearson"))
  }
}

# Remark: Plots generated inside a double for loop will not appear without the print command

# Using function 9: How to verify if the data
# satisfies the normal distribution using the Shapiro-Wilk test
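
# The call for function 9 is not shown in this script. As a stand-in, here is
# a minimal sketch that applies base R's shapiro.test() to every column of a
# data frame. normal_demo is simulated, hypothetical data, and this is not
# necessarily how the helper function is implemented.

```r
set.seed(42)

# One roughly normal variable and one strongly right-skewed variable
normal_demo <- data.frame(
  normal_var = rnorm(100, mean = 50, sd = 10),
  skewed_var = rexp(100, rate = 0.1)
)

# Run the Shapiro-Wilk test on each column and collect W and the p-value
sw_results <- t(sapply(normal_demo, function(col) {
  res <- shapiro.test(col)
  c(W = unname(res$statistic), p_value = res$p.value)
}))

sw_results
```

# A small p-value (for example, below 0.05) is evidence against normality;
# a large p-value only means the test found no evidence of non-normality.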
# Using function 10: How to verify if the data
# satisfies the normal distribution using graphs

NiceTable(Set1,"Dataset in wide original format")

# convert data to long format first

data_long <- melt(Set1)
NiceTable(data_long,"Dataset in long format")
QQNormality_Plot(data_long)

# The points must fall inside the confidence band
# for the variable to be considered normally distributed.
# Otherwise, treat the distribution as non-normal (for example, skewed)
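
# melt() is not part of base R; it is provided by packages such as reshape2,
# presumably loaded by the sourced helper file (an assumption). To illustrate
# the wide-to-long conversion used above, here is a minimal base R sketch
# with stack(). wide_demo is a made-up data frame, and the column names
# "value" and "variable" are chosen here only to resemble a melted layout.

```r
# Hypothetical wide data: one column per variable
wide_demo <- data.frame(x1 = c(1.2, 3.4, 5.6),
                        x2 = c(7.8, 9.0, 2.1))

# stack() puts all values in one column and the source column name in another
long_demo <- stack(wide_demo)
names(long_demo) <- c("value", "variable")

long_demo
```

# The long table has one row per observation per variable, which is the
# shape that grouped plotting functions usually expect.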

# Using function 11: Plot the histogram and density of each variable
# in the dataset.

PlotHistDensity(Set1)

# Using function 12: For a given set of correlation coefficients, generate the
# corresponding raw data X and Y.

sampleCor <- c(0.214, 0.4, 0.617, 0.742, 0.851, 0.915)

Simulate_XY_From_Correlations(sampleCor)

sampleCor <- c(0.214, 0.3, 0.617, 0.76, 0.851, 0.915)

Simulate_XY_From_Correlations(sampleCor)
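
# Simulate_XY_From_Correlations() is defined in the sourced helper file, so
# its method is not shown here. One common way to build X and Y with a chosen
# sample correlation is sketched below; simulate_xy() is a hypothetical name
# and may differ from what the helper actually does.

```r
# Generate n pairs (X, Y) whose sample Pearson correlation equals r
simulate_xy <- function(r, n = 100) {
  stopifnot(abs(r) < 1, n >= 3)
  x <- rnorm(n)
  z <- rnorm(n)
  # Remove from z any linear dependence on x, then standardise both parts
  e <- residuals(lm(z ~ x))
  x <- as.numeric(scale(x))
  e <- as.numeric(scale(e))
  # Mix the two uncorrelated parts so that cor(x, y) equals r
  y <- r * x + sqrt(1 - r^2) * e
  data.frame(X = x, Y = y)
}

set.seed(123)
sim_demo <- simulate_xy(0.617, n = 50)
cor(sim_demo$X, sim_demo$Y)
```

# Because the residual part is exactly uncorrelated with X in the sample,
# the computed correlation matches the requested value up to floating-point
# error rather than only on average.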

# View generated data

gendata <- read.csv("DatXY.csv")
NiceTable(gendata,"Generated Datasets")
