Lab 2: Centrality, Spread, and Graphing

Contents
Objectives
Centrality
Spread
Graphing
Importing Data
Exercises
References

Objectives
• Learn how to calculate measures of centrality
• Learn how to calculate spread
• Learn how to graph data
• Learn how to import and process CSV files

Centrality
The primary measures of centrality we will use are the mean and the median. To calculate the mean, use
the function mean(data). Because sample() draws values at random, your output will differ from the
numbers shown here.
x <- sample(1:100, 10)
y <- sample(1:100, 10)

print(mean(x))

## [1] 44.4
print(mean(y))

## [1] 46
The second measure of centrality is the median. To calculate it, use the function median(data).
print(median(x))

## [1] 56.5
print(median(y))

## [1] 45
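
As a small illustration of how the two measures differ, the sketch below uses a hand-picked vector in which one value is much larger than the rest. The values are made up for this example and are not taken from the lab data. The mean is pulled toward the large value, while the median is not.

```r
# Hypothetical values chosen for illustration; 100 is an outlier
v <- c(2, 4, 4, 5, 100)

v_mean <- mean(v)      # (2 + 4 + 4 + 5 + 100) / 5
v_median <- median(v)  # middle value of the sorted data

print(v_mean)
print(v_median)
```

Here the mean is 23 and the median is 4, so the median is usually the better summary when a few extreme values are present.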

Spread
The primary measure of spread we will use is the standard deviation. To calculate it, use the
function sd(data).
print(sd(x))

## [1] 25.94096
print(sd(y))

## [1] 28.40188
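
Note that sd() computes the sample standard deviation, which divides by n - 1 rather than n. A minimal sketch that checks this by hand, using made-up values rather than the lab data:

```r
# Hypothetical values chosen for illustration
v <- c(2, 4, 4, 4, 5, 5, 7, 9)

n <- length(v)
# Sum of squared deviations from the mean, divided by (n - 1)
manual_var <- sum((v - mean(v))^2) / (n - 1)
manual_sd <- sqrt(manual_var)

print(manual_sd)
print(sd(v))
```

Both print statements should show the same number, since the manual calculation follows the same formula as sd().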
Alternatively, you may want to find the interquartile range (IQR) of the data. One way to calculate it is
to take the difference between the 75th percentile and the 25th percentile.
#find the 25th percentile
q25 <- quantile(x, 0.25)
#find the 75th percentile
q75 <- quantile(x, 0.75)
#find the IQR
iqr <- q75 - q25
print(iqr)

## 75%
## 42.25
You can also find the IQR directly with the function IQR(data).
print(IQR(x))

## [1] 42.25
print(IQR(y))

## [1] 41.25
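
The "75%" label in the manual calculation appears because quantile() returns a named vector, and the subtraction keeps the name of its first operand. The sketch below, on made-up values, shows that unname() drops the label and that the manual approach agrees with IQR().

```r
# Hypothetical values chosen for illustration
v <- 1:9

q25 <- quantile(v, 0.25)
q75 <- quantile(v, 0.75)

# unname() removes the "75%" label inherited from q75
manual_iqr <- unname(q75 - q25)

print(manual_iqr)
print(IQR(v))
```

For these values the quartiles are 3 and 7, so both lines print 4.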

Graphing
To draw a scatter plot, use the function plot(x, y).
plot(x, y)

[Figure: scatter plot of y against x]
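
A sketch of the same kind of scatter plot with a title and axis labels, using the main, xlab, and ylab arguments of plot(). The call to set.seed() is an addition that the lab itself does not use: it fixes the random draws so the plot is the same on every run. The seed value 42 and the names x_demo and y_demo are arbitrary choices for this example.

```r
# Fix the random number generator so the samples are reproducible
set.seed(42)
x_demo <- sample(1:100, 10)
y_demo <- sample(1:100, 10)

# Scatter plot with a title and axis labels
plot(x_demo, y_demo,
     main = "Scatter plot of X and Y",
     xlab = "x", ylab = "y")
```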
You can make a histogram using the function hist(data).
hist(x)

[Figure: "Histogram of x", with x on the horizontal axis and Frequency on the vertical axis]
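
By default hist() chooses the bin boundaries automatically; the breaks argument lets you set them yourself. The sketch below uses made-up values and plot = FALSE to show the counts behind the bars before drawing the histogram.

```r
# Hypothetical values chosen for illustration
v <- c(1, 2, 2, 3, 3, 3, 4, 4, 5)

# Bin edges at 0, 1, 2, 3, 4, 5 (each bin includes its right edge);
# plot = FALSE returns the bin counts without drawing anything
h <- hist(v, breaks = 0:5, plot = FALSE)
print(h$counts)

# Draw the histogram with the same bins and a labelled axis
hist(v, breaks = 0:5, main = "Histogram of v", xlab = "v")
```

The counts are 1, 2, 3, 2, 1, which match the heights of the five bars.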
You can make a boxplot using the function boxplot(data). You can also draw several data sets side by
side by passing them as separate arguments, as below, or as a single list.
# add labels to the boxplot
boxplot(x, y, names=c("X", "Y"), main="Boxplot of X and Y")

[Figure: "Boxplot of X and Y", showing one box for X and one for Y]

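When boxplot() is given a named list, the list names become the group labels, so the names argument is not needed. A sketch with made-up values; plot = FALSE returns the summary statistics that the boxes are drawn from.

```r
# Hypothetical values chosen for illustration
groups <- list(A = c(1, 2, 3, 4, 5), B = c(2, 4, 6, 8, 10))

# Each column of stats holds: lower whisker, lower hinge, median,
# upper hinge, upper whisker
b <- boxplot(groups, plot = FALSE)
print(b$names)
print(b$stats)

# Draw the boxplot; the labels come from the list names
boxplot(groups, main = "Boxplot of A and B")
```
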
Importing Data
In order to import data from a CSV file, you can use the read.csv() function.
data <- read.csv("organizations-100.csv", header=TRUE, sep=",")

# print column names
print(colnames(data))

## [1] "X"                   "Organization.Id"     "Name"
## [4] "Website"             "Country"             "Description"
## [7] "Founded"             "Industry"            "Number.of.employees"
# optionally convert Founded and Number.of.employees to numeric
# (R is case-sensitive, so the column names must match the output above exactly)
# data$Founded <- as.numeric(data$Founded)
# data$Number.of.employees <- as.numeric(data$Number.of.employees)
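
The dots in names such as Number.of.employees come from read.csv(), which by default converts characters that are not valid in R names (such as spaces) to dots. The sketch below writes a tiny temporary file to demonstrate this. The header text with spaces is an assumption about what the raw file looks like, and the two rows reuse values that appear in the head() output in this lab.

```r
# Write a small CSV to a temporary file (the header with spaces is assumed)
tmp <- tempfile(fileext = ".csv")
writeLines(c("Name,Founded,Number of employees",
             "Ferrell LLC,1990,3498",
             "Hester Ltd,1971,5287"), tmp)

demo <- read.csv(tmp, header = TRUE, sep = ",")

# Spaces in the header have become dots
print(colnames(demo))

# Check the type of each column before doing arithmetic on it
print(sapply(demo, class))
```

If a column that should be numeric shows up as character, that is the case where an as.numeric() conversion is needed.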
Sometimes you may want to map a column to 0 or 1 depending on its value. A comparison such as
data$Founded > 2000 returns TRUE or FALSE for each row, and sum() counts each TRUE as 1. For
instance, to count how many companies were founded after 2000:
print(sum(data$Founded > 2000))

## [1] 38
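
To see the mapping explicitly, the sketch below applies the same comparison to the six Founded values that appear in the head() output in this lab. as.integer() turns TRUE/FALSE into 1/0, and mean() of a logical vector gives the proportion of TRUE values.

```r
# Founded years of the first six organizations in the data set
founded <- c(1990, 2015, 1971, 2004, 1991, 1992)

after_2000 <- founded > 2000    # logical vector, one entry per company
print(as.integer(after_2000))   # explicit 0/1 mapping
print(sum(after_2000))          # count of TRUE values
print(mean(after_2000))         # proportion of TRUE values
```

Two of the six companies (founded in 2015 and 2004) pass the test, so the count is 2.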
You can also filter the data based on a condition. For instance, to keep only the companies with more
than 100 employees:
filtered_data <- data[data$Number.of.employees > 100,]

print(head(filtered_data))

##   X Organization.Id                    Name                        Website
## 1 1 FAB0d41d5b5d22c             Ferrell LLC             https://price.net/
## 2 2 6A7EdDEA9FaDC52 Mckinney, Riley and Day http://www.hall-buchanan.info/
## 3 3 0bFED1ADAE4bcC1              Hester Ltd      http://sullivan-reed.com/
## 4 4 2bFC1Be8a4ce42f          Holder-Sellers            https://becker.com/
## 5 5 9eE8A6a4Eb96C24             Mayer Group         http://www.brewer.com/
## 6 6 cC757116fe1C085          Henry-Thompson              http://morse.net/
##            Country                                    Description Founded
## 1 Papua New Guinea            Horizontal empowering knowledgebase    1990
## 2          Finland            User-centric system-worthy leverage    2015
## 3            China                 Switchable scalable moratorium    1971
## 4     Turkmenistan De-engineered systemic artificial intelligence    2004
## 5        Mauritius             Synchronized needs-based challenge    1991
## 6          Bahamas   Face-to-face well-modulated customer loyalty    1992
##                        Industry Number.of.employees
## 1                      Plastics                3498
## 2   Glass / Ceramics / Concrete                4952
## 3                 Public Safety                5287
## 4                    Automotive                 921
## 5                Transportation                7870
## 6 Primary / Secondary Education                4914
# print the number of companies with more than 100 employees
print(nrow(filtered_data))

## [1] 100
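
One caveat when filtering with square brackets: if the column contains missing values (NA), the comparison yields NA and R returns an all-NA row for it. The lab data may not contain any missing values, so the data frame below is made up for illustration. Wrapping the condition in which(), or using subset(), drops those rows.

```r
# Hypothetical data frame with one missing employee count
df <- data.frame(
  Name = c("A", "B", "C", "D"),
  Number.of.employees = c(50, 921, NA, 3498)
)

# Bracket filtering keeps a row of NAs for the missing value
with_na <- df[df$Number.of.employees > 100, ]
print(nrow(with_na))

# which() keeps only the rows where the condition is TRUE
no_na <- df[which(df$Number.of.employees > 100), ]
print(nrow(no_na))

# subset() also drops rows where the condition is NA
print(nrow(subset(df, Number.of.employees > 100)))
```

The first count is 3 (it includes the NA row); the other two are 2.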

Exercises
Part 1
1. Generate a sample of 1000 numbers from 1 to 100 (sample() needs replace=TRUE here, because 1000
draws exceed the 100 available values) and calculate the mean, median, standard deviation, and IQR
of the sample
2. Create a histogram of the sample
3. Create a boxplot of the sample

Part 2
1. Calculate the mean, median, standard deviation, and IQR of the number of employees in the data set
organizations-100.csv
2. Create a scatter plot of the number of employees and the year founded
3. Create a histogram of the number of employees
4. Create a boxplot of the number of employees
5. Filter the data to only include companies with more than 100 employees and calculate the mean, median,
standard deviation, and IQR of the number of employees in the filtered data set

References
Source of data: https://github.com/datablist/sample-csv-files?tab=readme-ov-file