
Lab 2: Centrality, Spread, and Graphing

Contents
• Objectives
• Centrality
• Spread
• Graphing
• Importing Data
• Exercises
• References

Objectives
• Learn how to calculate measures of centrality
• Learn how to calculate spread
• Learn how to graph data
• Learn how to import and process CSV files

Centrality
The primary measures of centrality we will use are the mean and the median. To calculate the mean, use
the function mean(data):
x <- sample(1:100, 10)
y <- sample(1:100, 10)

print(mean(x))

## [1] 44.4
print(mean(y))

## [1] 46
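Because sample() draws values at random, the numbers printed above will differ from run to run. A minimal sketch, assuming you want reproducible results: fix the seed with set.seed() (the seed value 42 is an arbitrary choice, not part of the lab) and check that mean() matches the sum divided by the count.

```r
# Fix the random seed so the same sample is drawn on every run
# (42 is an arbitrary choice, not part of the lab)
set.seed(42)
x <- sample(1:100, 10)

# The mean is the sum of the values divided by how many there are
manual_mean <- sum(x) / length(x)

print(manual_mean)
print(mean(x))
```

Both print statements should show the same value.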
Another measure of centrality is the median. To calculate it, use the function
median(data):
print(median(x))

## [1] 56.5
print(median(y))

## [1] 45
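The median is the middle value of the sorted data; with an even number of values it is the average of the two middle values. The sketch below uses made-up numbers (not the lab's samples) to show both cases and how an outlier affects the mean more than the median.

```r
# Illustrative values (not from the lab data)
odd_values  <- c(3, 1, 9, 7, 5)
even_values <- c(3, 1, 9, 7, 5, 100)

# Odd count: the middle value of the sorted data
print(sort(odd_values))     # 1 3 5 7 9
print(median(odd_values))   # 5

# Even count: the average of the two middle values (5 and 7)
print(median(even_values))  # 6

# The outlier 100 pulls the mean much further than the median
print(mean(even_values))
```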

Spread
The primary measure of spread we will use is the standard deviation. To calculate it,
use the function sd(data):

print(sd(x))

## [1] 25.94096
print(sd(y))

## [1] 28.40188
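sd() computes the sample standard deviation, which divides by n - 1 rather than n. As a rough check, the sketch below reproduces it by hand on made-up values (not the lab's samples).

```r
# Illustrative values (not from the lab data)
values <- c(2, 4, 4, 4, 5, 5, 7, 9)

n <- length(values)
# Sum of squared deviations from the mean, divided by n - 1, then square-rooted
manual_sd <- sqrt(sum((values - mean(values))^2) / (n - 1))

print(manual_sd)
print(sd(values))
```

Both print statements should show the same value.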
Alternatively, you may want to find the interquartile range (IQR) of the data. One way to calculate it is to
take the difference between the 75th percentile and the 25th percentile:
#find the 25th percentile
q25 <- quantile(x, 0.25)
#find the 75th percentile
q75 <- quantile(x, 0.75)
#find the IQR
iqr <- q75 - q25
print(iqr)

## 75%
## 42.25
You can also find the IQR using the function IQR(data):
print(IQR(x))

## [1] 42.25
print(IQR(y))

## [1] 41.25
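The manual result above prints with a "75%" label because quantile() returns a named value and the subtraction keeps the name of its first operand. A small sketch, assuming you want a plain number, removes the label with unname() and confirms the result agrees with IQR(). The seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
x <- sample(1:100, 10)

# Ask for both percentiles at once; the result is named "25%" and "75%"
q <- quantile(x, c(0.25, 0.75))

# Drop the leftover "75%" label from the difference
iqr_manual <- unname(q[2] - q[1])

print(iqr_manual)
print(IQR(x))
```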

Graphing
To draw a scatter plot, use the function plot(x, y):
plot(x, y)
[Figure: scatter plot of y against x]
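By default, plot() labels the axes with the variable names and adds no title. A minimal sketch of adding them with the standard main, xlab, and ylab arguments (the seed and label text are arbitrary choices):

```r
set.seed(1)  # arbitrary seed, only for reproducibility
x <- sample(1:100, 10)
y <- sample(1:100, 10)

# main sets the title; xlab and ylab set the axis labels
plot(x, y,
     main = "Scatter plot of y against x",
     xlab = "x",
     ylab = "y",
     pch  = 19)  # pch = 19 draws solid circles
```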
You can make a histogram using the function hist(data)
hist(x)

[Figure: histogram titled "Histogram of x", with x on the horizontal axis and Frequency on the vertical axis]
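hist() picks the bins automatically, but you can set them yourself with the breaks argument. The sketch below, with an arbitrary seed and bin width, also shows that hist() returns the bin edges and counts so you can inspect them.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
x <- sample(1:100, 10)

# breaks gives the bin edges: here, bins of width 20 from 0 to 100
h <- hist(x, breaks = seq(0, 100, by = 20),
          main = "Histogram of x", xlab = "x")

# hist() returns (invisibly) the bin edges and the count in each bin
print(h$breaks)
print(h$counts)
```

The counts always add up to the number of values in x.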
You can make a boxplot using the function boxplot(data). You can also make a boxplot of multiple data
sets by passing them as separate arguments (or as a single list of data sets).
# add labels to the boxplot
boxplot(x, y, names=c("X", "Y"), main="Boxplot of X and Y")

[Figure: side-by-side boxplots titled "Boxplot of X and Y", one labeled X and one labeled Y]
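As a sketch of the list form, a named list supplies both the data sets and their labels, so the names argument is not needed. boxplot() also returns the statistics it draws, which can be handy for checking the plot against median(). The seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only for reproducibility
x <- sample(1:100, 10)
y <- sample(1:100, 10)

# A named list supplies both the data sets and their labels
b <- boxplot(list(X = x, Y = y), main = "Boxplot of X and Y")

# Each column of b$stats holds, for one data set:
# lower whisker, lower hinge, median, upper hinge, upper whisker
print(b$names)
print(b$stats)
```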
Importing Data
In order to import data from a CSV file, you can use the read.csv() function.
data <- read.csv("organizations-100.csv", header=TRUE, sep=",")

# print column names
print(colnames(data))

## [1] "X" "Organization.Id" "Name"
## [4] "Website" "Country" "Description"
## [7] "Founded" "Industry" "Number.of.employees"
# set columns Founded and Number.of.employees to numeric
# (column names are case-sensitive)
data$Founded <- as.numeric(data$Founded)
data$Number.of.employees <- as.numeric(data$Number.of.employees)

# print column names again; converting types does not change them
print(colnames(data))

## [1] "X" "Organization.Id" "Name"
## [4] "Website" "Country" "Description"
## [7] "Founded" "Industry" "Number.of.employees"
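Column names such as Number.of.employees contain dots because read.csv() by default converts headers into syntactically valid R names, replacing spaces with dots. The sketch below illustrates this with a tiny made-up CSV written to a temporary file; the rows are hypothetical and are not taken from organizations-100.csv.

```r
# Write a tiny made-up CSV to a temporary file
# (these rows are illustrative, not lab data)
path <- tempfile(fileext = ".csv")
writeLines(c(
  "Name,Founded,Number of employees",
  "Acme,1999,120",
  "Globex,2005,45"
), path)

# By default read.csv() converts headers into syntactically valid R names
df <- read.csv(path)
print(colnames(df))  # "Name" "Founded" "Number.of.employees"

# str() shows the type read.csv() inferred for each column
str(df)
```

Checking str() first tells you whether a column was already read as a number or still needs converting.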
Sometimes you may want to map a column to 0 or 1 depending on its values. A comparison such as
data$Founded > 2000 returns TRUE or FALSE for each row, and sum() counts each TRUE as 1. For instance,
to count how many companies were founded after 2000:
print(sum(data$Founded > 2000))

## [1] 38
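To see how this count works step by step, here is a small sketch using made-up founding years (not the full data set). The comparison produces a logical vector, sum() counts the TRUE values, and mean() gives the proportion of them.

```r
# Illustrative founding years (not the full lab data)
founded <- c(1990, 2015, 1971, 2004, 1991)

# The comparison gives a logical vector: FALSE TRUE FALSE TRUE FALSE
after_2000 <- founded > 2000
print(after_2000)

print(sum(after_2000))   # TRUE counts as 1, so this is 2
print(mean(after_2000))  # proportion of TRUE values: 0.4
```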
You can also filter the data based on a condition. For instance, you may want to filter the data to only
include companies with more than 100 employees:
filtered_data <- data[data$Number.of.employees > 100,]

print(head(filtered_data))

##   X Organization.Id                    Name                        Website
## 1 1 FAB0d41d5b5d22c             Ferrell LLC             https://price.net/
## 2 2 6A7EdDEA9FaDC52 Mckinney, Riley and Day http://www.hall-buchanan.info/
## 3 3 0bFED1ADAE4bcC1              Hester Ltd      http://sullivan-reed.com/
## 4 4 2bFC1Be8a4ce42f          Holder-Sellers            https://becker.com/
## 5 5 9eE8A6a4Eb96C24             Mayer Group         http://www.brewer.com/
## 6 6 cC757116fe1C085          Henry-Thompson              http://morse.net/
##            Country                                    Description Founded
## 1 Papua New Guinea            Horizontal empowering knowledgebase    1990
## 2          Finland            User-centric system-worthy leverage    2015
## 3            China                 Switchable scalable moratorium    1971
## 4     Turkmenistan De-engineered systemic artificial intelligence    2004
## 5        Mauritius             Synchronized needs-based challenge    1991
## 6          Bahamas   Face-to-face well-modulated customer loyalty    1992
##                        Industry Number.of.employees
## 1                      Plastics                3498
## 2   Glass / Ceramics / Concrete                4952
## 3                 Public Safety                5287
## 4                    Automotive                 921
## 5                Transportation                7870
## 6 Primary / Secondary Education                4914
# print the number of companies with more than 100 employees
print(nrow(filtered_data))

## [1] 100
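Here the filter kept all 100 rows, so every company in the file has more than 100 employees. A minimal sketch on a small made-up data frame (hypothetical values, not the lab data) shows a filter that actually removes rows, combines two conditions with &, and writes the same filter with subset().

```r
# Small made-up data frame (illustrative values only)
orgs <- data.frame(
  Name = c("A", "B", "C", "D"),
  Founded = c(1990, 2015, 2004, 2010),
  Number.of.employees = c(3498, 80, 921, 50)
)

# Keep rows that satisfy both conditions; the trailing comma keeps all columns
large_recent <- orgs[orgs$Number.of.employees > 100 & orgs$Founded > 2000, ]
print(large_recent)

# subset() expresses the same filter without repeating the data frame name
same_rows <- subset(orgs, Number.of.employees > 100 & Founded > 2000)
print(nrow(same_rows))  # 1
```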

Exercises
Part 1
1. Generate a sample of 1000 numbers from 1 to 100 (sample with replacement, since there are only 100
distinct values to draw from) and calculate the mean, median, standard deviation, and IQR of the sample
2. Create a histogram of the sample
3. Create a boxplot of the sample

Part 2
1. Calculate the mean, median, standard deviation, and IQR of the number of employees in the data set
organizations-100.csv
2. Create a scatter plot of the number of employees and the year founded
3. Create a histogram of the number of employees
4. Create a boxplot of the number of employees
5. Filter the data to only include companies with more than 100 employees and calculate the mean, median,
standard deviation, and IQR of the number of employees in the filtered data set

References
Source of data: https://github.com/datablist/sample-csv-files?tab=readme-ov-file
