The document contains a lab report by Vianna Chavez, detailing various statistical exercises involving data analysis using R. It includes exercises on lead and copper levels in Flint, life expectancy related to income, and customer data analysis, among others. Each exercise presents data visualizations, statistical calculations, and interpretations of results.


2/5/25, 11:34 PM Lab2

Lab2
Vianna Chavez
2025-02-04

Exercise 1
1a)
flint <- read.csv("~/Desktop/Stats10/flint.csv")

1b)
mean(flint$Pb >= 15)

## [1] 0.04436229

1c)
mean(flint$Cu[flint$Region == "North"])

## [1] 44.6424

1d)
mean(flint$Cu[flint$Pb >= 15])

## [1] 305.8333

1e)
mean(flint$Pb)

## [1] 3.383272

mean(flint$Cu)

## [1] 54.58102

1f)
boxplot(flint$Pb, main = "Flint lead (Pb) levels")

1g)
# No. A boxplot marks the median and quartiles rather than the mean, so I don't believe it is
# the best visual representation here. I think a histogram would represent the data better.

Exercise 2
2a)
life <- read.table("https://ucla.box.com/shared/static/rqk4lc030pabv30wknx2ft9jy848ub9n.txt", header = TRUE)
plot(Life~Income, data = life)

# The scatterplot seems to support the idea that as income increases, life expectancy tends
# to increase as well.

2b)
boxplot(life$Income, main = "Income")

library(mosaic)

## Registered S3 method overwritten by 'mosaic':

## method from
## fortify.SpatialPolygonsDataFrame ggplot2

##
## The 'mosaic' package masks several functions from core packages in order to add
## additional features. The original behavior of these functions should not be affected
## by this.

##
## Attaching package: 'mosaic'

## The following objects are masked from 'package:dplyr':

##
## count, do, tally

## The following object is masked from 'package:Matrix':

##
## mean

## The following object is masked from 'package:ggplot2':

##
## stat

## The following objects are masked from 'package:stats':

##
## binom.test, cor, cor.test, cov, fivenum, IQR, median, prop.test,
## quantile, sd, t.test, var

## The following objects are masked from 'package:base':

##
## max, mean, min, prod, range, sample, sum

histogram(life$Income, main = "Income")

# There are notable outliers in the box plot image.

2c)

low_income <- life[life$Income < 1000, ]

high_income <- life[life$Income >= 1000, ]

2d)
plot(Life~Income, data = low_income)

library(mosaic)
cor(Life~Income, data = low_income) # correlation coefficient

## [1] 0.752886

Exercise 3
3a)

maas <- read.table("https://ucla.box.com/shared/static/tv3cxooyp6y8fh6gb0qj2cxihj8klg1h.txt", header = TRUE)

summary(maas$lead)

## Min. 1st Qu. Median Mean 3rd Qu. Max.

## 37.0 72.5 123.0 153.4 207.0 654.0

summary(maas$zinc)

## Min. 1st Qu. Median Mean 3rd Qu. Max.

## 113.0 198.0 326.0 469.7 674.5 1839.0

3b)
histogram(maas$lead)

library(mosaic)
histogram(log(maas$lead))

3c)
plot(log(lead)~log(zinc), data = maas)

# There seems to be a fairly strong, positive, linear relationship between log(lead) and
# log(zinc).

3d)
mycolors <- c("lightblue", "pink", "purple")
mylevels <- cut(maas$lead, c(0, 150, 400, 10000))
mysize <- 8 # passed to pch below, so this is the plotting symbol code (8 draws an asterisk), not a size
plot(maas$x, maas$y, col=mycolors[as.numeric(mylevels)], pch= mysize)
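
The colour mapping in 3d relies on `cut()` producing a factor whose integer codes index into the colour vector. The sketch below illustrates that step on a few made-up lead values (`lead_demo` is hypothetical and not taken from the maas data), using the same breaks and colours as above.

```r
# Hypothetical lead values (not from the maas data), chosen to sit on and around the breaks.
lead_demo <- c(50, 150, 151, 400, 401)

# Same breaks as in 3d: the intervals are (0,150], (150,400], (400,10000].
levels_demo <- cut(lead_demo, c(0, 150, 400, 10000))

# as.numeric() turns the factor into bin indices 1, 2, 3 ...
bin_index <- as.numeric(levels_demo)

# ... which then pick one colour per point.
mycolors <- c("lightblue", "pink", "purple")
point_colors <- mycolors[bin_index]

print(data.frame(lead = lead_demo, bin = levels_demo, color = point_colors))
```

Because `cut()` uses right-closed intervals by default, a value of exactly 150 falls in the first bin and exactly 400 in the second.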

Exercise 4
4a)
LA <- read.table("https://ucla.box.com/shared/static/d189x2gn5xfmcic0dmnhj2cw94jwvqpa.txt", header = TRUE)

plot(LA$Longitude, LA$Latitude, xlab = "Longitude", ylab = "Latitude", main = "Mapping LA City Centers")
library(maps)
map("county", "california", add = TRUE)

4b)
# Relationship: There seems to be a positive correlation between income and schools. As
# income increases, so does the number of schools. The relationship seems to be linear,
# but it is affected by some outliers.
# 2. second plot
plot(Schools~Income, data = LA[LA$Schools != 0, ], main = "School v. Income")

Exercise 5
5a)
customer_data <- read.csv("https://ucla.box.com/shared/static/y2y8rcie7mjw2h5t92x9dfcp133tc90h.csv")

# Yes, there are missing values. There are 22 missing values in total: 10 in age, 5 in
# income, and 7 in purchase_amt.

sum(is.na(customer_data))

## [1] 22

colSums(is.na(customer_data))

## cust_id age gender income education

## 0 10 0 5 0
## marital_status purchase_amt
## 0 7
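
As a small illustration of how the two counts in 5a relate, the sketch below uses a toy data frame (`toy`, with made-up values standing in for `customer_data`). `sum(is.na(...))` counts every missing cell, `colSums(is.na(...))` breaks that total down by variable, and `complete.cases()` is added here as one way to see how many rows are fully observed.

```r
# Toy data frame (hypothetical values) standing in for customer_data.
toy <- data.frame(
  cust_id = c("C1", "C2", "C3", "C4"),
  age     = c(34, NA, 51, NA),
  income  = c(52000, 61000, NA, 48000)
)

total_missing  <- sum(is.na(toy))           # every NA in the table
missing_by_col <- colSums(is.na(toy))       # NAs per variable
rows_complete  <- sum(complete.cases(toy))  # rows with no NA at all

print(total_missing)
print(missing_by_col)
print(rows_complete)
```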

5b)
# income and purchase_amt should be stored as numeric values, since the amounts may include
# decimals, such as a purchase amount of 164.50.

customer_data$income <- as.numeric(customer_data$income)

customer_data$purchase_amt <- as.numeric(customer_data$purchase_amt)

class(customer_data$cust_id)

## [1] "character"

class(customer_data$age)

## [1] "integer"

class(customer_data$gender)

## [1] "character"

class(customer_data$income)

## [1] "numeric"

class(customer_data$education)

## [1] "character"

class(customer_data$marital_status)

## [1] "character"

class(customer_data$purchase_amt)

## [1] "numeric"
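
The report does not show what the classes were before the conversion in 5b, so the sketch below only illustrates what `as.numeric()` does in general, using a made-up character vector (`amount_chr` is hypothetical). Entries that cannot be parsed as numbers become `NA`, which is worth checking for after any conversion.

```r
# Hypothetical character column, as it might look if amounts were read in as text.
amount_chr <- c("164.50", "20", "n/a")

# as.numeric() parses the valid entries and turns anything unparseable into NA
# (normally with a warning, silenced here to keep the output clean).
amount_num <- suppressWarnings(as.numeric(amount_chr))

print(class(amount_chr))
print(class(amount_num))
print(amount_num)
```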

5c)

# I produced box plots to help me visualize the data for customer income, purchase amount,
# and age. I did not notice any outliers in these plots.

boxplot(customer_data$income, main = "Customer Income")

boxplot(customer_data$purchase_amt, main = "Purchase Amount")

boxplot(customer_data$age, main = "Age")

Part II

You may choose to type or write your answers electronically or scan your handwritten
solutions. Please ensure that you show all steps and explanations to receive full credit,
unless otherwise instructed.

Exercise 1
A study was done on a random sample of 900 college students. The researcher wants to find out if
gender would affect people’s body image. The two-way table below summarizes the two
variables.

Two-way table: Gender by Body Image

                        Body Image
Gender    About right   Overweight   Underweight   Total
Female        310           130           30         470
Male          290            68           72         430
Total         600           198          102         900

a. In general, are students happy with their body weight? (Hint: Students that are happy with
their body weight responded "about right.")
Yes. In general, about 66.7% of the students (600 of 900) responded "about right," so most
students are happy with their weight.
b. If the researcher wants to compare the differences in body image between females and males,
what graph would best visualize the data for this purpose? Explain. (No need to draw the
actual plot.)
Because both variables are categorical, the best chart would be a grouped (side-by-side) bar
chart showing the proportion of each gender in each body image category. This visual lets us
compare the categories with each other and also compare males with females within each
category.
c. Are female students more likely to feel they are about right than male students? Explain with
numerical evidence.
No, male students are slightly more likely to feel about right with their weight. About
66.0% of females (310 of 470) answered "about right," compared with about 67.4% of males
(290 of 430).
d. For students who do not feel ‘about right’ with their body image, are there any differences
between the two gender groups? (Hint: are they more likely to feel they are overweight or
underweight? Do female students and male students feel the same way?)
Among female students who do not feel about right, feeling overweight is much more common than
feeling underweight: about 81.3% (130 of 160) feel overweight. The male students are almost
evenly split, with about 48.6% (68 of 140) feeling overweight and about 51.4% (72 of 140)
feeling underweight, so males are slightly more likely to feel that they are underweight.
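
The percentages in parts a, c, and d can be checked with a short R sketch. It enters only the six cell counts from the two-way table; the object names (`body_image`, `by_gender`, `not_right`) are illustrative choices, not part of the assignment.

```r
# Counts from the two-way table (rows: gender, columns: body image).
body_image <- matrix(
  c(310, 130, 30,
    290,  68, 72),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Female", "Male"),
                  Image  = c("About right", "Overweight", "Underweight"))
)

# (a) Overall share answering "about right".
overall_right <- sum(body_image[, "About right"]) / sum(body_image)

# (c) Distribution of body image within each gender (row proportions).
by_gender <- prop.table(body_image, margin = 1)

# (d) Among students who did NOT answer "about right", the split within each gender.
not_right <- prop.table(body_image[, c("Overweight", "Underweight")], margin = 1)

print(round(overall_right, 3))
print(round(by_gender, 3))
print(round(not_right, 3))
```

Using `margin = 1` conditions on gender, which is what makes the female and male groups comparable even though their sample sizes differ.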

Exercise 2
For each of the scatterplots shown, provide a written description that includes the direction, form,
and strength of the relationship, along with any outliers that do not fit the general trend. In
addition, explain what these characteristics mean in the context of the data.

a. Data on 50 states taken from the U.S. Census shows how the median family income is
related to the population (25 years or older) with a college degree or higher.

a. Direction: positive
b. Form: linear
c. Strength of relationship: moderate
d. Outliers: Yes (29, 61?)
e. Explain characteristics
i. This graph shows that states where a larger share of the population has a college
degree tend to have higher median family incomes. This makes sense because a
degree tends to lead to higher-paying job opportunities.

b. Consider the relationship between the average amount of fuel used (in liters) to drive a fixed
distance in a car (100 km), and the speed at which the car is driven (in km per hour).
a. Direction: neither positive nor negative overall (decreasing at first, then increasing)
b. Form: nonlinear (U-shape)
c. Strength of relationship: strong
d. Outliers: 1 potential outlier
e. Explain characteristics
i. This scatterplot represents the relationship between a car’s speed and its fuel
consumption. At low speeds, fuel consumption is high. At moderate speeds,
efficiency is maximized. At high speeds, fuel consumption increases again.
Exercise 3
A researcher collected data on the median starting salaries and the median mid-career salaries for
graduates at a selection of colleges. (Source: The Wall Street Journal, Salary increase by salary
type, https://www.wsj.com/public/resources/documents/info-Salaries_for_Colleges_by_Typesort.html).
The data points and the fitted least squares regression line are displayed in the graph below.

a. What is the explanatory variable and response variable?

a. Explanatory variable: median starting salary
b. Response variable: median mid-career salary

b. And why do you think the median salary is used instead of the mean?
a. The median salary is used because it is less sensitive to outliers than the mean; a
few very high salaries would pull the mean upward.

c. Can the median mid-career salary be estimated given a median starting salary of 60 (in
thousands of dollars)? Please explain why or why not, and show your calculation and
explanation if possible.
Yes, because a median starting salary of $60,000 lies within the range of the observed data in
the scatterplot, so the regression line can be used and the estimate seems reasonably reliable.
d. Can the median mid-career salary be estimated given a median starting salary of 100 (in
thousands of dollars)? Please explain why or why not, and show your calculation and
explanation if possible.
This estimate may not be reliable because $100,000 is outside the range of the observed data.
Using the line there would be extrapolation, and the linear trend may not hold that far out.

Exercise 4
Assume that the relationship between the calories in a five-ounce serving and the % alcohol
content for a sample of wines is linear. Use the % alcohol as the explanatory variable, and fit a
least squares regression line.

a. Calculate slope and intercept of the regression line.

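Slope: b = r(s_y / s_x) = 0.95(46.34 / 2.32) ≈ 18.98
Intercept: a = ȳ - b(x̄) = 141.67 - 18.98(11.03) ≈ -67.68
b. Report the equation of the regression line and interpret it in the context of the problem.
Ŷ = -67.68 + 18.98x
The slope suggests that for each one-percentage-point increase in alcohol content, calories
increase by about 18.98 on average. The intercept says that a wine with 0% alcohol would have
-67.68 calories, which is not meaningful: negative calories are impossible, and 0% alcohol is far
outside the range of the data.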

c. Find and interpret the value of the coefficient of determination.

r^2 = (0.95)^2 = 0.9025
This coefficient of determination means that about 90.25% of the variation in calories is
explained by the linear relationship with % alcohol.
d. Suppose a new point was added to your data: a wine that is 20% alcohol that contains 0
calories. How will that affect the value of r and the slope of the regression line? (No
calculation needed)
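This new point would be an outlier that does not follow the trend in the rest of the data.
Because its alcohol content is well above the other wines while its calories are far below the
line, it would be influential: it would pull the regression line down, making the slope much
smaller, and it would substantially lower the correlation coefficient r.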
Data table (Source: healthalicious.com)
Calories   % alcohol
122        10.6
119        10.1
121        10.1
123        8.8
129        11.1
236        15.2
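
The effect asked about in part d can be illustrated by refitting the line with and without the extra wine. This is only a sketch: it uses the six rows listed in the data table, so its results need not match the rounded summary statistics exactly, and the object names (`wine`, `wine_plus`) are illustrative.

```r
# The six wines from the data table.
wine <- data.frame(
  alcohol  = c(10.6, 10.1, 10.1, 8.8, 11.1, 15.2),
  calories = c(122, 119, 121, 123, 129, 236)
)

# Same data plus the hypothetical wine from part (d): 20% alcohol, 0 calories.
wine_plus <- rbind(wine, data.frame(alcohol = 20, calories = 0))

# Fit calories ~ % alcohol with and without the extra point.
fit_before <- lm(calories ~ alcohol, data = wine)
fit_after  <- lm(calories ~ alcohol, data = wine_plus)

r_before <- cor(wine$alcohol, wine$calories)
r_after  <- cor(wine_plus$alcohol, wine_plus$calories)

slope_before <- unname(coef(fit_before)["alcohol"])
slope_after  <- unname(coef(fit_after)["alcohol"])

print(c(r_before = r_before, r_after = r_after))
print(c(slope_before = slope_before, slope_after = slope_after))
```

Comparing the two printed pairs shows both r and the slope dropping once the new point is included, which is the behaviour expected from a high-leverage outlier.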

Table of summary statistics

            Calories   % alcohol
Mean        141.67     11.03
Std. Dev.   46.34      2.32
r = 0.95
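
The slope, intercept, and coefficient of determination in parts a to c follow from the summary statistics through b = r(s_y / s_x) and a = ȳ - b(x̄). The sketch below reproduces that arithmetic in R; the variable names and the `predict_calories` helper are illustrative, not part of the assignment.

```r
# Summary statistics from the table (x = % alcohol, y = calories).
x_bar <- 11.03;  s_x <- 2.32
y_bar <- 141.67; s_y <- 46.34
r     <- 0.95

slope     <- r * s_y / s_x          # b = r * (s_y / s_x)
intercept <- y_bar - slope * x_bar  # a = y_bar - b * x_bar
r_squared <- r^2                    # coefficient of determination

# Helper (name is illustrative) that predicts calories for a given % alcohol.
predict_calories <- function(alcohol) intercept + slope * alcohol

print(round(c(slope = slope, intercept = intercept, r_squared = r_squared), 4))
print(predict_calories(x_bar))  # the line passes through (x_bar, y_bar)
```

Note that carrying the unrounded slope through gives an intercept a few hundredths away from one computed with the slope already rounded to two decimals, so small differences in the last digit are rounding effects rather than errors.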

Exercise 5
A doctor who believes strongly that antidepressants work better than "talk therapy" tests
depressed patients by treating half of them with antidepressants and the other half with talk
therapy. The doctor recruited 100 patients for the study. After six months’ treatment, the patients
will be evaluated on a scale of 1 to 5, with 5 indicating the greatest improvement. The doctor is
designing the study plan.

a. The doctor wants to put the most severe patients in the antidepressants group because he is
concerned about those patients’ conditions. Will this affect his ability to compare the
effectiveness of the antidepressants and the “talk therapy”? Explain.
Yes, this will invalidate the comparison. The severity of depression would be confounded with
the treatment, so any difference in improvement could be due to how ill the patients were rather
than to the treatment itself. The doctor should instead randomly assign patients to the two
groups to ensure a fair comparison.
b. The doctor asks you whether it is acceptable for him to know which treatment each patient
receives. Explain why this practice may affect his ability to compare the two groups.
No, it would not be acceptable for the doctor to know which treatment each patient receives,
especially since he already believes antidepressants work better. That knowledge could
influence how he rates improvement (experimenter bias); a study is best conducted when it is
double-blind.
c. What improvements to the plan would you recommend?
Overall, I would strongly recommend randomly assigning patients to the two groups. Random
assignment balances confounding variables, such as the severity of the patients' depression,
across the groups. I would also use a double-blind approach where possible, in which neither the
person evaluating the patients nor the patients themselves know which treatment group they are
in. This would further reduce any bias that may come from the experimenter or the participants.
I would possibly also suggest recruiting a larger group for the study.
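
A minimal sketch of the recommended random assignment in R, assuming the 100 patients are simply numbered 1 to 100 (the `patients` data frame and the seed are illustrative). Shuffling a vector that contains exactly 50 labels of each treatment guarantees equal group sizes while leaving the choice of who goes where to chance.

```r
set.seed(10)  # arbitrary seed so the assignment can be reproduced

# Hypothetical patient IDs for the 100 recruited patients.
patients <- data.frame(id = 1:100)

# Shuffle a vector holding exactly 50 labels of each treatment.
patients$treatment <- sample(rep(c("antidepressant", "talk therapy"), each = 50))

print(table(patients$treatment))
```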
