
Bivariate Data Project

This experiment examines the relationship between thumb length and the time taken to text a phrase on a phone. Thirty subjects had their thumb length measured and then timed while texting the phrase. Figures 1 and 2 show that thumb length data is approximately symmetrical and normal. Figure 3 shows the time data is right-skewed with a median of 120 seconds. Figure 4 shows the time data is bimodal or multi-modal. The scatter plot reveals a weak positive linear relationship between length and time, but the residuals do not fit a linear model well. Re-expressing the data may improve the linear fit.

Uploaded by

Trang Vu
Copyright
© Attribution Non-Commercial (BY-NC)

Bivariate Data Project: Time VS Length

In this experiment, thumb length and the time taken to text the quote “Why does it rain cats and dogs, how do we get stumped, and why do we want to cut to the chase?” were measured to see whether there is an association between the two variables. A ruler was used to measure the length, in centimeters, from the base to the tip of the thumb the subject uses most for texting. After the thumb length was recorded, each subject was given a phone (the same phone was used for every subject) and was asked to read and then text the phrase provided. The same timer was used throughout, and the times were recorded. The process was repeated for each person, for a total of 30 subjects tested.

Figure 1: Box plot of length
Figure 2: Histogram of length

In Figure 1, the boxplot shows roughly symmetrical intervals from Q1 to the median and from the median to Q3. The median length is 6.3 cm, which is also the approximate mean. The IQR, calculated by subtracting Q1 from Q3, is 0.8 cm. The symmetry about the median is consistent with approximately normal data. Assuming no outliers are present, the data range from a minimum of 5.2 cm to a maximum of 7.2 cm. According to Figure 2, the histogram used to describe the shape, the data are unimodal and approximately symmetrical. Most of the subjects tested had a thumb length near the average, and the number of subjects gradually decreased as the length moved further below or above the average.

Figure 3: Box plot of time
Figure 4: Length VS Time Histogram (histogram of time)

According to the boxplot in Figure 3, the median time is around 120 s, with an IQR of 80 s (calculated, as before, by subtracting Q1 from Q3). Unlike Figure 1, Figure 3 is not symmetrical: the interval between Q1 and the median appears smaller than the interval between the median and Q3, which indicates that the data are skewed. Figure 4, the histogram of the same time data, shows a right skew with a gap at 200 s, which is an unusual feature. The times range from 66.7 s to about 208 s. Overall, the histogram appears bimodal, or perhaps multimodal, since the bar heights fluctuate, so the actual shape is difficult to determine.

Figure: Length VS Time Scatter Plot (time against length) with residual plot; time = 14.7length + 35; r^2 = 0.039

The Length VS Time Scatter Plot shows a weak positive linear association between the length of the thumb and the time it takes to text. There also appears to be a pattern in the residuals, so a linear model is not appropriate. The model is expressed by the equation Time = 35 + 14.7(length), which means that for every 1 cm increase in length, the predicted time increases by 14.7 s. The correlation coefficient for this set of data is 0.197, suggesting little or no linear correlation between thumb length and the time it takes to text. Also, only about 3.9% of the variation in time can be explained by the linear model on length. The standard deviation of the residuals is 39.0113 s. Overall, a linear model is not appropriate for this data unless the data is re-expressed.

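
The figures quoted for the thumb model can be checked with a few lines of arithmetic. The sketch below uses only the coefficients and correlation reported in the text; the function name `predict_time` is illustrative, and 6.3 cm is simply the median length from Figure 1.

```python
# Fitted line as reported in the text: time = 14.7 * length + 35
SLOPE = 14.7       # seconds per centimeter of thumb length
INTERCEPT = 35.0   # seconds


def predict_time(length_cm: float) -> float:
    """Predicted texting time in seconds for a thumb length in centimeters."""
    return SLOPE * length_cm + INTERCEPT


r = 0.197           # correlation coefficient reported in the text
r_squared = r ** 2  # fraction of the variation in time explained by the line

print(f"r^2 = {r_squared:.3f}")
print(f"predicted time at the median length (6.3 cm): {predict_time(6.3):.1f} s")
print(f"change per extra centimeter: {predict_time(7.3) - predict_time(6.3):.1f} s")
```

Squaring r = 0.197 gives about 0.039, which is where the 3.9% figure comes from.
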
Re-Expression: Alligator Data

Collection 1

      length (in)  weight (lb)  lnweight
  1       58           28       3.3322
  2       61           44       3.78419
  3       63           33       3.49651
  4       68           39       3.66356
  5       69           36       3.58352
  6       72           38       3.63759
  7       72           61       4.11087
  8       74           54       3.98898
  9       74           51       3.93183
 10       76           42       3.73767
 11       78           57       4.04305
 12       82           80       4.38203
 13       85           84       4.43082
 14       86           83       4.41884
 15       86           80       4.38203
 16       86           90       4.49981
 17       88           70       4.2485
 18       89           84       4.43082
 19       90          106       4.66344
 20       90          102       4.62497
 21       94          110       4.70048
 22       94          130       4.86753
 23      114          197       5.2832
 24      128          366       5.90263
 25      147          640       6.46147

Figure 7: Scatter Plot of weight against length, with residual plot; weight = 5.90length - 393; r^2 = 0.84
Figure 8: Scatter Plot of lnweight against length, with residual plot; lnweight = 0.0354length + 1.34; r^2 = 0.96

Source:

This source of data was used after discovering the original set of data was incapable
of being re-expressed properly.

The scatterplot in Figure 7 shows a strong, positive, roughly linear association, with a correlation coefficient of 0.917, indicating a strong correlation between the weight of the alligators and their length. About 84% of the variation in weight can be explained by the linear model on length. Despite the strong correlation, the data cannot be appropriately described by a linear model, because the residual plot shows a somewhat curved pattern. Because of this pattern, the data was re-expressed using ln(weight), the natural log of the y-values, as seen in Figure 8.
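
As a rough check of the Figure 7 numbers, the least-squares line can be refit from the Collection 1 table. This is a minimal sketch in plain Python, not the software the figures were made with; the helper `fit_line` is an illustrative name.

```python
import math

# (length in inches, weight in pounds) for the 25 alligators in Collection 1
DATA = [
    (58, 28), (61, 44), (63, 33), (68, 39), (69, 36),
    (72, 38), (72, 61), (74, 54), (74, 51), (76, 42),
    (78, 57), (82, 80), (85, 84), (86, 83), (86, 80),
    (86, 90), (88, 70), (89, 84), (90, 106), (90, 102),
    (94, 110), (94, 130), (114, 197), (128, 366), (147, 640),
]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


lengths = [length for length, _ in DATA]
weights = [weight for _, weight in DATA]

slope, intercept, r = fit_line(lengths, weights)
residuals = [w - (slope * x + intercept) for x, w in DATA]

print(f"weight = {slope:.2f} * length {intercept:+.0f}")
print(f"r = {r:.3f}, r^2 = {r * r:.2f}")

# A curved relationship shows up as residuals that share a sign at both
# ends of the length range and take the opposite sign in the middle.
for (x, _), e in zip(DATA, residuals):
    print(f"length {x:3d}: residual {e:8.1f}")
```

The residuals are positive for the shortest and longest alligators and negative for many in between, which is the curved pattern described above.
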
After the data was re-expressed, the correlation coefficient increased to 0.978, indicating an even stronger linear association, and the residual plot showed a more random scatter with no clear pattern. Based on the re-expressed data, a linear model is appropriate. The new equation is ln(weight) = 0.0354(length) + 1.34, which means that for every one-inch increase in length, the predicted ln(weight) increases by 0.0354; equivalently, the predicted weight is multiplied by about e^0.0354 ≈ 1.036, an increase of roughly 3.6%.
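
The re-expression step can be sketched the same way: take the natural log of each weight, refit, and read the slope as a multiplicative change in weight. The code below is an illustrative check against the Collection 1 table, with `fit_line` again a hypothetical helper name rather than anything from the original analysis.

```python
import math

# (length in inches, weight in pounds) for the 25 alligators in Collection 1
DATA = [
    (58, 28), (61, 44), (63, 33), (68, 39), (69, 36),
    (72, 38), (72, 61), (74, 54), (74, 51), (76, 42),
    (78, 57), (82, 80), (85, 84), (86, 83), (86, 80),
    (86, 90), (88, 70), (89, 84), (90, 106), (90, 102),
    (94, 110), (94, 130), (114, 197), (128, 366), (147, 640),
]


def fit_line(xs, ys):
    """Ordinary least-squares fit of y on x; returns (slope, intercept, r)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


lengths = [length for length, _ in DATA]
ln_weights = [math.log(weight) for _, weight in DATA]  # re-expressed y-values

slope, intercept, r = fit_line(lengths, ln_weights)

# A slope on the log scale is a multiplicative factor on the original scale.
growth_per_inch = math.exp(slope)

print(f"ln(weight) = {slope:.4f} * length + {intercept:.2f}")
print(f"r = {r:.3f}, r^2 = {r * r:.2f}")
print(f"each extra inch multiplies predicted weight by {growth_per_inch:.3f}")
```
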
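If an alligator were 180 inches long, the model gives ln(weight) = 0.0354(180) + 1.34 = 7.712, so the predicted weight would be e^7.712 ≈ 2235.01 pounds, which follows the line in Figure 8. Because 180 inches lies beyond the longest alligator in the data (147 inches), this prediction is an extrapolation.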
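
The back-transformation from ln(weight) to pounds is a single exponentiation. A short sketch, using the rounded coefficients reported in the text (the function names are illustrative):

```python
import math

# Coefficients of the re-expressed model reported in the text:
# ln(weight) = 0.0354 * length + 1.34
SLOPE = 0.0354
INTERCEPT = 1.34


def predict_ln_weight(length_in: float) -> float:
    """Predicted natural log of weight for a length in inches."""
    return SLOPE * length_in + INTERCEPT


def predict_weight(length_in: float) -> float:
    """Predicted weight in pounds, undoing the log with exp()."""
    return math.exp(predict_ln_weight(length_in))


ln_w = predict_ln_weight(180)
print(f"ln(weight) = {ln_w:.3f}")
print(f"weight = {predict_weight(180):.2f} pounds")
```
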
