#1660908-Data Management and Statistical Computing

This document summarizes a student's DMC assignment 2. It includes questions analyzing diabetes data using R and Stata. Key analyses include creating new variables, data visualizations like scatter plots and box plots, and exploring distributions of variables related to meningitis data. Statistical tests examine relationships between variables and differences between groups.

Uploaded by

RONALD MUSUNGU

DMC Assignment 2 1

DMC ASSIGNMENT 2

By Student Name

Course

Tutor

University

City and State

Date

DMC Assignment 2

Part 1: R

Question 1

> library(readr)
Warning message:
package ‘readr’ was built under R version 3.6.3
> Diabetes <- read_csv("Diabetes.csv")
Parsed with column specification:
cols(
id = col_double(),
location = col_character(),
staff = col_character(),
age = col_double(),
gender = col_character(),
height = col_double(),
weight = col_double(),
waist = col_double(),
hip = col_double(),
bp1s = col_double(),
bp1d = col_double(),
bp2s = col_double(),
bp2d = col_double(),
glyhb = col_double()
)
>
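> # Convert the categorical columns to factors inside the data frame,
> # then attach it so later commands can refer to columns by name.
> Diabetes$location <- factor(Diabetes$location)
> Diabetes$gender <- factor(Diabetes$gender)
> Diabetes$staff <- factor(Diabetes$staff)
> attach(Diabetes)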

Question 2

a) Creating variable diabetes

> diabetes<-ifelse(glyhb>7, "diabetic", "nondiabetic")

b) Creating bar chart

> Diabetes[age<30,"age_group"]<-"Under 30"

> Diabetes[age>=30&age<40, "age_group"]<-"30 to <40"
> Diabetes[age>=40&age<50, "age_group"]<-"40 to <50"
> Diabetes[age>=50&age<60, "age_group"]<-"50 to <60"
> Diabetes[age>=60, "age_group"]<-"60 and over"

> library(ggplot2)
> tbl<-with(Diabetes, table(age_group, diabetes))
> ggplot(as.data.frame(tbl), aes(age_group, Freq, fill=diabetes))+
scale_x_discrete("Age group", limits=c("Under 30", "30 to <40",
"40 to <50", "50 to <60", "60 and over"))+
geom_col(position='dodge')+
ggtitle("Diabetes Status by Age Group")+
ylab('Number of People')
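
The five age bands above can also be assigned in a single call. The following is a minimal sketch using base R's cut(); the ages vector is made-up illustration data, not values from Diabetes.csv.

```r
# Hypothetical ages for illustration only (not from Diabetes.csv)
ages <- c(19, 29, 30, 39, 40, 55, 60, 83)

# right = FALSE makes each interval closed on the left, e.g. [30, 40),
# which matches the age >= 30 & age < 40 conditions used above
age_group <- cut(ages,
                 breaks = c(-Inf, 30, 40, 50, 60, Inf),
                 labels = c("Under 30", "30 to <40", "40 to <50",
                            "50 to <60", "60 and over"),
                 right = FALSE)

# The factor levels keep the band order, so a bar chart's x-axis is
# ordered by age rather than alphabetically
table(age_group)
```

Because the result is a factor with ordered levels, no separate relabelling of the x-axis is needed when it is plotted.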

Question 3

> boxplot(weight, main="Boxplot for Weight")

> boxplot(height, main="Boxplot for Height")
> boxplot(waist, main="Boxplot for Waist Circumference")
> boxplot(hip, main="Boxplot for Hip Circumference")

[Figures: boxplots for weight, height, waist circumference, and hip circumference]

Checking for patterns of data entry errors

> which(is.na(weight))
[1] 162
> which(is.na(height))
[1] 64 87 196 232 318
> which(is.na(waist))
[1] 337 394
> which(is.na(hip))
[1] 337 394

Setting NA values to 0

> Diabetes[is.na(Diabetes)]=0
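
Missing values can also be counted per column in one step before deciding how to handle them. A minimal sketch on a small made-up data frame (the df object and its values are hypothetical):

```r
# Hypothetical measurements with a few missing entries
df <- data.frame(weight = c(121, NA, 256),
                 height = c(62, 64, NA),
                 waist  = c(29, 46, 49),
                 hip    = c(38, 48, 57))

# Number of missing values in each column
na_counts <- colSums(is.na(df))
na_counts

# Row numbers that have at least one missing value
rows_with_na <- which(!complete.cases(df))
rows_with_na
```

Note that replacing NA values changes only the data frame itself. Vectors made available earlier by attach() are copies and keep their NA values until the data frame is attached again.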

Question 4

> library(car)  # provides scatterplot()
> Diabetes$waist_hip_ratio<-Diabetes$waist/Diabetes$hip
> scatterplot(glyhb~waist_hip_ratio|gender, data = Diabetes,
main="Scatterplot")

A scatter plot is the appropriate graph for examining the relationship between waist-hip ratio and diabetes, measured here by the glyhb variable. Both variables are continuous, so each person can be plotted as a single point and any trend read from the pattern of points (Sarikaya and Gleicher, 2017, 402). The two variables can also be used to estimate each other (Schmidt and Finan, 2018, 160).
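
The strength of the linear association seen in a scatter plot can be summarised with a correlation coefficient. A minimal sketch with made-up values (whr and hb are hypothetical stand-ins for waist_hip_ratio and glyhb):

```r
# Hypothetical values for illustration only
whr <- c(0.78, 0.82, 0.85, 0.90, 0.95, 1.01)
hb  <- c(4.3, 4.9, 4.6, 5.8, 6.9, 7.4)

# Pearson correlation; use = "complete.obs" drops pairs with a missing value
r <- cor(whr, hb, use = "complete.obs")
r
```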

Question 5

i. Create variables for systolic and diastolic blood pressure

ii. Create a new variable map


iii. Scatterplot of MAP by age, where males and females are on the same graph

iv. The mean and standard deviation of MAP

v.
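
Steps i, ii and iv above can be sketched as follows. This is an illustration under stated assumptions, not the code used in the assignment: it assumes systolic and diastolic pressure are taken as the mean of the two readings (bp1s, bp2s and bp1d, bp2d) and uses the common approximation MAP = diastolic + (systolic - diastolic)/3. The readings are made up.

```r
# Hypothetical readings for three people (not from Diabetes.csv)
bp <- data.frame(bp1s = c(118, 150, 132), bp2s = c(122, 146, NA),
                 bp1d = c(78, 94, 84),    bp2d = c(82, 90, NA))

# i. Assumed definition: average the available readings for each person
bp$systolic  <- rowMeans(bp[, c("bp1s", "bp2s")], na.rm = TRUE)
bp$diastolic <- rowMeans(bp[, c("bp1d", "bp2d")], na.rm = TRUE)

# ii. Mean arterial pressure (common approximation)
bp$map <- bp$diastolic + (bp$systolic - bp$diastolic) / 3

# iv. Mean and standard deviation of MAP
mean(bp$map, na.rm = TRUE)
sd(bp$map, na.rm = TRUE)
```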

Part 2: Stata

Question 1

a) Recoding variables

b) Convert other string variables to numeric variables


Question 2

a) New set of variables for Blood Glucose, CSF Glucose, and CSF Protein

gen bldgluc_2=(bldgluc_1*1/18)

gen csfgluc_2=(csfgluc_1*1/18)

gen csfpr_2=(csfpr_1*1/18)

b) Reason for capturing data in two different units

Recording the data in two units of measurement lets readers in different countries interpret the values without converting them first (Chehregosha et al., 2019, 856). For instance, the U.S. reports glucose in mg/dL while the U.K. reports it in mmol/L. Reporting both units therefore eases communication between countries, since a glucose value in mg/dL is divided by 18 to give mmol/L and a value in mmol/L is multiplied by 18 to give mg/dL (Zhu, Volkening, and Laffel, 2020, 24).
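
The factor of 18 comes from the molar mass of glucose, about 180 g/mol: 1 mmol/L of glucose is about 180 mg/L, or 18 mg/dL. A minimal R sketch of the two conversions (the function names are hypothetical):

```r
# Approximate conversion factor for glucose (molar mass ~180 g/mol)
GLUCOSE_FACTOR <- 18

# mg/dL -> mmol/L
mgdl_to_mmol <- function(x) x / GLUCOSE_FACTOR
# mmol/L -> mg/dL
mmol_to_mgdl <- function(x) x * GLUCOSE_FACTOR

mgdl_to_mmol(90)   # 5
mmol_to_mgdl(7)    # 126
```

Because the factor depends on the molar mass of the substance being measured, the value 18 is specific to glucose.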

c) The overall distribution of the variables

Histogram of the Distribution for Blood Glucose

[Figure: histogram "Distribution of Blood Glucose (mmol/L)"; x-axis: Blood Glucose (mmol/L), y-axis: Frequency]

The distribution of Blood Glucose is positively skewed: most values sit at the low end of the scale, with a long tail extending to the right (Boels et al., 2019, 12). The distribution is unimodal.

Histogram of the Distribution for CSF Glucose

[Figure: histogram "Distribution of CSF Glucose (mmol/L)"; x-axis: CSF Glucose (mmol/L), y-axis: Frequency]

The distribution of CSF Glucose is right-skewed: most values lie near the low end of the scale, and a long tail extends to the right (Ravignani, 2017, 562).

Histogram of the Distribution for CSF Protein

[Figure: histogram "Distribution of CSF Protein (mmol/L)"; x-axis: CSF Protein (mmol/L), y-axis: Frequency]

The distribution of CSF Protein is positively skewed. The distribution is unimodal since it

has one peak (Ash et al., 2017, 5).
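
Skewness can also be checked numerically. The sketch below computes the moment-based sample skewness for simulated right-skewed data; the sample_skewness helper and the simulated values are illustrative, not part of the assignment.

```r
# Moment-based sample skewness: positive values indicate a longer right tail
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^1.5
}

set.seed(1)
# Simulated right-skewed values (log-normal), for illustration only
x <- rlnorm(1000, meanlog = 1, sdlog = 0.6)

sample_skewness(x)
# In a right-skewed sample the mean typically exceeds the median
mean(x) > median(x)
```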


d) Log-transformation of CSF glucose and CSF protein variables and their distribution

[Figure: histogram "Distribution of Log-transformed CSF Glucose (mmol/L)"; x-axis: log(CSF Glucose), y-axis: Frequency]

A log transformation is used to pull in the long tail of highly skewed data so that the distribution is closer to normal (Curran-Everett, 2018, 344). The earlier histogram showed that CSF Glucose was not normally distributed. After the log transformation, the histogram of log(CSF Glucose) approximately follows the normal bell curve, as shown in the figure above (Asar, Ilk and Dag, 2017, 93).

[Figure: histogram "Distribution of Log-transformed CSF Protein (mmol/L)"; x-axis: log(CSF Protein), y-axis: Frequency]

The distribution of CSF Protein was not normal. After a log transformation the distribution is closer to normal, as illustrated in the histogram above (Templeton and Burney, 2017, 156): the histogram is roughly symmetric around a single peak, like the normal distribution curve (Mai and Mirarab, 2021, 1156).
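
The effect of the transformation can be illustrated on simulated data. This is a minimal sketch, assuming strictly positive values (log() is not defined at zero or below); the data are simulated, not the meningitis measurements.

```r
set.seed(2)
# Simulated right-skewed, strictly positive values (illustration only)
x <- rlnorm(500, meanlog = 0, sdlog = 1)
log_x <- log(x)

# Shapiro-Wilk test: a small p-value suggests departure from normality
p_raw <- shapiro.test(x)$p.value
p_log <- shapiro.test(log_x)$p.value
c(raw = p_raw, log = p_log)

# The gap between mean and median shrinks after the transformation
c(raw = mean(x) - median(x), log = mean(log_x) - median(log_x))
```

A formal test such as Shapiro-Wilk complements the visual check of the histogram; it does not replace it.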

Question 3

A cross-tabulation organizes data in a table to display the relationship between two variables (Dasheva, Andonov and Doncheva, 2020, 13). Box plots illustrate the distribution of data, including the median, the interquartile range and possible outliers (Ho et al., 2019, 565). The box plots and cross-tabulations below show the differences in interquartile range and median of CSF Glucose by grampos and abm.

[Figure: box plot "CSF Glucose by Gram Positive"; csfgluc_1 for the Gram Negative and Gram Positive groups]

From the cross-tabulation and box plot above, CSF glucose is higher in the Gram Positive group than in the Gram Negative group (Gogtay and Thatte, 2017, 79). One outlier exists in the Gram Positive group.

An illustration of the differences in interquartile range and median in CSF glucose by abm is

shown below.
[Figure: box plot of CSF glucose (csfgluc_1) by abm; groups: Viral Meningitis, Bacterial Meningitis, missing]

As depicted by the cross-tabulation and box plot above, there is a higher level of CSF Glucose in the Viral Meningitis group than in the Bacterial Meningitis group.
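
The medians and interquartile ranges compared in the box plots can also be tabulated directly. The analysis here was run in Stata; the following is a minimal R sketch of the same kind of summary on made-up values (the csf data frame and its values are hypothetical, only the variable names csfgluc_1 and abm come from the assignment).

```r
# Hypothetical CSF glucose values for two groups (illustration only)
csf <- data.frame(
  abm = c("Viral", "Viral", "Viral", "Viral",
          "Bacterial", "Bacterial", "Bacterial", "Bacterial"),
  csfgluc_1 = c(60, 64, 70, 58, 20, 35, 28, 41)
)

# Median and interquartile range of csfgluc_1 within each group
med <- tapply(csf$csfgluc_1, csf$abm, median)
iqr <- tapply(csf$csfgluc_1, csf$abm, IQR)
cbind(median = med, IQR = iqr)
```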

Question 4
[Figure: cumul_prev plotted against month]

The plot of cumul_prev against month above shows infections accumulating over time. The plot shows an increase in Viral Meningitis, which tends to raise CSF Glucose levels in the population studied.
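
A cumulative curve of this kind is a running total of cases expressed as a share of all cases. The following is a minimal R sketch, assuming cumul_prev is such a cumulative proportion and that monthly case counts are available (the counts below are made up):

```r
# Hypothetical number of new cases in each of 12 months
cases <- c(3, 5, 8, 12, 15, 20, 18, 10, 6, 2, 1, 0)
month <- seq_along(cases)

# Running total divided by the overall total gives a curve from 0 to 1
cumul <- cumsum(cases) / sum(cases)

# The curve never decreases and ends at 1
data.frame(month, cumul)
```

Because a running total cannot decrease, it is the steepness of the curve, not the fact that it rises, that indicates when cases accumulate fastest.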


References

Asar, Ö., Ilk, O. and Dag, O., 2017. Estimating Box-Cox power transformation parameter via goodness-of-fit tests. Communications in Statistics-Simulation and Computation, 46(1), pp.91-105.

Ash, S.Y., Harmouche, R., Vallejo, D.L.L., Villalba, J.A., Ostridge, K., Gunville, R., Come, C.E., Onieva, J.O., Ross, J.C., Hunninghake, G.M. and El-Chemaly, S.Y., 2017. Densitometric and local histogram based analysis of computed tomography images in patients with idiopathic pulmonary fibrosis. Respiratory Research, 18(1), pp.1-11.

Boels, L., Bakker, A., Van Dooren, W. and Drijvers, P., 2019. Conceptual difficulties when interpreting histograms: A review. Educational Research Review, 28(3), pp.1-15.

Chehregosha, H., Khamseh, M.E., Malek, M., Hosseinpanah, F. and Ismail-Beigi, F., 2019. A view beyond HbA1c: role of continuous glucose monitoring. Diabetes Therapy, 10(3), pp.853-863.

Curran-Everett, D., 2018. Explorations in statistics: the log transformation. Advances in Physiology Education, 42(2), pp.343-347.

Dasheva, D., Andonov, H. and Doncheva, L., 2020. Master’s Program High Performance Sport E-Learning during COVID-19 Pandemic. Педагогика, 92(S7), pp.9-16.

Gogtay, N.J. and Thatte, U.M., 2017. Principles of correlation analysis. Journal of the Association of Physicians of India, 65(3), pp.78-81.

Ho, J., Tumkaya, T., Aryal, S., Choi, H. and Claridge-Chang, A., 2019. Moving beyond P values: data analysis with estimation graphics. Nature Methods, 16(7), pp.565-566.

Mai, U. and Mirarab, S., 2021. Log Transformation Improves Dating of Phylogenies. Molecular Biology and Evolution, 38(3), pp.1151-1167.

Ravignani, A., 2017. Visualizing and interpreting rhythmic patterns using phase space plots. Music Perception: An Interdisciplinary Journal, 34(5), pp.557-568.

Sarikaya, A. and Gleicher, M., 2017. Scatterplots: Tasks, data, and designs. IEEE Transactions on Visualization and Computer Graphics, 24(1), pp.402-412.

Schmidt, A.F. and Finan, C., 2018. Linear regression and the normality assumption. Journal of Clinical Epidemiology, 98, pp.146-151.

Templeton, G.F. and Burney, L.L., 2017. Using a two-step transformation to address non-normality from a business value of information technology perspective. Journal of Information Systems, 31(2), pp.149-164.

Zhu, J., Volkening, L.K. and Laffel, L.M., 2020. Distinct patterns of daily glucose variability by pubertal status in youth with type 1 diabetes. Diabetes Care, 43(1), pp.22-28.
