Exercises

The document contains 6 challenges involving creating and manipulating vectors, matrices, arrays, and data frames in R. Challenge I involves creating vectors for height and weight data and answering questions about averages, variances, standard deviations, and extracting elements. Challenge II involves creating a matrix from a table and answering questions using matrix operations. Challenge III represents a table as an array and answers questions using array operations. Challenge IV creates a list containing risk ratio results from previous challenges. Challenge V creates a data frame from vectors and extracts a row. Challenge VI identifies errors in sample R code.

Challenges:

Day - 1 (Session - 1)

Challenge - I

Create vectors named height and weight using the following data:
height : 160.3, 134.2, 159, 149, 145, and 147.1
weight : 83.8, 37.2, 71.7, 72.8, 50.5, and 42.9.

i) Based on the above vectors, answer the following questions:

a) The average height is __________

b) The variance of height is __________

c) The SD of height is __________

d) The average weight is __________

e) The variance of weight is __________

f) The SD of weight is __________

ii) Extract the 4th element of the weight and height vectors

a) 4th element in weight __________

b) 4th element in height __________

iii) Based on the above vectors, calculate BMI

a) Calculate BMI __________

b) Extract the 4th element in BMI vector __________
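
A minimal sketch of Challenge I in base R. It assumes the heights are in centimetres and the weights in kilograms (the exercise does not state units), so BMI is taken as weight divided by height in metres squared. The name bmi is chosen here for illustration.

```r
# Data from the exercise
height <- c(160.3, 134.2, 159, 149, 145, 147.1)
weight <- c(83.8, 37.2, 71.7, 72.8, 50.5, 42.9)

# i) Summary statistics; var() and sd() use the sample formulas (divide by n - 1)
mean(height); var(height); sd(height)
mean(weight); var(weight); sd(weight)

# ii) R indexes from 1, so the 4th element is [4]
weight[4]
height[4]

# iii) BMI, ASSUMING height is in cm and weight is in kg
bmi <- weight / (height / 100)^2
bmi[4]
```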

Challenge - II

Create a matrix using the following table, and answer the following questions using matrix
operations

a) The total number of smokers __________.

(Hint rowSums(___))

b) The total number of non smokers __________.

(Hint rowSums(___))

c) Incidence of CHD among smokers, A/(A+B): __________.

d) Incidence of CHD among non smokers, C/(C+D): __________.

e) Risk ratio of CHD, [A/(A+B)] / [C/(C+D)]: __________.
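
The matrix operations can be sketched as below. The counts are hypothetical stand-ins, not the values from the exercise table, and the layout (rows for smoking status, columns for CHD status, cells A to D) is an assumption based on the rowSums() hint.

```r
# HYPOTHETICAL counts: substitute the values from the exercise table
chd <- matrix(
  c(30, 70,    # smokers:     A (CHD), B (no CHD)
    10, 90),   # non-smokers: C (CHD), D (no CHD)
  nrow = 2, byrow = TRUE,
  dimnames = list(c("smoker", "non_smoker"), c("chd", "no_chd"))
)

totals     <- rowSums(chd)             # a), b): A+B and C+D
incidence  <- chd[, "chd"] / totals    # c), d): A/(A+B) and C/(C+D)
risk_ratio <- incidence["smoker"] / incidence["non_smoker"]   # e)

totals
incidence
risk_ratio
```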

Challenge - III

Represent the following table using an array, and answer the following using array operations

a) In rural, incidence of CHD among smokers __________.

b) In rural, incidence of CHD among non smokers __________.

c) In rural, what is the risk ratio of CHD __________.

d) In urban, incidence of CHD among smokers __________.

e) In urban, incidence of CHD among non smokers __________.

f) In urban, risk ratio of CHD __________.
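
One way to approach this is a 2 x 2 x 2 array with one layer per area. The counts and dimension names below are hypothetical, since the exercise table is not reproduced here; only the array operations are the point of the sketch.

```r
# HYPOTHETICAL counts: substitute the values from the exercise table.
# array() fills column-wise within each layer: the CHD column first
# (smoker, non-smoker), then the no-CHD column.
counts <- array(
  c(20, 5, 80, 95,     # rural layer
    30, 10, 70, 90),   # urban layer
  dim = c(2, 2, 2),
  dimnames = list(
    smoking = c("smoker", "non_smoker"),
    chd     = c("yes", "no"),
    area    = c("rural", "urban")
  )
)

totals     <- apply(counts, c(1, 3), sum)   # people per smoking status and area
incidence  <- counts[, "yes", ] / totals    # a), b), d), e)
risk_ratio <- incidence["smoker", ] / incidence["non_smoker", ]   # c), f)

incidence
risk_ratio
```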

Challenge - IV

4) Create a list that contains the overall risk ratio (Challenge II), the rural risk ratio
(Challenge IIIc), and the urban risk ratio (Challenge IIIf)
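
A short sketch of a named list. The three numbers are placeholders, not answers; in practice they would be the ratios computed in Challenges II and III.

```r
# PLACEHOLDER values: replace with the ratios computed earlier
risk_ratios <- list(overall = 3.5, rural = 4, urban = 3)

risk_ratios$rural        # access an element by name
risk_ratios[["urban"]]   # equivalent double-bracket access
str(risk_ratios)         # compact overview of the list
```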

Challenge - V

5) Write an R program to create a data frame from the following data:

unique_id vector: C10001, C10002, and C10003;
treatment vector: A, B, and C;
age vector: 29, 30, and 28.

Then extract the entire 3rd row.
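
A minimal sketch in base R. The data frame name study and the column name unique_id are chosen here for illustration.

```r
study <- data.frame(
  unique_id = c("C10001", "C10002", "C10003"),
  treatment = c("A", "B", "C"),
  age       = c(29, 30, 28)
)

# Row index first; leaving the column index empty keeps all columns
third_row <- study[3, ]
third_row
```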

Challenge - VI

Find the errors in the following R code

a) temp <- c(99.4, 102.3; 100.3)

b) Suppose mat is a 2x2 matrix. To extract the element in the 2nd row and 1st column, will
the command mat(2;1) work?

c) hba1c% <- c(16.4, 11.0, 10.3, 12.4)

d) vector <- c(13, 7A, 11, 30)

e) The R command to view the last 6 rows of the data frame df is str(df).
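
For comparison, one possible set of corrected statements is sketched below. The replacement name hba1c_pct and the choice of 7 in (d) are assumptions; other fixes are equally valid.

```r
# a) Elements inside c() are separated by commas, not semicolons
temp <- c(99.4, 102.3, 100.3)

# b) Matrices are indexed with square brackets and a comma: mat[row, column]
mat <- matrix(1:4, nrow = 2)
mat[2, 1]

# c) % is not allowed in an ordinary variable name; hba1c_pct is one alternative
hba1c_pct <- c(16.4, 11.0, 10.3, 12.4)

# d) 7A is neither a number nor a quoted string; ASSUMING 7 was intended
vector <- c(13, 7, 11, 30)

# e) str() shows the structure; tail() shows the last 6 rows by default
df <- data.frame(x = 1:10)
tail(df)
```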

Day - 1 (Session - 2)

In this hypothetical study, data from 25 individuals have been collected to explore the
relationship between demographic factors, systolic blood pressure, hypertension, and the
effectiveness of two types of drugs, A and B.

Let's work through these questions to carry out the data cleaning process.

1) Import the exercise data from the directory (File name is Exercise_data-Day1.csv)

i) How many variables are there in the datasheet? __________ (Hint ____ %>%
dim())

ii) How many observations does the datasheet have? __________
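
The import step might look like the sketch below. Because the exercise file is not available here, the sketch writes a small made-up CSV to a temporary file so that it runs on its own; the object name bp_data is an assumption, and the %>% pipe is loaded from dplyr.

```r
library(dplyr)   # provides the %>% pipe

# Stand-in file with MADE-UP rows; in the exercise, read
# "Exercise_data-Day1.csv" from your working directory instead
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Gender,Height.in.cms,Weight.in.kgs",
  "Male,170,72",
  "Female,158,60",
  "Male,165,80"
), csv_path)

bp_data <- read.csv(csv_path)

dims <- bp_data %>% dim()   # c(number of rows, number of columns)
dims[1]   # observations
dims[2]   # variables
```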

2) Rename the variables as follows (Hint ___ %>% rename())

i) “Height.in.cms” as height

ii) “Weight.in.kgs” as weight

iii) “Type.of.drug.given” as drugType

3) Assign variable labels as follows (Hint ___ %>% set_variable_labels())

i) height as Height (in Cms)

ii) weight as Weight (in Kgs)

iii) drugType as Type of Drug
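
Renaming and labelling can be chained in one pipeline, as in this sketch. It assumes set_variable_labels() comes from the labelled package (the exercise does not name the package) and uses a two-row made-up data frame in place of the imported data.

```r
library(dplyr)
library(labelled)   # ASSUMED source of set_variable_labels() and var_label()

# Made-up rows carrying the original column names from the exercise
bp_data <- data.frame(
  Height.in.cms      = c(170, 158),
  Weight.in.kgs      = c(72, 60),
  Type.of.drug.given = c("A", "B")
)

bp_data <- bp_data %>%
  rename(                        # rename() takes new_name = old_name
    height   = Height.in.cms,
    weight   = Weight.in.kgs,
    drugType = Type.of.drug.given
  ) %>%
  set_variable_labels(
    height   = "Height (in Cms)",
    weight   = "Weight (in Kgs)",
    drugType = "Type of Drug"
  )

var_label(bp_data$height)   # check one label
```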

4) Recode the values of the following variables (Hint ___ %>% recode())

i) Hypertension, Yes=1 and No=0

ii) Gender, Male=1 and Female=2

5) Assign value labels to the following (Hint ___ %>% set_value_labels())

i) In Hypertension 0 as “NO” and 1 as “YES”

ii) In Gender 1 as “Male” and 2 as “Female”
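
A sketch of recoding followed by value labelling. It assumes the raw values are spelled exactly "Yes"/"No" and "Male"/"Female", that recode() is the dplyr function, and that set_value_labels() comes from the labelled package; the rows are made up.

```r
library(dplyr)
library(labelled)   # ASSUMED source of set_value_labels() and val_labels()

# Made-up rows; the raw spellings are an assumption
bp_data <- data.frame(
  Hypertension = c("Yes", "No", "Yes"),
  Gender       = c("Male", "Female", "Female")
)

bp_data <- bp_data %>%
  mutate(
    # recode() maps old values (left) to new values (right)
    Hypertension = recode(Hypertension, "Yes" = 1, "No" = 0),
    Gender       = recode(Gender, "Male" = 1, "Female" = 2)
  ) %>%
  set_value_labels(
    # value labels are written as c("label" = value)
    Hypertension = c("NO" = 0, "YES" = 1),
    Gender       = c("Male" = 1, "Female" = 2)
  )

val_labels(bp_data$Hypertension)
```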

6) How many people participated in the study from urban? __________ (Hint ___
%>% filter( ))

7) How many individuals took drug A? __________ (Hint ___ %>% filter( ))

8) How many individuals took drug B? __________ (Hint ___ %>% filter( ))
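
Counting rows that satisfy a condition can be sketched with filter() followed by nrow(). The column name dwelling and the value "Urban" are assumptions about the Day 1 file, and the rows are made up.

```r
library(dplyr)

# Made-up rows; column names and value spellings are ASSUMED
bp_data <- data.frame(
  dwelling = c("Urban", "Rural", "Urban", "Urban"),
  drugType = c("A", "B", "B", "A")
)

n_urban  <- bp_data %>% filter(dwelling == "Urban") %>% nrow()
n_drug_a <- bp_data %>% filter(drugType == "A") %>% nrow()
n_drug_b <- bp_data %>% filter(drugType == "B") %>% nrow()

c(urban = n_urban, drug_a = n_drug_a, drug_b = n_drug_b)
```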

9) Find the duplicates. How many duplicate pairs did you find? __________

(Hint ___ %>% filter(duplicated(-----)))

10) Find the missing data for the variable Systolic Blood pressure (mmHg). (Hint
filter(is.na(-----)))

How many missing values were discovered?__________

11) Identify the outliers in Systolic Blood Pressure (mmHg). (Hint use the range
80-160)

How many outliers were found? __________
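
The three checks in questions 9 to 11 can be sketched as follows. The column names id and systolic are hypothetical, the rows are made up, and values outside 80-160 are treated as outliers, which is one reading of the hint.

```r
library(dplyr)

# Made-up rows; column names are HYPOTHETICAL
bp_data <- data.frame(
  id       = c(1, 2, 2, 3, 4, 5),
  systolic = c(120, 135, 135, NA, 210, 75)
)

# 9) duplicated() flags the second and later copies of a fully identical row
duplicate_rows <- bp_data %>% filter(duplicated(.))

# 10) Rows where systolic blood pressure is missing
missing_sbp <- bp_data %>% filter(is.na(systolic))

# 11) Rows outside 80-160; filter() drops rows where the condition is NA
outliers <- bp_data %>% filter(systolic < 80 | systolic > 160)

nrow(duplicate_rows); nrow(missing_sbp); nrow(outliers)
```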

12) Prepare a summary table of diastolic blood pressure by drug type with count, mean,
median, and SD (Hint ___ %>% group_by(___) %>% summarise(___))

i) What is the mean diastolic blood pressure for A __________

ii) What is the median diastolic blood pressure for B __________
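
A grouped summary along the lines of the hint might look like this. The column name diastolic and the output column names are hypothetical, and the rows are made up.

```r
library(dplyr)

# Made-up rows; the column name diastolic is HYPOTHETICAL
bp_data <- data.frame(
  drugType  = c("A", "A", "A", "B", "B", "B"),
  diastolic = c(80, 90, 85, 70, 75, 95)
)

dbp_summary <- bp_data %>%
  group_by(drugType) %>%
  summarise(
    count      = n(),
    mean_dbp   = mean(diastolic, na.rm = TRUE),
    median_dbp = median(diastolic, na.rm = TRUE),
    sd_dbp     = sd(diastolic, na.rm = TRUE)
  )

dbp_summary
```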

Day - 2 (Session - 1)

Let us create some data visualizations to understand how effective the drugs are in treating
blood pressure, and to see whether there are any baseline differences or differences in the
outcomes: hypertension, systolic BP, and diastolic BP.

Import exercise data (Exercise_data-Day2.csv) from the directory.

1. Use the ggplot2 package to plot a bar graph of the hypertension response (Univariate
bar graph). Which response has the highest frequency? __________

2. Could you add drug type in the bar chart for hypertension? (Bivariate grouped bar
chart). How many people who indicated they had hypertension also took drug A?
__________

(Hint ggplot(aes(x=____, y=____, fill=____)))

3. Could you now add the dwelling type to the previous bar graph? To include the
location in the bar graph, use the facet_wrap() function. What type of distribution does
the graph look like for large city? __________

(Hint facet_wrap(~____))
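
The three bar charts can be built up step by step, as sketched here. The column names hyper, drugType, and dwelling follow the variable names used in Session 2, but the rows and value spellings are made up. The sketch uses geom_bar(), which counts rows itself and so needs no y aesthetic; with pre-counted data, a y aesthetic and geom_col() would be used instead.

```r
library(ggplot2)

# Made-up rows; value spellings are ASSUMED
bp_data <- data.frame(
  hyper    = c("Yes", "No", "Yes", "Yes", "No", "No"),
  drugType = c("A", "A", "B", "B", "A", "B"),
  dwelling = c("Large city", "Town", "Large city",
               "Small city", "Town", "Large city")
)

# 1. Univariate bar graph: one bar per hypertension response
p_uni <- ggplot(bp_data, aes(x = hyper)) +
  geom_bar()

# 2. Grouped bar chart: fill goes inside aes() so it maps to a column
p_grouped <- ggplot(bp_data, aes(x = hyper, fill = drugType)) +
  geom_bar(position = "dodge")

# 3. One panel per dwelling type
p_faceted <- p_grouped + facet_wrap(~dwelling)

p_faceted
```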

4. Draw a density chart for systolic blood pressure (Univariate chart). What type of
distribution does the graph look like? __________

a) Right skewed-distribution

b) Left skewed-distribution

c) Normal distribution

d) Uniform distribution
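
A density chart can be sketched with geom_density(). The data below are simulated from a normal distribution purely so the code runs, so the resulting shape says nothing about the exercise data; the column name systolic is hypothetical.

```r
library(ggplot2)

# SIMULATED values; the column name systolic is HYPOTHETICAL
set.seed(1)
bp_data <- data.frame(systolic = rnorm(50, mean = 125, sd = 15))

p_density <- ggplot(bp_data, aes(x = systolic)) +
  geom_density()

p_density
```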

5. Create a box plot to represent systolic blood pressure by drug type (Bivariate box
plot). What is the median blood pressure for each drug type? __________

6. Using facet_wrap(), add the type of dwelling to the previous graph. Which sort of
dwelling has the highest blood pressure when using drug B? __________
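
A sketch of the box plot and its faceted version, on simulated data with a hypothetical systolic column. Medians are easier to read from a table than from the plot, so the sketch also computes them with aggregate().

```r
library(ggplot2)

# SIMULATED values; the column name systolic is HYPOTHETICAL
set.seed(2)
bp_data <- data.frame(
  drugType = rep(c("A", "B"), each = 15),
  dwelling = rep(c("Small city", "Large city", "Town"), times = 10),
  systolic = rnorm(30, mean = 125, sd = 15)
)

# 5. One box per drug type
p_box <- ggplot(bp_data, aes(x = drugType, y = systolic)) +
  geom_boxplot()

# 6. Same plot, one panel per dwelling type
p_box_faceted <- p_box + facet_wrap(~dwelling)

# Numeric medians per drug type, to confirm what the plot shows
medians <- aggregate(systolic ~ drugType, data = bp_data, FUN = median)
medians
```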

7. Use a scatter plot to show systolic against diastolic pressure (Bivariate
graph).

What is the relationship between systolic and diastolic blood pressure? __________

a) No association

b) Positive association

c) Negative association
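
A scatter plot sketch with geom_point(). The data are simulated with a built-in positive relationship, so the result is illustrative only; the column names are hypothetical. cor() gives a numeric check of the direction of the association.

```r
library(ggplot2)

# SIMULATED values with a positive relationship built in;
# column names are HYPOTHETICAL
set.seed(3)
systolic <- rnorm(40, mean = 125, sd = 15)
bp_data <- data.frame(
  systolic  = systolic,
  diastolic = 0.6 * systolic + rnorm(40, mean = 5, sd = 5)
)

p_scatter <- ggplot(bp_data, aes(x = systolic, y = diastolic)) +
  geom_point()

# Sign of the correlation indicates the direction of the association
r <- cor(bp_data$systolic, bp_data$diastolic)
r
```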

Day - 2 (Session - 2)

Create summary tables for the following conditions. Then, fill in the blanks.

Import exercise data (Exercise_data-Day2.csv) from the directory.

1) Prepare summary statistics for the variables sex, dwelling, and drugType.

Variable          n (%)

Gender
  - Male          __________
  - Female        __________

Location
  - Small city    __________
  - Large city    __________
  - Town          __________

Drug type
  - Type A        __________
  - Type B        __________
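
The exercise does not say which package to use for the table. As one option, the n (%) cells can be sketched in base R with table() and prop.table(); the helper summarise_counts() is hypothetical and the rows are made up.

```r
# Made-up rows using the variable names from the exercise
bp_data <- data.frame(
  sex      = c("Male", "Female", "Female", "Male", "Female"),
  dwelling = c("Town", "Large city", "Small city", "Town", "Large city"),
  drugType = c("A", "B", "A", "B", "A")
)

# HYPOTHETICAL helper: formats each category as "n (pct%)"
summarise_counts <- function(x) {
  n   <- table(x)
  pct <- 100 * prop.table(n)
  setNames(sprintf("%d (%.1f%%)", as.integer(n), as.numeric(pct)), names(n))
}

lapply(bp_data[c("sex", "dwelling", "drugType")], summarise_counts)
```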

2) Prepare summary statistics by type of drug for the following variables: sex, dwelling,
and hyper. Include statistical tests.

Variable          Type A        Type B        p-value

Gender                                        __________
  - Male          __________    __________
  - Female        __________    __________

Location                                      __________
  - Small city    __________    __________
  - Large city    __________    __________
  - Town          __________    __________

Hypertension                                  __________
  - Yes           __________    __________
  - No            __________    __________
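
For a categorical variable, a cross-tabulation by drug type with a test of association can be sketched in base R as below. The choice of a chi-squared test is an assumption, since the exercise does not name the test, and the rows are made up.

```r
# Made-up rows: 10 people per drug type
bp_data <- data.frame(
  drugType = rep(c("A", "B"), each = 10),
  hyper    = c(rep("Yes", 7), rep("No", 3), rep("Yes", 3), rep("No", 7))
)

counts  <- table(bp_data$hyper, bp_data$drugType)
col_pct <- round(100 * prop.table(counts, margin = 2), 1)   # % within each drug type

# ASSUMED test; fisher.test(counts) is a common alternative for small cells
test <- chisq.test(counts)

counts
col_pct
test$p.value
```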

3) Prepare the summary statistics for the numerical vectors systolic and diastolic blood
pressure by drug type. Include statistical tests.

Variable Type A Type B p-value

Systolic BP __________ __________ __________

Diastolic BP __________ __________ __________
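
For the numerical variables, group means and SDs with a two-sample comparison can be sketched in base R. The Welch t-test used here is an assumption (the exercise does not name the test), the data are simulated, and the column names systolic and diastolic are hypothetical.

```r
# SIMULATED values; column names are HYPOTHETICAL
set.seed(42)
bp_data <- data.frame(
  drugType  = rep(c("A", "B"), each = 12),
  systolic  = c(rnorm(12, 130, 10), rnorm(12, 122, 10)),
  diastolic = c(rnorm(12, 85, 8), rnorm(12, 80, 8))
)

# Mean and SD of each variable within each drug type
group_stats <- aggregate(
  cbind(systolic, diastolic) ~ drugType,
  data = bp_data,
  FUN  = function(x) c(mean = mean(x), sd = sd(x))
)

# ASSUMED test: t.test() runs Welch's two-sample t-test by default
p_systolic  <- t.test(systolic ~ drugType, data = bp_data)$p.value
p_diastolic <- t.test(diastolic ~ drugType, data = bp_data)$p.value

group_stats
c(systolic = p_systolic, diastolic = p_diastolic)
```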
