Assignment - 1: Data Analytics and R

The document discusses analyzing a health record dataset of 30 people using R. It includes importing the Excel dataset, descriptive analyses like calculating ranges, means, modes and other statistics on variables like miles per day, calories burned, and more. It also analyzes distributions by age, gender, state, and vaccination status.

Uploaded by

Lakshita Saini

DATA ANALYTICS AND R

ASSIGNMENT -1
SUBMITTED BY : LAKSHITA SAINI

TABLE
Dataset of the health records of 30 people.

IMPORTING DATA

DATA TYPE - EXCEL SHEET

# Import the Excel file into R (read_excel() comes from the readxl package)

library(readxl)

# Create the table by importing the data

data <- read_excel('C:\\Users\\Admin\\Desktop\\NIFT\\Year - 3\\Sem 6\\Data Analytics & R\\HealthRecord.xlsx')
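
The import step can fail silently in practice if the path is mistyped, so a small guard may help. The sketch below is only an illustration: the helper name load_health_record and the relative path data/HealthRecord.xlsx are hypothetical and are not part of the assignment.

```r
# Hypothetical helper (not part of the assignment): checks the file and the
# readxl package before attempting the import.
load_health_record <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path)
  }
  if (!requireNamespace("readxl", quietly = TRUE)) {
    stop("Package 'readxl' is required; install it with install.packages('readxl')")
  }
  readxl::read_excel(path)
}

# Hypothetical location; replace with the real path to the workbook.
path <- file.path("data", "HealthRecord.xlsx")

# tryCatch() turns a failed import into a readable message instead of an abort.
result <- tryCatch(load_health_record(path), error = function(e) conditionMessage(e))
print(result)
```

Building the path with file.path() also avoids having to escape backslashes by hand.
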

DESCRIPTIVE ANALYSIS OF PROBLEM STATEMENTS

#1 - What is the range (maximum and minimum), mean and median of the miles covered per day?

This finds the maximum and minimum miles covered per day by the people who
participated in the study, along with the average (mean) miles covered and the
central value (median), which represents the typical member of the group.

summary_mpd = summary(data$miles_per_day)

print(summary_mpd)

r = range(data$miles_per_day)

print(r)
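
As a minimal sketch of what summary() and range() report, the same four statistics can be computed one by one. The values below are hypothetical and are not taken from HealthRecord.xlsx.

```r
# Hypothetical miles-per-day values (not from the assignment's dataset)
miles_per_day <- c(2.5, 4.0, 3.2, 6.1, 5.0, 1.8)

# range() returns a vector of the form c(minimum, maximum)
r <- range(miles_per_day)
mpd_min <- r[1]
mpd_max <- r[2]

# Mean is the arithmetic average; median is the middle value once sorted
mpd_mean <- mean(miles_per_day)
mpd_median <- median(miles_per_day)

print(c(min = mpd_min, max = mpd_max, mean = mpd_mean, median = mpd_median))
```

If the real column contains missing values, each of these functions accepts na.rm = TRUE to ignore them.
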

#2 - What is the mode of calories burned?

The mode is calculated to find which value of 'calories burned' occurs most
frequently.

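# Install modeest once, then load it for mfv() (most frequent value)
install.packages("modeest")

library(modeest)

# Named calories_mode so that base R's mode() function is not masked
calories_mode = mfv(data$calories_burned)

print(calories_mode)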
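
For readers who cannot install modeest, the most frequent value can also be found with base R alone. This is a sketch of an alternative, not the method used in the assignment; the helper name most_frequent and the sample values are hypothetical.

```r
# Hypothetical helper: returns every value that occurs most often,
# so ties produce more than one result.
most_frequent <- function(x) {
  counts <- table(x)
  top <- names(counts)[counts == max(counts)]
  # table() keeps the values as character names; convert back for numeric input
  if (is.numeric(x)) as.numeric(top) else top
}

# Hypothetical calories-burned values (not from the assignment's dataset)
calories_burned <- c(300, 450, 300, 500, 450, 300)
print(most_frequent(calories_burned))
```
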

#3 - What is the variance of workout duration?

Variance is calculated to measure how far the workout duration values spread
out from the mean workout duration.

var(data$workout_duration)
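
To make the calculation behind var() concrete, the sketch below computes the sample variance by hand and compares it with the built-in function. The durations are hypothetical values in minutes, not the assignment's data.

```r
# Hypothetical workout durations in minutes (not from the assignment's dataset)
workout_duration <- c(30, 45, 60, 20, 45)

n <- length(workout_duration)

# Sample variance: sum of squared deviations from the mean, divided by n - 1
manual_var <- sum((workout_duration - mean(workout_duration))^2) / (n - 1)

print(manual_var)
print(var(workout_duration))  # var() uses the same n - 1 denominator
```

Note that var() returns the sample variance (dividing by n - 1), not the population variance (dividing by n).
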

#4 - What is the standard deviation of health score?

The standard deviation will help us understand how widely the health scores
are dispersed around the mean, telling us how much health varies within the
group.

std = sd(data$health_score)

print(std)
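
One way to read a standard deviation is to check how many values fall within one standard deviation of the mean. The sketch below does this for hypothetical scores; the numbers are illustrative and not taken from the assignment's dataset.

```r
# Hypothetical health scores (not from the assignment's dataset)
health_score <- c(70, 82, 65, 90, 78, 85)

score_mean <- mean(health_score)
score_sd <- sd(health_score)  # square root of the sample variance

# Proportion of scores lying within one standard deviation of the mean
within_one_sd <- mean(abs(health_score - score_mean) <= score_sd)

print(score_sd)
print(within_one_sd)
```

A small standard deviation relative to the mean would suggest that the group's health scores are fairly similar to one another.
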

#5 - How many people participated at each age?

The count function will help us study how our group is distributed across the
different ages.

library(dplyr)

data %>% count(age)
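
A similar tally can be produced without dplyr by using table(). This is a sketch of a base R alternative with hypothetical ages; the data frame name people is an assumption made for illustration.

```r
# Hypothetical ages (not from the assignment's dataset)
people <- data.frame(age = c(25, 30, 25, 41, 30, 25))

# table() tallies each distinct age; as.data.frame() turns the tally into
# a two-column table with the count stored in column n
age_counts <- as.data.frame(table(age = people$age), responseName = "n")

print(age_counts)
```

Unlike count(), this version stores the age column as a factor, so it may need converting before further numeric work.
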

#6 - What percentage of people are vaccinated, and what percentage are not?

The prop.table function helps us study vaccination status, an important factor
because it contributes greatly to the health status of an individual.

v1 <- prop.table(table(data$vaccination_status))

print(v1*100)
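
The sketch below shows the two steps behind this calculation on hypothetical data: table() counts each status and prop.table() converts the counts to proportions. The "Yes"/"No" labels are an assumption; the real column may use different values.

```r
# Hypothetical vaccination statuses (labels are assumed, not from the dataset)
vaccination_status <- c("Yes", "No", "Yes", "Yes", "No")

# Counts per status, converted to proportions, then to rounded percentages
pct <- round(prop.table(table(vaccination_status)) * 100, 1)

print(pct)
```
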

#7 - What is the distribution of participants across states?

The count function will help us study the distribution of our group across
various states.

data %>% count(state)
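
When there are many states, it can help to list them from most to least represented. The sketch below does this in base R; the state names are hypothetical examples and are not taken from the assignment's dataset.

```r
# Hypothetical state values (not from the assignment's dataset)
state <- c("Delhi", "Punjab", "Delhi", "Haryana", "Delhi", "Punjab")

# Tally each state, then order the tally from largest to smallest
state_counts <- sort(table(state), decreasing = TRUE)

print(state_counts)
```
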

#8 - Who has a better health score, males or females?

The mean health score is calculated for each gender, which helps us understand
which gender has the better health status.

install.packages("xlsx")

library("xlsx")

excel_path <- "C:\\Users\\Admin\\Desktop\\NIFT\\Year - 3\\Sem 6\\Data Analytics & R\\HealthRecord.xlsx"

gender <- read.xlsx(excel_path, sheetName = "Example", colIndex = 4)

score <- read.xlsx(excel_path, sheetName = "Example", colIndex = 8)

df <- data.frame(gender, score)

# Group the rows by gender and take the mean health score within each group
df %>%
  group_by(gender) %>%
  summarise_at(vars(health_score), list(mean_score = mean))
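
The same comparison can be sketched in base R with tapply(), which applies a function to one column within each group of another. The data frame below is hypothetical; it assumes columns named gender and health_score, as in the code above, but its values are not from the assignment's dataset.

```r
# Hypothetical records (values are illustrative only)
records <- data.frame(
  gender = c("Male", "Female", "Female", "Male", "Female"),
  health_score = c(70, 85, 90, 80, 80)
)

# Mean health score within each gender
mean_by_gender <- tapply(records$health_score, records$gender, mean)
print(mean_by_gender)

# Name of the group with the highest mean score
best <- names(mean_by_gender)[which.max(mean_by_gender)]
print(best)
```

If the table imported earlier with read_excel already contains both columns, the grouping could presumably be applied to it directly instead of reading the workbook a second time.
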
