
Business Analytics-1: STR (Crew - Data)

The document describes analyzing a dataset containing employee data. It lists the categorical and numeric variables, describes the numeric variable salary using descriptive statistics like mean, median, standard deviation and variance. It also counts the number of groups in the categorical variable "Job code" and enumerates functions used to analyze both the categorical variable "Job code" and numeric variable "salary". Functions like count, group_by, summarise, mean and table are used.

Uploaded by

Nikhil Malhotra

Business Analytics- 1

1) List the categorical and numeric variables of the data set

Ans:
A. Categorical variables:
1. Hire.date
2. Lastname
3. Firstname
4. Location
5. Phone (stored as an integer, but it identifies a line rather than measuring a quantity)
6. EmpId
7. Job.code
B. Numeric variable:
1. Salary
Output:
>str(Crew.data)
'data.frame': 69 obs. of 8 variables:
 $ Hire.date: Factor w/ 69 levels "1-Jul-87","1-Mar-90",..: 35 50 3 16 27 36 62 60 24 17 ...
 $ Lastname : Factor w/ 69 levels "BEAUMONT","BERGAMASCO",..: 21 35 69 19 41 18 42 64 67 9 ...
 $ Firstname: Factor w/ 69 levels "ANITA M.","ANNETTE M.",..: 30 29 24 58 54 26 68 39 59 37 ...
 $ Location : Factor w/ 3 levels "CARY","FRANKFURT",..: 1 2 3 1 3 2 3 2 2 3 ...
 $ Phne     : int 1168 2164 1565 1157 2360 1595 2366 1197 1553 1369 ...
 $ EmpId    : Factor w/ 69 levels "E00034","E00084",..: 53 36 49 46 31 4 25 29 41 18 ...
 $ Job.code : Factor w/ 6 levels "FLTAT1","FLTAT2",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ Salary   : int 21000 22000 22000 23000 24000 25000 25000 26000 27000 28000 ...
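
The split into categorical and numeric variables can also be checked from the storage type of each column. The following is a minimal sketch only: Crew.data is not reproduced here, so crew_demo is a hypothetical stand-in with made-up rows, and the names is_num, numeric_vars and categorical_vars are illustrative.

```r
# Hypothetical stand-in for Crew.data (illustrative values only)
crew_demo <- data.frame(
  Location = factor(c("CARY", "FRANKFURT", "CARY")),
  Job.code = factor(c("FLTAT1", "FLTAT1", "PILOT1")),
  Salary   = c(21000L, 22000L, 70000L)
)

# TRUE for columns stored as numbers, FALSE for factors
is_num <- sapply(crew_demo, is.numeric)

numeric_vars     <- names(crew_demo)[is_num]
categorical_vars <- names(crew_demo)[!is_num]

print(numeric_vars)
print(categorical_vars)
```

The storage type is only a starting point: an identifier stored as an integer would be reported as numeric by this check even though it is categorical in meaning.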
2) Describe the numeric variable using descriptive techniques
Ans:

a) Summary
Output:
>summary(Crew.data$Salary)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  21000   33000   42000   52145   73000  112000

b) Mean
Output:
>mean(Crew.data$Salary)
[1] 52144.93

c) Median
Output:
>median(Crew.data$Salary)
[1] 42000

d) Standard Deviation
Output:
>sd(Crew.data$Salary)
[1] 25521.78

e) Variance
Output:
>var(Crew.data$Salary)
[1] 651361040
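
The standard deviation and the variance are linked: sd() is the square root of var(), which is why the reported variance 651361040 equals the square of the reported standard deviation 25521.78 up to rounding of the printed values. The sketch below illustrates this on the first ten Salary values visible in the str() output, which are only a sample and not the full column; the variable names are illustrative.

```r
# First ten Salary values shown in the str() output; NOT the full column
salary_sample <- c(21000, 22000, 22000, 23000, 24000,
                   25000, 25000, 26000, 27000, 28000)

m <- mean(salary_sample)
v <- var(salary_sample)  # sample variance (divides by n - 1)
s <- sd(salary_sample)   # sample standard deviation

# sd() is the square root of var()
sd_matches <- isTRUE(all.equal(s, sqrt(v)))

# var() agrees with the textbook formula sum((x - mean)^2) / (n - 1)
manual_var  <- sum((salary_sample - m)^2) / (length(salary_sample) - 1)
var_matches <- isTRUE(all.equal(v, manual_var))

print(c(sd_matches = sd_matches, var_matches = var_matches))
```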

3) How many groups does the variable “Job code” contain?

Ans: There are 6 categories.

Output 1: Using the dplyr count function:

>Crew.data%>%count(Job.code)
# A tibble: 6 x 2
  Job.code     n
  <fct>    <int>
1 FLTAT1      14
2 FLTAT2      18
3 FLTAT3      12
4 PILOT1       8
5 PILOT2       9
6 PILOT3       8

Output 2: Using the group_by function:

>Crew.data%>%group_by(Job.code)%>%summarise(count=n())
# A tibble: 6 x 2
  Job.code count
  <fct>    <int>
1 FLTAT1      14
2 FLTAT2      18
3 FLTAT3      12
4 PILOT1       8
5 PILOT2       9
6 PILOT3       8
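
When only the number of groups is needed, base R can report it directly from the factor. This is a sketch under assumptions: job_code is a hypothetical factor standing in for Crew.data$Job.code, with made-up entries.

```r
# Hypothetical factor standing in for Crew.data$Job.code
job_code <- factor(c("FLTAT1", "FLTAT2", "FLTAT1",
                     "PILOT1", "PILOT1", "FLTAT3"))

n_groups    <- nlevels(job_code)           # number of defined levels
group_names <- levels(job_code)            # the level labels
n_observed  <- length(unique(job_code))    # groups that actually occur

print(n_groups)
print(group_names)
```

nlevels() counts defined levels, so it can exceed the number of groups that actually appear if some levels are unused; length(unique()) counts only the observed ones.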

4) Enumerate all functions explained in the video for “Job code”

Ans:
a) Count function
Output:
>Crew.data%>%count(Job.code)
# A tibble: 6 x 2
  Job.code     n
  <fct>    <int>
1 FLTAT1      14
2 FLTAT2      18
3 FLTAT3      12
4 PILOT1       8
5 PILOT2       9
6 PILOT3       8

b) Group by function:

Output:
>Crew.data%>%group_by(Job.code)%>%summarise(count=n())
# A tibble: 6 x 2
  Job.code count
  <fct>    <int>
1 FLTAT1      14
2 FLTAT2      18
3 FLTAT3      12
4 PILOT1       8
5 PILOT2       9
6 PILOT3       8

c) Table function:

Output:
>table(Crew.data$Job.code)

FLTAT1 FLTAT2 FLTAT3 PILOT1 PILOT2 PILOT3
    14     18     12      8      9      8

5) Enumerate all functions explained in the video for “salary”


Ans:
a) Mean Salary:

Output:

>mean(Crew.data$Salary)
[1] 52144.93

b) Standard Deviation in Salary:

Output:

>sd(Crew.data$Salary)
[1] 25521.78

c) Variance

Output:

>var(Crew.data$Salary)
[1] 651361040

d) Summary

Output:

>summary(Crew.data$Salary)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  21000   33000   42000   52145   73000  112000

e) Median Salary:

Output:

>median(Crew.data$Salary)
[1] 42000

f) Mean Salary by Job.code category

Output:

>Crew.data%>%group_by(Job.code)%>%summarise(mean(Salary))
# A tibble: 6 x 2
  Job.code `mean(Salary)`
  <fct>             <dbl>
1 FLTAT1           25643.
2 FLTAT2           35111.
3 FLTAT3           44250
4 PILOT1           69500
5 PILOT2           80111.
6 PILOT3           99875
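
The same group-wise mean can be obtained without dplyr. The following sketch uses base R's tapply() and aggregate(); it is an alternative to the group_by approach, not the method shown in the video. crew_demo is a hypothetical stand-in for Crew.data with made-up salaries.

```r
# Hypothetical stand-in for Crew.data (illustrative values only)
crew_demo <- data.frame(
  Job.code = factor(c("FLTAT1", "FLTAT1", "PILOT1", "PILOT1")),
  Salary   = c(21000, 23000, 68000, 72000)
)

# Named vector: one mean per Job.code level
mean_by_job <- tapply(crew_demo$Salary, crew_demo$Job.code, mean)

# Data frame with one row per Job.code level
mean_df <- aggregate(Salary ~ Job.code, data = crew_demo, FUN = mean)

print(mean_by_job)
print(mean_df)
```

tapply() is convenient for a quick look, while aggregate() returns a data frame that can be merged or plotted further.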

Question 2:

1) Enumerate all functions explained in the video for all categorical and
numerical variables of the data set.
Ans:

Although str() reports all 11 variables as numeric, 5 of them are categorical in nature because they take only a few distinct values.

Categorical variables: cyl, vs, am, gear and carb (5)
Numeric variables: mpg, disp, hp, drat, wt and qsec (6)

Output:
str(mtcars)
'data.frame': 32 obs. of 11 variables:
$ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
$ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
$ disp: num 160 160 108 258 360 ...
$ hp : num 110 110 93 110 175 105 245 62 95 123 ...
$ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
$ wt : num 2.62 2.88 2.32 3.21 3.44 ...
$ qsec: num 16.5 17 18.6 19.4 17 ...
$ vs : num 0 0 1 1 0 1 0 1 1 1 ...
$ am : num 1 1 1 0 0 0 0 0 0 0 ...
$ gear: num 4 4 4 3 3 3 3 4 4 4 ...
$ carb: num 4 4 1 1 2 1 4 2 2 4 ...
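
Because str() reports every column as num, the categorical columns can be converted to factors so that R treats them as groups. This is a minimal sketch, assuming the five columns named in the answer above are the ones to convert; cars and cat_cols are illustrative names, and the built-in mtcars is left unchanged.

```r
# Columns treated as categorical in the answer above
cat_cols <- c("cyl", "vs", "am", "gear", "carb")

# Work on a copy so the built-in data set is not modified
cars <- mtcars
cars[cat_cols] <- lapply(cars[cat_cols], factor)

# The five columns now show as Factor in the structure
str(cars)
```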

Numeric Variables:
1) mpg:
Mean:
>mean(mtcars$mpg)
[1] 20.09062

Median:
>median(mtcars$mpg)
[1] 19.2

Standard deviation:
>sd(mtcars$mpg)
[1] 6.026948
Variance:
>var(mtcars$mpg)
[1] 36.3241

Summary:
>summary(mtcars$mpg)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  10.40   15.43   19.20   20.09   22.80   33.90

2) disp

Mean:
>mean(mtcars$disp)
[1] 230.7219

Median:
>median(mtcars$disp)
[1] 196.3

Standard deviation:
>sd(mtcars$disp)
[1] 123.9387

Variance:
>var(mtcars$disp)
[1] 15360.8

Summary:
>summary(mtcars$disp)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   71.1   120.8   196.3   230.7   326.0   472.0

3) hp:

Mean:
>mean(mtcars$hp)
[1] 146.6875

Median:
>median(mtcars$hp)
[1] 123

Standard deviation:
>sd(mtcars$hp)
[1] 68.56287

Variance:
>var(mtcars$hp)
[1] 4700.867

Summary:
>summary(mtcars$hp)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   52.0    96.5   123.0   146.7   180.0   335.0

4) drat

Mean:
>mean(mtcars$drat)
[1] 3.596563

Median:
>median(mtcars$drat)
[1] 3.695

Standard deviation:
>sd(mtcars$drat)
[1] 0.5346787

Variance:
>var(mtcars$drat)
[1] 0.285881

Summary:
>summary(mtcars$drat)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  2.760   3.080   3.695   3.597   3.920   4.930

5) wt

Mean:
>mean(mtcars$wt)
[1] 3.21725

Median:
>median(mtcars$wt)
[1] 3.325

Standard deviation:
>sd(mtcars$wt)
[1] 0.9784574

Variance:
>var(mtcars$wt)
[1] 0.957379

Summary:
>summary(mtcars$wt)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.513   2.581   3.325   3.217   3.610   5.424

6) qsec

Mean:
>mean(mtcars$qsec)
[1] 17.84875

Median:
>median(mtcars$qsec)
[1] 17.71

Standard deviation:
>sd(mtcars$qsec)
[1] 1.786943

Variance:
>var(mtcars$qsec)
[1] 3.193166

Summary:
>summary(mtcars$qsec)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  14.50   16.89   17.71   17.85   18.90   22.90
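
The per-variable calls above repeat the same four functions six times. As a sketch of a more compact route, sapply() can apply them to all six numeric columns at once; num_cols and stats are illustrative names.

```r
# Numeric columns as classified in the answer above
num_cols <- c("mpg", "disp", "hp", "drat", "wt", "qsec")

# One column of results per variable, one row per statistic
stats <- sapply(mtcars[num_cols], function(x) {
  c(mean = mean(x), median = median(x), sd = sd(x), var = var(x))
})

print(round(stats, 3))
```

The result is a matrix, so a single value can be looked up by name, for example stats["median", "wt"].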

Categorical Variables:
1) cyl:
a) using count function
>mtcars%>%count(cyl)
# A tibble: 3 x 2
    cyl     n
  <dbl> <int>
1     4    11
2     6     7
3     8    14

b) using group_by function

>mtcars%>%group_by(cyl)%>%summarise(count=n())
# A tibble: 3 x 2
    cyl count
  <dbl> <int>
1     4    11
2     6     7
3     8    14

2) vs:
a) using count function:
>mtcars%>%count(vs)
# A tibble: 2 x 2
     vs     n
  <dbl> <int>
1     0    18
2     1    14

b) using group_by function

>mtcars%>%group_by(vs)%>%summarise(count=n())
# A tibble: 2 x 2
     vs count
  <dbl> <int>
1     0    18
2     1    14

3) am:
a) using count function:
>mtcars%>%count(am)
# A tibble: 2 x 2
     am     n
  <dbl> <int>
1     0    19
2     1    13

b) using group_by function

>mtcars%>%group_by(am)%>%summarise(count=n())
# A tibble: 2 x 2
     am count
  <dbl> <int>
1     0    19
2     1    13

4) gear:
a) using count function:
>mtcars%>%count(gear)
# A tibble: 3 x 2
   gear     n
  <dbl> <int>
1     3    15
2     4    12
3     5     5

b) using group_by function

>mtcars%>%group_by(gear)%>%summarise(count=n())
# A tibble: 3 x 2
   gear count
  <dbl> <int>
1     3    15
2     4    12
3     5     5

5) carb:
a) using count function:
>mtcars%>%count(carb)
# A tibble: 6 x 2
   carb     n
  <dbl> <int>
1     1     7
2     2    10
3     3     3
4     4    10
5     6     1
6     8     1

b) using group_by function

>mtcars%>%group_by(carb)%>%summarise(count=n())
# A tibble: 6 x 2
   carb count
  <dbl> <int>
1     1     7
2     2    10
3     3     3
4     4    10
5     6     1
6     8     1
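
The five frequency counts can also be produced in one step with base R. This sketch uses table() through lapply() as an alternative to the dplyr calls; cat_cols and counts are illustrative names.

```r
# Columns treated as categorical in the answer above
cat_cols <- c("cyl", "vs", "am", "gear", "carb")

# One frequency table per categorical column, stored in a named list
counts <- lapply(mtcars[cat_cols], table)

# Look at a single table by name
print(counts$cyl)
print(counts$carb)
```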

2. Prepare a data frame for at least two categorical variables and find the
mean salary of those groups.
Ans:
mtcars has no salary column, so the group means are computed for mpg and hp instead.

Numeric Variables:

I. Finding the mean mpg of cars grouped by number of gears.


a) using count function:
>mtcars%>%count(gear)
# A tibble: 3 x 2
   gear     n
  <dbl> <int>
1     3    15
2     4    12
3     5     5

b) using group by function:

>mtcars%>%group_by(gear)%>%summarise(count=n())
# A tibble: 3 x 2
   gear count
  <dbl> <int>
1     3    15
2     4    12
3     5     5

Mean mpg of cars by number of gears:
>mtcars%>%group_by(gear)%>%summarise(mean(mpg))
# A tibble: 3 x 2
   gear `mean(mpg)`
  <dbl>       <dbl>
1     3        16.1
2     4        24.5
3     5        21.4

II. Finding average horsepower generated by different geared cars

a) using count function:

>mtcars%>%count(gear)
# A tibble: 3 x 2
   gear     n
  <dbl> <int>
1     3    15
2     4    12
3     5     5

b) using group by function:

>mtcars%>%group_by(gear)%>%summarise(count=n())
# A tibble: 3 x 2
   gear count
  <dbl> <int>
1     3    15
2     4    12
3     5     5

Mean hp of cars by number of gears:

>mtcars%>%group_by(gear)%>%summarise(mean(hp))
# A tibble: 3 x 2
   gear `mean(hp)`
  <dbl>      <dbl>
1     3      176.
2     4       89.5
3     5      196.
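
The question asks for at least two categorical variables, while the answer above groups by gear alone. The sketch below groups by gear and am together using base R's aggregate(); choosing am as the second variable is an assumption made for illustration, and group_means is an illustrative name.

```r
# Mean mpg and mean hp for every observed combination of gear and am
group_means <- aggregate(cbind(mpg, hp) ~ gear + am,
                         data = mtcars, FUN = mean)

# One row per combination that actually occurs in the data
print(group_means)
```

Only combinations present in the data appear in the result, so the number of rows can be smaller than the product of the group counts.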

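Categorical Variables:
1) For cyl:
Steps:
table(mtcars$cyl)                  # frequency table of cyl
mtcarst <- table(mtcars$cyl)       # store the table
class(mtcarst)                     # "table"
mtcarsf <- as.data.frame(mtcarst)  # convert the table to a data frame
mtcarsf

Output:
>mtcarsf
  Var1 Freq
1    4   11
2    6    7
3    8   14

2) For am:
Steps:
table(mtcars$am)                     # frequency table of am
mtcarst1 <- table(mtcars$am)         # store the table
class(mtcarst1)                      # "table"
mtcarsf1 <- as.data.frame(mtcarst1)  # convert the table to a data frame
mtcarsf1

Output:
>mtcarsf1
  Var1 Freq
1    0   19
2    1   13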
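
The default column names Var1 and Freq do not say which variable was counted. As a sketch, the table can be given a dimension name and as.data.frame() a responseName so that the data frame is self-describing; cyl_freq and the column name count are illustrative choices.

```r
# Name the table dimension and the count column explicitly
cyl_freq <- as.data.frame(table(cyl = mtcars$cyl),
                          responseName = "count")

print(cyl_freq)
print(names(cyl_freq))
```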
