Business Analytics Assignment: Group No. 10
GROUP MEMBERS
AADITYA MANDLOI (20194402)
DEBAJYOTI PADHI (20192210)
DIMPLE MOHANTA (20192213)
DIPANJAN MONDAL (20194417)
HEMANT KUMAR SINGH (20194421)
NEERAJ R KUMAR (20192326)
Submitted to PROF. SAMEER JAIN
Summary of the Learning Outcomes:
Business analytics is “a process of transforming data into actions through analysis and insights in
the context of organizational decision making and problem solving.” Business analytics is
supported by tools such as Microsoft Excel and its add-ins, statistical software such as the
open-source R language and commercial packages such as SAS or Minitab, and more complex business
intelligence suites that integrate data with analytical software.
In this assignment, we are given data on food nutrition, and we use R programming to analyze it
and gather the required insights. R is a statistical programming language with open-source
libraries for statistics, machine learning, and data science. R also has strong tools for
visualization, reporting, and interactivity, which are as important to business as they are to
science.
By doing this assignment, we learned about several R functions that help in getting insights.
They are as follows:
We first set the working directory with setwd(). The working directory is a file path on the
computer that serves as the default location for any files read into R or saved out of R.
Large data sets are often supplied as Excel files, and the readxl package makes it easy to get
data out of Excel and into R.
The functions head() and tail(), as their names suggest, display the first and the last part
of the data, respectively.
The function summary() displays several summary statistics (the minimum, first quartile,
median, mean, third quartile, and maximum) for each numeric column of the data frame in a
single call.
A vector is a basic data structure in R. It contains elements of the same type and is
generally created using the c() function. To put the values of a particular column into a
vector, we can select the column by its number and use head() or tail() to take the first or
last values.
The function as.matrix() attempts to turn its argument into a matrix; for example, it converts
a data frame into a matrix. If we index the data with [1:n, 1:n] before converting, we get an
n x n matrix built from the first n rows and n columns of the dataset. The str() function
compactly displays the internal structure of an R object, including large nested lists.
The class() function returns the class of an object, for example the class of a column.
We can apply arithmetic expressions directly to columns of the same class (for example, two
numeric columns) and store the result in a new variable.
The dim() function returns the dimensions (the number of rows and columns) of a matrix, array,
or data frame.
The subset() function returns the subsets of vectors, matrices, or data frames that meet given
conditions. The format is subset(x, subset), where x is the object to be subsetted and subset
is a logical expression indicating the elements or rows to keep.
Bar plots with vertical or horizontal bars can be created in R using the barplot() function.
This function takes many arguments that control how the data is plotted.
The generic function hist() computes and, by default, plots a histogram of the given data values.
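Several of the inspection functions described above can be tried on a small data frame. The sketch below is only an illustration: the data frame nutrition and the column name Water_g are hypothetical stand-ins, with values copied as displayed (rounded) from the first rows printed in this report. The commented-out lines show how the real data might be loaded; the folder and file name there are assumptions, not taken from the assignment.

```r
# Loading step (not run here; folder and file name are hypothetical):
# setwd("path/to/assignment")
# myData <- readxl::read_excel("nutrition.xlsx")

# Hypothetical stand-in for the nutrition data, using rounded values
# from the first five rows shown in this report
nutrition <- data.frame(
  NDB_No     = c("01001", "01002", "01003", "01004", "01005"),
  Water_g    = c(15.9, 16.7, 0.24, 42.4, 41.1),
  Energ_Kcal = c(717, 718, 876, 353, 371),
  stringsAsFactors = FALSE
)

first_rows <- head(nutrition, 2)   # first two rows
last_rows  <- tail(nutrition, 2)   # last two rows

print(summary(nutrition))          # min, quartiles, median, mean, max per numeric column
str(nutrition)                     # compact view of column types and values

energy_class <- class(nutrition$Energ_Kcal)   # class of a single column
data_dim     <- dim(nutrition)                # c(number of rows, number of columns)

# Convert the first two rows of the two numeric columns into a 2 x 2 matrix
m <- as.matrix(nutrition[1:2, 2:3])
print(m)
```

Selecting only numeric columns before calling as.matrix() keeps the matrix numeric; mixing in the character column NDB_No would turn every entry into a character string.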
# A tibble: 10 x 20
NDB_No Shrt_Desc `Water_(g)` Energ_Kcal `Protein_(g)` `Lipid_Tot_(g)` `Ash_(g)` `Carbohydrt_(g)`
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 01001 BUTTER,W~ 15.9 717 0.85 81.1 2.11 0.06
2 01002 BUTTER,W~ 16.7 718 0.49 78.3 1.62 2.87
3 01003 BUTTER O~ 0.24 876 0.28 99.5 0 0
4 01004 CHEESE,B~ 42.4 353 21.4 28.7 5.11 2.34
5 01005 CHEESE,B~ 41.1 371 23.2 29.7 3.18 2.79
6 01006 CHEESE,B~ 48.4 334 20.8 27.7 2.7 0.45
7 01007 CHEESE,C~ 51.8 300 19.8 24.3 3.68 0.46
8 01008 CHEESE,C~ 39.3 376 25.2 29.2 3.28 3.06
9 01009 CHEESE,C~ 37.0 404 22.9 33.3 3.71 3.09
10 01010 CHEESE,C~ 37.6 387 23.4 30.6 3.6 4.78
# ... with 12 more variables: `Fiber_TD_(g)` <dbl>, `Sugar_Tot_(g)` <dbl>, `Calcium_(mg)` <dbl>,
# `Iron_(mg)` <dbl>, `Magnesium_(mg)` <dbl>, `Phosphorus_(mg)` <dbl>, `Potassium_(mg)` <dbl>,
# `Sodium_(mg)` <dbl>, `Zinc_(mg)` <dbl>, `Copper_mg)` <dbl>, `Manganese_(mg)` <dbl>,
# `Selenium_(µg)` <dbl>
# A tibble: 20 x 20
NDB_No Shrt_Desc `Water_(g)` Energ_Kcal `Protein_(g)` `Lipid_Tot_(g)` `Ash_(g)` `Carbohydrt_(g)`
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 03934 BABYFOOD~ 82.6 68 1.26 0.28 0.61 15.2
2 03935 INF FORM~ 87.3 67 1.8 3.63 0.48 6.77
3 03936 INF FORM~ 88.1 63 1.4 3.5 0.61 6.39
4 03937 INF FORM~ 76 130 2.9 7 0.61 13.9
5 03938 INF FORM~ 2 524 12 28 2 56
6 03939 INF FORM~ 88 63 1.8 3.5 0.61 6.09
7 03940 INF FORM~ 76 126 3.6 7 1.22 12.2
8 03941 INF FORM~ 2 508 13.6 27.2 5 52.2
9 03942 INF FORM~ 87 68 1.71 3.49 0.33 7.53
10 03943 INF FORM~ 3 509 12.7 26.0 2.46 56.0
11 03944 INF FORM~ 87 69 1.92 3.77 0.38 6.94
12 03945 INF FORM~ 2.25 520 14.4 28.3 3.26 51.8
13 03946 INF FORM~ 87 68 1.48 3.74 0.75 7.4
14 03947 INF FORM~ 75.9 128 2.73 6.95 0.8 13.6
15 03948 INF FORM~ 2.2 520 11.1 28.1 3 55.7
16 03949 INF FORM~ 87.8 66 1.36 3.7 0.38 6.77
17 03950 INF FORM~ 2.25 522 10.9 28.9 3.26 54.7
18 03951 INF FORM~ 76.2 127 2.64 6.89 0.79 13.5
19 03952 INF FORM~ 75.8 128 3.13 6.98 0.88 13.2
20 03953 INF FORM~ 87.5 66 1.61 3.59 0.66 6.7
# ... with 12 more variables: `Fiber_TD_(g)` <dbl>, `Sugar_Tot_(g)` <dbl>, `Calcium_(mg)` <dbl>,
# `Iron_(mg)` <dbl>, `Magnesium_(mg)` <dbl>, `Phosphorus_(mg)` <dbl>, `Potassium_(mg)` <dbl>,
# `Sodium_(mg)` <dbl>, `Zinc_(mg)` <dbl>, `Copper_mg)` <dbl>, `Manganese_(mg)` <dbl>,
# `Selenium_(µg)` <dbl>
Output
    NDB_No           Shrt_Desc           Water_(g)       Energ_Kcal     Protein_(g)
 Length:600         Length:600         Min.   : 0.20   Min.   :  0.0   Min.   : 0.000
 Class :character   Class :character   1st Qu.:39.24   1st Qu.: 63.0   1st Qu.: 1.708
 Mode  :character   Mode  :character   Median :76.16   Median :106.5   Median : 3.915
                                       Mean   :60.35   Mean   :187.1   Mean   : 8.686
                                       3rd Qu.:85.96   3rd Qu.:315.8   3rd Qu.:12.500
                                       Max.   :99.90   Max.   :876.0   Max.   :84.080
#Create a vector "test" using the first 10 values of the Protein variable
> test<-c(head(myData[,5],10))
> test
$`Protein_(g)`
[1] 0.85 0.49 0.28 21.40 23.24 20.75 19.80 25.18 22.87 23.37
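The result above is printed under a $`Protein_(g)` label, which suggests that test is a one-element list rather than a plain vector. This is expected when the data is a tibble, because myData[,5] then stays a one-column table and c() turns it into a list. A minimal sketch of the difference follows, using a hypothetical base R data frame protein_df that holds the ten printed values; drop = FALSE is used here only to mimic the tibble behaviour.

```r
# Hypothetical stand-in holding the ten protein values printed above
protein_df <- data.frame(
  `Protein_(g)` = c(0.85, 0.49, 0.28, 21.40, 23.24, 20.75, 19.80, 25.18, 22.87, 23.37),
  check.names = FALSE
)

# drop = FALSE keeps a one-column data frame, which is how tibbles behave by default;
# c() of a data frame is a list of its columns
test_list <- c(head(protein_df[, 1, drop = FALSE], 5))

# [[ ]] extracts the column itself, so head() returns a plain numeric vector
test_vec <- head(protein_df[[1]], 5)

print(is.list(test_list))     # list with one element named Protein_(g)
print(is.numeric(test_vec))   # atomic numeric vector
print(mean(test_vec))         # vector functions work directly on test_vec
```

With the list form, functions such as mean() need the element to be extracted first (for example test_list[[1]]), which is why the [[ ]] form is usually more convenient.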
#Create a new variable "EPW" by dividing Energ_Kcal by the Water value
> EPW<-Energ_Kcal/`Water_(g)`
> EPW
[1] 45.1795841 42.9425837 3650.0000000 8.3235086 9.0245682 6.8979760 5.7915058
[8] 9.5723014 10.9130200 10.2788845 10.3141361 1.2282241 1.2179809 0.8887792
[15] 0.9970458 0.8729389 6.6514633 8.5899904 4.7808765 10.2584388 34.6726190
[22] 8.5865895 12.4435071 6.7534077 9.0953426 5.9988002 6.5729640 4.7229453
[29] 6.2672615 8.8101508 4.0088734 18.5430464 13.4430727 7.7447745 8.5714286
[36] 2.4267782 1.8545894 12.5202200 9.3702387 10.4437948 7.9328045 9.2400909
[43] 9.5957011 7.8941149 7.6762523 7.5000000 7.3963820 6.0860441 1.5140325
[50] 2.5634143 4.5984252 5.8813354 4.1904451 1.6845520 2.7097304 1.0661497
[57] 2.3799973 0.7186039 0.7181124 1.9245456 1.7600621 1.7600621 177.5167785
[64] 392.5170068 2.9510192 4.3730330 6.3333997 2.9234013 0.6917668 0.6921593
[71] 0.7298438 0.5604753 0.5739365 0.6384677 0.4670819 0.4787886 0.5409060
[78] 0.3742845 0.4093826 0.4588183 0.4438034 0.6916100 200.8097166 114.5569620
[85] 90.4040404 72.2448980 130.3030303 11.8188513 1.8098325 0.9823678 1.0085055
[92] 0.9249118 0.7301849 0.9338993 0.7928301 0.8000000 1.1632090 1.3382900
[99] 1.6481994 1.5043653 0.2569043 96.5811966 0.2899485 110.6583072 0.6939704
[106] 0.7405666 0.6570456 1.0759494 1.3147410 1.3694952 1.4170040 1.8778726
[113] 0.5938107 6.1556108 5.2445074 5.9925825 2.8213617 2.0771911 2.0228556
[120] 1.8852999 1.9502618 212.9496403 328.8770053 24.0082079 44.0281030 217.2077922
[127] 2.6118876 2.6267216 2.1250841 2.3586207 115.0259067 39.9665552 16.6000000
[134] 0.3742845 0.6384677 1.8098325 114.5569620 90.4040404 15.5808342 7.9964851
[141] 4.3456790 5.4059367 5.2364865 2.9464553 9.8003153 8.4440228 9.5578840
[148] 2.7416799 10.2103643 1.9390582 0.5444029 65.8620690 0.5604753 0.4670819
[155] 2.5492958 1.7413572 0.9181141 5.8506224 0.4918220 43.0232558 1.4609712
[162] 1.5649015 5.3592233 4.1747573 2.5517241 6.6157205 4.3274854 1.9947507
[169] 1.1711712 0.6860465 2.2426249 0.9492516 1.4170040 5.2371542 5.0139832
[176] 317.1052632 5.4150198 5.8506224 199.6402878 0.6921593 200.8097166 1.8233387
[183] 4.6332046 1.3147410 1.3694952 1.2599469 1.4170040 1.0759494 0.4918220
[190] 1.5649015 27.4000000 33.0000000 10.9717868 0.5517241 7.7069006 5.8148580
[197] 6.3655031 0.7052668 1.0886076 1.4149660 3.7123746 7.1490281 4.8466258
[204] 4.9281314 9.4576543 3.5341060 4.1554124 1.9688645 2.3929676 1.1035867
[211] 8.4274953 6.8116264 9.3663216 7.5000000 2.0344980 0.6933020 65.1459854
[218] 2.8469751 6.9767442 2.0458503 10.0901917 2.7543860 1.9968553 9.6315789
[225] 11.3416321 6.7842000 0.8655370 1.3838120 1.0828025 1.0682945 1.0047727
[232] 1.3409712 1.0362694 0.9718415 0.8736237 0.4552015 0.7309597 1.4451648
[239] 0.8136993 1.1931119 1.3800286 0.9873418 1.2061960 2.0417156 5.4951034
[246] 4.7645845 2.6711996 0.8197655 0.6200482 6.0433295 31.0874704 35.3249476
[253] 22.5120773 57.5367647 33.7386018 37.5603865 64.9006623 32.9166667 26.2325581
[260] 23.3459357 27.7608916 38.2191781 33.6343115 46.5260546 36.9318182 39.6103896
[267] 34.6575342 39.1600454 36.5384615 51.3178295 33.7022133 58.1395349 35.4712042
[274] 96.3946869 84.2696629 63.2653061 26.6868077 25.0889680 49.5755518 20.1444623
[281] 39.5031056 25.9194396 88.2352941 32.9752954 40.4255319 35.5531686 26.0504202
[288] 39.5728643 30.2222222 38.1136951 35.4300385 24.2801556 0.2498371 0.5002909
[295] 0.7166746 0.0000000 0.2238567 1.5512210 5.4773678 3.6766987 0.6543585
[302] 0.1899135 0.2742993 0.5641749 1.9330087 0.8900191 0.5143191 25.2212389
[309] 0.2011220 1.1510791 32.9824561 0.0000000 56.4912281 31.1627907 52.8423773
[316] 26.9200931 0.9825328 0.9825328 0.9833677 1.5816327 1.2049689 1.2049689
[323] 1.0732791 1.4070352 1.6774194 1.9210526 2.7525622 1.3819721 1.3819721
[330] 2.6934097 17.2865375 2.6474820 0.7634498 107.4585635 0.0000000 0.6138522
[337] 0.5852651 0.5399325 0.5417607 0.9356015 0.7065091 0.6805075 1.2941176
[344] 0.7254721 1.0036720 1.0012210 0.8078888 0.9191176 0.5868815 0.9065223
[351] 0.9065223 1.1387900 0.8046647 0.6745170 0.6896552 0.5869074 0.5756208
[358] 0.7706679 0.6303725 0.5611672 0.9363745 0.6789413 0.7224771 0.7424594
[365] 0.6872852 0.4872647 0.5862458 0.5894355 0.6463527 0.5404188 0.5988701
[372] 0.7859238 0.7052023 0.2939575 0.2594595 0.3131749 0.7192575 0.3773585
[379] 0.2816901 0.3516484 0.3021800 0.2590114 0.6721698 0.7134364 0.5990783
[386] 0.5727377 0.5828571 0.4627540 0.4134078 0.7220217 0.6818182 0.7985258
[393] 0.5712980 0.7565012 0.4129464 0.7673569 0.6666667 0.7770472 0.7770472
[400] 0.4751131 0.5011390 0.8875000 0.9343434 0.8592777 0.8739076 1.3458950
[407] 0.8750000 0.6658879 0.5017104 0.5408516 0.5955857 0.5957944 0.7938417
[414] 0.4134078 0.4377104 0.6921241 0.6904762 0.7619048 0.8384710 0.7955936
[421] 0.4632768 0.5011390 0.7636364 0.5835240 0.8684864 1.0506329 0.7340554
[428] 0.7487923 0.5340909 0.5855339 0.4831461 0.5209513 0.5612829 0.8856089
[435] 0.5084746 0.4836895 0.5365297 0.5239180 0.5753740 0.5498282 0.8547009
[442] 0.5346985 55.2941176 236.4705882 59.5522388 86.8888889 1.0250000 1.0552764
[449] 86.4035088 84.8936170 0.9124088 0.9290954 67.4137931 76.0233918 0.9876543
[456] 0.5743243 0.5862458 0.6651376 0.9196740 74.4444444 76.3157895 238.5000000
[463] 70.5660377 85.9574468 73.3898305 75.7142857 99.2500000 61.2500000 94.6666667
[470] 0.9433962 0.9835657 0.9774436 0.8343558 0.8518519 1.0025063 0.7946210
[477] 0.8251232 0.7237636 0.7228916 1.0438144 0.7074341 0.7664234 0.7211538
[484] 1.0638298 1.0843525 0.5221339 0.6011561 0.4581006 0.5209513 0.6039198
[491] 0.4622322 0.3752759 0.8220859 0.4026846 0.3555556 0.4008909 0.6492027
[498] 0.7268464 0.6758305 0.6658879 0.6952491 0.7611241 0.7132996 0.3351351
[505] 63.3333333 0.8841099 0.9570552 1.0220221 1.4899329 1.2119682 1.0512162
[512] 1.5570470 1.0512162 1.5436242 1.0370913 1.5456989 1.5498652 1.5053763
[519] 1.0512162 0.7542857 1.6754617 203.6000000 0.7159091 157.5757576 0.7159091
[526] 154.8484848 154.8484848 1.7387842 0.7553216 200.8000000 0.7243916 1.6776750
[533] 1.7392459 161.0000000 1.0620323 259.5000000 0.7657143 0.7130730 1.7357891
[540] 0.7243916 170.0000000 236.8181818 1.7490394 0.7470289 232.8888889 145.2941176
[547] 0.9168920 0.8293424 0.7539411 1.6884316 206.8000000 1.6644650 0.7553216
[554] 0.7558406 205.6000000 0.7404032 1.6675420 145.7142857 232.0000000 1.7357891
[561] 0.7404032 1.6675420 0.7243916 232.0000000 0.7314286 1.2302722 160.0000000
[568] 0.7754630 201.2000000 0.7130730 236.8181818 1.2302722 0.7505774 1.6846361
[575] 188.4000000 0.7428571 1.6976127 200.8000000 160.0000000 0.7753732 0.8230453
[582] 0.7672927 0.7150965 1.7105263 262.0000000 0.7159091 1.6578947 254.0000000
[589] 0.7816092 169.6666667 0.7931034 231.1111111 0.7816092 1.6868740 236.3636364
[596] 0.7517941 232.0000000 1.6675420 1.6884316 0.7539411
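The EPW command above refers to the columns by bare name, which only works if the data has been attached first. A hedged alternative that keeps the new variable inside the data frame is sketched below. The data frame foods is a hypothetical stand-in for the first four rows; the water values are inferred from the printed ratios (for example 717 / 15.87 is about 45.18), so treat them as an assumption.

```r
# Hypothetical stand-in for the first four rows of the data
foods <- data.frame(
  Energ_Kcal  = c(717, 718, 876, 353),
  `Water_(g)` = c(15.87, 16.72, 0.24, 42.41),
  check.names = FALSE   # keep the original column name with brackets
)

# with() evaluates the expression using the columns of foods, no attach() needed
foods$EPW <- with(foods, Energ_Kcal / `Water_(g)`)

print(foods)

# Row with the largest energy-per-water ratio
top_row <- which.max(foods$EPW)
print(foods[top_row, ])
```

Very small water values produce very large ratios, as with the third row here, and a water value of exactly zero would give Inf, so it can be worth checking the range of EPW before plotting or averaging it.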
# A tibble: 564 x 20
NDB_No Shrt_Desc `Water_(g)` Energ_Kcal `Protein_(g)` `Lipid_Tot_(g)` `Ash_(g)` `Carbohydrt_(g)`
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 01004 CHEESE,B~ 42.4 353 21.4 28.7 5.11 2.34
2 01005 CHEESE,B~ 41.1 371 23.2 29.7 3.18 2.79
3 01006 CHEESE,B~ 48.4 334 20.8 27.7 2.7 0.45
4 01007 CHEESE,C~ 51.8 300 19.8 24.3 3.68 0.46
5 01008 CHEESE,C~ 39.3 376 25.2 29.2 3.28 3.06
6 01009 CHEESE,C~ 37.0 404 22.9 33.3 3.71 3.09
7 01010 CHEESE,C~ 37.6 387 23.4 30.6 3.6 4.78
8 01011 CHEESE,C~ 38.2 394 23.8 32.1 3.36 2.57
9 01012 CHEESE,C~ 79.8 98 11.1 4.3 1.41 3.38
10 01013 CHEESE,C~ 79.6 97 10.7 3.85 1.2 4.61
# ... with 554 more rows, and 12 more variables: `Fiber_TD_(g)` <dbl>, `Sugar_Tot_(g)` <dbl>,
# `Calcium_(mg)` <dbl>, `Iron_(mg)` <dbl>, `Magnesium_(mg)` <dbl>, `Phosphorus_(mg)` <dbl>,
# `Potassium_(mg)` <dbl>, `Sodium_(mg)` <dbl>, `Zinc_(mg)` <dbl>, `Copper_mg)` <dbl>,
# `Manganese_(mg)` <dbl>, `Selenium_(µg)` <dbl>
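The command that produced this 564-row subset is not shown in the report. The sketch below only illustrates the subset(x, subset) form on a small stand-in; the data frame foods and the condition Energ_Kcal < 500 are hypothetical and are not claimed to be the condition used for the output above.

```r
# Hypothetical stand-in using four rows shown earlier in this report
foods <- data.frame(
  NDB_No     = c("01001", "01003", "01004", "01005"),
  Energ_Kcal = c(717, 876, 353, 371),
  stringsAsFactors = FALSE
)

# Keep only rows that satisfy the logical condition (threshold is an assumption)
sub_foods <- subset(foods, Energ_Kcal < 500)
print(sub_foods)

# The optional select argument keeps only the named columns
sub_ids <- subset(foods, Energ_Kcal < 500, select = NDB_No)
print(sub_ids)

# dim() confirms how many rows survived the filter
print(dim(sub_foods))
```

Comparing dim() before and after subsetting is a quick way to confirm how many rows a condition removed.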
#Bar plots of Energ_Kcal and Water using the new subset created
> attach(submyData)
> barplot(table(Energ_Kcal),ylab = "Frequency",main="Bar Plot of Energ_Kcal")
> barplot(table(`Water_(g)`),ylab = "Frequency",main ="Bar Plot of Water")
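For a continuous variable such as Energ_Kcal, table() counts each distinct value separately, so a bar plot of the table tends to show many bars of height one, while hist() groups the values into bins. The sketch below contrasts the two on the ten Energ_Kcal values printed in the first output of this report; the vector name energy and the bin edges are assumptions chosen for illustration, and the values are passed directly instead of using attach().

```r
# First ten Energ_Kcal values as printed at the top of this report
energy <- c(717, 718, 876, 353, 371, 334, 300, 376, 404, 387)

# Frequency of each distinct value; all ten values differ, so every count is 1
freq <- table(energy)
barplot(freq, ylab = "Frequency", main = "Bar Plot of Energ_Kcal")

# Histogram with bins of width 100 (bin edges are an illustrative choice)
h <- hist(energy,
          breaks = seq(300, 900, by = 100),
          xlab = "Energ_Kcal",
          main = "Histogram of Energ_Kcal")

print(h$counts)   # number of values falling in each bin
```

The object returned by hist() stores the bin edges and counts, so the same call can be reused with plot = FALSE when only the numbers are needed.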