Mean Vector and Correlation Matrix in R - Jupyter Notebook

The document discusses calculating the mean vector and correlation matrix in R using diabetes patient data. It begins by importing libraries and reading in the data. Descriptive statistics and scatterplot matrices are used to analyze the data. The mean of each variable is calculated to get the average values. Group means are calculated by diabetes outcome. The variance-covariance matrix and correlation matrix are also computed to understand the relationships between variables.

Uploaded by

AnuvidyaKarthi

3/10/22, 4:39 AM Mean vector and Correlation Matrix in R - Jupyter Notebook

Importing modules and reading data


In [37]:

library(car)       # provides scatterplotMatrix()
library(corrplot)  # provides corrplot()
# Read the diabetes data set from the URL into a data frame
data <- read.csv(url("https://datalifex.in/dataml/diabetes.csv"))
data

Pregnancies  Glucose  BloodPressure  SkinThickness  Insulin  BMI   DiabetesPedigreeFunction  Age
0            118      84             47             230      45.8  0.551                     31
7            107      74             0              0        29.6  0.254                     31
1            103      30             38             83       43.3  0.183                     33
1            115      70             30             96       34.6  0.529                     32
3            126      88             41             235      39.3  0.704                     27
8            99       84             0              0        35.4  0.388                     50
7            196      90             0              0        39.8  0.451                     41
9            119      80             35             0        29.0  0.263                     29
11           143      94             33             146      36.6  0.254                     51
10           125      70             26             115      31.1  0.205                     41
7            147      76             0              0        39.4  0.257                     43
1            97       66             15             140      23.2  0.487                     22

Summary

localhost:8888/notebooks/Mean vector and Correlation Matrix in R.ipynb# 1/7



In [38]:

summary(data)

  Pregnancies         Glucose       BloodPressure    SkinThickness
 Min.   : 0.000   Min.   :  0.0   Min.   :  0.00   Min.   : 0.00
 1st Qu.: 1.000   1st Qu.: 99.0   1st Qu.: 62.00   1st Qu.: 0.00
 Median : 3.000   Median :117.0   Median : 72.00   Median :23.00
 Mean   : 3.845   Mean   :120.9   Mean   : 69.11   Mean   :20.54
 3rd Qu.: 6.000   3rd Qu.:140.2   3rd Qu.: 80.00   3rd Qu.:32.00
 Max.   :17.000   Max.   :199.0   Max.   :122.00   Max.   :99.00

    Insulin            BMI         DiabetesPedigreeFunction      Age
 Min.   :  0.0   Min.   : 0.00   Min.   :0.0780             Min.   :21.00
 1st Qu.:  0.0   1st Qu.:27.30   1st Qu.:0.2437             1st Qu.:24.00
 Median : 30.5   Median :32.00   Median :0.3725             Median :29.00
 Mean   : 79.8   Mean   :31.99   Mean   :0.4719             Mean   :33.24
 3rd Qu.:127.2   3rd Qu.:36.60   3rd Qu.:0.6262             3rd Qu.:41.00
 Max.   :846.0   Max.   :67.10   Max.   :2.4200             Max.   :81.00

    Outcome
 Min.   :0.000
 1st Qu.:0.000
 Median :0.000
 Mean   :0.349
 3rd Qu.:1.000
 Max.   :1.000

Matrix Scatterplot
Scatterplot matrices are useful for spotting rough linear relationships between pairs of continuous variables.
They are less informative for discrete variables, such as the binary Outcome column.


In [39]:

scatterplotMatrix(data[1:9])

Calculating the mean of all the variables

Mean vector
In [40]:

apply(data[,1:9], 2, mean)

Pregnancies: 3.84505208333333
Glucose: 120.89453125
BloodPressure: 69.10546875
SkinThickness: 20.5364583333333
Insulin: 79.7994791666667
BMI: 31.992578125
DiabetesPedigreeFunction: 0.471876302083333
Age: 33.2408854166667
Outcome: 0.348958333333333

In [43]:

colMeans(data[,1:9])

Pregnancies: 3.84505208333333
Glucose: 120.89453125
BloodPressure: 69.10546875
SkinThickness: 20.5364583333333
Insulin: 79.7994791666667
BMI: 31.992578125
DiabetesPedigreeFunction: 0.471876302083333
Age: 33.2408854166667
Outcome: 0.348958333333333

Inference: the mean vector shows that the average age of the patients in the data is around 33.
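Both apply(..., 2, mean) and colMeans() return the same mean vector. A minimal sketch of that equivalence, using a made-up data frame (the name `toy` and its values are hypothetical and are not taken from the diabetes data):

```r
# Hypothetical toy data; the values are invented for illustration only.
toy <- data.frame(
  Glucose = c(100, 120, 140),
  BMI     = c(25, 30, 35),
  Age     = c(30, 40, 50)
)

# apply() coerces the data frame to a matrix and calls mean() on each column.
m_apply <- apply(toy, 2, mean)

# colMeans() computes the same column means directly.
m_colmeans <- colMeans(toy)

print(m_colmeans)
print(all.equal(m_apply, m_colmeans))
```

For a purely numeric data frame, colMeans() is usually the shorter and faster of the two.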


Group means

Using the by function
In [44]:

by(data = data[, 1:9], INDICES = data$Outcome, FUN = colMeans)

data$Outcome: 0
             Pregnancies                  Glucose            BloodPressure
                3.298000               109.980000                68.184000
           SkinThickness                  Insulin                      BMI
               19.664000                68.792000                30.304200
DiabetesPedigreeFunction                      Age                  Outcome
                0.429734                31.190000                 0.000000
------------------------------------------------------------
data$Outcome: 1
             Pregnancies                  Glucose            BloodPressure
                4.865672               141.257463                70.824627
           SkinThickness                  Insulin                      BMI
               22.164179               100.335821                35.142537
DiabetesPedigreeFunction                      Age                  Outcome
                0.550500                37.067164                 1.000000

Here, Outcome 0 denotes non-diabetic patients and Outcome 1 denotes diabetic patients; the output gives the mean of every variable within each group.
The mean Glucose (141.3 vs 110.0), Insulin (100.3 vs 68.8) and Age (37.1 vs 31.2) are all higher for the diabetic group than for the non-diabetic group.

Using the aggregate function


In [45]:

aggregate(.~ Outcome,data,mean)

A data.frame: 2 × 9

Outcome  Pregnancies  Glucose   BloodPressure  SkinThickness  Insulin   BMI       DiabetesP
<int>    <dbl>        <dbl>     <dbl>          <dbl>          <dbl>     <dbl>
0        3.298000     109.9800  68.18400       19.66400       68.7920   30.30420
1        4.865672     141.2575  70.82463       22.16418       100.3358  35.14254
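As a small illustration of how group means are formed, the sketch below computes them two ways on a hypothetical four-row data frame (`toy`, with made-up values): once with split() plus sapply(), and once with aggregate(). It is one possible approach under those assumptions, not the notebook's own code.

```r
# Hypothetical toy data; the values are invented for illustration only.
toy <- data.frame(
  Glucose = c(100, 110, 150, 160),
  Age     = c(30, 32, 40, 44),
  Outcome = c(0, 0, 1, 1)
)

# Split the measurement columns by Outcome, then take column means per piece.
# The result is a matrix with one column per group ("0" and "1").
group_means <- sapply(split(toy[, c("Glucose", "Age")], toy$Outcome), colMeans)
print(group_means)

# The formula interface gives the same numbers as a data frame,
# with one row per Outcome level.
agg <- aggregate(. ~ Outcome, data = toy, FUN = mean)
print(agg)
```

The aggregate() result is often easier to pass on to further processing because it is an ordinary data frame.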

Variance and covariance matrix


In [46]:

var(data[,1:9]) #the diagonal elements of the matrix are the individual variances

A matrix: 9 × 9 of type dbl

                          Pregnancies  Glucose      BloodPressure  SkinThickness  Insu
Pregnancies               11.35405632  13.947131    9.2145382      -4.3900410     -28.5552
Glucose                   13.94713066  1022.248314  94.4309556     29.2391827     1220.9357
BloodPressure             9.21453818   94.430956    374.6472712    64.0293962     198.3784
SkinThickness             -4.39004101  29.239183    64.0293962     254.4732453    802.9799
Insulin                   -28.55523074 1220.935799  198.3784122    802.9799408    13281.1800
BMI                       0.46977418   55.726987    43.0046951     49.3738694     179.7751
DiabetesPedigreeFunction  -0.03742597  1.454875     0.2646376      0.9721355      7.0666
Age                       21.57061977  99.082805    54.5234528     -21.3810232    -57.1432
Outcome                   0.35661805   7.115079     0.6006967      0.5687473      7.1756

In [47]:

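# off-diagonal elements give the covariances

The signs of the off-diagonal entries indicate the direction of the linear relationship between each pair of variables. Because covariances depend on the units of measurement, their magnitudes are not directly comparable across pairs.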
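To make the meaning of an off-diagonal entry concrete, the sketch below computes a sample covariance by hand (sum of products of deviations from the means, divided by n - 1) and compares it with the value R reports. The vectors `x` and `y` are hypothetical and invented for illustration.

```r
# Hypothetical measurements; the values are invented for illustration only.
x <- c(2, 4, 6, 8)
y <- c(1, 3, 2, 6)
n <- length(x)

# Sample covariance: products of deviations from the means, divided by n - 1.
cov_manual <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# var() on a two-column matrix returns the variance-covariance matrix;
# its [1, 2] entry is the covariance between x and y.
V <- var(cbind(x, y))

print(cov_manual)
print(V)
print(all.equal(cov_manual, V[1, 2]))
```

A positive value, as here, means the two variables tend to move in the same direction.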

Correlation Matrix
In [48]:

cor(data[,1:9])

A matrix: 9 × 9 of type dbl

                          Pregnancies  Glucose     BloodPressure  SkinThickness  Insulin
Pregnancies               1.00000000   0.12945867  0.14128198     -0.08167177    -0.07353461
Glucose                   0.12945867   1.00000000  0.15258959     0.05732789     0.33135711
BloodPressure             0.14128198   0.15258959  1.00000000     0.20737054     0.08893338
SkinThickness             -0.08167177  0.05732789  0.20737054     1.00000000     0.43678257
Insulin                   -0.07353461  0.33135711  0.08893338     0.43678257     1.00000000
BMI                       0.01768309   0.22107107  0.28180529     0.39257320     0.19785906
DiabetesPedigreeFunction  -0.03352267  0.13733730  0.04126495     0.18392757     0.18507093
Age                       0.54434123   0.26351432  0.23952795     -0.11397026    -0.04216295
Outcome                   0.22189815   0.46658140  0.06506836     0.07475223     0.13054795


Visualization of correlation matrix


In [49]:

corrplot(cor(data[,1:9]),method="ellipse")

Inference: the plot shows that the strongest correlation is between Pregnancies and Age (about 0.54).
There are moderate correlations between Glucose and Outcome (about 0.47), between SkinThickness and Insulin (about 0.44),
and between SkinThickness and BMI (about 0.39).
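Reading the strongest pairs off a plot can be cross-checked numerically. The sketch below, which assumes a hypothetical three-column data frame `toy` with made-up values, lists every pair of variables once and sorts the pairs by absolute correlation. It is an illustrative approach and not part of the original notebook.

```r
# Hypothetical toy data; the values are invented for illustration only.
toy <- data.frame(
  a = c(1, 2, 3, 4, 5),
  b = c(2, 4, 5, 4, 5),
  c = c(5, 3, 4, 1, 2)
)
R <- cor(toy)

# Positions above the diagonal, so each pair of variables appears once.
idx <- which(upper.tri(R), arr.ind = TRUE)

pairs_df <- data.frame(
  var1 = rownames(R)[idx[, "row"]],
  var2 = colnames(R)[idx[, "col"]],
  r    = R[idx]
)

# Strongest relationships first, regardless of sign.
pairs_df <- pairs_df[order(-abs(pairs_df$r)), ]
print(pairs_df)
```

Applied to cor(data[,1:9]), the same steps would rank the pairs discussed in the inference.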

Building covariance matrix from correlation matrix


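Formula:
$$S = D^{1/2} R D^{1/2}$$
where S is the covariance matrix, R is the correlation matrix, and D is the diagonal matrix of variances, so that D^{1/2} holds the standard deviations on its diagonal.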
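Written element by element, the formula says that each covariance is the correlation rescaled by the two standard deviations. As a worked check, the entries for Pregnancies (variable 1) and Glucose (variable 2) printed in the variance and correlation matrices above give, after rounding:

```latex
\begin{align*}
s_{ij} &= \sqrt{s_{ii}}\; r_{ij}\; \sqrt{s_{jj}} \\
s_{12} &= \sqrt{11.3541} \times 0.12946 \times \sqrt{1022.2483} \\
       &\approx 3.3696 \times 0.12946 \times 31.9726 \approx 13.947
\end{align*}
```

This agrees with the Pregnancies and Glucose covariance of 13.947 reported by var().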


In [50]:

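# D1 holds the standard deviations (square roots of the variances) on its diagonal
D1 <- diag(diag(var(data[,1:9]))^(1/2))
# Rescale the correlation matrix back to the covariance scale
S <- D1 %*% cor(data[,1:9]) %*% D1
print(S)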

      [,1]          [,2]         [,3]         [,4]         [,5]          [,6]
[1,]  11.35405632   13.947131    9.2145382    -4.3900410   -28.555231    0.4697742
[2,]  13.94713066   1022.248314  94.4309556   29.2391827   1220.935799   55.7269867
[3,]  9.21453818    94.430956    374.6472712  64.0293962   198.378412    43.0046951
[4,]  -4.39004101   29.239183    64.0293962   254.4732453  802.979941    49.3738694
[5,]  -28.55523074  1220.935799  198.3784122  802.9799408  13281.180078  179.7751721
[6,]  0.46977418    55.726987    43.0046951   49.3738694   179.775172    62.1599840
[7,]  -0.03742597   1.454875     0.2646376    0.9721355    7.066681      0.3674047
[8,]  21.57061977   99.082805    54.5234528   -21.3810232  -57.143290    3.3603299
[9,]  0.35661805    7.115079     0.6006967    0.5687473    7.175671      1.1006376

      [,7]         [,8]         [,9]
[1,]  -0.03742597  21.5706198   0.35661805
[2,]  1.45487481   99.0828054   7.11507904
[3,]  0.26463757   54.5234528   0.60069671
[4,]  0.97213555   -21.3810232  0.56874728
[5,]  7.06668051   -57.1432903  7.17567090
[6,]  0.36740469   3.3603299    1.10063763
[7,]  0.10977864   0.1307717    0.02747217
[8,]  0.13077169   138.3030459  1.33695268
[9,]  0.02747217   1.3369527    0.22748262
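The rebuilt matrix S should reproduce the output of var() up to floating-point error. A minimal sketch of that check, using simulated data in place of the diabetes file (the data frame `toy` and the seed are assumptions made for illustration):

```r
# Simulated stand-in data; not the diabetes data set.
set.seed(1)
toy <- data.frame(x = rnorm(20), y = rnorm(20), z = rnorm(20))

V <- var(toy)                        # variance-covariance matrix
D_half <- diag(sqrt(diag(V)))        # standard deviations on the diagonal
S <- D_half %*% cor(toy) %*% D_half  # rebuild covariance from correlation

# The matrix product drops the row and column names, so restore them
# before comparing with V.
dimnames(S) <- dimnames(V)
print(all.equal(S, V))

# cov2cor() from the stats package performs the reverse conversion.
print(all.equal(cov2cor(V), cor(toy)))
```

The dropped names are also why the printed S in the notebook shows [,1], [,2], ... instead of the variable names.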
