0% found this document useful (0 votes)
52 views

Clustering Example

This document describes a case study using data from Samsung Galaxy S3 smartphones to analyze and cluster human activity patterns based on accelerometer data. The study explores various techniques for clustering and visualizing the smartphone accelerometer data recorded during different activities like walking, sitting, and standing. These techniques include hierarchical clustering, k-means clustering, and principal component analysis. The goal is to determine how well different clustering methods can distinguish between the types of activities performed by individuals based solely on the accelerometer data.

Uploaded by

niti860
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
52 views

Clustering Example

This document describes a case study using data from Samsung Galaxy S3 smartphones to analyze and cluster human activity patterns based on accelerometer data. The study explores various techniques for clustering and visualizing the smartphone accelerometer data recorded during different activities like walking, sitting, and standing. These techniques include hierarchical clustering, k-means clustering, and principal component analysis. The goal is to determine how well different clustering methods can distinguish between the types of activities performed by individuals based solely on the accelerometer data.

Uploaded by

niti860
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 18

EDA Case Study - Understanding Human EDA Case Study - Understanding Human

Activity with Smart Phones Activity with Smart Phones


Roger D. Peng, Associate Professor of Biostatistics
Johns Hopkins Bloomberg School of Public Health
Samsung Galaxy S3 Samsung Galaxy S3
http://www.samsung.com/global/galaxys3/
2/18
Samsung Data Samsung Data
http://archive.ics.uci.edu/ml/datasets/Human+Activity+Recognition+Using+Smartphones
3/18
Slightly processed data Slightly processed data
Samsung data le
load("data/samsungData.rda")
names(samsungData)[1:12]
## [1] "tBodyAcc-mean()-X" "tBodyAcc-mean()-Y" "tBodyAcc-mean()-Z"
## [4] "tBodyAcc-std()-X" "tBodyAcc-std()-Y" "tBodyAcc-std()-Z"
## [7] "tBodyAcc-mad()-X" "tBodyAcc-mad()-Y" "tBodyAcc-mad()-Z"
## [10] "tBodyAcc-max()-X" "tBodyAcc-max()-Y" "tBodyAcc-max()-Z"
table(samsungData$activity)
##
## laying sitting standing walk walkdown walkup
## 1407 1286 1374 1226 986 1073
4/18
Plotting average acceleration for first subject Plotting average acceleration for first subject
par(mfrow = c(1, 2), mar = c(5, 4, 1, 1))
samsungData <- transform(samsungData, activity = factor(activity))
sub1 <- subset(samsungData, subject == 1)
plot(sub1[, 1], col = sub1$activity, ylab = names(sub1)[1])
plot(sub1[, 2], col = sub1$activity, ylab = names(sub1)[2])
legend("bottomright", legend = unique(sub1$activity), col = unique(sub1$activity),
pch = 1)
5/18
Clustering based just on average acceleration Clustering based just on average acceleration
source("myplclust.R")
distanceMatrix <- dist(sub1[, 1:3])
hclustering <- hclust(distanceMatrix)
myplclust(hclustering, lab.col = unclass(sub1$activity))
6/18
Plotting max acceleration for the first subject Plotting max acceleration for the first subject
par(mfrow = c(1, 2))
plot(sub1[, 10], pch = 19, col = sub1$activity, ylab = names(sub1)[10])
plot(sub1[, 11], pch = 19, col = sub1$activity, ylab = names(sub1)[11])
7/18
Clustering based on maximum acceleration Clustering based on maximum acceleration
source("myplclust.R")
distanceMatrix <- dist(sub1[, 10:12])
hclustering <- hclust(distanceMatrix)
myplclust(hclustering, lab.col = unclass(sub1$activity))
8/18
Singular Value Decomposition Singular Value Decomposition
svd1 = svd(scale(sub1[, -c(562, 563)]))
par(mfrow = c(1, 2))
plot(svd1$u[, 1], col = sub1$activity, pch = 19)
plot(svd1$u[, 2], col = sub1$activity, pch = 19)
9/18
Find maximum contributor Find maximum contributor
plot(svd1$v[, 2], pch = 19)
10/18
New clustering with maximum contributer New clustering with maximum contributer
maxContrib <- which.max(svd1$v[, 2])
distanceMatrix <- dist(sub1[, c(10:12, maxContrib)])
hclustering <- hclust(distanceMatrix)
myplclust(hclustering, lab.col = unclass(sub1$activity))
11/18
New clustering with maximum contributer New clustering with maximum contributer
names(samsungData)[maxContrib]
## [1] "fBodyAcc.meanFreq...Z"
12/18
K-means clustering (nstart=1, first try) K-means clustering (nstart=1, first try)
kClust <- kmeans(sub1[, -c(562, 563)], centers = 6)
table(kClust$cluster, sub1$activity)
##
## laying sitting standing walk walkdown walkup
## 1 0 0 0 50 1 0
## 2 0 0 0 0 48 0
## 3 27 37 51 0 0 0
## 4 3 0 0 0 0 53
## 5 0 0 0 45 0 0
## 6 20 10 2 0 0 0
13/18
K-means clustering (nstart=1, second try) K-means clustering (nstart=1, second try)
kClust <- kmeans(sub1[, -c(562, 563)], centers = 6, nstart = 1)
table(kClust$cluster, sub1$activity)
##
## laying sitting standing walk walkdown walkup
## 1 0 0 0 0 49 0
## 2 18 10 2 0 0 0
## 3 0 0 0 95 0 0
## 4 29 0 0 0 0 0
## 5 0 37 51 0 0 0
## 6 3 0 0 0 0 53
14/18
K-means clustering (nstart=100, first try) K-means clustering (nstart=100, first try)
kClust <- kmeans(sub1[, -c(562, 563)], centers = 6, nstart = 100)
table(kClust$cluster, sub1$activity)
##
## laying sitting standing walk walkdown walkup
## 1 18 10 2 0 0 0
## 2 29 0 0 0 0 0
## 3 0 0 0 95 0 0
## 4 0 0 0 0 49 0
## 5 3 0 0 0 0 53
## 6 0 37 51 0 0 0
15/18
K-means clustering (nstart=100, second try) K-means clustering (nstart=100, second try)
kClust <- kmeans(sub1[, -c(562, 563)], centers = 6, nstart = 100)
table(kClust$cluster, sub1$activity)
##
## laying sitting standing walk walkdown walkup
## 1 29 0 0 0 0 0
## 2 3 0 0 0 0 53
## 3 0 0 0 0 49 0
## 4 0 0 0 95 0 0
## 5 0 37 51 0 0 0
## 6 18 10 2 0 0 0
16/18
Cluster 1 Variable Centers (Laying) Cluster 1 Variable Centers (Laying)
plot(kClust$center[1, 1:10], pch = 19, ylab = "Cluster Center", xlab = "")
17/18
Cluster 2 Variable Centers (Walking) Cluster 2 Variable Centers (Walking)
plot(kClust$center[4, 1:10], pch = 19, ylab = "Cluster Center", xlab = "")
18/18

You might also like