TIC Final Report
GymNeus
Prepared in
partial fulfilment of
TIC PROJECT
Prepared By
Yashvardhan Srivastava 2010B5A3540P
Aayush Jain 2010B1A3371P
Shriniwas Sharma 2010ABPS460P
Danish Pruthi 2011A7PS037P
Submitted to
Dr. Anu Gupta
BIRLA INSTITUTE OF TECHNOLOGY AND SCIENCE, PILANI
ACKNOWLEDGEMENT
Every project requires support and contributions from many sources and people for its
successful completion and to achieve the desired outcome. I would like to thank Dr. Anu Gupta
for providing me the opportunity to work on this project as a TIC Project for Knightvale
Consultancy GmbH.
I would especially like to thank Mr. Puneet Teja, my mentor for this project, who has guided
me at every turn. His every suggestion was highly valued and helped me optimize the
application.
Moreover, I would also like to thank the institution for providing round-the-clock access to
internet facilities, which helped me find relevant information. In addition, I would like to
pay heartfelt regards to all those who helped us, directly and indirectly.
ABSTRACT
GYMNEUS
GymNeus is a service that allows users to track and analyze their workouts, including
workouts in the gym, on their smartphone. The data can be sent to the user's smartphone if
it is nearby; it can be saved on the device and recovered later, or it can be sent directly
to the servers, where the user gets a complete analysis of the workout, which can be tracked
or even maintained as a training log for a trainer.
The accelerometer and gyroscope sensors send the data points, and machine learning
algorithms identify the workout being done and store it accordingly.
TABLE OF CONTENTS
Acknowledgement
Abstract
1. Introduction
2. Workflow
3. Results
4. Literature Survey
5. Decision Trees: An introduction
6. Rating decision trees
7. Ranking features
8. Automated Methodologies of creating decision trees
9. Combination/Ensemble Techniques
10. APPENDIX
10.1 ARFF file documentation
10.2 Java Code
INTRODUCTION
GYMNEUS
GymNeus is used to track workouts on strength training machines. Users can keep a
detailed record of their workouts on all strength training machines as well as general
workout sessions. In addition, the workouts can be shared with friends, peers on social
networks as well as with personal trainers.
GymNeus connects over Bluetooth with your mobile device or with your computer using a
micro USB cable.
GymNeus has accelerometer- and gyroscope-based sensors which capture accurate data and
transfer it to the smartphone, which then processes the data in such a manner that the type
of workout is quickly identified.
This processing to identify/classify the workout type can also be done on the device
electronics, and there could also be a provision for sending the data directly in JSON
format over a 3G/GPRS connection to the servers.
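For illustration, one sample in that direct-upload path might look like the following. This
is a hypothetical sketch (the field names are assumptions, not a finalized schema), with
values borrowed from the sample readings shown later in the Workflow section:

{
  "device": "gymneus-01",
  "timestamp": "2014-02-12T10:17:13+0000",
  "acc": [0.034574, -0.68078, -0.81232],
  "gyr": [0.013885, -0.66577, -0.78806]
}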
ANDROID
Android is a software stack for mobile devices that includes an operating system,
middleware and key applications. It is a software platform and operating system for
mobile devices based on the Linux kernel, developed by Google and the Open Handset
Alliance. It allows developers to write managed code in a Java-like language that
utilizes Google-developed Java libraries, but does not support programs developed in
native code. The unveiling of the Android platform on 5 November 2007 was announced
with the founding of the Open Handset Alliance, a consortium of 34 hardware, software and
telecom companies devoted to advancing open standards for mobile devices. When it was
released in 2008, most of the Android platform was made available under the Apache
free-software and open-source license. Since then the smartphone market has grown
exponentially and successive versions of the Android platform have been launched, with
Android 4.4 (KitKat) being the latest. These versions provide enhanced support for various
features and different types of sensors, which give users a very rich experience and give
developers an opportunity to experiment and create amazing Android apps.
SOFTWARE/RESOURCES USED
Eclipse IDE
Eclipse is an integrated development environment (IDE). It contains a base workspace and
an extensible plug-in system for customizing the environment. Written mostly in Java,
Eclipse can be used to develop applications. By means of various plug-ins, Eclipse may also
be used to develop applications in other programming languages.
Java 7
Java is a computer programming language that is concurrent, class-based, object-oriented,
and specifically designed to have as few implementation dependencies as possible.
The seventh version of the language was used to write the programs that generate the
various files used later in the project.
WEKA
Weka (Waikato Environment for Knowledge Analysis) is a popular suite of machine
learning software written in Java, developed at the University of Waikato, New Zealand.
Weka is free software available under the GNU General Public License.
The Weka (pronounced Weh-Kuh) workbench[1] contains a collection of visualization tools
and algorithms for data analysis and predictive modelling together with graphical user
interfaces for easy access to this functionality.
Weka supports several standard data mining tasks, more specifically, data preprocessing,
clustering, classification, regression, visualization, and feature selection. All of
Weka's techniques are predicated on the assumption that the data is available as a single
flat file or relation, where each data point is described by a fixed number of attributes
(normally, numeric or nominal attributes, but some other attribute types are also
supported). ARFF files were developed by the Machine Learning Project at the Department
of Computer Science of The University of Waikato for use with the Weka machine learning
software.
ARFF (Attribute-Relation File Format) file is an ASCII text file that describes a list of
instances sharing a set of attributes.
Further details about the ARFF file which is generated for use in Weka are given in the
Appendix.
WORKFLOW
As discussed earlier, the basic (or rather the initial) requirement is to identify/classify
which type of workout is being done, looking only at the raw accelerometer and gyroscope
readings.
The objective, being a simple classification problem, required a machine learning
algorithm. It also required a sufficient amount of data so that the calculated features are
enough for the machine to both train itself and learn to identify which type of workout is
being performed. To judge the correctness of the algorithm, Weka has been used.
The basic procedure was a two-step process:
1. Obtain enough data points for 3 predefined workout types (pulls, curls, stretches).
2. Write Java code to generate the ARFF file which could be used by Weka.
The data points were collected using the accelerometer and gyroscope sensors of a
smartphone. The data points for a particular repetition were stored separately in a .csv
file, for example:
2014-02-12 10:17:13 +0000  0.034574  -0.68078  -0.81232   0.013885  -0.66577  -0.78806
2014-02-12 10:17:13 +0000  0.011716  -0.68789  -0.79446  -0.00897   -0.67288  -0.77019
2014-02-12 10:17:13 +0000  0.011716  -0.68789  -0.79446  -0.00897   -0.67288  -0.77019
2014-02-12 10:17:13 +0000 -0.02315   -0.69756  -0.76527  -0.04384   -0.68256  -0.741
2014-02-12 10:17:13 +0000 -0.02022   -0.6869   -0.77772  -0.04091   -0.67189  -0.75345
2014-02-12 10:17:13 +0000 -0.02022   -0.6869   -0.77772  -0.04091   -0.67189  -0.75345
2014-02-12 10:17:13 +0000  0.006238  -0.72957  -0.86001  -0.01445   -0.71457  -0.83574
2014-02-12 10:17:13 +0000 -0.04747   -0.73574  -0.83021  -0.06816   -0.72073  -0.80594
2014-02-12 10:17:13 +0000 -0.0322    -0.85549  -1.13512  -0.05289   -0.84048  -1.11086
Each file was cleaned to retain only the data required to generate the ARFF file:
121,-57,-16,-17,204,108
11,-63,-50,-17,218,109
87,-65,-51,-15,224,111
57,-66,-45,-19,222,110
980,-1086,-1629,34,219,102
259,-2213,-2801,90,296,29
The values correspond to the accelerometer readings of the three axes followed by the
gyroscope readings of the corresponding axes.
A total of 117 such files (44 pulls, 46 stretches, 27 curls), each corresponding to a
different repetition, were created.
This was followed by the calculation of various features that might be useful in
determining the workout type. The features decided upon were:
mean, variance, min-max difference, zero-crossing rate, correlation of the corresponding
pairs of axes, linear regression, and root mean square.
These features were calculated using a Java program and arranged according to the format
described for ARFF files. The format is described in the appendix.
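As an illustration of one of these features, a minimal sketch of a zero-crossing-rate
helper is shown below, written in the same style as the StdStats methods in the appendix.
This exact implementation is an assumption; the version actually used may differ:

public static double zcr(java.util.ArrayList<String> a) {
    // count sign changes between consecutive samples
    int crossings = 0;
    double prev = Double.parseDouble(a.get(0));
    for (int i = 1; i < a.size(); i++) {
        double cur = Double.parseDouble(a.get(i));
        if ((prev < 0 && cur >= 0) || (prev >= 0 && cur < 0)) crossings++;
        prev = cur;
    }
    // normalize by the number of samples in the repetition
    return (double) crossings / a.size();
}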
The final ARFF file that was generated can be visualized as:
@relation activity
@attribute minmax_acc_x numeric
...
@attribute minmax_vel_gyr_avg numeric
@attribute rms_acc_x numeric
...
@attribute rms_vel_gyr_avg numeric
@attribute var_acc_x numeric
...
@attribute var_vel_gyr_avg numeric
@attribute mean_acc_x numeric
...
@attribute mean_vel_gyr_avg numeric
@attribute zcr_acc_x numeric
...
@attribute zcr_vel_gyr_avg numeric
@attribute corr_acc_xy numeric
...
@attribute corr_vel_gyr_xz numeric
@attribute type {curls, pulls, stretches}
@data
2737.0,4701.0,7371.0,1521.6666666666667,.....600.0,448.0,456.0,188.66666666666666,stretches
4946.0,8544.0,11733.0,5626.333333333333,......1588.0,4739.0,2950.0,2182.333333333333,stretches
670.8845504257793,1151.2510933762453,......1480.7058587038819,358.81221334347646,stretches
136.6925016231688,212.70552414077073,......146.34397835237363,101.68867521345072,stretches
1616.3602321264898,4134.453303642454,......4825.417751863562,2473.2497964329355,curls
878.2117284573236,2834.0528929432494,......1882.6672462227625,1295.8325714895939,curls
468474.6266666669,1379896.8733333335,......2279139.583333333,133777.1177777778,curls
17224.573333333334,6406.71,6723.309999999999,......2102.3770370370366,1881325.7433333334,curls
9396463.31,1.8557338793333333E7,4249710.597037037,......279233.5733336,1919486.4166666667,curls
931439.79,406143.2314814815,18.72,-26.04,67.2,......-17.8933333324,-46.36,197.72,122.32,pulls
91.22666666666667,898.08,-2841.32,-2338.72,.......-1427.3199997,-709.36,2487.8,1627.96,pulls
1135.4666666666667,6.0,6.0,6.0,10.0,4.0,2.0,2.0,0.0,......2.0,2.0,1.0,2.0,2.0,0.0,0.0,0.0,pulls
-0.46675630191156164,0.9569454103983737,.....-0.40540496037786083,0.496238755629301,pulls
-0.7020484458471432,-0.5874667698799683,......-0.8490728313603951,0.9753172476431332,pulls
-0.8636550536040989,-0.7832024225046871,......0.9696033412535484,-0.9029983281113617,pulls
The generated file was then used in Weka to produce results.
A series of tests was performed, with the following results:
Phase 1:
Correctly Classified Instances 76 100 %
Incorrectly Classified Instances 0 0 %
Kappa statistic 1
Mean absolute error 0.0272
Root mean squared error 0.0763
Relative absolute error 6.1754 %
Root relative squared error 16.2985 %
Total Number of Instances 76
This was when we used 80% of the data as the training set and the remaining 20% for testing.
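For reference, a percentage-split run like this can be reproduced with Weka's Java API. The
sketch below is an assumed reconstruction (the file name, random seed and split are
illustrative), not the exact steps we ran:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.Random;
import weka.classifiers.Evaluation;
import weka.classifiers.trees.J48;
import weka.core.Instances;

public class EvalSplit {
    public static void main(String[] args) throws Exception {
        Instances data = new Instances(new BufferedReader(new FileReader("outfile.arff")));
        data.setClassIndex(data.numAttributes() - 1);   // "type" is the last attribute
        data.randomize(new Random(1));
        int trainSize = (int) Math.round(data.numInstances() * 0.8);
        Instances train = new Instances(data, 0, trainSize);
        Instances test = new Instances(data, trainSize, data.numInstances() - trainSize);
        J48 tree = new J48();                           // the C4.5 decision tree learner
        tree.buildClassifier(train);
        Evaluation eval = new Evaluation(train);
        eval.evaluateModel(tree, test);
        System.out.println(eval.toSummaryString());
    }
}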
Phase 2:
Correctly Classified Instances 111 95.6897 %
Incorrectly Classified Instances 5 4.3103 %
Kappa statistic 0.9337
Mean absolute error 0.0331
Root mean squared error 0.1671
Relative absolute error 7.6344 %
Root relative squared error 35.8956 %
Total Number of Instances 116
Phase 3:
Correctly Classified Instances 34 97.1429 %
Incorrectly Classified Instances 1 2.8571 %
Kappa statistic 0.9544
Mean absolute error 0.019
Root mean squared error 0.138
Relative absolute error 4.3841 %
Root relative squared error 29.5843 %
Total Number of Instances 35
=== Detailed Accuracy By Class ===
TP Rate FP Rate Precision Recall F-Measure Class
0.857 0 1 0.857 0.923 curl
1 0.053 0.941 1 0.97 pull
1 0 1 1 1 stretch
=== Confusion Matrix ===
a b c <-- classified as
6 1 0 | a = curl
0 16 0 | b = pull
0 0 12 | c = stretch
Phase 4:
Correctly Classified Instances 117 100 %
Incorrectly Classified Instances 0 0 %
Kappa statistic 1
Mean absolute error 0
Root mean squared error 0
Relative absolute error 0 %
Root relative squared error 0 %
Total Number of Instances 117
Generating Decision Trees: Based on the decision trees generated in Weka, the Java program
for the binary decision tree was written. The code for the same is given in the appendix.
[Figure: decision trees generated in Weka]
LITERATURE SURVEY
The Classification Problem
Classification, which is the task of assigning objects to one of several predefined
categories, is a pervasive problem that encompasses many diverse applications. Examples
include detecting spam email messages based upon the message header and content,
categorizing cells as malignant or benign based upon the results of MRI scans, and
classifying galaxies based upon their shapes.
The input data for a classification task is a collection of records. Each record, also
known as an instance or example, is characterized by a tuple (x, y), where x is the
attribute set and y is a special attribute, designated as the class label (also known as
the category or target attribute).
Human Activity Recognition through feature extraction
Data collection is followed by feature extraction. The feature extraction step is possibly
the most important part of the activity recognition problem, since classification can be
handled by any existing machine learning algorithm if the features are robust. In general,
frequency-domain features have been found to perform best [6]. However, extracting these
often requires too much computation to be feasible in real-time systems [7]. The feature
extraction scheme that we devised is computationally efficient but less tolerant of
person-to-person variations. We combined modified versions of techniques previously used
in this domain with quantitative description methods used in electroencephalography (EEG)
signal analysis. Our intended use case is activity recognition on cell phones. Important
characteristics of that scenario are minimal processing capability, only one 3D
accelerometer, a device carried in a mostly static orientation in the user's pocket or
purse, and a system that can be trained and used by the same person, namely the owner of
the phone. Performance on the standard dataset and the prototype cell phone application
proves that our method is applicable for the targeted use case.
As a whole this work makes the following contributions:
1. A novel linear-time feature extraction scheme that uses various disparate methods to
identify human activities is presented.
2. The accuracy of the proposed method is shown using various classification methods on a
standard accelerometer-based dataset and real-time data on a cell phone.
3. A prototype application demonstrates that activities can be detected on modern cell
phones in real time without help from any external sensing or computing device.
The time- and frequency-domain features we consider are listed in Table I of the referenced
paper. There are two reasons why we consider all these features. Firstly, feature
extraction costs computational as well as communication resources. There is a relationship
between the cost, the robustness and the expressive power of the features; therefore, we
closely examine the nature of these relationships. For example, all the time-domain
features avoid the complexity of preprocessing, i.e., they do not require the laborious
tasks of framing, windowing, filtering, Fourier transformation, liftering, and so on.
Consequently, they not only consume little processing power, but the algorithms can also be
directly deployed in resource-constrained nodes. However, they are not robust to
measurement and calibration errors. The second reason is our desire to support rapid
prototyping by providing application developers the knowledge and experience concerning
the type of features they can consider if they choose to employ accelerometer sensors.
Decision Trees: An introduction
Decision trees are important data structures in which each node poses a binary question,
and the data splits according to the answer to the question posed at each node.
Decision trees are essential to classification problems in particular. A decision tree is a
flowchart-like structure in which each internal node represents a test on an attribute,
each branch represents an outcome of the test, and each leaf node represents a class label
(the decision taken after computing all attributes). A path from root to leaf represents a
classification rule.
Taking an example: suppose we need to investigate an animal bite; we have to somehow
identify the animal behind it.
Quite naturally we start asking questions, like: is the area swollen after the bite? Not
all animal bites result in swollen areas, so the answer to this question eliminates a few
animals. We repeat and ask more questions until we are certain of the animal that has
bitten. This is the essential idea of decision trees!
A sample decision tree from our problem, which decides on the nature of the exercise
(pulls, stretches or curls), is shown below.
The attributes/questions it considers are the maxmin_acceleration and
maxmin_velo_gyroscope values.
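As a sketch, that sample tree reduces to a short block of if-else statements. The
thresholds below are taken from the beginningTree method in the appendix; the exact
cut-points come from the J48 output and would change between runs:

static String classifyWorkout(double maxminAccZ, double maxminGyrAvg) {
    // root question: "Is maxmin_acc_z greater than 1686?"
    if (maxminAccZ > 1686) {
        // left question: "Is maxmin_gyc_avg greater than 133?"
        return (maxminGyrAvg > 133) ? "pulls" : "curls";
    }
    return "stretches";
}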
Rating Decision Trees
After forming various decision trees, a very natural question to ask is: which of them are
actually good?
Some of the traits of good decision trees are:
1. Accuracy: the decision tree should perform well when subjected to test cases other
than the training set.
2. Low depth: smaller decision trees are easier and faster to compute; this parameter is
especially important on small boards like the Raspberry Pi, which was used for our task.
3. Independent nodes: the more independent the nodes are, the better, since an error in
one of them would not affect other nodes.
The better a decision tree does on these three parameters, the higher the rating it gets.
We gave a mathematical measure to each of these three traits, and rated different decision
trees based upon these mathematical formulations.
Knowing the weights (ratings) of the different trees is essential in the last step, when we
combine them to generate the final outcome!
Ranking Different Attributes
Among different features available, we need to basically rank features upon their
importance. Some features are more informative than others. An important feature can
help us decide better and quicker.
So the solutions to the ranking attribute problem have numerous applications, not just for
Classifications but in various other learning algorithms.
The difficulty arrives at measuring the 'importance' statistically on what is worth and what
is not!
There are various statistical parameters that govern how informative a
particular trait is.
One such parameter is InfoGainValue, which basically means separates the all the entire
dataset on the particular feature, and measures how clean the partitions are. Suppose an
attribute can clearly distinguish between three classes of Excercises, then that feature has a
lot of information and hence will have higher infoGain value than others.
There are other parameters like entropyVal, which is the amount of entropy generated by a
split using a particular feature. For an overlapping (bad) split the entropy value (the
measure of randomness among different splits) would be larger. In a way InfoGainVal is
inversely proportional to entropy values.
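In standard form (a textbook definition, added here for reference), with p_c the fraction
of instances of class c in a set S, and S_v the subset of S where attribute A takes value v:

    H(S) = - sum over c of p_c * log2(p_c)
    InfoGain(S, A) = H(S) - sum over v of (|S_v| / |S|) * H(S_v)

A pure split leaves every S_v with a single class, so each H(S_v) is 0 and the gain is
maximal.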
The parameter we used for our project was based upon InfoGainVal.
Automated Procedures/Flowcharts for Selecting Good Decision Trees
Two methodologies, which can be automated, are presented below.
Generating Weighted Trees
Some details:
1. The attribute ranking model to be used in Weka is 'InfoGainAttributeEval'.
2. I would recommend k to be somewhere between 4 and 10.
3. The generation of the decision tree is done using the 'J48' model in Weka.
4. While decrementing the weights assigned to subsequent trees, various decaying functions
can be experimented with (for example, a geometric decay such as w_i = 2^-(i-1)).
Generating Equal-weight Trees
Some details:
1. This model generates equal-weight trees. Here, for the first tree, we pick say the 1st,
6th, 11th, 17th and 23rd ranked features; in this example m = k = 5.
The key idea is to exploit the best features in each and every tree.
2. I would recommend m and k to be somewhere between 5 and 9, with m ~= k.
3. The generation of the decision tree is done using the 'J48' model in Weka.
4. The attribute ranking model to be used in Weka is 'InfoGainAttributeEval'.
Combination/Ensemble Techniques
Suppose we want to combine 5 trees. Assuming each decision tree is basically a set of if-else
statements. We will get five answers for each training data file we input to it. Each decision
initially has weight 1. In this way we will get a confidence voting for each group of tree.
For example: if 4 trees give the answer as "curl" and one tree gives the answer "pull". Due
to majority the confidence vote would be 4/5=0.8.
We can average out the confidence vote for each training data file to get a final mean
confidence vote for the combination of trees.
We also rechecked the confidence vote for different sets of training data files.
We also tried out weighted polling, where different decision trees have different priorities,
and the weight of a high-priority tree is considered more than the lower-priority trees.
With these exercises we finalized a decent block (of if-else statements), that would be able
to accurately classify the exercise.
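A minimal sketch of that weighted-polling step, assuming each tree's answer has already
been computed (the class and method names are illustrative, not the actual project code):

import java.util.HashMap;
import java.util.Map;

public class EnsembleVote {
    // answers: one label per tree; weights: one weight per tree
    static String vote(String[] answers, double[] weights) {
        Map<String, Double> score = new HashMap<String, Double>();
        double total = 0.0;
        for (int i = 0; i < answers.length; i++) {
            Double s = score.get(answers[i]);
            score.put(answers[i], (s == null ? 0.0 : s) + weights[i]);
            total += weights[i];
        }
        String best = null;
        for (Map.Entry<String, Double> e : score.entrySet()) {
            if (best == null || e.getValue() > score.get(best)) {
                best = e.getKey();
            }
        }
        // confidence vote = weight of the winning label / total weight
        System.out.println("confidence = " + (score.get(best) / total));
        return best;
    }

    public static void main(String[] args) {
        String[] answers = {"curl", "curl", "curl", "curl", "pull"};
        double[] weights = {1, 1, 1, 1, 1};  // equal weights: confidence 4/5 = 0.8
        System.out.println(vote(answers, weights));  // prints curl
    }
}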
REFERENCES
Waltenegus Dargie, "Analysis of Time and Frequency Domain Features of Accelerometer
Measurements."
Nishkam Ravi, Nikhil Dandekar, Preetham Mysore and Michael L. Littman, "Activity
Recognition from Accelerometer Data."
Mridul Khan, Sheikh Iqbal Ahamed, Miftahur Rahman and Roger O. Smith, "A Feature Extraction
Method for Realtime Human Activity Recognition on Cell Phones."
Java documentation
Weka documentation
APPENDIX
ARFF FILE
Overview
ARFF files have two distinct sections. The first section is the Header information, which is
followed by the Data information.
The Header of the ARFF file contains the name of the relation, a list of the attributes (the
columns in the data), and their types. An example header on the standard IRIS dataset looks
like this:
@RELATION iris
@ATTRIBUTE sepallength NUMERIC
@ATTRIBUTE sepalwidth NUMERIC
@ATTRIBUTE petallength NUMERIC
@ATTRIBUTE petalwidth NUMERIC
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
The Data of the ARFF file looks like the following:
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
Lines that begin with a % are comments.
The @RELATION, @ATTRIBUTE and @DATA declarations are case insensitive.
The ARFF Header Section
The ARFF Header section of the file contains the relation declaration and attribute
declarations.
The @relation Declaration
The relation name is defined as the first line in the ARFF file. The format is:
@relation <relation-name>
where <relation-name> is a string. The string must be quoted if the name includes spaces.
The @attribute Declarations
Attribute declarations take the form of an ordered sequence of @attribute statements. Each
attribute in the data set has its own @attribute statement which uniquely defines the name
of that attribute and its data type. The order the attributes are declared indicates the
column position in the data section of the file. For example, if an attribute is the third one
declared, then Weka expects that all of that attribute's values will be found in the third
comma-delimited column.
The format for the @attribute statement is:
@attribute <attribute-name> <datatype>
where the <attribute-name> must start with an alphabetic character. If spaces are to be
included in the name then the entire name must be quoted.
The <datatype> can be any of the four types supported by Weka:
numeric
integer is treated as numeric
real is treated as numeric
<nominal-specification>
string
date [<date-format>]
where <nominal-specification> and <date-format> are defined below. The
keywords numeric, real, integer, string and date are case insensitive.
Numeric attributes
Numeric attributes can be real or integer numbers.
Nominal attributes
Nominal values are defined by providing a <nominal-specification> listing the possible
values: {<nominal-name1>, <nominal-name2>, <nominal-name3>, ...}
The ARFF Data Section
The ARFF Data section of the file contains the data declaration line and the actual instance
lines.
The @data Declaration
The @data declaration is a single line denoting the start of the data segment in the file. The
format is:
@data
The instance data
Each instance is represented on a single line, with carriage returns denoting the end of the
instance.
Attribute values for each instance are delimited by commas. A comma may be followed by
zero or more spaces. Attribute values must appear in the order in which they were declared
in the header section (i.e., the data corresponding to the nth @attribute declaration is always
the nth field of the attribute).
A missing value is represented by a single question mark, as in:
4.4,?,1.5,?,Iris-setosa
Values of string and nominal attributes are case sensitive, and any that contain a space
must be quoted, as follows:
@relation LCCvsLCSH
@attribute LCC string
@attribute LCSH string
@data
AG5, 'Encyclopedias and dictionaries.;Twentieth century.'
AS262, 'Science -- Soviet Union -- History.'
AE5, 'Encyclopedias and dictionaries.'
AS281, 'Astronomy, Assyro-Babylonian.;Moon -- Phases.'
AS281, 'Astronomy, Assyro-Babylonian.;Moon -- Tables.'
Dates must be specified in the data section using the string representation specified in the
attribute declaration. For example:
@RELATION Timestamps
@ATTRIBUTE timestamp DATE "yyyy-MM-dd HH:mm:ss"
@DATA
"2001-04-03 12:12:12"
"2001-05-03 12:59:55"
JAVA CODE FOR GENERATING THE REQUIRED ARFF FILE
Accelerometer.java
import java.io.BufferedReader;
...
...
import java.util.Scanner;
public class Accelerometer {
static FileWriter f=null;
public static void main(String[] args) {
ArrayList<String> acc_x = new ArrayList<String>();
ArrayList<String> acc_y = new ArrayList<String>();
...
...
ArrayList<String> vel_gyr_avg = new ArrayList<String>();
BufferedReader br = null;
//Scanner sc = null; FileWriter f=null;
try {
f = new FileWriter(new File("outfile.arff"));
} catch (IOException e1) {
e1.printStackTrace();
}
try {
f.write("@relation actvity");
f.write("\n");f.write("\n");
//f.write("@attribute activity string");f.write("\n");
f.write("@attribute minmax_acc_x");f.write("\n");
f.write("@attribute minmax_acc_y");f.write("\n");
...
...
f.write("@attribute mean_vel_gyr_z");f.write("\n");
f.write("@attribute mean_vel_gyr_avg");f.write("\n");
...
...
f.write("@attribute type {curls, pulls,
stretches}");f.write("\n");f.write("\n");
f.write("@data");f.write("\n");f.write("\n");
} catch (IOException e1) {
e1.printStackTrace();
}
for(int iter=1;iter<=117;iter++)
{
try {
String s;
double cum_acc_x=0.0;
double cum_acc_y=0.0;
double cum_acc_z=0.0;
double cum_gyr_x=0.0;
double cum_gyr_y=0.0;
double cum_gyr_z=0.0;
br = new BufferedReader(new FileReader(getFileName(iter)));
int i=0;
while ((s = br.readLine()) != null) {
String[] arr = s.split(",", 100);
acc_x.add(i, arr[0]);
acc_y.add(i, arr[1]);
acc_z.add(i, arr[2]);
// average of the three accelerometer axes (columns 0-2 of the CSV)
double avg_acc=(Double.parseDouble(arr[0])+Double.parseDouble(arr[1])+Double.parseDouble(arr[2]))/3;
acc_avg.add(i, ""+avg_acc);
gyr_x.add(i, arr[3]);
gyr_y.add(i, arr[4]);
gyr_z.add(i, arr[5]);
// average of the three gyroscope axes (columns 3-5 of the CSV)
double avg_gyr=(Double.parseDouble(arr[3])+Double.parseDouble(arr[4])+Double.parseDouble(arr[5]))/3;
gyr_avg.add(i, ""+avg_gyr);
cum_acc_x+=Double.parseDouble(arr[0]);
cum_acc_y+=Double.parseDouble(arr[1]);
cum_acc_z+=Double.parseDouble(arr[2]);
vel_acc_x.add(i, ""+cum_acc_x);
vel_acc_y.add(i, ""+cum_acc_y);
vel_acc_z.add(i, ""+cum_acc_z);
double avg_vel_acc=(cum_acc_x+cum_acc_y+cum_acc_z)/3;
vel_acc_avg.add(i, ""+avg_vel_acc);
cum_gyr_x+=Double.parseDouble(arr[3]);
cum_gyr_y+=Double.parseDouble(arr[4]);
cum_gyr_z+=Double.parseDouble(arr[5]);
vel_gyr_x.add(i, ""+cum_gyr_x);
vel_gyr_y.add(i, ""+cum_gyr_y);
vel_gyr_z.add(i, ""+cum_gyr_z);
double avg_vel_gyr=(cum_gyr_x+cum_gyr_y+cum_gyr_z)/3;
vel_gyr_avg.add(i, ""+avg_vel_gyr);
i++;
//System.out.println(arr[0]);
}
} catch (IOException e) {
e.printStackTrace();
} finally {
try {
if (br != null) br.close();
} catch (IOException ex) {
ex.printStackTrace();
}
}
double minmax_acc_x =StdStats.minmax( acc_x);
double minmax_acc_y =StdStats.minmax(acc_y );
...
... // calculating values
...
double corr_vel_gyr_yz =StdStats.correlation(vel_gyr_y, vel_gyr_z);
double corr_vel_gyr_xz =StdStats.correlation(vel_gyr_x, vel_gyr_z);
try {
f.write(""+ minmax_acc_x +",");
f.write(""+ minmax_acc_y +",");
...
...
f.write(""+ mean_vel_gyr_z +",");
f.write(""+ mean_vel_gyr_avg +",");
if(iter>0&&iter<=44){
f.write("pulls");
}
else if(iter>44&&iter<=90){
f.write("stretches");
}
else if(iter>90&&iter<=117){
f.write("curls");
}
f.write("\n");
} catch (IOException e) {
e.printStackTrace();
}
acc_x = new ArrayList<String>();
...
...
vel_gyr_avg = new ArrayList<String>();
}
try {
f.close();
} catch (IOException e) {
e.printStackTrace();
}
}
public static String getFileName(int n){
...
...
return name;
}
}
StdStats.java
public static double minmax(ArrayList<String> a){
double arr[]= new double[a.size()];
for (int i = 0; i <a.size(); i++) {
arr[i]=Double.parseDouble(a.get(i));
}
double mm=0;
double min = Double.POSITIVE_INFINITY;
for (int i = 0; i < arr.length; i++) {
if (Double.isNaN(arr[i])) return Double.NaN;
if (arr[i] < min) min = arr[i];
}
double max = Double.NEGATIVE_INFINITY;
for (int i = 0; i < arr.length; i++) {
if (Double.isNaN(arr[i])) return Double.NaN;
if (arr[i] > max) max = arr[i];
}
mm=max-min;
return mm;
}
public static double root_mean_square(ArrayList<String> a){
double arr[]= new double[a.size()];
for (int i = 0; i <a.size(); i++) {
arr[i]=Double.parseDouble(a.get(i));
}
double rms=0;
double ss=0;
for (int i = 0; i < arr.length; i++) {
ss=ss+Math.pow(arr[i], 2);
}
rms=Math.sqrt((double)(ss/arr.length));
return rms;
}
/**
 * Returns the average value in the array a[], NaN if no such value.
 */
public static double mean(ArrayList<String> a) {
double arr[]= new double[a.size()];
for (int i = 0; i <a.size(); i++) {
arr[i]=Double.parseDouble(a.get(i));
}
if (arr.length == 0) return Double.NaN;
double sum = sum(arr);
return sum / arr.length;
}
public static double mean(double[] arr) {
if (arr.length == 0) return Double.NaN;
double sum = sum(arr);
return sum / arr.length;
}
public static double correlation(ArrayList<String> x, ArrayList<String> y) {
int n=y.size();
double arrx[]= new double[x.size()];
for (int i = 0; i <x.size(); i++) {
arrx[i]=Double.parseDouble(x.get(i));
}
double arry[]= new double[y.size()];
for (int i = 0; i <y.size(); i++) {
arry[i]=Double.parseDouble(y.get(i));
}
double corr=0;
double sumx=sum(arrx);
double sumy=sum(arry);
double arry2[]= new double[y.size()];
for (int i = 0; i <y.size(); i++) {
arry2[i]=arry[i]*arry[i];
}
double arrx2[]= new double[x.size()];
for (int i = 0; i <x.size(); i++) {
arrx2[i]=arrx[i]*arrx[i];
}
double sumx2=sum(arrx2);
double sumy2=sum(arry2);
double arrxy[]= new double[y.size()];
for (int i = 0; i <y.size(); i++) {
arrxy[i]=arrx[i]*arry[i];
}
double sumxy=sum(arrxy);
double corr_num= n*sumxy-sumx*sumy;
double corr_den=Math.sqrt(n*sumx2-sumx*sumx)*Math.sqrt(n*sumy2-sumy*sumy);
corr=corr_num/corr_den;
return corr;
}
public static double var(ArrayList<String> a) {
double arr[]= new double[a.size()];
for (int i = 0; i <a.size(); i++) {
arr[i]=Double.parseDouble(a.get(i));
}
if (arr.length == 0) return Double.NaN;
double avg = mean(arr);
double sum = 0.0;
for (int i = 0; i < arr.length; i++) {
sum += (arr[i] - avg) * (arr[i] - avg);
}
return sum / (arr.length - 1);
}
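The helpers above call a sum method that is not shown in this excerpt; a minimal version
(an assumed reconstruction) would be:

public static double sum(double[] arr) {
    // total of all elements; used by mean, correlation and var above
    double total = 0.0;
    for (int i = 0; i < arr.length; i++) {
        total += arr[i];
    }
    return total;
}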
JAVA CODE FOR GENERATING BINARY DECISION TREE
package com.tic.DTree;
import java.util.Scanner;
import com.tic.DTree.BTNode;
public class WorkoutGuess
{
private static Scanner stdin = new Scanner(System.in);
public static void main(String[ ] args)
{
BTNode<String> root;
instruct( );
root = beginningTree( );
do
play(root);
while (query("Want to try again?"));
}
public static void instruct( )
{
System.out.println("Please get some features of the workout
ready?");
System.out.println("I will ask some yes/no questions to try to
figure");
29
System.out.println("out what type of workout it is.");
}
public static void play(BTNode<String> current)
{
while (!current.isLeaf( ))
{
if (query(current.getData( )))
current = current.getLeft( );
else
current = current.getRight( );
}
System.out.print("Are you doing " + current.getData( ) + ". ");
if (!query("Am I right?"))
learn(current);
else
System.out.println("OK.");
}
public static BTNode<String> beginningTree( )
{
BTNode<String> root;
BTNode<String> child;
final String ROOT_QUESTION = "Is maxmin_acc_z greater than 1686?";
final String LEFT_QUESTION = "Is maxmin_gyc_avg greater than 133?";
//We can add RIGHT_QUESTION if we want.
//final String RIGHT_QUESTION = "Is <var> greater/smaller than <value>?";
final String WORKOUT1 = "Pulls";
final String WORKOUT2 = "Curls";
final String WORKOUT3 = "Stretches";
//final String WORKOUT4 = "Stretches";
// Create the root node with the question "Is maxmin_acc_z greater than 1686?"
root = new BTNode<String>(ROOT_QUESTION, null, null);
// Create and attach the left subtree.
child = new BTNode<String>(LEFT_QUESTION, null, null);
child.setLeft(new BTNode<String>(WORKOUT1, null, null));
child.setRight(new BTNode<String>(WORKOUT2, null, null));
root.setLeft(child);
// Create and attach the right subtree.
child = new BTNode<String>(WORKOUT3, null, null);
//child.setLeft(new BTNode<String>(WORKOUT3, null, null));
// child.setRight(new BTNode<String>(WORKOUT4, null, null));
root.setRight(child);
return root;
}
public static void learn(BTNode<String> current)
// Precondition: current is a reference to a leaf in a taxonomy tree. This
// leaf contains a wrong guess that was just made.
// Postcondition: Information has been elicited from the user, and the tree
// has been improved.
{
String guessWORKOUT;    // the workout that was just guessed
String correctWORKOUT;  // the workout the user was thinking of
String newQuestion;     // a question to distinguish the two workouts
// Set Strings for the guessed workout, the correct workout and a new question.
guessWORKOUT = current.getData( );
System.out.println("I give up. What are you doing? ");
correctWORKOUT = stdin.nextLine( );
System.out.println("Please type a yes/no question that will
distinguish a");
System.out.println(correctWORKOUT + " from a " + guessWORKOUT +
".");
newQuestion = stdin.nextLine( );
// Put the new question in the current node, and add two new children.
current.setData(newQuestion);
System.out.println("As a " + correctWORKOUT + ", " +
newQuestion);
if (query("Please answer"))
{
current.setLeft(new BTNode<String>(correctWORKOUT, null,
null));
current.setRight(new BTNode<String>(guessWORKOUT, null,
null));
}
else
{
current.setLeft(new BTNode<String>(guessWORKOUT, null, null));
current.setRight(new BTNode<String>(correctWORKOUT, null,
null));
}
}
public static boolean query(String prompt)
{
String answer;
System.out.print(prompt + " [Y or N]: ");
answer = stdin.nextLine( ).toUpperCase( );
while (!answer.startsWith("Y") && !answer.startsWith("N"))
{
System.out.print("Invalid response. Please type Y or N: ");
answer = stdin.nextLine( ).toUpperCase( );
}
return answer.startsWith("Y");
}
}