Diploma Thesis
The aim of this diploma thesis is to develop a visual tracking system
which improves the pet-computer interface of Metazoa Ludens, in order to
allow one or more pets to interact with a human through the game.
Table of Contents

1 Introduction
1.1 Remote human-pet interaction
1.2 Metazoa Ludens
1.3 Outline
3 The Hamsters
3.1 Biological classification
3.2 History
3.3 Natural Habitat
4 Implementation
4.1 Development environment
4.2 Software architecture
4.3 Image retrieval
4.3.1 Image capturing
4.3.2 Camera calibration
4.4 Preprocessing
4.4.1 Segmentation
4.4.2 Region labelling
4.4.3 Orientation
4.4.4 Normalisation
4.5 Classification
4.5.1 Binary region descriptors
4.5.2 Basic gray value region descriptors
4.5.3 Histogram descriptors
4.5.4 Face recognition methods
5 Testing
5.1 Experimental setup
5.1.1 Test set
5.1.2 Test setup
5.2 Classification by histograms
5.2.1 Discriminant analysis of histogram classifiers
6 Conclusion
6.1 Conclusion
6.2 Further work
I Appendix
A Classification performance of different histogram classifiers
B Comparison of different distance metrics (HSV-H)
C Classifier robustness against external variance
D Quality of classification
E Face recognition algorithms test results
F Bibliography
Chapter 1
Introduction
Pets are an important part of life for many people, and the interaction
between pet-owner and pet is proven to be beneficial for the well-being of
humans as well as of the animals. Medical studies showed positive effects
of human-pet interaction on the medical condition of humans1 as well as
positive effects on social behaviour and self-esteem.2 Unfortunately, the
time which can be spent on these positive relationships between animals
and humans is constantly shrinking due to increased working hours
and the resulting lack of free time. Furthermore, an increasing number of
professionals is forced to travel across the globe for business trips,3 without
the chance to see their family members or their beloved pets. The
development of modern communication technologies, above all the
internet, improves communication all around the world. However, these
improvements mainly affect inter-human communication and do not
provide any way of remote human-pet interaction.
and speak to him through a phone. Furthermore, they could use the phone
buttons to control a robotic arm and pet the rabbit through the system (see
Figure 1.1). Although this system allowed only limited interaction modes,
it can be seen as the first remote human-pet interaction system.
Rover@home,6 a project of the MIT Media Lab, consisted of a computer me-
diated human-dog interface, which was derived from the popular clicker-
training technique. The author emphasised the animal-computer interface
as the main problem in mediated human-pet interaction and regarded it
as a more general form of a human-computer interface. Consequently,
human-computer interfaces would be just a special case of these interfaces,
for the animal Homo sapiens.7 This definition made it possible to use design
rules for human-computer interfaces to design the dog-computer interface
of the Rover@home project. The implementation of the project included
speakers, to talk to the dog, a treat dispenser and a motor-driven toy
which could be remotely controlled, as well as a webcam to watch the
reactions of the dog.
The project "Hello Kitty"8 proposed a combination of a feeding device and
a remote controlled toy for the remote interaction of a cat and a cat-owner.
The human could activate the feeding device and the toy by clicking on
a hyperlink on a website, and watch the reactions of the cat through a
webcam system. An interesting outcome of this project is the use of a
5 from Exploration (1997)
6 see Resner (2001)
7 see p.13 Resner (2001)
8 see Mikesell (2003)
Figure 1.3: Poultry internet: rooster wearing a touch dress and a human
interacting with the rooster12
oped a game setup to play the game Pac-Man, while real crickets controlled
the movements of the ghosts in the game. The crickets were trapped in a
maze (see Figure 1.2), and could be forced to move by activating vibration
motors in different parts of the maze, while their position was tracked with
a color tracking algorithm. The aim of the project was not to establish an
enjoyable game-play for both the human and the crickets, but to find out
whether the crickets could act as intelligent control mechanisms which
ensure a less deterministic opponent behaviour than pre-programmed
artificial opponents.
"Poultry internet"13 is a multimodal remote human-pet interaction system
which is meant to improve poultry welfare by establishing a strong link
between a human and a chicken. In contrast to other human-pet interaction
projects, poultry internet also has a strong focus on novel haptic
human-computer interfaces, besides the focus on the pet-computer
interface. The poultry-pet wears a custom-built touch dress (see Figure 1.3),
while its position is tracked by a video tracking system. The movements of
the pet trigger the movement of a toy chicken on a table near the human.
If the human touches the toy chicken this touch is transmitted through
12 from Lee et al. (2006)
13 see Lee et al. (2006)
1.2 Metazoa Ludens
the internet, and is replicated by vibrating motors in the touch dress of the
poultry-pet. Furthermore, movements of the pet trigger vibrating motors
at the human's feet, so the human can feel when the pet moves.
In general, most remote human-animal interaction systems have a strong
focus on monitoring the behaviour of the pet, and base the interaction
on basic schemes like the clicker technique or gratification through
feeding of treats. Therefore an inter-species gaming framework, Metazoa
Ludens,14 is proposed, which allows a human and his pet to interact
remotely, in the form of a game which is interesting for both.
score through the game play.17 However, the tests also showed that
it would be desirable to have the possibility to include more than one
hamster in the game at the same time. In this case the human-pet
interaction in Metazoa Ludens could be enriched by the inter-pet interaction
between the hamsters. Furthermore, the training inside the game could be
used to ensure the well-being of several hamsters at a time. The current
version of the pet-computer interface does not allow playing with more
than one pet simultaneously, because the motion tracking algorithm is
only able to track the movement of a single pet at a time. Therefore
the tracking algorithm of Metazoa Ludens has to be improved.
Another reason why the tracking system has to be changed is the fact that,
in the current version of Metazoa Ludens, a human is needed to move the
hamster from the cage to the tank. For the future it is planned to connect
the game tank to the cage by a tunnel, to give the hamsters the possibility
to play whenever they like, and to end the game when they are tired. In
order to keep track of which hamsters are playing, a system has to be
developed which is able to distinguish between the individual animals.
1.3 Outline
The main part of this thesis will describe how the pet-computer interface
of Metazoa Ludens can be improved to allow the interaction of multiple
animals. Therefore the fields of motion tracking and object recognition are
explored. Human face recognition is taken as a starting point on the way
to build a pet recognition algorithm. After describing multiple face recog-
nition algorithms, Chapter 3 will discuss features of the hamsters which
can be used for recognition. Chapter 4 will describe the implementation of
a prototypical hamster recognition algorithm, and Chapter 5 will discuss
the classification performance of different hamster recognition algorithms,
in comparison with standard face recognition algorithms.
17 see p. 6, KC Tan et al. (2006b)
Chapter 2
Theory and Background
In the easiest case the segmentation can be based on the gray-values of the
image, assuming that the objects have significantly different gray-values
from the background. In this case the segmentation can easily be done
using a threshold operation,2 which labels every pixel as object pixel or
background pixel, depending on whether the pixel value is above or below
a certain threshold (see Figure 2.1). Unfortunately, simple thresholding
fails whenever the gray-values of objects and background are not different
enough, or when the lighting is not uniform, which leads to changing
gray-values. To solve this problem, several adaptive thresholding operations
exist, which compute the threshold locally from the neighbourhood of each pixel.
1 see p.544 Steger (2006)
2 see p.31 Davies (2005)
2.1 Motion tracking

6 see p.507 Davies (2005)
7 see Horn and Schunck (1981)
8 see p.552 Steger (2006)
2.2 Tracking for Metazoa Ludens
Figure 2.2: Template matching: The crosses denote the tracked image
points, the dashed rectangle denotes the template position in
the previous frame10
the flat surface, and the camera is mounted on top and is looking down,
the problem can be tackled as a 2D tracking problem. The background of
the tracking input is a black fabric, which allows the use of thresholding
techniques for segmentation. Furthermore, the setup ensures that
occlusion, which is a big problem in other applications like people-tracking,
is not an issue, as the pets are unlikely to occlude each other.
The following section describes the current tracking system which was
used for a prototype of Metazoa Ludens, as well as the further development
towards a combined hamster-tracking/hamster-recognition system.
using a variable threshold, which results in black pixels for the foreground
and white pixels for the background.
The tracking algorithm determines the position of the hamster by simply
calculating the mean position of all black pixels. Although this method is
simple, computationally cheap and therefore suitable for a real-time
tracking system, it has some major drawbacks. It only works if a single
object is present, because multiple objects all influence the tracker. This
leads to a tracking point which lies somewhere between the objects,
depending on their sizes.
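The described calculation amounts to a centroid computation over the binary segmentation mask. A minimal sketch (the row-major mask layout is an assumption for the example); it also exhibits the failure mode described above, since with two objects the returned point lies between them:

```cpp
#include <utility>
#include <vector>

// Position estimate of the current tracker: the mean position (centroid)
// of all foreground pixels of the binary mask. With a single object this
// is the object's position; with several objects the point drifts to a
// location somewhere between them.
std::pair<double, double> centroid(const std::vector<bool>& mask, int w, int h) {
    double sx = 0, sy = 0;
    long n = 0;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            if (mask[y * w + x]) { sx += x; sy += y; ++n; }
    if (n == 0) return {-1.0, -1.0};  // no foreground pixel found
    return {sx / n, sy / n};
}
```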
In order to track multiple hamsters, a tracking algorithm has to be
developed which segments the image not only into background and foreground,
but into multiple foreground objects. It has to be able to distinguish
between the objects, and to find the objects again when they leave and
re-enter the field of view of the camera. Although the camera's view covers
the whole gaming area, it does not cover the rest area for the hamsters,
which is connected to the gaming area by a tunnel. For this reason the
algorithm has to include a recognition part, which is able to distinguish
between all hamsters which take part in the test.

The new tracking system should be able to detect the position of each
hamster inside the gaming area at any time. In general, the system has
to track and recognise a number of non-rigid moving objects. The first
step uses an adaptive color threshold to distinguish between foreground
and background. For this, the background color and its standard deviation
are learned by capturing some frames without any hamsters. These
values are used for the segmentation process, where all pixels inside the
background color range are considered as background and all other pixels
are considered as foreground.
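The learning and segmentation steps can be sketched as follows. The per-pixel three-channel model and the threshold factor k are illustrative assumptions, as the exact colour model is not specified here.

```cpp
#include <cmath>
#include <vector>

struct PixelModel { double mean[3]; double stddev[3]; };

// Learn the background model from frames captured without hamsters: per
// pixel and channel, the mean and standard deviation over the frames.
// frames[f][p*3+c] is channel c of pixel p in frame f.
std::vector<PixelModel> learnBackground(
        const std::vector<std::vector<double>>& frames, int pixels) {
    std::vector<PixelModel> model(pixels);
    for (int p = 0; p < pixels; ++p)
        for (int c = 0; c < 3; ++c) {
            double sum = 0, sq = 0;
            for (const auto& f : frames) {
                sum += f[p * 3 + c];
                sq  += f[p * 3 + c] * f[p * 3 + c];
            }
            double m = sum / frames.size();
            model[p].mean[c] = m;
            model[p].stddev[c] = std::sqrt(sq / frames.size() - m * m);
        }
    return model;
}

// Segment a frame: a pixel is background if every channel lies within
// mean +- k*stddev of the learned model, and foreground otherwise.
std::vector<bool> segment(const std::vector<double>& frame,
                          const std::vector<PixelModel>& model, double k) {
    std::vector<bool> fg(model.size());
    for (size_t p = 0; p < model.size(); ++p) {
        bool isBg = true;
        for (int c = 0; c < 3; ++c) {
            double d = std::fabs(frame[p * 3 + c] - model[p].mean[c]);
            if (d > k * model[p].stddev[c] + 1e-9) isBg = false;
        }
        fg[p] = !isBg;
    }
    return fg;
}
```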
In the next step a region labelling algorithm is used to label the different
objects. The algorithm returns a number of labeled regions, which de-
scribe the objects inside the image, e.g. the hamsters, and a background
object. The region labels do not correspond to the individual hamsters,
but to the position of the region inside the image.

Figure 2.3: Input image for the tracking software in Metazoa Ludens

In order to track individual hamsters, a way has to be found to distinguish between the individual
hamsters. The easiest approach for the tracking problem would be a
tracking filter, for example a Kalman filter, which could help to keep the
labeling of the hamsters constant by approximating the movements. Unfortunately,
this approach fails whenever the hamsters are close together, and it fails
whenever a hamster leaves the gaming area and moves back in again. To
ensure tracking in these cases as well, a combination of a tracking filter
with a recognition algorithm could be used. The tracking filter is able to
track the movement of the objects in the default case, while the recognition
part is used to handle the special cases. Furthermore a pet recognition al-
gorithm could be useful in many more scenarios, for example in pet shops,
for pet lovers or in medical research.
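A minimal version of the region labelling step mentioned above can be implemented as a flood fill over the binary segmentation mask. This is an illustration of the principle, not the library implementation used in the project:

```cpp
#include <queue>
#include <vector>

// Label 4-connected foreground regions of a binary mask. Background
// pixels keep label 0; each connected foreground region is assigned a
// label 1, 2, ... The number of regions is the maximum label.
std::vector<int> labelRegions(const std::vector<bool>& mask, int w, int h) {
    std::vector<int> labels(mask.size(), 0);
    int next = 0;
    for (int start = 0; start < w * h; ++start) {
        if (!mask[start] || labels[start] != 0) continue;
        ++next;                       // found a new, unlabelled region
        std::queue<int> q;
        q.push(start);
        labels[start] = next;
        while (!q.empty()) {          // flood fill the region
            int p = q.front(); q.pop();
            int x = p % w, y = p / w;
            const int nx[4] = {x - 1, x + 1, x, x};
            const int ny[4] = {y, y, y - 1, y + 1};
            for (int i = 0; i < 4; ++i) {
                if (nx[i] < 0 || nx[i] >= w || ny[i] < 0 || ny[i] >= h)
                    continue;
                int np = ny[i] * w + nx[i];
                if (mask[np] && labels[np] == 0) {
                    labels[np] = next;
                    q.push(np);
                }
            }
        }
    }
    return labels;
}
```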
2.4 Face recognition
tool in the last couple of years. Today, face recognition systems show
good performance in controlled environments, while unconstrained face
recognition is still a difficult task, because of the large variations caused
by illumination, pose, expression and aging.15
In general, face recognition can be formulated as the following problem:
"Which persons are shown on a given still image or video?" There are three
different types of scenarios for face recognition:16
Face verification ("Am I who I say I am?"): A person claims to have a
certain identity, which has to be proven by comparing the face image of
the person to a stored face image of the claimed identity. If the result of the
comparison is above a certain threshold, the person's identity is considered
as verified.

Face identification ("Who am I?"): In order to identify a person, the face
image has to be compared with each image inside the face database. The
results are ranked, and the result with the highest similarity value is
considered as the match. This approach works as a closed-set test, i.e. the
individual is known to be inside the face database.

The watch list ("Are you looking for me?"): In this case, the system matches
the current face against each image in the database, and raises an alarm
if the similarity value is above a certain threshold. This scenario is called
an open-universe test, and can be used for example in law enforcement to
spot criminals who are on a list of wanted persons.
The hamster recognition system will be developed as an identification sys-
tem, in which each object in the camera’s field of view will be tested on the
hamster database, in order to recognise the hamster.
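The three scenarios differ only in how the similarity scores against the gallery are used. A compact sketch with hypothetical identifiers (not part of the thesis software):

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct GalleryEntry { std::string id; double similarity; };

// Face verification: accept the claimed identity if the similarity of the
// probe to the stored image of that identity exceeds the threshold.
bool verify(double similarityToClaimed, double threshold) {
    return similarityToClaimed > threshold;
}

// Face identification (closed set): the gallery entry with the highest
// similarity to the probe is taken as the match.
std::string identify(const std::vector<GalleryEntry>& scores) {
    auto best = std::max_element(scores.begin(), scores.end(),
        [](const GalleryEntry& a, const GalleryEntry& b) {
            return a.similarity < b.similarity;
        });
    return best->id;
}

// Watch list (open set): raise an alarm only if some entry exceeds the
// threshold, and report who was spotted.
bool watchList(const std::vector<GalleryEntry>& scores, double threshold,
               std::string& who) {
    for (const auto& e : scores)
        if (e.similarity > threshold) { who = e.id; return true; }
    return false;
}
```

The planned hamster recognition corresponds to `identify`: every object in the camera's field of view is matched against the whole hamster database.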
ground. For this purpose skin color filtering,17 kernel based approaches
like the support vector machine18 or boosted classifiers based on haar-
like features19 can be used. The next module in the processing flow is
a face alignment module which normalises the captured faces. One possi-
ble approach is the detection of salient features of the faces, for example
the position of the eyes. The eyes’ position can be used to determine the
rotation and scaling of the face in the captured image, in order to trans-
form the face image into a standard representation. This representation
is necessary to ensure the correct matching of features, especially if sta-
tistical pattern recognition methods are used to describe the face image.
The normalised face image, which is often called the probe image, is fed into
a feature extraction algorithm to find a group of features which describe
the individual face image. In the last step these features are matched to
stored features from a database. The result of this feature matching pro-
cess is a similarity value which denotes the similarity of the probe image
and a stored gallery image. An application specific threshold can be used
to decide if a face is considered as recognised or not.
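The alignment step from the detected eye positions can be sketched as follows; the function name and the fixed target eye distance are assumptions made for the example.

```cpp
#include <cmath>

struct Alignment { double angle; double scale; };

// From the detected eye positions, compute the in-plane rotation and the
// scale factor needed to transform the face into a standard representation
// in which the eyes are horizontal and a fixed distance apart.
Alignment alignFromEyes(double lx, double ly, double rx, double ry,
                        double targetEyeDistance) {
    double dx = rx - lx, dy = ry - ly;
    Alignment a;
    a.angle = std::atan2(dy, dx);                      // tilt of the eye axis
    a.scale = targetEyeDistance / std::hypot(dx, dy);  // scaling to target
    return a;
}
```

The resulting angle and scale would then be applied as an image transformation (rotation plus resampling) to produce the normalised probe image.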
the goal to classify patterns into a set of classes.21 Figure 2.5 shows the
general processing flow of pattern recognition. The data from a sensor, for
example a camera, is called representation pattern. A feature selector is
used to produce a feature pattern, by selecting salient features out of the
input data. Based on the feature pattern a classifier can decide to which
class the input data belongs. In a face recognition context the classes are
the individuals which have to be distinguished by their faces. The feature
pattern is a set of measurements on features of the object representation
which is stored in a feature vector. Which features are measured depends
on the application and the input data; for example formants (peaks in the
frequency domain) can be used for speech recognition.
A simple approach to generate a feature vector for image recognition is
to build a vector which includes the intensity value of each pixel in the
image. In this high-dimensional image space each image is described
by one point. The problem of this approach is the high-dimensionality
of the image space, which leads to the "curse of dimensionality".23 The
feature vector of an image with 64x64 pixels already has the dimension
4096, and with 256 possible gray values it can take 256^4096 different
values, which makes matching of these feature vectors computationally
expensive and increases the number of samples which are needed to train
a recognition system. Therefore it is
necessary to find possibilities to reduce the dimensionality of the feature
space. In the case of face recognition this reduction is theoretically fea-
sible, because only face images have to be compared and all faces share
geometric properties, like a rough symmetry, and consist of the same basic
elements like two eyes, a nose, and a mouth.24 Thus the "intrinsic
dimensionality"25 of face images is much lower than the dimensionality of
general images of this size, and the feature vectors can be transformed
from the high-dimensional image space to a face space which has a much
lower dimensionality. Apart from face recognition this face space can also
be used for face detection. Images which lie inside the face space can be
considered as face images, while outliers are detected as non-face images.
To reduce the high-dimensional image space to a subspace with a smaller
dimension, several statistical methods can be used.

21 see p.2 Webb (2002)
22 from Webb (2002), p. 3
23 see p.141 Shaknarovich and Moghaddam (2004)
space. The vectors of multiple images of each subject are averaged and
stored to represent the individuals in the database. The Euclidean distance
of a new image vector to the stored vectors is taken as a measure of the
similarity of the new face and the faces in the database.
Despite its success for face recognition, the eigenface PCA approach has
some major drawbacks. It does not model different variance classes but
maximises the overall variance of the data. Therefore it cannot deal well
with variance which stems from image transformations like scaling, shift
or rotation,29 as well as variance through lighting or facial expression.
Furthermore it has to be noted that PCA itself merely orders data by
variability, and helps to reduce the dimensionality of the data by retaining
the most variant dimensions. However, a high variance does not
automatically mean that these features are well-suited for discrimination
between classes and will therefore perform well in a pattern classification
task.30
28 from Shaknarovich and Moghaddam (2004), p. 144
29 see p.242 Howell (1999)
30 see p.713 Davies (2005)
31 see Fisher (1936)
32 from Belhumeur et al. (1997), p. 714
|Φ^T S_b Φ| / |Φ^T S_w Φ|    (2.1)
The "fisherface" algorithm34 was the first algorithm which used the linear
discriminant based on the Fisher criterion in face recognition. Because of
the high computational complexity of Sw and Sb in the original image space,
the fisherface algorithm uses PCA to reduce the dimensionality and
calculates equation 2.1 from the vectors in the PCA subspace. In empirical
tests, which included extreme illumination changes, fisherfaces showed
significantly better results than the eigenface method.35 This benefit can be
explained by the structural difference between LDA and PCA: PCA
maximises the global variance of the data, while LDA maximises the
variance between the classes relative to the variance within them. Therefore
it can better separate the classes, if they are linearly separable. Figure 2.7
shows how classes which lie near a linear subspace can be separated using
the Fisher linear discriminant (FLD).
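For the two-class case the Fisher linear discriminant has a closed form, w = Sw^-1 (m1 - m0), with Sw the pooled within-class scatter matrix. A small 2D sketch of this closed form (an illustration, not the fisherface implementation, which works in the PCA subspace):

```cpp
#include <array>
#include <vector>

// Two-class Fisher linear discriminant in 2D: the projection direction is
// w = Sw^-1 (m1 - m0), where Sw is the pooled within-class scatter matrix.
std::array<double, 2> fisherDirection(
        const std::vector<std::array<double, 2>>& c0,
        const std::vector<std::array<double, 2>>& c1) {
    auto mean = [](const std::vector<std::array<double, 2>>& c) {
        std::array<double, 2> m{0, 0};
        for (const auto& p : c) { m[0] += p[0]; m[1] += p[1]; }
        m[0] /= c.size(); m[1] /= c.size();
        return m;
    };
    std::array<double, 2> m0 = mean(c0), m1 = mean(c1);
    double s[2][2] = {{0, 0}, {0, 0}};   // pooled within-class scatter
    auto accumulate = [&](const std::vector<std::array<double, 2>>& c,
                          const std::array<double, 2>& m) {
        for (const auto& p : c) {
            double d0 = p[0] - m[0], d1 = p[1] - m[1];
            s[0][0] += d0 * d0; s[0][1] += d0 * d1;
            s[1][0] += d1 * d0; s[1][1] += d1 * d1;
        }
    };
    accumulate(c0, m0);
    accumulate(c1, m1);
    double det = s[0][0] * s[1][1] - s[0][1] * s[1][0];
    double dm0 = m1[0] - m0[0], dm1 = m1[1] - m0[1];
    // w = Sw^-1 * (m1 - m0), using the closed-form 2x2 inverse
    return { ( s[1][1] * dm0 - s[0][1] * dm1) / det,
             (-s[1][0] * dm0 + s[0][0] * dm1) / det };
}
```

Projecting samples onto w then reduces the two-class problem to a one-dimensional threshold decision.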
P(A|B) = P(B|A) P(A) / P(B)    (2.2)
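Equation 2.2 in code form, together with the law of total probability to obtain the evidence P(B) for two exclusive hypotheses; the function names are assumptions, and the numbers in the test are made up for illustration:

```cpp
// Bayes' rule (equation 2.2): the posterior P(A|B) from the likelihood
// P(B|A), the prior P(A) and the evidence P(B).
double bayesPosterior(double likelihood, double prior, double evidence) {
    return likelihood * prior / evidence;
}

// For two exclusive hypotheses the evidence follows from the law of total
// probability: P(B) = P(B|A) P(A) + P(B|not A) P(not A).
double evidenceTwoClass(double pBgivenA, double pA, double pBgivenNotA) {
    return pBgivenA * pA + pBgivenNotA * (1.0 - pA);
}
```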
33 see p.147 Shaknarovich and Moghaddam (2004)
34 see Belhumeur et al. (1997)
35 see p.717 Belhumeur et al. (1997)
36 see p.6 Webb (2002)
37 see Bayes (1763)
38 see p.453 Webb (2002)
39 see Moghaddam et al. (1996)
Chapter 3
The Hamsters
This chapter describes the animals which are used as pets and test
subjects for the Metazoa Ludens game. They are biologically classified and
described in their natural habitat and behaviour, in order to discuss to what
extent the Metazoa Ludens framework can be beneficial for them.
Furthermore, the different breeds are described, with the goal of finding
features of the breeds as well as features of individual hamsters, which can
be used for the classification of the breed and for the recognition of
individual hamsters, respectively.
3.2 History
In April 1930, professor Aronin, who was studying and collecting native
animals in the dusty fields near Aleppo in the Syrian Desert (situated in
the Middle East to the north of Israel),2 found a mother with twelve young
and decided to take them back to the Hebrew University in Jerusalem.
This was the first time that hamsters were kept in captivity. As scientists
always look for new and more useful laboratory animals, hamsters, which
eat well, grow fast, become tame, breed rapidly and remain healthy
when they are caged, caught their attention. Pairs of hamsters were sent
to England, then France, and in 1938 they were shipped to the United
States of America for the first time. From there they found their way all
over the world. Within the short space of twenty years, hamsters have
become numerous and popular in schools, museums, pet stores,
laboratories and homes.3
3.3 Natural Habitat

In the wild, hamsters live underground and search for food; their natural
habitat is the desert. Naturally, hamsters are animals of fields, meadows
and open places. Living in rock piles and fence rows, they burrow down in
tunnels which are two to ten feet long. Females will build a nest and raise
their young.

Hamsters usually emerge under cover of darkness to find food. They rob
the farmers' grain, sometimes raid the nests of ground birds and eat the
eggs or the young, and they even eat small lizards, insects or worms.
Hamsters stuff the food they find into a pair of cheek pouches. When their
pouches are full, they hurry back to their burrows where they store the
food. This habit has given them their name, from the German word
"hamstern", meaning "to hoard".
2 see Pet (2005b)
3 see Zim and Wartik (1951)
3.4 Hamsters as pet
All the hamsters found in schools, laboratories and homes now are
offspring of the ones collected in Syria in 1930. Hamsters are usually very
tame, eat well, grow fast and remain healthy when they are caged. A
full-grown hamster is only 10 cm to 12 cm long in the case of a golden
hamster,6 and dwarf hamsters are even smaller, so they need less room than
other pets. Hamsters are clean, easy to house, simple to feed and interesting
to watch as they grow up, play in their cage and raise their families. These
characteristics have made them popular as pets.
3.5 Hamster species

Having a body length of only 4-5 cm, the Roborovski is the smallest among
the hamster species. Its white eyebrows give it a very distinctive appearance.
4 see Zim and Wartik (1951)
5 see Pet (2005b)
6 see Zim and Wartik (1951)
The natural color for Roborovski is sandy-gold with an ivory belly, black
eyes and gray ears (see Figure 3.1). The “White Face Roborovski” has a
distinguishing white face.7
In comparison to other hamsters, Roborovski hamsters are very active and
fast runners. Therefore they are not a suitable pet for children, and not as
easy to handle as other breeds. On the other hand this attribute qualifies
this species for Metazoa Ludens, as the game is able to satisfy their desire
to be active, and is therefore able to improve the quality of living for these
animals.
The hamster body length varies from 5.3 to 10.2 cm, with an additional
0.7 to 1.1 cm of tail. Its size will usually reach the larger end of that
range. The fur of the upper body is grayish. The grayish body color usually
extends to the upper part of each leg. A dark dorsal stripe runs along the
extends to the upper part of each leg. A dark dorsal stripe runs along the
7
see Pet (2006b)
30
3.5 Hamster species
length of the body. The underside and the sides of the muzzle, upper lips,
lower cheeks, lower flanks, limbs, legs and tails are white. Its tail and feet
are usually covered by its fur.
The hamster is most active in the evening, with some activity continued
throughout the night. It can also be quite alert during the daytime. It
appears to be docile as it is very nearsighted. It can appear calmer if
accustomed to a familiar voice during handling time.8 Like all dwarf
hamsters, the Russian dwarf hamster is a very fast runner, which would make
it an ideal choice for the Metazoa Ludens gaming system. Unfortunately it
is not possible to cage Russian dwarf hamsters and Roborovski hamsters
together. Therefore the decision has been made to test only Roborovski
hamsters.
the initial plan to keep dwarf hamsters and golden hamsters in one cage
was abandoned, in order to avoid a deadly carnage. In the wild golden
hamsters occupy a large area, and try to keep a wide distance between
their burrows, which is usually above 100 m.12 This means that wild
golden hamsters are used to running a lot, a possibility which they do not
have when they are kept in captivity, unless they have a running wheel or
a gaming system like Metazoa Ludens.
The most important natural behaviour of hamsters is that they store food
in their cheek pouches and empty the food into a storage pile. Hamsters
feed mainly on grain found in nearby fields; now and then they eat small
lizards, insects or worms. Their natural enemies, snakes, hawks, owls,
weasels and foxes, are usually bigger than them.13 Thus, they choose to
run and hide rather than fight back. In a first prototype a bait with food
(see Figure 3.3) was used to attract the hamsters, which did not work well,
because it is not a natural behaviour of hamsters to hunt for food.

12 see Gattermann et al. (2001)
13 see Zim and Wartik (1951)
The recent prototype of the system is equipped with a bait which consists
of a small pipe (see Figure 3.4), into which the hamster can crawl and
hide. This exploits the natural behaviour of hamsters to run away and hide
if they are attacked. Although this method needs some training until the
hamsters recognise the bait as attractive to crawl into and start running
after it, the pipe bait works much better than the food bait.
Like all pets, hamsters need exercise and entertainment to maintain their
physical and mental health. Normally, a running wheel is attached to the
cage for them to exercise. As an alternative, the Metazoa Ludens system
is tested on its ability not only to influence the well-being of the human
player, but also to have beneficial effects for the hamsters. Furthermore,
the motivation of the hamsters to play Metazoa Ludens was also tested, using
3.8 Hamster recognition
The hamsters' near-sightedness prevents them from seeing objects that are
close in range.17 This means that they often identify objects by biting them
rather than looking at them.18 Nevertheless, with the lateral position of
their eyes, hamsters have a wide angle of vision, and they may still be
able to spot movements of objects from a greater perceived distance.
Hamsters are color-blind, being only able to see different shades of black
and white, and they may be nearly blind in bright daylight. It is also
believed that those with red eyes have poorer eyesight than those with
black eyes.19
Unlike their eyes, hamsters’ sense of hearing is very well-developed. They
can hear a wide variety of sounds, including those made in ultrasonic
frequencies. This helps hamsters to communicate with each other without
being heard by others. Hamsters can often be seen to freeze if they hear
unfamiliar sounds or noise, especially loud noises.20
Apart from their good sense of hearing, hamsters are also equipped
with a very good sense of smell. They distinguish one another by their
distinct scents. They can make use of distinct musk-like liquid produced
from scent glands to identify other hamsters and to mark their territory.
This may also enable them to distinguish the sex of another hamster
through smelling.21
Hamsters can have more than one color on their fur. For the Russian
Dwarf Hamsters species, the normal type is dark gray in color with a
darker gray undercolor. It has a thick jet black dorsal stripe and an almost
white belly. The eyes are black and the ears are gray. The Sapphire type
has a soft purple-gray fur with gray undercolor, a thick gray dorsal stripe
and ivory belly. The eyes are black and the ears are light gray-brown.22
17 see Hamsterhideout (2006)
18 see Society for the Prevention of Cruelty to Animals (2006)
19 see Hamsterhideout (2006)
20 see Hamsterhideout (2006)
21 see Hamsterhideout (2006)
22 see Pet (2006a)
Roborovski hamsters have a top coat color which is dark chestnut or gold
with a slate gray undercolor. The belly and side arches are white. The
hamsters have white eyebrows just above their eyes and a white patch
around the nose. Unlike other dwarf hamsters, they do not have a dorsal
stripe.23

The golden hamster has golden-brown top fur, while the belly is gray
or white. It may be possible to distinguish individual golden hamsters
through dark patches on their forehead and a black stripe on their back.24
Apart from such non-specific statements, there are not many facts about
the probability that two hamsters can be distinguished based on the
appearance of their fur. Therefore this question has to be answered later,
based on the outcome of the experimental analysis in Chapter 5.
23 see Chamberlain (1992)
24 see Alderton and Tanner (1999)
Chapter 4
Implementation
The software for Metazoa Ludens is written in C++/C, and Microsoft Visual Studio is used for development. The decision to use C++ for the hamster tracking stems from three reasons. First, some parts of the Metazoa Ludens system, for example the motor controls, have already been developed in C++, and the new code should integrate seamlessly with the old code. Secondly, the whole system is planned to work in real time on standard PC hardware, and to produce sound and 3D graphics output via standard libraries - in this case the DirectX libraries. The third reason was the insight that a framework has to be used for the programming of the hamster tracking software, in order to avoid the time-consuming
The server grabs a frame from one of the cameras mounted above the hamsters' running area. After the initial preprocessing, the different objects inside the image frame are tracked. The information about position, orientation and identity of the hamsters is sent to the client via a TCP/IP connection.
1 see OpenCV
2 see: http://opencvlibrary.sourceforge.net/cvBlobsLib
4.2 Software architecture
[Figure: Software architecture overview. Cam 1 and Cam 2 feed the camera selector and tracking module on the server; hamster ID, position and orientation are sent to the hamster player object in the Direct 3D game engine on the client, which renders to the display; the arm position, surface data and collision detection connect the game engine to the accentuator controller and the accentuators.]
On the client side this information is used to create an object for every hamster player inside the DirectX game engine. The client takes the keyboard input of the human player to move the human player object in the virtual game field. Furthermore, the graphics rendering for the virtual world, which includes the hamster avatars as well as the human avatar, is done on the client machine, and the output is shown on a standard display. The client software sends the position of the human player and information about the virtual surface back to the server via a TCP/IP connection.
The server controls the moving arm with the bait according to the position of the human player inside the game. The surface of the hamster tank is controlled by sixteen accentuators, according to the virtual surface information which has been sent from the client. Thereby the real surface for the hamsters can be warped in real time, depending on the virtual game model. For improved flexibility the motor control has been implemented via a Bluetooth interface.
[Figure: Module diagram of the tracking software, showing the findOrientation module and the flow of position and orientation data to the filesystem.]
The video stream is captured by two Dragonfly cameras4 at 30 fps and 640 x 480 pixels. The OpenCV function cvCapture is used for image retrieval, which allows capturing frames from a camera as well as from a video file. The two cameras are both mounted above the gaming field, and both show the whole gaming area. Therefore the image of only one camera
3 see Intel (2006)
4 see http://www.ptgrey.com/products/dragonfly/index.asp
4.3 Image retrieval
Figure 4.3: Camera calibration: The left image shows the distorted input
image, the right image shows the undistorted and cropped out-
put image
is needed for the tracking process. However, the moving arm which holds
the bait obstructs some parts of the image. Because of that, two cameras
are used, and the camera selector module selects the unobstructed image
for further processing, based on the known position of the movable arm.
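The camera-selection logic described above can be sketched in a few lines. The function name, the coordinate convention, and the assumption that each camera's view is blocked while the arm is over its half of the 86 x 86 cm field are all illustrative; they are not taken from the Metazoa Ludens code.

```cpp
// Hypothetical sketch of the camera selector module: pick the camera
// whose view is not obstructed by the movable arm.
// Assumption: camera 0 watches the left half of the field, camera 1 the
// right half, and the arm obstructs the camera over its own half.
int selectCamera(double armX, double fieldWidth = 86.0) {
    // Use the camera on the opposite side of the arm, so its view of the
    // gaming area stays clear.
    return (armX < fieldWidth / 2.0) ? 1 : 0;
}
```

With these assumptions, an arm on the left half (e.g. at 10 cm) selects camera 1, and an arm on the right half selects camera 0.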
The gaming area is lit by two standard fluorescent tubes to achieve nearly uniform lighting. Due to the short distance of only 60 cm between the cameras and the floor of the Metazoa Ludens structure, the cameras have been equipped with an ultra-wide-angle lens (2.5 mm), in order to capture the whole gaming area of 86 x 86 cm. Unfortunately this wide-angle lens produces a distorted image, which has to be undistorted by the software. Without undistortion the position of the hamsters would be tracked incorrectly, and the image would not be a suitable input for the recognition module.
The goal of camera calibration is to bring two coordinate systems into coincidence: the world coordinate system of the objects which are depicted and
4.4 Preprocessing
4.4.1 Segmentation
Figure 4.4: Segmentation: The left image shows the input image from the
camera, the right image shows the resulting mask (Foreground
is black, Background is white)
Figure 4.5: Region labelling: The left image shows the input image from
the camera, the right image shows the connected components
in different colours
4.4.2 Region labelling
Region labelling is done using the library cvBlobsLib. The library implements region labelling for connected components in binary images, as well as filtering of the connected components, for example by size. The connected components, called blobs, are saved in a run-length representation, which allows fast computation of features like the size of a blob, the coordinates of its bounding box, and the ellipse which approximates the blob. Furthermore, the calculation of moments is implemented in cvBlobsLib. The binary mask output of the segmentation process is used as input for cvBlobsLib. The regions are labelled and then filtered by size to suppress image noise, which is represented by small, non-connected regions. The output of the region labelling process is an indexed image, i.e. an array of numbers, where every pixel of the input image is represented by an index number, depending on the connected component to which the pixel belongs (see Figure 4.5). As the filtered image consists only of foreground (hamster) and background pixels, there is one large background blob (denoted as 0 in Figure 4.5), as well as one blob per hamster. In the following, all image operations are done for every foreground blob individually, while all background pixels are omitted.
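The labelling-and-filtering step can be illustrated with a small stand-alone sketch. The actual implementation uses cvBlobsLib and a run-length representation; this version uses a simple flood fill and is only meant to show the principle of assigning an index per connected component and suppressing small noise blobs.

```cpp
#include <vector>
#include <queue>
#include <cstddef>

// Illustrative sketch of region labelling with a size filter.
// Input: binary mask (non-zero = foreground), width w, height h.
// Output: indexed image, 0 = background, 1..n = connected components;
// 4-connected components smaller than minSize are suppressed as noise.
std::vector<int> labelRegions(const std::vector<int>& mask,
                              int w, int h, std::size_t minSize) {
    std::vector<int> labels(mask.size(), 0);
    int next = 1;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int idx = y * w + x;
            if (mask[idx] == 0 || labels[idx] != 0) continue;
            // Flood-fill one 4-connected component from this seed pixel.
            std::vector<int> pixels;
            std::queue<int> q;
            q.push(idx);
            labels[idx] = next;
            while (!q.empty()) {
                int p = q.front(); q.pop();
                pixels.push_back(p);
                int px = p % w, py = p / w;
                const int nx[4] = {px - 1, px + 1, px, px};
                const int ny[4] = {py, py, py - 1, py + 1};
                for (int k = 0; k < 4; ++k) {
                    if (nx[k] < 0 || nx[k] >= w ||
                        ny[k] < 0 || ny[k] >= h) continue;
                    int n = ny[k] * w + nx[k];
                    if (mask[n] != 0 && labels[n] == 0) {
                        labels[n] = next;
                        q.push(n);
                    }
                }
            }
            if (pixels.size() < minSize) {
                // Size filter: reset small noise blobs to background.
                for (int p : pixels) labels[p] = 0;
            } else {
                ++next;
            }
        }
    return labels;
}
```

On a 4 x 3 mask with one two-pixel blob and one isolated pixel, a minimum size of 2 keeps the first blob and suppresses the isolated pixel.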
4.4.3 Orientation
θ = −1/2 · arctan( 2µ1,1 / (µ0,2 − µ2,0) )    (4.1)
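Equation 4.1 can be turned into a small routine. The sketch below computes the second-order central moments directly from the pixel coordinates of one blob; function and variable names are illustrative, and atan2 is used so the degenerate denominator case is handled without a division.

```cpp
#include <cmath>
#include <vector>
#include <utility>

// Illustrative sketch of Equation 4.1: the orientation of a region from
// its second-order central moments mu11, mu20, mu02.
double regionOrientation(const std::vector<std::pair<double,double>>& pts) {
    double n = static_cast<double>(pts.size());
    double cx = 0.0, cy = 0.0;                 // centre of gravity
    for (const auto& p : pts) { cx += p.first; cy += p.second; }
    cx /= n; cy /= n;
    double mu11 = 0.0, mu20 = 0.0, mu02 = 0.0; // central moments
    for (const auto& p : pts) {
        double dx = p.first - cx, dy = p.second - cy;
        mu11 += dx * dy;
        mu20 += dx * dx;
        mu02 += dy * dy;
    }
    // theta = -1/2 * arctan( 2*mu11 / (mu02 - mu20) ), as in Equation 4.1
    return -0.5 * std::atan2(2.0 * mu11, mu02 - mu20);
}
```

For pixels along the diagonal y = x the routine yields −π/4, i.e. the major axis at 45 degrees.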
The orientation angle θ is used to normalise the rotation of the input region,
in this case the image of the hamster in a top-view, to a standard rotation.
Problems with the calculation of the rotation angle θ can arise for objects for which the major and the minor axis of the ellipse have the same length, for example circles as well as squares. This shortcoming is not a problem for blobs which describe hamsters, because hamsters
8 see p.557 Steger (2006)
9 see p.557 Steger (2006)
10 from Steger (2006), p. 558
4.4.4 Normalisation
The function iplRotate is used to normalise the rotation and the position of the input region to a standard position and rotation. This normalisation step is important for the application of the face recognition algorithms, which are alignment based and therefore need the input images in a normalised position and rotation. Basic descriptors, which are only based on the number of pixels inside the region, as well as histogram descriptors, do not need a normalised input. iplRotate calculates the displacement matrix D, based on the rotation angle θ and the position of the centre of gravity of the region CoGx,y, as well as the normalised target position Nx,y.
        ( cos θ   −sin θ   xN − xCoG )
    D = ( sin θ    cos θ   yN − yCoG )
        (   0        0         1     )
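A sketch of how D could be built and applied to a pixel position in homogeneous coordinates. iplRotate belongs to the Intel IPL library; this stand-alone version only illustrates the arithmetic of the matrix above, and the type and function names are illustrative.

```cpp
#include <array>
#include <cmath>

// 3x3 homogeneous displacement matrix, as in the text: a rotation by
// theta combined with the translation that moves the centre of gravity
// (cogX, cogY) to the normalised target position (nX, nY).
using Mat3 = std::array<std::array<double,3>,3>;

Mat3 displacement(double theta, double cogX, double cogY,
                  double nX, double nY) {
    double c = std::cos(theta), s = std::sin(theta);
    Mat3 d{};
    d[0] = { c, -s, nX - cogX };
    d[1] = { s,  c, nY - cogY };
    d[2] = { 0.0, 0.0, 1.0 };
    return d;
}

// Apply D to a pixel position (x, y) in homogeneous coordinates.
std::array<double,2> transform(const Mat3& d, double x, double y) {
    return { d[0][0]*x + d[0][1]*y + d[0][2],
             d[1][0]*x + d[1][1]*y + d[1][2] };
}
```

With θ = 0, a centre of gravity at (5, 5) and a target position at the origin, the centre of gravity is mapped exactly onto (0, 0).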
In a second step the pixels inside the region are extracted from the input frame with the function extractForeground, and transformed to the normalised position and rotation by multiplication with the displacement matrix D. Figure 4.7 shows the normalisation of two input regions into two separate images with a standard position and rotation. These separated and normalised images no longer move or rotate as the hamsters move inside the gaming area, although they are still subject to non-rigid transformations, which stem from the hamsters' way of moving.
11 see p.558 Steger (2006)
4.5 Classification
4.5.1 Binary region descriptors
Binary region descriptors describe the shape of a binary object and ignore the actual intensity values of the object's image. They can therefore be applied directly to the output image of the region labelling step. One of the simplest region descriptors is the area of the object, measured by the
4.5.2 Basic gray value region descriptors
Gray value region descriptors describe the region based on the intensity values of the pixels inside the region. Therefore not only the shape of the region is described, but also the different brightness values which constitute the region. The simplest features of a region are the minimum, the maximum and the mean gray value of the region. The mean gray value is a measure of the brightness of the region, and can therefore be used to distinguish between objects which have significantly different gray values. This distinction approach is quite similar to the thresholding approach which has been used for segmentation, and which included the mean RGB value as well as the standard deviation (see Section 4.4.1). A basic gray value descriptor does not work as a descriptor for individual hamsters for two reasons: First, the mean gray value of the hamsters' fur is not different enough to allow distinction, and secondly, the mean brightness is too
Figure 4.8: Histogram descriptor: The chart shows the histogram distribution of the lightness values of 250 hamster images. The 32 bins of the histogram are shown at different angles; the distance from the middle shows the number of values in each bin. Every line refers to one test image.
as well as from the colour channels (R,G,B) of a colour image. The RGB colour model describes the colour of a pixel by the intensity of red, green and blue light. Unfortunately this representation mixes the brightness and the hue of a colour into the RGB values. Therefore it can be helpful to convert the image into another colour space, for example HSL, which describes a pixel by its hue, saturation and luminance values, if colour information is to be used as a descriptor.12 Furthermore, the RGB values are more sensitive to image noise than transformed representations. In the HSL model, hue corresponds to the dominant wavelength of the colour, saturation is the relative purity of the colour, in other words the amount of white which is added to the colour, while luminance refers to the amount of light. Another possible colour model is HSV (hue, saturation and value), which is similar to HSL apart from one difference: while HSL uses a gray point in between black and white as reference, HSV uses the white point (illuminant) as reference. Thus the lightness value in HSL spans from black through the hue to white, while the V part of the HSV model is defined as the range in between black and the hue component. First tests using a histogram representation with 32 bins of the luminance part of the HSL representation of the images showed significant differences between the two hamsters (see Figure 4.8).
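The 32-bin lightness histogram can be sketched as follows. The RGB-to-HSL lightness formula (max + min)/2 is standard; the struct and function names are illustrative, and the histogram is normalised by the pixel count so regions of different size stay comparable.

```cpp
#include <vector>
#include <algorithm>
#include <cstdint>
#include <cmath>

// Illustrative sketch of the 32-bin lightness histogram descriptor.
struct Rgb { std::uint8_t r, g, b; };

std::vector<double> lightnessHistogram(const std::vector<Rgb>& pixels,
                                       int bins = 32) {
    std::vector<double> hist(bins, 0.0);
    for (const Rgb& p : pixels) {
        int mx = std::max({(int)p.r, (int)p.g, (int)p.b});
        int mn = std::min({(int)p.r, (int)p.g, (int)p.b});
        double light = (mx + mn) / 2.0 / 255.0;   // HSL lightness in [0,1]
        int bin = std::min(bins - 1, (int)(light * bins));
        hist[bin] += 1.0;
    }
    // Normalise so the histogram sums to 1 regardless of region size.
    for (double& h : hist) h /= (double)pixels.size();
    return hist;
}
```

All-black pixels fall into bin 0 and all-white pixels into the top bin, so two regions with mostly dark and mostly bright fur produce clearly different histograms.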
This effect is mainly due to the fact that one of the hamsters has more bright fur than the other one. The different shapes of the histogram distributions indicate the differences between the hamsters, which are used for distinction with the histogram descriptor. Figure 4.9 shows two hamster images and the corresponding histograms. The histogram descriptor is rotation invariant by definition, which means that it does not need the normalisation steps described in Section 4.4.4. Furthermore, it is very robust against the non-rigid transformations caused by the movement of the hamsters, as the effect of these non-rigid transformations is mainly a change of the spatial layout, whereas the absolute number of pixels of one brightness value is relatively stable. As the histogram does not model the spatial layout, but only the frequency of specific values, the recognition
12 see p.506 Iglesias et al. (2006)
Figure 4.9: Histogram descriptor: The image shows the colour images of
two hamsters as well as the corresponding histograms of the
luminance value in HSL colour space
4.5.4 Face recognition methods
The alignment based face recognition methods, which have been described in Section 2.4.2, are standard algorithms in face recognition and have already been implemented in several face recognition libraries. The open
13 see Bolme et al. (2003)
Chapter 5
Testing
This chapter describes the composition of the test set, as well as the experimental setup which has been used to conduct the software tests. Furthermore, the choice of one specific histogram classifier is explained on the basis of a comparative analysis of the discrimination performance of seven different histogram classifiers. A comparative test of three different distance metrics and their performance is performed, and ROC curves are used to evaluate the quality of classification of the histogram classifiers. The final part of this chapter describes the results of tests with several face recognition algorithms, and compares them to the histogram classifiers.
1 Yellow circles denote the family groups, the green circles denote the different breeds
The test set consists of five individual hamsters, which are of two different breeds and three different families. In the beginning a test set with six hamsters was planned, but the sudden death of one roborovski hamster (the sister of subject 1) prevented the realisation of this plan. The biological classification of the hamsters is given in Chapter 3 (see Section 3.1). The test subjects are numbered from 1 to 5 and are held in different cages to allow multiple test series. Figures 5.1 and 5.2 show the allocation of the hamsters to the subsets of family and breed.
Instead of doing a real-time test, the hamsters are recorded while they are running around in the playing field, and the test videos are analysed later. This makes it possible to test and compare different classifiers and algorithms, and restricts the impact of possible side effects, e.g. changes in the lighting situation. The tests are conducted with one hamster present in the playing field at a time, in order to have a stable ground truth. Each hamster is recorded for one minute for training and for one minute for testing, at 30 frames per second and a video size of 640x480 pixels. The videos are edited to show two seconds of the background in the beginning, without a hamster present, and 58 seconds of hamster and background. The background footage is used to train the segmentation module (see Section 4.4.1). In order to avoid side effects of compression algorithms, all videos are recorded and edited as uncompressed footage.
The first classifiers to be tested are the histogram classifiers, which
have been specified in Section 4.5.3. In the following, the performance of
5.2 Classification by histograms
Γ = SSb / SSw    (5.1)
SSb is the scatter matrix between the classification groups, in this case
2 see SPSS Inc. (2007)
3 see p.165 Backhaus et al. (2003)
the individual hamsters, while SSw is the scatter matrix inside the groups.
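For a one-dimensional feature, the criterion Γ = SSb/SSw can be illustrated with a toy routine. The thesis computes the discriminant analysis with SPSS on multi-dimensional histogram features; this scalar sketch (names illustrative) only shows the principle of between-group versus within-group scatter.

```cpp
#include <vector>
#include <cstddef>
#include <cmath>

// Toy illustration of Gamma = SSb / SSw for a scalar feature.
// Each inner vector holds the feature values of one class (hamster).
double gammaCriterion(const std::vector<std::vector<double>>& groups) {
    double total = 0.0;
    std::size_t n = 0;
    for (const auto& g : groups) { for (double v : g) { total += v; ++n; } }
    double grandMean = total / n;
    double ssb = 0.0, ssw = 0.0;
    for (const auto& g : groups) {
        double m = 0.0;
        for (double v : g) m += v;
        m /= g.size();
        // Between-group scatter: group mean vs. grand mean.
        ssb += g.size() * (m - grandMean) * (m - grandMean);
        // Within-group scatter: samples vs. their group mean.
        for (double v : g) ssw += (v - m) * (v - m);
    }
    return ssb / ssw;
}
```

Two well separated groups such as {0, 2} and {10, 12} yield a large Γ, signalling good discriminability of the feature.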
The number of discriminant functions depends on the number of classes: for n classes, n − 1 discriminant functions can be calculated. In general, the first two discriminant functions are most important for the performance of the class separation.4 The coefficients of the first and second discriminant function can be drawn into a diagram, and give a good overview of the distribution of the test data in the discrimination space. These diagrams are shown in Figure 5.3 and Figure 5.4, and allow a visual comparison of the seven different histogram classifiers, which is done in the following.
The first insight from the analysis of the diagrams is the fact that none of the classifiers is able to discriminate well between subjects 4 and 5. The group centroids (marked with a blue box) of these two subjects are close to each other, and their areas of distribution nearly lie on top of each other. There-
4 see p.179 Backhaus et al. (2003)
As stated in Section 4.5.3, the colour models HLS and HSV are similar, and therefore the diagrams of the hue based discriminant functions HSV-H and HLS-H appear almost identical. The differences can be explained by rounding errors in the transformation of the colour space. The same point applies to the comparison of HSV-V and HLS-L. In contrast, the diagrams for the saturation based HSV-S and HLS-S classifiers show a significant difference, which is due to the different definitions of saturation in the two
• HSV-S saturation
• HLS-S saturation
It has to be noted that these classification results are based on two assumptions. The first assumption is that the test set only includes members of the group of known hamsters, whereby it is not necessary to distinguish the known hamster objects from other objects. The second point
5 The complete classification results are listed in Appendix A
is that the function does not use any information about the quality of the
classification, but will always return the hamsterID of the closest match,
even though the gallery histograms may be too close together for a reliable
classification. This problem will be addressed later (see Section 5.2.4).
The histogram classifier which is based on gray values needs less processing time than the other classifiers, due to the easy transformation from RGB to gray values. It has a poor performance compared to the hue based histograms, but is good enough for the differentiation between the two breeds, and shows nearly no misclassifications for this task. However, it is not able to distinguish between the individual hamsters of one breed, and therefore has a mean classification rate of 74.40 %. As expected from the diagrams, the "correct classification" values of the HSV-H and the HLS-H classifier are very similar, as are the results for the HSV-V and HLS-L classifiers. The performance difference of 2.5 % in this test between HSV-S and HLS-S stems mainly from the fact that the HLS-S classifier has a poorer performance for the classification of subjects 2 and 3. In general, it can be noted that all classifiers have great difficulty distinguishing between subject 4 and subject 5. Although nearly no misclassifications between the different breeds (subjects 4 and 5 vs. 1, 2, 3) occur, the bad performance inside the golden hamster breed has a substantial negative effect on the overall classification performance of all classifiers. Due to its outstanding classification performance compared to the other classifiers, a hue based classifier is chosen for further testing. These classifiers show a slightly better performance than the saturation based classifiers, and a much better performance than the value, lightness or gray-value based classifiers.
Apart from the correct colour model, which has to be used to build a well-working histogram classifier, the distance metric is important for the classification performance as well. It defines how the correlation between two histograms is calculated, which is important for matching the probe histogram in question with the gallery histograms. Especially if the
class distributions are close together, like subjects 4 and 5, small differences in the calculation of the similarity value can affect the classification
performance notably. Three different distance metrics are tested with a
hue based histogram classifier:
• Correlation
• Chi-Square
• Intersect
where N is the number of histogram bins and H′k(I) can be computed as follows:

H′k(I) = Hk(I) − (1/N) · ΣJ Hk(J)    (5.3)
The Chi-Square method defines the similarity of the two histograms by the
ratio of the difference to the sum of the values of the two histogram bins.
This ratio is summed up to find the similarity value for the histograms,
which becomes 0 if the histograms are identical.
d(H1, H2) = ΣI (H1(I) − H2(I)) / (H1(I) + H2(I))    (5.4)
The last method is called Intersect and needs the least processing time of the three formulas. It intersects the two histograms by finding the minimum of every bin in the histograms, and summing up these minima:
d(H1, H2) = ΣI min(H1(I), H2(I))    (5.5)
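The three metrics can be sketched directly from Equations 5.3 to 5.5. The thesis relies on the OpenCV implementations; the stand-alone versions below (names illustrative) operate on two histograms with the same number of bins.

```cpp
#include <vector>
#include <algorithm>
#include <cstddef>
#include <cmath>

// Mean-centred histogram H' of Equation 5.3, used by the Correl metric.
std::vector<double> centred(const std::vector<double>& h) {
    double mean = 0.0;
    for (double v : h) mean += v;
    mean /= h.size();
    std::vector<double> out(h.size());
    for (std::size_t i = 0; i < h.size(); ++i) out[i] = h[i] - mean;
    return out;
}

// Chi-Square metric of Equation 5.4: the summed ratio of bin differences
// to bin sums; 0 for identical histograms. Empty bin pairs are skipped
// to avoid a division by zero.
double chiSquare(const std::vector<double>& h1,
                 const std::vector<double>& h2) {
    double d = 0.0;
    for (std::size_t i = 0; i < h1.size(); ++i)
        if (h1[i] + h2[i] > 0.0)
            d += (h1[i] - h2[i]) / (h1[i] + h2[i]);
    return d;
}

// Intersect metric of Equation 5.5: the sum of per-bin minima; largest
// for identical (normalised) histograms.
double intersect(const std::vector<double>& h1,
                 const std::vector<double>& h2) {
    double d = 0.0;
    for (std::size_t i = 0; i < h1.size(); ++i)
        d += std::min(h1[i], h2[i]);
    return d;
}
```

For two identical normalised histograms, Chi-Square yields 0 and Intersect yields 1, matching the behaviour described in the text.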
6 see Intel (2007)
Based on the HSV-H classifier, a test including all three distance metrics has been conducted. The comparison of the ground truth and the classifier's guess provides the correct classification rate and the error rate, respectively. The results are listed in Table 5.2:
Table 5.2: Classification and error rates for different distance metrics7
It is obvious from the data that the Intersect method has the worst performance overall, and therefore should not be chosen for the pet recognition application. In contrast to the other distance metrics, it not only has problems differentiating between subjects 4 and 5, but also between subjects 1 and 3. Figure 5.6 shows the perfect performance of the classifier for subject 1 - it is classified correctly in all cases. Unfortunately, subject 3 is never classified correctly; most of the time it is classified as subject 1.
7 The complete results are listed in Appendix B
The discrimination between the different breeds is the only task which is done well by this combination of the HSV-H classifier and the intersect distance metric, but this task can just as easily be done by a simpler classifier, like the one based on gray values.
Figure 5.7: HSV-H Correl Test 1 Figure 5.8: HSV-H Chi-Sqr Test 1
The Chi-Square and Correl distance metrics generate much better classification results of around 80 % correct classification rate. The differences between the results are not too big; Chi-Square seems to be more exact for all subjects and therefore has a better overall performance than Correl. Both methods deliver very good results for the discrimination inside the roborovski breed (subjects 1, 2, 3), even between the members of one family. There are nearly no cross-breed errors, which leads to a cross-breed error rate of approximately 1 % for the Chi-Square method. The result is even better if the data of subjects 4 and 5 is omitted, as these cannot be distinguished by any of the algorithms, and they are responsible for a big part of the classification errors. In this case the correct recognition rate, i.e. identifying the correct one out of three roborovski hamsters, rises above 90 %. As a result of this comparison, the intersect method is ignored in the following, while the performance of the other two methods under varying external conditions is investigated in the next section.
Figure 5.9: HSV-H Correl Test 2 Figure 5.10: HSV-H Chi-Sqr Test 2
Even though the Metazoa Ludens system has fixed fluorescent tubes for lighting, the room light is likely to change, and a varying amount of sunlight can mix with the light from the fluorescent tubes. Therefore the second test series includes a change in lighting of
approximately one f-stop, which is done by altering the mechanical aperture of the camera. The white balance has been readjusted after the lighting change. In the following, two classifiers and two distance metrics are compared by their performance in the second test series. The classifiers for this test had been trained on the first test series, so the effect of the lighting change on the performance can be measured.
An exhaustive comparison of the classification performance for the second test set, which included all tested classifiers combined with all distance metrics, has been conducted. For the sake of space, the individual results are not presented here, but are discussed using the examples of the HSV-H classifier, which had the best recognition rates overall, and the gray value classifier. It is assumed that a gray based histogram classifier is more affected by the lighting change than a hue based classifier, while a hue based classifier should be affected by hue changes, which can occur if the white balance is shifted. The classification results of the second test series are shown in Table 5.3:
It is evident from Table 5.3 as well as from the diagrams (see Figures 5.9 and 5.10) that the classification rate drops significantly for the second test series, and the error rate rises accordingly. Especially the hue based classifiers, which should not be affected too much, as only the amount of light but not the colour temperature has been changed, show a performance drop of up to 35 %. This performance drop is even higher than the one for the gray based histogram classifier, which leads to the conclusion that the hue based classifier, especially in connection with the correl distance
8 The complete results are listed in Appendix C
Figure 5.11: Gray Chi-Sqr Test 1 Figure 5.12: Gray Chi-Sqr Test 2
is able to discriminate well between the individual hamsters for test set 2, which limits the technology to a laboratory setting, where the lighting and camera parameters can be kept constant. For further development more test series are needed to explore the relation between changing light and classification performance.
As described in Section 5.1.1, the test set consists of five hamsters, and the classifier has to detect which of the five hamsters is present in the test image. The assumption has been made that only objects which are part of the group of known hamsters are depicted in the test videos. Therefore the algorithm works as a closed test, with a test setup similar to a face identification system (see Section 2.4). As a basic algorithm, the system considers as a match the hamster whose gallery histogram has the highest similarity value to the current histogram. Problems arise if the distance values of two gallery histograms lie close together, or in more general words, if two class distributions have similar locations in the feature space. A good example are subjects 4 and 5, whose appearance is very similar, and whose classes are close in feature space (see Figure 5.3). In this case the test confidence is low, which means that the result of the classification process may be the right class, but a wrong classification has a high likelihood as well. Therefore the outcome of the classifier is overlaid by an error which has the form of an unknown likelihood distribution. In order to minimise the impact of this error, a classification confidence value has to be calculated, which is used to reject the classification if the confidence is too low. One quality value for histogram classification, which is based on the calculation of the similarity value (see Equations 5.2, 5.4 and 5.5), can be computed as the difference of the maximum similarity value and the second largest similarity value:9
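A minimal sketch of this confidence-based reject rule, assuming the classifier produces one similarity value per gallery hamster. The function name, the cut-off parameter, and the indexing are illustrative; the thesis likewise denotes rejected cases by -1.

```cpp
#include <vector>
#include <utility>
#include <cstddef>

// Classify to the best-matching gallery entry, but reject (-1) when the
// gap between the best and the second-best similarity value falls below
// a cut-off, i.e. when the classification confidence is too low.
int classifyWithReject(const std::vector<double>& similarity,
                       double cutoff) {
    if (similarity.size() < 2) return -1;
    std::size_t best = 0, second = 1;
    if (similarity[second] > similarity[best]) std::swap(best, second);
    for (std::size_t i = 2; i < similarity.size(); ++i) {
        if (similarity[i] > similarity[best]) { second = best; best = i; }
        else if (similarity[i] > similarity[second]) second = i;
    }
    // Confidence = max similarity minus second-largest similarity.
    double confidence = similarity[best] - similarity[second];
    return (confidence >= cutoff) ? (int)best : -1;
}
```

With a cut-off of 0.3, similarities {0.9, 0.2, 0.1} give a confident match for the first gallery entry, while {0.5, 0.45} is rejected as too ambiguous.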
Figure 5.13: ROC curve HSV-H Figure 5.14: ROC curve Gray
be distinguished well by the classifier (see Figure 5.8), and therefore get rejected in 96.9 % of all cases. The mean reject rate for subjects 1-3 is 14.2 %, a value which would be adequate for a hamster identification system. In this configuration the software could be used to detect whether a roborovski hamster (subjects 1-3) is inside the playing field, and to identify the specific individual hamster if it is a roborovski. However, a reject result would not identify the hamster as a member of the golden hamster breed (subjects 4 and 5), as the reject signal is raised in the case of false negatives as well. The results for the second test set with the same classifier are depicted in Figure 5.16, and indicate that the classifier performance is not sufficient. Although the cut-off value is high and the reject rate is 47.8 %, a large number of misclassifications is returned. The only subject which can be detected by this setup is subject 3, but even this classifier result is not very specific, as subject 2 is misclassified as subject 3 in a large number of cases.
12 Rejected cases are denoted by -1. The complete results are listed in Appendix D
5.3 Classification by face recognition algorithms
The face recognition algorithms are tested with videos from the same test set as the histogram classifiers, in order to ensure comparability of the results. The algorithms are trained on a subset of all video frames, and then tested with frames from other parts of the same videos. As described in Section 4.5.4, the images are converted to gray value images and exported from the Metazoa Ludens software, including their corresponding image lists. The CSU Face Identification Evaluation System13 is then used for the training and testing of the classifiers. In general, the classification results which are achieved by the face recognition algorithms are significantly lower than the results of the histogram methods. One possible explanation stems from the high variability of the hamsters' appearance, through the distortion of their fur and their non-rigid movement. This variability is very high compared to the differences between the individual hamsters, which complicates the recognition of individual animals. The face recognition algorithms are optimised for the recognition of the human face, which is quite variable, but less variable than the constantly changing appearance of the hamsters. Furthermore, the differences between different human faces are bigger than those between the fur of individual hamsters. Therefore the face recognition algorithms, which compare the spatial layout of the input images, have great difficulty distinguishing between objects like the hamsters, whose appearance is constantly changing. In the following, the results of the individual face recognition algorithms are shown and briefly discussed.
14 The complete results are listed in Appendix E
15 see Belhumeur et al. (1997)
16 The complete results are listed in Appendix E
17 The complete results are listed in Appendix E
Chapter 6
Conclusion
6.1 Conclusion
The goal of this thesis, to build combined tracking and recognition soft-
ware that can be used for remote human-pet interaction, has been
reached. It could be shown that a histogram classifier is able to distin-
guish between multiple hamsters, at least in the case of the Roborovski
breed. Furthermore, it was demonstrated experimentally that the
standard face recognition techniques do not achieve a better recognition
performance for this task than the histogram classifier. This can be
explained by the highly variant appearance of the hamsters, which
changes constantly while they move. Apart from the recognition perfor-
mance, the face recognition algorithms also require more computational
power than the histogram methods, both for training the classifier and
for the classification itself. Therefore, the use of a histogram classifier is
proposed for Metazoa Ludens.

The poor classification results of the histogram algorithm for one breed
can be explained by the high similarity between the subjects, combined
with the high variability in the appearance of each individual subject.
Furthermore, segmentation errors, which mainly stem from the small
difference between the colour of the background and the colour of parts
of the golden hamsters' fur, are a possible reason for the poor recognition
performance for these subjects. A change of the segmentation algorithm,
for example to a motion-based segmentation approach, could therefore
help to improve the recognition performance in these cases as well.
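The proposed histogram classifier reduces, in essence, to a nearest-neighbour search over per-animal reference histograms. A minimal sketch using the chi-square distance (the reference histograms, labels, and bin count below are purely hypothetical; the actual colour spaces and bin counts are described in Chapter 4):

```python
def chi_square(h, g):
    """Chi-square distance between two histograms; 0 means identical."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h, g) if a + b > 0)

def classify(sample, references):
    """Assign `sample` the label of the nearest reference histogram."""
    return min(references, key=lambda label: chi_square(sample, references[label]))

# Hypothetical normalised hue histograms, one averaged reference per hamster
references = {
    "hamster_1": [0.70, 0.20, 0.10],
    "hamster_2": [0.10, 0.30, 0.60],
}
print(classify([0.60, 0.30, 0.10], references))  # nearest to hamster_1
```

Because training amounts to averaging histograms and classification to a handful of bin-wise comparisons, this approach stays well within the computational budget of a real-time tracking loop.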
Appendix A
Classification performance of different histogram classifiers
Classification Results (HSV-H)

                     Predicted Group Membership
HamsterID        1      2      3      4      5    Total
Count    1   1,688     28     23      0      1    1,740
         2       0  1,668     72      0      0    1,740
         3      22     13  1,705      0      0    1,740
         4      14      0      8  1,264    454    1,740
         5      21      0     26    457  1,236    1,740
%        1    97.0    1.6    1.3    0.0    0.1    100.0
         2     0.0   95.9    4.1    0.0    0.0    100.0
         3     1.3    0.7   98.0    0.0    0.0    100.0
         4     0.8    0.0    0.5   72.6   26.1    100.0
         5     1.2    0.0    1.5   26.3   71.0    100.0

86.9% of original grouped cases correctly classified.
Classification Results (HLS-L)

                     Predicted Group Membership
HamsterID        1      2      3      4      5    Total
Count    1   1,546    114     80      0      0    1,740
         2     142  1,271    327      0      0    1,740
         3      61    225  1,454      0      0    1,740
         4       0      0      5  1,054    681    1,740
         5       0      0      7    543  1,190    1,740
%        1    88.9    6.6    4.6    0.0    0.0    100.0
         2     8.2   73.0   18.8    0.0    0.0    100.0
         3     3.5   12.9   83.6    0.0    0.0    100.0
         4     0.0    0.0    0.3   60.6   39.1    100.0
         5     0.0    0.0    0.4   31.2   68.4    100.0

74.9% of original grouped cases correctly classified.
Appendix B
Comparison of different distance metrics (HSV-H)
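The three measures compared in the tables below follow the standard definitions for histogram comparison. As an illustration, plain-Python sketches for equally binned, normalised histograms (the actual implementation may use slightly different normalisations of these formulas):

```python
def correlation(h, g):
    """Pearson correlation of the bin values (1 = identical shape)."""
    n = len(h)
    mh, mg = sum(h) / n, sum(g) / n
    num = sum((a - mh) * (b - mg) for a, b in zip(h, g))
    den = (sum((a - mh) ** 2 for a in h) *
           sum((b - mg) ** 2 for b in g)) ** 0.5
    return num / den if den else 0.0

def chi_square(h, g):
    """Chi-square distance (0 = identical)."""
    return sum((a - b) ** 2 / (a + b) for a, b in zip(h, g) if a + b > 0)

def intersection(h, g):
    """Sum of bin-wise minima (1 = identical for normalised histograms)."""
    return sum(min(a, b) for a, b in zip(h, g))
```

Note that correlation and intersection are similarity measures (larger is better), while chi-square is a distance (smaller is better).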
Correlation

                Result
Original           1       2       3       4       5     Total
1  Count       1,444      18     238       0      40     1,740
   %           83.0%    1.0%   13.7%    0.0%    2.3%    100.0%
2  Count           0   1,686      54       0       0     1,740
   %            0.0%   96.9%    3.1%    0.0%    0.0%    100.0%
3  Count         251     120   1,369       0       0     1,740
   %           14.4%    6.9%   78.7%    0.0%    0.0%    100.0%
4  Count          30       0       0   1,278     432     1,740
   %            1.7%    0.0%    0.0%   73.4%   24.8%    100.0%
5  Count          40       0       0     494   1,206     1,740
   %            2.3%    0.0%    0.0%   28.4%   69.3%    100.0%
Total Count    1,765   1,824   1,661   1,772   1,678     8,700
      %        20.3%   21.0%   19.1%   20.4%   19.3%    100.0%

Correct:    80.2644%
Error rate: 19.74%
Chi-Square

                Result
Original           1       2       3       4       5     Total
1  Count       1,675       8      54       0       3     1,740
   %           96.3%    0.5%    3.1%    0.0%    0.2%    100.0%
2  Count           1   1,567     172       0       0     1,740
   %            0.1%   90.1%    9.9%    0.0%    0.0%    100.0%
3  Count          52      39   1,649       0       0     1,740
   %            3.0%    2.2%   94.8%    0.0%    0.0%    100.0%
4  Count          22       0       8   1,276     434     1,740
   %            1.3%    0.0%    0.5%   73.3%   24.9%    100.0%
5  Count          73       0      27     500   1,140     1,740
   %            4.2%    0.0%    1.6%   28.7%   65.5%    100.0%
Total Count    1,823   1,614   1,910   1,776   1,577     8,700
      %        21.0%   18.6%   22.0%   20.4%   18.1%    100.0%

Correct:    83.9885%
Error rate: 16.01%
Intersection

                Result
Original           1       2       3       4       5     Total
1  Count       1,739       1       0       0       0     1,740
   %           99.9%    0.1%    0.0%    0.0%    0.0%    100.0%
2  Count         109   1,631       0       0       0     1,740
   %            6.3%   93.7%    0.0%    0.0%    0.0%    100.0%
3  Count       1,686      53       1       0       0     1,740
   %           96.9%    3.0%    0.1%    0.0%    0.0%    100.0%
4  Count          51       0       0   1,369     320     1,740
   %            2.9%    0.0%    0.0%   78.7%   18.4%    100.0%
5  Count         133       0       0     856     751     1,740
   %            7.6%    0.0%    0.0%   49.2%   43.2%    100.0%
Total Count    3,718   1,685       1   2,225   1,071     8,700
      %        42.7%   19.4%    0.0%   25.6%   12.3%    100.0%

Correct:    63.1149%
Error rate: 36.89%
Appendix C
Classifier robustness against external variance
Test 2 Results HSV-H (Chi-Square)

                Result
Original           1       2       3       4       5     Total
1  Count       1,362       0      29       1     348     1,740
   %           78.3%    0.0%    1.7%    0.1%   20.0%    100.0%
2  Count         481     318     941       0       0     1,740
   %           27.6%   18.3%   54.1%    0.0%    0.0%    100.0%
3  Count          71       2   1,665       0       2     1,740
   %            4.1%    0.1%   95.7%    0.0%    0.1%    100.0%
4  Count           0       0       5   1,652      83     1,740
   %            0.0%    0.0%    0.3%   94.9%    4.8%    100.0%
5  Count           0       0       3   1,473     264     1,740
   %            0.0%    0.0%    0.2%   84.7%   15.2%    100.0%
Total Count    1,914     320   2,643   3,126     697     8,700
      %        22.0%    3.7%   30.4%   35.9%    8.0%    100.0%

Correct:    60.5%
Error rate: 39.53%
Appendix D
Quality of classification
Note: -1 denotes cases which have been rejected due to the cut-off criteria.
Appendix E
Face recognition algorithms test results
PCA-Euclidean

                Result
Original           1       2       3       4       5     Total
1  Count         411     203     243     339     544     1,740
   %           23.6%   11.7%   14.0%   19.5%   31.3%    100.0%
2  Count         768     634     128     116      94     1,740
   %           44.1%   36.4%    7.4%    6.7%    5.4%    100.0%
3  Count         156     207     900     258     219     1,740
   %            9.0%   11.9%   51.7%   14.8%   12.6%    100.0%
4  Count         183      21      32     808     696     1,740
   %           10.5%    1.2%    1.8%   46.4%   40.0%    100.0%
5  Count         189      17      24     682     828     1,740
   %           10.9%    1.0%    1.4%   39.2%   47.6%    100.0%
Total Count    1,707   1,082   1,327   2,203   2,381     8,700
      %        19.6%   12.4%   15.3%   25.3%   27.4%    100.0%
PCA-MahCosine

                Result
Original           1       2       3       4       5     Total
1  Count         608     444     422      99     167     1,740
   %           34.9%   25.5%   24.3%    5.7%    9.6%    100.0%
2  Count         586     968     144      23      19     1,740
   %           33.7%   55.6%    8.3%    1.3%    1.1%    100.0%
3  Count         184     455     901     124      76     1,740
   %           10.6%   26.1%   51.8%    7.1%    4.4%    100.0%
4  Count         294     171      52     700     523     1,740
   %           16.9%    9.8%    3.0%   40.2%   30.1%    100.0%
5  Count         474     178      53     494     541     1,740
   %           27.2%   10.2%    3.0%   28.4%   31.1%    100.0%
Total Count    2,146   2,216   1,572   1,440   1,326     8,700
      %        24.7%   25.5%   18.1%   16.6%   15.2%    100.0%
LDA-Soft

                Result
Original           1       2       3       4       5     Total
1  Count         424     242     211     400     463     1,740
   %           24.4%   13.9%   12.1%   23.0%   26.6%    100.0%
2  Count       1,128     473      41      75      23     1,740
   %           64.8%   27.2%    2.4%    4.3%    1.3%    100.0%
3  Count          67     143   1,317      95     118     1,740
   %            3.9%    8.2%   75.7%    5.5%    6.8%    100.0%
4  Count         136      18      28     853     705     1,740
   %            7.8%    1.0%    1.6%   49.0%   40.5%    100.0%
5  Count          68      34      31     810     797     1,740
   %            3.9%    2.0%    1.8%   46.6%   45.8%    100.0%
Total Count    1,823     910   1,628   2,233   2,106     8,700
      %        21.0%   10.5%   18.7%   25.7%   24.2%    100.0%
Bayesian-MAP

                Result
Original           1       2       3       4       5     Total
1  Count          36      11       7       0       4        58
   %           62.1%   19.0%   12.1%    0.0%    6.9%    100.0%
2  Count          26      28       4       0       0        58
   %           44.8%   48.3%    6.9%    0.0%    0.0%    100.0%
3  Count          18      11      26       0       3        58
   %           31.0%   19.0%   44.8%    0.0%    5.2%    100.0%
4  Count          28      13       5       4       8        58
   %           48.3%   22.4%    8.6%    6.9%   13.8%    100.0%
5  Count          19      15      10       2      12        58
   %           32.8%   25.9%   17.2%    3.4%   20.7%    100.0%
Total Count      127      78      52       6      27       290
      %        43.8%   26.9%   17.9%    2.1%    9.3%    100.0%
Bayesian-ML

                Result
Original           1       2       3       4       5     Total
1  Count          37      11       5       0       5        58
   %           63.8%   19.0%    8.6%    0.0%    8.6%    100.0%
2  Count          25      31       2       0       0        58
   %           43.1%   53.4%    3.4%    0.0%    0.0%    100.0%
3  Count          15       9      31       0       3        58
   %           25.9%   15.5%   53.4%    0.0%    5.2%    100.0%
4  Count          26      13       5       5       9        58
   %           44.8%   22.4%    8.6%    8.6%   15.5%    100.0%
5  Count          19      14       9       3      13        58
   %           32.8%   24.1%   15.5%    5.2%   22.4%    100.0%
Total Count      122      78      52       8      30       290
      %        42.1%   26.9%   17.9%    2.8%   10.3%    100.0%
Appendix F
Bibliography
[Alderton and Tanner 1999] Alderton, David ; Tanner, Bruce: Rodents
of the World (Of the World). Blandford, 1999. – ISBN 0713727896

[Bayes 1763] Bayes, T.: An essay towards solving a problem in the doc-
trine of chances. In: Philosophical Transactions of the Royal Society of
London 53 (1763), pp. 370–418. – reprinted in Biometrika 45(3/4),
pp. 293–315, Dec. 1958

[Burnie 2005] Burnie, David: Animal: The Definitive Visual Guide to the
World's Wildlife. DK ADULT, 2005. – ISBN 0756616344

[van Eck 2006] Eck, Wim van: Animal controlled computer games: Playing
Pac-Man against real crickets. Version: 2006. http://pong.hku.nl/
%7Ewim/bugman.htm, Retrieved on: 25. Nov. 2006
[van Eck and Lamers 2006] Eck, Wim van ; Lamers, Maarten H.: Animal
controlled computer games: Playing Pac-Man against real crickets. In:

[Festing 1986] Festing, M.F.: Hamsters. In: Poole, T.P. (Ed.): The
UFAW Handbook on the Care and Management of Laboratory Animals. 6th
edition. Longman Scientific and Technical, London, 1986, pp. 242–256

[Hanley 1982] Hanley, J.A.: The meaning and use of the area under a
receiver operating characteristic (ROC) curve. In: Radiology 143 (1982),
No. 1, pp. 29–36
[Jang and Lee 2004] Jang, Sunyean ; Lee, Manjai: Hello-Fish: In-
teracting with Pet Fishes Through Animated Digital Wallpaper on a
Screen. Version: 2004. http://www.springerlink.com/content/
8rl5508nlrwbqccf. In: Lecture Notes in Computer Science: Entertain-
ment Computing ICEC 2004. Springer, 2004, pp. 559–564

[Lee et al. 2006] Lee, Shang P. ; Cheok, Adrian D. ; James, Teh Keng S.
; Debra, Goh Pae L. ; Jie, Chio W. ; Chuang, Wang ; Farbiz, Farzam:
A mobile pet wearable computer and mixed reality system for human-
poultry interaction through the internet. In: Personal and Ubiquitous
Computing

[Mikesell 2003] Mikesell, Dan: Networking Pets and People. In: Adjunct
proceedings of Ubicomp 2003, ACM Press, 2003, pp. 88–89
[Pet 2006a] Pet Website: Colors. Dwarf Winter White Russian Hamsters
(Phodopus sungorus). Version: 2006. http://www.petwebsite.com/
hamsters/dwarf_winter_white_russian_hamsters_colors.htm, Re-
trieved on: 08. Dec. 2006

[SPSS Inc. 2007] SPSS Inc.: SPSS for Windows. Version: 2007. http:
//www.spss.com/spss/, Retrieved on: 15. Jan. 2007
[Turk and Pentland 1991] Turk, M. ; Pentland, A.: Eigenfaces for Recog-
nition. In: Journal of Cognitive Neuroscience 3 (1991), No. 1, pp. 71–86

[Viola and Jones 2002] Viola, Paul ; Jones, Michael: Robust Real-time
Object Detection. In: International Journal of Computer Vision (2002).
– to appear. citeseer.comp.nus.edu.sg/viola01robust.html

[Webb 2002] Webb, Andrew R.: Statistical Pattern Recognition. 2nd edition.
John Wiley & Sons, 2002. – ISBN 0470845147

[Zim and Wartik 1951] Zim, Herbert S. ; Wartik: Golden Hamsters.
William Morrow Company, 1951. – ASIN B000GT25Y4

Internet resources

All internet resources in this thesis were last checked in November 2006.