
University of Leeds

SCHOOL OF COMPUTER STUDIES

RESEARCH REPORT SERIES

Report 98.02

Computer Visual Tracking of Poultry

by

D M Sergeant, R D Boyle & J M Forbes1

January 1998

1 Department of Animal Physiology and Nutrition, The University of Leeds


Abstract

We introduce the wide area of problems that exist in the domain of animal ethology. We then develop a
technique to solve one such problem using aspects of computer vision. The method processes a video se-
quence of broiler chickens in a group-housing pen. A new technique has been developed for extracting a
background image, which compensates for the particular idiosyncrasies inherent to the problem. A multi-
faceted approach to tracking has been demonstrated, and preliminary examination shows the approach
to be robust.

1 Introduction

Automated tracking via passive observation is well developed [2, 7], but has received little applica-
tion to the animal domain, although analysis of ‘snapshot’ behaviour has demonstrated the potential to
automate some farming tasks [9].
Many applications exist, however, in which the groups of animals to be studied are active, sometimes
for protracted periods. One application is a laboratory ‘maze’ in which time and route of an animal are
of interest, and this problem has been successfully solved [12]; another, more challenging, is the Silsoe
Robotic Sheepdog [15] which requires a group of animals to be tracked at high frame rate. In a more
restricted domain, success has been exhibited in the use of visual tracking to assist in automatic milking
of dairy cattle [5].
We consider here a different problem which has particular difficulties associated with it that have not
arisen in tracking tasks hitherto. A broiler chicken’s behaviour is unpredictable and cannot be expected
to follow regular paths, such as those a human takes walking across a quadrangle [8]. The behaviour of
broiler chickens consists of sporadic walking, feeding and drinking interspersed by resting and less fre-
quent activities including dust bathing, stretching, and peer interactions. Behavioural study is motivated
by a lack of knowledge about what contributes to a broiler chicken’s welfare, and locomotion is used as
an indicator of the overall state of this. Once the causal effect that changing the husbandry system has on
broiler chickens’ behaviour can be demonstrated and quantified, moves can be made towards successful
legislation that protects their welfare. Current legislation concerning the welfare of broiler chickens in
the UK is guideline based [1], stemming from studies using human observation.

Human observers record the behaviour of a single bird (occasionally more than one) and stand in the
middle of the flock while observing. Most flocks are housed in a large shed using the entire floor space.
The observer must be close enough to see interesting behaviour, which necessitates standing among the
flock. A bird is chosen to be observed using a criterion such as a distinct marking or that the bird was seen
to start in the centre of the house [11]. The criterion is chosen to generate a good random sample, but due
to the limited availability of birds with distinctive marking the sample cannot be relied upon. Inter- and
intra-observer inconsistencies have to be taken into account; even if observations are made from a video
recording, two observers are unlikely to produce identical observations of the same scene. Even the same
observer cannot guarantee that the same set of observations are given twice. Using an observer is time-
consuming, and observations can only be of limited duration. Human observation is seldom passive, and
for some studies is completely invasive [13].
The difficulty of observation is heightened by the large size of a commercial broiler house. Typically
a broiler house is 90 x 14 m and can be stocked with 18000 birds, up to 20 birds/m² [13], although
in the UK the recommended maximum is 6.3 birds/m² [1].

Figure 1. Densely populated broiler house.

Figure 1 shows a sample population taken in situ. Feeders are spaced regularly, and the quantity of
feeding space provided is calculated from the number of birds. Usually drinkers are placed equidistant
from the feeders. For the purpose of the photograph the intensity of the house lighting has been raised.
Researchers at ADAS Gleadthorpe are interested in whether varying the housing light intensity alters the
individual bird’s activity level. Leg weakness in broiler chickens is a large problem in commercial units

and experiments have been performed at the ADAS site where a scotoperiod is introduced into the daily
lighting strategy to find if this has effects on reducing instances of leg weakness.
This application has some interesting features from the point of view of computer vision. Firstly, the
contrast of the scene can be deeply unfavourable; this is because the birds can be far from uniform in
colour and because the background (the litter on which the birds live) can often be of similar intensity
to the birds’ feathers. Secondly, the density of birds as observed from an aerial view can be high if
many of them choose to congregate (which is frequently observed behaviour). Thirdly, broilers are not
highly mobile animals and a difficulty in segmentation may well persist for some time, and evolve very
gradually, or subtly; conversely, on rare occasions, a bird may move suddenly and quickly, also flapping
its wings, which might easily be interpreted as a discontinuity.
The monitoring task here requires the simultaneous tracking of many animals; characteristically, this
will be to assist in seeking ways to improve the design of accommodation and husbandry in order to
minimise welfare problems. We might expect the scene to include artifacts such as feeders and drinkers,
attention to which would be most important. Usually, the physical height of houses is small; this means
that the resolution of the scene is likely to be favourable, but the area covered will be small. A full system
for automatic monitoring would demand the use of many cameras with overlapping fields of view, since
there is no prospect of one device covering the whole house.
We describe here an essential component of such a system being developed for ADAS Gleadthorpe, in
which we consider the segmentation, separation into individual birds, and tracking from a single camera
view.

2 Image acquisition

We consider a view taken from a single video camera mounted above a population of thirteen broilers.
Prolonged sequences have been gathered from which a quantity of ground truth data has been extracted
by manual tracking of birds.

2.1 Background extraction

Extraction of regions representing moving objects in video sequences has received much attention. An
obvious approach is to derive a background image representing the scene without the objects of interest,

and perform frame by frame image subtraction. It is customary to initialise such systems by, for example,
assuming the median value of a pixel’s intensity represents its current background value when measured
over some suitable recent history. Such an approach usefully compensates for gradual changes in contrast
caused, for example, by climatic or daylight changes [4, 10].
Our application is not immediately susceptible to this approach, for four reasons:

- The density of the birds is such that the ratio of (moving) object to background pixels is very high,
and may easily exceed 1:1.

- The objects often remain stationary for prolonged periods (intervals of 20 minutes without movement
have been observed as normal; much longer ones are probable).

- The contrast between bird and background may be very poor, either due to the colour of the ground
litter, or because the house is in subdued lighting, or both.

- It is not safe to assume the background only evolves slowly; frequently the ground litter may be
kicked over by moving birds, causing very sharp local changes in intensity over short time intervals.

Together, these features would cause the straightforward approach to fail, marking many pixels with a
background value that in fact represents a bird.
To compensate for the particular nature of the images in our application, the background image is
evolved statistically under a simple, effective, limiting criterion. The system is initialised using the first
few minutes of image sequence, and the first guess is to set the background image to be the same as the
first frame, I0 . Each subsequent frame, It , is presented and all of its pixels that pass the limiting criterion
cause the equivalent pixel in the background, B, to be updated appropriately. The task of the limiting
criterion is to decide whether a pixel intensity value is likely to represent a foreground object or not.
Essentially the observation is made that foreground objects consist, predominantly, of pixels in the
intensity range [fl, fh], and so the limiting criterion says that any pixel seen in that range is not
allowed to influence the evolution of its background pixel. Such a criterion is efficient and easy to
implement, and because of the initial assignment B0 = I0, any background pixel, coordinate (x, y), whose
true intensity lies within the range [fl, fh] will remain unchanged at B0(x, y), its value at time step 0,
since no pixel will be observed where It(x, y) ∉ [fl, fh]. The evolution of the background is shown
formulaically in equations 1 and 2:

B0 = I0    (1)

∀(x, y) : It(x, y) ∉ [fl, fh] ⇒ update(B(x, y))    (2)

Given a sequence, the correct range [fl, fh] must be found, and this is established using a sample of
consecutive images (fewer than twenty is sufficient). It is noted that any pixel difference between a con-
secutive pair of images is caused by the movement of objects in the images. A moving object affects the
image difference (It − It−1) in two places: it introduces a difference at its new location and it leaves a
difference at its previous location. A higher magnitude difference gives a good indication of an object
that is moving, and is unlikely to be noise. Of primary concern here is the pixel intensity of the object
that is moving and causing the differences.
Figure 2. Accumulated moving intensity frequency (Td = 49); frequency against pixel intensity.

Proceeding on the assumption that foreground pixels lie in a particular band of intensities and that
foreground is brighter than background, but that many background pixels also lie in the band, we try
to identify this range by accumulating evidence of movement over a number of frames. Formally we
construct an accumulator, A, of dimension equal to the number of intensity levels in the image (here,
256), which is initialised to zero. Then, for a pair of consecutive images It, It−1, we locate every pixel
(x, y) which changes in intensity, and increment the accumulator cells A[It(x, y)] and A[It−1(x, y)]. Noise
and other effects will cause many minor variations, and so we choose only to consider differences
exceeding a threshold, Td. In this manner, a pixel that is part of a moving object has its intensity counted

twice, while the background pixel it obscures and the one it reveals are both counted once.
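The accumulation just described can be sketched in a few lines of NumPy; the function name and array handling below are our own illustrative assumptions, not code from the report:

```python
import numpy as np

def accumulate_moving_intensities(frames, t_d=49, levels=256):
    """Accumulate the intensities of pixels involved in movement.

    For each consecutive frame pair, every pixel whose absolute
    difference exceeds t_d increments the accumulator cell for its
    intensity in *both* frames.  A moving object's intensity is thus
    counted twice, while the background it obscures and the
    background it reveals are each counted once.
    """
    acc = np.zeros(levels, dtype=np.int64)
    for prev, curr in zip(frames, frames[1:]):
        moved = np.abs(curr.astype(int) - prev.astype(int)) > t_d
        acc += np.bincount(curr[moved].ravel(), minlength=levels)
        acc += np.bincount(prev[moved].ravel(), minlength=levels)
    return acc
```

Run over a sequence, `acc` approximates the histogram of figure 2, with the foreground intensities emphasised by the double counting.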
Iterating this procedure over a number of consecutive frames might be expected to produce a response
histogram in which the intensities of the moving objects predominate, while intensities due to background
and shadow effects would produce less significant responses. For our trial sequence, Figure 2 illustrates
the result. Here we see several peaks: the first (the darkest pixels showing a response) might be expected
to be background in shadow; the second is expected to be background; and the third is predominantly
expected to consist of pixels from foreground objects, although part of the contribution might come from
similar intensity background pixels.
Figure 2 was accumulated using Td = 49, but rather than making the system dependent on a sequence-
specific threshold, an accumulator is built for every possible Td (again, 256 accumulators). For low Td, the
plot is dominated by noise; as Td increases, coarse patterns emerge. We proceed by assuming foreground
to be predominantly lighter than background, and then expect three peaks:

- shadow

- background

- foreground

The characteristic shape of the accumulator plot for a given Td is found using Gaussian smoothing, and the
extremities of the foreground peak are taken to be [fl, fh]. The range that figure 2 generates is [fl, fh] = [157, 219].
The median has been used as the update strategy to determine B(x, y) in equation 2. All of the non-
foreground pixels are placed in a history set, Hx,y, at time t as shown in equation 3:

Hx,y = { Ii(x, y) | 1 ≤ i ≤ t, Ii(x, y) ∉ [fl, fh] }    (3)

B(x, y) = { B0(x, y),       if Hx,y = ∅
          { median(Hx,y),   otherwise                   (4)

No aging of the earlier entries of Hx,y has been taken into account in equation 4. Whether weighting the
history set, so that recent pixels are given more significance than older ones during the update of B,
leads to further improvement of B is not investigated in this paper.
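Equations 1–4 admit a straightforward (if unoptimised) per-pixel implementation. The following is a minimal sketch under our own naming; the report itself gives no code:

```python
import numpy as np

def evolve_background(frames, f_lo, f_hi):
    """Evolve a background estimate B under the limiting criterion.

    B starts as the first frame (equation 1).  A pixel of a later
    frame updates its background cell only if its intensity lies
    outside the foreground band [f_lo, f_hi] (equation 2); the update
    value is the median of that pixel's history of non-foreground
    observations (equations 3 and 4).
    """
    first = frames[0]
    h, w = first.shape
    history = [[[] for _ in range(w)] for _ in range(h)]   # H_{x,y}
    bg = first.astype(float)                               # B_0 = I_0
    for frame in frames:
        for y in range(h):
            for x in range(w):
                v = int(frame[y, x])
                if not (f_lo <= v <= f_hi):     # limiting criterion
                    history[y][x].append(v)
                    bg[y, x] = np.median(history[y][x])
    return bg
```

A production version would vectorise the inner loops, but the pixel-wise form mirrors the equations directly.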

2.2 Segmentation

Provided with an estimate of the background, we can proceed by subtracting live frames from it. High
absolute differences in the result are then attributable to:

- chickens

- shadow

- noise effects caused by variation in illumination or locally evolving background

We are not in a position to assume that the pixels representing a single bird (or the ‘core area’ of such)
will all provide a high response in this difference image. Indeed, it is common for one bird to generate
two or more disconnected regions. An example is shown in figure 3.b.
Fortunately each bird is likely to contain some pixels that provide a high response, and the remain-
ing pixels that represent the bird predominantly give a medium response. We define two thresholds, Th
and Tl , that characterise the differences as high response, medium response, or neither appropriately. A
region-based hysteresis is used to discriminate between all of the medium response pixel regions. Region-
based hysteresis is identical in principle to edge-based hysteresis [6], with medium response regions being
allowed only when they encompass at least one high response region (see figure 3.d). Before region se-
lection takes place, a single application of a morphological erode operator [14] ensures that specular noise
is removed, and a subsequent dilation prevents any region fragmentation that would result from the ero-
sion. Following these morphological operators the hysteresis discrimination is applied, and all successful
regions are given a unique tag and are retained.
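Region-based hysteresis can be sketched with a simple connected-component labelling; `label_regions` and `region_hysteresis` below are our own illustrative names, and a real implementation would add the morphological erode/dilate steps around the discrimination:

```python
import numpy as np
from collections import deque

def label_regions(mask):
    """4-connected component labelling of a boolean mask."""
    labels = np.zeros(mask.shape, dtype=int)
    next_label = 0
    for seed in zip(*np.nonzero(mask)):
        if labels[seed]:
            continue
        next_label += 1
        labels[seed] = next_label
        queue = deque([seed])
        while queue:
            y, x = queue.popleft()
            for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                if (0 <= ny < mask.shape[0] and 0 <= nx < mask.shape[1]
                        and mask[ny, nx] and not labels[ny, nx]):
                    labels[ny, nx] = next_label
                    queue.append((ny, nx))
    return labels, next_label

def region_hysteresis(diff, t_lo, t_hi):
    """Keep a medium-response region only if it encompasses at least
    one high-response pixel (region-based hysteresis)."""
    medium = diff > t_lo          # includes the high-response pixels
    high = diff > t_hi
    labels, n = label_regions(medium)
    keep = np.zeros(n + 1, dtype=bool)
    keep[np.unique(labels[high])] = True
    keep[0] = False               # background label is never kept
    return keep[labels]
```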
In the hysteresis, shadow regions can be distinguished from foreground ‘bird’ regions as their actual
difference responses will always be the more negative ones. Shadows detected in this way are cast by
birds, as other shadows are essentially static, being cast by non-mobile features of the housing system, and
so will have already been incorporated into the background image. Since the birds' shadows are directly
related to the positions of the birds and the in-house lighting, they could aid in searching for birds missed
by the segmentation. However, in our current implementation, information provided from shadows cast
by the birds is not used, and any pixel intensities resulting from such shadows are treated in exactly the
same way as all non-bird pixels. Each non-shadow region is treated as if it could represent a bird, and

Figure 3. Region-based hysteresis. (a) A typical input image, It. (b) It(x, y) − B(x, y) > Th.
(c) It(x, y) − B(x, y) > Tl. (d) Hysteresis image.

is subjected to one further morphological dilate operation to close any small intra-region holes. Such
holes are undesirable and likely to be caused by noise artifacts and poor segmentation. Using dilation
also ensures the region boundary can be described as a unique list of connected pixels. Without dilation
there is the possibility of one pixel width lines extending from the region, and the pixels on these lines
will define two points of the inner boundary; this can cause problems with curvature calculations at a later
stage. The final dilation guarantees that the region is always more than one pixel thick.

3 Correspondence and tracking

After segmentation an image is left that is the basis from which we can locate the position of all the
birds in the scene, which then have to be matched to the corresponding position of the correct bird in

the previous frame, It−1. The test sequence was conducted with high quality video and is a favourable
one, where the birds move over a dark orange tiled floor instead of their usual litter. The ethos behind
the tracking method described here could obviously work with a less favourable sequence, provided that
the segmentation is available. Since the test sequence has been manually tracked a priori we can give a
quantitative measure of the error.

3.1 Finding individual birds

We aim to convert the scene provided by the segmentation, containing regions, into a scene of individ-
ual birds, located by their centroids. This is done so that each centroid may be matched with a centroid
from It−1, where both centroids represent the location of the same individual bird in both time steps. This
constitutes the tracking. It is noted that the regions provided by the segmentation often represent more
than one bird, and each ‘multi’-bird region gives a sub-division problem that needs solving before the final
It−1 to It, many-to-many, correspondence problem can be tackled.
To sub-divide the regions into individuals we must first find how many individual birds constitute each
region. A clue to this quantity is given by the number of pixels in the region, its size. This is reinforced by
a local proximity correspondence between the previously observed centroids from It−1 and the regions in
the current frame, It. Normally a bird is expected to move no further than v pixels, the maximum velocity
constraint, between frames. Matching the bird's last position to regions that lie within v pixels, taking
into consideration other nearby birds and which regions they might move to, gives a reliable upper limit
to the number of birds within each region.
The relationship between region size and the number of birds constituting a region is derived by taking
a sample of region scenes, produced by the segmentation of (It − B), and generating a frequency graph of
the region sizes present. This frequency graph is smoothed using a variable σ Gaussian, until all of the
characteristic peaks and troughs are discovered. If a large enough sample is taken most group populations
will be observed but, because larger groups are less frequent and there might be some group populations
missing from the scene being studied (hence the empty portion in the middle of the graph, figure 4), only
the early contour is relied upon. Clearly the position of the valleys give the cut-off limits for region size
between the different group populations.
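The histogram analysis just described might be sketched as follows; the function and parameter names are our own, and the fixed smoothing below stands in for the variable-σ search the report describes:

```python
import numpy as np

def size_cutoffs(region_sizes, sigma=3.0, max_size=None):
    """Smooth the histogram of region sizes with a Gaussian and
    return the valley positions, which act as the cut-off limits
    between the group populations (1 bird, 2 birds, ...)."""
    max_size = max_size or max(region_sizes)
    hist = np.bincount(region_sizes, minlength=max_size + 1).astype(float)
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    smooth = np.convolve(hist, kernel, mode="same")
    # a valley is strictly lower than both of its neighbours
    return [i for i in range(1, len(smooth) - 1)
            if smooth[i] < smooth[i - 1] and smooth[i] < smooth[i + 1]]
```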
Figure 4. Frequency graph of region sizes (smoothed); group population frequency (ln scale)
against region size (pixels).

Regions that are too small to fall into the ‘one bird’ group population are considered to be partial birds
(the segmentation having divided a bird into two regions, or only part of the bird having survived the
hysteresis), but might be image noise. An assessment can be made as to the closeness of such partial
birds to the last predicted positions, and to other bird groups. If this suggests that the partial bird is truly
foreground pixels then this region can be merged with an adjacent region or, if isolated, it can be treated
as a ‘one bird’ group.
When the group populations have been found we then divide the regions into individual birds. All
regions that have population 1 can be put aside at this stage, and preserved for the final correspondence.
From there we take the assumption that a region, of group population g, can be divided into g individual
birds using (g − 1) split lines. Looking at region profiles it is evident that suitable split lines can be drawn
between points on the perimeter that exhibit sharp concavity. Points on the perimeter of the region are
identified as concave by plotting the curvature [3] and applying Gaussian smoothing to this (the Gaussian
uses the wrap-around nature of a perimeter, which is a closed, connected curve). The peaks of curvature are
used as the perimeter points where a split can occur. See figure 5 for an example of a perimeter with its
corresponding curvature plot. If there are too few points it is necessary to resmooth the curvature plot
with a finer Gaussian to ensure that there are enough points of concavity to be able to draw at least (g − 1)
lines.
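The wrap-around smoothing and peak-picking can be sketched as below; the name `concave_peaks` is our own, and a full implementation would also apply the resmoothing with a finer Gaussian when too few peaks are found:

```python
import numpy as np

def concave_peaks(curvature, sigma):
    """Gaussian-smooth a closed-perimeter curvature signal (with
    wrap-around) and return indices of its local maxima, the
    candidate split points."""
    n = len(curvature)
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    # circular padding implements the perimeter's wrap-around
    padded = np.concatenate([curvature[-radius:], curvature,
                             curvature[:radius]])
    smooth = np.convolve(padded, kernel, mode="valid")   # length n
    prev_s = np.roll(smooth, 1)
    next_s = np.roll(smooth, -1)
    return np.nonzero((smooth > prev_s) & (smooth > next_s))[0]
```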
Equation 5 gives the number of possible ways of splitting between k concave peak points. From these
candidate lines a set, L, where |L| = (g − 1), must be selected. Two strategies for choosing the most
suitable line set have been examined. One uses (g − 1) instantiations of a local cost measure; the other

Figure 5. Example of curvature for a region: (a) the region; (b) region curvature and smoothed
curvature plotted against perimeter position.

applies a global cost function to all possible line sets.

s = Σ_{i=1}^{k−1} i = k(k − 1)/2    (5)

sC(g−1) = s! / ((s − (g − 1))! (g − 1)!)    (6)
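Equations 5 and 6 amount to a pair of elementary counts; a sketch, with our own function names:

```python
from math import comb

def num_candidate_splits(k):
    """Equation 5: number of candidate lines joining k concave
    peak points, s = k(k - 1)/2."""
    return k * (k - 1) // 2

def num_line_sets(k, g):
    """Equation 6: number of ways of choosing a set of (g - 1)
    split lines from the s candidates."""
    return comb(num_candidate_splits(k), g - 1)
```

For k = 5 concave points and a three-bird region this already gives 45 candidate line sets, which motivates the quick validity rejections described below.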

The local cost measure takes each of the s splits and establishes which one best cuts a single bird from
the region. It takes the two regions, puts aside the single bird regions, and then relocates the concave
points for the larger region. With k new concave points there are a new s possible candidate divide lines,
and these new candidates are evaluated in an attempt to cut off another single bird region. Paring off sin-
gle bird regions continues until each region found has a group population of 1. The procedure is reductive,
with each iteration finding a single bird and generating curvature information with candidate divide lines
for the remaining larger region. Deciding which of the candidate lines produces the best cut is effected
through a comparison of costs, taking into account the single region having the expected area (derived
for the particular scene) and the length of the candidate line.
The global cost function iterates through every possible line set (equation 6). Some line sets can be
rejected very quickly because of the shape of the region. A candidate line is not valid if it crosses the re-
gion's perimeter between its endpoints, travels entirely outside the region, or crosses another split already
in L. Every surviving line set is evaluated using the cost function:

Cost = A + αθ + βρ + γC + δλ (7)

which takes into account

- A: the fractions of area that the set L divides the region into (see figure 6.a). Intuitively, since all of
the birds in the enclosure are the same age and of very similar size, they can be assumed to all have
the same surface area visible to the camera.

- θ: the combination of angles that the lines in L make with the normal to the perimeter at their end-
points (figure 6.b). The concave peak points are formed by two birds touching each other, and a
line following the now joined edges of the birds will run in the average direction of the interpolated
outside edge of both birds. This average direction coincides with the normal to the concavity.

- ρ: the degree of response that the lines' endpoints give on the curvature graph. Extremely sharp con-
cavities, indicated by a high response, are very likely to be caused by two birds, as each individual
bird has a smooth outline and only very low response concavities.

- C: the compactness of the sub-regions formed by L. Similarly to area, all the birds are essentially
obloid or round, and the main cause of variation to this is the craning of the neck.

- λ: the length of all the lines in L. There is a limit on how long a split line can be; for example, a
split line cannot be longer than the length of one bird. Intuitively this suggests that shorter lines
are better, although there are several examples where poor segmentation provides possible short
candidate lines that are not the ‘correct’ choice.
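The global strategy can be sketched generically; `best_line_set` and its pluggable `cost_fn`/`valid_fn` are our own abstraction of the cost evaluation and of the validity rejections described above:

```python
from itertools import combinations

def best_line_set(candidates, g, cost_fn, valid_fn=lambda ls: True):
    """Evaluate every (g - 1)-line subset of the candidate split
    lines and keep the one of minimal cost (equation 7).  cost_fn
    and valid_fn are supplied by the caller: valid_fn rejects sets
    whose lines leave the region or cross one another."""
    best, best_cost = None, float("inf")
    for line_set in combinations(candidates, g - 1):
        if not valid_fn(line_set):
            continue
        c = cost_fn(line_set)
        if c < best_cost:
            best, best_cost = line_set, c
    return best, best_cost
```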

The line set that gives the minimal cost is used to divide the region into g individual birds. The cost
function weightings, α, β, γ, and δ, are determined empirically by finding a suitable minimum of the error
in the 4-dimensional weight-space.
At first it was thought that only the components A, θ, and ρ would be needed to make the decision
about which line set should be used. Each of these three components was taken individually and the one
component cost function was applied to one hundred pre-segmented images. Success of the cost func-
tion was measured by using the sum of displacement errors of the predicted centroids from the line set
L to the hand marked centroids of the test sequence. Another measure of success was attempted, where

Figure 6. Measures used in the cost function: (a) with sub-region areas a1, ..., ag (here g = 4) and
total area A* = a1 + a2 + a3 + a4, the area component is A = Σi |ai − A*/g|; (b) the angle component
is θ = Σ_{i=1}^{2(g−1)} θi, summing the angles between each split line and the perimeter normal at
its endpoints.

each chosen split line set was assigned a binary score indicating whether all of the lines were placed cor-
rectly or not. This alternative measure gave equivalent answers to the displacement error measure,
but was discarded as it required frequent manual interaction, whereas the first measure is entirely automatic.
Component A was found to be the single most discriminatory, as expected, and so was placed as the fixed-
weight component of the cost function. To establish the weightings α and β needed to give the best deci-
sions for the hundred images, a surface was plotted of the error measure as α and β are varied. Initially a
coarse surface was generated, and then interesting (low error) sections were plotted with finer and finer
resolution until the lowest point on the surface was found; its (α, β) values are applied as the weights for
θ and ρ respectively.
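This coarse-to-fine surface search might look like the sketch below; the ranges, step counts, round count, and names are our assumptions rather than the report's:

```python
import numpy as np

def refine_weights(error_fn, a_range=(0.0, 2.0), b_range=(0.0, 2.0),
                   steps=11, rounds=3):
    """Coarse-to-fine search of the (alpha, beta) error surface:
    evaluate a grid, then re-grid around the best point at finer
    resolution, repeating for a fixed number of rounds."""
    (a_lo, a_hi), (b_lo, b_hi) = a_range, b_range
    best = None
    for _ in range(rounds):
        alphas = np.linspace(a_lo, a_hi, steps)
        betas = np.linspace(b_lo, b_hi, steps)
        errs = [(error_fn(a, b), a, b) for a in alphas for b in betas]
        err, a_best, b_best = min(errs)
        best = (a_best, b_best, err)
        da = (a_hi - a_lo) / (steps - 1)   # shrink around the minimum
        db = (b_hi - b_lo) / (steps - 1)
        a_lo, a_hi = a_best - da, a_best + da
        b_lo, b_hi = b_best - db, b_best + db
    return best
```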
When the cost function Cost = A + αθ + βρ was then checked to see which frames contained the high-
est contribution to the displacement error measure, it was discovered that these outliers had failed to
choose the ‘correct’ split line, instead preferring similar kinds of ‘mistakes’. Although these outliers were
not frequent, they were significant, and appeared to have a more plausible choice for the split lines. It was
often obvious that a shorter split line would have been a better alternative to one chosen by the cost func-
tion, so λ was introduced as a component to the cost function. Varying δ while keeping the weights α
and β fixed showed that while λ was given a low influence it had no effect on the decision, but as soon as
λ became influential the error measure increased rather than improving. Upon investigation, this phe-
nomenon was the result of λ allowing very short lines between adjacent concave points which previously
were not favoured by A, θ, or ρ. To compensate for the over-sensitive nature of λ a final component, C,

the compactness, was introduced. Compactness measures the degree of a shape's elongation, and for
this sub-division rounder is better than long and thin. Although a wrong short line, introduced from λ,
will produce a small region, having a good compactness score, it also produces a larger region which has
a compensatory bad compactness score. In this way C stabilises the λ effect and also acts to reinforce
the constraint of component A. Weights γ and δ were plotted against each other and the surface point
with minimum error was chosen. Once found these weights and the previously found α and β are placed
into equation 7. Re-checking the hundred frames with the established weights shows that the weights are
good at selecting splits, and plotting all four variables against each other in hyper-space could not signif-
icantly improve on this. Figure 7 shows five examples of the split set that have optimal cost when using
these weights in the cost function (equation 7).

Figure 7. Examples of subdivision using L with optimal cost.

3.2 Frame to frame correspondence

Once all the centroids of individual birds are found, an M : N correspondence problem can be formu-
lated to match the birds found in It−1 to the new centroids in It. Each individual bird has a locomotion
vector associated with it, describing its coordinate at each time step, which is built up using the frame
to frame centroid matching. From the locomotion vectors the trajectories, or tracks, can be deduced, to-
gether with the proximity to interesting scene features such as the feeders. To an extent the reliability of
the tracking is dependent on the size of the time step between frames, δt, but the confounding factor to
reliability is the distance individual birds have moved. Given that we have the correct centroids, a small
δt means that there can only be small changes between the corresponding image pair Pt = (It−1, It). As
δt increases the changes in Pt become larger, and more birds contribute to movement, until the changes
observed are so drastic that the centroids from It cannot reliably be matched to the correct centroid of
origin in It−1, and eventually matching Pt becomes totally ambiguous. Where ambiguity occurs between
only two or three birds this can ordinarily be resolved, but as such ambiguities occur more frequently

and involve more birds, the frame to frame correspondence can no longer be matched successfully and so
the tracking fails. For this sequence, δt = 1 sec was chosen after observing broilers housed at different
population densities and in different settings. Both live observation and several hours of video footage
showed that the birds do not move often enough to require a smaller value for δt. It is also noted that the
closed scene scenario of the test sequence makes the M : N correspondence problem simplify to an M : M
problem.
The M : M correspondence problem is solved using a minimisation approach, where potential match-
ings of the two sets of centroids for Pt are evaluated. It is clear that evaluating every single possible
matching for Pt needs M! evaluations. In the case of our test sequence, M = 13, and therefore an
exhaustive approach is computationally intractable. Of course this can be avoided by looking at ways
in which the problem can be solved in parts, where the solution of each part is guaranteed to be the same
matching that would be the minimum possible contribution from the set of centroids involved in the global
(13!) problem. This is facilitated by first assessing all individual centroids in It−1 to see if they have an
exclusive match that does not need to be considered in the matching of any other centroids. Often this
exclusive match is apparent because of the isolation of the centroids involved, being at least v pixels away
from any other (unmatched) centroid. After all these exclusive mappings have been performed, clusters
of centroids can be identified that interact with their neighbours but not with centroids in any of the other
clusters (figure 8c contains an eight centroid cluster). The cost of every possible matching within a cluster
must be evaluated exhaustively, but the number of centroids in each cluster, and hence the complexity, is
much lower.
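The two-pass decomposition (exclusive matches first, then exhaustive search within what remains) can be sketched as follows. For simplicity this sketch minimises total displacement only, and assumes the closed-scene M : M case; the names and the reduced cost are our assumptions:

```python
from itertools import permutations

def match_centroids(prev_pts, curr_pts, v):
    """Match centroids of consecutive frames.  Centroids with a
    single mutually exclusive candidate within the maximum velocity
    v are matched in a first pass; the remainder are solved by
    exhaustive search over permutations."""
    def dist(p, q):
        return ((p[0] - q[0])**2 + (p[1] - q[1])**2) ** 0.5

    matches = {}
    rem_prev = list(range(len(prev_pts)))
    rem_curr = list(range(len(curr_pts)))
    # pass 1: exclusive matches between isolated centroids
    for i in list(rem_prev):
        near = [j for j in rem_curr
                if dist(prev_pts[i], curr_pts[j]) <= v]
        if len(near) == 1:
            j = near[0]
            rivals = [k for k in rem_prev if k != i
                      and dist(prev_pts[k], curr_pts[j]) <= v]
            if not rivals:
                matches[i] = j
                rem_prev.remove(i)
                rem_curr.remove(j)
    # pass 2: exhaustive search over the remaining (small) cluster
    best, best_cost = None, float("inf")
    for perm in permutations(rem_curr):
        cost = sum(dist(prev_pts[i], curr_pts[j])
                   for i, j in zip(rem_prev, perm))
        if cost < best_cost:
            best, best_cost = perm, cost
    matches.update(dict(zip(rem_prev, best)))
    return matches
```

A fuller version would split pass 2 into independent clusters and add the stationarity and direction-deviation terms to the cost.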
The mapping is assigned a cost based on:

- how far the matched centroid has been displaced (see figure 8a);

- whether the bird was stationary in Pt−1;

- the deviation from the previous direction of motion.

Other information, if it could be detected, would also aid in the matching decision; for example, the cur-
rent orientation of the bird will influence its direction of movement. Given that birds have an
expected maximum velocity of v pixels per δt, a heavy penalty is imposed on any matching that attempts
to exceed this distance. Exceeding v is still allowed because, on occasion, a bird can move very quickly
15
(a) (b)

a) It ?1 showing previous translation vectors


from Pt ?1

b) It with identified centroids

c) Matching problem to solve for Pt


(c)

Figure 8. Essentials of solving M : M correspondence.

and if this possibility is neglected a mistake enters into the locomotion vector. In cases where a centroid
is accidentally matched to a noise region, and the bird is moving quickly, there is no chance of recovery,
as when the noise feature disappears in a later frame the bird is much further than v pixels away. This
leaves an image pair Pt with a region of It having a centroid where It ?1 does not have a potential original
centroid, and another part where It ?1 has an extra original centroid that has no suitable match in It . If a
matching is chosen wrongly then the erroneous centroid can displace any nearby centroids, and the mov-
ing region that is not tagged as a bird generates ambiguities. Figure 9 displays two different samples of
movement vectors, over ten second intervals, for all thirteen birds.
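The three cues listed above can be combined into a single matching cost. The following is an illustrative sketch only: the weights, the doubling of the displacement term for stationary birds, and the finite (rather than infinite) penalty beyond v are our own assumptions, not values from this work.

```python
from math import hypot, atan2, pi

def match_cost(prev_pos, curr_pos, prev_vec, was_stationary, v,
               w_disp=1.0, w_dir=10.0, w_over=100.0):
    """Cost of matching a centroid at prev_pos (in I_{t-1}) to curr_pos (in I_t).

    prev_vec       -- previous translation vector (dx, dy), or None
    was_stationary -- whether the bird was stationary in P_{t-1}
    v              -- expected maximum displacement per δt, in pixels
    """
    dx, dy = curr_pos[0] - prev_pos[0], curr_pos[1] - prev_pos[1]
    d = hypot(dx, dy)
    cost = w_disp * d                       # cue 1: displacement
    if was_stationary and d > 0:
        cost += w_disp * d                  # cue 2: stationary birds rarely jump
    if prev_vec is not None and d > 0:      # cue 3: deviation from previous heading
        dev = abs(atan2(dy, dx) - atan2(prev_vec[1], prev_vec[0]))
        dev = min(dev, 2 * pi - dev)        # wrap angle difference to [0, pi]
        cost += w_dir * dev
    if d > v:                               # heavy but finite penalty beyond v,
        cost += w_over * (d - v)            # so fast genuine moves remain possible
    return cost
```

Keeping the over-v penalty finite mirrors the argument in the text: a matching beyond v is discouraged but not forbidden, so a genuinely fast-moving bird is not lost to a nearby noise region.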

Figure 9. Tracking results (note the circular feeder).

4 Robustness

Figure 10 shows the frequency distribution of the error between the manually located centroids and
those generated automatically. 99 images were used for this comparison, totalling 1287 centroids. From
this distribution, 95% of the centroids lie within 5 pixels (< 2 cm) of the manually chosen centroid, and it
is noted that the manual track is itself expected to exhibit an error of at least 2 pixels.
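A statistic of this kind could be computed as follows; this is a hedged sketch with invented names, and pairing manual and automatic centroids by index is an assumption of the example.

```python
from math import hypot

def error_stats(manual, automatic, threshold=5.0):
    """Fraction of centroid errors within threshold, and the maximum error.

    manual, automatic -- index-paired lists of (x, y) centroids
    """
    errs = [hypot(m[0] - a[0], m[1] - a[1])
            for m, a in zip(manual, automatic)]
    within = sum(1 for e in errs if e <= threshold) / len(errs)
    return within, max(errs)
```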
Centroids prove to be resilient to errors in the subdividing line set L. Where L' is chosen (having
optimal cost for equation 7) such that it contains a line l_i ∈ L drawn between the wrong concave points,
the displacement of l_i from the true line, which follows the single bird's boundary, has only a tiny effect
on the calculated centroid. The centroid is taken to be the region's centre of mass, so it takes a substantial
addition or subtraction of pixels (each having one unit of mass) on one side of the single-bird region
to shift the centroid significantly. Since the calculation of centroids in this manner is moment based,
the result is also less sensitive to loss of intra-region pixels due to poor segmentation and image noise.
The rightmost illustration in figure 7 gives an example of such a line l_i.
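This robustness can be illustrated numerically. In the sketch below the square region and the five-pixel sliver standing in for a misplaced subdividing line are invented for illustration; the centre-of-mass calculation itself is the standard first-moment definition used in the text.

```python
def centroid(pixels):
    """Centre of mass of a set of (x, y) pixels, each with unit mass."""
    n = len(pixels)
    return (sum(x for x, _ in pixels) / n, sum(y for _, y in pixels) / n)

# A 20x20 single-bird region of 400 pixels...
region = {(x, y) for x in range(20) for y in range(20)}
cx0, cy0 = centroid(region)

# ...with a misplaced subdividing line adding a 5-pixel sliver on one side.
perturbed = region | {(20, y) for y in range(5)}
cx1, cy1 = centroid(perturbed)

shift = ((cx1 - cx0) ** 2 + (cy1 - cy0) ** 2) ** 0.5
print(f"centroid shift: {shift:.3f} pixels")  # a small fraction of a pixel
```

Adding five wrongly attributed boundary pixels to a 400-pixel region moves the centroid by well under a pixel, consistent with the argument above.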
Figure 10. Centroid displacement error (frequency f(err) against displacement error in pixels).

Figure 11 compares the manual track with the track generated by our approach. It can be seen that
they are essentially the same, which shows that the correspondence mapping (M : M) works well,
although further comparison is needed to quantify exactly how well it performs.

Figure 11. Manual track vs. automatic.

Occasionally broiler chicken behaviour has been observed in which one bird squeezes its way under
another. The ambiguity introduced by occlusion of this kind is hard to resolve: the resulting region's
perimeter has an unusual form and its pixel area is lower than anticipated (for predicting its group
population, g). So far only 7 consecutive frames out of 15 minutes of video (900 frames) have exhibited
this behaviour, which does not merit special efforts to solve. Without any adjustment to the current
implementation, the birds involved in this ambiguity are tracked successfully, primarily because the
bird that initiates the squeezing continues to move in its original direction. In similar events the worst
outcome would be the tracks of the birds involved becoming interchanged.
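The area-based population check mentioned above could be sketched as follows; the mean single-bird area, the tolerance, and both function names are illustrative assumptions of ours rather than values from this work.

```python
def estimate_group_size(region_area, mean_bird_area):
    """Predict how many birds a connected region contains (g)."""
    return max(1, round(region_area / mean_bird_area))

def occlusion_suspected(region_area, g, mean_bird_area, tolerance=0.75):
    """Flag a region whose pixel area is implausibly small for g birds,
    e.g. when one bird has squeezed underneath another."""
    return region_area < tolerance * g * mean_bird_area

print(estimate_group_size(2300, 800))     # -> 3
print(occlusion_suspected(1100, 2, 800))  # -> True
```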

5 Conclusions

We have demonstrated an approach to the tracking of group-housed broiler chickens, and have shown it
working on one image sequence; on other image sequences the stages of tracking have been run sepa-
rately. The approach uses segmentation and separation to produce centroids for each broiler chicken in
each frame, and solves a frame-to-frame correspondence problem using these.
Currently, other image sequences are being presented to the approach and, after assessing the validity of
the results, we will move towards solving the larger-scale problem, which will require integrating
views from multiple cameras.
Other application domains have problems to which the approach could be applied; as well as the obvious
transition to the visual tracking of other species of domestic livestock, these include observing large
crowds at key locations such as the entrance turnstiles of football stadia.

Acknowledgements

Financial support for this work from the UK’s Ministry of Agriculture, Fisheries and Food (MAFF) is
gratefully acknowledged. The authors also wish to thank Sue Gordon of ADAS Gleadthorpe for her help
throughout this project.

References

[1] M. C. Appleby, B. O. Hughes, and G. S. Hogarth. Behaviour of laying hens in a deep litter house. British
Poultry Science, 30:545–553, 1989.

[2] A. M. Baumberg. Learning Deformable Models for Tracking Human Motion. PhD thesis, School of Com-
puter Studies, University of Leeds, 1995.
[3] N. D. Efford. Knowledge generation techniques for the model-driven segmentation of hand and wrist radio-
graphs. In Eighth Scandinavian Conference on Image Analysis, 1:251–256, 1993.
[4] T. J. Ellis, P. L. Rosin, and P. Golton. Model-based vision for automatic alarm interpretation. IEEE Aerospace
and Electronic Systems Magazine, 6:14–20, Mar. 1991.
[5] J. Gouws. The Systematic Development of a Machine Vision Based Milking Robot. PhD thesis, Wageningen
Agricultural University, 1993.
[6] E. R. Hancock and J. Kittler. Adaptive estimation of hysteresis thresholds. In IEEE Computer Vision and
Pattern Recognition, pages 196–201, 1991.
[7] S. S. Intille and A. F. Bobick. Visual tracking using closed-worlds. In Fifth International Conference on
Computer Vision, pages 672–678, June 1995.
[8] N. Johnson and D. Hogg. Learning the distribution of object trajectories for event recognition. Image and
Vision Computing, 14(8):609–615, Aug. 1996.
[9] J. A. Marchant. Accurate boundary location from motion. In D. C. Hogg and R. D. Boyle, editors, British
Machine Vision Conference, pages 89–95, 1992.
[10] N. J. B. McFarlane and C. P. Schofield. Segmentation and tracking of piglets in images. Machine Vision and
Applications, 8(3):187–193, 1995.
[11] L. B. Murphy and A. P. Preston. Time-budgeting in meat chickens grown commercially. British Poultry
Science, 29(3):571–580, Sept. 1988.
[12] Noldus Information Technology / Tracksys Ltd., Vernon House, 18 Friar Lane, Nottingham NG1 6DQ, UK.
Ethovision: Video Tracking, Motion Analysis and Behavior Recognition System, Feb. 1994.
[13] A. P. Preston and L. B. Murphy. Movement of broiler chickens reared in commercial conditions. British
Poultry Science, 30(3):519–532, 1989.
[14] M. Sonka, V. Hlavac, and R. Boyle. Image Processing, Analysis and Machine Vision. Chapman and Hall
Computing, 1994.
[15] N. Sumpter, R. D. Boyle, and R. D. Tillet. Modelling collective animal behaviour using extended point dis-
tribution models. In A. F. Clark, editor, British Machine Vision Conference, 1:242–251, Sept. 1997.

