Research Paper
Article history: Received 20 March 2013; Received in revised form 28 November 2013; Accepted 13 December 2013; Published online 20 January 2014.

Abstract

In our post-genomic world, where we are deluged with genetic information, the bottleneck to scientific progress is often phenotyping, i.e. measuring the observable characteristics of living organisms, such as counting the number of fruits on a plant. Image analysis is one route to automation. In this paper we present a method for recognising and counting fruits from images in cluttered greenhouses. The plants are 3-m high peppers with fruits of complex shapes and varying colours similar to the plant canopy. Our calibration and validation datasets each consist of over 28,000 colour images of over 1000 experimental plants. We describe a new two-step method to locate and count pepper fruits: the first step is to find fruits in a single image using a bag-of-words model, and the second is to aggregate estimates from multiple images using a novel statistical approach to cluster repeated, incomplete observations. We demonstrate that image analysis can potentially yield a good correlation with manual measurement (94.6%) and our proposed method achieves a correlation of 74.2% without any linear adjustment for a large dataset.

© 2013 IAgrE. Published by Elsevier Ltd. All rights reserved.
Fig. 1 – Examples in the training data. The top three rows are fruit examples, and the bottom three are background. The background templates are much larger than the fruit templates, and their sizes have been adjusted for display purposes.
… count any fruit in images of dense pepper plants, to reduce manual measurement and labour requirements, and to increase objectivity. In a recent paper, van der Heijden et al. (2012) showed that several manual measurements could be replaced by image analysis leading to the same QTL (positions on a genetic map, which show a relation with the trait under study). Besides, they showed that image analysis could aid in the identification of additional physiological traits that are hard or impossible to measure by human operators.

Machine vision applications developed for fruit have been reviewed by Brosnan and Sun (2004) and Lee et al. (2010). Compared with previous fruit applications, e.g. finding red apples in green canopies (Bulanon, Kataoka, Ota, & Hiroma, 2002), we are looking for predominantly green fruits. Stajnko, Lakota, and Hocevar (2004) described the use of thermal imaging for measuring apple fruits. In their work, they used morphological operations and a constant shape constraint to separate the round apple fruits from leaves. This is not possible for our images, since the differences in colour and shape between fruits and other plant parts are small. Jimenez, Jain, Ceres, and Pons (1999) provided a review of different vision systems to recognise fruits for automated harvesting using a laser range-finder. Zhao, Tow, and Katupitiya (2005) presented methods to recognise apples grown on trees, which used texture and redness colour. It was shown that redness works equally well for green apples as for red ones. Yang, Dickinson, Wu, and Lang (2007) proposed methods to recognise mature fruit and locate cluster positions for tomato harvest applications. Kitamura and Oka (2005) described a picking robot to recognise and cut sweet peppers in greenhouses, but their image analysis methods were developed only for this specific application under fixed lighting conditions.
… which would not be practical for analysing such a large dataset. We therefore applied a simple, fast method to identify approximately 10,000 points of interest (POIs) per image, discarding points that clearly were not fruits, thus significantly reducing the number of required operations. An overview of our methods can be found in Fig. 2. Similar approaches have been taken by Leibe, Leonardis, and Schiele (2004), Leibe, Seemann, and Schiele (2005) and others.

3.1. Initial points of interest

The steps we have taken to identify POIs are somewhat arbitrary. However, we argue that this is not important. There are typically 10 fruits in an image, so a cautious 60-fold pruning from the original 600,000 points down to about 10,000 by discarding non-interesting points is small compared to the second-stage 1000-fold selection process. Alternative approaches that could have been considered include the thresholding used by Reis et al. (2012) to distinguish red and white grapes from leaves, and the linear colour model approach of Teixido et al. (2012).

Colour transformation: A classifier is trained on colour information to identify the initial points of interest. Many pepper fruits are green, and we transform the RGB colour intensity in order to distinguish between the fruit and other green plant parts. For each colour pixel (R, G, B), the first transformation, G − B, quantifies the intensity difference between green and blue. The second transformation, G − R, quantifies the intensity difference between green and red. The final transformation, G/(R + G + B), quantifies the proportion of green. This simple, straightforward colour transformation was chosen because it is less sensitive to changing illumination conditions than the original R, G and B values, but no attempt was made to optimise the transformation (Gevers & Smeulders, 1999).

Colour pixels in a training template are first transformed into an N × 3 matrix with columns G − B, G − R and G/(R + G + B). For example, for a 1280 × 480 template, N = 1280 × 480 = 614,400. To reduce N, for each template two clusters were found in the transformed space using K-means clustering with a Euclidean metric. The means and numbers of pixels in the two clusters are then used to represent the template. We found that two clusters were sufficient to capture the variability in pixel values in templates, whereas one cluster, or equivalently sample means, lost this variability. Figure 3 shows the mean values extracted from the fruit and background templates. Note that (R, G, B) has a value range of [0, 1] and we applied a linear rescaling so that G − B, G − R and G/(R + G + B) also lie in the range [0, 1].
Fig. 3 – Relationship between the fruit group and the background group in G − B, G − R and G/(R + G + B) from all training templates. The x-axis, y-axis and z-axis are normalised values for G − B, G − R and G/(R + G + B) respectively. Red and green dots are for red and green fruits respectively. The background group is represented by blue dots.
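To make the colour transformation concrete, here is a minimal NumPy/scikit-learn sketch (our illustration, not the authors' code). The linear rescaling of G − B and G − R to [0, 1] and the random stand-in template are assumptions; the two-cluster summary follows the text, with sklearn's KMeans as our choice of implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def transform_colours(img):
    """Map an RGB image with values in [0, 1] to the three features
    G - B, G - R and G/(R + G + B), each rescaled to lie in [0, 1]."""
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    gb = (g - b + 1.0) / 2.0       # G - B lies in [-1, 1] before rescaling
    gr = (g - r + 1.0) / 2.0       # G - R lies in [-1, 1] before rescaling
    pg = g / (r + g + b + 1e-12)   # proportion of green, already in [0, 1]
    return np.stack([gb, gr, pg], axis=-1)

# A 1280 x 480 template becomes an N x 3 matrix with N = 614,400:
template = np.random.rand(480, 1280, 3)   # stand-in for a real template
features = transform_colours(template).reshape(-1, 3)

# Summarise the template by two K-means clusters (means and sizes):
km = KMeans(n_clusters=2, n_init=10).fit(features)
cluster_means, cluster_sizes = km.cluster_centers_, np.bincount(km.labels_)
```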
Colour classifier: Given the transformed vectors for the templates of red fruit, green fruit and background, we then constructed a Naive Bayes classifier with these three classes using the prior parameters given in Table 1. For simplicity and speed, we ignored correlation between variables. The classifier was applied separately to the transformed colours of every pixel in an image, and the posterior probabilities of red and green fruits were combined. Figure 4 shows an example of the posterior probabilities for the combined fruit group and the background group. We applied a threshold Tp to the posterior probabilities to obtain the initial points of interest.

3.2. Bag-of-Words model

Our approach is inspired by, and builds on, Nilsback and Zisserman (2006), who used a 'bag of words' (BoW) approach to classify flowers. We combine this with the use of Maximally Stable Colour Region (MSCR) features (Forssén, 2007). The methodology is well established in the detection and tracking of pedestrians, and here we explore its potential to obviate the need for many different shapes and sizes of templates to detect our highly variable fruit shapes.

For each initial point, we allocate a support window centred at the point in order to provide sufficient image information for recognition. In this work, the size of the support window used was 40 × 90 pixels, which was based on the average size of the fruit templates.

Feature extraction: To determine whether a fruit is present in a support window, we describe the window using two different feature sets: MSCR features (Forssén, 2007) and texture features obtained by local range filters.

An MSCR feature set is a set of descriptors of coloured ellipses in a window. These descriptors are found using an MSCR detector, which is an extension to colour of the maximally stable extremal region (MSER) covariant region detector (Matas, Chum, Martin, & Pajdla, 2002). The original MSER detector finds regions (ellipses) that are stable over a wide range of thresholdings of a grey-scale image.
Fig. 4 – An example illustrating the fruit recognition method. We first identify a number of possible fruit positions (initial points of interest), and then verify each fruit position for the removal of false and duplicated estimates by a Bag-of-Words model. Initial fruit probability is calculated based on G − B, G − R and G/(R + G + B). Fruit recognition is obtained by applying the Bag-of-Words model on the initial points of interest.
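A sketch of how the colour classifier could look in practice. A Gaussian Naive Bayes with diagonal covariance matches the simplification of ignoring correlations between variables, but the class priors of Table 1 are not reproduced in this excerpt, so the values below are placeholders, and the training data are random stand-ins.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

# Stand-in training data: transformed colour features (Section 3.1) for
# three classes: 0 = red fruit, 1 = green fruit, 2 = background.
X_train = rng.random((3000, 3))
y_train = rng.integers(0, 3, size=3000)

# Diagonal-covariance Gaussian Naive Bayes; the priors are hypothetical.
clf = GaussianNB(priors=[0.01, 0.04, 0.95])
clf.fit(X_train, y_train)

# Classify every pixel and combine the red- and green-fruit posteriors.
pixels = rng.random((480 * 1280, 3))   # transformed pixels of one image
post = clf.predict_proba(pixels)
p_fruit = post[:, 0] + post[:, 1]

# Thresholding the posterior gives the initial points of interest.
T_p = 0.9
poi_mask = p_fruit > T_p
```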
In MSCR, regions are detected that are stable across a range of time-steps in an agglomerative clustering of image pixels, based on proximity and similarity in colour (Forssén, 2007). Default parameters as described in Forssén (2007) were used. The obtained feature set provides an approximate description of the 'objects' in a window (see Fig. 5(b)) in the form of ellipses, which constitute an affine-invariant object representation when viewed from different angles. We used the geometric shape (five variables) and mean colour (three variables) of the fitted ellipses as a feature set.

Besides MSCR features, texture features from local range filters are also used. A local range filter simply calculates, per colour, the difference between the largest and smallest intensity in the filter window. Nilsback and Zisserman (2006) used a set of filters with 4 sizes (3, 7, 11, 15) to define the texture. To reduce the computational load and memory, we used only two filter sizes, and good results were obtained with the filter sizes 5 × 5 and 9 × 9 (see Figures 5(c) and (d)).

Bag-of-Words (BoW) frequency distribution: Next, a so-called 'bag of words' approach is used, as proposed by Zisserman and collaborators (Nilsback & Zisserman, 2006; Sivic & Zisserman, 2008). Nilsback and Zisserman (2006) describe a set of flower images by creating a flower vocabulary, using three different vocabularies for colour, shape (SIFT features) and texture (using the filter bank). Each vocabulary vector is quantised (discretised) to obtain so-called Visual Words. The frequency histogram of the visual words forms a so-called bag of words. These frequency histograms can then be used to calculate similarities, yielding a quick search method for images or videos (Sivic & Zisserman, 2008).

We used K-means clustering to construct a vocabulary with 1000 'words' to represent the MSCR features, and similarly for the local range features, following Nilsback and Zisserman (2006). Then, using the constructed vocabularies, we learned the frequency distribution of the combined vocabularies (2000 words) for the training data.

SVM classifier: Finally, a support-vector-machine (SVM) classifier was trained on all the frequency distributions of the training data to represent two groups, Fruit and Others. In effect, the BoW model represents each image by a frequency distribution of its visual vocabularies.

3.3. Using the bag-of-words model

For processing a validation image, we first find the initial points of interest in the image, as described in Section 3.1. Next, the MSCR and local range features are calculated per window at each initial point. From the quantised vectors of these two vocabularies, the frequency distribution in the bag-of-words frequency histogram is calculated, which subsequently is classified as fruit or not, using the SVM.

The outputs include fruit locations, and each estimate also has a weight for the two classes. The weight W is the (arbitrary) distance computed from the SVM classification, and a higher value means that the window is more likely to belong to that class. In fact, the values quantify how far an object of interest is from the decision line separating the two groups: a smaller value means closer to the borderline, while a larger value means the object is more likely to belong to that class.

When points of interest are 'close' together, we obtain multiple classifications. In that case, we select the point/window with the highest weight W. 'Close' is defined here by the overlap between two windows: we consider two windows which have more than 50% overlap with each other to be 'close'. Overlap is calculated as the intersection of the two detection windows divided by their union.
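A hedged sketch of this recognition stage: build the two 1000-word vocabularies with K-means, describe each support window by a 2000-bin frequency histogram, and train an SVM whose signed decision value serves as the weight W. The descriptor contents beyond what the text states, the histogram normalisation, and the linear kernel are our assumptions; all data are random stand-ins.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

rng = np.random.default_rng(1)

# Descriptors pooled over all training windows. Per the text, each MSCR
# ellipse gives 5 shape + 3 colour variables; the range-filter descriptor
# layout is assumed here.
mscr_train = rng.random((5000, 8))
texture_train = rng.random((5000, 6))

# One 1000-word vocabulary per feature type, built with K-means.
vocab_mscr = KMeans(n_clusters=1000, n_init=3).fit(mscr_train)
vocab_tex = KMeans(n_clusters=1000, n_init=3).fit(texture_train)

def bow_histogram(mscr_desc, tex_desc):
    """Quantise a window's descriptors against both vocabularies and
    concatenate the word frequencies (2000 bins, normalised to sum to 1)."""
    h1 = np.bincount(vocab_mscr.predict(mscr_desc), minlength=1000)
    h2 = np.bincount(vocab_tex.predict(tex_desc), minlength=1000)
    h = np.concatenate([h1, h2]).astype(float)
    return h / max(h.sum(), 1.0)

# Toy labelled windows (1 = Fruit, 0 = Others) and an SVM on the histograms.
windows = [(rng.random((12, 8)), rng.random((40, 6))) for _ in range(40)]
labels = np.array([0, 1] * 20)
X = np.stack([bow_histogram(m, t) for m, t in windows])
svm = SVC(kernel="linear").fit(X, labels)

# The signed distance from the decision boundary plays the role of the
# weight W: a larger magnitude means a more confident classification.
m_new, t_new = rng.random((12, 8)), rng.random((40, 6))
W = svm.decision_function(bow_histogram(m_new, t_new)[None, :])[0]
```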
Fig. 5 – MSCR features and image textures. Each ellipse in (b) is an MSCR feature, and its region is filled by the average colour of that ellipse. The MSCR features provide an approximation to an image. Regions not covered by the MSCR features are shown as blue. The 5 × 5 and 9 × 9 textures are obtained by a range filter on the colour image. The colour indicates the magnitude of local colour variation, e.g. green regions indicate large variation in green colour. (a) Image (b) MSCR (c) 5 × 5 texture (d) 9 × 9 texture.
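The local range filter of Section 3.2 is only a few lines of code; a minimal sketch using SciPy's rank filters, with a random stand-in image:

```python
import numpy as np
from scipy.ndimage import maximum_filter, minimum_filter

def local_range(img, size):
    """Per-channel local range: largest minus smallest intensity in a
    size x size window around each pixel."""
    out = np.empty_like(img)
    for c in range(img.shape[-1]):
        out[..., c] = (maximum_filter(img[..., c], size=size)
                       - minimum_filter(img[..., c], size=size))
    return out

image = np.random.rand(480, 1280, 3)   # stand-in for a colour image
tex5 = local_range(image, 5)           # 5 x 5 texture, cf. Fig. 5(c)
tex9 = local_range(image, 9)           # 9 x 9 texture, cf. Fig. 5(d)
```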
Overall, the recognition method performs reasonably well given the challenges we faced (see the discussion section). The relationship between successive views must be investigated to help filter out isolated false positives, find occluded fruits and produce a total fruit count.

4. Fruit counting from multiple views

Since we observe the same plant/fruit in multiple images, we need to combine this information into a single result. The aim is to count the correct number of fruits K in a plot, while preventing double counting of the same fruit, which may appear in multiple images, as well as correctly counting fruits that are possibly missed in certain views (e.g. because of occlusion). Note that a fruit is shifted by an approximately fixed amount (γ) in the horizontal direction in consecutive images, depending on its distance from the camera. This property will be used to find the same fruit in multiple images.

Consider a contiguous sequence of images from a single experimental plot at one of the four camera heights. Let (x_ij, y_ij) denote the column and row coordinates of the jth fruit located in the ith image, where i = 0, …, I and j = 1, …, J_i. For example, Fig. 6 shows illustrative data for a short sequence of 3 images. It is likely that some of these data are repeat observations of the same fruit in different images, because some row coordinates (y) are very similar and column coordinates (x) shift by similar amounts between images. In order to estimate the number of fruits we need to determine which are repeat observations.

Suppose that there are K fruits observed at least once in an experimental plot, indexed by k = 1, …, K. To simplify exposition, we will only consider a single camera height, though in the results we sum the K's at the 4 heights to obtain the total fruit count. Let (a_k, b_k) denote the true column and row coordinates of fruit k in image 0, and γ_k the true shift in column coordinate between consecutive images. We propose as our observation model:

$$\begin{pmatrix} x_{ij} \\ y_{ij} \end{pmatrix} \sim N\left( \begin{pmatrix} a_{k(ij)} + i\,\gamma_{k(ij)} \\ b_{k(ij)} \end{pmatrix}, \begin{pmatrix} \sigma_x^2 & 0 \\ 0 & \sigma_y^2 \end{pmatrix} \right), \tag{1}$$

where k(ij) denotes the correct fruit label of the observation indexed (i,j), and (σ_x², σ_y²) denote the variances of the normally distributed observation errors. There are 3K parameters (a, b, γ) associated with each experimental plot, together with 2 variance parameters which are common to all plots. The challenge is to estimate the number of fruits, K, in the presence of the remaining nuisance parameters.

In principle, we could estimate all parameters by maximising the likelihood of the model specified by (1), conditional on K, and estimate K using likelihood ratio tests. However, this would involve an enormous combinatorial search to assign fruit labels k(ij), so direct optimisation is computationally infeasible. Reversible jump Markov chain Monte Carlo is another possible approach to tackle this problem, but is problematic because of the large dataset of 40,000 observations, so we have instead developed a simpler, much faster, more ad hoc method. We first estimate σ_y² by fitting a mixture distribution to differences in y between pairs of observations. Similarly, we estimate σ_x² by considering triplets of observations. Finally, for each plot we apply a 95% significance threshold rule to identify non-overlapping sets of observations of a single fruit, starting with the largest possible set size (I) and progressively reducing until we are left with singletons. The number of sets is our estimate of K. See the Appendix for details.
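To make the Appendix procedure easier to follow, here is the per-set computation it relies on: fit model (1) to one candidate set of observations by least squares and form a residual statistic. This is a sketch under stated assumptions; in particular, we take S² to be the variance-scaled sum S_x²/σ̂_x² + S_y²/σ̂_y², which makes the χ²_{2n−3} threshold used later dimensionally consistent, and the default σ_y is a placeholder since this excerpt only reports σ̂_x = 6.67.

```python
import numpy as np

def fit_candidate_set(i, x, y, sigma_x=6.67, sigma_y=8.0):
    """Least-squares fit of x = a + i*gamma, y = b to one candidate set
    of repeat observations; sigma_y is a placeholder value."""
    i, x, y = map(np.asarray, (i, x, y))
    b_hat = y.mean()
    g_hat, a_hat = np.polyfit(i, x, 1)            # slope = shift per image
    s2_y = ((y - b_hat) ** 2).sum()
    s2_x = ((x - a_hat - g_hat * i) ** 2).sum()
    s2 = s2_x / sigma_x**2 + s2_y / sigma_y**2    # ~ chi-squared on 2n - 3 d.f.
    return a_hat, b_hat, g_hat, s2

# Three observations shifting by roughly 60 columns per image:
a, b, g, s2 = fit_candidate_set([0, 1, 2], [100.0, 161.0, 219.0],
                                [50.0, 51.0, 49.0])
```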
5. Results

To quantify the performance of our fruit recognition method (i.e. first finding the points of interest and then classifying with the bag-of-words model), we used the precision-recall curve, and the ground truth consisted of manually labelled fruit positions for a single row of 10 experimental plots (408 validation images in total, see Section 2). If the overlap between a window classified as fruit (a detection) and a similar sized window around the ground truth position is greater than 50%, then the detection is considered a true positive; otherwise the detection is a false positive. We treat each detection as unique for a fruit: if there are multiple detections satisfying the overlap criterion, the one with maximum overlap is the true positive and the others are considered false positives. A false negative represents a ground truth position which has no corresponding detection. The precision is defined as

Precision = TruePositive / (TruePositive + FalsePositive).

The recall is

Recall = TruePositive / (TruePositive + FalseNegative).

We also combine both precision and recall into a single score, F1:

F1 = 2 × Precision × Recall / (Precision + Recall).

Figure 7 presents the precision-recall performance of our fruit recognition methods. The result for the initial points of interest was obtained by varying the threshold Tp (0.3, 0.4, 0.5, …, 0.9). In the case of overlapping detection windows, we used the one with the highest posterior probability. There were many false positives using the initial colour classifier, resulting in low precision (0.45). However, precision is less relevant here, as this step only provides initial estimates. We do want a high recall for the initial points, to make sure that we do not miss fruits. For the BoW model, Tp was set to 0.9, and the weight threshold W was varied in the range from 0 to 1000 as in Table 2. The BoW model eliminated two-thirds of the false positives in the initial estimates, and the minimum precision is 0.61. False positives can almost be eliminated, but the number of true positives then reduces and the missed detection rate can become quite high. For example, the highest precision was 0.97, but the recall was only 0.17 and the F1 score was lower than 0.3. For the 10 experimental plots, the highest F1 score was 0.65 at W = 100, and the F1 score was also above 0.6 for thresholds {0, 200, 300}.

It should be noted that the fruit count of a plot could not be estimated using single images only. Plants were visible in a successive sequence of images, and there were multiple plants in an image. We therefore applied the multiple-view fruit counting algorithm to the 10 experimental plots where locations of fruits had been visually identified from images (i.e. the ground truth for evaluating the fruit recognition method). Using the methods described in the Appendix, we obtained σ̂_x² = σ̂_y² = 5². We note that these are smaller than those for automatic fruit detection, showing the superiority of the human eye. Figure 8 shows the results, with 94.6% correlation between K and K̂_VIS. We see that fruit numbers are overestimated for all but one plot.

This is most likely due to fruits from border plants appearing in images although they are excluded from manual counting. Also, as we had four vertical levels of images, some fruits may have appeared at both the top of one image and the bottom of the one above. However, these sources of overestimation will be convolved with those from underestimation, mainly from occlusion due to fruits being hidden behind leaves.

We applied the multiple-view algorithm to all experimental plots in the validation trial with at least 12 images at each of the 4 camera heights, totalling 435, for a range of weight thresholds W. This is more than the 264 plots stated …
Appendix

Figure A.1 – Derived data used to estimate σ̂_y² and σ̂_x²: (a) differences between pairs of row coordinates (Δ) plotted against the estimated shift (γ̂) for a restricted range of values; (b) square root of residual sums of squares of the model fit to triplets of column coordinates (S_x) plotted against the estimated shift (γ̂) for a restricted range of values; (c) histogram of values of Δ and maximum likelihood fit (red line) of a mixture of normal and uniform distributions; (d) histogram of values of S_x and maximum likelihood fit (red line) of a mixture of half-normal and uniform distributions.
Figure A.1(a) shows Δ plotted against γ̂ for the full dataset, restricted to |Δ| ≤ Δ_M ≡ 100 and 30 ≤ γ̂ ≤ 150, which is a conservative range of values that γ can take for fruits, given the distance from the plants to the cameras. (For clarity, only a random 10% of data are plotted.) We see a cluster of values around (Δ, γ) ≈ (0, 60), which are likely to be repeat observations of the same fruit. Figure A.1(c) shows the histogram of Δ, which looks well approximated by a mixture of a normal and a uniform distribution, and this agrees with what we expect if distances between neighbouring fruits can be assumed to be approximately uniformly distributed:

$$(\Delta \mid |\Delta| \le \Delta_M) \sim \begin{cases} N(0,\, 2\sigma_y^2) & \text{if } k(i_1,j_1) = k(i_2,j_2), \\ U(-\Delta_M,\, \Delta_M) & \text{otherwise.} \end{cases}$$

There is some evidence for the distribution being more spiked than a normal, but we are not overly concerned with this discrepancy, as statistical inference is usually robust to normality assumptions, and use of other distributions would greatly complicate the estimation to follow.

We next consider all observation triplets from the same plot. However, for subsequent usage, we will express this in the greater generality of n observations {(i_1,j_1), (i_2,j_2), …, (i_n,j_n)} such that i_1 < i_2 < … < i_n. Given such a set, we can estimate (a, b, γ) by least squares, analytically using standard formulae, and we can also compute residual sums of squares S_x² and S_y². For row coordinates (y):

$$\hat b = \frac{1}{n} \sum_{l=1}^{n} y_{i_l j_l}, \qquad S_y^2 = \sum_{l=1}^{n} \left( y_{i_l j_l} - \hat b \right)^2,$$

with S_x² defined analogously from the least-squares fit of the column coordinates x_{i_l j_l} to a + i_l γ. In particular, for a triplet (n = 3), S_x² ~ σ_x²χ₁², so S_x ~ N⁺(0, σ_x²), the positive half of a normal distribution.

Figure A.1(b) shows a plot of S_x against γ̂ for triplets from all experimental plots in the dataset, restricted to S_x ≤ S_M ≡ 50 and 30 ≤ γ̂ ≤ 150. We also restrict to S_y² ≤ σ̂_y²χ₂²(95%) to ensure that values of y are consistent with repeat observations of a single fruit, at a 95% level of significance. (For clarity, again only a random 10% of data are plotted.) Similar to Δ, the data are consistent with

$$(S_x \mid S_x \le S_M) \sim \begin{cases} N^+(0,\, \sigma_x^2) & \text{if } k(i_1,j_1) = k(i_2,j_2) = k(i_3,j_3), \\ U(0,\, S_M) & \text{otherwise.} \end{cases}$$

Figure A.1(d) shows the histogram of S_x from the full dataset, and the maximum likelihood fit, with σ̂_x² = 6.67². Again, there is some evidence for the distribution being more spiked than a normal, which again does not overly concern us. We also note that σ̂_x² < σ̂_y², indicating that column locations of fruit are more easily determined than row locations.

Now that we have estimates of the 2 variance parameters, we can consider data from each experimental plot separately to estimate K. Although it is possible to estimate the number of pairs and triplets of observations from the same fruit, it is not possible to extend this to direct estimation of K. Instead, by the following algorithm we can identify sets of observations which are inferred to have been of a single fruit, at a 95% level of significance. For each plot:

1. Initialise set size n ← (I + 1) and estimated number of fruits K̂ ← 0;
2. Find the set of size n, {(i_1,j_1), (i_2,j_2), …, (i_n,j_n)}, which minimises S², subject to i_1 < i_2 < … < i_n (but no contiguity constraint) and 50 ≤ γ̂ ≤ 130 (note, we use this realistic range of values for γ, rather than the conservative range in Figure A.1);
3. If S² ≤ χ²_{2n−3}(95%), then accept this set, remove the n data points from further consideration, set K̂ ← (K̂ + 1), and return to step 2;
4. Set n ← (n − 1), and return to step 2 provided n ≥ 2;
5. K̂ ← (K̂ + number of remaining singletons).

The algorithm is fast because in step 2 most sets can be excluded with fewer than n points, because S² increases monotonically as points are added to a set, so a tree search can be used. Figure A.2 shows the results of the algorithm applied to the data in Fig. 6. As there are only 3 images in this illustrative example, we start with n = 3. The group marked 'A' is the first identified, with S² = 2.0, followed by 'B' with S² = 5.3. No other triple of observations remaining has S² ≤ χ₃²(95%) = 7.8, so we then search for sets of size n = 2. We find 5 pairs, labelled 'C' … 'G', with increasing values of S² ≤ χ₁²(95%) = 3.8. Figure A.2 also shows the fitted values. Three unassigned points remain, singletons labelled 'H', 'I', 'J', and we infer the total number of observed fruits to be K̂ = 2 + 5 + 3 = 10.

Figure A.2 – Data in Fig. 6, showing sets of points identified by the algorithm, with 'A' … 'J' denoting identification order (see text), and fitted values (red dots).
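For illustration, a brute-force sketch of the counting algorithm above, reusing fit_candidate_set from the earlier sketch. The paper prunes the combinatorial search in step 2 with a tree search; this illustration omits that and simply enumerates subsets, so it is only practical for small examples like Fig. 6.

```python
import numpy as np
from itertools import combinations
from scipy.stats import chi2

def count_fruits(obs, i_max, gamma_range=(50.0, 130.0)):
    """Estimate K for one plot and camera height.
    obs: list of (i, x, y) detections; i_max: largest image index I."""
    remaining = list(range(len(obs)))
    k_hat = 0
    for n in range(i_max + 1, 1, -1):       # step 1: n starts at I + 1
        accepted = True
        while accepted:                     # steps 2-3: keep accepting at this n
            accepted = False
            best, best_s2 = None, np.inf
            for subset in combinations(remaining, n):
                i = np.array([obs[m][0] for m in subset], dtype=float)
                x = np.array([obs[m][1] for m in subset])
                y = np.array([obs[m][2] for m in subset])
                if len(set(i)) < n:         # at most one observation per image
                    continue
                _, _, g_hat, s2 = fit_candidate_set(i, x, y)
                if gamma_range[0] <= g_hat <= gamma_range[1] and s2 < best_s2:
                    best, best_s2 = subset, s2
            if best is not None and best_s2 <= chi2.ppf(0.95, 2 * n - 3):
                remaining = [m for m in remaining if m not in best]
                k_hat += 1                  # one more fruit identified
                accepted = True
    return k_hat + len(remaining)           # step 5: singletons are one fruit each
```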
References

Alimi, N., Bink, M., Dieleman, J., Nicola, M., Wubs, M., Heuvelink, E., et al. (2013). Genetic and QTL analyses of yield and a set of physiological traits in pepper. Euphytica, 190, 181–201.

Breitenstein, M., Reichlin, F., Leibe, B., Koller-Meier, E., & Van Gool, L. (2009). Robust tracking-by-detection using a detector confidence particle filter. In IEEE 12th international conference on computer vision (pp. 1515–1522).

Brosnan, T., & Sun, D.-W. (2004). Improving quality inspection of food products by computer vision – a review. Journal of Food Engineering, 61(1), 3–16.

Bulanon, D., Kataoka, T., Ota, Y., & Hiroma, T. (2002). A segmentation algorithm for the automatic recognition of Fuji apples at harvest. Biosystems Engineering, 83(4), 405–412.

Dalal, N., & Triggs, B. (2005). Histograms of oriented gradients for human detection. In IEEE computer society conference on computer vision and pattern recognition (pp. 886–893).

De-An, Z., Jidong, L., Wei, J., Ying, Z., & Yu, C. (2011). Design and control of an apple harvesting robot. Biosystems Engineering, 110(2), 112–122.

Ferrari, V., Fevrier, L., Jurie, F., & Schmid, C. (2008). Groups of adjacent contour segments for object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(1), 36–51.

Forssén, P.-E. (2007). Maximally stable colour regions for recognition and matching. In IEEE conference on computer vision and pattern recognition (pp. 1–8).

Furbank, R. T., & Tester, M. (2011). Phenomics – technologies to relieve the phenotyping bottleneck. Trends in Plant Science, 16(12), 635–644.

Gevers, T., & Smeulders, W. M. (1999). Color-based object recognition. Pattern Recognition, 32, 453–464.

Ji, W., Zhao, D., Cheng, F., Xu, B., Zhang, Y., & Wang, J. (2012). Automatic recognition vision system guided for apple harvesting robot. Computers & Electrical Engineering, 38(5), 1186–1195.

Jimenez, A., Jain, A., Ceres, R., & Pons, J. (1999). Automatic fruit recognition: a survey and new results using range/attenuation images. Pattern Recognition, 32(10), 1719–1736.

Kitamura, S., & Oka, K. (2005). Recognition and cutting system of sweet pepper for picking robot in greenhouse horticulture. In IEEE international conference on mechatronics and automation (Vol. 4, pp. 1807–1812).

Lee, W., Alchanatis, V., Yang, C., Hirafuji, M., Moshou, D., & Li, C. (2010). Sensing technologies for precision specialty crop production. Computers and Electronics in Agriculture, 74(1), 2–33.

Leibe, B., Leonardis, A., & Schiele, B. (2004). Combined object categorization and segmentation with an implicit shape model. In ECCV workshop on statistical learning in computer vision (pp. 17–32).

Leibe, B., Seemann, E., & Schiele, B. (2005). Pedestrian detection in crowded scenes. In IEEE computer society conference on computer vision and pattern recognition (pp. 878–885).

Linker, R., Cohen, O., & Naor, A. (2012). Determination of the number of green apples in RGB images recorded in orchards. Computers and Electronics in Agriculture, 81, 45–57.

Matas, J., Chum, O., Martin, U., & Pajdla, T. (2002). Robust wide baseline stereo from maximally stable extremal regions. In Proceedings of the British machine vision conference (pp. 384–393).

Nilsback, M.-E., & Zisserman, A. (2006). A visual vocabulary for flower classification. In Proceedings of the IEEE conference on computer vision and pattern recognition (Vol. 2, pp. 1447–1454).

Polder, G., van der Heijden, G. W. A. M., Glasbey, C. A., Song, Y., & Dieleman, J. A. (2009). Spy-See – advanced vision system for phenotyping in greenhouses. In Proceedings of the MINET conference: Measurement, sensation and cognition (pp. 115–117). National Physical Laboratory.

Reis, M. J. C. S., Morais, R., Peres, E., Pereira, C., Contente, O., Soares, S., et al. (2012). Automatic detection of bunches of grapes in natural environment from color images. Journal of Applied Logic, 10(4), 285–290.

Sivic, J., & Zisserman, A. (2008). Efficient visual search for objects in videos. Proceedings of the IEEE, 96(4), 548–566.

Song, Y., Glasbey, C. A., van der Heijden, G. W. A. M., Polder, G., & Dieleman, J. A. (2011). Combining stereo and Time-of-Flight images with application to automatic plant phenotyping. In 17th Scandinavian conference on image analysis, SCIA 2011 (pp. 467–478).

Stajnko, D., Lakota, M., & Hocevar, M. (2004). Estimation of number and diameter of apple fruits in an orchard during the growing season by thermal imaging. Computers and Electronics in Agriculture, 42(1), 31–42.

Tanigaki, K., Fujiura, T., Akase, A., & Imagawa, J. (2008). Cherry-harvesting robot. Computers and Electronics in Agriculture, 63(1), 65–72.

Teixido, M., Font, D., Palleja, T., Tresanchez, M., Nogues, M., & Palacin, J. (2012). Definition of linear color models in the RGB vector color space to detect red peaches in orchard images taken under natural illumination. Sensors, 12, 7701–7718.

van der Heijden, G., Song, Y., Horgan, G., Polder, G., Dieleman, A., Bink, M., et al. (2012). SPICY: towards automated phenotyping of large pepper plants in the greenhouse. Functional Plant Biology, 39(11), 870–877.

Yang, L., Dickinson, J., Wu, Q., & Lang, S. (2007). A fruit recognition method for automatic harvesting. In 14th international conference on mechatronics and machine vision in practice (pp. 152–157).

Zhao, J., Tow, J., & Katupitiya, J. (2005). On-tree fruit recognition using texture properties and color data. In International conference on intelligent robots and systems (pp. 263–268).