
2D/3D Vision-Based Mango's Feature Extraction and Sorting

Thanarat Chalidabhongse, Panitnat Yimyam
Faculty of Information Technology, King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand
E-mail: [email protected], [email protected]

Panmanas Sirisomboon
Department of Agricultural Engineering, Faculty of Engineering, King Mongkut's Institute of Technology Ladkrabang, Bangkok, Thailand
E-mail: [email protected]

Abstract — This paper describes a vision system that can extract 2D and 3D visual properties of mango, such as size (length, width, and thickness), projected area, volume, and surface area, from images and use them in sorting. The 2D/3D visual properties are extracted from multiple-view images of the mango. The images are first segmented to extract the silhouette regions of the mango. The 2D visual properties are then measured from the top-view silhouette as explained in [7]. The 3D mango volume reconstruction is done using volumetric carving on multiple silhouette images. First, the cameras are calibrated to obtain the intrinsic and extrinsic camera parameters. Then the 3D volume voxels are carved based on silhouette images of the fruit in multiple views. After carving with all silhouettes, we obtain a coarse 3D shape of the fruit, from which we can compute the volume and surface area. We then use these features in automatic mango sorting, in which we employ a typical backpropagation neural network. In this research, we employed the system to evaluate visual properties of a mango cultivar called Nam Dokmai. There were two sets totaling 182 mangoes in three sizes, sorted by weight according to a standard sorting metric for mango export. Two experiments were performed. The first shows the accuracy of our vision-based feature extraction and measurement by comparing results with measurements made using various instruments. The second shows the sorting accuracy by comparison with human sorting. The results show the technique could be a good and more feasible alternative to humans' manual sorting of mangoes.

Keywords — Vision system, vision-based fruit sorting, 3D volume reconstruction, feature extraction

I. INTRODUCTION

Thailand is known as home to a large variety of fruits, both seasonal and year-round. Mango is one of the fruits with high potential for both local and international markets, and its industry plays an important role in the country's export economy. It is grown in all regions, but the areas of concentration are the Central and Northeast regions. At present, mango production has been greatly developed through improved varieties to meet consumer taste and preference. Overseas demand for Thai mango has steadily increased, in the forms of both fresh and canned fruit. In grading mangoes for export, farmers must examine all the harvested mature mangoes by eye and hand. In sorting, ambiguous mangoes are resolved by weighing. This is quite subjective, so farmers need alternatives for sorting and grading mangoes, since hand labor is costly and inaccurate. An automated mango sorting system could be more feasible.

In recent years, image processing and computer vision technology has shown increasing potential and importance in many areas, including the agricultural industry. Machine vision has often been used to develop automatic fruit/vegetable sorting and grading. Color image processing techniques, presented in [1-3], were proposed to judge maturity levels or growth of agricultural products.
Reference [3] developed an automated fruit grading system using physical property analysis such as size, maximum and minimum diameters, projected area, shape, color, bruises, etc. Cunha [4] used recognition techniques to analyse pathological stress conditions and to characterize fruits or plant leaves. Runtz and Dave [5,6] applied image processing techniques to the classification and identification of plant species. All of the above systems are very beneficial to the agricultural industry.
There have been many works on agricultural product sorting using vision techniques. However, most of them work on 2D vision, such as the work presented in [3], which considered size properties such as diameters or projected area. Also, most techniques are suitable only for near-circular agricultural products. Thus, the motivation of our work is to develop a method that can extract both 2D and 3D visual properties of generically shaped fruits for grading and sorting.
This report describes a vision system that can extract 2D and 3D visual features of mango and use them in fruit sorting. The work is an extension of the previous system proposed in [7], which introduced techniques to extract and analyse 2D mango properties: projected area, length, and width. This report introduces an additional 2D property, thickness, as well as 3D properties of the fruit: volume and surface area. The thickness comes from the side-view image. The 3D fruit volume reconstruction is done using volumetric carving on multiple silhouette images. First, the cameras are calibrated to obtain the intrinsic and extrinsic camera parameters. Then the 3D volume voxels are carved based on silhouette images of the fruit in multiple views. After carving with all silhouettes, we obtain a coarse 3D shape of the fruit, from which we can compute the volume and surface area. We then use these features in automatic mango sorting, in which we employ a typical backpropagation neural network.
We employed the system to evaluate visual properties of a mango cultivar called Nam Dokmai. There were two sets totaling 182 mangoes in three sizes, sorted by weight according to a standard sorting metric for mango export. Two experiments were performed. The first shows the accuracy of our vision-based feature extraction and measurement by comparing results with measurements made using various instruments. The second shows the sorting accuracy by comparison with human sorting.

The main objective of this research is to investigate the potential of using image processing and computer vision techniques for analysis of the mango's external physical properties, as an alternative or supplement to the traditional manual method. The method is shown to be more convenient, efficient, and less time consuming than handling by humans. In addition, the method can be extended and applied to analyse and extract visual properties of other agricultural products as well.

II. SHAPE FROM SILHOUETTES


The reconstruction of an object's 3D shape from multiple 2D images has been a challenging problem in the field of computer vision. Shape from silhouettes is a well-known technique for estimating 3D shape. The process starts by obtaining multiple-view images of the object, which can be done in several ways. The works presented in [8-14] capture images of an object using multiple cameras located at different positions surrounding the object. References [15,16] use a fixed camera capturing an image sequence of an object placed on a turntable. Kuzu and Sinram [17] move a camera around an object to capture a multiple-view sequence. After camera calibration is done to produce the intrinsic and extrinsic parameters of each camera, these parameters are used to reconstruct the 3D model. There are several approaches to 3D model reconstruction. Reference [10] employed the basic theory of epipolar geometry to reconstruct the 3D shape of the object by considering camera calibration information and matching points in the images. References [11-15,17,18] used volumetric intersection techniques. The idea is based on the silhouette constraint: each 2D silhouette of an object constrains the object to lie inside the volume produced by back-projecting the silhouette from the corresponding viewpoint. The object is generated by intersecting the back-projected silhouette images in 3D world space. Reference [14] applied parallel processing to the volume intersection, because the back-projection process can be divided into small independent processes; the back-projection of each silhouette into 3D space runs on a separate computer without referring to the other silhouettes. The method generates a coarse conical shape, an approximate model called the visual hull. The shape becomes more similar and accurate to the real object as it is generated from more views of the object.

However, concavities and critical areas on the object cannot be recovered, because the viewing region does not completely surround the object. Works presented in [8,9,16] employ another method for 3D reconstruction, the space carving technique. This technique is quite intuitive. After camera calibration is performed to obtain the intrinsic and extrinsic camera parameters, a 3D bounding box is modelled as an initial volume model of the object and discretized into voxels. The algorithm then projects each voxel onto each image plane. If the projected voxel is not totally contained by the silhouette region, the voxel is removed from the object volume; otherwise, it is kept in the object volume.

III. PROPOSED SYSTEM


In this section, we describe in detail our methods for processing and analyzing mango images in order to extract their 2D and 3D visual properties. We divide the section into five subsections: the first discusses our image acquisition method and setup; the second describes camera calibration; the third describes the method for 3D object reconstruction from multiple views of silhouette images; the next describes methods for computing the 2D and 3D visual properties of the mango; and in the last, we discuss the automatic sorting method using these extracted features.
A. Mango Image Acquisition
Our image acquisition setup is shown in Fig. 1. Four digital cameras are used. One camera captures a top-view image of the object; the other three are placed around the object to capture three side-view images. The images are 1200 x 900 pixels and saved in raw RGB format. The mango is placed on a black-and-white chessboard calibration grid with a 20 mm x 20 mm grid size. The captured images are shown in Fig. 2.

Figure 1. Image acquisition setup.

After capturing all mango images, the region of the mango is segmented by our method explained in [7] and is defined to be the object silhouette. Fig. 3 shows the four silhouette images that result from our segmentation of the input images shown in Fig. 2.

Figure 2. The captured images from four camera views (top view, side view 1, side view 2, side view 3).

Figure 3. Segmentation results for the four views.

B. Camera Calibration
To reconstruct a 3D model from multiple view images, we calibrate our cameras using the method in [19] to obtain both intrinsic and extrinsic camera parameters.

The intrinsic camera parameters link the pixel coordinates of an image point with the corresponding coordinates in the camera reference frame. They include the focal length $(f_x, f_y)$, the principal point coordinates $(O_x, O_y)$, the skew coefficient $s$, and the image distortion coefficients $kc$, which cover both radial and tangential distortion.

The extrinsic camera parameters define the orientation and location of the camera reference frame with respect to the world reference frame: the rotation matrix $R$ and the translation vector $T$.

C. 3D Volume Reconstruction
After obtaining the intrinsic and extrinsic camera parameters from calibration, the next process is based on the space carving method. A large bounding box is modelled as an initial volume enclosing the 3D object. We then divide the whole volume into cubical voxels, generating a voxel space containing $n^3$ voxels. Subsequently, we project each voxel in the voxel space onto the images using the corresponding camera parameters. If the projected voxel falls outside the silhouette in at least one view, it is discarded from the volume and set to be transparent. Otherwise, it is kept in the object voxel space and set opaque as an object voxel. After all voxels in the voxel space are processed, the remainder is an approximation of the object volume; the true object volume is equal to or smaller than this rough approximation. Fig. 4 illustrates this method of 3D volume reconstruction.

Figure 4. Voxel space and 3D volume reconstruction from multiple silhouette images.


To project each voxel onto the image planes, we follow these equations. First, transform world coordinates to camera coordinates using the extrinsic camera parameters:

$X_c = R X_w + T$  (1)

where $X_c$ is the object's coordinate in the camera frame and $X_w$ is the object's coordinate in the world frame. Equations (2)-(6) are then used to transform camera coordinates to pixel coordinates using the intrinsic camera parameters.

$x_n = \begin{bmatrix} X_c/Z_c \\ Y_c/Z_c \end{bmatrix} = \begin{bmatrix} x \\ y \end{bmatrix}$  (2)

$r^2 = x^2 + y^2$  (3)

$dx = \begin{bmatrix} 2\,kc_3\,xy + kc_4\,(r^2 + 2x^2) \\ kc_3\,(r^2 + 2y^2) + 2\,kc_4\,xy \end{bmatrix}$  (4)

$\begin{bmatrix} x_d \\ y_d \end{bmatrix} = \left(1 + kc_1 r^2 + kc_2 r^4 + kc_5 r^6\right) x_n + dx$  (5)

$\begin{bmatrix} x_p \\ y_p \\ 1 \end{bmatrix} = \begin{bmatrix} f_x & s f_x & O_x \\ 0 & f_y & O_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_d \\ y_d \\ 1 \end{bmatrix}$  (6)

where $x_n$ is the normalized image projection, $dx$ is the tangential distortion, $kc$ contains both radial and tangential distortion coefficients, and $(x_p, y_p)$ is the projected pixel coordinate in the image.

Figure 5. 2D visual features: (a) top view: projected area (A), length (L), and width (W); (b) side view 2: thickness (T).
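To make Eqs. (1)-(6) and the carving step of Section III-C concrete, the following C++ sketch projects voxel centers into each view and carves any voxel whose projection falls outside a silhouette. It is a minimal illustration, not the authors' implementation: the type and function names (Camera, projectToPixel, carve) are our own assumptions, and for brevity it tests only the voxel center rather than the full containment of the projected voxel described above.

```cpp
#include <array>
#include <cmath>
#include <vector>

using Vec3 = std::array<double, 3>;

// Calibration data for one view, following Eqs. (1)-(6).
struct Camera {
    double R[3][3];              // rotation matrix (extrinsic)
    Vec3   T;                    // translation vector (extrinsic)
    double fx, fy, ox, oy, s;    // focal lengths, principal point, skew
    double kc[5];                // kc1,kc2,kc5: radial; kc3,kc4: tangential
    std::vector<std::vector<bool>> silhouette;  // true = inside mango region
};

// Project a world point to pixel coordinates; false if behind the camera.
bool projectToPixel(const Camera& c, const Vec3& Xw, int& u, int& v) {
    Vec3 Xc{};                                        // Eq. (1): Xc = R*Xw + T
    for (int i = 0; i < 3; ++i)
        Xc[i] = c.R[i][0]*Xw[0] + c.R[i][1]*Xw[1] + c.R[i][2]*Xw[2] + c.T[i];
    if (Xc[2] <= 0) return false;
    double x = Xc[0] / Xc[2], y = Xc[1] / Xc[2];      // Eq. (2): normalization
    double r2 = x*x + y*y;                            // Eq. (3)
    double dx = 2*c.kc[2]*x*y + c.kc[3]*(r2 + 2*x*x); // Eq. (4): tangential
    double dy = c.kc[2]*(r2 + 2*y*y) + 2*c.kc[3]*x*y;
    double radial = 1 + c.kc[0]*r2 + c.kc[1]*r2*r2 + c.kc[4]*r2*r2*r2;
    double xd = radial*x + dx, yd = radial*y + dy;    // Eq. (5)
    u = (int)std::lround(c.fx*xd + c.s*c.fx*yd + c.ox);  // Eq. (6)
    v = (int)std::lround(c.fy*yd + c.oy);
    return true;
}

// Space carving: keep a voxel only if its center projects inside every
// silhouette; otherwise it becomes transparent (carved away).
std::vector<bool> carve(const std::vector<Camera>& cams,
                        const Vec3& boxMin, double voxelSize, int n) {
    std::vector<bool> opaque(n * n * n, false);
    for (int i = 0; i < n; ++i)
      for (int j = 0; j < n; ++j)
        for (int k = 0; k < n; ++k) {
            Vec3 center{boxMin[0] + (i + 0.5)*voxelSize,
                        boxMin[1] + (j + 0.5)*voxelSize,
                        boxMin[2] + (k + 0.5)*voxelSize};
            bool keep = true;
            for (const Camera& c : cams) {
                int u, v;
                if (!projectToPixel(c, center, u, v) ||
                    v < 0 || v >= (int)c.silhouette.size() ||
                    u < 0 || u >= (int)c.silhouette[0].size() ||
                    !c.silhouette[v][u]) { keep = false; break; }
            }
            opaque[(i*n + j)*n + k] = keep;   // opaque = object voxel
        }
    return opaque;
}
```

With the four calibrated views of Fig. 2, carve() would return the opaque/transparent voxel grid from which the volume and surface features of the next subsection are computed.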
D. 2D/3D Visual Feature Extraction
As mentioned, this work is an extension of the previous system proposed in [7], which introduced techniques to extract and analyse 2D mango properties: projected area, length, and width. This report introduces an additional 2D property, thickness, as well as the 3D properties of the fruit: volume and surface area. The 2D/3D features that our system extracts include:
1) Projected area (A) is defined as the area of the 2D projection of the mango's top view, shown in Fig. 5a.
2) Length (L) is defined as the length of the mango's major axis, which lies between the tip and the pole of the mango.
3) Width (W) is defined as the length of the mango's minor axis in the top-view image. The minor axis is the widest line perpendicular to the major axis.
4) Thickness (T) is defined as the length of the mango's minor axis in the side-view image.
5) Volume (V) is defined as the volume of the 3D object reconstructed from the four views of silhouette images. The volume is estimated by counting the number of voxels set as object voxels in the 3D voxel space. With the known calibration grid size, we can convert the count to a volume.
6) Surface area (S) is defined as the surface area of the reconstructed 3D object. It is estimated by counting the number of voxels located on the surface of the 3D object.
Fig. 5 shows the 2D visual properties A, L, W, and T. Fig. 6 shows the reconstructed 3D mango volume from the four view images, from which we compute V and S.

Figure 6. Reconstructed 3D voxel surface.
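As a rough sketch of how items 5) and 6) can be computed from the carved grid (continuing the hypothetical helpers above; the voxel size is known from the 20 mm calibration grid), one can count object voxels for V and object voxels with an exposed face for S:

```cpp
#include <vector>

// Estimate volume and surface area from the carved voxel grid;
// opaque comes from carve() in the sketch above.
struct VoxelStats { double volume; double surfaceArea; };

VoxelStats measure(const std::vector<bool>& opaque, int n, double voxelSize) {
    auto at = [&](int i, int j, int k) {
        if (i < 0 || j < 0 || k < 0 || i >= n || j >= n || k >= n) return false;
        return (bool)opaque[(i*n + j)*n + k];
    };
    long objectVoxels = 0, surfaceVoxels = 0;
    for (int i = 0; i < n; ++i)
      for (int j = 0; j < n; ++j)
        for (int k = 0; k < n; ++k) {
            if (!at(i, j, k)) continue;
            ++objectVoxels;
            // A surface voxel has at least one transparent 6-neighbor.
            if (!at(i-1,j,k) || !at(i+1,j,k) || !at(i,j-1,k) ||
                !at(i,j+1,k) || !at(i,j,k-1) || !at(i,j,k+1))
                ++surfaceVoxels;
        }
    double v3 = voxelSize * voxelSize * voxelSize;
    return { objectVoxels * v3,                       // V: count x voxel volume
             surfaceVoxels * voxelSize * voxelSize }; // S: count x face area
}
```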
E. Mango Sorting
In sorting mangoes for export, farmers examine all the harvested mature mangoes by eye and hand, which is quite subjective. Manual sorting is usually acceptable but sometimes mistaken; moreover, the sorting process by hand is quite time consuming. Our objective in this work is to investigate the potential of using image processing and computer vision techniques as an alternative or supplement to the traditional manual method. To sort the mangoes, we employ a backpropagation neural network to classify the mango samples into three classes (SS, S, L) according to the standard sorting metric used in the export business. The network is a three-layer network. The input layer contains 6 nodes, one for each of the 6 measurements described in the previous subsection. The network contains one hidden layer. The output layer contains 3 nodes, one for each size (SS, S, L).
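The paper does not specify the hidden-layer size or training parameters (Section IV notes the MATLAB backpropagation toolbox was used), so the following is only a sketch of the classification pass under assumed settings: a feed-forward 6-H-3 network with sigmoid activations, whose weights are presumed to come from backpropagation training; the largest output selects the size class.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Feed-forward pass of a 6-H-3 network; the hidden size H is our assumption.
// Weights and biases are assumed to come from backpropagation training.
int classifyMango(const std::vector<double>& feat,            // {A, L, W, T, V, S}
                  const std::vector<std::vector<double>>& w1, // H x 6
                  const std::vector<double>& b1,              // H
                  const std::vector<std::vector<double>>& w2, // 3 x H
                  const std::vector<double>& b2) {            // 3
    auto sigmoid = [](double z) { return 1.0 / (1.0 + std::exp(-z)); };
    std::vector<double> hidden(w1.size());
    for (size_t h = 0; h < w1.size(); ++h) {       // input -> hidden layer
        double z = b1[h];
        for (size_t i = 0; i < feat.size(); ++i) z += w1[h][i] * feat[i];
        hidden[h] = sigmoid(z);
    }
    std::vector<double> out(3);
    for (int o = 0; o < 3; ++o) {                  // hidden -> output layer
        double z = b2[o];
        for (size_t h = 0; h < hidden.size(); ++h) z += w2[o][h] * hidden[h];
        out[o] = sigmoid(z);
    }
    // Index of the largest output: 0 = SS, 1 = S, 2 = L.
    return (int)(std::max_element(out.begin(), out.end()) - out.begin());
}
```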
IV. EXPERIMENTS AND RESULTS

The proposed techniques mentioned in the previous sections were implemented in C/C++ and MATLAB on a typical PC. We employed the system to evaluate visual properties of a mango cultivar called Nam Dokmai, obtained from a local mango farm in Chachoengsao, a province in central Thailand. The mangoes were harvested at the same maturity as they are harvested for export. There were two sets totaling 182 mangoes in three sizes, sorted by weight according to the standard sorting metric for mango export. The standard metric for export sorting uses the mango's weight: SS is for mangoes weighing less than 280 grams, S for weights between 280 and 330 grams, and L for weights between 331 and 550 grams.

The mango samples were divided into two groups. The first group consists of 97 mangoes. We analyze this group for two purposes: first, to compare the feature measurement accuracy between the proposed vision techniques and instruments; second, to compare the classification accuracy using the extracted visual features. The second group of mango samples consists of 82 mangoes. We analyze this group to compare classification accuracy between the proposed vision technique and experienced farmers.
A. Experiment #1: Feature Extraction and Analysis
The purpose of this experiment is to evaluate the accuracy of our proposed vision-based mango feature extraction and analysis. We did this by comparing the vision-based results with results obtained using instruments typically found in the laboratory. Various instruments were used to measure the mango's physical properties. Length, width, and thickness were measured using a vernier caliper. The mango's volume was measured by weighing the mango in distilled water. The measured weight gives the volume via Eq. (7), where $W_w$ is the weight of the fluid displaced by the object (g), $\rho$ is the density of the fluid (about 1 g/cm³), and $V$ is the volume of the fluid displaced by the object (cm³):

$W_w = \rho V$  (7)
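For example, a mango that displaces 320 g of distilled water (a purely illustrative figure) has a volume of about V = 320 / 1 = 320 cm³.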

TABLE 2. MANGO'S PHYSICAL PROPERTIES MEASURED USING THE PROPOSED VISION-BASED SYSTEM (MEAN ± SD)

Parameter    SS               S                L
Av (cm²)     76.92 ± 6.98     86.83 ± 4.46     99.50 ± 9.96
Lv (cm)      13.88 ± 1.00     14.86 ± 0.59     15.83 ± 0.99
Wv (cm)       7.53 ± 0.27      7.93 ± 0.26      8.54 ± 0.47
Tv (cm)       5.42 ± 0.30      5.71 ± 0.30      6.02 ± 0.37
Vv (cm³)    279.96 ± 22.12   327.12 ± 14.02   401.14 ± 58.85
Sv (cm²)    336.16 ± 18.73   372.69 ± 11.63   426.17 ± 40.38

The projected area was measured using a planimeter on the top-view images. Fig. 7 shows some of the equipment setup and measurement. The measurements were done on the first set of mangoes, and the results are shown in Table 1.

To evaluate our vision system, we captured images of those mangoes using four Kodak CX6330 digital cameras. We then applied the techniques described above to extract those features and analyzed them to obtain the same 6 measurements. The results are shown in Table 2. Compared to the results from the instrument measurements, our vision approach yields larger variations. However, the vision-based measurement is much more efficient in terms of speed and convenience.


B. Experiment #2: Mango Sorting
For sorting, we used the backpropagation neural network toolbox in MATLAB. The two groups of samples, totaling 182 mangoes, were used in the sorting experiments. All mangoes were classified into three size groups (SS, S, and L) based on their visual features extracted using our vision-based technique. The 6 features (A, L, W, T, V, S) were used, and the classification result shows 96.47% accuracy.

For the second sample set, we asked two experienced farmers to sort the mangoes by eye and hand, and compared their results to the ground truth, which is the classification based on weight following the standard metric. The average classification accuracy of the two experienced farmers is 87.65% (farmer 1: 83.53%, farmer 2: 91.76%).

Figure 7. Equipment setup for measurement by lab instruments: vernier caliper, planimeter, and weighing in distilled water.


TABLE 1. MANGO'S PHYSICAL PROPERTIES MEASURED USING LAB INSTRUMENTS (MEAN ± SD)

Parameter    SS               S                L
Ai (cm²)     78.65 ± 6.27     88.14 ± 3.85    100.07 ± 8.86
Li (cm)      13.34 ± 0.84     13.92 ± 0.59     14.87 ± 0.86
Wi (cm)       6.83 ± 0.18      7.14 ± 0.14      7.62 ± 0.39
Ti (cm)       6.06 ± 0.28      6.31 ± 0.12      6.74 ± 0.34
Vi (cm³)    257.09 ± 18.93   297.48 ± 16.32   365.89 ± 48.03

V. CONCLUSIONS

We have described image processing and computer vision techniques to analyze the 2D and 3D physical properties of mango. Several parameters are defined and calculated for the physical properties: projected area (A), length (L), width (W), thickness (T), 3D volume (V), and 3D surface area (S). One hundred and eighty-two Nam Dokmai cultivar mangoes in three sizes (SS, S, and L) were evaluated. The experimental results show that our proposed techniques could be a good and more feasible alternative for grading and sorting mango compared to humans' manual sorting.

REFERENCES
[1] Y. Gejima, H. Zhang, and M. Nagata, "Judgement on Level of Maturity for Tomato Quality Using L*a*b* Color Image Processing," Proc. 2003 IEEE/ASME Int'l Conf. on Advanced Intelligent Mechatronics (AIM 2003), 2003.
[2] T. Masahiko, K. Miyamoto, I. Hiroaki, and T. Nishimatu, "Near Infrared Spectral Imaging for the Field Server of Lettuce Growth," SICE, pp. 380-381, 2002.
[3] J.B. Njoroge, K. Ninomiya, N. Kondo, and H. Toita, "Automated Fruit Grading System using Image Processing," Proc. SICE Annual Conference (SICE2002), Japan, 2002.
[4] J.B. Cunha, "Application of Image Processing Techniques in the Characterization of Plant Leafs," Proc. IEEE Int'l Symposium on Industrial Electronics, 2003.
[5] K.J. Runtz, "Electronic Recognition of Plant Species for Machine Vision Sprayer Control Systems," Proc. IEEE Western Canada Conference on Computer, Power, and Communication Systems in a Rural Environment (WESCANEX'91), pp. 84-88, 1991.
[6] S. Dave and K. Runtz, "Image Processing Methods for Identifying Species of Plants," Proc. IEEE Western Canada Conference on Computer, Power, and Communication Systems in a Rural Environment (WESCANEX'95), pp. 403-408, 1995.
[7] P. Yimyam, T. Chalidabhongse, P. Sirisomboon, and S. Boonmung, "Physical Properties Analysis of Mango using Computer Vision," Proc. ICCAS2005, 2005.
[8] P. Eisert, E. Steinbach, and B. Girod, "Multi-Hypothesis, Volumetric Reconstruction of 3-D Objects from Multiple Calibrated Camera Views," Proc. ICASSP'99, 1999.
[9] K.N. Kutulakos and S.M. Seitz, "A Theory of Shape by Space Carving," IJCV, 38(3): 197-216, 2000.
[10] M. Kimura, H. Saito, and T. Kanade, "3D Voxel Construction based on Epipolar Geometry," Proc. IEEE, 2000.
[11] Y. Iwadate, M. Katayama, K. Tomiyama, and H. Imaizumi, "VRML Animation from Multi-view Images," NHK Laboratories Note No. 478, 2002.
[12] B. Lok, "Online Model Reconstruction for Interactive Virtual Environments," Proc. ACM 2001 Symposium on Interactive 3D Graphics, Chapel Hill, NC, March 18-21, 2001, pp. 69-72.
[13] M. Kampel, R. Sablatnig, and S. Tosovic, "Fusion of Surface and Volume Data," CVGIP: Image Understanding, 58(1): 23-32, 1993.
[14] T. Wada, X. Wu, S. Tokai, and T. Matsuyama, "Homography Based Parallel Volume Intersection: Toward Real-Time Volume Reconstruction Using Active Cameras," Fifth IEEE International Workshop on Computer Architectures for Machine Perception (CAMP'00), pp. 331-340, 2000.
[15] V. Fremont and R. Chellali, "Turntable-Based 3D Object Reconstruction," Proc. IEEE Conference on Cybernetics and Intelligent Systems, Singapore, pp. 1276-1281, 2004.
[16] A.Y. Mulayim and V. Atalay, "Silhouette-based 3D Model Reconstruction from Multiple Images," IEEE Transactions on Systems, Man and Cybernetics, Part B.
[17] Y. Kuzu and O. Sinram, "Volumetric Reconstruction of Cultural Heritage Artifacts," CIPA 2003, XIXth International Symposium, Antalya, Turkey, pp. 93-98, 2003.
[18] J. Carr, W. Fright, A. Gee, R. Prager, and K. Dalton, "3D Shape Reconstruction using Volume Intersection Techniques," IEEE International Conference on Computer Vision Proceedings, pp. 1095-1110, January 1998.
[19] J.Y. Bouguet, Camera Calibration Toolbox for MATLAB, unpublished.
