
2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems

A Deep Learning Approach to Sensory Navigation Device for Blind Guidance

Josh Jia-Ching Ying, Chen-Yu Li, Guan-Wei Wu, Jian-Xing Li, Wei-Jheng Chen, and Don-Lin Yang
Department of Computer Science and Information Engineering, Feng Chia University, Taiwan, ROC
e-mail: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]

Abstract—Sensory navigation devices are an important trend in the fields of machine learning and data science. Nowadays, more and more sensory navigation devices are built for blind people. The core of such a sensory navigation device is usually implemented with an image recognition method. To build an image recognition model, many tools and online machine learning platforms have been proposed. However, these tools and platforms cannot completely satisfy the requirements of a sensory navigation device. To build a sensory navigation device that satisfies the requirements of blind people, the ability to reduce the cost of model training and the capability of user-centric image recognition are the two main issues. Therefore, to address these issues, we propose a novel approach, namely DLSNF (Deep-Learning-based Sensory Navigation Framework). Our proposed DLSNF is built on the YOLO architecture to reduce the cost of model training and on the NVIDIA Jetson TX2 to take user-centric image recognition into account. Based on our proposed DLSNF, a real-time image recognition model can be trained well and used to conduct sensory navigation that helps blind people. At the same time, the trained model is embedded in the NVIDIA Jetson TX2, a fast and power-efficient embedded AI computing device. For the experiments, we evaluated our proposed DLSNF with a real-world dataset consisting of 4,570 images collected by part-time workers. The extensive experimental results show that our proposed DLSNF is more effective and efficient than the existing baselines.

Keywords—Sensory Navigation Device, Deep Learning, Residual Convolution Neural Network, Blind Guidance.

I. INTRODUCTION

In blind people's daily life, guidance tools can be realized in many ways, such as guide dogs, guidance tiles, and tactile sticks. However, road conditions change constantly, so such guidance tools may lead to irreparable mistakes. Take Figure 1 as an example. The two pictures show the street view of the same place on different dates. If the left picture shows the usual road conditions, a blind person who relies on conventional guidance tools might consider the road to be safe. When the blind person walks along the street and finds that the real road conditions have become those shown in the right picture, he might be hurt while trying to pass around the obstacles. This phenomenon is called time-variate dynamics. As a result, a huge number of blind people suffer from the time-variate dynamics problem. To protect blind people against potential harm caused by time-variate dynamics, many real-time road condition recognition mechanisms have been proposed, whereas such mechanisms for blind guidance are still in their infancy. Currently, all of the real-time road condition recognition mechanisms for blind guidance more or less rely on an offline modeling manner: the system collects street views via the users' wearable sensors, and the recognition model is trained on a server. However, blind people still use the out-of-date model to detect obstacles until the model is updated. In other words, such an offline modeling manner still suffers from the time-variate dynamics problem.

The Sensory Navigation Device [7] is an important trend in the field of guidance tools for blind people. The core of a sensory navigation device is a computer program that can lead blind and visually impaired people around obstacles through voice, using artificial intelligence, data mining, or customized rules. Nowadays, there are three main groups of Sensory Navigation Devices according to their working principle: radar, global positioning, and stereovision. The most widely known are the Sensory Navigation Devices based on the radar principle [18][19][20]. These devices emit laser or ultrasonic beams. When a beam strikes an object surface, it is reflected, and the distance between the user and the object can be calculated from the time difference between the emitted and received beams. A second type of Sensory Navigation Device is based on the Global Positioning System (GPS) [14][15][16]. These devices aim to guide the blind user through a previously selected route; they also provide location information such as the street number, street crossings, etc. Unfortunately, although the Sensory Navigation Devices based on the radar principle or the Global Positioning System have been widely applied as guidance tools for blind people, these two types of devices are not able to deal with the time-variate dynamics problem.

Figure 1. Street views of the same location on different dates, as an example of time-variate road conditions.

With the development of the webcam, many researchers [5][6][7] have proposed applying stereovision to develop new techniques for representing the surrounding environment. As a result, Sensory Navigation Devices based on the stereovision principle, which intend to represent the surrounding environment through acoustic signals, have been proposed. Unlike the other two types of Sensory Navigation Devices, the devices based on the stereovision principle can be updated in real time. Accordingly, the time-variate dynamics problem could be partially solved. Although most sensory navigation devices based on the stereovision principle have been significantly improved with the rapid growth of artificial intelligence technology, the core of conventional sensory navigation devices is still constructed in an offline modeling manner. Therefore, it is not realistic to directly adopt a conventional sensory navigation device as the guidance tool. Based on our observation, building a sensory navigation device that satisfies the requirements of blind people involves two main issues: 1) the ability to reduce the cost of model training, and 2) the capability of user-centric image recognition.

As mentioned earlier, the reason why the offline training manner cannot deal with the time-variate dynamics problem is that the blind would be harmed whenever the model is out-of-date. The ability to reduce the cost of model training plays a crucial role in speeding up the update rate. The capability of user-centric image recognition can improve the precision of the sensory navigation device such that the blind can avoid emergency accidents. To build a sensory navigation device satisfying the above-mentioned requirements, we propose a novel approach, DLSNF (Deep-Learning-based Sensory Navigation Framework). To address the above-mentioned issues, DLSNF is built on YOLO [21][22] to deal with the cost of model training and on the NVIDIA Jetson TX2¹ to take user-centric image recognition into account. Based on our proposed DLSNF, the user-centric image recognition model can be trained well and deployed on a sensory navigation device for blind guidance. At the same time, combined with the real-world images collected through our device, it can holistically enhance the users' experience.

The contributions of our research are four-fold:

• We propose the Deep-Learning-based Sensory Navigation Framework (DLSNF), a novel approach for a blind guidance tool. The problems and ideas in DLSNF have not been explored previously in the research community.

• We develop a Residual-CNN based on the YOLO architecture to deal with the cost of model training in DLSNF.

• We deploy the trained model on the NVIDIA Jetson TX2 to take the user-centric image recognition issue into account.

• We evaluated our sensory navigation device on a real-world dataset collected by staff members. The dataset consists of 4,570 images. The extensive experimental results show that our sensory navigation device is more effective and efficient than the existing baselines.

The remainder of this paper is organized as follows: Section 2 presents a review of related research. Section 3 details our proposed Deep-Learning-based Sensory Navigation Framework (DLSNF). Evaluations of the proposed system are presented in Section 4, and we present a case study in Section 5. Finally, the work is concluded in Section 6.

II. RELATED WORKS

In this section, we briefly introduce the most popular methods that can be utilized for constructing a blind guidance tool. According to the core idea behind the methods, we categorize them into two categories: Conventional Image Recognition and Deep-Learning-based Image Recognition.

A. Conventional Image Recognition

OpenCV. OpenCV [3] was originally introduced by Intel more than a decade ago for image and video analysis, and it provides rich computer vision and image processing algorithms and functions. In [26], Xie et al. introduce a method of image edge detection based on OpenCV; their detection model determines the exact number of copper cores in a tiny wire. In [8], Emami et al. utilize OpenCV and the .NET framework to build an application that allows user access to a particular machine based on an in-depth analysis of a person's facial features.

SIFT. Scale Invariant Feature Transform (SIFT) is a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene [17]. Lowe also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through a least-squares solution for consistent pose parameters.

SURF. In [1], Bay et al. present a novel scale- and rotation-invariant interest point detector and descriptor, coined SURF (Speeded Up Robust Features). It approximates or even outperforms previously proposed schemes with respect to repeatability, distinctiveness, and robustness, yet can be computed and compared much faster.

BRIEF. In [2], Calonder et al. propose to use binary strings as an efficient feature point descriptor, which they call BRIEF. They show that it is highly discriminative even when using relatively few bits and can be computed using simple intensity difference tests. Furthermore, the descriptor similarity can be evaluated using the Hamming distance, which is very efficient to compute, instead of the L2 norm as is usually done.

HOG. In [4], Dalal et al. study the question of feature sets for robust visual object recognition, adopting linear-SVM-based human detection as a test case. The results show experimentally that grids of Histograms of Oriented Gradient (HOG) descriptors significantly outperform existing feature sets for human detection.
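To make the conventional feature-based pipeline above concrete, the following is a minimal, illustrative Python sketch (not taken from this paper) that detects ORB keypoints, whose binary descriptors are BRIEF-like, and matches them between two views using the Hamming distance, as discussed above. The image file names and parameter values are placeholders.

# Illustrative sketch (not the paper's implementation): matching ORB keypoints
# between two views with OpenCV, using the Hamming distance on binary
# descriptors as discussed for BRIEF above. File names are placeholders.
import cv2

img1 = cv2.imread("view_left.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view_right.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=500)           # detector + binary descriptor
kp1, des1 = orb.detectAndCompute(img1, None)  # keypoints and descriptors
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force matcher with Hamming distance (suitable for binary descriptors);
# crossCheck keeps only mutually best matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

print(f"{len(matches)} matches; best distance = {matches[0].distance}")
vis = cv2.drawMatches(img1, kp1, img2, kp2, matches[:30], None)
cv2.imwrite("matches.jpg", vis)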

¹ https://developer.nvidia.com/embedded/buy/jetson-tx2

B. Deep-Learning-based Image Recognition

CNN. In [13], Krizhevsky et al. trained a large, deep convolutional neural network (CNN) to classify the 1.2 million high-resolution images of the ImageNet LSVRC-2010 contest into 1,000 different classes. On the test data, they achieved top-1 and top-5 error rates of 37.5% and 17.0%, considerably better than the previous state of the art. In [11], Goodfellow et al. propose a unified approach that uses a deep convolutional neural network to recognize arbitrary multi-character text in unconstrained natural photographs. They evaluate this approach on the publicly available SVHN dataset and achieve over 96% accuracy in recognizing complete street numbers.

R-CNN. In [10], Girshick et al. propose an approach that combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects, and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. In [9], Girshick proposes Fast R-CNN, which speeds up the training process of deep convolutional networks. Compared to R-CNN, Fast R-CNN employs several innovations to improve training and testing speed while also increasing detection accuracy.

VGG nets. In [24], Simonyan et al. investigate the effect of convolutional network depth on accuracy in the large-scale image recognition setting. Their main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3 × 3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16–19 weight layers.

GoogLeNet. In [25], Szegedy et al. propose a deep convolutional neural network architecture codenamed Inception that achieves a new state of the art for classification and detection. The main hallmark of this architecture is the improved utilization of the computing resources inside the network.

Residual-Net. In [12], He et al. present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. They explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions.

YOLO. In [21], Redmon et al. developed a fast single-shot detection method named You Only Look Once (YOLO). YOLO predicts multiclass bounding box candidates directly from the grid cells of the full input image. The combination of the class probabilities and the bounding box confidence provides the resulting detection. The input image is divided into 7 × 7 grid cells; each cell predicts classification probabilities for each class and candidate bounding boxes with confidence scores. Each bounding box contains five position indicators: the box coordinates (x, y, w, h) and the position confidence. In [22], Redmon et al. proposed YOLOv2, a faster and more accurate detector that pools a variety of ideas from past work together with their own novel concepts to improve YOLO's performance. In [23], Redmon et al. continued to improve YOLO's performance, and YOLOv3 was proposed in 2018.
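As an illustration of the grid-based prediction described above, the following sketch decodes a YOLOv1-style output tensor (an S × S grid with B boxes of (x, y, w, h, confidence) per cell plus C class probabilities) into candidate detections. It is a simplified reading of [21] for exposition only, not code from this paper; the random tensor and the threshold are stand-ins.

# Simplified, illustrative decoding of a YOLOv1-style output grid into
# detections; the random tensor stands in for a real network output.
import numpy as np

S, B, C = 7, 2, 20                       # grid size, boxes per cell, classes
pred = np.random.rand(S, S, B * 5 + C)   # stand-in for a network output

def decode(pred, conf_threshold=0.2):
    detections = []
    for i in range(S):
        for j in range(S):
            cell = pred[i, j]
            class_probs = cell[B * 5:]
            for b in range(B):
                x, y, w, h, conf = cell[b * 5: b * 5 + 5]
                # class-specific score = box confidence * class probability
                scores = conf * class_probs
                c = int(np.argmax(scores))
                if scores[c] >= conf_threshold:
                    # (x, y) are offsets within cell (i, j); convert to
                    # image-relative center coordinates
                    cx, cy = (j + x) / S, (i + y) / S
                    detections.append((cx, cy, w, h, c, float(scores[c])))
    return detections

print(len(decode(pred)), "candidate detections before non-maximum suppression")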
III. OUR PROPOSED METHOD

To build the sensory navigation device for blind guidance, we first collect images through a webcam and annotate the objects shown in the images. Then we utilize the annotated images to train an image recognition model based on the YOLOv3 architecture. Finally, we deploy the trained model on the NVIDIA Jetson TX2.

A. Data Preprocessing

As mentioned earlier, the ideas behind our framework are as follows: 1) the YOLO architecture is used to reduce the cost of model training, and 2) the NVIDIA Jetson TX2 is used to take the capability of user-centric image recognition into account. Therefore, transforming the annotated images into a format compatible with the training data for model building plays a crucial role in building our sensory navigation device.

To do so, we first hire several staff members to collect images through a webcam. All objects shown in the collected images are annotated by the staff. Here, we adopt an open-source image labeling tool, LabelImg [27], to annotate the objects that are critical for blind guidance. Figure 2 shows an example of LabelImg. The image shows several chairs and one table. The staff annotate these objects, and LabelImg outputs a text file that records the boundaries of these objects and their labels.

Figure 2. An Illustration of Object Annotation of an Image.
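For illustration, the following hedged sketch converts one LabelImg annotation saved in Pascal VOC XML into the normalized "class x_center y_center width height" text lines that Darknet/YOLO training expects. The class list and file names are assumptions; the paper does not spell out its exact conversion script.

# Hedged sketch: converting one LabelImg Pascal VOC XML annotation into the
# normalized "class x_center y_center width height" lines used by
# Darknet/YOLO training. Class list and file names are assumptions.
import xml.etree.ElementTree as ET

CLASSES = ["chair", "table"]  # assumed label set for this work

def voc_to_yolo(xml_path, out_path):
    root = ET.parse(xml_path).getroot()
    img_w = float(root.find("size/width").text)
    img_h = float(root.find("size/height").text)
    lines = []
    for obj in root.findall("object"):
        cls = CLASSES.index(obj.find("name").text)
        box = obj.find("bndbox")
        xmin, ymin = float(box.find("xmin").text), float(box.find("ymin").text)
        xmax, ymax = float(box.find("xmax").text), float(box.find("ymax").text)
        # YOLO expects the box center and size, normalized by image size.
        xc = (xmin + xmax) / 2.0 / img_w
        yc = (ymin + ymax) / 2.0 / img_h
        w = (xmax - xmin) / img_w
        h = (ymax - ymin) / img_h
        lines.append(f"{cls} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
    with open(out_path, "w") as f:
        f.write("\n".join(lines))

# Example call with placeholder file names.
voc_to_yolo("classroom_001.xml", "classroom_001.txt")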
B. Sensory Navigation Device Building

As mentioned earlier, we have already collected and annotated the images. In this subsection, we detail how we build an object detection model which can help the blind pass around obstacles. To produce the object detection model, we utilize the YOLOv3 architecture, which is one of the popular types of Residual-CNNs. Figure 3 shows the structure of the YOLOv3 network: it has 53 convolutional layers, using successive 3 × 3 and 1 × 1 convolutional layers, but now also has shortcut connections and is significantly larger.

Figure 3. An Illustration of the YOLOv3 Architecture.

Accordingly, we can see that such a model is too large to be trained within a reasonable time. Fortunately, the Residual Neural Network inherently has shortcut connections, i.e., the residual layers in Figure 3.

Such shortcut connections can make the model much simpler if the characteristics of the training data are simple enough. For example, if our detection job is simplified to detecting "chair" and "desk", only a few thousand images are required to train the model, and the trained model may only need the lower 16 convolutional layers. Thus, the residual layers would be activated such that the repeated blocks would not need to be performed. The remaining problem is how to use this characteristic to speed up the training process. The ideal way is to divide the training dataset into small mini-batches and to update the parameters based on the accumulated loss produced from each mini-batch. An example of the mini-batch manner is shown in Figure 4. As a result, we utilize the mini-batch manner to build the object detection model within a reasonable learning time.

Figure 4. Illustration of Model Training Based on the Mini-Batch Manner.
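The following minimal sketch illustrates the mini-batch manner described above: the training set is split into small batches, the loss is accumulated over each batch, and the parameters are updated once per batch from the accumulated loss. The compute_loss and apply_update callables are placeholders for the real YOLOv3 forward pass (Equation (1)) and weight update; this is not the actual Darknet training code used in the paper.

# Minimal sketch of the mini-batch training manner: accumulate the loss over a
# batch and perform one parameter update per batch. The callables are
# placeholders for the real forward pass and weight update.
import random

def train_in_minibatches(samples, compute_loss, apply_update,
                         batch_size=64, epochs=2):
    for epoch in range(epochs):
        random.shuffle(samples)
        for start in range(0, len(samples), batch_size):
            batch = samples[start:start + batch_size]
            # accumulate the loss over the whole mini-batch ...
            batch_loss = sum(compute_loss(sample) for sample in batch)
            # ... and perform a single parameter update per mini-batch
            apply_update(batch_loss / len(batch))

# Toy usage with stand-in callables, just to show the control flow.
train_in_minibatches(list(range(1000)),
                     compute_loss=lambda s: 0.01 * s,
                     apply_update=lambda avg_loss: None)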

IV. EXPERIMENTS

In this section, we present the results from a series of experiments and evaluate the performance of our model. All the experiments are implemented in Python 3.5 on an NVIDIA Tesla K40 GPU machine with 64 GB of memory running Ubuntu Linux 16.04 LTS. We first describe the preparation of the datasets, then present and discuss our experimental results.

A. Dataset Description & Performance Metrics

We collected table and chair images, as shown in Figure 5. The dataset used in the experiments consists of chair and table images captured from various indoor scenes such as classrooms, a library, and a conference room, in which there are 2,084 chair images, 301 table images, and 2,185 images containing both chairs and tables, for a total of 4,570 images. All images were resized to 448 × 448. To build the training set, we manually drew the bounding boxes and assigned the labels for the 4,570 images.

Figure 5. Three types of image samples in our database: (a) is a chair sample, (b) is a table sample, and (c) is an image containing both a table and a chair.

YOLO's loss function must simultaneously solve the object detection and object classification tasks. This function penalizes incorrect object detections while also considering what the best possible classification would be. We implement the loss function shown in Equation (1), reconstructed here in the standard YOLO formulation of [21], where $\mathbb{1}_{i}^{\text{obj}}$ denotes whether an object appears in cell $i$ and $\mathbb{1}_{ij}^{\text{obj}}$ denotes that the $j$-th bounding box predictor in cell $i$ is "responsible" for that prediction:

\begin{aligned}
\mathcal{L} ={} & \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
& + \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
& + \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} (C_i - \hat{C}_i)^2
+ \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} (C_i - \hat{C}_i)^2 \\
& + \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left(p_i(c) - \hat{p}_i(c)\right)^2
\end{aligned} \tag{1}

Note that the loss function only penalizes classification error if an object is present in that grid cell. It also only penalizes bounding box coordinate error if that predictor is "responsible" for the ground truth box.

B. Experimental Results

We trained our model on a set of 4,000 images; the testing images were collected from different scenes but tested under the same computational environment as training. The proposed method displays good results in localizing the chairs and tables. Figure 6 shows several visualized detection examples and results.

Figure 6. Most of the chairs and tables can be recognized and detected.
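As an illustration of how the trained weights can be run to produce detections such as those in Figure 6, the following hedged sketch uses OpenCV's dnn module as one possible inference path; the paper deploys the model on the Jetson TX2 but does not spell out its inference code. The .cfg/.weights file names, input size, and thresholds are assumptions.

# Hedged sketch of running trained YOLOv3 weights with OpenCV's dnn module.
# File names, input size, and thresholds are assumptions, not the paper's.
import cv2
import numpy as np

net = cv2.dnn.readNetFromDarknet("yolov3-chairs.cfg", "yolov3-chairs.weights")
classes = ["chair", "table"]

frame = cv2.imread("classroom.jpg")
h, w = frame.shape[:2]
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())

boxes, confidences, class_ids = [], [], []
for output in outputs:
    for det in output:                     # det = [cx, cy, bw, bh, obj, cls...]
        scores = det[5:]
        class_id = int(np.argmax(scores))
        conf = float(scores[class_id])
        if conf > 0.5:
            cx, cy, bw, bh = det[0] * w, det[1] * h, det[2] * w, det[3] * h
            boxes.append([int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh)])
            confidences.append(conf)
            class_ids.append(class_id)

# Non-maximum suppression removes overlapping duplicate boxes.
keep = cv2.dnn.NMSBoxes(boxes, confidences, 0.5, 0.4)
for i in np.array(keep).flatten():
    print(classes[class_ids[i]], confidences[i], boxes[i])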

Figure 7. Average Loss under Various Hyperparameter Settings, panels (a)-(f).

We also compared different numbers of training iterations in terms of loss score. We perform both normal training and fine-tuning with a batch size of 64 to see the effect of the image preprocessing.

Figure 7(a)-(c) shows the comparison of different numbers of iterations with 4,000 images including tables and chairs in terms of loss score. It can be seen from Figure 7(a) that training with 1,000 iterations is unstable and the loss remains high at the end. However, according to Figure 7(c), the loss decreases stably over 50,000 iterations. Therefore, we can observe that training with more iterations achieves a better result.

Figure 7(d)-(f) shows the results for the training data of 400 chairs in terms of loss score. It can be seen that the loss score does not drop and the fluctuation is very unstable. The reason might be that there is too little training data; in other words, the training data are not rich enough. Therefore, we can say that a small dataset results in poor performance.

V. CASE STUDY

In this section, we use our model for an object detection and distance experiment. To prove the adaptability of the model, we experimented with chair detection and distance measurement indoors. Figure 8 shows how we simulate the situation of a blind person using the camera indoors. We stand at a distance of one meter in front of the chair and use the camera to detect the chair and its distance.

Figure 8. Test with a chair about one meter away.

Figure 9. Detection of the chair by our model. The left screen output is the left camera shot, and the right screen output is the right camera shot.

The proposed method displays good results in localizing the chairs. Figure 9 shows detection examples and results. It can be seen that both of the screens have good performance, and

the proposed YOLO offers a speedup. The model sometimes generates bounding boxes of different sizes because of the different viewing angles of the left and right camera shots, but a slight discrepancy in the predicted bounding boxes is acceptable to some extent.

Figure 10. Output of the chair's distance.

Figure 10 shows the output of the chair's distance on the command line. The distance is determined from the different viewing angles of the two camera screens. The distance derived from the bounding boxes is output continuously on the command line. There may be some slight distance errors, but a small gap in the distance will not affect users. We can see that the output of the chair's distance is about 1 meter, which matches the actual distance.
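The distance estimate described above can be understood as stereo triangulation: with a calibrated camera pair, the depth of the detected chair follows from the horizontal offset (disparity) of its bounding-box center between the left and right images, Z = f · B / d, where f is the focal length in pixels, B the baseline, and d the disparity. The sketch below illustrates this relation; the camera parameters and pixel coordinates are made-up values, not measurements from the paper.

# Hedged sketch of depth from stereo disparity for a detected bounding box.
# All numeric values below are illustrative assumptions.
def stereo_distance(x_left_px, x_right_px, focal_length_px, baseline_m):
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("disparity must be positive for a valid depth")
    return focal_length_px * baseline_m / disparity

# e.g. bounding-box centers at x=712 (left image) and x=648 (right image)
print(f"{stereo_distance(712, 648, focal_length_px=640, baseline_m=0.10):.2f} m")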
VI. CONCLUSIONS

In this paper, we propose a novel Deep-Learning-based Sensory Navigation Framework (DLSNF) to build a Sensory Navigation Device for blind guidance. We also tackled the problem of object detection, which is a crucial prerequisite for a blind guidance tool. The core task of model learning is conveniently transformed into the problem of learning an object detection model. We develop a Residual-CNN architecture to detect the objects shown in a snapshot captured by a webcam. Through a series of experiments using a dataset collected by staff members, we have validated the Residual-CNN for building an object detection model and shown that it has excellent performance under various conditions. In future work, we plan to design more sophisticated methods and compare them with state-of-the-art methods.

ACKNOWLEDGMENT

This research was partially supported by the Ministry of Science and Technology, Taiwan, R.O.C. under grant nos. MOST 106-2218-E-126-001 and MOST 106-2221-E-035-094.

REFERENCES
[1] H. Bay, T. Tuytelaars, and L. V. Gool, "SURF: Speeded Up Robust Features," Computer Vision and Image Understanding (CVIU), 110(3), pp. 346-359, 2008.
[2] M. Calonder, V. Lepetit, C. Strecha, and P. Fua, "BRIEF: Binary Robust Independent Elementary Features," in Proceedings of the European Conference on Computer Vision (ECCV), 2010.
[3] I. Culjak, "A brief introduction to OpenCV," in Proceedings of the 35th International MIPRO Convention, IEEE, pp. 2142-2147, 2012.
[4] N. Dalal and B. Triggs, "Histograms of Oriented Gradients for Human Detection," in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 2005.
[5] L. Dunai, G. P. Fajarnes, V. S. Praderas, and B. D. Garcia, "Electronic Travel Aid systems for visually impaired people," in Proceedings of the DRT4ALL 2011 Conference, IV Congreso Internacional de Diseño, Redes de Investigación y Tecnología para Todos, Madrid, Spain, 2011.
[6] L. Dunai, G. P. Fajarnes, V. S. Praderas, B. D. Garcia, and I. Lengua, "Real-Time assistance prototype – a new navigation aid for blind people," in Proceedings of the IEEE Industrial Electronics Society Conference (IECON 2010), Phoenix, Arizona, pp. 1173-1178, 2010.
[7] L. Dunai, G. P. Fajarnes, V. S. Praderas, B. D. Garcia, and I. Lengua, "EYE2021 – Acoustical cognitive system for navigation," AEGIS 2nd International Conference, Brussels, 2011.
[8] S. Emami and V. P. Suciu, "Facial Recognition using OpenCV," Journal of Mobile, Embedded and Distributed Systems, 4(1), pp. 38-43, 2012.
[9] R. Girshick, "Fast R-CNN," in IEEE International Conference on Computer Vision (ICCV), 2015.
[10] R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
[11] I. J. Goodfellow, Y. Bulatov, J. Ibarz, S. Arnoud, and V. Shet, "Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks," arXiv:1312.6082, 2014.
[12] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," arXiv:1512.03385, 2015.
[13] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet Classification with Deep Convolutional Neural Networks," in Advances in Neural Information Processing Systems, pp. 1097-1105, 2012.
[14] R. Kuc, "Binaural Sonar Electronic Travel Aid Provides Vibrotactile Cues for Landmark, Reflector Motion and Surface Texture Classification," IEEE Transactions on Biomedical Engineering, 49, pp. 1173-1180, 2002.
[15] J. M. Loomis, R. G. Golledge, and R. L. Klatzky, "GPS-Based Navigation Systems for the Visually Impaired," in Fundamentals of Wearable Computers and Augmented Reality, W. Barfield and T. Caudell, Eds., pp. 429-446, Mahwah, NJ: Lawrence Erlbaum Associates, 2001.
[16] J. Loomis and R. Golledge, "Personal Guidance System using GPS, GIS, and VR technologies," in Proceedings of the CSUN Conference on Virtual Reality and Persons with Disabilities, San Francisco, 2003.
[17] D. G. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," International Journal of Computer Vision (IJCV), 60(2), pp. 91-110, 2004.
[18] R. W. Mann, "Mobility aids for the blind – An argument for a computer-based, man-device environment, interactive, simulation system," in Proceedings of the Conference on Evaluation of Mobility Aids for the Blind, Washington, DC: Com. on Interplay of Engineering With Biology and Medicine, National Academy of Engineering, pp. 101-116, 1970.
[19] D. L. Morrissette, G. L. Goodrich, and J. J. Henesey, "A follow-up study of the Mowat sensor's applications, frequency of use and maintenance reliability," Journal of Visual Impairment and Blindness, 75, pp. 244-247, 1981.
[20] L. Russell, "Travel Path Sounder," in Proceedings of the Rotterdam Mobility Research Conference, New York: American Foundation for the Blind, 1965.
[21] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," arXiv:1506.02640, 2015.
[22] J. Redmon and A. Farhadi, "YOLO9000: Better, Faster, Stronger," in Computer Vision and Pattern Recognition (CVPR), 2017.
[23] J. Redmon and A. Farhadi, "YOLOv3: An Incremental Improvement," arXiv:1804.02767, 2018.
[24] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in ICLR, 2015.
[25] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in CVPR, 2015.
[26] G. Xie and W. Lu, "Image Edge Detection Based on OpenCV," International Journal of Electronics and Electrical Engineering, 1(2), pp. 104-106, 2013.
[27] LabelImg, https://github.com/tzutalin/labelImg

