A Representation Invariant To Any Translations, Rotations and Illuminations
Abstract
This paper presents a representation invariant to arbitrary translations, rotations and illuminations. The input is a gray-level image and the output is a set of pairs of numbers, obtained by convolving the image with two different Mexican Hat wavelets. The representation can be implemented as a neural network with two layers and can be used to recognize 2D images.
Keywords: Invariance, Translation, Rotation, Illumination
1. Introduction
This paper is an attempt to understand how the brain extracts invariant features from retinal images. The brain is able to recognize an image regardless of its position and illumination; it makes a proper abstraction of the information in the input image. Artificial neural networks are often used as models of the brain. One of the major problems of connectionist networks arises when two or more images are presented simultaneously: the representation of one image destroys the representation of the other. To prevent this, the representation must have a property named quasi-linearity, which we define later in this paper.
Muresan (1997) presents a geometric transformation invariant to arbitrary translations, rotations and scale changes. The input image is a geometrical configuration of any type of curves, and for each point of the input image the transformation finds one or more pairs of angles. Reddy and Chatterji (1996) present an extension of the phase correlation technique for automatic image registration, which is characterized by its insensitivity to translation, rotation and scale.
For each pixel (x, y) of the input image, Ω produces a corresponding point (conv1, conv2) in the representation space, where convk is defined as the convolution between the image I(x, y) and a function Mk(x, y):
conv_k(u, v) = \iint I(u - x,\, v - y)\, M_k(x, y)\, dx\, dy, \qquad k = 1, 2 \qquad (1)
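For reference, the standard two-dimensional Mexican Hat (the normalized negative Laplacian of a Gaussian) has the form below; the scale σ and the exact normalization used in the experiments are not specified here, so this is only the usual convention:

\psi_\sigma(x, y) = \frac{1}{\pi \sigma^4}\left(1 - \frac{x^2 + y^2}{2\sigma^2}\right) e^{-(x^2 + y^2)/(2\sigma^2)}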
The Mexican Hat decays very quickly to zero; because of this we restrict Mk(x, y) to the range x, y ∈ [−7, 7]. The integral of a Mexican Hat is zero.
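This zero integral is presumably what underlies the illumination invariance claimed below: adding a constant c to the image leaves each convk unchanged, since

\iint \left[ I(u - x, v - y) + c \right] M_k(x, y)\, dx\, dy \;=\; conv_k(u, v) + c \iint M_k(x, y)\, dx\, dy \;=\; conv_k(u, v).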
In order to perform the convolution on a discrete image we must discretize the Mexican Hat as well. M1 and M2 differ in the number of discretization points used; in this way conv1 and conv2 represent two different local features of the input image.
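As a concrete illustration, the sketch below samples a Mexican Hat on [−7, 7] and evaluates eq. (1) for every pixel of a discrete image. The kernel scale (σ = 1), the mapping of kernel samples to integer pixel offsets and the handling of image borders are assumptions of this sketch, since they are not specified above.

```cpp
#include <cmath>
#include <vector>

// A discretized kernel: n x n samples of a Mexican Hat taken on x, y in [-7, 7].
struct Kernel {
    int n;
    std::vector<double> w;  // n * n weights, row major
};

// Sample the standard 2D Mexican Hat (sigma = 1 is assumed) with n points per axis.
Kernel makeMexicanHat(int n) {
    const double PI = 3.14159265358979323846;
    Kernel k{n, std::vector<double>(n * n)};
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double x = -7.0 + 14.0 * i / (n - 1);
            double y = -7.0 + 14.0 * j / (n - 1);
            double r2 = x * x + y * y;
            k.w[i * n + j] = (1.0 - r2 / 2.0) * std::exp(-r2 / 2.0) / PI;
        }
    return k;
}

// Discrete form of eq. (1): conv_k(u, v) = sum_{x,y} I(u - x, v - y) M_k(x, y).
// Kernel samples are applied at integer pixel offsets and pixels outside the
// image are treated as 0; both choices are assumptions of this sketch.
std::vector<double> convolve(const std::vector<double>& img, int W, int H, const Kernel& k) {
    std::vector<double> out(W * H, 0.0);
    int half = k.n / 2;
    for (int v = 0; v < H; ++v)
        for (int u = 0; u < W; ++u) {
            double s = 0.0;
            for (int i = 0; i < k.n; ++i)
                for (int j = 0; j < k.n; ++j) {
                    int x = u - (i - half), y = v - (j - half);  // image index for offset (i - half, j - half)
                    if (x >= 0 && x < W && y >= 0 && y < H)
                        s += img[y * W + x] * k.w[i * k.n + j];
                }
            out[v * W + u] = s;
        }
    return out;
}
```

With this sketch, conv1 and conv2 would be obtained by calling convolve with two kernels built from different sample counts, for example makeMexicanHat(15) and makeMexicanHat(29); the particular counts are illustrative only.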
We can translate the input image, obtaining an image T(I), or rotate it, obtaining an image R(I). We can also change the constant illumination I of the image, obtaining another image I(I). How does Ω behave under these changes? Ω is invariant to translations, rotations and illumination changes, so we have the following three properties:
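In the notation above, the three properties amount to the following (stated here as a reconstruction from the surrounding text; only the invariance claims themselves come from the paper):

\Omega(T(I)) = \Omega(I), \qquad \Omega(R(I)) = \Omega(I), \qquad \Omega(I(I)) = \Omega(I).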
2.2. An Artificial Neural Network Based on Representation Ω
The representation Ω can be implemented as a neural network with two layers. The input of Ω can be seen as the retina, on which one or more images can appear. A cell in the retina corresponds to a pixel in the image. For each cell (x, y) in the retina there are two firing neurons in the first layer, N1(x, y) and N2(x, y), corresponding to the values of the local features conv1(x, y) and conv2(x, y) respectively. In this way we follow the place coding paradigm. In the second layer we put neurons that fire only when two neurons from the first layer fire simultaneously: there is a neuron M(conv1, conv2) which fires only if N1(x, y) and N2(x, y) fire simultaneously. M(conv1, conv2) can take input from any position (x, y) of the first layer whose neurons correspond to the values (conv1, conv2). In this way, for a cell in the retina there are two neurons firing in the first layer and one neuron firing in the second layer.
For an image presented on the retina there will be a specific firing pattern in the first layer and another specific firing pattern in the second layer. The position of the firing pattern in the second layer will not change if we translate, rotate and/or change the illumination of the input image.
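The second-layer firing pattern is, in effect, the set of (conv1, conv2) pairs occurring anywhere on the retina. A minimal sketch of this idea follows; the function name, the binning of the continuous conv values into discrete neuron indices, and the bin widths delta1 and delta2 are illustrative assumptions rather than details from the paper.

```cpp
#include <cmath>
#include <set>
#include <utility>
#include <vector>

// Second-layer firing pattern: the set of (conv1, conv2) cells that receive two
// simultaneously firing first-layer neurons. The bin widths delta1 and delta2
// (how conv values are quantized into neuron indices) are assumptions.
std::set<std::pair<long, long>> secondLayerPattern(const std::vector<double>& conv1,
                                                   const std::vector<double>& conv2,
                                                   double delta1, double delta2) {
    std::set<std::pair<long, long>> pattern;
    for (std::size_t p = 0; p < conv1.size(); ++p) {
        long b1 = std::lround(conv1[p] / delta1);   // index of the firing N1 value
        long b2 = std::lround(conv2[p] / delta2);   // index of the firing N2 value
        pattern.insert({b1, b2});                   // neuron M(conv1, conv2) fires
    }
    return pattern;
}
```

Because the pattern is indexed by feature values rather than by retinal position, translating the image, rotating it or shifting its illumination leaves it unchanged, up to discretization effects.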
Two points (xA, yA) of image IA and (xB, yB) of image IB are compared through their local features by the condition

|conv_1^A - conv_1^B| < \Delta_1 \quad \text{and} \quad |conv_2^A - conv_2^B| < \Delta_2 \qquad (8)
where Δ1 and Δ2 are small values. If the condition from eq. (8) holds for two points (xA, yA) and (xB, yB) of the images IA and IB respectively, we say that there is a match between those points. If we compare all possible pairs of points (xA, yA) and (xB, yB) from the two images, we obtain a number of matches Nmatches(IA, IB). In this way an invariant visual pattern recognition system can be designed. Given an input image Iinput and a set of images I1, I2, ..., In in memory, we calculate Nmatches(Iinput, Ik) for k = 1...n and check whether it is bigger than a threshold value. If it is, we say that the system has recognized the input image.
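A direct implementation of this matching procedure is sketched below. The Feature struct and the countMatches name are illustrative; the per-pixel features are assumed to have been computed beforehand with the convolution of eq. (1).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// The representation of one pixel: its pair of local features (conv1, conv2).
struct Feature {
    double conv1;
    double conv2;
};

// N_matches(I_A, I_B): count every pair of points whose features satisfy eq. (8).
// All pairs are compared, so the cost is O(|A| * |B|).
std::size_t countMatches(const std::vector<Feature>& A, const std::vector<Feature>& B,
                         double delta1, double delta2) {
    std::size_t matches = 0;
    for (const Feature& a : A)
        for (const Feature& b : B)
            if (std::fabs(a.conv1 - b.conv1) < delta1 &&
                std::fabs(a.conv2 - b.conv2) < delta2)
                ++matches;
    return matches;
}
```

Recognition then amounts to computing countMatches for the input image against every memory image Ik and comparing the result against a threshold, as described above; the threshold value itself is not given in the text and would have to be chosen empirically.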
3. Computer Simulation
With a C++ implementation of this algorithm we obtained the number of matches Nmatches(Iinput, Ik) for different input images Iinput and memory images Ik; the results show the invariance to translations, rotations and illuminations, and the quasi-linearity property of the representation Ω.
As the first input image we chose a face (Figure 2).
3.4. The Quasi-Linearity of Ω
In this experiment we changed the input image: we took as input a scene formed by two images (Figure 6, input). In memory we put the images shown in Figure 6 as I1 and I2; the rest of the memory images are the same as in the previous experiments. We obtained the number of matches for each memory image (Figure 6, bottom): I1 has 5936 matches, I2 has 31778 matches, and the maximum number of matches among the remaining images is 578, for I5. The number of matches depends on the size of the compared images.
Acknowledgements
I would like to thank Dan Protopopescu and Florin Radu for helpful suggestions during the preparation of this article.
References
Bronstein, A., Bronstein, M., Kimmel, R., 2007. Expression-invariant representations of faces. IEEE Transactions on Image Processing 16 (1), 188–197.

Han, J., Ma, K., 2007. Rotation-invariant and scale-invariant Gabor features for texture image retrieval. Image and Vision Computing 25 (9), 1474–1481.
Kazhdan, M., Funkhouser, T., Rusinkiewicz, S., 2003. Rotation invariant spherical harmonic representation of 3D shape descriptors. In: Proceedings of the 2003 Eurographics/ACM SIGGRAPH Symposium on Geometry Processing. Eurographics Association, pp. 156–164.

Li, S., Chu, R., Liao, S., Zhang, L., 2007. Illumination invariant face recognition using near-infrared images. IEEE Transactions on Pattern Analysis and Machine Intelligence 29 (4), 627–639.

Quiroga, R., Reddy, L., Kreiman, G., Koch, C., Fried, I., 2005. Invariant visual representation by single neurons in the human brain. Nature 435 (7045), 1102–1107.

Ranzato, M., Huang, F., Boureau, Y., LeCun, Y., 2007. Unsupervised learning of invariant feature hierarchies with applications to object recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2007). IEEE, pp. 1–8.

Rao, C., Yilmaz, A., Shah, M., 2002. View-invariant representation and recognition of actions. International Journal of Computer Vision 50 (2), 203–226.

Reddy, B., Chatterji, B., 1996. An FFT-based technique for translation, rotation, and scale-invariant image registration. IEEE Transactions on Image Processing 5 (8), 1266–1271.

Tao, Y., Grosky, W., 2001. Spatial color indexing using rotation, translation, and scale invariant anglograms. Multimedia Tools and Applications 15 (3), 247–268.

Torres-Mendez, L., Ruiz-Suarez, J., Sucar, L., Gomez, G., 2000. Translation, rotation, and scale-invariant object recognition. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews 30 (1), 125–130.
(Figures: three panels labeled (I1), and a final figure with panels (Input), (I1) and (I2).)