Fractal Image Compression: Self-Similarity Via Locality Sensitive Hashing
Mitchell Douglass
Stanford University
Abstract
In this paper I describe a Haskell implementation of fractal image compression, a lossy image compression
technique that leverages self-similarity within an image to produce an encoding. Fractal image encoding is
known for its lengthy encoding time, and implementations require the most cleverness in efficiently
identifying highly self-similar image regions. I describe a simple locality-sensitive hash (LSH) used by my
implementation to reduce the search time for self-similarity. Though the project is under continued
development, I provide some preliminary results as well as a discussion of future development
and improvements.
…iteratively to a blank unit square as initial input, the result is a pattern commonly known as a fractal.

Figure 2: Iterations 5 (left) and 10 (right)

To encode an image such as the one produced by 10 iterations of our transformation, it is clear that recording the value of each pixel in the result is unnecessary; in fact, even traditional compression techniques seem like overkill. Instead, one need only store a representation of the simple generating transformation. Notice that it is not necessary to store the number of required iterations, since the fractal is the intrinsic fixed point of this transformation; a decoding algorithm need only iterate the transformation until no change is detected at the desired resolution.

Our first transformation example generalizes to the following model:

Figure 3: The general form of an image transformation

In this model, an "image transformation" is represented by a collection of "range transformations". Each range transformation is associated with a particular "range area" of the output; these range areas must partition the output image. A "domain-range transformation" is a range transformation that covers a range area by transforming an associated "domain area", applying scaling, a symmetry transformation (there are 8 on the rectangle), as well as basic brightness and contrast alterations. A "shade-range transformation" is a transformation which simply applies a constant shading to a range area. In general, range and domain areas need not be rectangular and may undergo transformations beyond basic reflections; this paper, however, restricts itself to the simplified model above, which is nonetheless quite powerful when applied to encoding general-purpose images.
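As a concrete reading of this model, here is a minimal Haskell sketch; every name in it (Rect, RangeTransform, and so on) is illustrative rather than taken from my implementation:

    -- Illustrative sketch of the transformation model; these are not the
    -- actual types used in the implementation.

    -- An axis-aligned rectangle: top-left corner plus width and height.
    data Rect = Rect { rx :: Int, ry :: Int, rw :: Int, rh :: Int }

    -- One of the 8 symmetries of the rectangle (4 rotations x optional flip).
    data Symmetry = Symmetry { quarterTurns :: Int, mirrored :: Bool }

    -- A range transformation covers one range area of the output image.
    data RangeTransform
      -- Cover the range by shrinking a (larger) domain area onto it,
      -- applying a symmetry and an affine brightness/contrast change.
      = DomainRange { domain     :: Rect
                    , range      :: Rect
                    , symmetry   :: Symmetry
                    , contrast   :: Double  -- kept below 1 for a contraction
                    , brightness :: Double }
      -- Cover the range with a single constant shade.
      | ShadeRange  { range :: Rect
                    , shade :: Double }

    -- An image transformation is a collection of range transformations
    -- whose range areas partition the output image.
    newtype ImageTransform = ImageTransform [RangeTransform]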
As mentioned earlier, an encoding of this type is meaningful in the sense that it encodes a fixed point under iterated application. To ensure that such a fixed point exists, the image transformation must be a contraction on the space of all images: applied to two distinct input images, it must produce output images which are more similar to each other (under the l2 norm) than the inputs were, by a constant factor less than 1. Sparing the details, a theorem of real analysis called the Contraction Mapping Theorem states that any contraction mapping has a unique fixed point under iteration, independent of the choice of initial point (i.e., initial image). In terms of our transformation model, our transformations are contractions when (1) domain areas are larger, in both dimensions, than their corresponding range areas, and (2) contrast is effectively reduced under domain-range transformations. Shade-range transformations are also contractions. This gives a simple criterion for valid image transformations.
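Stated compactly (this is the standard Banach fixed-point theorem, restated here with T the image transformation and d the l2 metric on images):

    % T is a contraction: it shrinks distances by a fixed factor s < 1.
    d\bigl(T(A),\,T(B)\bigr) \;\le\; s \cdot d(A, B), \qquad 0 \le s < 1.
    % Then T has a unique fixed point A^*, reached by iteration from any
    % initial image A_0, with geometrically decaying error:
    A^* = T(A^*), \qquad A^* = \lim_{n \to \infty} T^{n}(A_0), \qquad
    d\bigl(T^{n}(A_0),\,A^*\bigr) \;\le\; \frac{s^{n}}{1-s}\, d\bigl(T(A_0),\,A_0\bigr).

The geometric error decay is what lets a decoder stop after finitely many iterations, once no change is detected at the desired resolution.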
The existence and uniqueness provided by the Contraction Mapping Theorem guarantee that any image transformation which is a contraction is a valid encoding of its fixed point. Yet the question remains: can a general image be well represented by the fixed point of an image transformation of this form, and if so, how can these transformations be constructed? The answer to the first part is straightforward: yes. If we want an image transformation which encodes image A as a fixed point, we must find a transformation which alters A as little as possible; that is, we must find domain areas which, under transformation, are almost identical to their corresponding range areas. For those ranges that are not well approximated by larger domains, or that are best approximated…
…becomes very expensive very quickly. On large scales, a brute-force approach is impractical.

Implementations of fractal image compression must employ some method of narrowing the search for acceptable transformations. Some methods classify ranges by a short list of measurable properties, such as luminance gain vs. average luminance or maximum pixel variation, and limit the search for good domain areas to those in the same or a similar category. Other solutions involve much more complicated feature-detection techniques that are beyond the scope of this paper. I have chosen to implement a simple locality-sensitive hash algorithm that maps image regions into low-dimensional space, and I use proximity of region hashes in low dimension to identify likely candidate domain areas.

A locality-sensitive hash (LSH) is a function that maps high-dimensional data (in our case, regions of an image) into a low-dimensional space (in this case, R4), such that "similar" high-dimensional inputs are mapped to "close" low-dimensional points.

Here is how the LSH works in my implementation. Let X be an image region, and let q1, q2, q3, and q4 be functions from rectangular image regions to rectangular image regions, giving the four standard, equally-sized quadrants of their inputs (e.g. q1(X) is the top-right quadrant of X, q2(X) the top-left, etc.). Let avg be a function giving the average luminance of an image region. Our LSH, call it lsh, is defined inductively as follows:

    lsh(X) = proj(X) + (1/8) ∑_{1 ≤ i ≤ 4} lsh(q_i(X))

where proj(X) is a vector in R4 satisfying…

…with the proj(X) function. The features of each component quadrant are captured by adding a scaled-down version of their lsh values. Due to properties of normalization, the proj function produces vectors of magnitude 2; the important point is that proj(X) has constant magnitude. The factor of 1/8 ensures that the magnitude of the vector produced by the sum term of lsh is no greater than (1/2) · 2 = 1. Therefore, lsh obeys a sort of limiting property: if X is an image region of size 2^n × 2^n, and v is the lsh vector of the region that results from scaling X down to size 2^m × 2^m by averaging pixels in blocks of size 2^(n−m) × 2^(n−m), then ||lsh(X) − v|| < 2^(1−m) − 2^(1−n), which is quite small when n and m are larger than 3 or 4. This indicates that regions with identical features (i.e. when a domain is itself a scaled-down copy) have lsh values that are very close, especially when the region dimensions are reasonably large. Some other nice properties of this LSH are:

• Images are reduced to the same low dimension, R4, appropriate for search in kd-trees.

• Images with low l2 distance are close under this LSH. However, due to the arbitrary combination of quadrant lsh values, images may have close lsh values even when they are not similar.

• Due to the normalization inside proj, the lsh value of a region is invariant under brightness and contrast changes. This property is useful since good transformations requiring changes to brightness and contrast can be identified by a single hash.
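Since the surviving text omits the definition of proj, the following Haskell sketch of lsh uses a stand-in proj of my own (mean-centred quadrant luminances, normalized to magnitude 2); that stand-in is an assumption chosen only to exhibit the constant magnitude and brightness/contrast invariance described above, and the region representation is likewise illustrative.

    -- Sketch of the recursive lsh. `Region` and `proj` are illustrative
    -- stand-ins, not the paper's actual implementation.
    type Region = [[Double]]    -- rows of luminance values
    type Vec4   = (Double, Double, Double, Double)

    addV :: Vec4 -> Vec4 -> Vec4
    addV (a,b,c,d) (e,f,g,h) = (a+e, b+f, c+g, d+h)

    scaleV :: Double -> Vec4 -> Vec4
    scaleV k (a,b,c,d) = (k*a, k*b, k*c, k*d)

    norm :: Vec4 -> Double
    norm (a,b,c,d) = sqrt (a*a + b*b + c*c + d*d)

    -- Average luminance of a region (the paper's avg).
    avg :: Region -> Double
    avg xss = sum (map sum xss) / fromIntegral (length xss * length (head xss))

    -- The four equally-sized quadrants, ordered q1 (top-right), q2 (top-left),
    -- q3 (bottom-left), q4 (bottom-right) to match the paper's convention.
    quadrants :: Region -> [Region]
    quadrants xss =
      let h = length xss `div` 2
          w = length (head xss) `div` 2
          (top, bottom) = splitAt h xss
      in [ map (drop w) top, map (take w) top
         , map (take w) bottom, map (drop w) bottom ]

    -- ASSUMED stand-in for proj: mean-centred quadrant averages, normalized
    -- to constant magnitude 2. Centring removes brightness; normalizing
    -- removes contrast, giving the invariance described in the text.
    proj :: Region -> Vec4
    proj xss =
      let [a1,a2,a3,a4] = map avg (quadrants xss)
          m = (a1 + a2 + a3 + a4) / 4
          v = (a1 - m, a2 - m, a3 - m, a4 - m)
          n = norm v
      in if n < 1e-9 then (0,0,0,0)        -- flat region: no direction
         else scaleV (2 / n) v

    -- lsh(X) = proj(X) + (1/8) * sum of the quadrants' lsh values.
    lsh :: Region -> Vec4
    lsh xss
      | length xss < 2 || length (head xss) < 2 = (0,0,0,0)  -- too small to split
      | otherwise =
          proj xss `addV` scaleV (1/8) (foldr1 addV (map lsh (quadrants xss)))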
…data structure that allows image regions to be embedded into R4.

…of affairs (although the data structure is not mutable, it is treated as a sort of accumulator).
The kd-tree is used for storing approximations of regions that are larger than any previously considered regions. The AreaInfoTblLst corresponds to the ImageAreaTableList data type, and stores the computed region information.
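To show how the hashes drive the domain search (a sketch under my own naming, not the implementation's actual kd-tree interface): candidate domains are the stored regions whose lsh vectors lie nearest the range's lsh vector in R4. A kd-tree answers this query in logarithmic time; the linear scan below is a stand-in with the same semantics.

    import Data.List (minimumBy)
    import Data.Ord  (comparing)

    type Vec4 = (Double, Double, Double, Double)

    -- Squared l2 distance in R^4.
    dist2 :: Vec4 -> Vec4 -> Double
    dist2 (a,b,c,d) (e,f,g,h) = (a-e)^2 + (b-f)^2 + (c-g)^2 + (d-h)^2

    -- Hypothetical query: given (lsh vector, domain) pairs and the lsh
    -- vector of a range, return the candidate with the closest hash.
    -- A kd-tree performs this same nearest-neighbour search without the
    -- linear scan.
    nearestDomain :: [(Vec4, dom)] -> Vec4 -> (Vec4, dom)
    nearestDomain candidates key =
      minimumBy (comparing (dist2 key . fst)) candidates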
V. Results

While this implementation is still under development, there have already been promising results suggesting that the technique is worth further exploration and improvement. In what follows, claimed compression ratios are based on conservative pen-and-paper calculation, taking into consideration the number of range transformations required to encode an image with respect to the pixel dimensions of the image.
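For instance, the transformation count and per-transformation byte cost below are hypothetical placeholders of mine, not measured values; they only illustrate the shape of the calculation:

    % Hypothetical figures, illustrating the method of calculation only:
    \text{raw image: } 2^{8} \times 2^{8} \text{ pixels} \times 1 \text{ byte} = 65{,}536 \text{ bytes}
    \text{encoding: } 400 \text{ range transformations} \times 4 \text{ bytes each} = 1{,}600 \text{ bytes}
    \text{ratio: } 65{,}536 / 1{,}600 \approx 41 : 1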
The first success of this implementation is in compressing images of simple geometry, for instance the circle. Early attempts at the circle were abysmal, if quite entertaining.

Figure 9: The circle on the left, an attempt to decode lena on the right

After further streamlining, specifically incorporating a sufficiently broad l2 search of nearest neighbours in the kd-tree, I was able to achieve an algorithm which produces interesting results for both simple geometry and general images. In the case of the circle, I was able to achieve 42:1 compression. Unfortunately, the decompression of the circle produces artefacts in the form of blurred edges and a gray hue which is not present in the input image. In the case of lena, I was able to achieve only a 3:2 compression ratio. However, the quality of the decompressed lena is quite good, with few distracting artefacts.

Figure 10: From top-left to bottom-left clockwise: lena original, lena full decompressed at 3:2, circle first iteration of decompression, circle full decompression at 42:1

VI. Future Work

• Use criterion and other profiling/benchmarking libraries to improve the baseline resource use and performance of the existing algorithm.
• Begin writing a robust test suite for the library, to track and maintain the correct behaviour of the various components of the algorithm.

• Currently the encoder only considers blocks of size 2^k, which improves encoding efficiency but is a lost opportunity in terms of compression. Incorporate blocks of arbitrary size and non-uniform aspect ratio.

• Currently the encoder blindly partitions regions into 4 equal quadrants: modify the partitioning scheme to split based on feature detection.