Fractal Image Compression: Self-Similarity Via Locality Sensitive Hashing


Mitchell Douglass (mrdoug95@stanford.edu) March 18, 2016


Stanford University
Mitchell Douglass

Abstract

In this paper I describe a Haskell implementation of fractal image compression, a lossy image compression technique that leverages self-similarity within an image to produce an encoding. Fractal image encoding is known for its lengthy encoding time, so the part of any implementation that demands the most cleverness is identifying highly self-similar image regions. In this paper, I describe a simple locality sensitive hash (LSH) used by my implementation to reduce the search time for self-similarity. Though the project is under continued development, I provide details on some preliminary results, as well as a discussion of future development and improvements.

I. Introduction

Fractal image compression is a general algorithmic technique for lossy image compression. It was pioneered in the late 1980's and early 1990's, first by Michael Barnsley, founder of the fractal compression company Iterated Systems. It has since been studied by many researchers who have introduced various modifications and improvements to the base algorithm.

Fractal encodings depend upon the discovery of self-similarities within an image. These are regions of the image that are highly similar with respect to some metric approximating visual distinctness; usually the $\ell_2$ norm is used. The main obstacle to effective fractal compression is therefore the computationally intensive search for these self-similarities. This is reflected in the frustratingly slow encoding times of fractal encoding algorithms. The following is a brief overview of the trade-offs of fractal compression.

Pros:
• fast decoding time
• decent compression ratio
• resolution independence

Cons:
• slow encoding time
• susceptibility to pathological cases

In what follows, I will touch on three main topics. First, I will discuss the basic method of fractal encoding, giving a general framework for any fractal compression implementation. Next, I will describe a modification to the naive algorithm that I used in my implementation to reduce the search burden of fractal encoding. Thirdly, I will discuss some of the details of my Haskell implementation. I'll then wrap up with a summary of recent results, as well as a discussion of future development and improvements.

II. The Basic Fractal Method

To gain a base understanding of the compression technique, consider the following transformation of the unit square:

Figure 1: A basic image transformation

The transformation is composed of 4 actions. Quadrants 1, 3, and 4 of the output are scaled-down versions of the entire input domain. Quadrant 2 is the result of a simple shading pattern, applied independently of the input domain. Applying this transformation iteratively to a blank unit square as initial input produces a pattern commonly known as a fractal.

Figure 2: Iterations 5 (left) and 10 (right)

Consider encoding an image such as the one produced by 10 iterations of our transformation. Recording the value of each pixel in the result is clearly unnecessary; in fact, even traditional compression techniques seem like overkill. Instead, one need only store a representation of the simple generating transformation. It is not even necessary to store the number of required iterations, since the fractal is the intrinsic fixed point of this transformation. A decoding algorithm need only iterate the transformation until no change is detected at the desired resolution.

Our first transformation example generalizes to the following model:

Figure 3: The general form of an image transformation

In this model, "image transformations" are represented by a collection of "range transformations". Each range transformation is associated with a particular "range area" of the output; these range areas must partition the output image. A "domain-range transformation" is a range transformation that covers a range area by transforming an associated "domain area". It applies scaling, a symmetry transformation (there are 8 on the rectangle), and basic brightness and contrast alterations. A "shade-range transformation" is a transformation which simply applies a constant shade to a range area. In general, range and domain areas need not be rectangular and may undergo transformations beyond basic reflections. In this paper, however, all areas are rectangular and only these basic transformations are used. Indeed, this simplified model is quite powerful when applied to encoding general-purpose images.

As mentioned earlier, an encoding of this type is meaningful because it encodes a fixed point under iterated application. To ensure that such a fixed point exists, the image transformation $T$ must be a contraction in the space of all images. That is, applied to two distinct input images, it must produce output images which are closer together (under the $\ell_2$ norm) than the inputs, by a constant factor less than 1: $\|T(A) - T(B)\|_2 \le s\,\|A - B\|_2$ for all images $A, B$ and some constant $s < 1$.

Sparing the details, a theorem of real analysis called the Contraction Mapping Theorem applies here. It states that any contraction mapping on a complete metric space (such as the space of images) has a unique fixed point. Iterating the mapping from any initial point (a.k.a. image) converges to that fixed point.

In reference to our transformation model, our transformations are contractions when two conditions hold:
(1) domain areas are larger, in both dimensions, than their corresponding range areas;
(2) the contrast is effectively reduced under domain-range transformations.
Shade-range transformations are also contractions. This gives a simple criterion for valid image transformations.

The existence and uniqueness provided by the Contraction Mapping Theorem guarantee that any image transformation which is a contraction is a valid encoding of its fixed point. Yet the question remains: can a general image be well represented by the fixed point of an image transformation of this form, and if so, how can these transformations be constructed?

The answer to the first part is straightforward: yes. If we want an image transformation which encodes image $A$ as a fixed point, we must find a transformation which alters $A$ as little as possible. That is, we must find domain areas which, under transformation, are almost identical to their corresponding range areas. Some ranges are not well approximated by larger domains, or are best approximated by a constant shade; for those, we may apply shade-range transformations. As it turns out, general images contain an abundance of these pairs (e.g. Figure 4). The main challenge of fractal image compression is finding them.

Figure 4: Self-similarity in a coffee table

Fractal image encoding involves finding enough similar domain-area/range-area pairs that the range areas partition the image. Storing a domain-range or shade-range transformation requires the disk space of only a small number of pixels. Compression is therefore achieved for each matched range area that covers more than just a few rows and columns.

The decoding process iterates the stored image transformation on an arbitrary base image, stopping when no changes occur at the desired resolution. An important question, impacting decoding time, is whether a fixed point will be reached quickly. The answer is yes, provided that all domain-range transformations uniformly involve a scaling by a factor $k < 1$. If $k \le 1/2$, as is the case in my implementation, then a fixed point is reached within $\log_2 d$ iterations, where $d$ is the largest dimension of the output image.

Following is pseudo-code for a fractal encoding algorithm. Here, regionlists is a list of lists of regions. Each element list contains image regions of the same size, and region size is strictly decreasing along the outer list. The function find_best_trans must iterate through all larger image regions, finding the domain which is the best $\ell_2$ approximation under some domain-range transformation. If a shade-range transformation is the better choice, it produces that instead.

    -- candidate regions for a domain-range transformation
    larger_regions = []
    -- the result of the computation
    image_transform = []

    foreach regionlist in regionlists:
        -- transformations that are good enough to be
        -- accepted into the image transform at this level
        good_lvl_trans = []

        -- collect possible transformations
        foreach region in regionlist:
            region.best = find_best_trans(region, larger_regions)
            if is_good_fit(region.best):
                good_lvl_trans.add(region.best)

        -- find a maximal set of good transforms whose ranges
        -- intersect neither each other nor ranges already covered
        new_trans = best_nonintersecting(good_lvl_trans, image_transform)
        image_transform.add_all(new_trans)

        -- regions of this size become candidate domains
        -- for the smaller regions of the next level
        larger_regions.add_all(regionlist)

Figure 5: Pseudo-code for a fractal encoding algorithm

III. Locality Sensitive Hashing

As mentioned earlier, the most computationally intensive part of the fractal encoding algorithm is the find_best_trans function in the pseudo-code of Figure 5. This is because most of the self-similarity within images manifests itself on a small scale; that is, the median appropriate range area is very small. The algorithm outlined above must therefore search a massive set of larger_regions for each range area. Each comparison with a particular larger area involves computing an $\ell_2$ distance on high-dimensional data, which becomes very expensive very quickly. On large scales, a brute-force approach is impractical.

Implementations of fractal image compression must employ some method of minimizing the search for acceptable transformations. Some methods classify ranges by a short list of measurable properties, such as luminance gain vs. average luminance or maximum pixel variation. They then limit the search for good domain areas to only those in the same or a similar category. Other solutions involve much more complicated feature detection techniques that are beyond the scope of this paper. I have chosen to implement a simple locality-sensitive hash algorithm to map image regions into a low-dimensional space. I then use the proximity of region hashes in that space to identify likely candidate domain areas.

A locality-sensitive hash (LSH) is a function that maps high-dimensional data (in our case, regions of an image) into a low-dimensional space (in this case $\mathbb{R}^4$). It does so such that "similar" high-dimensional inputs are mapped to "close" low-dimensional points.

Here is how the LSH works in my implementation. Let $X$ be an image region. Let $q_1, q_2, q_3, q_4$ be functions from rectangular image regions to rectangular image regions, corresponding to the four standard, equally sized quadrants of their inputs (e.g. $q_1(X)$ is the top-right quadrant of $X$, $q_2(X)$ the top-left, etc.). Let avg be a function giving the average luminance of an image region. Our LSH, call it lsh, is defined inductively as follows:

$$\mathrm{lsh}(X) = \mathrm{proj}(X) + \frac{1}{8} \sum_{1 \le i \le 4} \mathrm{lsh}(q_i(X))$$

where $\mathrm{proj}(X)$ is the vector in $\mathbb{R}^4$ satisfying

$$\big(\mathrm{proj}(X)\big)_i = \frac{\mathrm{avg}(q_i(X)) - \mu}{\sigma}$$

and $\mu$ and $\sigma$ are the mean and population standard deviation of $\mathrm{avg}(q_1(X)), \ldots, \mathrm{avg}(q_4(X))$. Image regions that are single pixels have the zero vector of $\mathbb{R}^4$ as lsh value.

As you can see, this LSH attempts to capture the high-level features of an image region with the proj(X) function. The features of each component quadrant are captured by adding a scaled-down version of their lsh value. Because of the normalization, the proj function produces vectors of magnitude 2 whenever the quadrant averages are not all equal, since the four normalized components have squared sum 4. The important point is that proj(X) has constant magnitude.

The factor 1/8 ensures that the sum term of lsh has magnitude at most half the largest $\|\mathrm{lsh}(q_i(X))\|$. For instance, when the quadrants are $2 \times 2$ blocks, each $\mathrm{lsh}(q_i(X)) = \mathrm{proj}(q_i(X))$ has magnitude at most 2, so the sum term has magnitude at most $1 = \frac{1}{2} \cdot 2$. Unrolling the recursion gives $\|\mathrm{lsh}(X)\| < 2 + 1 + \frac{1}{2} + \cdots = 4$ for every region.

Therefore, lsh obeys a sort of limiting property. Let $X$ be an image region of size $2^n \times 2^n$, and let $v$ be the lsh vector of the region obtained by scaling $X$ down to size $2^m \times 2^m$ (averaging pixels in blocks of size $2^{n-m} \times 2^{n-m}$). Then

$$\|\mathrm{lsh}(X) - v\| \le 2^{2-m} - 2^{2-n}.$$

This holds because block averaging preserves quadrant averages. The two regions therefore share the same proj terms in the first $m$ levels of the recursion. Only the deeper terms of $\mathrm{lsh}(X)$ differ, and the recursion damps them by a factor of $2^{-m}$. The bound is quite small for values of $m$ larger than 3 or 4. This indicates that regions with identical features (i.e. when a domain is itself scaled down) have lsh values that are very close, especially at reasonably high dimension.

Some other nice properties of this LSH are:

• Images are reduced to the same low dimension, $\mathbb{R}^4$, appropriate for search in kd-trees.
• Images with low $\ell_2$ distance are close under this LSH. However, due to the arbitrary combination of quadrant lsh values, images may have close lsh values when they are not similar.
• Due to the normalization in proj, the lsh value of a region is invariant under brightness and (positive) contrast changes. This property is useful, since good transformations requiring changes to brightness and contrast can be identified by a single hash.
• Due to the symmetries of the rectangle, the lsh vectors of regions which differ only by a symmetry of the rectangle are permutations of each other. Thus, only a single hash is required to identify candidate symmetric transformations.
• The lsh vectors can be computed dynamically, i.e. by a "table-filling" technique.
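To make the lsh definition concrete, here is a minimal Haskell sketch of proj and lsh. It operates on a square region of side $2^n$ stored as a plain list of rows, rather than on the repa arrays and dynamically filled tables the implementation uses. The names `Region`, `quadrants`, `halve`, `rot90` and `sample` are hypothetical. Sending a region whose quadrant averages are all equal ($\sigma = 0$) to the zero vector is an assumption, since the text does not cover that case.

```haskell
import Data.List (sort, transpose)

-- A square image region of side 2^n, stored as a list of pixel rows.
type Region = [[Double]]

-- The four quadrants in the order q1..q4 used in the text:
-- top-right, top-left, bottom-left, bottom-right.
quadrants :: Region -> [Region]
quadrants xs =
  [map (drop w) top, map (take w) top, map (take w) bot, map (drop w) bot]
  where
    (top, bot) = splitAt (length xs `div` 2) xs
    w = length (head xs) `div` 2

-- Average luminance of a region.
avg :: Region -> Double
avg xs = sum ps / fromIntegral (length ps)
  where ps = concat xs

-- proj: quadrant averages shifted by their mean and divided by their
-- population standard deviation. Sending a region whose quadrant averages
-- are all equal (sigma = 0) to the zero vector is an assumption.
proj :: Region -> [Double]
proj xs
  | sigma < 1e-12 = [0, 0, 0, 0]
  | otherwise     = [(a - mu) / sigma | a <- avgs]
  where
    avgs  = map avg (quadrants xs)
    mu    = sum avgs / 4
    sigma = sqrt (sum [(a - mu) ^ (2 :: Int) | a <- avgs] / 4)

-- lsh(X) = proj(X) + (1/8) * sum_i lsh(q_i(X)); single pixels hash to 0.
lsh :: Region -> [Double]
lsh [[_]] = [0, 0, 0, 0]
lsh xs    = zipWith (+) (proj xs) (map (/ 8) quadSum)
  where
    quadSum = foldr1 (zipWith (+)) (map lsh (quadrants xs))

-- Euclidean length of a vector in R^4.
norm :: [Double] -> Double
norm = sqrt . sum . map (^ (2 :: Int))

-- Scale a region down by averaging non-overlapping 2x2 pixel blocks.
halve :: Region -> Region
halve (r1 : r2 : rest) = pairs (zipWith (+) r1 r2) : halve rest
  where
    pairs (a : b : more) = (a + b) / 4 : pairs more
    pairs _              = []
halve _ = []

-- Rotate a region 90 degrees clockwise (one of the 8 symmetries).
rot90 :: Region -> Region
rot90 = map reverse . transpose

-- A hypothetical 8x8 test region with no symmetries.
sample :: Region
sample = [[fromIntegral (i * i + 3 * j) | j <- [0 .. 7 :: Int]] | i <- [0 .. 7 :: Int]]
```

Loaded into GHCi, `norm (proj sample)` gives 2 up to rounding. `lsh (rot90 sample)` contains the same four numbers as `lsh sample` in a different order, which is the symmetry property listed above. `lsh (map (map (\p -> 3 * p + 7)) sample)` matches `lsh sample`, which is the brightness and contrast invariance.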

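The first property above, a fixed low dimension suited to kd-trees, is what makes candidate lookup cheap. The sketch below illustrates that lookup without the kdt package, as a brute-force stand-in for a kd-tree radius query. It keeps only candidate domains strictly larger than the range whose hash lies within a chosen distance, nearest first. The `Candidate` record, its square `candSize` field and the `radius` cutoff are illustrative assumptions, not the implementation's types.

```haskell
import Data.List (sortOn)

-- A point in R^4: the lsh value of an image area.
type Hash = [Double]

-- Hypothetical candidate domain: top-left corner, side length, and hash.
data Candidate = Candidate
  { candY, candX, candSize :: Int
  , candHash :: Hash
  } deriving (Eq, Show)

-- Euclidean distance between two hashes.
dist :: Hash -> Hash -> Double
dist u v = sqrt (sum (zipWith (\a b -> (a - b) ^ (2 :: Int)) u v))

-- Brute-force stand-in for a kd-tree radius query: keep candidates that
-- are strictly larger than the range and whose hash lies within `radius`
-- of the range's hash, ordered nearest first. Each survivor still has to
-- be verified with a real l2 comparison, since close hashes do not
-- guarantee similar regions.
nearCandidates :: Double -> Int -> Hash -> [Candidate] -> [Candidate]
nearCandidates radius rangeSize h cs =
  sortOn (dist h . candHash)
    [c | c <- cs, candSize c > rangeSize, dist h (candHash c) <= radius]
```

A real kd-tree answers the same query without scanning every candidate. The filtering rule (strictly larger areas only, nearest hash first) is the part this sketch is meant to show.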

Because the table-filling computation handles smaller regions before larger ones, the lsh values of all regions of size $2^k$ can be computed in time linear in the number of pixels in the image.

Figure 6: The Lena image, showing proj(lena), normalized values in parentheses

The modification to the general fractal encoding algorithm of Figure 5 is straightforward. When a range area is considered for a transformation, only candidate domain areas with hash values sufficiently close to the hash value of the range need be considered. These can be considered in order of nearest hash value. As mentioned in the previous list, a domain area with a close hash vector is not necessarily a good approximation of the query range area. An exhaustive search of the near neighbors is therefore still necessary to verify the quality of a transformation.

In lieu of a list of larger domains, the algorithm employs a kd-tree to store the hash values of each image area as it is considered. Just as in the pseudo-code, the kd-tree in the modified algorithm contains hashes of image areas that are strictly larger than the range area under consideration. Nearby points therefore truly represent valid candidate domain areas.

IV. Haskell Implementation

I now give a tour of this basic infrastructure, as implemented in Haskell. Here is a list of some of the technologies I used in my project:

• repa package: high-performance arrays for unboxed numerical data, used for storing and transforming image pixel luminance data.
• unm-hip package: image processing library, used for intermediate image storage, image utility functions, and image IO.
• kdt package: kd-tree library for storing image region approximations and performing nearest-neighbor searches.
• pqueue package: priority queues used to choose the best-fit non-intersecting transformations for each region size.
• fingertree package: interval storage and search, used for quickly identifying overlapping range regions.

We have already described the general type of an image transformation, which I represent in the following way in Haskell:

    type ImageTransform = [RangeTransform]

    data RangeTransform
      = DomainRangeTransform
          { domainArea  :: ImageArea
          , rangeArea   :: ImageArea
          , symmetry    :: D8
          , invertLum   :: Bool
          , contrastFac :: Double
          , brightFac   :: Double
          }
      | ShadeRangeTransform
          { rangeArea :: ImageArea
          , shade     :: Double
          }
      deriving (Eq, Show)

    data ImageArea = ImageArea
      { areaYOffset :: Rational
      , areaXOffset :: Rational
      , areaHeight  :: Rational
      , areaWidth   :: Rational
      } deriving (Eq, Show)

    -- the 8 symmetries of the rectangle
    data D8 = ID | R1 | R2 | R3 | S | RS | SR | R2S
      deriving (Eq, Show)

The following Haskell data types represent the dynamically built, low-dimensional data structure that allows image regions to be embedded into $\mathbb{R}^4$.

    -- imports assumed for the qualified names used in these figures
    import qualified Data.Vector as V
    import qualified Data.Array.Repa as R
    import Data.Array.Repa (U, DIM2)
    import Data.Array (Array)
    import qualified Data.PQueue.Prio.Min as PQ
    import qualified Data.IntervalMap.FingerTree as IntrMap
    import Data.KdMap.Static (KdMap)

    type LowDimPnt = V.Vector Double

    data AreaInfo
      = AreaInfo
          { -- a 4-dim. approximation
            approx :: LowDimPnt
          , proj   :: LowDimPnt
          , avgLum :: Double
          , maxLum :: Double
          , minLum :: Double
          , pxArea :: PixelArea
          }
      | NoArea

    type ImageArray = R.Array U DIM2 Double

    type ImageAreaTable = Array (Int, Int) AreaInfo

    type ImageAreaTableList = [ImageAreaTable]

Figure 7: Pseudo-code for specialized data structures.

As seen here, image pixel data is represented by efficient arrays from the repa package. The low-dimensional approximations are represented by vectors from the standard vector package.

The ImageAreaTable and ImageAreaTableList types are used to store the AreaInfo structures, which hold metadata about image regions. In the current state of the implementation, only blocks of size $2^k$ are stored in these tables. However, all possible blocks of size $2^k$ are dynamically computed in these tables, not solely those which are "well" aligned. As a result, the implementation currently considers a very large number of image regions, on the order of hundreds of thousands, as candidate domain regions for small range regions.

The following data structure is used during the encoding process to track the "state" of affairs. Although the data structure is not mutable, it is treated as a sort of accumulator.

    type LvlPQueue = PQ.MinPQueue Double RangeTransform

    data ImageEncodingState = ImageEncodingState
      { imgWidth        :: Int
      , imgHeight       :: Int
      , imgArray        :: ImageArray
      , areaInfoTblLst  :: ImageAreaTableList
      , depth           :: Int
      , rangeTransforms :: [RangeTransform]
      , xIntrMap        :: IntrMap.IntervalMap Int (Int, Int)
      , yIntrMap        :: IntrMap.IntervalMap Int (Int, Int)
      , kdTree          :: KdMap Double LowDimPnt AreaInfo
      , pixelsCovered   :: Int
      }

To start, the depth field records the number of "levels" of computation that have been performed. Each level evaluates ranges of a smaller size than the level before it. The imgHeight and imgWidth fields record the size of the image being encoded. The pixelsCovered field tracks the number of pixels contained in the range of some transformation in rangeTransforms. This allows the algorithm to short-circuit when full coverage has been achieved.

The purpose of the IntervalMaps is to store a ledger of which range regions have already been covered by transformations. Using these interval maps for interval search, determining whether a region is already covered becomes an efficient process.
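The implementation keeps this ledger in two fingertree IntervalMaps, one per axis. The code below is a simplified, hypothetical sketch of the same bookkeeping that swaps those maps for a plain list of covered rectangles. A range is accepted only if it intersects no covered range, and a running pixel count plays the role of pixelsCovered for the short-circuit test. `Rect`, `Ledger`, `tryCover` and `fullyCovered` are illustrative names. The linear scan here is exactly what the interval maps are meant to avoid.

```haskell
-- A hypothetical axis-aligned pixel rectangle: top-left corner and size.
data Rect = Rect { rY, rX, rH, rW :: Int } deriving (Eq, Show)

-- Half-open intervals [a, a + la) and [b, b + lb) overlap iff each one
-- starts before the other one ends.
overlap1 :: Int -> Int -> Int -> Int -> Bool
overlap1 a la b lb = a < b + lb && b < a + la

-- Two rectangles intersect iff their row and column intervals both overlap.
intersects :: Rect -> Rect -> Bool
intersects p q =
  overlap1 (rY p) (rH p) (rY q) (rH q) && overlap1 (rX p) (rW p) (rX q) (rW q)

-- Covered ranges plus a running count of covered pixels.
data Ledger = Ledger { covered :: [Rect], pixels :: Int } deriving (Eq, Show)

emptyLedger :: Ledger
emptyLedger = Ledger [] 0

-- Accept a new range only if it intersects no range already covered
-- (a linear scan; interval maps exist to make this query cheaper).
tryCover :: Rect -> Ledger -> Maybe Ledger
tryCover r l
  | any (intersects r) (covered l) = Nothing
  | otherwise = Just (Ledger (r : covered l) (pixels l + rH r * rW r))

-- Encoding can stop early once every pixel of a w x h image is covered.
fullyCovered :: Int -> Int -> Ledger -> Bool
fullyCovered w h l = pixels l == w * h
```

Because accepted ranges never overlap, summing their areas counts each pixel at most once. Comparing the running count against the image area is therefore enough for the short-circuit check.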

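The DomainRangeTransform record stores a contrastFac and a brightFac, but the text does not say how they are chosen. A common choice in the fractal compression literature is an ordinary least-squares fit of the shrunken domain pixels to the range pixels. The sketch below assumes that approach, together with $2 \times 2$ block averaging for the $k = 1/2$ spatial scaling. The residual it returns is the kind of $\ell_2$ error a verification step such as is_good_fit could threshold. `Block`, `shrink` and `fitContrastBrightness` are hypothetical names.

```haskell
-- A rectangular block of pixel luminances, as a list of rows.
type Block = [[Double]]

-- Spatial scaling by k = 1/2: average non-overlapping 2x2 pixel blocks.
shrink :: Block -> Block
shrink (r1 : r2 : rest) = pairs (zipWith (+) r1 r2) : shrink rest
  where
    pairs (a : b : more) = (a + b) / 4 : pairs more
    pairs _              = []
shrink _ = []

-- Least-squares contrast s and brightness o minimising
--   sum over pixels of (s * d + o - r)^2
-- for corresponding domain pixels d and range pixels r.
-- Returns (s, o, rms error). A flat domain cannot carry contrast, so it
-- falls back to s = 0, i.e. a constant shade equal to the range mean.
fitContrastBrightness :: Block -> Block -> (Double, Double, Double)
fitContrastBrightness domain range = (s, o, rms)
  where
    ds    = concat domain
    rs    = concat range
    n     = fromIntegral (length ds)
    sumD  = sum ds
    sumR  = sum rs
    sumDD = sum (map (^ (2 :: Int)) ds)
    sumDR = sum (zipWith (*) ds rs)
    denom = n * sumDD - sumD * sumD
    s | abs denom < 1e-12 = 0
      | otherwise         = (n * sumDR - sumD * sumR) / denom
    o     = (sumR - s * sumD) / n
    rms   = sqrt (sum [(s * d + o - r) ^ (2 :: Int) | (d, r) <- zip ds rs] / n)
```

A fitted contrast with $|s| \ge 1$ would violate the contraction condition from Section II, so an encoder built on this sketch would presumably reject or clamp such fits. A negative $s$ would presumably correspond to setting the invertLum flag.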

The kd-tree stores the low-dimensional approximations of regions from previously processed levels, all of which are larger than the ranges currently under consideration. The AreaInfoTblLst field corresponds to the ImageAreaTableList data type and stores the computed region information.

V. Results

While this implementation is still under development, there have already been promising results to suggest that the technique is worth further exploration and improvement. In what follows, claimed compression ratios are based on a conservative pen-and-paper calculation. It compares the number of range transformations required to encode an image with the pixel dimensions of the image.

The first success of this implementation was compressing images of simple geometry, for instance the circle. Early attempts at the circle were abysmal, though quite entertaining.

Figure 8: Early attempts at the circle

After some streamlining, I was able to achieve an implementation which did in fact encode the circle (and other simple geometry) quite well, with a compression ratio of approximately 90:1. However, this version was not very good at encoding general images.

Figure 9: The circle on the left, an attempt to decode lena on the right

After further streamlining, specifically incorporating a sufficiently broad $\ell_2$ search of nearest neighbours in the kd-tree, I was able to achieve an algorithm which produces interesting results for both simple geometry and general images. In the case of the circle, I was able to achieve 42:1 image compression. Unfortunately, the decompression of the circle produces artefacts in the form of blurred edges and a gray hue which is not present in the input image. In the case of lena, I was able to achieve only a 3:2 compression ratio. However, the quality of the decompressed lena is quite good, with few distracting artefacts.

Figure 10: Clockwise from top-left to bottom-left: lena original, lena fully decompressed at 3:2, circle after the first iteration of decompression, circle fully decompressed at 42:1

In terms of efficiency, this implementation has not yet been sufficiently optimized for performance. Encoding the 128×128 circle image above required 15.12 seconds, while encoding the 128×128 lena image required 47.0 seconds. These deficiencies are likely the result of suboptimal memory use in the functional setting of Haskell, though there are certain algorithmic improvements to be made as well.

VI. Future Development and Improvement

The primary goals for development on this implementation in the future are:

• Use criterion and other profiling/benchmarking libraries to improve the baseline resource use and performance of the existing algorithm.


• Begin writing a robust test suite for the library to track and maintain the well-behavedness of the various components of the algorithm.
• Begin writing a utilities library to aid in agile development and testing.
• Provide a robust disk serialization of image encodings, such that image transformations may be written to and read from disk correctly and efficiently.
• Leverage the parallel and mutable-state capabilities of Repa and other packages to improve performance.
• The implementation currently considers only blocks of size $2^k$. This improves encoding efficiency but is a lost opportunity in terms of compression. Incorporate blocks of arbitrary size and non-uniform aspect ratio.
• The implementation currently partitions regions blindly, into 4 equal quadrants only. Modify the partitioning scheme to split based on feature detection.
