Virgilio Zuñiga Grajeda, Claudia Feregrino Uribe and Rene Cumplido Parra
National Institute for Astrophysics, Optics and Electronics
Luis Enrique Erro No. 1. Tonantzintla, Puebla, Mexico. Postal Code: 72840
[email protected]; [email protected]; [email protected]
Abstract.
Nowadays, the use of digital communication systems has increased to such an extent that network bandwidth is affected.
This problem can be alleviated by implementing data compression algorithms in communication devices to reduce the
amount of data to be transmitted. However, the design of large hardware data compression models requires an efficient
use of the silicon area. This work proposes combining two different hardware lossless data compression approaches so
that they share common hardware elements. The project also involves the design of a hardware/software architecture
that exploits parallelism to increase execution speed while keeping flexibility. A custom coprocessor unit executes the
compute-intensive tasks of the Burrows-Wheeler Transform and Lempel-Ziv lossless data compression schemes. This
coprocessor unit is controlled by LEON2, a SPARC V8 compatible general-purpose microprocessor.
Keywords: Data compression, Burrows-Wheeler Transform, Lempel-Ziv, Coprocessor, LEON2.
Resumen.
Nowadays, the use of digital communication systems has increased to such an extent that network bandwidth is
affected. This problem can be solved by implementing data compression algorithms in communication devices,
reducing the amount of data to be transmitted. However, designing complex hardware data compression models
requires considering the efficient use of the silicon area. This work proposes combining two different lossless data
compression schemes so that they share common elements. The project also addresses the design of a
hardware/software architecture that exploits parallelism and increases execution speed while maintaining its
flexibility. A coprocessor executes the computationally intensive tasks of the Burrows-Wheeler Transform and
Lempel-Ziv compression schemes. The coprocessor is controlled by a general-purpose microprocessor compatible
with the SPARC V8 architecture, called LEON2.
Keywords: Data compression, Burrows-Wheeler Transform, Lempel-Ziv, Coprocessor, LEON2.
1 Introduction
In recent years, there has been an unprecedented explosion in the amount of digital data transmitted by communication
systems. Millions of users access the World Wide Web every day to send and receive all kinds of data. As the amount
of transmitted data keeps growing, network bandwidth can become insufficient. One way to address this problem is to
reduce the amount of data without altering the information in the message. This procedure, known as data
compression, reduces the total amount of data to be transmitted. A wide variety of data can be transmitted through the
network, such as images, video, text, sound and software. Although some data types tolerate some loss of information
when compressed, text and software must be decoded exactly as they were before compression, since the smallest
change can compromise the meaning of the information. When choosing a data compression algorithm it is also
important to consider that some algorithms achieve better compression on certain kinds of data than others. When the
kind of data to transmit is not known in advance, lossless data compression algorithms are required. Hence, the present
work focuses on lossless data compression and explores the use of two different algorithms to suit a variety of data
types.
Many algorithms have been developed to compress data. As with any other algorithm in computer science, there are
basically two ways to execute them. One common method is to use a general-purpose processor programmed to
perform a set of tasks. A main microprocessor controls the computer resources and the software uses these resources to
accomplish a given task. This method has the advantage of flexibility: if the algorithm must be modified, the software
can easily be changed to execute another task. However, its performance is poor, and for some tasks a general-purpose
processor is unacceptably slow, because this approach lacks hardware-optimized components. The other method is to
implement the algorithm as an Application Specific Integrated Circuit (ASIC). ASICs are circuits designed to solve a
specific problem using optimized hardware to achieve better performance. The drawback of this approach is the loss of
flexibility: it is impossible to make any changes to the architecture once the circuit has been fabricated.
In the last 10 years, designers have been using reconfigurable computing to exploit the advantages of both methods [10],
[9], [5]. Reconfigurable computing offers greater flexibility than an ASIC solution and increased speed over a software
approach. These benefits are obtained with Field Programmable Gate Arrays (FPGAs). An FPGA is a semiconductor
device containing programmable logic components and programmable interconnects. The programmable logic
components can be programmed to duplicate the functionality of basic logic gates (such as AND, OR, XOR, NOT) or
more complex combinatorial functions such as decoders or simple math functions. In most FPGAs, these programmable
logic components (or logic blocks, in FPGA parlance) also include memory elements, which may be simple flip-flops or
more complete blocks of memories. A hierarchy of programmable interconnects allows the logic blocks of an FPGA to
be interconnected as needed by the system designer. These logic blocks and interconnects can be programmed after the
manufacturing process by the customer/designer (hence the term “Field Programmable”) so that the FPGA can perform
whatever logical function is needed. An FPGA can be used in conjunction with a general purpose processor. In this
case, the functions that cannot be sped up run on the general-purpose machine, while only the compute-intensive
algorithms are executed on the FPGA. Similarly, for problems requiring more resources than are physically available
on a single FPGA, multiple devices can be tied together and configured to solve even more complex problems. Another
way to solve large problems when resources are limited is to reuse these resources.
In summary, a general-purpose processor can execute an algorithm with an FPGA implementing the compute-intensive
tasks, achieving higher processing speeds. It is also possible to save FPGA resources by reusing elements. The goals of
this work are twofold: first, to design a custom coprocessor that executes the compute-intensive tasks of lossless data
compression algorithms and saves resources by reusing hardware elements; second, to test and validate the designed
coprocessor by attaching it to a general-purpose processor programmed with the lossless data compression algorithms.
The execution of the algorithms using the coprocessor must be faster than a pure software implementation. The design
of the hardware system follows a bottom-up methodology, starting with simple components that are assembled and
encapsulated into more complex components until the system is complete. The selection of the Lempel-Ziv (LZ77) and
Burrows-Wheeler Transform (BWT) methods is the result of an analysis of several lossless data compression
algorithms, among them the Arithmetic Coder [1], Prediction by Partial Matching (PPM) [3] and Huffman Coding
[11]. The analysis of the corresponding hardware approaches made it possible to compare the structures used by the
algorithms, and this study showed the feasibility of combining them. It is worth noting that the present work tackles the
design of the compressor architecture; a decompressor scheme is left as future work. In the next section, a brief
introduction to data compression is given, as well as a description of these methods.
2 Data compression
Data compression is the process of converting an input data stream (the source stream or the original raw data) into
another data stream (the output or the compressed stream) that has a smaller size. A stream is either a file or a buffer in
memory. There are many known methods for data compression. They are based on different ideas, are suitable for
different types of data, and produce different results, but they are all based on the same principle, namely they compress
data by removing redundancy from the original data in the source file.
Data compression has important applications in the areas of data transmission and data storage. Many data processing
applications require storage of large volumes of data, and the number of such applications is constantly increasing as the
use of computers extends to new disciplines. At the same time, the proliferation of wireless communication networks is
resulting in massive transfer of data over communication links. Compressing data to be stored or transmitted reduces
storage and/or communication costs. When the amount of data to be transmitted is reduced, the effect is that of
increasing the capacity of the communication channel. Similarly, compressing a file to half of its original size is
equivalent to doubling the capacity of the storage medium.
The essential figure of merit for data compression is the compression ratio: the ratio of the size of the original
uncompressed file to the size of the compressed file. For example, suppose a data file takes up 100 Kilobytes (KB).
Using data compression software, that file could be reduced in size to, say, 50 KB, making it easier to store on disk and
faster to transmit over a communication channel. In this case, the data compression software reduces the size of the
data file by a factor of two, that is, it achieves a compression ratio of 2:1.
There are two kinds of data compression models: lossy and lossless. Lossy data compression works on the assumption
that the data do not have to be stored perfectly. Much information can simply be thrown away from images, video or
sound, and when uncompressed such data will still be of acceptable quality. Lossless compression, in contrast, is used
when data have to be uncompressed exactly as they were before compression. Text files (especially files containing
computer programs) are stored using lossless techniques, since losing a single character can make, in the worst case,
the text dangerously misleading.
The part on the left is called the search buffer. This is the current dictionary, and it always includes symbols that
have recently been input and encoded. The part on the right is the look-ahead buffer, containing the text yet to be
encoded. As the input stream is shifted into the search buffer, it is compared with the existing data in the dictionary to
find the maximum-length matching phrase. Once such a matching phrase is found, an output codeword or token
T = (To, Tl, Tn) is generated. Each token contains three elements: the offset To, which points to the starting position of
the matching phrase in the dictionary; the length Tl of the matching string; and the next source symbol Tn immediately
following the matching string. In the cycle following the generation of the token, a new source data string enters the
dictionary and a new matching process begins, proceeding in the same way until all the data are encoded. Since recent
data patterns are expected to appear again in the near future, and the latest string is contained in the dynamic
dictionary, LZ77 can often replace a long and frequently encountered string with a shorter code. An example of the
LZ77 encoding process is shown in Figure 1(b). In the first four steps the search buffer is empty, thus the symbols "_",
"s", "h" and "e" are encoded with tokens with zero offset, zero length and the unmatched symbol. In the next two steps
the "_" and "s" symbols are found, and no token is generated yet. In the next step, the symbol "e" is found but it is not
part of the string "_s", so the token (4,2,"e") is constructed. The process continues until the whole input string has been
coded. A token of the form (0,0,...), which encodes a single symbol, does not provide good compression, but these
kinds of tokens appear only at the beginning of the process. In Figure 1, it can be noticed that the more data the search
buffer contains, the fewer tokens are needed to represent a string. The LZ77 algorithm laid the basis for compressed
graphics formats such as GIF, TIFF and JPEG.
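The token-generation process just described can be sketched in Python. This is a behavioral reference model written for this description, not the hardware design; the window size and the helper names are assumptions:

```python
def lz77_encode(data, window=64):
    """Encode `data` into (offset, length, next-symbol) tokens.

    The offset counts backwards from the current position to the start
    of the match in the search buffer, as in the (4,2,"e") example.
    """
    tokens, i, n = [], 0, len(data)
    while i < n:
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):
            l = 0
            # Keep at least one symbol unmatched so Tn always exists.
            while i + l < n - 1 and data[j + l] == data[i + l]:
                l += 1
            if l > best_len:
                best_len, best_off = l, i - j
        tokens.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return tokens

def lz77_decode(tokens):
    """Reverse the encoding: copy each match, then append Tn."""
    out = []
    for off, length, nxt in tokens:
        start = len(out) - off
        for k in range(length):
            out.append(out[start + k])
        out.append(nxt)
    return "".join(out)
```

On the string from the example, `lz77_encode("_she_sells_sea_shells")` starts with four literal tokens and then produces (4, 2, "e"), matching the trace of Figure 1(b).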
Let S = “_ s h e _ s e l l s _ s e a _ s h e l l s”
The encoder constructs an n x n matrix where it stores string S in the top row, followed by n - 1 copies of S, each
cyclically shifted (rotated) one symbol to the left. The matrix is then sorted lexicographically by rows, as seen in Figure
2. Notice that every row and every column of each of the two matrices is a permutation of S and thus contains all n
symbols of S. The permutation L selected by the encoder is the last column of the sorted matrix, in this example "s e s a
e h s s h s s e e l l l l _ _ _ _". The only other information needed to eventually reconstruct S from L is the row number
of the original string in the sorted matrix, which in this case is 2 (row and column numbering starts from 0); this
number is stored in I. It can be seen why L contains concentrations of identical symbols. It is worth noting that the
larger n is, the longer the concentrations of symbols and the better the compression. As this work only deals with the
encoding process, the decoder phase is not described; the interested reader is referred to [15]. The most popular
implementation of the BWT is the open-source program bzip2 [16], which uses the Huffman coding method.
_she_sells_sea_shells _sea_shells_she_sells
she_sells_sea_shells_ _sells_sea_shells_she
he_sells_sea_shells_s _she_sells_sea_shells
e_sells_sea_shells_sh _shells_she_sells_sea
_sells_sea_shells_she a_shells_she_sells_se
sells_sea_shells_she_ e_sells_sea_shells_sh
ells_sea_shells_she_s ea_shells_she_sells_s
lls_sea_shells_she_se ells_sea_shells_she_s
ls_sea_shells_she_sel ells_she_sells_sea_sh
s_sea_shells_she_sell he_sells_sea_shells_s
_sea_shells_she_sells hells_she_sells_sea_s
sea_shells_she_sells_ lls_sea_shells_she_se
ea_shells_she_sells_s lls_she_sells_sea_she
a_shells_she_sells_se ls_sea_shells_she_sel
_shells_she_sells_sea ls_she_sells_sea_shel
shells_she_sells_sea_ s_sea_shells_she_sell
hells_she_sells_sea_s s_she_sells_sea_shell
ells_she_sells_sea_sh sea_shells_she_sells_
lls_she_sells_sea_she sells_sea_shells_she_
ls_she_sells_sea_shel she_sells_sea_shells_
s_she_sells_sea_shell shells_she_sells_sea_
Matrix of cyclic shifts of S Matrix sorted lexicographically
Fig. 2. BWT matrix.
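The matrix construction of Figure 2 can be reproduced with a short Python sketch (a reference model added to this description, not the Weavesorter hardware):

```python
def bwt_encode(s):
    """Return (L, I): the last column of the sorted rotation matrix
    and the row index of the original string in that matrix."""
    n = len(s)
    # n rows: S cyclically shifted left by 0, 1, ..., n-1 symbols.
    rotations = [s[k:] + s[:k] for k in range(n)]
    rows = sorted(rotations)            # lexicographic sort by rows
    L = "".join(row[-1] for row in rows)
    I = rows.index(s)                   # row number of the original S
    return L, I
```

For S = "_she_sells_sea_shells" this returns L = "sesaehsshsseellll____" and I = 2, matching Figure 2 (in ASCII, "_" sorts before the letters, as in the figure).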
3 Related work
This section describes the strategies proposed in the past to design hardware architectures of the LZ77 and BWT
algorithms. For the LZ77 algorithm, two main hardware proposals were studied: The systolic array and the Content-
Addressable Memory (CAM) approaches. Whereas only one architecture was found for the BWT algorithm: The
Weavesorter machine. As one of the goals of this work is to combine an LZ77 architecture with a BWT one, the
Content-Addressable Memory approach was chosen because it uses similar hardware components of the Weavesorter
machine architecture. This allows to combine both architectures reusing hardware resources. A description of these two
strategies is presented in this section.
The matching position address can be resolved by a (log2 N)-stage position encoder in the second stage. This position
encoder is a priority encoder: when several inputs are logically 1 in the same cycle, only the one with the highest
priority is selected as the output. The priorities of the inputs do not affect the compression performance; they can be
assigned in ascending or descending order according to the indexes of the inputs to minimize the encoder complexity.
In a pipelined fashion, the compared data also shifts into the end of the CAM array to update the dictionary, and the
next source symbol is injected into the system to start the next comparison operation. Note that in this two-stage
pipeline scheme, the output stage follows immediately after the Matched signal is deasserted, because the To and Tl
elements of the output codeword are available at the same clock edge. The sliding dictionary in this scheme can be
implemented with a sliding pointer representing the current writing address in the CAM; the pointer also provides an
offset for the match position encoder. Note that, since the goal of the comparisons is to find the maximum-length
matching string, the matching result of the source symbol in each cell has to propagate to the comparison of the next
source symbol. Hence, the actual match result in each cell is obtained by ANDing its own match result in the current
cycle with the delayed match result of the previous cell in the earlier cycle. In addition, the complement of the Matched
signal is used to set all the string match results to 1 in order to start the next string matching operation.
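A behavioral sketch of the priority encoder may help: it assumes, arbitrarily, that the lowest index has the highest priority (as noted above, either order works without affecting compression):

```python
def priority_encode(match_lines):
    """Return the index of the highest-priority asserted match line,
    or None when no line is asserted (no match in the CAM array)."""
    for idx, asserted in enumerate(match_lines):
        if asserted:
            return idx
    return None
```

In the hardware this selection is resolved combinationally in log2(N) stages rather than by a sequential scan; the sketch only models the selection rule.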
and it indicates when all the columns of the matrix are sorted. See Figure 5.
LEON2 [7] is a System-on-Chip (SoC) platform with a 32-bit Scalable Processor Architecture (SPARC) V8 [17]
compatible embedded processor. It was developed by the European Space Agency (ESA) and is freely available [6]
with full source code in VHDL under the GNU Lesser General Public License (LGPL) [8]. The LEON2 platform is
extensively configurable and may be efficiently implemented on both FPGA and ASIC technologies. A LEON2
diagram is shown in Figure 10. The SPARC V8 architecture defines two (optional) coprocessors: a Floating-Point Unit
(FPU) and a custom-defined coprocessor. The LEON2 pipeline provides one interface port for each of these units.
Three different FPUs can be interfaced: the Gaisler Research Floating-Point Unit (GRFPU), available from Gaisler
Research [6]; the Meiko FPU core, available from Sun Microsystems [18]; and the incomplete LTH FPU. A generic
coprocessor interface is provided to allow interfacing of custom coprocessors. The LTH FPU is a partial
implementation of an IEEE-754 compatible FPU contributed by Martin Kasprzyk from the Lund Technical University;
it is used in this work because its source is freely available. This FPU uses the Meiko-compatible interface and is
replaced by the BWT/LZ77 IP core. The interface is a wrapper around Meiko-compatible FPU cores and is described
using VHDL records to connect with the Integer Unit (IU) of the LEON2 processor. Figure 11 shows the interface
connected to the LEON2 IU.
Fig. 12. Main BWT and LZ77 programs for the LEON2 processor.
In line 1:, the SetReadAddr() procedure sets the first memory location where the file to be processed is stored. In line
2:, the SetWriteAddr() procedure sets the first memory location where the results of the process are written. The
ReadData() procedure in line 4: calculates the addresses to be read from memory and executes the FADDd instruction
to send the data to the coprocessor. In line 5:, the BWT() procedure executes the FSQRTd instruction and the BWT
process is started. Once the BWT operation is done, the results are read from the coprocessor and stored in memory by
the WriteColumns() procedure, which executes the FSUBd instruction (line 6:). Finally, the coprocessor is set to the
Reset state by the ResetCoprocessor() procedure in line 7:, which executes the FMULd instruction. The process is
repeated until the end of the file is reached (line 3:). The pseudocode for the LZ77 execution is very similar to that of
the BWT; the only changes are the LZ77() and WriteTokens() procedures in lines 5: and 6: respectively. The LZ77()
procedure executes the FSQRTs instruction to perform the LZ77 algorithm, and the WriteTokens() procedure reads the
LZ77 tokens from the coprocessor and writes them into memory using the FSUBd instruction.
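The control flow described above can be modeled in Python. The procedure names come from the paper; their bodies, the trace mechanism and the base addresses are hypothetical stand-ins for the real routines, which issue the FPU opcodes through the coprocessor interface:

```python
trace = []  # records which coprocessor instruction each step issues

def SetReadAddr(addr):  trace.append(("set_read_base", addr))
def SetWriteAddr(addr): trace.append(("set_write_base", addr))
def ReadData():         trace.append(("FADDd", "send data block"))
def BWT():              trace.append(("FSQRTd", "start BWT"))
def WriteColumns():     trace.append(("FSUBd", "store results"))
def ResetCoprocessor(): trace.append(("FMULd", "reset"))

def run_bwt(num_blocks):
    SetReadAddr(0x40000000)      # hypothetical input base address
    SetWriteAddr(0x40100000)     # hypothetical output base address
    for _ in range(num_blocks):  # loop until end of file (line 3:)
        ReadData()               # line 4:
        BWT()                    # line 5:
        WriteColumns()           # line 6:
        ResetCoprocessor()       # line 7:
```

The LZ77 variant would differ only in the two middle steps, as the text notes.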
7.1 Simulation
The LEON2 distribution provides a generic test bench (“tbgen.vhd”) that can generate a model of a LEON2 system
with various memory sizes and types by setting the appropriate generics. The file “tbleon.vhd” contains a number of
alternative configurations using the generic test bench. To test the coprocessor, the TB_FUNC_SDRAM test bench is
selected because it can operate an SDRAM module where the original data can be stored. The test bench loads a test
program into the LEON2 processor that, among other operations, executes a checking routine for the FPU. The C code
for this test is provided with the model in the “fpu.c” file, and the code to execute the BWT and LZ77 algorithms is
written there. The test program must be recompiled with the RCC cross-compiler to generate new assembler code for
the TB_FUNC_SDRAM test bench. The compiler writes the assembler code to the “sdram.rec” file in the Motorola
S-record format, which simulates the SDRAM memory. Motorola S-records are an industry-standard format for
transmitting binary files to target systems and PROM programmers. When the LEON2 model is simulated, the
instructions are read from this file. To validate the operation of the system, an extra module is added to the output of
the coprocessor; this module is for testing purposes only and writes the results into a text file. The BWT and LZ77
algorithms are prototyped in Matlab and their results are also written to a text file. Both resulting files are identical.
7.2 Timing
The throughput of the coprocessor is estimated assuming that the LEON2 processor has already read the data from the
SDRAM and stored them in the floating-point registers. This is because the SDRAM read/write operations depend on
the Memory Controller connected to the Advanced Microcontroller Bus Architecture (AMBA) bus (Figure 10) and on
the specific model of the SDRAM module. It is worth noting that these operations must also be performed in a
software implementation. The number of clock cycles needed for communication between the LEON2 and the
coprocessor is fixed, but the number needed for the sorting operation of the BWT algorithm is not. This is because the
Weavesorter performs shift-left and shift-right iterations until all the characters are sorted, so the number of iterations
depends directly on the kind of data. The Weavesorter takes n x 2 x 2 clock cycles to shift the first column of the
matrix in and out. However, in the next iteration the second column is inserted while the first one is shifted out, which
means that every extra iteration takes only n x 2 clock cycles. The number of iterations is equal to the length of the
longest repeated string segment in the block; this number is represented by ι. Therefore, the total number of clock
cycles required by the Weavesorter is 2n (ι + 1). On the other hand, the throughput for the LZ77 algorithm is 1 symbol
per clock cycle, the same as the CAM-based approach. Table 2 shows a comparison between the coprocessor and
software implementations of the Quicksort and Heapsort algorithms. A block of 128 bits is sorted for every file of the
Canterbury Corpus. On average, the coprocessor is 3.6 times faster than the Quicksort software implementation and
1.6 times faster than the Heapsort one.
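The cycle count above can be checked with a small helper (an illustration added to this description; `n` is the block length and `iota` the number of Weavesorter iterations):

```python
def weavesorter_cycles(n, iota):
    """Clock cycles for the Weavesorter: the first iteration shifts a
    column fully in and out (n*2*2 cycles); every further iteration
    overlaps shift-in with shift-out (n*2 cycles each)."""
    first = n * 2 * 2
    rest = (iota - 1) * n * 2
    return first + rest  # equals 2n(iota + 1)
```

For example, a 64-symbol block needing ι = 3 iterations takes 2 · 64 · (3 + 1) = 512 cycles.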
7.3 Synthesis
The coprocessor architecture is synthesized with a Weavesorter of 64 cells. The selected device is a Virtex-II
2v2000bg575-4. According to the synthesis report, the maximum clock frequency is 101.309 MHz. A device utilization
summary is presented in Table 3.
To find which components are shared between the algorithms, the coprocessor is synthesized with only the
components of the BWT algorithm; a synthesis with only the LZ77 algorithm components is also performed.
Comparing these with a synthesis of the complete coprocessor reveals the elements common to both algorithms, shown
in Table 4.
To obtain the percentage of reused hardware, a summary of the device utilization for the BWT and LZ77 schemes is
obtained. The total number of slices needed to implement both algorithms separately is 3680 + 5707 = 9387. As seen
in Table 3, the coprocessor designed in this work uses 7870 slices, saving 1517 slices on the FPGA; in other words,
19.2% of its resources are shared.
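The resource-sharing figure follows directly from the slice counts in Tables 3 and 4, as this short check shows:

```python
bwt_only = 3680      # slices for a BWT-only synthesis
lz77_only = 5707     # slices for an LZ77-only synthesis
combined = 7870      # slices for the combined coprocessor (Table 3)

separate_total = bwt_only + lz77_only   # 9387 slices if built apart
saved = separate_total - combined       # 1517 slices saved by sharing
shared_pct = 100 * saved / combined     # ~19.2% of the coprocessor
```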
8 Conclusions
This work proposed a hardware/software architecture with a general-purpose microprocessor (LEON2) and a custom
coprocessor. The microprocessor maintains the flexibility of the system while the coprocessor takes advantage of
parallelism. Thus, the architecture combines the hardware and software approaches to make the most of both
solutions.
The architecture was designed to combine two different hardware lossless data compression algorithms in such a
way that they share common elements, improving resource utilization. The analysis of the synthesis reports shows that
19.2% of the FPGA slices used by the coprocessor are shared; this reduces the implementation cost and makes it
possible to increase the number of cells, improving the compression ratio.
Some advantages of the presented work are the following. The architecture can execute two different lossless data
compression schemes; this flexibility is useful since the two algorithms have different compression ratios and
execution speeds. The architecture favours compression ratio using the BWT capabilities and can also achieve high
speeds with the LZ77 scheme. The design of the coprocessor does not introduce delays with respect to the state-of-the-
art data compression schemes: once the coprocessor has the input data available, the processing rate is the same as that
of the original Weavesorter machine (2n (ι + 1)) and CAM-based (1 symbol per clock cycle) architectures. The
coprocessor is closely attached to the IU of the LEON2 processor, which reduces the clock cycles needed for
communication and data transfer compared with a universal bus scheme; since a universal bus is designed to handle
more than one processor, communication with the IU would be arbitrated, introducing delays.
Acknowledgments
The authors acknowledge the financial support from the Mexican National Council for Science and Technology
(CONACyT), grant number 181512.
References
1. N. Abramson. Information Theory and Coding. McGraw-Hill, New York, 1963.
2. M. Burrows and D. J. Wheeler. A block-sorting lossless data compression algorithm. Technical Report 124, SRC
(digital, Palo Alto), May 1994.
3. J. G. Cleary and I. H. Witten. Data compression using adaptive coding and partial string matching. IEEE
Transactions on Communications, 32(4):396–402, Apr. 1984.
4. T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein. Introduction to Algorithms. The MIT Press and
McGraw-Hill Book Company, second edition, 2001.
5. C. Ebeling, D. C. Cronquist, and P. Franklin. RaPiD - reconfigurable pipelined datapath. In R. Hartenstein and
M. Glesner, editors, th International Workshop on Field-Programmable Logic and Compilers, pages 126–135.
Springer-Verlag, Apr. 1996.
6. Gaisler Research. http://www.gaisler.com/. [Accessed September, 2005].
7. Gaisler Research. LEON2 Processor User’s Manual, 1.0.30 edition, July 2005. http://www.gaisler.com/,
[Accessed September, 2005].
8. GNU Lesser General Public License. http://www.gnu.org/copyleft/, Feb. 1999. [Accessed September, 2005].
9. S. Goldstein, H. Schmit, M. Budiu, S. Cadambi, M. Moe, R. Taylor, and R. Laufer. PipeRench: A co-processor
for streaming multimedia acceleration. In D. DeGroot, editor, Proceedings of the 26th Annual International
Symposium on Computer Architecture, Computer Architecture News, pages 28–41, New York, N.Y., May 1999.
ACM Press.
10. J. R. Hauser and J. Wawrzynek. Garp: A MIPS processor with a reconfigurable coprocessor. In K. L. C. S. Press,
editor, IEEE Symposium on FPGAs for Custom Computing Machines, pages 12–21, Los Alamitos, CA, Apr.
1997. IEEE Computer Society Press.
11. D. A. Huffman. A method for the construction of minimum-redundancy codes. Proceedings of the Institute of
Radio Engineers, 40(9):1098–1101, Sept. 1952.
12. S. Jones. 100 Mbit/s adaptive data compressor design using selectively shiftable content-addressable memory. IEE
Proceedings-G, 139(4):498–502, Aug. 1992.
13. A. Mukherjee, N. Motgi, J. Becker, A. Friebe, C. Habermann, and M. Glesner. Prototyping of efficient hardware
algorithms for data compression in future communication systems. In IEEE International Workshop on Rapid
System Prototyping, pages 58–63, June 2001.
14. M. Nelson and J.-L. Gailly. The Data Compression Book. M&T Books, second edition, 1996.
15. D. Salomon. Data Compression: The Complete Reference. Springer-Verlag, third edition, 2004.
Virgilio Zuñiga was born in 1979 in Guadalajara, Mexico. In 2003, he obtained a B.Eng. degree in Electronics and
Communications from the Exact Sciences and Engineering Centre, University of Guadalajara. In 2006, he received an
M.Sc. degree in Computer Science from the National Institute for Astrophysics, Optics and Electronics. He is currently
a PhD student in the System Level Integration research group at the University of Edinburgh, Scotland.
Claudia Feregrino Uribe received the M.Sc. degree from CINVESTAV Guadalajara, Mexico, in 1997 and the PhD
degree from Loughborough University, U.K., in 2001. She is currently a researcher at the Computer Science
Department at INAOE. Her research interests cover the use of FPGA technologies, data compression, cryptography,
steganography and software/hardware development for medical applications. She is a founding member of the
ReConFig (Reconfigurable Computing and FPGAs) conference, one of the leading international forums on FPGAs,
and chairs the Puebla IEEE Computer Chapter.
René Armando Cumplido Parra received the B.Eng. degree from the Instituto Tecnologico de Queretaro, Mexico, in
1995, the M.Sc. degree from CINVESTAV Guadalajara, Mexico, in 1997 and the Ph.D. degree from Loughborough
University, UK, in 2001. Since 2002 he has been a member of the Computer Science Department at the National
Institute for Astrophysics, Optics and Electronics in Puebla, Mexico. His research interests include the use of FPGA
technologies, digital design, computer architecture, reconfigurable computing, DSP, radar signal processing, software
radio and digital communications. He is a founding member of the ReConFig international conference.