Fast Mapping and Updating Algorithms for a Binary CAM (FMU-BiCAM) on FPGA
Abstract— Content-addressable memories (CAMs) are used in a variety of applications, such as IP filtering, data compression, and artificial neural networks, due to their high-speed lookup. Fast field-programmable gate arrays (FPGAs) are nowadays used to emulate CAMs. These CAM emulations use either logical resources or memory blocks on FPGAs. However, such emulations suffer from slow mapping and updating mechanisms, which results in unacceptable response times in real-time applications. The slow update response is proportional to the CAM depth in these schemes. In this article, fast mapping and updating algorithms for a binary CAM (FMU-BiCAM) are presented, which efficiently utilize lookup tables, slice registers, and block random access memories (RAMs) on a Xilinx FPGA to emulate CAMs with faster mapping and updating. The advantage of the proposed work lies in directly applying the CAM key as an address, which helps in updating the contents of the memory units. CAMs in the literature exhaust the entire CAM depth in remapping the CAM words along with the updated word, which leads to higher update latency. The proposed algorithms are implemented on a Xilinx Virtex-6 FPGA, and the results show that the proposed method reduces update latency to only two clock cycles.
Index Terms— Content-addressable memory (CAM), fast mapping algorithm, fast updating algorithm, random
access memory (RAM)-based CAM.
QAZI et al.: FMU-BICAM ON FPGA 157
158 IEEE CANADIAN JOURNAL OF ELECTRICAL AND COMPUTER ENGINEERING, VOL. 44, NO. 2, SPRING 2021
achieves optimization of memory resources but at the cost of degraded scalability. In addition, virtual partitioning within a single memory block and the use of validation memory (VM) trigger successive processes in the single memory block and reduce speed significantly. The update mechanism in [19] again depends on O(N). Lee et al. [20] adopted the technique of bundle updates, and the update mechanism in that architecture is proportional to the width of the configured CAM rather than its depth. The design in [20] compromises speed and scalability to optimize memory resources, but it still depends on the width of the configured CAM.

The SRAM-based CAM architectures in [12], [13], [21], and [22] address the update mechanism through dedicated circuitry, yet the update latency remains dependent on the CAM depth. The logic-based schemes in [6], [23], and [24] simplify the update procedure at the cost of degraded scalability and speed. Update time is reduced to some extent in [24] and [25], but at the expense of either speed or resource utilization. The dependence of the update mechanism on the CAM depth throughout the literature motivated us to resolve the issue of slow updating and mapping for SRAM-based CAMs. We develop an algorithm that supports fast run-time updating of its mapped contents. A further strength of the design is that the user can configure it to the required CAM size to avoid problems in run-time updating. For example, to handle a 36-bit key, the configured CAM could be of size 512 × 36 or 1024 × 36; to handle a 72-bit key, it could be of size 512 × 72 or 1024 × 72.

III. PROPOSED MAPPING AND UPDATING ALGORITHMS

Recent work shows that the comparison process is exhaustive in the worst-case scenario. This is due to the fact that all SRAM-based CAM architectures use an indirect searching approach, which consumes considerable resources: temporary tables are generated to store interim data, and VM modules are generated to check and store the presence of subwords. Since this complicated circuitry brings no benefit in the worst-case scenario, it can be removed and a direct searching approach incorporated. The removal of such complex circuitry significantly reduces the resource utilization and power consumption of SRAM-based CAMs and ultimately leads to optimization of other parameters as well.

Updating of data contents discussed in the literature is solely dependent on positioning the updated CAM word relative to the existing stored contents. Mapping the updated CAM word into the appropriate location involves iterative steps, which lead to higher latency. Such limitations in the update mechanism of SRAM-based CAMs motivate the development of faster mapping and updating algorithms that are independent of the CAM depth. The proposed fast mapping and updating mechanisms/algorithms for a binary CAM are shown in Fig. 1.

The generalized fast mapping and updating mechanisms using block RAMs are proposed for a BiCAM composed of L layers, each layer consisting of cascaded SRAM blocks, as shown in Fig. 1. It is important to mention that we used the smaller 18k block RAM size compared with the 36k block RAM used in most of the related work. This is done to reduce the number of block RAMs, k, in each layer. Although decreasing k increases the total number of layers needed to emulate the target 512 × 36 CAM, experimental results suggested that more layers with fewer SRAM blocks per layer help reduce power consumption during update. Hence, instead of 32 block RAMs of 36k size, 64 block RAMs of 18k size are used. The first layer cascades blocks SRAM(1,1) to SRAM(1,k), whereas the last layer cascades blocks SRAM(L,1) to SRAM(L,k). Upon fetching the required address location on the address line, the control logic on top of each layer selects the corresponding layer. Mapping of a CAM word starts by dividing the CAM word into subwords, where the number of CAM subwords equals the number of SRAM blocks in each layer and vice versa. The number of addresses a layer can link depends on the width N of the SRAM block, where N refers to the valid informative column bits of the SRAM block.

In every layer, the corresponding significant bit (SB) of each memory block verifies the presence of the linked subwords. Concurrently, the correlated SBs of all SRAM blocks in a layer are ANDed together to check for the presence of the input word Cw. Equation (1) gives the CAM depth Cd

Cd = 2^(Cw/k)    (1)

where Cw refers to the width of the CAM word and k refers to the number of SRAM blocks in each layer.

A. Mapping Mechanism

The mapping process is complex in [3], [5], [7], and [19] owing to the use of temporary lookup tables. Schemes such as [3] and [7] divide the CAM word into subwords in time period t1, make a bit position table (BPT) from the corresponding subwords in t2, build an address position table address generator (APTAG) from the corresponding BPT in t3, and finally make an address position table (APT) from the corresponding APTAG table in t4. The process differs somewhat in [5] and [19], which use VM modules in place of the BPT and original address table address generators (OATAGs) in place of the APTAG, yet these still consume a number of clock cycles for mapping contents to the desired location. The proposed mapping mechanism, however, uses only two clock cycles: t1 for generation of the CAM subwords and t2 for mapping these subwords to the corresponding rows in the SRAM blocks by using them as row addresses to set the corresponding bits of the SRAM blocks.

The proposed mapping mechanism is independent of arranging the mapped words in any order. Algorithm 1 demonstrates the mapping process. To map the queued CAM word Cw into the desired location, Cw is divided into k same-sized subwords, Csw(0) to Csw(k). A strength of the proposed work is the control logic for the mapping process, which, unlike other architectures, activates only the layer of the addressed location. The CAM word being mapped is also used as the target address location in the SRAM blocks where it will be stored; this additional feature makes the approach unique. The subwords Csw(0) to Csw(k) set only the corresponding significant
bits of the addressed SRAM blocks, leaving the other bits of the row at a low level. The process thus avoids activating all layers for content writing and lowers the energy consumption during the mapping process. The queued CAM words can be mapped individually into the desired locations in random order until the depth of the CAM architecture is exhausted.

1) Mapping Example: For mapping the CAM words shown in Fig. 2, we configure an 8 × 4 BiCAM architecture. For this particular example, we select L = 2 layers of cascaded SRAM blocks of depth Rd = 8, and each block links N/L addresses, where N refers to the informative column bits of the SRAM block. The aim is to map CAM word 101110 to address location "1" and 010100 to address location "5."

After partitioning the first CAM word into k = 2 subwords, the least significant subword Csw(0) = 110 is mapped to SRAM(1,1) by setting the corresponding significant bit of the located address "1," while the most significant subword Csw(1) = 101 is mapped to the corresponding location of SRAM(1,2). Similarly, for the second CAM word, the least significant subword Csw(0) = 100 is mapped to SRAM(2,1), while the most significant subword Csw(1) = 010 is mapped to SRAM(2,2) at address "5." It is important to mention that the mapping control logic activates only layer 1 for address "1," as that address is linked to layer 1, and only layer 2 for address "5," as it is linked to layer 2. The strength of the proposed mapping mechanism lies in directly mapping CAM words by using their subwords as row addresses to the SRAM blocks, without arranging them in ascending order, unlike other designs.

B. Lookup Process

The lookup process is initiated concurrently in all layers. Algorithm 2 demonstrates the lookup process. The input CAM word Cw is divided into k same-sized subwords, Csw(0) to Csw(k). The CAM subwords are applied concurrently in all layers to their corresponding SRAM blocks. The subwords Csw(0) to Csw(k) identify the addressed locations in the corresponding SRAM blocks.
Fig. 2. Mapping process in configured 8 × 4 architecture of a BiCAM with L and k both equal to 2.
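The two-word mapping walk-through of Fig. 2 can be sketched as a short software model. This is a behavioral sketch under assumed conventions (zero-based layer and column indices, Python sets standing in for the one-hot SB column vectors), not the paper's hardware:

```python
# Minimal software model of the proposed mapping step (Algorithm 1) for
# the 8 x 4 example of Fig. 2: L = 2 layers, k = 2 SRAM blocks per layer,
# 3-bit subwords, so each block has 2**3 = 8 rows, matching Eq. (1),
# Cd = 2^(Cw/k).  The names and encodings here are modeling choices,
# not the paper's RTL.
L, k, SUBW = 2, 2, 3            # layers, blocks per layer, subword width
ADDRS_PER_LAYER = 4             # N/L informative columns per block

# sram[i][j][row] = set of columns of SRAM(i+1, j+1) whose SB is 1
sram = [[[set() for _ in range(2 ** SUBW)] for _ in range(k)]
        for _ in range(L)]

def subwords(word):
    """Split a CAM word into k subwords, least significant first."""
    return [(word >> (SUBW * j)) & (2 ** SUBW - 1) for j in range(k)]

def map_word(word, addr):
    """Map `word` to CAM address `addr`: the subwords are applied
    directly as row addresses, and only the addressed layer is active."""
    layer, col = divmod(addr, ADDRS_PER_LAYER)
    for j, sw in enumerate(subwords(word)):
        sram[layer][j][sw].add(col)    # set the significant bit (SB)

map_word(0b101110, 1)   # Csw(0)=110 -> SRAM(1,1), Csw(1)=101 -> SRAM(1,2)
map_word(0b010100, 5)   # Csw(0)=100 -> SRAM(2,1), Csw(1)=010 -> SRAM(2,2)
```

In this model, column 1 of layer 1 corresponds to CAM address "1" and column 1 of layer 2 to address "5," mirroring the placements in Fig. 2.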
Fig. 3. Lookup process in configured 8 × 4 architecture of a BiCAM with L and k both equal to 2.
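The lookup of Fig. 3, in which each subword reads out an SB column vector and the vectors of a layer are ANDed, can be sketched the same way. The model below restates the mapping sketch so it is self-contained; the loop over layers models what the hardware does concurrently:

```python
# Software model of the FMU-BiCAM lookup (Algorithm 2) on the 8 x 4
# example: each subword of the input word addresses one row per SRAM
# block, and the SB column vectors of a layer are ANDed to detect a
# match.  Set-of-columns encoding and zero-based indices are modeling
# choices, not the paper's RTL.
L, k, SUBW, ADDRS_PER_LAYER = 2, 2, 3, 4
sram = [[[set() for _ in range(2 ** SUBW)] for _ in range(k)]
        for _ in range(L)]

def subwords(word):
    return [(word >> (SUBW * j)) & (2 ** SUBW - 1) for j in range(k)]

def map_word(word, addr):
    layer, col = divmod(addr, ADDRS_PER_LAYER)
    for j, sw in enumerate(subwords(word)):
        sram[layer][j][sw].add(col)

def lookup(word):
    """Return all matching CAM addresses (layers searched concurrently
    in hardware; modeled here as a loop)."""
    matches = []
    for i in range(L):
        cols = set(range(ADDRS_PER_LAYER))
        for j, sw in enumerate(subwords(word)):
            cols &= sram[i][j][sw]        # AND of the SB column vectors
        matches += [i * ADDRS_PER_LAYER + c for c in sorted(cols)]
    return matches

map_word(0b101110, 1)
map_word(0b010100, 5)
print(lookup(0b101110))   # -> [1]
print(lookup(0b010100))   # -> [5]
```

A word that was never mapped produces an empty AND result in every layer, i.e., no match.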
Algorithm 3 elaborates the proposed update mechanism. To update the contents of the desired location, the existing CAM word Cw is divided into k equal-sized subwords, Csw(0) to Csw(k), and applied to the corresponding SRAM blocks of the addressed layer. In the first write cycle, the addressed bits selected through the existing subwords at the desired location in the corresponding SRAM blocks are cleared; this is followed by mapping the updating CAM subwords into the corresponding bit positions of the already addressed location.

Algorithm 3 Proposed Updating Algorithm
1: procedure UPDATE \\ update the contents mapped to the addressed location in the SRAM blocks
2: [Divide the CAM word into subwords]
3: for i = 0 to k do \\ execute in parallel for all SRAMs
4:     Csw(i) = subword i of Cw, of width Cw/k
5: end for
6: [Find the address and delete contents]
7: for i = 1 to L do \\ execute in parallel for all layers
8:     for j = 1 to k do
9:         if SRAM(i,j) is addressed by Csw(j) then
10:            clear the SB of SRAM(i,j)
11:        end if
12:    end for
13: end for
14: [Insert new contents]
15: set the SB of SRAM(i,j) addressed by the new subwords
16: end procedure

The process thus avoids activating all layers for content updating. Unlike all other SRAM-based CAM schemes, the proposed update mechanism consumes only two clock cycles for all sorts of updates. The proposed direct mapping and sequence-independent update procedure significantly reduce the energy consumption during both processes compared with all the reviewed SRAM-based CAM architectures.

1) Update Example: The update mechanism in the configured 8 × 4 BiCAM architecture is shown in Fig. 4. For this particular example, we select L = 2 layers of SRAM blocks of depth Rd = 8, and each block links N/L addresses, where N refers to the informative column bits of the SRAM block and L is the number of layers of cascaded SRAM blocks.

Updating the contents of address location "5" is required. In the erasing step, we concurrently provide the address of location "5" through the update control logic to all layers, which results in activating layer 2 only, as address "5" is linked to layer 2. Now, the existing least significant CAM subword Csw(0) = 100 clears the corresponding bit position of address "5" in SRAM(2,1), while the existing most significant CAM subword Csw(1) = 010 clears the corresponding bit position of location "5" in SRAM(2,2).

This deletion is followed by insertion: the corresponding bit position of address location "5" is set through the least significant updated CAM subword Csw(0) = 000 in SRAM(2,1), and through the most significant updated CAM subword Csw(1) = 111 in SRAM(2,2). The update is hence achieved through a depth-independent mechanism in exactly two clock cycles: the existing mapped contents 010100 are deleted from address "5" in the first cycle, and the updating CAM word 111000 is mapped to the same location in the second clock cycle.

IV. IMPLEMENTATION AND PERFORMANCE EVALUATION

The proposed algorithms for BiCAMs of size 64 × 36 and 512 × 36 are implemented on the Xilinx Virtex-6 FPGA device XC6VLX760, using Xilinx ISE Design Suite 14.5. Comparative analysis of update latency, speed, resource utilization, and time consumed during content updates with
TABLE II
PERFORMANCE COMPARISON OF THE PROPOSED MAPPING AND UPDATE MECHANISM WITH THE REVIEWED LITERATURE
Fig. 4. Updating process in configured 8 × 4 architecture of a BiCAM with L and k both equal to 2.
Fig. 6. Resources comparison of related work with the proposed work [P] in terms of update time (represented in µs).
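The two-cycle update of Fig. 4 can also be sketched in software. As before, this is a self-contained behavioral model under assumed conventions (zero-based indices, a `stored` dictionary standing in for reading back the existing word), not the paper's circuitry:

```python
# Software model of the two-cycle update (Algorithm 3) on the 8 x 4
# example: cycle 1 clears the SBs addressed by the *existing* subwords
# of location "5", cycle 2 sets the SBs addressed by the updated
# subwords.  Layer 1 is never touched, and the cost is independent of
# the CAM depth.
L, k, SUBW, ADDRS_PER_LAYER = 2, 2, 3, 4
sram = [[[set() for _ in range(2 ** SUBW)] for _ in range(k)]
        for _ in range(L)]
stored = {}                              # address -> currently mapped word

def subwords(word):
    return [(word >> (SUBW * j)) & (2 ** SUBW - 1) for j in range(k)]

def map_word(word, addr):
    layer, col = divmod(addr, ADDRS_PER_LAYER)
    for j, sw in enumerate(subwords(word)):
        sram[layer][j][sw].add(col)
    stored[addr] = word

def update(addr, new_word):
    layer, col = divmod(addr, ADDRS_PER_LAYER)
    for j, sw in enumerate(subwords(stored[addr])):
        sram[layer][j][sw].discard(col)  # cycle 1: erase old subword bits
    for j, sw in enumerate(subwords(new_word)):
        sram[layer][j][sw].add(col)      # cycle 2: map the updating word
    stored[addr] = new_word

map_word(0b010100, 5)
update(5, 0b111000)   # 010100 at "5" becomes 111000 in two cycles
```

Note that the erase step reuses the existing subwords as row addresses, which is why no search over the CAM depth is needed to locate the bits to clear.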
where update is required, leaving all other layers deactivated, unlike related schemes. Collectively, both factors result in reduced power consumption during update. Despite achieving a significant reduction in update latency and energy consumption in the update stage, we kept the memory resources at a reasonable level, as evident from the performance comparison in Table II.

V. CONCLUSION AND FUTURE WORK

SRAM-based CAM architectures play a pivotal role in artificial intelligence, pattern recognition, file storage, and networking routers. Designers have proposed several architectures on reconfigurable hardware, i.e., FPGAs. The state-of-the-art CAMs suffer from high update latency during content updating, as their update mechanism depends on arranging the entire contents together with the updated contents. This dependence on the entire CAM depth during the update stage also leads to significant power consumption in the update process.

This research work presents a different direction: a sequence-independent update mechanism that does not depend on the CAM depth. The proposed algorithm selects at most one layer of SRAM blocks for updating contents at any location, rather than activating the entire memory, and ultimately consumes less energy during the update process. Thus, the proposed mapping and updating algorithms speed up table construction and reduce energy consumption.

Our future work includes optimizing FPGA resource utilization for the mapping and updating algorithms and extending their scope to TCAM.

REFERENCES

[1] A. Madhavan, T. Sherwood, and D. B. Strukov, "High-throughput pattern matching with CMOL FPGA circuits: Case for logic-in-memory computing," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 26, no. 12, pp. 2759–2772, Dec. 2018.
[2] R. Govindaraj and S. Ghosh, "Design and analysis of STT-RAM-based ternary content addressable memory cell," ACM J. Emerg. Technol. Comput. Syst., vol. 13, no. 4, p. 52, 2017.
[3] Z. Ullah, M. K. Jaiswal, Y. C. Chan, and R. C. C. Cheung, "FPGA implementation of SRAM-based ternary content addressable memory," in Proc. IEEE 26th Int. Parallel Distrib. Process. Symp. Workshops PhD Forum, May 2012, pp. 383–389.
[4] Z. Ullah, M. K. Jaiswal, and R. C. C. Cheung, "E-TCAM: An efficient SRAM-based architecture for TCAM," Circuits, Syst., Signal Process., vol. 33, no. 10, pp. 3123–3144, Oct. 2014.
[5] Z. Ullah, M. K. Jaiswal, and R. C. C. Cheung, "Z-TCAM: An SRAM-based architecture for TCAM," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 23, no. 2, pp. 402–406, Feb. 2015.
[6] M. Irfan, Z. Ullah, and R. C. C. Cheung, "Zi-CAM: A power and resource efficient binary content-addressable memory on FPGAs," Electronics, vol. 8, no. 5, p. 584, May 2019.
[7] Z. Ullah, K. Ilgon, and S. Baeg, "Hybrid partitioned SRAM-based ternary content addressable memory," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 59, no. 12, pp. 2969–2979, Dec. 2012.
[8] Z. Ullah, M. K. Jaiswal, R. C. C. Cheung, and H. K. H. So, "UE-TCAM: An ultra efficient SRAM-based TCAM," in Proc. IEEE Region 10 Conf. (TENCON), Nov. 2015, pp. 1–6.
[9] S.-H. Yang, Y.-J. Huang, and J.-F. Li, "A low-power ternary content addressable memory with pai-sigma matchlines," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 10, pp. 1909–1913, Oct. 2012.
[10] B.-D. Yang, Y.-K. Lee, S.-W. Sung, J.-J. Min, J.-M. Oh, and H.-J. Kang, "A low power content addressable memory using low swing search lines," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no. 12, pp. 2849–2858, Dec. 2011.
[11] V. S. Satti and S. Sriadibhatla, "Hybrid self-controlled precharge-free CAM design for low power and high performance," Turkish J. Electr. Eng. Comput. Sci., vol. 27, no. 2, pp. 1132–1146, 2019.
[12] I. Ullah, Z. Ullah, U. Afzaal, and J.-A. Lee, "DURE: An energy- and resource-efficient TCAM architecture for FPGAs with dynamic updates," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 27, no. 6, pp. 1298–1307, Jun. 2019.
[13] I. Ullah, Z. Ullah, and J.-A. Lee, "EE-TCAM: An energy-efficient SRAM-based TCAM on FPGA," Electronics, vol. 7, no. 9, p. 186, Sep. 2018.
[14] Y.-J. Chang and Y.-H. Liao, "Hybrid-type CAM design for both power and performance efficiency," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 16, no. 8, pp. 965–974, Aug. 2008.
[15] D. B. Grover, R. J. Stephani, and C. D. Browning, "Low power content addressable memory hitline precharge and sensing circuit," U.S. Patent 13 456 419, Oct. 31, 2013.
[16] D. Jothi and R. Sivakumar, "Design and analysis of power efficient binary content addressable memory (PEBCAM) core cells," Circuits, Syst., Signal Process., vol. 37, no. 4, pp. 1422–1451, Apr. 2018.
[17] Y.-J. Chang, K.-L. Tsai, and H.-J. Tsai, "Low leakage TCAM for IP lookup using two-side self-gating," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 60, no. 6, pp. 1478–1486, Jun. 2013.
[18] W. Jiang, "Scalable ternary content addressable memory implementation using FPGAs," in Proc. 9th ACM/IEEE Symp. Archit. Netw. Commun. Syst. (ANCS), Piscataway, NJ, USA, Oct. 2013, pp. 71–82.
[19] A. Ahmed, K. Park, and S. Baeg, "Resource-efficient SRAM-based ternary content addressable memory," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 25, no. 4, pp. 1583–1587, Apr. 2017.
[20] D.-Y. Lee, C.-C. Wang, and A.-Y. Wu, "Bundle-updatable SRAM-based TCAM design for OpenFlow-compliant packet processor," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 27, no. 6, pp. 1450–1454, Jun. 2019.
[21] I. Ullah, Z. Ullah, and J.-A. Lee, "Efficient TCAM design based on multipumping-enabled multiported SRAM on FPGA," IEEE Access, vol. 6, pp. 19940–19947, 2018.
[22] F. Syed, Z. Ullah, and M. K. Jaiswal, "Fast content updating algorithm for an SRAM-based TCAM on FPGA," IEEE Embedded Syst. Lett., vol. 10, no. 3, pp. 73–76, Sep. 2018.
[23] H. Mahmood, Z. Ullah, O. Mujahid, I. Ullah, and A. Hafeez, "Beyond the limits of typical strategies: Resources efficient FPGA-based TCAM," IEEE Embedded Syst. Lett., vol. 11, no. 3, pp. 89–92, Sep. 2019.
[24] P. Reviriego, A. Ullah, and S. Pontarelli, "PR-TCAM: Efficient TCAM emulation on Xilinx FPGAs using partial reconfiguration," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 27, no. 8, pp. 1952–1956, Aug. 2019.
[25] I. Ullah, J.-S. Yang, and J. Chung, "ER-TCAM: A soft-error-resilient SRAM-based ternary content-addressable memory for FPGAs," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 28, no. 4, pp. 1084–1088, Apr. 2020.
[26] Z. Qian and M. Margala, "Low power RAM-based hierarchical CAM on FPGA," in Proc. Int. Conf. ReConFigurable Comput. FPGAs (ReConFig), Dec. 2014, pp. 1–4.

Azhar Qazi received the B.Sc. degree (Hons.) and the M.S. degree in electrical engineering (communication) from the University of Engineering and Technology, Peshawar, Pakistan, in 2006 and 2014, respectively. He is currently pursuing the Ph.D. degree with the Department of Electrical Engineering, CECOS University of IT and Emerging Sciences, Peshawar.
His research area includes designing fast updating and mapping algorithms for static random access memory (SRAM)-based content-addressable memories (CAMs) on field-programmable gate array (FPGA).
Zahid Ullah (Member, IEEE) received the B.Sc. degree (Hons.) in computer system engineering from the University of Engineering and Technology, Peshawar, Pakistan, in 2006, the M.S. degree in electronic, electrical, control, and instrumentation engineering from Hanyang University, Seoul, South Korea, in 2010, and the Ph.D. degree in electronic engineering from the City University of Hong Kong, Hong Kong, in 2014.
He was an Associate Professor and the Chairman of the Department of Electrical Engineering, CECOS University of IT and Emerging Sciences, Peshawar. He is currently an Assistant Professor and the Head of the Department of Electrical and Computer Engineering, Pak-Austria Fachhochschule: Institute of Applied Sciences and Technology, Haripur, Pakistan. He has authored prestigious journal and conference papers and holds patents in his name in the field of field-programmable gate array (FPGA)-based TCAM. His research interests include low-power/high-speed content-addressable memory (CAM) design on FPGA, reconfigurable computing, pattern recognition, embedded systems, and image processing.

Abdul Hafeez received the Ph.D. degree from Virginia Tech, Blacksburg, VA, USA, in 2014, with a focus on high-performance computing and machine learning.
During his Ph.D., he collaborated with The University of Texas at Arlington, Arlington, TX, USA, IBM Almaden, San Jose, CA, USA, and IBM Dublin, Dublin, Ireland, on leveraging parallel computing and machine learning for bionano sensing and protein simulations. He worked as an Adjunct Faculty Member with the Department of Computer Science, Virginia Tech. In 2015, he was a Post-Doctoral Fellow with Georgia Tech, Atlanta, GA, USA, where he focused on materials informatics to establish an e-collaboration platform for data scientists, material scientists, and manufacturing experts and worked as a Principal Investigator on the GT-FIRE project. Following his postdoctoral fellowship, he joined the Department of Computer Systems Engineering, University of Engineering and Technology, Peshawar, Pakistan, as an Assistant Professor.