
IEEE Journal on Exploratory Solid-State Computational Devices and Circuits

Received 30 January 2023; revised 20 February 2023; accepted 10 March 2023.


Date of publication 14 March 2023; date of current version 29 May 2023.
Digital Object Identifier 10.1109/JXCDC.2023.3256981

A Full-Stack View of Probabilistic Computing With p-Bits: Devices, Architectures, and Algorithms
SHUVRO CHOWDHURY1, ANDREA GRIMALDI2 (Graduate Student Member, IEEE),
NAVID ANJUM AADIT1, SHAILA NIAZI1, MASOUD MOHSENI3, SHUN KANAI4,5,
HIDEO OHNO4,5 (Life Fellow, IEEE), SHUNSUKE FUKAMI4,5 (Member, IEEE),
LUKE THEOGARAJAN1, GIOVANNI FINOCCHIO2 (Senior Member, IEEE),
SUPRIYO DATTA6 (Life Fellow, IEEE), and KEREM Y. CAMSARI1 (Senior Member, IEEE)
1 Department of Electrical and Computer Engineering, University of California at Santa Barbara, Santa Barbara, CA 93106 USA
2 Department of Mathematical and Computer Sciences, Physical Sciences and Earth Sciences,
University of Messina, 98166 Messina, Italy
3 Google Quantum AI, Venice, CA 90291 USA
4 Research Institute of Electrical Communication, Tohoku University, Sendai 980-8577, Japan
5 Center for Science and Innovation in Spintronics, Tohoku University, Sendai 980-8577, Japan
6 Elmore Family School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN 47907 USA

CORRESPONDING AUTHOR: S. CHOWDHURY ([email protected])


This work was supported in part by the U.S. National Science Foundation under Grant CCF 2106260, in part by the Office of Naval
Research Young Investigator Grant, and in part by the Semiconductor Research Corporation. The work of Shun Kanai was supported
by the Japan Science and Technology Agency (JST)-PRESTO under Grant JPMJPR21B2. The work of Shunsuke Fukami was supported in
part by JST-CREST under Grant JPMJCR19K3 and in part by MEXT Initiative to Establish Next-generation Novel Integrated Circuits Centers
(X-NICS) under Grant JPJ011438. The work of Andrea Grimaldi and Giovanni Finocchio was supported in part by ‘‘The Italian Factory of
Micromagnetic Modeling and Spintronics’’ funded by the Italian Ministry of University and Research (MUR) under Project PRIN
2020LWPKH7, and in part by the Petaspin Association (www.petaspin.com).

ABSTRACT The transistor celebrated its 75th birthday in 2022. Transistor scaling as defined by Moore's
law continues, albeit at a slower pace. Meanwhile, computing demands and energy
consumption required by modern artificial intelligence (AI) algorithms have skyrocketed. As an alternative
to scaling transistors for general-purpose computing, the integration of transistors with unconventional
technologies has emerged as a promising path for domain-specific computing. In this article, we provide
a full-stack review of probabilistic computing with p-bits as a representative example of the energy-efficient
and domain-specific computing movement. We argue that p-bits could be used to build energy-efficient
probabilistic systems, tailored for probabilistic algorithms and applications. From hardware, architecture,
and algorithmic perspectives, we outline the main applications of probabilistic computers ranging from prob-
abilistic machine learning (ML) and AI to combinatorial optimization and quantum simulation. Combining
emerging nanodevices with the existing CMOS ecosystem will lead to probabilistic computers with orders of
magnitude improvements in energy efficiency and probabilistic sampling, potentially unlocking previously
unexplored regimes for powerful probabilistic algorithms.

INDEX TERMS Artificial intelligence (AI), combinatorial optimization, domain-specific hardware,


machine learning (ML), p-bits, p-computers, quantum simulation, sampling, spintronics, stochastic magnetic
tunnel junctions (sMTJs).

I. INTRODUCTION
The slowing down of the Moore era of electronics has coincided with the recent revolution in machine-learning (ML) and artificial intelligence (AI) algorithms. In the absence of steady transistor scaling and energy improvements, training and maintaining large-scale ML models in data centers have become a significant energy concern [1]. The widespread implementation of AI, particularly in industries such as autonomous vehicles [2], is an indication that the energy crisis caused by large-scale ML models is not just a data center problem but a global concern.

Efforts at extending the Moore era of electronics by improving conventional transistor technology continue vigorously. Examples of this approach include

This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 9, NO. 1, JUNE 2023 1
3-D heterogeneous integration, 2-D materials for transistors and interconnects [3], new transistor physics via negative capacitance [4], [5], or entirely new approaches using spintronic and magnetoelectric phenomena to build energy-efficient switches [6], [7].

A complementary approach to extending Moore's law is to augment the existing CMOS ecosystem with emerging, nonsilicon nanotechnologies [8], [9]. One way to achieve this goal is through heterogeneous CMOS + X architectures, where X stands for a CMOS-compatible nanotechnology. For example, X can be magnetic, ferroelectric, memristive, or photonic systems. We also discuss an example of this complementary approach, the combination of CMOS with magnetic memory technology, purposefully modified to build probabilistic computers.

FIGURE 1. Bit, p-bit, and qubit. Each column shows a schematic representation of the basic computational units of classical computing (left), probabilistic computing (middle), and quantum computing (right). These are, respectively, the bit, the p-bit, and the qubit.

II. FULL-STACK VIEW AND ORGANIZATION
Research on probabilistic computing with p-bits originated at the device and physics level, first with stable nanomagnets [10], followed by low-barrier nanomagnets [11], [12]. In [12], the p-bit was formally defined as a binary stochastic neuron realized in hardware. In both approaches with stable and unstable nanomagnets, the basic idea is to exploit the natural mapping between the intrinsically noisy physics of nanomagnets and the mathematics of general probabilistic algorithms [e.g., Monte Carlo, Markov chain Monte Carlo (MCMC)]. Such a notion of natural computing, where physics is matched to computation, was clearly laid out by Feynman [13] in his celebrated Simulating Physics with Computers talk. Subsequent work on p-bits defined them as an abstraction between bits and qubits (see Fig. 1) with the possibility of different physical implementations. In addition to searching for energy-efficient realizations of single devices, p-bit research has extended to finding efficient architectures (through massive parallelization, sparsification [14], and pipelining [15]) along with the identification of promising application domains. This full-stack research program covering hardware, architecture, algorithms, and applications is similar to the related field of quantum computation, where a large degree of interdisciplinary expertise is required to move the field forward (see the related reviews [16], [17]). The purpose of this article is to serve as a consolidated summary of recent developments with new results in hardware, architectures, and algorithms. We provide concrete and previously unpublished examples of ML and AI, combinatorial optimization, and quantum simulation with p-bits (see Fig. 2).

III. FUNDAMENTALS OF p-COMPUTING
A large family of problems (see Fig. 2) can be encoded into coupled p-bits evolving according to the following equations [12]:

m_i = sign[tanh(β I_i) − r_[−1,+1]]    (1)

I_i = Σ_j W_ij m_j + h_i    (2)

where m_i is a bipolar variable (m_i ∈ {−1, +1}), r is a uniform random number drawn from the interval [−1, 1], [W] is the coupling matrix between the p-bits, β is the inverse temperature, and {h} is the bias vector. In physical implementations, it is often more convenient to represent p-bits as binary variables, s_i ∈ {0, 1}. A straightforward conversion of (1) and (2) is possible using the standard transformation m → 2s − 1 [18].

As stated, (1) and (2) do not place any restrictions on [W], which may be a symmetric or an asymmetric matrix. If an update order of p-bits is specified, these equations take the coupled p-bit system to a well-defined steady-state distribution defined by the eigenvector (with eigenvalue +1) of the corresponding Markov matrix [12]. Indeed, in the case of Bayesian (belief) networks defined by a directed graph, updating the p-bits from parent nodes to child nodes takes the system to a steady-state distribution corresponding to that obtained from Bayes' theorem [19].

If the [W] matrix is symmetric, one can define an energy E whose negated partial derivative with respect to p-bit m_i gives rise to (2):

E(m_1, m_2, . . .) = −( Σ_{i<j} W_ij m_i m_j + Σ_i h_i m_i ).    (3)

In this case, the steady-state distribution of the network is described by [20]

p_i = (1/Z) exp(−β E_i)    (4)

also known as the Boltzmann law. As such, iterating a network of p-bits described by (1) and (2) eventually approximates the Boltzmann distribution, which can be useful for probabilistic sampling and optimization. The approximate sampling avoids the intractable problem of exactly calculating Z. Remarkably, for such undirected networks, the steady-state distribution is invariant with respect to the update order of p-bits, as long as connected p-bits are not updated at the same time (more on this later). This feature is highly reminiscent of natural systems, where asynchronous dynamics make parallel updates highly unlikely and the update order does not change the equilibrium distribution. Indeed, this gives the hardware implementation of asynchronous networks of p-bits massive parallelism and flexibility in design.

The energy functional defined by (3) is often the starting point of discussions in the related field of Ising machines [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38], [39], [40], [41], [42], [43] with different implementations (see [44] for a comprehensive review).


FIGURE 2. Applications of probabilistic computing. Potential applications of p-bits are illustrated. The list broadly includes problems
in combinatorial optimization, probabilistic ML, and quantum simulation.
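As an illustration of (1), (2), and the Boltzmann law (4), the short sketch below simulates a small network of p-bits in software and compares the empirical state distribution against (4). The three-p-bit network, couplings, and biases are our own toy values, not taken from the article.

```python
import itertools, math, random

# Toy symmetric network: illustrative values, not from the article
W = {(0, 1): 0.5, (0, 2): -0.3, (1, 2): 0.8}   # couplings W_ij
h = [0.1, -0.2, 0.0]                            # biases h_i
beta, N = 1.0, 3

def I(m, i):
    """Synapse, Eq. (2): I_i = sum_j W_ij m_j + h_i."""
    s = h[i]
    for (a, b), w in W.items():
        if i == a:
            s += w * m[b]
        elif i == b:
            s += w * m[a]
    return s

def sweep(m, rng):
    """p-bit update, Eq. (1): m_i = sign(tanh(beta * I_i) - r)."""
    for i in range(N):
        m[i] = 1 if math.tanh(beta * I(m, i)) > rng.uniform(-1, 1) else -1
    return m

def E(m):
    """Energy, Eq. (3)."""
    return (-sum(w * m[a] * m[b] for (a, b), w in W.items())
            - sum(h[i] * m[i] for i in range(N)))

states = list(itertools.product([-1, 1], repeat=N))
Z = sum(math.exp(-beta * E(s)) for s in states)               # partition function
boltzmann = {s: math.exp(-beta * E(s)) / Z for s in states}   # Eq. (4)

rng, m = random.Random(0), [1, 1, 1]
counts = dict.fromkeys(states, 0)
for _ in range(200_000):
    counts[tuple(sweep(m, rng))] += 1
empirical = {s: c / 200_000 for s, c in counts.items()}
```

After enough sweeps, `empirical` agrees with `boltzmann` to within sampling noise, which is the sense in which iterating (1) and (2) approximates (4) without ever computing Z explicitly.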

In the case of p-bits, however, we view (1) and (2) as more fundamental than (3) because the former can also be used to approximate hard inference on directed networks, while the latter always relies on undirected networks. Compared to undirected networks using Ising machines, work on directed neural networks for Bayesian inference has been relatively scarce, although there are exciting developments [19], [45], [46], [47], [48], [49], [50].

Finally, the form of (3) restricts the type of interactions between p-bits to a linear one since the energy is quadratic. Even though higher-order (k-local) interactions between p-bits are possible [18] (also discussed in the context of Ising machines [51], [52]), such higher-order interactions can always be constructed by combining a standard probabilistic gate set at the cost of extra p-bits. In our view, in the case of electronic implementations with scalable p-bits, trading an increased number of p-bits for simplified interconnect complexity is almost always favorable. That being said, algorithmic advantages and the better representative capabilities of higher-order interactions are actively being explored [51], [53].

IV. HARDWARE: PHYSICAL IMPLEMENTATION OF p-BITS
A. p-BITS
The p-bit defined in (1) describes a tunable and discrete random number generator. Its physical implementation includes a broad range of options from noisy materials to analog and digital CMOS (see Fig. 3). The digital CMOS implementations of p-bits often consist of a pseudorandom number generator (PRNG) (r), a lookup table for the activation function (tanh), and a threshold to generate a one-bit output. A digital input with a specified fixed-point precision (e.g., ten bits with one sign, six integer, and three fraction bits) provides tunability through the activation function. Digital p-bits have been very useful in prototyping probabilistic computers up to tens of thousands of p-bits [14], [54], [55]. They also serve a useful purpose to illustrate why analog or mixed-signal implementations of p-bits with nanodevices are necessary. Even using some of the most advanced field-programmable gate arrays (FPGAs), the footprint of a digital p-bit is very large: synthesizing such digital p-bits with PRNGs of varying quality of randomness results in tens of thousands of individual transistors. In single FPGAs that do not use time-division multiplexing of p-bits or off-chip memory, only about 10 000–20 000 p-bits with 100 000 weights (sparse graphs with degree 5–10) fit, even within high-end devices [14].

On the other hand, using nanodevices such as CMOS-compatible stochastic magnetic tunnel junctions (sMTJs), millions of p-bits can be accommodated in single cores due to the scalability achieved by the magnetoresistive random access memory (MRAM) technology, exceeding 1-Gb MRAM chips [56], [57]. However, before the stable MTJs can be controllably made stochastic, challenges at the material and device level must be addressed [58], [59] with careful magnet designs [60], [61], [62]. Different flavors of magnetic p-bits exist [63], [64], [65], [66]; for a recent review, see [67]. Unlike synchronous or trial-based stochasticity (see [68]) that requires continuous resetting, the temporal noise of low-barrier nanomagnets makes them ideally suited to build autonomous, physics-inspired probabilistic computers, providing a constant stream of tunably random bits [69]. Following earlier theoretical predictions [70], [71], [72], recent breakthroughs in low-barrier magnets have shown great promise, using stochastic MTJs with in-plane anisotropy where fluctuations can be of the order of nanoseconds [73], [74], [75]. Such near-zero-barrier nanomagnets should be more tolerant to device variations because when the energy barrier Δ is low, the usual exponential dependence of fluctuations on Δ is much less pronounced.
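A behavioral sketch of the digital p-bit described in Section IV-A (PRNG, tanh lookup table, and threshold) is given below. The sign/integer/fraction split follows the ten-bit example in the text; the 8-bit LUT resolution and the software PRNG are our own assumptions, not a published design.

```python
import math, random

FRAC_BITS = 3                      # s6.3 fixed point: 1 sign + 6 int + 3 frac
INT_BITS = 6
LUT_BITS = 8                       # assumed output resolution of the tanh LUT
SCALE = 1 << FRAC_BITS             # 2^3 codes per unit of input

# Lookup table: every representable input code maps to a scaled tanh value
codes = range(-(1 << (INT_BITS + FRAC_BITS)), 1 << (INT_BITS + FRAC_BITS))
LUT = {c: round((math.tanh(c / SCALE) + 1) / 2 * ((1 << LUT_BITS) - 1))
       for c in codes}

def digital_pbit(I_code, rng):
    """One update: threshold the LUT output against a uniform PRNG word."""
    r = rng.randrange(1 << LUT_BITS)        # PRNG stage
    return 1 if LUT[I_code] > r else 0      # binary one-bit output s_i

rng = random.Random(1)
# I = 0 (tanh(0) = 0) gives an unbiased coin; I = +3.0 is almost always 1
p_zero = sum(digital_pbit(0, rng) for _ in range(100_000)) / 100_000
p_high = sum(digital_pbit(3 * SCALE, rng) for _ in range(100_000)) / 100_000
```

The tunability of the p-bit comes entirely from the fixed-point input code: sweeping `I_code` sweeps the output probability along the tanh curve.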

These stochastic MTJs may be used in electrical circuits with a few additional transistors (see Fig. 3) to build hardware p-bits. Two flavors of stochastic MTJ-based p-bits were proposed in [12] [spin-orbit torque (SOT)-based] and in [70] [spin-transfer torque (STT)-based]. Both of these p-bits have now been experimentally demonstrated, in [18], [76], and [77] (STT) and in [78] (SOT). While many other implementations of p-bits are possible, from molecular nanomagnets [79] to diffusive memristors [80], resistive random access memory (RRAM) [81], perovskite nickelates [82], and others, two additional advantages of the MRAM-based p-bits are the proven manufacturability (up to billion-bit densities) and the amplification of room-temperature noise. Even with the thermal energy of kT in the environment, magnetic switching causes large resistance fluctuations in MTJs, creating hundreds of millivolts of change in resistive dividers [70]. Typical noise on resistors (or memristors) is limited by the (kT/C)^(1/2) limit, which is far lower (millivolts) even at extremely low capacitances (C). This feature of stochastic MTJs ensures that they do not require explicit amplifiers [83] at each p-bit, which can become prohibitively expensive in terms of area and power consumption. Estimates of sMTJ-based p-bits suggest that they can create a random bit using 2 fJ per operation [18]. Recently, a CMOS-compatible single-photon avalanche diode-based implementation of p-bits showed similar, amplifier-free operation [84], and the search for the most scalable, energy-efficient hardware p-bit using alternative phenomena continues.

FIGURE 3. Different hardware options for building a probabilistic computer. Top: various magnetic implementations of a p-bit. These include both digital (CMOS) and mixed-signal implementations (based on, e.g., sMTJs with low-barrier magnets). Bottom: a hybrid of classical and probabilistic computing schemes is shown, where the classical computer generates weights and programs the probabilistic computer. The probabilistic computer then generates samples accordingly with high throughput and sends them back to the classical computer for further processing. Like the building blocks of p-bits, the synapse of the probabilistic computer can be designed in several ways, including digital, analog, and a mix of both techniques.

B. SYNAPSE
The second central part of the p-computer architecture is the synapse, denoted by (2). Much like the hardware p-bit, there are several different implementations of synapses, ranging from digital CMOS and analog/mixed-signal CMOS to resistive [85] or capacitive crossbars [86], [87]. The synaptic equation looks like the traditional matrix-vector product (MVP) commonly used in ML models today; however, there is a crucial difference: thanks to the discrete p-bit output (0 or 1), the MVP operation is simply an addition over the active neighbors of a given p-bit. This makes the synaptic operation simpler than continuous multiplication and significantly simplifies digital synapses. In analog implementations, the use of in-memory computing techniques through charge accumulation could be useful, with the added simplification of digital outputs of p-bits [88], [89].

It is important to note how the p-bit and the synapse for eventually integrated p-bit applications can be mixed and matched; as an example of creatively combining these pieces, see the FPGA-stochastic MTJ combination reported in [77]. The best combination of scalable p-bits and synapses may lead to energy-efficient and large-scale p-computers. At this time, various possibilities exist with different technological maturity.

V. ARCHITECTURE CONSIDERATIONS
A. GIBBS SAMPLING WITH p-BITS
The dynamical evolution of (1) and (2) relies on an iterated updating scheme where each p-bit is updated one after the other based on a predefined (or random) update order. This iterative scheme is called Gibbs sampling [90], [91]. Virtually all applications discussed in Fig. 2 benefit from accelerating Gibbs sampling, attesting to its generality.

In a standard implementation of Gibbs sampling in a synchronous system, p-bits are updated one by one at every clock cycle, as shown in Fig. 4(a). It is crucial to ensure that the effective input each p-bit receives through (2) is computed before the p-bit updates. As such, Tclk has to be longer than the time it takes to compute (2). In this setting, a graph with N p-bits will require N clock cycles (NTclk) to perform a complete sweep, where Tclk is the clock period. This requirement makes Gibbs sampling a fundamentally serial and slow process.

A much more effective approach is made possible by the following observation: even though updates between connected p-bits need to be sequential, if two p-bits are not directly connected, updating one of them does not directly change the input of the other through (2). Such p-bits can be updated in parallel without any approximation. Indeed, one motivation for designing restricted Boltzmann machines (RBMs; see [92]) over unrestricted BMs is to exploit this parallelism: RBMs consist of separate layers (bipartite) that can be updated in parallel. However, this idea can be taken further. If the underlying graph is sparse, it is often easy to split it into disconnected chunks by coloring the graph using a few colors. Even though finding the minimum number of colors is an NP-hard problem [93], heuristic coloring algorithms (such as Dsatur [94]) with polynomial complexity can color the graph very quickly, without necessarily finding a minimum.
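The synaptic simplification noted in Section IV-B can be made concrete with a short sketch: with binary p-bit outputs s_j in {0, 1}, the multiply-accumulate of (2) collapses into an addition over the active neighbors only. The small graph, weights, and biases below are our own illustrative values.

```python
import random

# Illustrative sparse graph: neighbors[i] lists (j, W_ij) pairs
neighbors = {
    0: [(1, 0.7), (2, -0.4)],
    1: [(0, 0.7), (2, 1.1)],
    2: [(0, -0.4), (1, 1.1)],
}
h = [0.1, 0.0, -0.2]

def synapse_mac(s, i):
    """Multiply-accumulate form of Eq. (2) with binary s_j in {0, 1}."""
    return h[i] + sum(w * s[j] for j, w in neighbors[i])

def synapse_add(s, i):
    """Addition-only form: sum the weights of the *active* neighbors."""
    return h[i] + sum(w for j, w in neighbors[i] if s[j])

# The two forms agree for every binary state, so no multiplier is needed
rng = random.Random(2)
for _ in range(100):
    s = [rng.randrange(2) for _ in range(3)]
    assert all(abs(synapse_mac(s, i) - synapse_add(s, i)) < 1e-12
               for i in range(3))
```

This equivalence is what lets a digital synapse replace a full multiplier array with a conditional adder tree.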


FIGURE 4. Architectures of p-computer. (a) Synchronous Gibbs: all p-bits are updated sequentially. N p-bits need N clock cycles
(NTclk ) to perform a complete sweep, Tclk being the clock period. (b) Pseudo-asynchronous Gibbs: a sparse network can be colored
into a few disjoint blocks where connected p-bits are assigned a different color. Phase-shifted clocks update the color blocks one
after the other. N p-bits need ≈ one clock cycle Tclk to perform a complete sweep, reducing O(N) complexity of a sweep to O(1), where
we assume the number of colors c ≪ N. (c) Truly asynchronous Gibbs: a hardware p-bit (e.g., a stochastic MTJ-based p-bit) provides
an asynchronous and random clock with period ⟨Tp-bit ⟩. N p-bits need approximately one clock to perform a complete sweep, as long
as synapse time is less than the clock on average. No graph coloring or engineered phase shifting is required.
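The graph-colored update scheme of Fig. 4(b) can be sketched in software as follows. A greedy coloring (a simple stand-in for Dsatur) assigns connected p-bits different colors; each color block is then updated from a single snapshot of its inputs, which is exact because same-colored p-bits are never connected. The 6-p-bit ring and uniform coupling are our own toy example.

```python
import math, random

def greedy_coloring(adj):
    """Greedy (not necessarily minimal) coloring: neighbors get distinct colors."""
    color = {}
    for v in sorted(adj):
        used = {color[u] for u in adj[v] if u in color}
        c = 0
        while c in used:
            c += 1
        color[v] = c
    return color

N, J, beta = 6, 0.5, 1.0
adj = {i: [(i - 1) % N, (i + 1) % N] for i in range(N)}   # 6-p-bit ring
color = greedy_coloring(adj)

def colored_sweep(m, rng):
    """One sweep: color blocks are updated one after the other (phase-shifted
    clocks in hardware); all p-bits inside a block update in parallel."""
    for c in sorted(set(color.values())):
        block = [i for i in range(N) if color[i] == c]
        I = {i: J * sum(m[j] for j in adj[i]) for i in block}   # one snapshot
        for i in block:
            m[i] = 1 if math.tanh(beta * I[i]) > rng.uniform(-1, 1) else -1
    return m

rng, m = random.Random(4), [1] * N
for _ in range(10_000):
    m = colored_sweep(m, rng)
```

Because the ring is 2-colorable, a sweep here takes two block updates regardless of N, which is the O(N) to O(1) reduction described in the caption.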

In this context, obtaining the minimum coloring is not critical, and sparse graphs typically require a few colors.

Such an approach was taken on sparse graphs (with no regular structure) to design a massively parallel implementation of Gibbs sampling in [14] [see Fig. 4(b)]. Connected p-bits are assigned different colors, and unconnected p-bits are assigned the same color. Equally phase-shifted and same-frequency clocks update the p-bits in each color block one by one. In this approach, a graph with N p-bits requires only one clock cycle (Tclk) to perform a complete sweep, reducing the O(N) complexity of a full sweep to O(1), assuming the number of colors is much less than N. Therefore, the key advantage of this approach is that the p-computer becomes faster with larger graphs, since probabilistic "flips per second," a key metric measured by tensor processing unit (TPU) and GPU implementations [95], [96], linearly increases with the number of p-bits. It is important to note that these TPU and GPU implementations also solve Ising problems in sparse graphs; however, their graph degrees are usually restricted to 4 or 6, unlike the more irregular and higher degree graphs implemented in [14].

We term this graph-colored architecture pseudo-asynchronous Gibbs because, while it is technically synchronized to out-of-phase clocks, it embodies elements of the truly asynchronous architecture we discuss next. While graph coloring algorithmically increases sampling rates by a factor of N, it still requires a careful design of out-of-phase clocks. A much more radical approach is to design a truly asynchronous Gibbs sampler, as shown in Fig. 4(c). Here, the idea is to have hardware building blocks with naturally asynchronous dynamics, such as an sMTJ-based p-bit. In such a p-bit, there exists a natural "clock," ⟨Tp-bit⟩, defined by the average lifetime of a Poisson process [97]. As long as ⟨Tp-bit⟩ is not faster than the average synapse time (tsynapse) to calculate (2), the network still updates N spins in a single ⟨Tp-bit⟩ timescale. This is because the probability of simultaneous updates is extremely low in a Poisson process and further reduced in highly sparse graphs.

In fact, preliminary experiments implementing such truly asynchronous p-bits with ring-oscillator-activated clocks show that, despite making occasional parallel updates, the asynchronous p-computer performs similarly to the pseudo-asynchronous system where incorrect updates are avoided with careful phase shifting [98]. The main appeal of truly asynchronous Gibbs sampling is the lack of any graph coloring and phase-shift engineering while retaining the same massive parallelism, as N p-bits require approximately a single ⟨Tp-bit⟩ to complete a sweep. Given that FPGA-based p-computers already provide about a 10× improvement in sampling throughput over optimized TPU and GPU implementations [14], such asynchronous systems are promising in terms of scalability. Stochastic MTJ-based p-bits should be able to reach high densities on a single chip. Around 20 W of projected power consumption can be reached considering 20-µW p-bit/synapse combinations at 1M p-bit density [54], [60], [99]. The ultimate scalability of magnetic p-bits is a significant advantage over alternative approaches based on electronic or photonic devices.

B. SPARSIFICATION
Both the pseudo-asynchronous and the truly asynchronous parallelisms require sparse graphs to work well. The first problem is the number of colors: if the graph is dense, it requires many colors, making the architecture very similar to standard serial Gibbs sampling.

The second problem with a dense graph is the synapse time tsynapse. If many p-bits have a lot of neighbors, the synapse unit needs to compute a large sum before the next update. If the time between two consecutive updates is ⟨Tp-bit⟩, it requires tsynapse ≪ ⟨Tp-bit⟩ to avoid information loss and reach the correct steady-state distribution [54], [101].

However, if the graph is sparse, each p-bit has fewer connections, and the updates can be faster without any dropped messages. Any graph can be sparsified using the technique proposed in [14], similar in spirit to the minor-graph embedding (MGE) approach pioneered by D-Wave [102], even though the objective here is not to find an embedding but to sparsify an existing graph. The key idea is to split p-bits into different copies, using ferromagnetic COPY gates. These p-bits distribute the original connections among them, resulting in identical copies with fewer connections. An important point is that the ground state of the original graph remains unchanged [14], so the method does not involve approximations, unlike other sparsification techniques based, for example, on low-rank approximations [103].
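The COPY-gate idea can be illustrated on a toy instance (our own, not one from the article): a degree-4 hub p-bit is split into two copies tied by a strong ferromagnetic COPY coupling, halving the local degree while, as a brute-force check confirms, leaving the ground states intact.

```python
import itertools

J, J_COPY = 1.0, 2.0   # illustrative couplings; J_COPY is the ferromagnetic tie

def energy(edges, m):
    """Ising energy E = -sum_(a,b) W_ab m_a m_b over the edge list."""
    return -sum(w * m[a] * m[b] for a, b, w in edges)

def ground_states(edges, n):
    states = list(itertools.product([-1, 1], repeat=n))
    e_min = min(energy(edges, s) for s in states)
    return {s for s in states if abs(energy(edges, s) - e_min) < 1e-9}

# Original: hub p-bit 0 coupled to leaves 1..4 (degree 4)
orig = [(0, 1, J), (0, 2, J), (0, 3, J), (0, 4, J)]
# Sparsified: hub split into copies 0 and 5 (degree <= 3), edges redistributed
sparse = [(0, 1, J), (0, 2, J), (5, 3, J), (5, 4, J), (0, 5, J_COPY)]

g_orig = ground_states(orig, 5)
g_sparse = ground_states(sparse, 6)

# In every sparsified ground state the copies agree; merging them back
# recovers exactly the original ground states
merged = {s[:5] for s in g_sparse if s[0] == s[5]}
```

The COPY coupling must be strong enough that disagreeing copies cost more energy than any redistributed edge can recover; here J_COPY = 2J suffices for this instance.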


FIGURE 5. (a) Original graph of a 3SAT instance uf20-01.cnf [100] having 112 p-bits and a graph density of 6.99%. (Graph density, ρ = 2|E|/(|V|² − |V|), where |E| is the number of edges and |V| is the number of vertices in the graph.) Some of the p-bits have many local neighbors (up to 101), as shown in the histogram, which slows down the synapse, and the p-bits need to update slowly. (b) Sparsified graph of the same instance having 410 p-bits. COPY gates are inserted between each pair of copies of the same p-bits (COPY edges are highlighted in orange). The graph has a density of 0.95%, and the maximum number of neighbors is limited to 4. The synapse operations are now faster, and hence the p-bits can be updated faster. Even though the example shown here starts from a low-density graph, the sparsification algorithm we give is general and applicable to any graph.

Fig. 5(a) shows an example of this process, where the original graph of a satisfiability (3SAT) instance has been sparsified as shown in Fig. 5(b). Irrespective of the input graph size, a sparsified graph has fewer connections locally, and thus the neurons hardly ever need to be slowed down. One disadvantage of this technique is the increased number of p-bits; however, the reduced synapse complexity and the possibility of massive parallelization outweigh the costs incurred by additional p-bits, which we consider to be cheap in scaled, nanodevice-based implementations.

VI. ALGORITHMS AND APPLICATIONS
A. COMBINATORIAL OPTIMIZATION VIA INVERTIBLE LOGIC
When using the Ising model to solve an optimization problem, the first step is to provide a mapping between the Ising model and the problem to be solved. Early work on quantum annealing stimulated by D-Wave's quantum annealers generated a significant amount of useful research in this area [104], some of which is being adopted by quantum-inspired classical annealers. There are usually many different ways to find a mapping; for example, some strategies may employ more nodes than others to encode the same instance, while others might result in graphs with a topology unsuited to the computational architecture of choice. In this context, the invertible logic approach introduced in [12] stands out for its flexibility and sparse encodings.

The process of mapping an instance into an Ising model can be broadly summarized into three steps, as illustrated by Fig. 6. In Fig. 6, the steps of the invertible logic encoding of three combinatorial optimization problems, maximum satisfiability (left column), number partitioning (middle column), and knapsack (right column), are shown. First, each problem is formalized into a tight mathematical formulation (top row). Next, the problem is mapped into an invertible Boolean logic circuit (central row), meaning that each logic gate can be operated using any terminals as input-output nodes (similar to those discussed in the context of quantum annealing [105], [106] and memcomputing [107], [108]). Finally, the probabilistic circuit is algorithmically encoded into an Ising model (bottom row). Each logic gate has several Ising encodings that map the energy landscape of its logic operator. After the Boolean logic formulation of a problem, this step can be automated in standardized synthesis tools. The overall approach results in relatively sparse circuits, as illustrated in the bottom row of Fig. 6, where all three problems show similarly sparse matrices [W], with bias vectors {h} shown underneath.

The key advantage of this approach, compared to the heuristic and dense formulations of [104], is due to the generality of Boolean logic, quite similar to how present-day digital VLSI circuits are constructed in sparse, hardware-aware networks using billions of transistors. As such, much of the existing ecosystem of high-level synthesis can be directly used to find invertible logic-based encodings for general optimization problems.

B. MACHINE LEARNING: ENERGY-BASED MODELS
Energy-efficient ML with BMs is a promising application for probabilistic computers, with a recent experimental demonstration in [76]. Mainstream ML algorithms are designed and chosen with CPU implementation in mind, and hence some models are heavily preferred over others even though they are often less powerful. For example, the use of RBMs over the more powerful unrestricted or deep BMs is motivated by the former's efficient software implementation in synchronous systems. However, by exploiting the sparsification technique and the massively parallel architecture described earlier, fast Gibbs sampling with deep Boltzmann machines (DBMs) can dramatically improve state-of-the-art ML applications like visual object recognition and generation, speech recognition, autonomous driving, and many more [109]. Here, we present an example where a sparse DBM is trained with MNIST handwritten digits (see Fig. 7). We randomly distribute the visible and hidden units on the sparse DBM with the massively parallel pseudo-asynchronous architecture, which yields multiple hidden layers, as shown in Fig. 7(c).

Contrasting with earlier unconventional computing approaches where the MNIST dataset is reduced to much smaller sizes [110], [111], we show how the full MNIST dataset (60 000 images and no downsampling) can be trained using p-computers in FPGAs. We use 1200 mini-batches with 50 images in each batch to train the network using the contrastive divergence (CD) algorithm. The process of learning is accomplished using a hybrid probabilistic and classical computer setup. The classical computer computes the gradients and generates new weights, while the p-computer generates samples according to those weights [see Fig. 7(b)]. During the positive phase of sampling, the p-computer operates in its clamped condition under the direct
6 VOLUME 9, NO. 1, JUNE 2023


Chowdhury et al.: Full-Stack View of Probabilistic Computing With p-Bits: Devices, Architectures, and Algorithms

FIGURE 6. Invertible logic encoding. The encoding process of three optimization problems, (a) the maximum satisfiability problem, (b) number partitioning, and (c) the knapsack problem, is streamlined and visually summarized into three steps: (1) the problem first has to be condensed into a concise mathematical formulation; (2) an invertible Boolean circuit that topologically maps the problem is then conceived; and (3) finally, the invertible Boolean circuit is converted into an Ising model using probabilistic AND/OR/NOT gates [14].
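Step (3) can be made concrete with a small sketch. The biases and couplings below are the commonly used probabilistic-AND parameters (h = [1, 1, -2]; J_AB = -1, J_AC = J_BC = 2), in the spirit of the invertible-logic construction of [12]; they are assumptions for illustration, and a brute-force enumeration confirms that the degenerate ground states of the resulting three-spin Ising model are exactly the four rows of the AND truth table:

```python
import itertools

# Commonly used probabilistic-AND parameters (assumed here for illustration):
# spins s = (A, B, C) in {-1, +1}, with C intended to equal AND(A, B).
h = [1, 1, -2]                              # biases
J = {(0, 1): -1, (0, 2): 2, (1, 2): 2}      # symmetric couplings J_ij

def energy(s):
    """Ising energy E(s) = -sum_{i<j} J_ij s_i s_j - sum_i h_i s_i."""
    e = -sum(h[i] * s[i] for i in range(3))
    e -= sum(Jij * s[i] * s[j] for (i, j), Jij in J.items())
    return e

states = list(itertools.product([-1, 1], repeat=3))
e_min = min(energy(s) for s in states)
ground = [s for s in states if energy(s) == e_min]

# The degenerate minima are exactly the four rows of the AND truth table,
# so clamping C and sampling recovers consistent inputs (inverted operation).
for a, b, c in ground:
    assert (c == 1) == (a == 1 and b == 1)
```

Because all four satisfying configurations are degenerate minima, clamping the output spin and sampling at low temperature recovers consistent inputs, which is what makes the gate "invertible."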

During the positive phase of sampling, the p-computer operates in its clamped condition under the direct influence of the training samples. In the negative phase, the p-computer is allowed to run freely without any environmental input. After training, the deep network can not only classify images but also generate them: for any given label, the network can create a new sample (not present in the training set) [see Fig. 7(d)]. This is an important feature of energy-based models and is commonly demonstrated with diffusion models [112].
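The clamped/free two-phase update described above can be sketched in a few lines. The FPGA experiments train a sparse DBM on the full MNIST set; the snippet below is only a scaled-down, CPU-only stand-in using a small restricted BM with synthetic binary data (network sizes, learning rate, and the CD-1 choice are illustrative assumptions, and biases are omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample(p):
    """Draw binary (0/1) samples with probability p."""
    return (rng.random(p.shape) < p).astype(float)

# Scaled-down stand-in for the hybrid loop: the sampling steps play the role
# of the p-computer, the gradient/update step the role of the classical CPU.
n_vis, n_hid, lr = 6, 4, 0.1
W = 0.01 * rng.standard_normal((n_vis, n_hid))
data = sample(np.full((50, n_vis), 0.8))      # one synthetic mini-batch

for step in range(100):
    # Positive (clamped) phase: visible units pinned to the training data.
    ph = sigmoid(data @ W)
    pos = data.T @ ph
    # Negative (free) phase: one Gibbs sweep with no environmental input (CD-1).
    h = sample(ph)
    v_neg = sample(sigmoid(h @ W.T))
    neg = v_neg.T @ sigmoid(v_neg @ W)
    # "Classical computer" update: contrastive-divergence gradient.
    W += lr * (pos - neg) / len(data)
```

In the actual system the negative-phase samples come from the asynchronous p-computer at roughly 100 flips/ns, the loop runs over 1200 mini-batches of 50 images, and the network is a sparse DBM with multiple hidden layers rather than this toy RBM.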

C. QUANTUM SIMULATION

One primary motivation for building quantum computers is to simulate large quantum many-body systems and understand the exotic physics they offer [113]. Two major challenges for quantum computers are the necessity of cryogenic operating temperatures and the vulnerability to noise, which render them impractical for many tasks, especially once error-correction overheads are considered [114]. Simulating these systems with classical computers is often extremely time-consuming and mostly limited to small systems. One potential application of p-bits is to provide a room-temperature route to boosting simulation speed, potentially enabling the simulation of large-scale quantum systems. Significant progress has been made toward this end in recent years.

1) SIMULATING QUANTUM SYSTEMS WITH TROTTERIZATION

One approach is to build a p-computer enabling the scalable simulation of sign-problem-free quantum systems by accelerating standard quantum Monte Carlo (QMC) techniques [115]. The basic idea is to replace the qubits in the original lattice with hardware p-bits and replicate the new lattice according to the Suzuki–Trotter transformation [116]. Recently, the convergence time of a 2-D square-octagonal qubit lattice initially prepared in a topologically obstructed state was compared among a CPU, a physical quantum annealer [117], and a p-computer (both digital and analog) [118]. For this particular problem, it was shown that an FPGA-based p-computer emulator can be around 1000 times faster than an optimized C++ (CPU) program. Based on SPICE simulations of a small p-computer, we project that significant further acceleration should be possible with a truly asynchronous implementation.

FIGURE 7. Generative neural networks with p-bits. (a) Hybrid computing scheme with a probabilistic computer and a classical computer, where the probabilistic computer generates samples according to the weights given by the CPU with a sampling speed of around 100 flips/ns. (b) Overview of the learning procedure for the hybrid setup. Receiving the samples from the probabilistic computer, the CPU computes the gradient, updates the weights and biases, and sends them back to the probabilistic computer until converged. (c) Sparse DBM is utilized here as a hardware-aware graph that can be represented with multiple hidden layers of p-bits. Both interlayer and intralayer connections are allowed between visible and hidden units. (d) Images shown here are generated with a sparse DBM of 4264 p-bits after training the network with the full MNIST dataset. The label p-bits are clamped to a specific label, and the network evolves to a corresponding image by annealing the system from β = 0 to β = 5 with a step size of 0.125.
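The Suzuki–Trotter step at the heart of this approach can be sketched as follows. For a transverse-field Ising model, each qubit is replaced by P replica spins; the dimensionless inter-replica coupling below is the standard path-integral result, (1/2) ln coth(βΓ/P). Normalization conventions vary between references, so this is a schematic sketch rather than the exact implementation of [115]:

```python
import math

def replica_couplings(beta, gamma, J, P):
    """Suzuki-Trotter sketch: map a transverse-field Ising model (coupling J,
    transverse field gamma, inverse temperature beta) onto P classical
    replicas. Returns the dimensionless couplings that enter the classical
    Boltzmann weight of the replicated lattice."""
    if gamma <= 0 or P <= 0:
        raise ValueError("need gamma > 0 and P > 0")
    intra = beta * J / P                                       # within each Trotter slice
    inter = 0.5 * math.log(1.0 / math.tanh(beta * gamma / P))  # between adjacent slices
    return intra, inter

intra, inter = replica_couplings(beta=1.0, gamma=1.0, J=1.0, P=8)
# A weaker transverse field gives a stronger inter-replica coupling, locking
# the replicas together and recovering the classical Ising limit.
```

Each replica spin can then be realized by a hardware p-bit, so sampling the classical replicated lattice emulates the quantum system.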


FIGURE 8. ML quantum systems with p-bits. (a) Heisenberg Hamiltonian with a transverse field (Γ = +1) is applied to an FM-coupled (Jz = +1 and Jxy = +0.5) linear chain of 12 qubits with periodic boundary conditions. (b) To obtain the ground state of this quantum system, an RBM is employed with 12 visible and 48 hidden nodes, where all nodes in the visible layer are connected to all nodes in the hidden layer. (c) This ML model is then embedded into a hardware-amenable sparse p-bit network arranged in a Chimera graph using MGE. We use a coupling strength of 1.0 among the replicated visible and hidden nodes in the embedded p-bit network. (d) Overview of the ML algorithm and the division of workload between the probabilistic and classical computers in a hybrid setting. (e) FPGA emulation of this probabilistic computer performs variational ML in tandem with a classical computer, converging to the quantum (exact) result as shown.
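The variational ansatz of Fig. 8(b) can be sketched directly. For a probability-form RBM the hidden units can be summed out analytically, and the trial wave function is the square root of the resulting marginal, valid when amplitudes can be taken real and non-negative (sign-problem-free Hamiltonians). The 4-visible/8-hidden sizes below are illustrative stand-ins for the 12 × 48 network:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
n_vis, n_hid = 4, 8          # illustrative stand-ins for the 12 x 48 RBM of Fig. 8
a = 0.01 * rng.standard_normal(n_vis)          # visible biases
b = 0.01 * rng.standard_normal(n_hid)          # hidden biases
W = 0.01 * rng.standard_normal((n_vis, n_hid))

def p_unnorm(v):
    """Unnormalized RBM marginal P(v) with hidden units summed out."""
    return np.exp(a @ v) * np.prod(2.0 * np.cosh(b + v @ W))

# Variational guess: psi(v) = sqrt(P(v)) by the Born rule.
basis = np.array(list(product([-1.0, 1.0], repeat=n_vis)))
p = np.array([p_unnorm(v) for v in basis])
psi = np.sqrt(p / p.sum())                     # normalized trial wave function
```

In the variational loop, the p-computer draws samples from P(v) while the classical computer adjusts (a, b, W) to lower the energy expectation of the target Hamiltonian under this trial state.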

Probabilistic computers can be used for quantum Hamiltonians beyond the usual transverse-field Ising model, such as the antiferromagnetic Heisenberg Hamiltonian [119], and even for the emulation of gate-based quantum computers [120]. However, for generic Hamiltonians (e.g., random circuit sampling), the number of samples required in naive implementations seems to grow exponentially [120] due to the notorious sign problem [121]. Nevertheless, clever basis transformations [122] might mitigate or cure the sign problem [123] in the future.

2) MACHINE-LEARNING QUANTUM MANY-BODY SYSTEMS

With the great success of ML and AI algorithms, training stochastic neural networks (such as BMs) to approximately solve the quantum many-body problem starting from a variational guess has generated great excitement [124], [125], [126] and is considered a fruitful combination of quantum physics and ML [127]. These algorithms are typically implemented in high-level software programs, allowing users to choose from various network models and sizes according to their needs. However, as with classical ML, the difficulty of training strongly hinders the use of deeper and more general models. With scaled p-computers using millions of magnetic p-bits, massively parallel and energy-efficient hardware implementations of the more general unrestricted/deep BMs may become feasible, paving the way to simulating practical quantum systems.

To demonstrate one example of this approach, we show how p-bits laid out in sparse, hardware-aware graphs can be used for ML quantum systems (see Fig. 8). The objective is to find the ground state of a many-body quantum system, in this case, a 1-D FM Heisenberg Hamiltonian with an external transverse field. We start with an RBM, one of the simplest neural network models, and use its functional form as the variational guess for the ground-state probabilities (the wave function is obtained by taking the square root of the probabilities according to the Born rule). A combination of probabilistic sampling and weight updates gradually adjusts the variational guess such that the final guess points to the ground state of the quantum Hamiltonian. Emulating this variational ML approach with p-bits requires a few more steps. An RBM contains all-to-all connections between the visible and hidden layers, which are not conducive to scalable p-computers because of the large fan-out demanded by the all-to-all connectivity. An alternative is to map the RBM onto a sparse graph through MGE [102]. Using a hybrid setup with fast sampling in a probabilistic computer coupled with a classical computer, the iterative process of sampling and weight updating can then be performed. The key advantage of a massively parallel and fast sampler is the selection of higher-quality states of the wave function to update the variational guess. Fig. 8 shows an example simulation of how a p-computer learns the ground state of a 1-D FM Heisenberg model. The scaling of p-computers using magnetic p-bits may allow much larger implementations of quantum systems in the future.

D. OUTLOOK: ALGORITHMS AND APPLICATIONS BEYOND

Despite the large range of applications we discussed in the context of p-bits, most of the sampling algorithms have been either standard MCMC or generic simulated-annealing-based approaches. Future possibilities involve more sophisticated sampling and annealing algorithms such as parallel tempering (PT) (see [128], [129] for some initial investigations). Further improvements to hardware implementation include adaptive versions of PT [130] as well as sophisticated nonequilibrium Monte Carlo (NMC) algorithms [131]. Ideas involving overclocking p-bits such that they violate the t_synapse ≪ ⟨t_p-bit⟩ requirement [14], or sharing synaptic operations between p-bits [82], could also be useful. A combination of these ideas with algorithm-architecture-device codesigns may lead to orders-of-magnitude improvements in sampling speed and quality. In this context, increasing flips/ns as a sampling throughput metric is an important goal. In addition, solution quality and the possibility of cluster updates or other algorithmic techniques also need to be considered carefully. Given the plethora of approaches from multiple communities, we also hope that model problems and benchmarking studies comparing different Ising machines,


probabilistic accelerators, physical annealers, and dynamical solvers will be performed in the near future by all practitioners, including ourselves.

We believe that the codesign of algorithms, architectures, and devices for probabilistic computing may not only help mitigate the looming energy crisis of ML and AI, but also lead to systems that may unlock previously inaccessible regimes using powerful probabilistic (randomized) algorithms [132]. Just as the emergence of powerful GPUs made the well-known backpropagation algorithm flourish, probabilistic computers could lead us to previously unknown territory of energy-based AI models, combinatorial optimization, and quantum simulation. This research program requires a concerted effort and interdisciplinary expertise from all across the stack and ties into the larger vision of unconventional computing forming in the community [133].

ACKNOWLEDGMENT

Supriyo Datta has a financial interest in Ludwig Computing.

REFERENCES

[1] D. Patterson et al., "Carbon emissions and large neural network training," 2021, arXiv:2104.10350.
[2] S. Sudhakar, V. Sze, and S. Karaman, "Data centers on wheels: Emissions from computing onboard autonomous vehicles," IEEE Micro, vol. 43, no. 1, pp. 29–39, Jan. 2023.
[3] R. Chau, "Process and packaging innovations for Moore's law continuation and beyond," in IEDM Tech. Dig., Dec. 2019, p. 1.
[4] J. C. Wong and S. Salahuddin, "Negative capacitance transistors," Proc. IEEE, vol. 107, no. 1, pp. 49–62, Jan. 2019.
[5] M. A. Alam, M. Si, and P. D. Ye, "A critical review of recent progress on negative capacitance field-effect transistors," Appl. Phys. Lett., vol. 114, no. 9, Mar. 2019, Art. no. 090401.
[6] S. Manipatruni et al., "Scalable energy-efficient magnetoelectric spin–orbit logic," Nature, vol. 565, no. 7737, pp. 35–42, Dec. 2018.
[7] P. Debashis et al., "Low-voltage and high-speed switching of a magnetoelectric element for energy efficient compute," in IEDM Tech. Dig., Dec. 2022, pp. 34–36.
[8] A. Chen, "Emerging research device roadmap and perspectives," in Proc. IEEE Int. Conf. IC Design Technol., May 2014, pp. 1–4.
[9] G. Finocchio, M. Di Ventra, K. Y. Camsari, K. Everschor-Sitte, P. K. Amiri, and Z. Zeng, "The promise of spintronics for unconventional computing," J. Magn. Magn. Mater., vol. 521, Mar. 2021, Art. no. 167506.
[10] B. Behin-Aein, V. Diep, and S. Datta, "A building block for hardware belief networks," Sci. Rep., vol. 6, no. 1, pp. 1–10, Jul. 2016.
[11] B. Sutton, K. Y. Camsari, B. Behin-Aein, and S. Datta, "Intrinsic optimization using stochastic nanomagnets," Sci. Rep., vol. 7, p. 44370, Mar. 2017.
[12] K. Y. Camsari, R. Faria, B. M. Sutton, and S. Datta, "Stochastic p-bits for invertible logic," Phys. Rev. X, vol. 7, no. 3, 2017, Art. no. 031014.
[13] R. P. Feynman, "Simulating physics with computers," Int. J. Theor. Phys., vol. 21, nos. 6–7, pp. 467–488, 1982.
[14] N. A. Aadit et al., "Massively parallel probabilistic computing with sparse Ising machines," Nature Electron., pp. 1–9, 2022.
[15] J. Kaiser, R. Jaiswal, B. Behin-Aein, and S. Datta, "Benchmarking a probabilistic coprocessor," 2021, arXiv:2109.14801.
[16] S. Misra et al., "Probabilistic neural computing with stochastic devices," Adv. Mater., Nov. 2022, Art. no. 2204569.
[17] P. J. Coles, "Thermodynamic AI and the fluctuation frontier," 2023, arXiv:2302.06584.
[18] W. A. Borders, A. Z. Pervaiz, S. Fukami, K. Y. Camsari, H. Ohno, and S. Datta, "Integer factorization using stochastic magnetic tunnel junctions," Nature, vol. 573, no. 7774, pp. 390–393, Sep. 2019.
[19] R. Faria, J. Kaiser, K. Y. Camsari, and S. Datta, "Hardware design for autonomous Bayesian networks," Frontiers Comput. Neurosci., vol. 15, p. 14, Mar. 2021.
[20] E. Aarts and J. Korst, Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing. Hoboken, NJ, USA: Wiley, 1989.
[21] A. Houshang, M. Zahedinejad, S. Muralidhar, J. Checinski, A. A. Awad, and J. Åkerman, "A spin Hall Ising machine," 2020, arXiv:2006.02236.
[22] Y. Su, J. Mu, H. Kim, and B. Kim, "A scalable CMOS Ising computer featuring sparse and reconfigurable spin interconnects for solving combinatorial optimization problems," IEEE J. Solid-State Circuits, vol. 57, no. 3, pp. 858–868, Mar. 2022.
[23] S. Bhanja, D. K. Karunaratne, R. Panchumarthy, S. Rajaram, and S. Sarkar, "Non-Boolean computing with nanomagnets for computer vision applications," Nature Nanotechnol., vol. 11, no. 2, pp. 177–183, Feb. 2016.
[24] P. Debashis, R. Faria, K. Y. Camsari, J. Appenzeller, S. Datta, and Z. Chen, "Experimental demonstration of nanomagnet networks as hardware for Ising computing," in IEDM Tech. Dig., Dec. 2016, pp. 3–34.
[25] P. L. McMahon et al., "A fully programmable 100-spin coherent Ising machine with all-to-all connections," Science, vol. 354, pp. 614–617, Nov. 2016.
[26] S. Dutta et al., "An Ising Hamiltonian solver based on coupled stochastic phase-transition nano-oscillators," Nature Electron., vol. 4, no. 7, pp. 502–512, Jul. 2021.
[27] J. Chou, S. Bramhavar, S. Ghosh, and W. Herzog, "Analog coupled oscillator based weighted Ising machine," Sci. Rep., vol. 9, no. 1, pp. 1–10, Oct. 2019.
[28] M. Yamaoka et al., "A 20 k-spin Ising chip to solve combinatorial optimization problems with CMOS annealing," IEEE J. Solid-State Circuits, vol. 51, no. 1, pp. 303–309, Dec. 2015.
[29] T. Wang and J. Roychowdhury, "OIM: Oscillator-based Ising machines for solving combinatorial optimisation problems," in Proc. Int. Conf. Unconventional Comput. Natural Comput. Cham, Switzerland: Springer, 2019, pp. 232–256.
[30] Y. Shim, A. Jaiswal, and K. Roy, "Ising computation based combinatorial optimization using spin-Hall effect (SHE) induced stochastic magnetization reversal," J. Appl. Phys., vol. 121, no. 19, 2017, Art. no. 193902.
[31] T. Inagaki et al., "A coherent Ising machine for 2000-node optimization problems," Science, vol. 354, no. 6312, pp. 603–606, Nov. 2016.
[32] M. Baity-Jesi et al., "Janus II: A new generation application-driven computer for spin-system simulations," Comput. Phys. Commun., vol. 185, no. 2, pp. 550–559, Feb. 2014.
[33] M. Yamaoka, C. Yoshimura, M. Hayashi, T. Okuyama, H. Aoki, and H. Mizuno, "24.3 20 k-spin Ising chip for combinational optimization problem with CMOS annealing," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2015, pp. 1–3.
[34] N. G. Berloff et al., "Realizing the classical XY Hamiltonian in polariton simulators," Nature Mater., vol. 16, no. 11, pp. 1120–1126, Nov. 2017.
[35] T. Takemoto, M. Hayashi, C. Yoshimura, and M. Yamaoka, "A 2 × 30 k-spin multichip scalable annealing processor based on a processing-in-memory approach for solving large-scale combinatorial optimization problems," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2019, pp. 52–54.
[36] H. Goto, K. Tatsumura, and A. R. Dixon, "Combinatorial optimization by simulating adiabatic bifurcations in nonlinear Hamiltonian systems," Sci. Adv., vol. 5, no. 4, Apr. 2019.
[37] M. Aramon, G. Rosenberg, E. Valiante, T. Miyazawa, H. Tamura, and H. Katzgrabeer, "Physics-inspired optimization for quadratic unconstrained problems using a digital annealer," Frontiers Phys., vol. 7, p. 48, Apr. 2019.
[38] A. Mallick, M. K. Bashar, D. S. Truesdell, B. H. Calhoun, S. Joshi, and N. Shukla, "Using synchronized oscillators to compute the maximum independent set," Nature Commun., vol. 11, no. 1, pp. 1–7, Sep. 2020.
[39] K. Yamamoto et al., "7.3 STATICA: A 512-spin 0.25 M-weight full-digital annealing processor with a near-memory all-spin-updates-at-once architecture for combinatorial optimization with complete spin-spin interactions," in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, Feb. 2020, pp. 138–140.
[40] S. Patel, P. Canoza, and S. Salahuddin, "Logically synthesized and hardware-accelerated restricted Boltzmann machines for combinatorial optimization and integer factorization," Nature Electron., vol. 5, no. 2, pp. 92–101, Feb. 2022.
[41] R. Afoakwa, Y. Zhang, U. K. R. Vengalam, Z. Ignjatovic, and M. Huang, "A CMOS-compatible Ising machine with bistable nodes," 2020, arXiv:2007.06665.


[42] A. Lu et al., "Scalable in-memory clustered annealer with temporal noise of FinFET for the travelling salesman problem," in IEDM Tech. Dig., Dec. 2022, pp. 5–22.
[43] W. Moy, I. Ahmed, P.-W. Chiu, J. Moy, S. S. Sapatnekar, and C. H. Kim, "A 1,968-node coupled ring oscillator circuit for combinatorial optimization problem solving," Nature Electron., vol. 5, no. 5, pp. 310–317, May 2022.
[44] N. Mohseni, P. L. McMahon, and T. Byrnes, "Ising machines as hardware solvers of combinatorial optimization problems," Nature Rev. Phys., vol. 4, no. 6, pp. 363–379, May 2022.
[45] M. Marsman, G. Maris, T. Bechger, and C. Glas, "Bayesian inference for low-rank Ising networks," Sci. Rep., vol. 5, no. 1, Mar. 2015.
[46] M. T. McCray, M. A. Abeed, and S. Bandyopadhyay, "Electrically programmable probabilistic bit anti-correlator on a nanomagnetic platform," Sci. Rep., vol. 10, no. 1, pp. 1–11, Jul. 2020.
[47] P. Debashis, V. Ostwal, R. Faria, S. Datta, J. Appenzeller, and Z. Chen, "Hardware implementation of Bayesian network building blocks with stochastic spintronic devices," Sci. Rep., vol. 10, no. 1, p. 16002, Sep. 2020.
[48] S. Nasrin, J. Drobitch, P. Shukla, T. Tulabandhula, S. Bandyopadhyay, and A. R. Trivedi, "Bayesian reasoning machine on a magneto-tunneling junction network," Nanotechnology, vol. 31, no. 48, Nov. 2020, Art. no. 484001.
[49] K.-E. Harabi et al., "A memristor-based Bayesian machine," Nature Electron., vol. 6, pp. 1–12, Dec. 2022.
[50] M. Golam Morshed, S. Ganguly, and A. W. Ghosh, "A deep dive into the computational fidelity of high variability low energy barrier magnet technology for accelerating optimization and Bayesian problems," 2023, arXiv:2302.08074.
[51] C. Bybee, D. Kleyko, D. E. Nikonov, A. Khosrowshahi, B. A. Olshausen, and F. T. Sommer, "Efficient optimization with higher-order Ising machines," 2022, arXiv:2212.03426.
[52] M. K. Bashar and N. Shukla, "Constructing dynamical systems to model higher order Ising spin interactions and their application in solving combinatorial optimization problems," 2022, arXiv:2211.05365.
[53] N. Onizawa and T. Hanyu, "High convergence rates of CMOS invertible logic circuits based on many-body Hamiltonians," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS), May 2021, pp. 1–5.
[54] B. Sutton, R. Faria, L. A. Ghantasala, R. Jaiswal, K. Y. Camsari, and S. Datta, "Autonomous probabilistic coprocessing with petaflips per second," IEEE Access, vol. 8, pp. 157238–157252, 2020.
[55] J. Kaiser and S. Datta, "Probabilistic computing with p-bits," Appl. Phys. Lett., vol. 119, no. 15, Oct. 2021, Art. no. 150503.
[56] S. Aggarwal et al., "Demonstration of a reliable 1 Gb standalone spin-transfer torque MRAM for industrial applications," in IEDM Tech. Dig., Dec. 2019, pp. 1–2.
[57] K. Lee et al., "1 Gbit high density embedded STT-MRAM in 28 nm FDSOI technology," in IEDM Tech. Dig., Dec. 2019, p. 2.
[58] J. L. Drobitch and S. Bandyopadhyay, "Reliability and scalability of p-bits implemented with low energy barrier nanomagnets," IEEE Magn. Lett., vol. 10, pp. 1–4, 2019.
[59] R. Rahman and S. Bandyopadhyay, "The strong sensitivity of the characteristics of binary stochastic neurons employing low barrier nanomagnets to small geometrical variations," 2021, arXiv:2108.04319.
[60] O. Hassan, S. Datta, and K. Y. Camsari, "Quantitative evaluation of hardware binary stochastic neurons," Phys. Rev. Appl., vol. 15, no. 6, Jun. 2021, Art. no. 064046.
[61] K. Y. Camsari, M. M. Torunbalci, W. A. Borders, H. Ohno, and S. Fukami, "Double-free-layer magnetic tunnel junctions for probabilistic bits," Phys. Rev. Appl., vol. 15, no. 4, Apr. 2021, Art. no. 044049.
[62] R. Rahman and S. Bandyopadhyay, "Robustness of binary stochastic neurons implemented with low barrier nanomagnets made of dilute magnetic semiconductors," IEEE Magn. Lett., vol. 13, pp. 1–4, 2022.
[63] Y. Lv, R. P. Bloom, and J.-P. Wang, "Experimental demonstration of probabilistic spin logic by magnetic tunnel junctions," IEEE Magn. Lett., vol. 10, pp. 1–5, 2019.
[64] K. Y. Camsari et al., "From charge to spin and spin to charge: Stochastic magnets for probabilistic switching," Proc. IEEE, vol. 108, no. 8, pp. 1322–1337, Aug. 2020.
[65] X. Chen, J. Zhang, and J. Xiao, "Magnetic-tunnel-junction-based true random-number generator with enhanced generation rate," Phys. Rev. Appl., vol. 18, no. 2, Aug. 2022, L021002.
[66] L. Rehm et al., "Stochastic magnetic actuated random transducer devices based on perpendicular magnetic tunnel junctions," 2022, arXiv:2209.01480.
[67] B. R. Zink, Y. Lv, and J.-P. Wang, "Review of magnetic tunnel junctions for stochastic computing," IEEE J. Exp. Solid-State Comput. Devices Circuits, vol. 8, no. 2, pp. 173–184, Dec. 2022.
[68] A. Fukushima et al., "Spin dice: A scalable truly random number generator based on spintronics," Appl. Phys. Exp., vol. 7, no. 8, 2014, Art. no. 083001.
[69] Y. Shao et al., "Implementation of artificial neural networks using magnetoresistive random-access memory-based stochastic computing units," IEEE Magn. Lett., vol. 12, pp. 1–5, 2021.
[70] K. Y. Camsari, S. Salahuddin, and S. Datta, "Implementing p-bits with embedded MTJ," IEEE Electron Device Lett., vol. 38, no. 12, pp. 1767–1770, Dec. 2017.
[71] J. Kaiser, A. Rustagi, K. Y. Camsari, J. Z. Sun, S. Datta, and P. Upadhyaya, "Subnanosecond fluctuations in low-barrier nanomagnets," Phys. Rev. Appl., vol. 12, no. 5, Nov. 2019, Art. no. 054056.
[72] O. Hassan, R. Faria, K. Y. Camsari, J. Z. Sun, and S. Datta, "Low-barrier magnet design for efficient hardware binary stochastic neurons," IEEE Magn. Lett., vol. 10, pp. 1–5, 2019.
[73] K. Hayakawa et al., "Nanosecond random telegraph noise in in-plane magnetic tunnel junctions," Phys. Rev. Lett., vol. 126, no. 11, Mar. 2021, Art. no. 117202.
[74] C. Safranski, J. Kaiser, P. Trouilloud, P. Hashemi, G. Hu, and J. Z. Sun, "Demonstration of nanosecond operation in stochastic magnetic tunnel junctions," Nano Lett., vol. 21, no. 5, pp. 2040–2045, Feb. 2021.
[75] S. Kanai, K. Hayakawa, H. Ohno, and S. Fukami, "Theory of relaxation time of stochastic nanomagnets," Phys. Rev. B, Condens. Matter, vol. 103, no. 9, Mar. 2021, Art. no. 094423, doi: 10.1103/PhysRevB.103.094423.
[76] J. Kaiser, W. A. Borders, K. Y. Camsari, S. Fukami, H. Ohno, and S. Datta, "Hardware-aware in situ learning based on stochastic magnetic tunnel junctions," Phys. Rev. Appl., vol. 17, no. 1, Jan. 2022, Art. no. 014016.
[77] A. Grimaldi et al., "Experimental evaluation of simulated quantum annealing with MTJ-augmented p-bits," in IEDM Tech. Dig., Dec. 2022, pp. 22–24.
[78] J. Yin et al., "Scalable Ising computer based on ultra-fast field-free spin orbit torque stochastic device with extreme 1-bit quantization," in IEDM Tech. Dig., Dec. 2022, pp. 31–36.
[79] G. M. Gutiérrez-Finol, S. Giménez-Santamarina, Z. Hu, L. E. Rosaleny, S. Cardona-Serra, and A. Gaita-Ariño, "Lanthanide molecular nanomagnets as probabilistic bits," 2023, arXiv:2301.08182.
[80] K. S. Woo, J. Kim, J. Han, W. Kim, Y. H. Jang, and C. S. Hwang, "Probabilistic computing using Cu0.1Te0.9/HfO2/Pt diffusive memristors," Nature Commun., vol. 13, no. 1, p. 5762, Sep. 2022.
[81] Y. Liu et al., "Probabilistic circuit implementation based on p-bits using the intrinsic random property of RRAM and p-bit multiplexing strategy," Micromachines, vol. 13, no. 6, p. 924, Jun. 2022.
[82] T. J. Park et al., "Efficient probabilistic computing with stochastic perovskite nickelates," Nano Lett., vol. 22, no. 21, pp. 8654–8661, Nov. 2022.
[83] S. Cheemalavagu, P. Korkmaz, K. V. Palem, B. E. Akgul, and L. N. Chakrapani, "A probabilistic CMOS switch and its realization by exploiting noise," in Proc. IFIP Int. Conf. VLSI, 2005, pp. 535–541.
[84] W. Whitehead, Z. Nelson, K. Y. Camsari, and L. Theogarajan, "CMOS-compatible Ising and Potts annealing using single photon avalanche diodes," 2022, arXiv:2211.12607.
[85] L. Xia et al., "Technological exploration of RRAM crossbar array for matrix-vector multiplication," J. Comput. Sci. Technol., vol. 31, no. 1, pp. 3–19, Jan. 2016.
[86] Y. Li et al., "Capacitor-based cross-point array for analog neural network with record symmetry and linearity," in Proc. IEEE Symp. VLSI Technol., Jun. 2018, pp. 25–26.
[87] O. Hassan, K. Y. Camsari, and S. Datta, "Voltage-driven building block for hardware belief networks," IEEE Design Test, vol. 36, no. 3, pp. 15–21, Jun. 2019.
[88] M. Kang, S. K. Gonugondla, A. Patil, and N. R. Shanbhag, "A multi-functional in-memory inference processor using a standard 6T SRAM array," IEEE J. Solid-State Circuits, vol. 53, no. 2, pp. 642–655, Feb. 2018.
[89] N. Verma et al., "In-memory computing: Advances and prospects," IEEE Solid State Circuits Mag., vol. 11, no. 3, pp. 43–55, Aug. 2019.


[90] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Trans. Pattern Anal. Mach. Intell., vol. PAMI-6, no. 6, pp. 721–741, Nov. 1984.
[91] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques. Cambridge, MA, USA: MIT Press, 2009.
[92] G. E. Hinton, A Practical Guide to Training Restricted Boltzmann Machines. Berlin, Germany: Springer, 2012, pp. 599–619.
[93] M. R. Garey and D. S. Johnson, Computers and Intractability, vol. 174. San Francisco, CA, USA: Freeman, 1979.
[94] D. Brélaz, "New methods to color the vertices of a graph," Commun. ACM, vol. 22, no. 4, pp. 251–256, 1979.
[95] Y. Fang et al., "Parallel tempering simulation of the three-dimensional Edwards–Anderson model with compact asynchronous multispin coding on GPU," Comput. Phys. Commun., vol. 185, no. 10, pp. 2467–2478, Oct. 2014.
[96] K. Yang, Y.-F. Chen, G. Roumpos, C. Colby, and J. Anderson, "High performance Monte Carlo simulation of Ising model on TPU clusters," in Proc. Int. Conf. High Perform. Comput., Netw., Storage Anal., Nov. 2019, pp. 1–15.
[97] P. Debashis, R. Faria, K. Y. Camsari, S. Datta, and Z. Chen, "Correlated fluctuations in spin orbit torque coupled perpendicular nanomagnets," Phys. Rev. B, Condens. Matter, vol. 101, no. 9, Mar. 2020, Art. no. 094405.
[98] N. A. Aadit, A. Grimaldi, G. Finocchio, and K. Y. Camsari, "Physics-inspired Ising computing with ring oscillator activated p-bits," in Proc. IEEE 22nd Int. Conf. Nanotechnol. (NANO), Jul. 2022, pp. 393–396.
[99] S. Bhatti, R. Sbiaa, A. Hirohata, H. Ohno, S. Fukami, and S. Piramanayagam, "Spintronics based random access memory: A review," Mater. Today, vol. 20, no. 9, pp. 530–548, 2017.
[100] H. Hoos and T. Stützle, SATLIB: An Online Resource for Research on SAT. Amsterdam, The Netherlands: IOS Press, Apr. 2000, pp. 283–292.
[101] A. Z. Pervaiz, L. A. Ghantasala, K. Y. Camsari, and S. Datta, "Hardware emulation of stochastic p-bits for invertible logic," Sci. Rep., vol. 7, no. 1, p. 10994, Sep. 2017.
[102] V. Choi, "Minor-embedding in adiabatic quantum computation: II. Minor-universal graph design," Quantum Inf. Process., vol. 10, no. 3, pp. 343–353, 2011.
[103] N. Sagan and J. Roychowdhury, "DaS: Implementing dense Ising machines using sparse resistive networks," in Proc. 41st IEEE/ACM Int. Conf. Comput.-Aided Design, Oct. 2022, pp. 1–9.
[104] A. Lucas, "Ising formulations of many NP problems," Frontiers Phys., vol. 2, p. 5, Feb. 2014.
[105] E. Andriyash et al., "Boosting integer factoring performance via quantum annealing offsets," D-Wave Syst., Burnaby, BC, Canada, D-Wave Tech. Rep. 14-1002A-B, 2016.
[106] J. D. Biamonte, "Nonperturbative k-body to two-body commuting conversion Hamiltonians and embedding problem instances into Ising spins," Phys. Rev. A, Gen. Phys., vol. 77, no. 5, May 2008, Art. no. 052331.
[107] F. L. Traversa and M. D. Ventra, "Polynomial-time solution of prime factorization and NP-complete problems with digital memcomputing machines," Chaos, Interdiscipl. J. Nonlinear Sci., vol. 27, no. 2, 2017,
[112] J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli, "Deep unsupervised learning using nonequilibrium thermodynamics," in Proc. Int. Conf. Mach. Learn., 2015, pp. 2256–2265.
[113] H. Ma, M. Govoni, and G. Galli, "Quantum simulations of materials on near-term quantum computers," npj Comput. Mater., vol. 6, no. 1, p. 85, Jul. 2020.
[114] R. Babbush, J. R. McClean, M. Newman, C. Gidney, S. Boixo, and H. Neven, "Focus beyond quadratic speedups for error-corrected quantum advantage," PRX Quantum, vol. 2, no. 1, Mar. 2021, Art. no. 010103.
[115] K. Y. Camsari, S. Chowdhury, and S. Datta, "Scalable emulation of sign-problem–free Hamiltonians with room-temperature p-bits," Phys. Rev. Appl., vol. 12, no. 3, Sep. 2019, Art. no. 034061.
[116] M. Suzuki, "Relationship between d-dimensional quantal spin systems and (d+1)-dimensional Ising systems: Equivalence, critical exponents and systematic approximants of the partition function and spin correlations," Prog. Theor. Phys., vol. 56, no. 5, pp. 1454–1469, Nov. 1976.
[117] A. D. King et al., "Scaling advantage over path-integral Monte Carlo in quantum simulation of geometrically frustrated magnets," Nature Commun., vol. 12, no. 1, pp. 1–6, 2021.
[118] S. Chowdhury, K. Y. Camsari, and S. Datta, "Accelerated quantum Monte Carlo with probabilistic computers," 2022, arXiv:2210.17526.
[119] S. Chowdhury, S. Datta, and K. Y. Camsari, "A probabilistic approach to quantum inspired algorithms," in IEDM Tech. Dig., Dec. 2019, pp. 5–37.
[120] S. Chowdhury, K. Y. Camsari, and S. Datta, "Emulating quantum interference with generalized Ising machines," 2020, arXiv:2007.07379.
[121] M. Troyer and U.-J. Wiese, "Computational complexity and fundamental limitations to fermionic quantum Monte Carlo simulations," Phys. Rev. Lett., vol. 94, no. 17, May 2005, Art. no. 170201.
[122] D. Aharonov, X. Gao, Z. Landau, Y. Liu, and U. Vazirani, "A polynomial-time classical algorithm for noisy random circuit sampling," 2022, arXiv:2211.03999.
[123] D. Hangleiter, I. Roth, D. Nagaj, and J. Eisert, "Easing the Monte Carlo sign problem," Sci. Adv., vol. 6, no. 33, Aug. 2020, Art. no. eabb8341.
[124] G. Carleo and M. Troyer, "Solving the quantum many-body problem with artificial neural networks," Science, vol. 355, pp. 602–606, Feb. 2017.
[125] Z. Cai and J. Liu, "Approximating quantum many-body wave functions using artificial neural networks," Phys. Rev. B, Condens. Matter, vol. 97, no. 3, Jan. 2018, Art. no. 035116, doi: 10.1103/PhysRevB.97.035116.
[126] H. Saito and M. Kato, "Machine learning technique to find quantum many-body ground states of bosons on a lattice," J. Phys. Soc. Jpn., vol. 87, no. 1, Jan. 2018, Art. no. 014001, doi: 10.7566/JPSJ.87.014001.
[127] S. Das Sarma, D.-L. Deng, and L.-M. Duan, "Machine learning meets quantum physics," 2019, arXiv:1903.03516.
[128] N. A. Aadit, A. Grimaldi, M. Carpentieri, L. Theogarajan, G. Finocchio, and K. Y. Camsari, "Computing with invertible logic: Combinatorial optimization with probabilistic bits," in IEDM Tech. Dig., Dec. 2021, pp. 3–40.
[129] A. Grimaldi et al., "Spintronics-compatible approach to solving maximum-satisfiability problems with probabilistic computing, invertible
Art. no. 023107. logic, and parallel tempering,’’ Phys. Rev. Appl., vol. 17, no. 2, Feb. 2022,
[108] M. Di Ventra, MemComputing: Fundamentals and Applications. Oxford, Art. no. 024052.
U.K.: Oxford Univ. Press, 2022. [130] G. Desjardins, A. Courville, and Y. Bengio, ‘‘Adaptive parallel tem-
[109] R. Salakhutdinov and G. Hinton, ‘‘Deep Boltzmann machines,’’ in Proc. pering for stochastic maximum likelihood learning of RBMs,’’ 2010,
12th Int. Conf. Artif. Intell. Statist., in Proceedings of Machine Learning arXiv:1012.3476.
Research, vol. 5, D. Van Dyk and M. Welling, Eds. Clearwater Beach, FL, [131] M. Mohseni et al., ‘‘Nonequilibrium Monte Carlo for unfreezing variables
USA: Hilton Clearwater Beach Resort, Apr. 2009, pp. 448–455. [Online]. in hard combinatorial optimization,’’ 2021, arXiv:2111.13628.
Available: https://proceedings.mlr.press/v5/salakhutdinov09a.html [132] A. Buluc et al., ‘‘Randomized algorithms for scientific computing
[110] S. H. Adachi and M. P. Henderson, ‘‘Application of quantum annealing (RASC),’’ 2021, arXiv:2104.11079.
to training of deep neural networks,’’ 2015, arXiv:1510.06356. [133] G. Finocchio et al., ‘‘Roadmap for unconventional computing with nan-
[111] H. Manukian, F. L. Traversa, and M. Di Ventra, ‘‘Accelerating deep learn- otechnology,’’ 2023, arXiv:2301.06727.
ing with memcomputing,’’ Neural Netw., vol. 110, pp. 1–7, Feb. 2019.
VOLUME 9, NO. 1, JUNE 2023