A Full-Stack View of Probabilistic Computing With p-Bits: Devices, Architectures, and Algorithms
ABSTRACT The transistor celebrated its 75th birthday in 2022. The scaling of transistors defined
by Moore's law continues, albeit at a slower pace. Meanwhile, computing demands and energy
consumption required by modern artificial intelligence (AI) algorithms have skyrocketed. As an alternative
to scaling transistors for general-purpose computing, the integration of transistors with unconventional
technologies has emerged as a promising path for domain-specific computing. In this article, we provide
a full-stack review of probabilistic computing with p-bits as a representative example of the energy-efficient
and domain-specific computing movement. We argue that p-bits could be used to build energy-efficient
probabilistic systems, tailored for probabilistic algorithms and applications. From hardware, architecture,
and algorithmic perspectives, we outline the main applications of probabilistic computers ranging from prob-
abilistic machine learning (ML) and AI to combinatorial optimization and quantum simulation. Combining
emerging nanodevices with the existing CMOS ecosystem will lead to probabilistic computers with orders of
magnitude improvements in energy efficiency and probabilistic sampling, potentially unlocking previously
unexplored regimes for powerful probabilistic algorithms.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 9, NO. 1, JUNE 2023 1
IEEE Journal on Exploratory Solid-State Computational Devices and Circuits
3-D heterogeneous integration, 2-D materials for transistors and interconnects [3], new transistor physics via negative capacitance [4], [5], or entirely new approaches using spintronic and magnetoelectric phenomena to build energy-efficient switches [6], [7].

A complementary approach to extending Moore's law is to augment the existing CMOS ecosystem with emerging, nonsilicon nanotechnologies [8], [9]. One way to achieve this goal is through heterogeneous CMOS + X architectures, where X stands for a CMOS-compatible nanotechnology. For example, X can be magnetic, ferroelectric, memristive, or photonic systems. We also discuss an example of this complementary approach, the combination of CMOS with magnetic memory technology, purposefully modified to build probabilistic computers.

III. FUNDAMENTALS OF p-COMPUTING
A large family of problems (see Fig. 2) can be encoded into coupled p-bits evolving according to the following equations [12]:

$$m_i = \mathrm{sgn}\!\left[\tanh(\beta I_i) - r_{[-1,+1]}\right] \tag{1}$$

$$I_i = \sum_j W_{ij} m_j + h_i \tag{2}$$

where m_i is defined as a bipolar variable (m_i ∈ {−1, +1}), r_{[−1,+1]} is a uniform random number drawn from the interval [−1, 1], [W] is the coupling matrix between the p-bits, β is the inverse temperature, and {h} is the bias vector. In physical implementations, it is often more convenient to represent p-bits as binary variables, s_i ∈ {0, 1}. A straightforward conversion of (1) and (2) is possible using the standard transformation, m → 2s − 1 [18].
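Equations (1) and (2) can be emulated directly in software. The sketch below is our own illustrative code (not any of the hardware implementations discussed later): it Gibbs-samples two ferromagnetically coupled p-bits and checks that the aligned states dominate, as the Boltzmann law predicts; the seed, coupling value, and sweep count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def pbit_update(m, W, h, beta, order):
    """One sweep over the p-bits in the given update order,
    following m_i = sgn[tanh(beta * I_i) - r] with r uniform in [-1, 1]."""
    for i in order:
        I = W[i] @ m + h[i]                        # synaptic input, Eq. (2)
        r = rng.uniform(-1.0, 1.0)                 # fresh noise for each update
        m[i] = 1 if np.tanh(beta * I) > r else -1  # p-bit response, Eq. (1)
    return m

# Two ferromagnetically coupled p-bits with no bias.
W = np.array([[0.0, 1.0], [1.0, 0.0]])
h = np.zeros(2)
beta = 1.0

m = np.array([1.0, -1.0])
counts = {}
for sweep in range(20000):
    m = pbit_update(m, W, h, beta, order=[0, 1])
    key = tuple(int(x) for x in m)
    counts[key] = counts.get(key, 0) + 1

# The aligned states have lower energy, so they should dominate.
p_agree = (counts.get((1, 1), 0) + counts.get((-1, -1), 0)) / 20000
print(f"P(aligned) ~= {p_agree:.2f}")  # Boltzmann prediction: e/(e + 1/e) ~= 0.88
```

With β acting as an inverse temperature, raising β sharpens the preference for low-energy states; at β → 0 the p-bits become unbiased coin flips.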
As stated, (1) and (2) do not place any restrictions on [W], which may be a symmetric or an asymmetric matrix. If an update order of p-bits is specified, these equations take the coupled p-bit system to a well-defined steady-state distribution defined by the eigenvector (with eigenvalue +1) of the corresponding Markov matrix [12]. Indeed, in the case of Bayesian (belief) networks defined by a directed graph, updating the p-bits from parent nodes to child nodes takes
the system to a steady-state distribution corresponding to that obtained from Bayes' theorem [19].

FIGURE 1. Bit, p-bit, and qubit. Each column shows a schematic representation of the basic computational units of classical computing (left), probabilistic computing (middle), and quantum computing (right). These are, respectively, the bit, the p-bit, and the qubit.

If the [W] matrix is symmetric, one can define an energy, E, whose negated partial derivative with respect to p-bit m_i gives rise to (2):
$$E(m_1, m_2, \ldots) = -\left(\sum_{i<j} W_{ij} m_i m_j + \sum_i h_i m_i\right). \tag{3}$$

In this case, the steady-state distribution of the network is described by [20]

$$p_i = \frac{1}{Z} \exp(-\beta E_i) \tag{4}$$

also known as the Boltzmann law. As such, iterating a network of p-bits described by (1) and (2) eventually approximates the Boltzmann distribution, which can be useful for probabilistic sampling and optimization. The approximate sampling avoids the intractable problem of exactly calculating Z. Remarkably, for such undirected networks, the steady-state distribution is invariant with respect to the update order of p-bits, as long as connected p-bits are not updated at the same time (more on this later). This feature is highly reminiscent of natural systems, where asynchronous dynamics make parallel updates highly unlikely and the update order does not change the equilibrium distribution. Indeed, this gives hardware implementations of asynchronous networks of p-bits massive parallelism and flexibility in design.

The energy functional defined by (3) is often the starting point of discussions in the related field of Ising machines [21], [22], [23], [24], [25], [26], [27], [28], [29], [30], [31], [32], [33], [34], [35], [36], [37], [38], [39], [40], [41], [42], [43] with different implementations (see [44] for a comprehensive review). In the case of p-bits, however, we view (1) and (2) as more fundamental than (3) because the former can also be used to approximate hard inference on directed networks, while the latter always relies on undirected networks. Compared to undirected networks using Ising machines, work on directed neural networks for Bayesian inference has been relatively scarce, although there are exciting developments [19], [45], [46], [47], [48], [49], [50].

Finally, the form of (3) restricts the type of interactions between p-bits to a linear one since the energy is quadratic. Even though higher-order interactions (k-local) between p-bits are possible [18] (also discussed in the context of Ising machines [51], [52]), such higher-order interactions can always be constructed by combining a standard probabilistic gate set at the cost of extra p-bits. In our view, in the case of electronic implementations with scalable p-bits, trading an increased number of p-bits for simplified interconnect complexity is almost always favorable. That being said, algorithmic advantages and the better representative capabilities of higher-order interactions are actively being explored [51], [53].

II. FULL-STACK VIEW AND ORGANIZATION
Research on probabilistic computing with p-bits originated at the device and physics level, first with stable nanomagnets [10], followed by low-barrier nanomagnets [11], [12]. In [12], the p-bit was formally defined as a binary stochastic neuron realized in hardware. In both approaches, with stable and unstable nanomagnets, the basic idea is to exploit the natural mapping between the intrinsically noisy physics of nanomagnets and the mathematics of general probabilistic algorithms [e.g., Monte Carlo, Markov Chain Monte Carlo (MCMC)]. Such a notion of natural computing, where physics is matched to computation, was clearly laid out by Feynman [13] in his celebrated Simulating Physics with Computers talk. Subsequent work on p-bits defined them as an abstraction between bits and qubits (see Fig. 1) with the possibility of different physical implementations. In addition to searching for energy-efficient realizations of single devices, p-bit research has extended to finding efficient architectures (through massive parallelization, sparsification [14], and pipelining [15]) along with the identification of promising application domains. This full-stack research program covering hardware, architecture, algorithms, and applications is similar to the related field of quantum computation, where a large degree of interdisciplinary expertise is required to move the field forward (see the related reviews [16], [17]). The purpose of this article is to serve as a consolidated summary of recent developments with new results in hardware, architectures, and algorithms. We provide concrete and previously unpublished examples of ML and AI, combinatorial optimization, and quantum simulation with p-bits (see Fig. 2).

FIGURE 2. Applications of probabilistic computing. Potential applications of p-bits are illustrated. The list broadly includes problems in combinatorial optimization, probabilistic ML, and quantum simulation.

IV. HARDWARE: PHYSICAL IMPLEMENTATION OF p-BITS
A. p-BITS
The p-bit defined in (1) describes a tunable and discrete random number generator. Its physical implementation includes a broad range of options from noisy materials to analog and digital CMOS (see Fig. 3). The digital CMOS implementations of p-bits often consist of a pseudorandom number generator (PRNG) (r), a lookup table for the activation function (tanh), and a threshold to generate a one-bit output. Digital input with a specified fixed-point precision (e.g., ten bits with one sign, six integer, and three fractional bits) provides tunability through the activation function. Digital p-bits have been very useful in prototyping probabilistic computers up to tens of thousands of p-bits [14], [54], [55]. They also serve a useful purpose to illustrate why analog or mixed-signal implementations of p-bits with nanodevices are necessary. Even using some of the most advanced field-programmable gate arrays (FPGAs), the footprint of a digital p-bit is very large: synthesizing such digital p-bits with PRNGs of varying quality of randomness results in tens of thousands of individual transistors. In single FPGAs that do not use time-division multiplexing of p-bits or off-chip memory, only about 10 000–20 000 p-bits with 100 000 weights (sparse graphs with degree 5–10) fit, even within high-end devices [14].

On the other hand, using nanodevices such as CMOS-compatible stochastic magnetic tunnel junctions (sMTJs), millions of p-bits can be accommodated in single cores due to the scalability achieved by the magnetoresistive random access memory (MRAM) technology, exceeding 1-Gb MRAM chips [56], [57]. However, before the stable MTJs can be controllably made stochastic, challenges at the material and device level must be addressed [58], [59] with careful magnet designs [60], [61], [62]. Different flavors of magnetic p-bits exist [63], [64], [65], [66]; for a recent review, see [67]. Unlike synchronous or trial-based stochasticity (see [68]) that requires continuous resetting, the temporal noise of low-barrier nanomagnets makes them ideally suited to build autonomous, physics-inspired probabilistic computers, providing a constant stream of tunably random bits [69]. Following earlier theoretical predictions [70], [71], [72], recent breakthroughs in low-barrier magnets have shown great promise, using stochastic MTJs with in-plane anisotropy where fluctuations can be of the order of nanoseconds [73], [74], [75]. Such near-zero-barrier nanomagnets should be more tolerant to device variations because when the energy barrier Δ is low, the usual exponential dependence of fluctuations on the barrier is much less pronounced. These stochastic MTJs may be used in electrical circuits with a few
additional transistors (see Fig. 3) to build hardware p-bits. Two flavors of stochastic MTJ-based p-bits were proposed in [12] [spin-orbit torque (SOT)-based] and in [70] [spin-transfer torque (STT)-based]. Both of these p-bits have now been experimentally demonstrated, in [18], [76], and [77] (STT) and in [78] (SOT). While many other implementations of p-bits are possible, from molecular nanomagnets [79] to diffusive memristors [80], resistive random access memory (RRAM) [81], perovskite nickelates [82], and others, two additional advantages of the MRAM-based p-bits are the proven manufacturability (up to billion-bit densities) and the amplification of room-temperature noise. Even with the thermal energy of kT in the environment, magnetic switching causes large resistance fluctuations in MTJs, creating hundreds of millivolts of change in resistive dividers [70]. Typical noise on resistors (or memristors) is limited by the (kT/C)^{1/2} limit, which is far lower (millivolts) even at extremely low capacitances (C). This feature of stochastic MTJs ensures that they do not require explicit amplifiers [83] at each p-bit, which can become prohibitively expensive in terms of area and power consumption. Estimates of sMTJ-based p-bits suggest that they can create a random bit using 2 fJ per operation [18]. Recently, a CMOS-compatible single-photon avalanche diode-based implementation of p-bits showed similar, amplifier-free operation [84], and the search for the most scalable, energy-efficient hardware p-bit using alternative phenomena continues.

FIGURE 3. Different hardware options for building a probabilistic computer. Top: Various magnetic implementations of a p-bit. These include both digital (CMOS) and mixed-signal implementations (based on, e.g., sMTJs with low-barrier magnets). Bottom: A hybrid of classical and probabilistic computing schemes is shown, where the classical computer generates weights and programs the probabilistic computer. The probabilistic computer then generates samples accordingly with high throughput and sends them back to the classical computer for further processing. Like the building blocks of p-bits, the synapse of the probabilistic computer can be designed in several ways, including digital, analog, and a mix of both techniques.

B. SYNAPSE
The second central part of the p-computer architecture is the synapse, denoted by (2). Much like the hardware p-bit, there are several different implementations of synapses, ranging from digital CMOS and analog/mixed-signal CMOS to resistive [85] or capacitive crossbars [86], [87]. The synaptic equation looks like the traditional matrix–vector product (MVP) commonly used in ML models today; however, there is a crucial difference: thanks to the discrete p-bit output (0 or 1), the MVP operation is simply an addition over the active neighbors of a given p-bit. This makes the synaptic operation simpler than continuous multiplication and significantly simplifies digital synapses. In analog implementations, the use of in-memory computing techniques through charge accumulation could be useful, with the added simplification of digital outputs of p-bits [88], [89].

It is important to note that, for eventually integrated p-bit applications, the p-bit and the synapse can be mixed and matched as an example of creatively combining these pieces (see the FPGA–stochastic MTJ combination reported in [77]). The best combination of scalable p-bits and synapses may lead to energy-efficient and large-scale p-computers. At this time, various possibilities exist with different technological maturity.

V. ARCHITECTURE CONSIDERATIONS
A. GIBBS SAMPLING WITH p-BITS
The dynamical evolution of (1) and (2) relies on an iterated updating scheme where each p-bit is updated one after the other, based on a predefined (or random) update order. This iterative scheme is called Gibbs sampling [90], [91]. Virtually all applications discussed in Fig. 2 benefit from accelerating Gibbs sampling, attesting to its generality.

In a standard implementation of Gibbs sampling in a synchronous system, p-bits will be updated one by one at every clock cycle, as shown in Fig. 4(a). It is crucial to ensure that the effective input each p-bit receives through (2) is computed before the p-bit updates. As such, Tclk has to be longer than the time it takes to compute (2). In this setting, a graph with N p-bits will require N clock cycles (N Tclk) to perform a complete sweep, where Tclk is the clock period. This requirement makes Gibbs sampling a fundamentally serial and slow process.

A much more effective approach is possible by the following observation: even though updates between connected p-bits need to be sequential, if two p-bits are not directly connected, updating one of them does not directly change the input of the other through (2). Such p-bits can be updated in parallel without any approximation. Indeed, one motivation of designing restricted Boltzmann machines (RBMs; see [92]) over unrestricted BMs is to exploit this parallelism: RBMs consist of separate layers (bipartite) that can be updated in parallel. However, this idea can be taken further. If the underlying graph is sparse, it is often easy to split it into disconnected chunks by coloring the graph using a few colors. Even though finding the minimum number of colors is an NP-hard problem [93], heuristic coloring algorithms (such as Dsatur [94]) with polynomial complexity can color the graph very quickly, without necessarily finding a minimum. In this
FIGURE 4. Architectures of p-computer. (a) Synchronous Gibbs: all p-bits are updated sequentially. N p-bits need N clock cycles
(NTclk ) to perform a complete sweep, Tclk being the clock period. (b) Pseudo-asynchronous Gibbs: a sparse network can be colored
into a few disjoint blocks where connected p-bits are assigned a different color. Phase-shifted clocks update the color blocks one
after the other. N p-bits need ≈ one clock cycle Tclk to perform a complete sweep, reducing O(N) complexity of a sweep to O(1), where
we assume the number of colors c ≪ N. (c) Truly asynchronous Gibbs: a hardware p-bit (e.g., a stochastic MTJ-based p-bit) provides
an asynchronous and random clock with period ⟨Tp-bit ⟩. N p-bits need approximately one clock to perform a complete sweep, as long
as synapse time is less than the clock on average. No graph coloring or engineered phase shifting is required.
context, obtaining the minimum coloring is not critical, and sparse graphs typically require a few colors.

Such an approach was taken on sparse graphs (with no regular structure) to design a massively parallel implementation of Gibbs sampling in [14] [see Fig. 4(b)]. Connected p-bits are assigned different colors, and unconnected p-bits are assigned the same color. Equally phase-shifted, same-frequency clocks update the p-bits in each color block one by one. In this approach, a graph with N p-bits requires only one clock cycle (Tclk) to perform a complete sweep, reducing the O(N) complexity of a full sweep to O(1), assuming the number of colors is much less than N. Therefore, the key advantage of this approach is that the p-computer becomes faster with larger graphs, since probabilistic "flips per second," a key metric measured by tensor processing unit (TPU) and GPU implementations [95], [96], linearly increases with the number of p-bits. It is important to note that these TPU and GPU implementations also solve Ising problems on sparse graphs; however, their graph degrees are usually restricted to 4 or 6, unlike the more irregular and higher-degree graphs implemented in [14].

We term this graph-colored architecture pseudo-asynchronous Gibbs because, while it is technically synchronized to out-of-phase clocks, it embodies elements of the truly asynchronous architecture we discuss next. While graph coloring algorithmically increases sampling rates by a factor of N, it still requires a careful design of out-of-phase clocks. A much more radical approach is to design a truly asynchronous Gibbs sampler, as shown in Fig. 4(c). Here, the idea is to have hardware building blocks with naturally asynchronous dynamics, such as an sMTJ-based p-bit. In such a p-bit, there exists a natural "clock," ⟨Tp-bit⟩, defined by the average lifetime of a Poisson process [97]. As long as ⟨Tp-bit⟩ is not faster than the average synapse time (tsynapse) to calculate (2), the network still updates N spins in a single ⟨Tp-bit⟩ timescale. This is because the probability of simultaneous updates is extremely low in a Poisson process and is further reduced in highly sparse graphs.

In fact, preliminary experiments implementing such truly asynchronous p-bits with ring-oscillator-activated clocks show that, despite making occasional parallel updates, the asynchronous p-computer performs similarly to the pseudo-asynchronous system, where incorrect updates are avoided with careful phase-shifting [98]. The main appeal of truly asynchronous Gibbs sampling is the lack of any graph coloring and phase-shift engineering while retaining the same massive parallelism, as N p-bits require approximately a single ⟨Tp-bit⟩ to complete a sweep. Given that FPGA-based p-computers already provide about a 10× improvement in sampling throughput over optimized TPU and GPU implementations [14], such asynchronous systems are promising in terms of scalability. Stochastic MTJ-based p-bits should be able to reach high densities on a single chip. Around 20 W of projected power consumption can be reached considering 20-µW p-bit/synapse combinations at 1M p-bit density [54], [60], [99]. The ultimate scalability of magnetic p-bits is a significant advantage over alternative approaches based on electronic or photonic devices.

B. SPARSIFICATION
Both the pseudo-asynchronous and the truly asynchronous parallelisms require sparse graphs to work well. The first problem is the number of colors: if the graph is dense, it requires many colors, making the architecture very similar to standard serial Gibbs sampling.

The second problem with a dense graph is the synapse time tsynapse. If many p-bits have a lot of neighbors, the synapse unit needs to compute a large sum before the next update. If the time between two consecutive updates is ⟨Tp-bit⟩, tsynapse ≪ ⟨Tp-bit⟩ is required to avoid information loss and reach the correct steady-state distribution [54], [101].

However, if the graph is sparse, each p-bit has fewer connections, and the updates can be faster without any dropped messages. Any graph can be sparsified using the technique proposed in [14], similar in spirit to the minor-graph embedding (MGE) approach pioneered by D-Wave [102], even though the objective here is not to find an embedding but to sparsify an existing graph. The key idea is to split p-bits into different copies, using ferromagnetic COPY gates. These p-bits distribute the original connections among them, resulting in identical copies with fewer connections. An important point is that the ground state of the original graph remains unchanged [14], so the method does not involve approximations, unlike other sparsification techniques, for example, those based on low-rank approximations [103].
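The two ideas of this section, COPY-gate splitting and graph-colored parallel updates, can be sketched together in software. The code below is our own schematic illustration, not the construction of [14]: the edge-splitting policy, the COPY coupling value J_copy, the greedy coloring (a simple stand-in for Dsatur), and the star-graph example are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(seed=7)

def split_node(W, node, J_copy=2.0):
    """COPY-gate sparsification sketch: replace `node` by two copies joined
    by a ferromagnetic coupling (J_copy > 0 pushes the copies to agree in
    low-energy states) and divide the node's original edges between them."""
    n = W.shape[0]
    Wn = np.zeros((n + 1, n + 1))
    Wn[:n, :n] = W
    new = n
    neighbors = list(np.nonzero(W[node])[0])
    for u in neighbors[len(neighbors) // 2:]:   # move half the edges
        Wn[new, u] = Wn[u, new] = W[node, u]
        Wn[node, u] = Wn[u, node] = 0.0
    Wn[node, new] = Wn[new, node] = J_copy      # ferromagnetic COPY link
    return Wn

def greedy_coloring(W):
    """Greedy coloring: each p-bit gets the smallest color not already
    used by its colored neighbors (no minimality guarantee, as in the text)."""
    n = W.shape[0]
    colors = [-1] * n
    for v in range(n):
        taken = {colors[u] for u in np.nonzero(W[v])[0] if colors[u] != -1}
        c = 0
        while c in taken:
            c += 1
        colors[v] = c
    return colors

def chromatic_sweep(m, W, h, beta, blocks):
    """One full sweep: p-bits of the same color share no edges, so each
    color block is updated in a single parallel (vectorized) step."""
    for block in blocks:
        I = W[block] @ m + h[block]
        r = rng.uniform(-1.0, 1.0, size=len(block))
        m[block] = np.where(np.tanh(beta * I) > r, 1.0, -1.0)
    return m

# Dense star: p-bit 0 couples to all of 1..6 (degree 6) -> split it.
W = np.zeros((7, 7))
W[0, 1:] = W[1:, 0] = 1.0
Wn = split_node(W, node=0)

colors = greedy_coloring(Wn)
blocks = [np.array([v for v in range(len(colors)) if colors[v] == c])
          for c in range(max(colors) + 1)]

m = rng.choice([-1.0, 1.0], size=Wn.shape[0])
for _ in range(500):
    m = chromatic_sweep(m, Wn, h=np.zeros(Wn.shape[0]), beta=2.0, blocks=blocks)

deg0 = int(np.count_nonzero(Wn[0]))
print("degree of split node:", deg0, "| colors used:", len(blocks))
# prints: degree of split node: 4 | colors used: 2
```

After the split, the original degree-6 node carries only three of its edges plus the COPY link, and the sparsified graph is two-colorable, so a full sweep takes only two parallel block updates regardless of N.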
FIGURE 6. Invertible logic encoding. The encoding process of three optimization problems—(a) maximum satisfiability, (b) number partitioning, and (c) the knapsack problem—is streamlined and visually summarized in three steps: (1) the problem is first condensed into a concise mathematical formulation; (2) an invertible Boolean circuit that topologically maps the problem is conceived; and (3) the invertible Boolean circuit is converted into an Ising model using probabilistic AND/OR/NOT gates [14].
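As a concrete instance of step (1) in Fig. 6, number partitioning can also be condensed into a quadratic (Ising) form directly: minimizing (Σ_i a_i m_i)² over m_i ∈ {−1, +1} is, up to a constant, an Ising energy with couplings J_ij = −a_i a_j. The brute-force check below is our own illustration of this standard direct encoding (a simpler alternative to the invertible-logic route shown in the figure), not code from [14].

```python
from itertools import product

def partition_energy(a, m):
    """Ising-style cost for number partitioning: E(m) = (sum_i a_i m_i)^2.
    E = 0 exactly when the +1 and -1 subsets have equal sums."""
    s = sum(ai * mi for ai, mi in zip(a, m))
    return s * s

a = [4, 5, 6, 7, 8]  # 4 + 5 + 6 = 7 + 8, so a perfect partition exists
best = min(product([-1, +1], repeat=len(a)), key=lambda m: partition_energy(a, m))

left = [ai for ai, mi in zip(a, best) if mi == +1]
right = [ai for ai, mi in zip(a, best) if mi == -1]
print(left, right, "difference:", abs(sum(left) - sum(right)))  # difference: 0
```

A p-computer would, of course, replace the exhaustive `min` over 2^N states with probabilistic sampling of the same energy landscape.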
C. QUANTUM SIMULATION
One primary motivation for building quantum computers is
to simulate large quantum many-body systems and under-
stand the exotic physics offered by them [113]. Two major
challenges with quantum computers are the necessity of
using cryogenic operating temperatures and the vulnera-
bility to noise, rendering quantum computers impractical,
especially considering practical overheads [114]. Simu-
lating these systems with classical computers is often
extremely time-consuming and mostly limited to small sys-
tems. One potential application of p-bits is to provide a
room-temperature solution to boost the simulation speed and
potentially enable the simulation of large-scale quantum sys-
tems. Significant progress has been made toward this end in
recent years.
FIGURE 8. ML quantum systems with p-bits. (a) Heisenberg Hamiltonian with a transverse field (Γ = +1) is applied to an FM-coupled (J_z = +1 and J_xy = +0.5) linear chain of 12 qubits with periodic boundary. (b) To obtain the ground state of this quantum system, an RBM is employed with 12 visible and 48 hidden nodes, where all nodes in the visible layer are connected to all nodes in the hidden layer. (c) This ML model is then embedded into a hardware-amenable sparse p-bit network arranged in a chimera graph using MGE. We use a coupling strength of 1.0 among the replicated visible and hidden nodes in the embedded p-bit network. (d) Overview of the ML algorithm and the division of workload between the probabilistic and classical computers in a hybrid setting is shown. (e) FPGA emulation of this probabilistic computer performs variational ML in tandem with a classical computer, converging to the quantum (exact) result as shown.
can be used for quantum Hamiltonians beyond the usual Transverse Field Ising Model, such as the antiferromagnetic Heisenberg Hamiltonian [119], and even for the emulation of gate-based quantum computers [120]. However, for generic Hamiltonians (e.g., random circuit sampling), the number of samples required in naive implementations seems to grow exponentially [120] due to the notorious sign problem [121]. However, clever basis transformations [122] might mitigate or cure the sign problem [123] in the future.

2) MACHINE-LEARNING QUANTUM MANY-BODY SYSTEMS
With the great success of ML and AI algorithms, training stochastic neural networks (such as BMs) to approximately solve the quantum many-body problem starting from a variational guess has generated great excitement [124], [125], [126] and is considered to be a fruitful combination of quantum physics and ML [127]. These algorithms are typically implemented in high-level software programs, allowing users to choose from various network models and sizes according to their needs. However, as with classical ML, the difficulty of training strongly hinders the use of deeper and more general models. With scaled p-computers using millions of magnetic p-bits, massively parallel and energy-efficient hardware implementations of the more general unrestricted/deep BMs may become feasible, paving the way to simulate practical quantum systems.

To demonstrate one such example of this approach, we show how p-bits laid out in sparse, hardware-aware graphs can be used for ML quantum systems (see Fig. 8). The objective of this problem is to find the ground state of a many-body quantum system, in this case, a 1-D FM Heisenberg Hamiltonian with an external transverse field. We start with an RBM, which is one of the simplest neural network models, and use its functional form as the variational guess for the ground-state probabilities (the wave function is obtained by taking the square root of the probabilities according to the Born rule). A combination of probabilistic sampling and weight updates gradually adjusts the variational guess such that the final guess points to the ground state of the quantum Hamiltonian. Emulating this variational ML approach with p-bits requires a few more steps. An RBM network contains all-to-all connections between the visible and hidden layers, which are not conducive to scalable p-computers because of the large fan-out demanded by the all-to-all connectivity. An alternative is to map the RBM onto a sparse graph through MGE [102]. Using a hybrid setup with fast sampling in a probabilistic computer coupled with a classical computer, the iterative process of sampling and weight updating can then be performed. The key advantage of having a massively parallel and fast sampler is the selection of higher-quality states of the wave function to update the variational guess. Fig. 8 shows an example simulation of how a p-computer learns the ground state of a 1-D FM Heisenberg model. The scaling of p-computers using magnetic p-bits may allow much larger implementations of quantum systems in the future.

D. OUTLOOK: ALGORITHMS AND APPLICATIONS BEYOND
Despite the large range of applications we discussed in the context of p-bits, most of the sampling algorithms have been either standard MCMC or generic simulated annealing-based approaches. Future possibilities involve more sophisticated sampling and annealing algorithms such as parallel tempering (PT) (see [128], [129] for some initial investigations). Further improvements to hardware implementations include adaptive versions of PT [130] as well as sophisticated nonequilibrium Monte Carlo (NMC) algorithms [131]. Ideas involving overclocking p-bits such that they violate the tsynapse ≪ ⟨Tp-bit⟩ requirement for further improvement [14], or sharing synaptic operations between p-bits [82], could also be useful. A combination of these ideas with algorithm-architecture-device codesigns may lead to orders of magnitude improvement in sampling speeds and quality. In this context, increasing flips/ns as a sampling throughput metric is an important goal. In addition, solution quality and the possibility of cluster updates or other algorithmic techniques also need to be considered carefully. Given the plethora of approaches from multiple communities, we also hope that model problems and benchmarking studies comparing different Ising machines, probabilistic
accelerators, physical annealers, and dynamical solvers will be performed in the near future by all practitioners, including ourselves.

We believe that the codesign of algorithms, architectures, and devices for probabilistic computing may not only help mitigate the looming energy crisis of ML and AI, but also lead to systems that may unlock previously inaccessible regimes using powerful probabilistic (randomized) algorithms [132]. Just as the emergence of powerful GPUs made the well-known backpropagation algorithm flourish, probabilistic computers could lead us to previously unknown territory of energy-based AI models, combinatorial optimization, and quantum simulation. This research program requires a concerted effort and interdisciplinary expertise from all across the stack and ties into the larger vision of unconventional computing forming in the community [133].

ACKNOWLEDGMENT
Supriyo Datta has a financial interest in Ludwig Computing.

REFERENCES
[1] D. Patterson et al., "Carbon emissions and large neural network training," 2021, arXiv:2104.10350.
[2] S. Sudhakar, V. Sze, and S. Karaman, "Data centers on wheels: Emissions from computing onboard autonomous vehicles," IEEE Micro, vol. 43, no. 1, pp. 29–39, Jan. 2023.
[3] R. Chau, "Process and packaging innovations for Moore's law continuation and beyond," in IEDM Tech. Dig., Dec. 2019, p. 1.
[4] J. C. Wong and S. Salahuddin, "Negative capacitance transistors," Proc. IEEE, vol. 107, no. 1, pp. 49–62, Jan. 2019.
[5] M. A. Alam, M. Si, and P. D. Ye, "A critical review of recent progress on negative capacitance field-effect transistors," Appl. Phys. Lett., vol. 114, no. 9, Mar. 2019, Art. no. 090401.
[6] S. Manipatruni et al., "Scalable energy-efficient magnetoelectric spin–orbit logic," Nature, vol. 565, no. 7737, pp. 35–42, Dec. 2018.
[7] P. Debashis et al., "Low-voltage and high-speed switching of a magnetoelectric element for energy efficient compute," in IEDM Tech. Dig.,
[20] E. Aarts and J. Korst, Simulated Annealing and Boltzmann Machines: A Stochastic Approach to Combinatorial Optimization and Neural Computing. Hoboken, NJ, USA: Wiley, 1989.
[21] A. Houshang, M. Zahedinejad, S. Muralidhar, J. Checinski, A. A. Awad, and J. Åkerman, "A spin Hall Ising machine," 2020, arXiv:2006.02236.
[22] Y. Su, J. Mu, H. Kim, and B. Kim, "A scalable CMOS Ising computer featuring sparse and reconfigurable spin interconnects for solving combinatorial optimization problems," IEEE J. Solid-State Circuits, vol. 57, no. 3, pp. 858–868, Mar. 2022.
[23] S. Bhanja, D. K. Karunaratne, R. Panchumarthy, S. Rajaram, and S. Sarkar, "Non-Boolean computing with nanomagnets for computer vision applications," Nature Nanotechnol., vol. 11, no. 2, pp. 177–183, Feb. 2016.
[24] P. Debashis, R. Faria, K. Y. Camsari, J. Appenzeller, S. Datta, and Z. Chen, "Experimental demonstration of nanomagnet networks as hardware for Ising computing," in IEDM Tech. Dig., Dec. 2016, pp. 3–34.
[25] P. L. McMahon et al., "A fully programmable 100-spin coherent Ising machine with all-to-all connections," Science, vol. 354, pp. 614–617, Nov. 2016.
[26] S. Dutta et al., "An Ising Hamiltonian solver based on coupled stochastic phase-transition nano-oscillators," Nature Electron., vol. 4, no. 7, pp. 502–512, Jul. 2021.
[27] J. Chou, S. Bramhavar, S. Ghosh, and W. Herzog, "Analog coupled oscillator based weighted Ising machine," Sci. Rep., vol. 9, no. 1, pp. 1–10, Oct. 2019.
[28] M. Yamaoka et al., "A 20 k-spin Ising chip to solve combinatorial optimization problems with CMOS annealing," IEEE J. Solid-State Circuits, vol. 51, no. 1, pp. 303–309, Dec. 2015.
[29] T. Wang and J. Roychowdhury, "OIM: Oscillator-based Ising machines for solving combinatorial optimisation problems," in Proc. Int. Conf. Unconventional Comput. Natural Comput. Cham, Switzerland: Springer, 2019, pp. 232–256.
[30] Y. Shim, A. Jaiswal, and K. Roy, "Ising computation based combinatorial optimization using spin-Hall effect (SHE) induced stochastic magnetization reversal," J. Appl. Phys., vol. 121, no. 19, 2017, Art. no. 193902.
[31] T. Inagaki et al., "A coherent Ising machine for 2000-node optimization problems," Science, vol. 354, no. 6312, pp. 603–606, Nov. 2016.
[32] M. Baity-Jesi et al., "Janus II: A new generation application-driven computer for spin-system simulations," Comput. Phys. Commun., vol. 185, no. 2, pp. 550–559, Feb. 2014.
[33] M. Yamaoka, C. Yoshimura, M. Hayashi, T. Okuyama, H. Aoki, and H. Mizuno, "24.3 20 k-spin Ising chip for combinational optimization problem with CMOS annealing," in IEEE Int. Solid-State Circuits Conf.
Dec. 2022, pp. 34–36. (ISSCC) Dig. Tech. Papers, Feb. 2015, pp. 1–3.
[8] A. Chen, ‘‘Emerging research device roadmap and perspectives,’’ in Proc. [34] N. G. Berloff et al., ‘‘Realizing the classical XY Hamiltonian in polariton
IEEE Int. Conf. IC Design Technol., May 2014, pp. 1–4. simulators,’’ Nature Mater., vol. 16, no. 11, pp. 1120–1126, Nov. 2017.
[35] T. Takemoto, M. Hayashi, C. Yoshimura, and M. Yamaoka,
[9] G. Finocchio, M. Di Ventra, K. Y. Camsari, K. Everschor-Sitte,
‘‘A 2 × 30 k-spin multichip scalable annealing processor based on
P. K. Amiri, and Z. Zeng, ‘‘The promise of spintronics for uncon-
a processing-in-memory approach for solving large-scale combinatorial
ventional computing,’’ J. Magn. Magn. Mater., vol. 521, Mar. 2021,
optimization problems,’’ in IEEE Int. Solid-State Circuits Conf. (ISSCC)
Art. no. 167506.
Dig. Tech. Papers, Feb. 2019, pp. 52–54.
[10] B. Behin-Aein, V. Diep, and S. Datta, ‘‘A building block for hardware
[36] H. Goto, K. Tatsumura, and A. R. Dixon, ‘‘Combinatorial optimization
belief networks,’’ Sci. Rep., vol. 6, no. 1, pp. 1–10, Jul. 2016.
by simulating adiabatic bifurcations in nonlinear Hamiltonian systems,’’
[11] B. Sutton, K. Y. Camsari, B. Behin-Aein, and S. Datta, ‘‘Intrinsic opti- Sci. Adv., vol. 5, no. 4, Apr. 2019.
mization using stochastic nanomagnets,’’ Sci. Rep., vol. 7, p. 44370, [37] M. Aramon, G. Rosenberg, E. Valiante, T. Miyazawa, H. Tamura, and
Mar. 2017. H. Katzgrabeer, ‘‘Physics-inspired optimization for quadratic uncon-
[12] K. Y. Camsari, R. Faria, B. M. Sutton, and S. Datta, ‘‘Stochastic p-bits for strained problems using a digital annealer,’’ Frontiers Phys., vol. 7, p. 48,
invertible logic,’’ Phys. Rev. X, vol. 7, no. 3, 2017, Art. no. 031014. Apr. 2019.
[13] R. P. Feynman, ‘‘Simulating physics with computers,’’ Int. J. Theor. Phys., [38] A. Mallick, M. K. Bashar, D. S. Truesdell, B. H. Calhoun, S. Joshi,
vol. 21, nos. 6–7, pp. 467–488, 1982. and N. Shukla, ‘‘Using synchronized oscillators to compute the max-
[14] N. A. Aadit et al., ‘‘Massively parallel probabilistic computing with imum independent set,’’ Nature Commun., vol. 11, no. 1, pp. 1–7,
sparse Ising machines,’’ Nature Electron., pp. 1–9, 2022. Sep. 2020.
[15] J. Kaiser, R. Jaiswal, B. Behin-Aein, and S. Datta, ‘‘Benchmarking a [39] K. Yamamoto et al., ‘‘7.3 STATICA: A 512-spin 0.25 M-weight full-
probabilistic coprocessor,’’ 2021, arXiv:2109.14801. digital annealing processor with a near-memory all-spin-updates-at-once
[16] S. Misra et al., ‘‘Probabilistic neural computing with stochastic devices,’’ architecture for combinatorial optimization with complete spin-spin inter-
Adv. Mater., Nov. 2022, Art. no. 2204569. actions,’’ in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech.
[17] P. J. Coles, ‘‘Thermodynamic AI and the fluctuation frontier,’’ 2023, Papers, Feb. 2020, pp. 138–140.
arXiv:2302.06584. [40] S. Patel, P. Canoza, and S. Salahuddin, ‘‘Logically synthesized and
[18] W. A. Borders, A. Z. Pervaiz, S. Fukami, K. Y. Camsari, H. Ohno, and hardware-accelerated restricted Boltzmann machines for combinatorial
S. Datta, ‘‘Integer factorization using stochastic magnetic tunnel junc- optimization and integer factorization,’’ Nature Electron., vol. 5, no. 2,
tions,’’ Nature, vol. 573, no. 7774, pp. 390–393, Sep. 2019. pp. 92–101, Feb. 2022.
[19] R. Faria, J. Kaiser, K. Y. Camsari, and S. Datta, ‘‘Hardware design for [41] R. Afoakwa, Y. Zhang, U. K. R. Vengalam, Z. Ignjatovic, and M. Huang,
autonomous Bayesian networks,’’ Frontiers Comput. Neurosci., vol. 15, ‘‘A CMOS-compatible Ising machine with bistable nodes,’’ 2020,
p. 14, Mar. 2021. arXiv:2007.06665.
[42] A. Lu et al., ‘‘Scalable in-memory clustered annealer with temporal noise [66] L. Rehm et al., ‘‘Stochastic magnetic actuated random transducer
of FinFET for the travelling salesman problem,’’ in IEDM Tech. Dig., devices based on perpendicular magnetic tunnel junctions,’’ 2022,
Dec. 2022, pp. 5–22. arXiv:2209.01480.
[43] W. Moy, I. Ahmed, P.-W. Chiu, J. Moy, S. S. Sapatnekar, and C. H. Kim, [67] B. R. Zink, Y. Lv, and J.-P. Wang, ‘‘Review of magnetic tunnel junctions
‘‘A 1,968-node coupled ring oscillator circuit for combinatorial opti- for stochastic computing,’’ IEEE J. Exp. Solid-State Comput. Devices
mization problem solving,’’ Nature Electron., vol. 5, no. 5, pp. 310–317, Circuits, vol. 8, no. 2, pp. 173–184, Dec. 2022.
May 2022. [68] A. Fukushima et al., ‘‘Spin dice: A scalable truly random number gen-
[44] N. Mohseni, P. L. McMahon, and T. Byrnes, ‘‘Ising machines as hardware erator based on spintronics,’’ Appl. Phys. Exp., vol. 7, no. 8, 2014,
solvers of combinatorial optimization problems,’’ Nature Rev. Phys., Art. no. 083001.
vol. 4, no. 6, pp. 363–379, May 2022. [69] Y. Shao et al., ‘‘Implementation of artificial neural networks using mag-
[45] M. Marsman, G. Maris, T. Bechger, and C. Glas, ‘‘Bayesian netoresistive random-access memory-based stochastic computing units,’’
inference for low-rank Ising networks,’’ Sci. Rep., vol. 5, no. 1, IEEE Magn. Lett., vol. 12, pp. 1–5, 2021.
Mar. 2015. [70] K. Y. Camsari, S. Salahuddin, and S. Datta, ‘‘Implementing p-bits
[46] M. T. McCray, M. A. Abeed, and S. Bandyopadhyay, ‘‘Electrically pro- with embedded MTJ,’’ IEEE Electron Device Lett., vol. 38, no. 12,
grammable probabilistic bit anti-correlator on a nanomagnetic platform,’’ pp. 1767–1770, Dec. 2017.
Sci. Rep., vol. 10, no. 1, pp. 1–11, Jul. 2020. [71] J. Kaiser, A. Rustagi, K. Y. Camsari, J. Z. Sun, S. Datta, and P. Upadhyaya,
[47] P. Debashis, V. Ostwal, R. Faria, S. Datta, J. Appenzeller, and Z. Chen, ‘‘Subnanosecond fluctuations in low-barrier nanomagnets,’’ Phys. Rev.
‘‘Hardware implementation of Bayesian network building blocks with Appl., vol. 12, no. 5, Nov. 2019, Art. no. 054056.
stochastic spintronic devices,’’ Sci. Rep., vol. 10, no. 1, p. 16002, [72] O. Hassan, R. Faria, K. Y. Camsari, J. Z. Sun, and S. Datta, ‘‘Low-barrier
Sep. 2020. magnet design for efficient hardware binary stochastic neurons,’’ IEEE
[48] S. Nasrin, J. Drobitch, P. Shukla, T. Tulabandhula, S. Bandyopadhyay, Magn. Lett., vol. 10, pp. 1–5, 2019.
and A. R. Trivedi, ‘‘Bayesian reasoning machine on a magneto- [73] K. Hayakawa et al., ‘‘Nanosecond random telegraph noise in in-plane
tunneling junction network,’’ Nanotechnology, vol. 31, no. 48, Nov. 2020, magnetic tunnel junctions,’’ Phys. Rev. Lett., vol. 126, no. 11, Mar. 2021,
Art. no. 484001. Art. no. 117202.
[49] K.-E. Harabi et al., ‘‘A memristor-based Bayesian machine,’’ Nature [74] C. Safranski, J. Kaiser, P. Trouilloud, P. Hashemi, G. Hu, and
Electron., vol. 6, pp. 1–12, Dec. 2022. J. Z. Sun, ‘‘Demonstration of nanosecond operation in stochastic mag-
[50] M. Golam Morshed, S. Ganguly, and A. W. Ghosh, ‘‘A deep dive into netic tunnel junctions,’’ Nano Lett., vol. 21, no. 5, pp. 2040–2045,
the computational fidelity of high variability low energy barrier magnet Feb. 2021.
technology for accelerating optimization and Bayesian problems,’’ 2023, [75] S. Kanai, K. Hayakawa, H. Ohno, and S. Fukami, ‘‘Theory of relaxation
arXiv:2302.08074. time of stochastic nanomagnets,’’ Phys. Rev. B, Condens. Matter, vol. 103,
[51] C. Bybee, D. Kleyko, D. E. Nikonov, A. Khosrowshahi, B. A. Olshausen, no. 9, Mar. 2021, Art. no. 094423, doi: 10.1103/PhysRevB.103.094423.
and F. T. Sommer, ‘‘Efficient optimization with higher-order Ising [76] J. Kaiser, W. A. Borders, K. Y. Camsari, S. Fukami, H. Ohno, and S. Datta,
machines,’’ 2022, arXiv:2212.03426. ‘‘Hardware-aware in situ learning based on stochastic magnetic tunnel
[52] M. K. Bashar and N. Shukla, ‘‘Constructing dynamical systems to model junctions,’’ Phys. Rev. Appl., vol. 17, no. 1, Jan. 2022, Art. no. 014016.
higher order Ising spin interactions and their application in solving com- [77] A. Grimaldi et al., ‘‘Experimental evaluation of simulated quantum
binatorial optimization problems,’’ 2022, arXiv:2211.05365. annealing with MTJ-augmented p-bits,’’ in IEDM Tech. Dig., Dec. 2022,
[53] N. Onizawa and T. Hanyu, ‘‘High convergence rates of CMOS invertible pp. 22–24.
logic circuits based on many-body Hamiltonians,’’ in Proc. IEEE Int. [78] J. Yin et al., ‘‘Scalable Ising computer based on ultra-fast field-free spin
Symp. Circuits Syst. (ISCAS), May 2021, pp. 1–5. orbit torque stochastic device with extreme 1-bit quantization,’’ in IEDM
[54] B. Sutton, R. Faria, L. A. Ghantasala, R. Jaiswal, K. Y. Camsari, and Tech. Dig., Dec. 2022, pp. 31–36.
S. Datta, ‘‘Autonomous probabilistic coprocessing with petaflips per [79] G. M. Gutiérrez-Finol, S. Giménez-Santamarina, Z. Hu, L. E. Rosaleny,
second,’’ IEEE Access, vol. 8, pp. 157238–157252, 2020. S. Cardona-Serra, and A. Gaita-Ariño, ‘‘Lanthanide molecular nanomag-
[55] J. Kaiser and S. Datta, ‘‘Probabilistic computing with p-bits,’’ Appl. Phys. nets as probabilistic bits,’’ 2023, arXiv:2301.08182.
Lett., vol. 119, no. 15, Oct. 2021, Art. no. 150503. [80] K. S. Woo, J. Kim, J. Han, W. Kim, Y. H. Jang, and C. S. Hwang,
[56] S. Aggarwal et al., ‘‘Demonstration of a reliable 1 Gb standalone spin- ‘‘Probabilistic computing using Cu0.1 Te0.9 /HfO2 /Pt diffusive memris-
transfer torque MRAM for industrial applications,’’ in IEDM Tech. Dig., tors,’’ Nature Commun., vol. 13, no. 1, p. 5762, Sep. 2022.
Dec. 2019, pp. 1–2. [81] Y. Liu et al., ‘‘Probabilistic circuit implementation based on p-bits using
[57] K. Lee et al., ‘‘1 Gbit high density embedded STT-MRAM in 28 nm the intrinsic random property of RRAM and p-bit multiplexing strategy,’’
FDSOI technology,’’ in IEDM Tech. Dig., Dec. 2019, p. 2. Micromachines, vol. 13, no. 6, p. 924, Jun. 2022.
[58] J. L. Drobitch and S. Bandyopadhyay, ‘‘Reliability and scalability of [82] T. J. Park et al., ‘‘Efficient probabilistic computing with stochastic
p-bits implemented with low energy barrier nanomagnets,’’ IEEE Magn. perovskite nickelates,’’ Nano Lett., vol. 22, no. 21, pp. 8654–8661,
Lett., vol. 10, pp. 1–4, 2019. Nov. 2022.
[59] R. Rahman and S. Bandyopadhyay, ‘‘The strong sensitivity of the charac- [83] S. Cheemalavagu, P. Korkmaz, K. V. Palem, B. E. Akgul, and
teristics of binary stochastic neurons employing low barrier nanomagnets L. N. Chakrapani, ‘‘A probabilistic CMOS switch and its realization by
to small geometrical variations,’’ 2021, arXiv:2108.04319. exploiting noise,’’ in Proc. IFIP Int. Conf. VLSI, 2005, pp. 535–541.
[60] O. Hassan, S. Datta, and K. Y. Camsari, ‘‘Quantitative evaluation of [84] W. Whitehead, Z. Nelson, K. Y. Camsari, and L. Theogarajan, ‘‘CMOS-
hardware binary stochastic neurons,’’ Phys. Rev. Appl., vol. 15, no. 6, compatible Ising and Potts annealing using single photon avalanche
Jun. 2021, Art. no. 064046. diodes,’’ 2022, arXiv:2211.12607.
[61] K. Y. Camsari, M. M. Torunbalci, W. A. Borders, H. Ohno, and S. Fukami, [85] L. Xia et al., ‘‘Technological exploration of RRAM crossbar array for
‘‘Double-free-layer magnetic tunnel junctions for probabilistic bits,’’ matrix-vector multiplication,’’ J. Comput. Sci. Technol., vol. 31, no. 1,
Phys. Rev. Appl., vol. 15, no. 4, Apr. 2021, Art. no. 044049. pp. 3–19, Jan. 2016.
[62] R. Rahman and S. Bandyopadhyay, ‘‘Robustness of binary stochastic neu- [86] Y. Li et al., ‘‘Capacitor-based cross-point array for analog neural network
rons implemented with low barrier nanomagnets made of dilute magnetic with record symmetry and linearity,’’ in Proc. IEEE Symp. VLSI Technol.,
semiconductors,’’ IEEE Magn. Lett., vol. 13, pp. 1–4, 2022. Jun. 2018, pp. 25–26.
[63] Y. Lv, R. P. Bloom, and J.-P. Wang, ‘‘Experimental demonstration of [87] O. Hassan, K. Y. Camsari, and S. Datta, ‘‘Voltage-driven building
probabilistic spin logic by magnetic tunnel junctions,’’ IEEE Magn. Lett., block for hardware belief networks,’’ IEEE Design Test, vol. 36, no. 3,
vol. 10, pp. 1–5, 2019. pp. 15–21, Jun. 2019.
[64] K. Y. Camsari et al., ‘‘From charge to spin and spin to charge: Stochas- [88] M. Kang, S. K. Gonugondla, A. Patil, and N. R. Shanbhag, ‘‘A
tic magnets for probabilistic switching,’’ Proc. IEEE, vol. 108, no. 8, multi-functional in-memory inference processor using a standard 6T
pp. 1322–1337, Aug. 2020. SRAM array,’’ IEEE J. Solid-State Circuits, vol. 53, no. 2, pp. 642–655,
[65] X. Chen, J. Zhang, and J. Xiao, ‘‘Magnetic-tunnel-junction-based true Feb. 2018.
random-number generator with enhanced generation rate,’’ Phys. Rev. [89] N. Verma et al., ‘‘In-memory computing: Advances and prospects,’’ IEEE
Appl., vol. 18, no. 2, Aug. 2022, L021002. Solid State Circuits Mag., vol. 11, no. 3, pp. 43–55, Aug. 2019.
[90] S. Geman and D. Geman, ‘‘Stochastic relaxation, Gibbs distributions, and [112] J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli, ‘‘Deep
the Bayesian restoration of images,’’ IEEE Trans. Pattern Anal. Mach. unsupervised learning using nonequilibrium thermodynamics,’’ in Proc.
Intell., vol. PAMI-6, no. 6, pp. 721–741, Nov. 1984. Int. Conf. Mach. Learn., 2015, pp. 2256–2265.
[91] D. Koller and N. Friedman, Probabilistic Graphical Models: Principles [113] H. Ma, M. Govoni, and G. Galli, ‘‘Quantum simulations of materials on
and Techniques. Cambridge, MA, USA: MIT Press, 2009. near-term quantum computers,’’ npj Comput. Mater., vol. 6, no. 1, p. 85,
[92] G. E. Hinton, A Practical Guide to Training Restricted Boltzmann Jul. 2020.
Machines. Berlin, Germany: Springer, 2012, pp. 599–619. [114] R. Babbush, J. R. McClean, M. Newman, C. Gidney, S. Boixo, and
[93] M. R. Garey and D. S. Johnson, Computers and Intractability, vol. 174, H. Neven, ‘‘Focus beyond quadratic speedups for error-corrected quan-
San Francisco, CA, USA: Freeman, 1979. tum advantage,’’ PRX Quantum, vol. 2, no. 1, Mar. 2021, Art. no. 010103.
[94] D. Brélaz, ‘‘New methods to color the vertices of a graph,’’ Commun. [115] K. Y. Camsari, S. Chowdhury, and S. Datta, ‘‘Scalable emulation of sign-
ACM, vol. 22, no. 4, pp. 251–256, 1979. problem–free Hamiltonians with room-temperature p-bits,’’ Phys. Rev.
[95] Y. Fang et al., ‘‘Parallel tempering simulation of the three-dimensional Appl., vol. 12, no. 3, Sep. 2019, Art. no. 034061.
Edwards–Anderson model with compact asynchronous multispin coding [116] M. Suzuki, ‘‘Relationship between d-dimensional quantal spin systems
on GPU,’’ Comput. Phys. Commun., vol. 185, no. 10, pp. 2467–2478, and (d+1)-dimensional Ising systems: Equivalence, critical exponents
Oct. 2014. and systematic approximants of the partition function and spin correla-
[96] K. Yang, Y.-F. Chen, G. Roumpos, C. Colby, and J. Anderson, ‘‘High tions,’’ Prog. Theor. Phys., vol. 56, no. 5, pp. 1454–1469, Nov. 1976.
performance Monte Carlo simulation of Ising model on TPU clusters,’’ in [117] A. D. King et al., ‘‘Scaling advantage over path-integral Monte Carlo in
Proc. Int. Conf. High Perform. Comput., Netw., Storage Anal., Nov. 2019, quantum simulation of geometrically frustrated magnets,’’ Nature Com-
pp. 1–15. mun., vol. 12, no. 1, pp. 1–6, 2021.
[97] P. Debashis, R. Faria, K. Y. Camsari, S. Datta, and Z. Chen, ‘‘Cor- [118] S. Chowdhury, K. Y. Camsari, and S. Datta, ‘‘Accelerated quantum Monte
related fluctuations in spin orbit torque coupled perpendicular nano- Carlo with probabilistic computers,’’ 2022, arXiv:2210.17526.
magnets,’’ Phys. Rev. B, Condens. Matter, vol. 101, no. 9, Mar. 2020, [119] S. Chowdhury, S. Datta, and K. Y. Camsari, ‘‘A probabilistic approach
Art. no. 094405. to quantum inspired algorithms,’’ in IEDM Tech. Dig., Dec. 2019,
[98] N. A. Aadit, A. Grimaldi, G. Finocchio, and K. Y. Camsari, ‘‘Physics- pp. 5–37.
inspired Ising computing with ring oscillator activated p-bits,’’ in Proc. [120] S. Chowdhury, K. Y. Camsari, and S. Datta, ‘‘Emulating quantum inter-
IEEE 22nd Int. Conf. Nanotechnol. (NANO), Jul. 2022, pp. 393–396. ference with generalized Ising machines,’’ 2020, arXiv:2007.07379.
[99] S. Bhatti, R. Sbiaa, A. Hirohata, H. Ohno, S. Fukami, and [121] M. Troyer and U.-J. Wiese, ‘‘Computational complexity and fundamental
S. Piramanayagam, ‘‘Spintronics based random access memory: A limitations to fermionic quantum Monte Carlo simulations,’’ Phys. Rev.
review,’’ Mater. Today, vol. 20, no. 9, pp. 530–548, 2017. Lett., vol. 94, no. 17, May 2005, Art. no. 170201.
[100] H. Hoos and T. Stützle, SATLIB: An Online Resource for Research on [122] D. Aharonov, X. Gao, Z. Landau, Y. Liu, and U. Vazirani, ‘‘A polynomial-
SAT. Amsterdam, The Netherlands: IOS Press, Apr. 2000, pp. 283–292. time classical algorithm for noisy random circuit sampling,’’ 2022,
[101] A. Z. Pervaiz, L. A. Ghantasala, K. Y. Camsari, and S. Datta, ‘‘Hardware arXiv:2211.03999.
emulation of stochastic p-bits for invertible logic,’’ Sci. Rep., vol. 7, no. 1, [123] D. Hangleiter, I. Roth, D. Nagaj, and J. Eisert, ‘‘Easing the
p. 10994, Sep. 2017. Monte Carlo sign problem,’’ Sci. Adv., vol. 6, no. 33, Aug. 2020,
[102] V. Choi, ‘‘Minor-embedding in adiabatic quantum computation: II. Art. no. eabb8341.
Minor-universal graph design,’’ Quantum Inf. Process., vol. 10, no. 3, [124] G. Carleo and M. Troyer, ‘‘Solving the quantum many-body prob-
pp. 343–353, 2011. lem with artificial neural networks,’’ Science, vol. 355, pp. 602–606,
[103] N. Sagan and J. Roychowdhury, ‘‘DaS: Implementing dense Ising Feb. 2017.
machines using sparse resistive networks,’’ in Proc. 41st IEEE/ACM Int. [125] Z. Cai and J. Liu, ‘‘Approximating quantum many-body wave functions
Conf. Comput.-Aided Design, Oct. 2022, pp. 1–9. using artificial neural networks,’’ Phys. Rev. B, Condens. Matter, vol. 97,
[104] A. Lucas, ‘‘Ising formulations of many NP problems,’’ Frontiers Phys., no. 3, Jan. 2018, Art. no. 035116, doi: 10.1103/PhysRevB.97.035116.
vol. 2, p. 5, Feb. 2014. [126] H. Saito and M. Kato, ‘‘Machine learning technique to find quantum
[105] E. Andriyash et al., ‘‘Boosting integer factoring performance via quantum many-body ground states of bosons on a lattice,’’ J. Phys. Soc. Jpn.,
annealing offsets,’’ D-Wave Syst., Burnaby, BC, Canada, D-Wave Tech. vol. 87, no. 1, Jan. 2018, Art. no. 014001, doi: 10.7566/JPSJ.87.014001.
Rep. 14-1002A-B, 2016. [127] S. Das Sarma, D.-L. Deng, and L.-M. Duan, ‘‘Machine learning meets
[106] J. D. Biamonte, ‘‘Nonperturbative k-body to two-body commut- quantum physics,’’ 2019, arXiv:1903.03516.
ing conversion Hamiltonians and embedding problem instances into [128] N. A. Aadit, A. Grimaldi, M. Carpentieri, L. Theogarajan, G. Finocchio,
Ising spins,’’ Phys. Rev. A, Gen. Phys., vol. 77, no. 5, May 2008, and K. Y. Camsari, ‘‘Computing with invertible logic: Combinatorial
Art. no. 052331. optimization with probabilistic bits,’’ in IEDM Tech. Dig., Dec. 2021,
[107] F. L. Traversa and M. D. Ventra, ‘‘Polynomial-time solution of prime pp. 3–40.
factorization and NP-complete problems with digital memcomputing [129] A. Grimaldi et al., ‘‘Spintronics-compatible approach to solving
machines,’’ Chaos, Interdiscipl. J. Nonlinear Sci., vol. 27, no. 2, 2017, maximum-satisfiability problems with probabilistic computing, invertible
Art. no. 023107. logic, and parallel tempering,’’ Phys. Rev. Appl., vol. 17, no. 2, Feb. 2022,
[108] M. Di Ventra, MemComputing: Fundamentals and Applications. Oxford, Art. no. 024052.
U.K.: Oxford Univ. Press, 2022. [130] G. Desjardins, A. Courville, and Y. Bengio, ‘‘Adaptive parallel tem-
[109] R. Salakhutdinov and G. Hinton, ‘‘Deep Boltzmann machines,’’ in Proc. pering for stochastic maximum likelihood learning of RBMs,’’ 2010,
12th Int. Conf. Artif. Intell. Statist., in Proceedings of Machine Learning arXiv:1012.3476.
Research, vol. 5, D. Van Dyk and M. Welling, Eds. Clearwater Beach, FL, [131] M. Mohseni et al., ‘‘Nonequilibrium Monte Carlo for unfreezing variables
USA: Hilton Clearwater Beach Resort, Apr. 2009, pp. 448–455. [Online]. in hard combinatorial optimization,’’ 2021, arXiv:2111.13628.
Available: https://proceedings.mlr.press/v5/salakhutdinov09a.html [132] A. Buluc et al., ‘‘Randomized algorithms for scientific computing
[110] S. H. Adachi and M. P. Henderson, ‘‘Application of quantum annealing (RASC),’’ 2021, arXiv:2104.11079.
to training of deep neural networks,’’ 2015, arXiv:1510.06356. [133] G. Finocchio et al., ‘‘Roadmap for unconventional computing with nan-
[111] H. Manukian, F. L. Traversa, and M. Di Ventra, ‘‘Accelerating deep learn- otechnology,’’ 2023, arXiv:2301.06727.
ing with memcomputing,’’ Neural Netw., vol. 110, pp. 1–7, Feb. 2019.