Multiverse: A Deep Learning 4X4 Sudoku Solver
Schendowich, C., Ben Isaac, E. and Azoulay, R.
Abstract: This paper presents a novel deep learning-based approach to solving 4x4 Sudoku puzzles by viewing Sudoku as a complex multi-level sequence completion problem. It introduces a neural network model, termed "Multiverse", which comprises multiple parallel computational units, or "verses". Each unit is designed for sequence completion based on Long Short-Term Memory (LSTM) modules. The paper's novel perspective views Sudoku as a sequence completion task rather than a pure constraint satisfaction problem. The study generated its own dataset of 4x4 Sudoku puzzles and proposed variants of the Multiverse model for comparison and validation purposes. Comparative analysis shows that the proposed model is competitive with, and potentially superior to, state-of-the-art models. Notably, the proposed model was able to solve the puzzles in a single prediction, which offers promising avenues for further research on larger, more complex Sudoku puzzles.
1 INTRODUCTION
In this paper we propose a deep learning model for the solution of 4x4 Sudoku puzzles and show that the model can solve over 99 percent of the puzzles provided to it in just a single prediction, while state-of-the-art systems need more prediction iterations to attain similar results.

Figure 1: Depiction of the Sudoku variables. (a) Boxes. (b) Row Groups. (c) Column Groups.
The Sudoku puzzle was first introduced in the 1970s in a Dell magazine. It became fairly known throughout Japan. In the early 2000s, the puzzle started becoming extremely popular in Europe and then in the United States, igniting an interest not only in the form of competitions but also in the form of scientific research (see (Hayes, 2006) for a detailed background). The popularity of the puzzle caused a lot of research to be done regarding its logic-based properties and the diverse methods for solving it.

The Sudoku puzzle is composed of a square of cells with $o^2$ rows and $o^2$ columns for some natural constant $o$ called the order of the puzzle. The square is subdivided into its $o^2$ primary $o \times o$ squares called boxes or sub-squares. The boxes divide the rows and columns into $o$ sets of rows and $o$ sets of columns called row groups and column groups respectively. The puzzle begins with some placement of values $1 \dots o^2$ in some of the cells, called givens or hints. The object of the puzzle is to fill the rest of the cells with values so that every row, column, and sub-square will have all the values from $1 \dots o^2$. A block is either a row, a column or a sub-square, interchangeably.

Sudoku is considered a logic based puzzle. In fact, there are many types of logic that must be combined to solve the hardest of puzzles, ranging from trial and error to deduction and from inference to elimination.

A Sudoku can, technically, have more than one correct solution. A Well Posed Sudoku is a puzzle that has only one correct solution. A puzzle can be easier to solve if there are redundant hints, namely givens that can be deduced from one or more other givens. A puzzle with no redundant hints is called Locally Minimal (Simonis, 2005). Figures 2 and 3 demonstrate examples of the types of puzzles and their solutions. The puzzle in Fig. 2a is locally minimal because removing any of the numbers will cause it to have more than one solution, thus no longer being well posed. For example, removing the 2 will cause Fig. 3b to be a valid solution as well; removing the 1 will make Fig. 3d be a valid solution, and so on.
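To make these structural terms concrete, the following small Python sketch (ours, not from the paper) maps a cell $(i, j)$ of an order $o$ puzzle to the blocks and groups that constrain it:

```python
# Illustrative only: cell (i, j) of an order-o puzzle belongs to row i,
# column j, and box (i // o) * o + (j // o); row group i // o and
# column group j // o index the bands of boxes crossing that cell.
def blocks_of_cell(i: int, j: int, o: int = 2) -> dict:
    return {
        "row": i,
        "column": j,
        "box": (i // o) * o + (j // o),
        "row_group": i // o,
        "column_group": j // o,
    }

# Example for an order 2 (4x4) puzzle:
print(blocks_of_cell(2, 3))
# {'row': 2, 'column': 3, 'box': 3, 'row_group': 1, 'column_group': 1}
```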
Figure 2: Examples for well posed and locally minimal puzzles. (a) Well posed, locally minimal. (b) Only well posed. (c) Not well posed (4 solutions).
Sudoku can further be classified as a discrete constraint satisfaction problem (CSP). In particular, Sudoku is a special case of CSPs called an exact cover problem, where each constraint must be satisfied by only one variable in the solution assignment. Any algorithm that can solve discrete CSPs or exact cover problems will be able to provide a solution for a Sudoku, albeit usually in exponential run-time.

Sudoku can also be approached as a Machine Learning problem, thus described as a multi-class multi-label classification problem, meaning that a given puzzle has multiple values that need to be determined (namely the values of the unassigned cells) and each of those is one discrete value of a shared domain of integers. Therefore, there are various machine learning and deep learning methods that might be applicable to it. Unfortunately, a very high level of generalization is required for learning the implicit connections between the cells, so simple methods are not good enough here (Palm et al., 2018). Some progress has been made in this avenue of research, particularly combining deep learning with other methods (Wang et al., 2019) or using networks with recurrent prediction steps (Palm et al., 2018; Yang et al., 2023).

In this study, we introduce a novel approach to tackle Sudoku solving by leveraging a neural network architecture. Our methodology is rooted in the notion that Sudoku puzzles share similarities with intricate sequential data completion problems. To address this, we devised a specialized sequential completion unit based on Long Short-Term Memory (LSTM) and interconnected multiple such units in a parallel fashion. Remarkably, our resultant model exhibits comparable competence to state-of-the-art models, while obviating the need for multiple prediction iterations. Notably, we demonstrate a substantial disparity between our model and existing approaches in terms of performance when such iterations are excluded.

Drawing upon the notion that Sudoku represents a complex sequential data completion task, we adopted bespoke deep learning techniques to tackle its solution. Recognizing that the puzzle's completion process entails a multi-dimensional sequence, our proposed model incorporates multiple parallel computational units. For the sake of simplicity, our investigation primarily focuses on training the model on 4x4 puzzles that exhibit local minimality. Through one-shot prediction, we achieved an impressive completion rate exceeding 99 percent, effectively showcasing the neural network's capacity to grasp the abstract relationships among the puzzle's cells.

To the best of our knowledge, even though Large Language Models (LLMs) have been used for Sudoku solving, there has not been previous research based on the premise that Sudoku is a sequential completion problem. We show that this approach is effective and justifiable.

2 RELATED WORKS

Since solution by logic based algorithms has proven intractable for difficult puzzles of high order, automated Sudoku solving has become the object of a large amount of highly varied research.

Simonis (Simonis, 2005) did a thorough job defining the basic constraints in Sudoku puzzles. He also showed that these constraints can be described in various ways, creating additional redundant constraints with which the puzzles can be solved more efficiently. In his paper, he compares 15 different strategies based on constraint propagation and showed that the well posed puzzles in his dataset can all be solved by an all-different hyperarc solver, if the execution starts with one shaving move.

Hunt et al. (Hunt et al., 2007) took this one step further by showing that the constraints can be realized in a binary matrix solvable by the DLX algorithm provided by Knuth (Knuth, 2000) and that many of the common logical solving techniques can be summarized in a single theorem based on that matrix.
Figure 3: Solutions for puzzles in Fig. 2. (a) The solution for Fig. 2. (b) 2nd solution for Fig. 2c. (c) 3rd solution for Fig. 2c. (d) 4th solution for Fig. 2c.
The advantage of using DLX is that the algorithm is faster than other backtracking algorithms and also provides the number of possible solutions, thus giving an indication of whether a puzzle is well posed or not. The disadvantage of using DLX is that, since it is a backtracking algorithm, its run-time is exponential by nature.

Another related approach can be found in the works of Weber (Weber, 2005), Ist et al. (Ist et al., 2006) and Posthoff and Steinbach (Posthoff and Steinbach, 2010), who all model Sudoku as a SAT problem and use various SAT solvers to solve it. This method utilizes powerful existing systems but requires explicit definition of the rules and layout of each puzzle, a number of clauses which could expand exponentially with more complex puzzles. In our study we favoured machine learning because it obviates the need for an explicit description of the logic problem.

As mentioned in the introduction, deep learning is also a good candidate for Sudoku solving. The advantage of machine learning systems in general and deep learning systems in particular is that they can generalize a direct solution for a problem without having to use algorithms of exponential or worse runtimes to solve particular instances of the problem. Once trained, such a model could be significantly better even than the DLX algorithm.

Park (Park, 2018) provides a model based on a standard convolutional neural network that can solve a Sudoku in a sequence of interdependent steps. It solved 70 percent of the puzzles it was tested on, using a loop that predicted the value of the one highest probability cell in each iteration.

Palm et al. (Palm et al., 2018) created a graph neural network that solves problems that require multiple iterations of interdependent steps and showed that it solved more than 96 percent of the Sudokus presented to it, including very hard ones, although the solution in this method had to be done in a sequence of interdependent steps. While the results achieved are noteworthy, it is important to acknowledge that they necessitated both an understanding of the problem structure coded manually and a series of predictive steps towards a solution. In contrast, our study eliminates the need for prior knowledge of the problem's architecture and successfully resolves the puzzles in a single prediction step.

Wang et al. (Wang et al., 2019) combined a SAT solver with neural networks to add logical learning methods that can overcome the difficulty traditional neural networks have with global constraints in discrete logical relationships. Using that combination they succeeded in attaining a 98.3 percent completion rate without any hand coded knowledge of the problem structure. This approach differs from ours in that it requires the use of a SAT solver to complement the prediction provided by the network. Moreover, Chang et al. (Chang et al., 2020) showed that the good results presented by Wang et al. (Wang et al., 2019) are limited to easy puzzles and that the results are significantly worse than those of Palm et al. (Palm et al., 2018) when trying to solve hard puzzles.

Mehta (Mehta, 2021) created a reward system for a Q-agent and achieved a 7 percent full puzzle completion rate in easy Sudokus and 2.1 and 1.2 percent win rates in medium and hard Sudokus respectively, all with no rules provided. She did this unaware of Poloziuk and Smrkovska (Poloziuk and Smrkovska, 2020), who tested more complex Q-based agents and Monte-Carlo Tree Search (MCTS) algorithms and came to the conclusion that they require too much computation power to be used reasonably to solve Sudoku. With MCTS they performed a small number of experiments achieving 35-46 percent accuracy and called it a success, even though the results are fairly low, considering that in each experiment the training took them days to perform.

Du et al. (Du et al., 2021) used a Multi Layered Perceptron (MLP) to solve order 2 puzzles. They created a small 4 layer dense neural network and performed their prediction stage by stage, each time filling in only the one highest probability cell. Their dataset included puzzles none of which were locally minimal - all missing between 4 and 10 values. Their model solved more than 99 percent of the puzzles tested.
However, the vast majority of the puzzles tested were not locally minimal, and the number of prediction iterations they required was equal to the number of missing values. In our study, we not only performed experiments on locally minimal puzzles with 10, 11, or 12 missing values - the hardest well posed order 2 puzzles - and attained a completion rate greater than 99 percent, but did so in a single prediction step, albeit with a more intricate model.

Yang et al. (Yang et al., 2023) trained a generative pre-trained transformer (GPT) based model with the dataset used by Palm et al. (Palm et al., 2018) and tested it with iterative predictions. They had superior results, solving more than 99 percent of the puzzles, although when restricted to the hardest puzzles they achieved a 96.7 percent completion rate. However, Yang et al. (Yang et al., 2023) required a sequence of prediction steps to reach the solution and did not provide a solution in one end-to-end prediction. They needed 32 prediction iterations to achieve their results on order 3 puzzles. The baseline for our study is the model used by Yang et al. (Yang et al., 2023) modified for order 2 puzzles. We show that our model has competitively good results in fewer prediction iterations than required by their model.

As can be seen above, the significant results so far have been achieved only in systems that integrate sequences of interdependent prediction stages. In this paper we propose a model that achieves competent results in a single prediction stage.
3 DEEP LEARNING METHODS

Our paper introduces a deep learning approach specifically targeted at tackling order 2 Sudoku puzzles. These are 4x4 puzzles that, while smaller in scale compared to the standard order 3 Sudokus, present a unique appeal for scientific investigation. Given their relative simplicity, both in terms of representation and analysis, focusing our research on order 2 puzzles enables more rapid training and facilitates quicker attainment of results.

We consider Sudoku to be akin to a sophisticated, multi-layered sequence completion problem. With this perspective, we developed a deep learning neural network that leverages LSTM modules designed for sequence completion. This approach has yielded results that are on par with current leading models.

While our demonstrated results are limited to order 2 puzzles, we maintain the belief that these puzzles are sufficiently complex to serve as a sound foundation for creating a successful model for higher-order Sudoku problems. The importance of studying 4x4 puzzles lies in the opportunity they provide to build, test and refine models that could be efficiently scaled to more intricate Sudoku variants. This makes them an essential stepping stone in advancing deep learning methodologies for solving larger and more complex problems.

In this section we present the technical information of our methods, in particular the data composition and the structure of our models.

3.1 Datasets

Since most of the research into Sudoku has been on order 3 puzzles, we did not find an existing dataset of order 2 puzzles, so we created our own.

There exist exactly 288 unique order 2 solved Sudoku boards. Those boards represent 85632 puzzles which are both well posed and locally minimal, each containing only 4, 5, or 6 hints. It is possible to create a larger number of well posed puzzles by adding more hints, although those puzzles are not locally minimal. Figures 2 and 3 demonstrate examples of the various possible types of puzzles and their solutions. Table 1 shows the full number of well posed puzzles with 4 to 14 hints and how many of them are locally minimal.

Table 1: Well Posed Order 2 Puzzle Count.

H    WP        LM
4    25728     25728
5    284160    58368
6    1041408   1536
7    2141184   0
8    2961024   0
9    2958336   0
10   2204928   0
11   1239552   0
12   522624    0
13   161280    0
14   34560     0

H - The number of hints in the puzzle. WP - Well Posed, LM - Locally Minimal.

Our primary dataset consists of all 85632 well posed and locally minimal order 2 puzzles. Details on the generation process of the puzzles are provided in the appendix. The training of our models was performed on a subset of 77069 puzzles using 9-fold cross validation (we divided the puzzles into 10 subsets and left one out of the process).

The puzzles and their solutions are composed of strings of digits, where missing values are denoted as zeros. Since the values are discrete and categorical, we one-hot encoded them into five digit binary vectors in order to make processing easier.
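For concreteness, a minimal sketch of this encoding (the example puzzle string is hypothetical):

```python
# Each puzzle is a string of 16 digits, '0' marking a missing value;
# every digit is one-hot encoded into a five-digit binary vector.
import numpy as np

def encode_puzzle(puzzle: str, r: int = 5) -> np.ndarray:
    """E.g. '0040100002003000' -> array of shape (16, 5), one row per cell."""
    digits = np.array([int(c) for c in puzzle])
    return np.eye(r, dtype=np.float32)[digits]

x = encode_puzzle("0040100002003000")
print(x.shape)  # (16, 5)
```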
3.2 Machine Learning Models

Below we describe the models used in this study. The first subsection describes our main model, a neural network architecture we call the Multiverse, which is followed by the model variants used in our convergence and ablation studies and by the baseline model.
Multiverse Model

In the model description below, we use the following variables: $o$ is the order of the puzzle ($o = 2$ for all the tests in this study); $r$ is the size of the range of values possible in a given puzzle ($r = 5$ for all the tests in this study, to include the 4 possible values and zero); $s$ is the length of a side of a given puzzle ($s = o^2$ always); $a$ is the number of cells in a given puzzle ($a = s^2$ always).

Sudoku in many respects is a sequence completion problem. Each row must be completed with a permutation of the values in the range. Each column and box must be completed likewise. Each value must have a location in each row, forming a sequence of value-in-row location indices that requires completion. Similarly, such sequences are also formed by values in columns and boxes. There may exist more aspects of Sudoku puzzles that can also be viewed as completion problems, but that is a subject for a separate study.

Our initial sequence completion unit (below referred to as a "verse") is the following sequence of deep learning layers, with reshape layers between them where necessary:

1. Conv1D, where: input size = $(a, r)$, output size = $(a, r)$, filters = $r$, kernel size = 1, strides = 1
2. Dense, where: input size = $(a \times r)$, output size = $(a \times r)$
3. Bidirectional LSTM, where: input size = $(a, r)$, output size = $(a, 2 \times r)$, return sequences is set to true.
4. Bidirectional LSTM, where: input size = $(a, 2 \times r)$, output size = $(a, 2 \times r)$, return sequences is set to true.
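The paper specifies the verse only as the layer list above; the following Keras sketch is our reconstruction of it, with activations, initializers, and the exact placement of the reshape layers assumed:

```python
# A sketch of one "verse": Conv1D -> Dense -> 2x Bidirectional LSTM.
# o, r, s, a are as defined in the text (o = 2, r = 5, s = 4, a = 16).
import tensorflow as tf
from tensorflow.keras import layers

o = 2            # puzzle order
r = o**2 + 1     # value range including zero (= 5)
s = o**2         # side length (= 4)
a = s**2         # number of cells (= 16)

def build_verse(x):
    # 1. Conv1D: (a, r) -> (a, r), r filters, kernel size 1, stride 1
    y = layers.Conv1D(filters=r, kernel_size=1, strides=1)(x)
    # 2. Dense over the flattened puzzle: (a*r,) -> (a*r,)
    y = layers.Reshape((a * r,))(y)
    y = layers.Dense(a * r)(y)
    y = layers.Reshape((a, r))(y)
    # 3.-4. Two bidirectional LSTMs returning full sequences: (a, 2r)
    y = layers.Bidirectional(layers.LSTM(r, return_sequences=True))(y)
    y = layers.Bidirectional(layers.LSTM(r, return_sequences=True))(y)
    return y
```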
Our complete model is composed of a number of parallel verses with their output concatenated into a dense softmax termination layer (hence the name "Multiverse"). A Multiverse model for order 2 puzzles with 6 parallel verses (M6) is depicted in Figure 4. The motivation for this architecture is that the combination of the convolutional and the dense layers provides a basic embedding feature that allows for different interpretations by the parallel verses. The Bidirectional LSTM is a good sequence modeler when the direction is unimportant. Unlike NLP problems, Sudoku data is discrete and all-different, removing the semantically related aspects from the problem, so other techniques that have greater effect when used on text related problems are not required here.

Figure 4: An order 2 Multiverse model with 6 parallel verse modules.
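Continuing the sketch, a minimal assembly of the full Multiverse under the same assumptions, in particular that the softmax termination layer is applied per cell over the $r$ classes:

```python
# Several parallel verses, concatenated into a dense softmax layer.
def build_multiverse(n_verses: int = 6) -> tf.keras.Model:
    inputs = tf.keras.Input(shape=(a, r))          # one-hot encoded puzzle
    verses = [build_verse(inputs) for _ in range(n_verses)]
    merged = layers.Concatenate(axis=-1)(verses)   # (a, 2r * n_verses)
    outputs = layers.Dense(r, activation="softmax")(merged)  # (a, r)
    return tf.keras.Model(inputs, outputs)

model = build_multiverse(6)   # the M6 configuration
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```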
Convergence Study Models

Some of the configurations of the baseline model surpassed a 99 percent completion rate, albeit with more than one prediction, so we performed a study testing the number of verses required to attain such prestigious results in one-shot predictions. This study was performed on Multiverse models with 6 (see Fig. 4), 10 and 12 parallel verses (called M6, M10, and M12 respectively). M6 was our first robust model with significantly good results. We based most of our study around that model, but also tested the more powerful M10 and M12 models to show that the results achieved by the baseline are attainable by our model as well. The results themselves will be detailed in Section 4.

Ablation Study Models

We performed our ablation study on the following incomplete Multiverse models with 6 verses:

• M6 - 6 complete verses.
• No Conv - 6 verses that have no Conv1D layers.
• No Dense - 6 verses that have no Dense layers.
• No LSTM - 6 verses that have no LSTM layers.
• One LSTM - 6 verses that have only the first LSTM layer.
• M5 - Only 5 verses, all complete.

Results will be detailed in Section 4.

Baseline Model
As a baseline we modified the transformer based model provided by Yang et al. (Yang et al., 2023) to fit our order 2 data. The model is based on a MinGPT module (Karpathy, 2020) set to the input of a Sudoku puzzle. MinGPT performs the computations on sparse categorical data. Therefore, the input is not one-hot encoded but rather left in numerical format, with one change: all the values in the solution that correspond to hints were changed to -100. We maintained this data format in our modified model.
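As an illustration, the hint-masking step can be written as follows; -100 is the conventional ignore label for PyTorch cross-entropy losses (MinGPT is a PyTorch codebase), though the exact tensor layout here is our assumption:

```python
# Replace solution values at hint positions with -100 so the loss
# only scores the cells the model actually has to fill in.
import torch

def mask_hints(puzzle: torch.Tensor, solution: torch.Tensor) -> torch.Tensor:
    """puzzle, solution: 1-D integer tensors of length 16 (0 = empty)."""
    target = solution.clone()
    target[puzzle != 0] = -100   # do not score cells that were given
    return target
```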
For a convincing comparison with our model we ran tests on the baseline model with a variety of settings for the following parameters:

• Recurrences - The number of prediction iterations. We refer to this parameter by the name Iterations so as not to confuse it with RNN.
• Heads - The number of transformer heads used.
• Embedding - The size of the puzzle embedding.

The outcomes will be elaborated upon in Section 4.

Table 2: Convergence Study Results.

Model   No. Epochs   Completion Rate
M6      100          92 percent
M10     28           97 percent
M12     22           >99 percent

Figure 5: Completion percentages of M6, M10 and M12 models by number of training epochs.
REFERENCES

neering and Computer Science (EIECS), pages 306–310. IEEE.

Hayes, B. (2006). Unwed numbers. American Scientist, 94(1):12.

Hunt, M., Pong, C., and Tucker, G. (2007). Difficulty-driven sudoku puzzle generation. The UMAP Journal, 29(3):343–362.

Ist, I. L., Lynce, I., and Ouaknine, J. (2006). Sudoku as a SAT problem. In Proceedings of the International Symposium on Artificial Intelligence and Mathematics (AIMATH), pages 1–9.

Karpathy, A. (2020). minGPT.

Knuth, D. E. (2000). Dancing links. arXiv preprint cs/0011047.

Mehta, A. (2021). Reinforcement learning for constraint satisfaction game agents (15-puzzle, minesweeper, 2048, and sudoku). arXiv preprint arXiv:2102.06019.

Palm, R., Paquet, U., and Winther, O. (2018). Recurrent relational networks. Advances in Neural Information Processing Systems, 31.

Park, K. (2018). Can convolutional neural networks crack sudoku puzzles.

Poloziuk, K. and Smrkovska, V. (2020). Neural networks and Monte-Carlo method usage in multi-agent systems for sudoku problem solving. Technology Audit and Production Reserves, 6(2):56.

Posthoff, C. and Steinbach, B. (2010). The solution of discrete constraint problems using boolean models - the use of ternary vectors for parallel SAT-solving. In International Conference on Agents and Artificial Intelligence, volume 2, pages 487–493. SCITEPRESS.

Simonis, H. (2005). Sudoku as a constraint problem. In CP Workshop on Modeling and Reformulating Constraint Satisfaction Problems, volume 12, pages 13–27. Citeseer.

Wang, P.-W., Donti, P., Wilder, B., and Kolter, Z. (2019). SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver. In International Conference on Machine Learning, pages 6545–6554. PMLR.

Weber, T. (2005). A SAT-based sudoku solver. In LPAR, pages 11–15.

Yang, Z., Ishay, A., and Lee, J. (2023). Learning to solve constraint satisfaction problems with recurrent transformer. In The Eleventh International Conference on Learning Representations.

APPENDIX

Order 2 Puzzle Dataset Generation

The generation process of locally minimal well posed order 2 puzzles and their solutions was a 3 stage process. The first stage entailed creating a file containing all the 288 unique solutions for order 2 puzzles (solutions). We also needed to create a file with all the 65534 possible 16 bit binary strings, except for an all-0 string, which is an empty puzzle and obviously not well posed, and an all-1 string, which is a completed solution (bitmasks). The bitmasks file was created in descending order of the amount of zeroes in the strings. In the final stage, for each solution in solutions we iterated over bitmasks and filtered the well posed and locally minimal puzzles, resulting in a file with all the 85632 well posed and locally minimal puzzles and their respective solutions (puzzles).

Stage 1: Order 2 Solutions

Since there are only 288 unique solutions for the order 2 Sudoku problem, this stage is very straightforward. We simply took an empty puzzle, applied to it the DLX algorithm, and saved to a file all the resulting solutions.
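The paper uses DLX here; as a simple stand-in at order 2 scale, the following sketch counts solutions by plain backtracking. A puzzle is well posed iff the count is exactly 1, and running it on an empty grid with a large limit counts all 288 complete boards:

```python
# Count the solutions of an order-o Sudoku given as a flat list of ints
# (0 = empty). `limit` allows early exit once "more than one" is known.
def count_solutions(grid: list[int], o: int = 2, limit: int = 2) -> int:
    s = o * o
    def ok(pos: int, v: int) -> bool:
        i, j = divmod(pos, s)
        for k in range(s):                      # row and column checks
            if grid[i * s + k] == v or grid[k * s + j] == v:
                return False
        bi, bj = (i // o) * o, (j // o) * o     # box check
        return all(grid[(bi + di) * s + bj + dj] != v
                   for di in range(o) for dj in range(o))
    try:
        pos = grid.index(0)
    except ValueError:
        return 1                                # grid full: one solution
    count = 0
    for v in range(1, s + 1):
        if ok(pos, v):
            grid[pos] = v
            count += count_solutions(grid, o, limit)
            grid[pos] = 0
            if count >= limit:                  # only need "1 vs. more"
                break
    return count
```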
Stage 2: Binary Strings

Creation of the bitmasks file was done with array manipulation. We created empty arrays and added to them indices of locations where a bit in a mask should be '1', by gradually appending to the arrays more arrays that are based on them and have more indices added to them. After that we scanned the arrays and transformed them into bit strings.

The algorithm runs in $\Theta(2^{16})$ for order 2 puzzles, and if modified for order $o$ puzzles its runtime is $\Theta(2^{\mathit{area}})$, where $\mathit{area} = \mathit{side}^2$ and $\mathit{side} = o^2$. It uses a similar amount of memory.
Stage 3: Order 2 Puzzles

This stage approaches the binary strings in bitmasks as binary masks. We consider a mask $a$ to cover another mask $b$ if for every $i$ where $b[i] =$ '1', also $a[i] =$ '1'.

For each solution in solutions we iterate over the binary masks in bitmasks and for each mask $m$ we check the following conditions:

• Has a puzzle already been added whose mask is covered by $m$ (and therefore the puzzle is not locally minimal)?
• Is the corresponding puzzle well posed?

If the first condition is false and the second is true, we add the corresponding puzzle and its solution to puzzles. In order to check if a puzzle is well posed we apply to it the DLX algorithm and test if the number of solutions is equal to 1.
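A sketch of this filtering loop, reusing count_solutions from the Stage 1 sketch in place of DLX; apply_mask (keep a solution digit as a hint where the mask bit is '1') is an assumed helper:

```python
def covers(a: str, b: str) -> bool:
    """Mask a covers mask b: every '1' of b is also '1' in a."""
    return all(x == "1" for x, y in zip(a, b) if y == "1")

def apply_mask(solution: str, mask: str) -> str:
    """Keep a solution digit as a hint where the mask bit is '1'."""
    return "".join(d if b == "1" else "0" for d, b in zip(solution, mask))

def filter_puzzles(solution: str, bitmasks: list[str]) -> list[str]:
    kept: list[tuple[str, str]] = []   # (mask, puzzle) pairs already added
    for m in bitmasks:                 # descending number of zeroes
        # condition 1: an already-added puzzle's mask is covered by m,
        # so the puzzle for m cannot be locally minimal
        if any(covers(m, prev) for prev, _ in kept):
            continue
        puzzle = apply_mask(solution, m)
        # condition 2: well posed (exactly one solution)
        if count_solutions([int(c) for c in puzzle]) == 1:
            kept.append((m, puzzle))
    return [p for _, p in kept]
```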