Using SAS/Base and SAS Macro Facility To Solve Sudoku Puzzles With Backtracking Algorithm

This paper illustrates an algorithm to solve any valid Sudoku puzzle using SAS/Base and SAS Macro techniques. It also explains the backtracking algorithm, which finds a solution by making an optimised guess when the Sudoku puzzle cannot be solved by linear methods.

Uploaded by

Anirudh Mehta
Copyright
© Attribution Non-Commercial (BY-NC)
Using SAS/Base and SAS Macro facility to solve Sudoku puzzles with backtracking algorithm

Anirudh Mehta, Civil Service, Newcastle-upon-Tyne, UK


ABSTRACT: Sudoku puzzles have been around since the early twentieth century. Who invented them, and where, is either unknown or debated. A few reports suggest that Sudoku originated in countries such as France and the US, albeit in different forms; nevertheless, Sudoku as we know it today is believed to have originated in Japan during the 1980s. In its most commonly found form (it can hardly be called the simplest form, as solving a Sudoku puzzle is not a simple task!), a Sudoku puzzle consists of a 9x9 grid divided into nine 3x3 boxes. When fully solved, a Sudoku puzzle has the digits 1 through 9 in each row, column and 3x3 box, with each digit appearing exactly once in every row, column and box. Figure 1 shows one such puzzle and its solution.

Puzzle:

. . 2 8 . 7 5 . .
6 . . . . . . . 4
. 8 . . 6 . . 7 .
1 3 . 4 . 9 . 2 5
. . . . . . . . .
4 5 . 7 . 1 . 6 8
. 6 . . 3 . . 9 .
5 . . . . . . . 7
. . 1 6 . 4 2 . .

Solution:

9 1 2 8 4 7 5 3 6
6 7 5 2 9 3 8 1 4
3 8 4 1 6 5 9 7 2
1 3 6 4 8 9 7 2 5
7 2 8 3 5 6 1 4 9
4 5 9 7 2 1 3 6 8
2 6 7 5 3 8 4 9 1
5 4 3 9 1 2 6 8 7
8 9 1 6 7 4 2 5 3

Figure 1

A number of solutions have been submitted by SAS users, some using the processing power of SAS PROCs, some using PROC SQL and others using SAS/BASE programming techniques. Each technique has its own pros and cons, showing the variety and power of solutions that can be implemented in SAS. This paper presents one such solution, developed using SAS/BASE and the SAS Macro Facility. The solution is unique in being a clean-room solution: the code was developed from scratch using original ideas, without reference to any other available solution. It makes extensive use of data step programming, relying on the programming and logical capability of SAS rather than on a powerful PROC that does the thinking on behalf of the programmer. In addition, a backtracking algorithm is implemented for puzzles that cannot be solved by linear techniques, that is, without guessing. Backtracking is a technique in which the algorithm guesses the best candidate and, if the guess leads to an invalid solution, returns to the last successful guess, repeating this iteratively until a solution is found, if one exists.

INTRODUCTION:

There are many ways to solve a Sudoku puzzle. In one of the simplest methods, the first step involves scanning the puzzle. Scanning means finding the obvious possibility. For example, the element in row 8 and column 9 in figure 2 can only have the value 5. The number 5 cannot appear again in row 9, since it has already appeared there once. Similarly, the number 5 cannot appear again anywhere in row 7 or anywhere in column 8. Within the bottom-right box, the remaining cell of row 8 in column 7 is already occupied, which leaves element c(8,9)¹ as the only possible place for the number 5. Figure 2 shows this element already filled in.

. 6 . 1 . 4 . 5 .
. . 8 3 . 5 6 . .
2 . . . . . . . 1
8 . . 4 . 7 . . 6
. . 6 . . . 3 . .
7 . . 9 . 1 . . 4
5 . . . . . . . 2
. . 7 2 . 6 9 . 5
. 4 . 5 . 8 . 7 .

Figure 2

The element c(8,9) is assigned the value 5, and the process is repeated until a valid solution is found. What if no more elements can be found and the puzzle is still incomplete? A solution can still be found in other ways when scanning alone does not produce one: for example, the doubleton technique, which eliminates multiple possibilities by using pairs of uniquely occurring possibilities, the slicing and dicing technique, triplets and so on. However, an explanation of these techniques is beyond the scope of this paper.

If all techniques have been used and the puzzle is still incomplete, the only method left is to guess an element and try to build up the solution from it. If the guess leads to an invalid puzzle, it is discarded; if it leads to a valid but incomplete puzzle, a new guess is made while keeping the first guess in the solution. The process is carried out until a solution is found. There has been a lot of debate about whether a puzzle that requires guessing is a proper puzzle in the first place. Most of the solutions proposed so far using BASE/SAS techniques do not solve improper² puzzles. Regardless of that argument, this solution builds and implements an algorithm that finds the solution of any Sudoku puzzle as long as one exists.

¹ For the purpose of illustration, throughout this paper every cell is represented as c(x,y), where x denotes the row and y denotes the column.

² Puzzles requiring guessing.

METHODOLOGY:

A Sudoku cube: The heart of the solution is a Sudoku cube. Any Sudoku puzzle can be seen as a cube: the face represents the numbers visible to us as part of the puzzle, and an invisible chain of possibilities behind each cell forms the depth dimension. The first step in the solution involves filling the probability dimension¹. If an element c(x,y) is blank, its probability dimension p(x,y,z) is assigned the numbers 1 through 9. If the element has a fixed value, the probability dimension is left blank. So for the first box in the above puzzle, a dissected cube will look something like this:

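1 2 3 4 5 6 7 8 9  |  6                  |  1 2 3 4 5 6 7 8 9
1 2 3 4 5 6 7 8 9  |  1 2 3 4 5 6 7 8 9  |  8
2                  |  1 2 3 4 5 6 7 8 9  |  1 2 3 4 5 6 7 8 9

Figure 3 (box 1 of the puzzle in figure 2; a fixed element is shown alone, a blank element carries the possibilities 1 through 9)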

¹ The words possibility and probability mean the same within the scope of this document and are used interchangeably.
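The cube described above can be sketched in a few lines. The sketch is written in Python rather than SAS purely for illustration and is not the author's code; the names `PUZZLE` and `build_cube` are hypothetical, and a Python set stands in for the depth dimension p(x,y,z). The grid is the puzzle of figure 2 (the transpose of the puzzle in figure 1) with c(8,9) left blank. Python indices start at 0, so c(x,y) corresponds to `grid[x-1][y-1]`.

```python
# Puzzle of figure 2 with c(8,9) left blank; '.' marks a blank cell.
PUZZLE = [
    ".6.1.4.5.",
    "..83.56..",
    "2.......1",
    "8..4.7..6",
    "..6...3..",
    "7..9.1..4",
    "5.......2",
    "..72.69..",
    ".4.5.8.7.",
]


def build_cube(rows):
    """Return (grid, cube).

    grid[r][c] is the fixed digit of a cell, or 0 if the cell is blank.
    cube[r][c] is the set of possibilities: 1..9 for a blank cell,
    empty for a cell that already has a fixed value.
    """
    grid = [[0 if ch == "." else int(ch) for ch in row] for row in rows]
    cube = [[set(range(1, 10)) if value == 0 else set() for value in row]
            for row in grid]
    return grid, cube


grid, cube = build_cube(PUZZLE)
print(grid[0][1], sorted(cube[0][1]))  # c(1,2): fixed value, no possibilities
print(grid[0][0], sorted(cube[0][0]))  # c(1,1): blank, all nine possibilities
```

A set per cell keeps the later elimination step a simple set difference; a 9x9x9 array, as the p(x,y,z) notation suggests, would work equally well.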

Updating the cube: Once the initial possibilities have been assigned, they are updated based on the fixed elements in each row, column and box. For every blank element, the update eliminates from its set of possibilities all of the fixed numbers that appear in its row, column and box. Therefore, the initial possibilities in box 1 of the puzzle shown in figure 2 will be updated to:

3 9    |  6        |  3 9
1 4 9  |  1 7 9    |  8
2      |  3 5 7 9  |  3 4 5 9

Figure 4 (box 1 of the puzzle in figure 2 after the update)

The process is then carried out until the possibilities have been updated for each and every element.
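The update step can be illustrated with a short sketch, again in Python rather than SAS and not taken from the author's program. The helper names `peers` and `update_cube` are hypothetical. The sketch uses the puzzle of figure 2 with c(8,9) left blank and prints the possibilities for box 1.

```python
# Puzzle of figure 2 with c(8,9) left blank; '.' marks a blank cell.
PUZZLE = [
    ".6.1.4.5.",
    "..83.56..",
    "2.......1",
    "8..4.7..6",
    "..6...3..",
    "7..9.1..4",
    "5.......2",
    "..72.69..",
    ".4.5.8.7.",
]


def peers(r, c):
    """Cells sharing a row, column or 3x3 box with (r, c), excluding (r, c)."""
    br, bc = 3 * (r // 3), 3 * (c // 3)
    cells = {(r, k) for k in range(9)} | {(k, c) for k in range(9)}
    cells |= {(br + i, bc + j) for i in range(3) for j in range(3)}
    cells.discard((r, c))
    return cells


def update_cube(grid):
    """Possibilities of every blank cell after removing fixed peer values."""
    cube = {}
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                # Blank peers contribute 0, which is not a possibility anyway.
                used = {grid[pr][pc] for pr, pc in peers(r, c)}
                cube[(r, c)] = set(range(1, 10)) - used
    return cube


grid = [[0 if ch == "." else int(ch) for ch in row] for row in PUZZLE]
cube = update_cube(grid)
for r in range(3):  # box 1: rows 1-3, columns 1-3
    print([sorted(cube.get((r, c), [])) for c in range(3)])
```

Fixed cells have no entry in `cube`, so they print as empty lists, mirroring the blank probability dimension of a fixed element.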

Finding the element: Once the possibilities have been updated, every element in the probability dimension p(x,y,z) is compared with every element of the probability dimensions of the other cells in the row (x), the column (y) and the box it belongs to. All the unique possibilities are then counted. An occurrence of a unique array element (zu) of the possibilities p(x,y,z1..z9) across the row, column and box means that the only possible value for the main element c(x,y) is the unique array element p(x,y,zu). Such unique possibilities are kept and assigned to the face, or main grid. The advantage of this technique is that it eliminates the need for the scanning and slicing/dicing approaches.

Once an element is assigned to the grid, it alters the probability continuum of the whole cube. For example, in figure 4, if it is established that the only unique possibility that element c(3,2) can have is p(3,2,3), then 3 is assigned to the grid. Once the number 3 has been assigned to the main grid, no probability array in row 3, column 2 or box 1 can contain the probability element 3. Therefore, 3 is removed from the probability dimensions of row 3, column 2 and box 1, which eliminates 3 from the probability dimension of element c(1,3). Since element c(1,3) had only two valid possibilities, p(1,3,3) and p(1,3,9), and updating the grid has removed p(1,3,3), the only valid possibility left, p(1,3,9), will be assigned to the grid in the next iteration. Confirming 9 for element c(1,3) causes the probability continuum to be updated once again. This process continues until a valid solution is found or no more elements can be found. For the purpose of this paper, the above process is called Linear Calculation, because the elements are found in a linear way without requiring any guess.
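A minimal sketch of Linear Calculation follows, in Python rather than SAS and not the author's implementation. It rests on two assumptions that should be read as interpretations: a possibility counts as "unique" when it occurs in only one cell of at least one row, column or box (the paper's wording may intend uniqueness across all three at once), and a cell with a single remaining possibility is also assigned. The names `UNITS`, `candidates`, `find_forced` and `linear_calculation` are hypothetical.

```python
# Puzzle of figure 2 with c(8,9) left blank; '.' marks a blank cell.
PUZZLE = [
    ".6.1.4.5.",
    "..83.56..",
    "2.......1",
    "8..4.7..6",
    "..6...3..",
    "7..9.1..4",
    "5.......2",
    "..72.69..",
    ".4.5.8.7.",
]

# The 27 units: 9 rows, 9 columns and 9 boxes, each a list of (row, col).
UNITS = ([[(r, c) for c in range(9)] for r in range(9)]
         + [[(r, c) for r in range(9)] for c in range(9)]
         + [[(br + i, bc + j) for i in range(3) for j in range(3)]
            for br in (0, 3, 6) for bc in (0, 3, 6)])


def candidates(grid, r, c):
    """Digits not yet used in the row, column or box of cell (r, c)."""
    used = {grid[rr][cc] for unit in UNITS if (r, c) in unit
            for rr, cc in unit}
    return set(range(1, 10)) - used


def find_forced(grid):
    """Return (r, c, d) for one cell whose value is forced, else None."""
    cube = {(r, c): candidates(grid, r, c)
            for r in range(9) for c in range(9) if grid[r][c] == 0}
    for (r, c), poss in cube.items():
        if len(poss) == 1:              # only one possibility left
            return r, c, next(iter(poss))
    for unit in UNITS:
        for d in range(1, 10):
            homes = [cell for cell in unit if d in cube.get(cell, ())]
            if len(homes) == 1:         # d is unique within this unit
                return homes[0][0], homes[0][1], d
    return None


def linear_calculation(grid):
    """Assign forced values one at a time until none can be found."""
    found = find_forced(grid)
    while found is not None:
        r, c, d = found
        grid[r][c] = d                  # the cube is rebuilt on the next pass
        found = find_forced(grid)
    return grid


grid = [[0 if ch == "." else int(ch) for ch in row] for row in PUZZLE]
linear_calculation(grid)
for row in grid:
    print(" ".join(str(v) if v else "." for v in row))
```

Rebuilding the possibilities after every single assignment is wasteful but keeps the sketch short and guarantees that no stale possibility is ever used.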

What if the puzzle is still incomplete? There can be instances when the puzzle is still incomplete even after the Linear Calculation. In that scenario the solution calls the backtracking algorithm. Backtracking is a process in which a guess is made from all the probability dimensions combined. The algorithm looks for the best guess: of all the probability arrays in the puzzle, it tries to find the best empty element whose dimension can be used as the starting point of the guess, also called the first state of guess. Once a possibility has been guessed, it is assigned to the empty element and the Linear Calculation process is invoked to find the remaining elements. Once all elements have been identified, or no more elements can be identified, the consistency check routine is called to check the puzzle for consistency. If the puzzle is found to be consistent but incomplete, the solution finds the next best guess from another array and invokes the Linear Calculation process again. This guess is called the second state of guess. If, after the second or a later guess, the solution is found to be invalid (whether complete or incomplete), the guessed possibility from the current state of guess is discarded and another possibility from the same state is used to find the solution. If this also results in an invalid solution, the next possibility from the second state of guess is used, and so on. If no more possibilities from the current state can be guessed and the solution is still invalid, the algorithm backtracks to the previous state of guess, discards the guessed possibility from that state, and then picks the next possibility to build the solution.

The flowchart of the algorithm is summarised in the diagram below:

[Flowchart with the following nodes: Do Linear Calculation; Is It Complete? (YES/NO); Increase State; Guess One Element; Continue Linear Calculation; Is It Consistent? (YES/NO); Can next element in the same state be guessed? (YES/NO); Decrease State; Output]

Figure 5

The algorithm is run iteratively until a solution is found. If all the probability elements of the first state have been used but a solution still cannot be found, that means the puzzle is erroneous and there is no possible solution for the puzzle.
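The backtracking procedure described above can be sketched as a recursive search, where each level of recursion plays the role of one state of guess. This is a Python illustration under stated assumptions, not the author's SAS program: the "best guess" is assumed to be the blank cell with the fewest remaining possibilities, Linear Calculation is reduced to assigning cells with a single possibility, and all function names are hypothetical.

```python
import copy

# Puzzle of figure 2 with c(8,9) left blank; '.' marks a blank cell.
PUZZLE = [
    ".6.1.4.5.",
    "..83.56..",
    "2.......1",
    "8..4.7..6",
    "..6...3..",
    "7..9.1..4",
    "5.......2",
    "..72.69..",
    ".4.5.8.7.",
]

UNITS = ([[(r, c) for c in range(9)] for r in range(9)]
         + [[(r, c) for r in range(9)] for c in range(9)]
         + [[(br + i, bc + j) for i in range(3) for j in range(3)]
            for br in (0, 3, 6) for bc in (0, 3, 6)])


def candidates(grid, r, c):
    """Digits not yet used in the row, column or box of cell (r, c)."""
    used = {grid[rr][cc] for unit in UNITS if (r, c) in unit
            for rr, cc in unit}
    return set(range(1, 10)) - used


def linear_calculation(grid):
    """Simplified: fill cells with exactly one possibility until stuck."""
    changed = True
    while changed:
        changed = False
        for r in range(9):
            for c in range(9):
                if grid[r][c] == 0:
                    poss = candidates(grid, r, c)
                    if len(poss) == 1:
                        grid[r][c] = poss.pop()
                        changed = True


def is_consistent(grid):
    """No digit repeats in a unit and every blank cell has a possibility."""
    for unit in UNITS:
        seen = [grid[r][c] for r, c in unit if grid[r][c] != 0]
        if len(seen) != len(set(seen)):
            return False
    return all(candidates(grid, r, c)
               for r in range(9) for c in range(9) if grid[r][c] == 0)


def solve(grid):
    """Return a solved copy of grid, or None if no solution exists."""
    grid = copy.deepcopy(grid)            # each state works on its own copy
    linear_calculation(grid)
    if not is_consistent(grid):
        return None                       # invalid: caller tries next guess
    blanks = [(r, c) for r in range(9) for c in range(9) if grid[r][c] == 0]
    if not blanks:
        return grid                       # complete and consistent
    # Assumed "best guess": the blank cell with the fewest possibilities.
    r, c = min(blanks, key=lambda rc: len(candidates(grid, *rc)))
    for d in sorted(candidates(grid, r, c)):
        grid[r][c] = d                    # next possibility in this state
        result = solve(grid)              # deeper call: next state of guess
        if result is not None:
            return result
    return None                           # state exhausted: decrease state


grid = [[0 if ch == "." else int(ch) for ch in row] for row in PUZZLE]
solution = solve(grid)
for row in solution:
    print(" ".join(map(str, row)))
```

When every possibility of the first state has been tried without success, `solve` returns `None`, which corresponds to the erroneous-puzzle case. Copying the grid at each state trades memory for simplicity, since discarding a guess then requires no explicit undo step.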

TESTING: The solution has been tested on a number of valid puzzles, some requiring no guessing, a few requiring a few guesses and some where the starting point in itself is a guess. These puzzles have resulted in a correct solution every single time. The solution has also been negatively tested where an incorrect puzzle was fed to it and it was able to identify the puzzle as being incorrect.

CONTACT INFORMATION: Anirudh Mehta SAS Developer Civil Service Newcastle-upon-Tyne UK Email: [email protected]
