Le Thanh Hoang
The Rubik's Cube, often referred to as The Cube, is a puzzle that has troubled many for over 40 years. It is a puzzle that most people keep in a drawer gathering dust, their previous attempts at solving it having ended only in mind-numbing frustration. With 43,252,003,274,489,856,000 possible combinations in its state space, the cube has a rich and deep mathematical theory attached to it. As daunting as this number may seem, we know that the maximum number of turns required to solve any scrambled state is just 20. We call this God's Number [18].
Whilst it is well known that the fastest human solvers need just a few seconds to solve this puzzle, their solutions are far from optimal. In fact, the best human speedsolvers use on average 50-60 moves per solve, simply because a human cannot see the entire solution to the cube just by looking at it. In essence, they must solve the cube section by section, putting each colour where it belongs. What humans lack in insight, they make up for in dexterity: the best human speedsolvers can turn up to 10 faces per second.
We take a different approach to solving the cube, using three major components: a vision system that is able to accurately track the cube and its colours using a smartphone camera so that we can read its state; an algorithm that can find a solution to most cube states in 22 turns or fewer; and a robot that is able to reliably turn and solve the cube.
We are able to point our smartphone's camera at each face, regardless of background, lighting colour or cube position within the camera frame, in order to read any cube state. We are also able to intelligently search through the 43 quintillion combinations to find a close-to-optimal (sometimes even optimal) solution. Thanks to this, our robot is able to solve the cube in, on average, just 74 seconds.
Acknowledgements
I would like to thank my supervisor, Professor Andrew Davison, for his support throughout the project, his guidance and ideas in the vision and robotics components, and for funding the Lego and providing Mindstorms kits as well as Raspberry Pis and BrickPis. I'd also like to thank Julia Wei for her support, for being an excellent proofreader and for providing some suggestions for the robot design. Finally, I would like to thank my parents for supporting me throughout my life.
Contents
1 Introduction 5
1.1 Motivation and Aims . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Background 7
2.1 Fundamental Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.1 Rubik's Cube Jargon . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.2 Face Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.3 Move Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.1.4 Rotation Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.5 Cube Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.6 Cubie Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.7 Singmaster Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2 The Mathematics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.1 Laws & Lemmas . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.2 Problem space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.3 Group Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.4 Numbering Schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3 Existing Optimal Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.3.1 The Obvious Algorithm: Brute Force . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.3.2 The First Real Attempt: Thistlethwaite's Algorithm . . . . . . . . . . . . . . . 20
2.3.3 A Different Approach: Korf's Algorithm . . . . . . . . . . . . . . . . . . . . . 22
2.3.4 Improving Thistlethwaite's Algorithm: Kociemba's Algorithm . . . . . . . . . 24
2.3.5 Why Not Human Algorithms? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.4 Existing Visioning Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.4.1 Colour Schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.4.2 Hardware . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.4.3 Object tracking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.4.4 Colour balancing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.5 Existing Robots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.5.1 MindCuber . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.5.2 JPBrown's CubeSolver . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.5.3 Cubestormer . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.6 PID Controller . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.6.1 Open loop vs closed loop . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.6.2 What is PID specifically? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.6.3 What's so great about PID? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3 Design 35
3.1 Overall Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.1.1 Algorithm Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.1.2 Vision Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.1.3 The Robot Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.1.4 Summary of Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4 Implementation 39
4.1 Korf's Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.1.1 Cube representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.1.2 Heuristic generation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.1.3 Improvements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.1.4 HPPC Java Library . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.2 Kociemba's algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.2.1 Coordinate Labelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.3 Combining Kociemba's and Korf's . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.3.1 Time to Solve Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.4 Searching for shortest number of robot moves . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.4.1 Dynamic Costing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.4.2 Reducing the branching factor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.4.3 Search speed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.5 Vision System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.5.1 Cube Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.5.2 Recognising Colour . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
4.6 Robot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.6.1 Hardware Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.6.2 Movements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.6.3 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5 Evaluation 66
5.1 Vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.1.1 Vision accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.1.2 Vision limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.2.1 Algorithm speed . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.2.2 Algorithm solution length . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
5.2.3 Algorithm Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.3 Robot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.3.1 Robot Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
5.3.2 Robot Speed and TPS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
5.3.3 Robot Limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.3.4 Robot Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.3.5 System . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6 Conclusions 75
6.1 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.1.1 Improving the Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.1.2 Improving vision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.1.3 Improving the Robot . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
List of Figures
4.21 Overview: Bird's-eye view . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
4.22 Configuration by Arm vs Job . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
4.23 move method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
4.24 Moves . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.25 An example of the whole protocol . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.26 The rotational gear ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.27 Degree rotation table . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
A.1 U Face . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
A.2 F Face . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
A.3 D Face . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
A.4 R Face . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
A.5 B Face . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
A.6 L Face . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
A.7 First Connection Failure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Chapter 1
Introduction
Our end goal is to build a system that is able to reliably read the state of the Rubik's Cube, find a solution and then solve the cube using a robot. We want to demonstrate the challenges associated with building puzzle-solving robots. From the real-world challenges in vision and robotics down to the theoretical challenges that lie in the search space, this is not a trivial task.
The accuracy needed by the robot should not be overlooked. An error of millimetres from a perfectly aligned 90 degree turn on any face can lead to disaster when turning adjacent faces. Likewise, the complexity of the vision system should not be underestimated - a single wrongly recognised colour leads to a completely different cube state. As mentioned earlier, finding an optimal solution is difficult in a state space of size 4.3 × 10^19. In comparison, a fifteen sliding tile puzzle has only about 10^13 possible states. Unlike many other projects, we are not trying to solve a problem; instead, we are trying to detail and demonstrate techniques and challenges that are often initially overlooked by those who try to build similar systems.
1.2 Contributions
Although Rubik's Cube solving robots already exist, they are very rarely documented in any detail at all. In fact, the world's fastest Rubik's Cube solving robot, CubeStormer III[24], has kept almost all of its implementation a complete secret. In this project, we want to contribute the following to the SpeedSolving community:
1. We present a reliable vision system for reading the state of the Rubik's Cube using Edge Detection, Adaptive Thresholding, pattern recognition, automatic white balancing and square prediction (section 4.5).
2. We detail a fast multithreaded implementation of Korf's algorithm using a perfect minimal hash function, similar to a technique used in finding God's Number [18], to use 75x less memory than a generic implementation and to bring heuristic database lookups down to just O(1).
3. We present a new algorithm that is a variation of Korf's algorithm using multithreaded Fringe Search instead of IDA*, as well as an algorithm that combines Korf's and Kociemba's algorithms and can outperform Kociemba's algorithm at shorter solution lengths, and we detail the shortcomings of Fringe Search within this particular use case.
4. We evaluate and compare the performance of different algorithms in terms of the speed at which they can find a solution and the length of the solutions they give. In particular, we compare Korf's, Kociemba's and our various improvements and variations of Korf's algorithm.
5. We compare existing designs and explore grabbing mechanisms, gearing and motor controllers to demonstrate how accurate turning can be achieved using basic Lego Mindstorms kits, and we evaluate our particular design to highlight the major hardware challenges in designing such a robot.
Chapter 2
Background
4. A corner is another type of cubie. It refers to the cubies that have 3 colours attached to them (dark grey in Figure 2.1).
5. A centre is also a type of cubie. It refers to the cubies that only have 1 colour attached to them (white in Figure 2.1).
6. A move is the movement of a particular face by 90, 180 or 270 degrees.
7. A rotation is the movement of the whole cube without moving any faces.
8. A facelet refers to a sticker on a face.
9. A speedsolver refers to a person who attempts to solve the cube in the fastest time possible.
Figure 2.1: Labelled Cube
Figure 2.3: Move Notation Table (each face move R, U, F, L, D and B shown alongside its inverse R3, U3, F3, L3, D3 and B3)
2.1.4 Rotation Notation
So far, we have only defined which faces we can move. We can also express cube rotations[5] that rotate the
whole cube. We can define how to rotate the entire cube by defining the axes of rotation X, Y and Z. If we
draw a line through the R face to the L face as per figure 2.4, we define the clockwise rotation X as following
the clockwise direction turn of the move R. Similarly, the Y clockwise rotation would follow the clockwise
rotation of U in Figure 2.3 and Z clockwise rotation would follow the clockwise rotation of F.
We can now represent a cube state using a 54 character string, with the facelets listed in the following order:
u1u2u3u4u5u6u7u8u9r1r2r3r4r5r6r7r8r9f1f2f3f4f5f6f7f8f9d1d2d3d4d5d6d7d8d9l1l2l3l4l5l6l7l8l9b1b2b3b4b5b6b7b8b9
Figure 2.5: Cube Net
This notation is often useful for human input, as we can just read the colours directly off the cube. It is sometimes referred to as Facelet Level notation [13].
2.1.7 Singmaster Notation
Another way of representing a cube is to use Singmaster Notation[21]. Singmaster Notation uses the piecewise
notation and allows us to represent the cube in a much more compact form. The Singmaster Notation needs
to account for two properties of a piece: its permutation (position in the cube) and its orientation (which
way the piece is facing). Using the piece notation, we can define a cube as follows: UF UR UB UL DF DR
DB DL FR FL BR BL UFR URB UBL ULF DRF DFL DLB DBR. That is, we put the actual piece that lies
in each position UF, UR, UB, UL, etc. This determines the permutation. The orientation is determined by
the order in which the piece is written. For example, if we defined a cube state as starting with UB FD..., this tells us that the piece that was in position UB in the solved state is now in position UF. Similarly, the piece
FD is in position UR. Notice the distinction between FD and DF: FD means that the F colour is facing the
U direction and the D colour is facing the R direction. DF would mean a flipped version of this where the
D colour is facing the U direction and the F colour is facing the R direction. This notation is often useful to
reason about the mathematics behind the number of states a Rubik's Cube has.
2.2 The Mathematics
2.2.1 Laws & Lemmas
There are certain laws of the Rubik's Cube which are often overlooked[11]. These properties are crucial for reaching the true size of the problem space, since they prove that some states are unreachable using the moves defined in section 2.1.3.
(a) Illegal state edge swap (b) Illegal state edge flip
With the remaining faces, F and B, any 90 degree move will flip all 4 edges on that face. We prove this by simply performing an F move and then attempting to flip any flipped edges on the F face using only U, R, D or L. We previously proved that U, R, D and L cannot flip edges, so it is impossible to return any edges on the F face back to their original positions and orientations. Again, since we can only flip 4 edges at a time, the total number of edge flips for any reachable cube state must be even.
2.2.1.3 All corner orientation totals are divisible by 3
So far, we've only seen laws associated with edges. It is slightly harder to reason about corner orientations since there are 3 possible orientations per corner. Let us label the orientations by labelling the solved state as 0, the solved state twisted clockwise as 1 and the solved state twisted anticlockwise as 2. The diagram below shows all three corner orientation states and their labels:
If we sum the labels of all corners of any reachable state of a cube, the total is divisible by 3. The diagram below shows a state that is not reachable, since the sum of all the labels is 1.
Once again, we must define what it means to have a good corner or a bad corner. Notice that all corners lie on the U face or D face. This means that for each corner, there is always a sticker that faces the U or D direction. A good corner is defined as a corner where the sticker that faces the U or D direction is of either U or D colour. Any other corner orientations are defined as bad.
Using these definitions, we can see that any moves involving the U or D faces cannot change the orientation of any corners. For example, take a solved cube and only perform U or D moves. All corners are good in the solved cube state, and no number of U or D moves can change these corners from good to bad. For the other moves R, F, L and B, a 90 degree turn will increment the orientation label of 2 corners by 1 and add 2 to the orientation label of another 2 corners (modulo 3). The total change is therefore 1+1+2+2 = 6, so each R, F, L or B move only adds 6 to the total label sum. Since the total label sum of a solved cube is 0, and any move changes the total by either 6 or 0 (both divisible by 3), the corner orientation total must remain divisible by 3.
2.2.2 Problem space
Now that we have an idea of the structure and laws of the Rubik's Cube, we can begin to reason about the problem space[20].
2.2.2.1 Orientations
We define the number of orientations as the number of directions a piece can face.
Any edge piece can only have 2 orientations. This is obvious: if we define some edge piece as XY, its only other orientation is YX. There are no other possible orientations. Since there are 12 edges, we might think that the total number of edge orientations is 2^12. However, using the edge lemma in section 2.2.1.2, we can reason that half of all edge orientations are unreachable, since all odd numbered edge flips cannot be reached. This reduces the number of reachable edge orientations to only those with even edge flips: 2^12 / 2 = 2^11.
Similarly, any corner piece can have 3 orientations. Since there are 8 corners, we might think that the total number of corner orientations is 3^8. However, using the corner lemma in section 2.2.1.3, we can prove that only a third of all corner orientation states are actually reachable, since only those with a total corner label sum divisible by 3 can be reached. This reduces the number of corner orientations to 3^8 / 3 = 3^7.
2.2.2.2 Permutations
We define the number of permutations as the number of positions a cubie/piece can be in.
Let us take any corner piece from the 8 corners. For any cube state, this corner piece can be in any 1 of 8 positions. We then choose a second corner piece. Since the first piece has already claimed a position, the second piece can only choose from 1 of 7 positions. This continues until the last piece. This gives us 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1 possible corner permutations. The total number of corner permutations is therefore 8! = 40,320.
A similar argument can be made for edge permutations, giving 12! edge permutations.
One may then argue that the total number of permutations is 8! × 12!. However, only half of these permutations are actually reachable using the legal moves defined in 2.1.3. Using our even swap lemma from section 2.2.1.1, we can reason that all states that have an odd number of swaps are not reachable, which halves the number of permutations to 12! × 8!/2.
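Putting the orientation and permutation counts together recovers the state space size quoted in the introduction:

(8! × 12!/2) × 3^7 × 2^11 = 9,656,672,256,000 × 2,187 × 2,048 = 43,252,003,274,489,856,000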
2.2.3 Group Theory
2.2.3.1 What are groups?
A group is a structure which consists of a set and an operation that can combine any two elements[4]. There
are four conditions called group axioms that the set and operation combination must satisfy. Let G be our
set and OP be our operator:
1. Closure - Any two elements combined using the operator must give another element that is in the set. That is:
∀ a, b ∈ G, ∃ c ∈ G : a OP b = c
1. Closure - Since the group contains all the states reachable using legal moves and we can only apply
legal moves using *, it is impossible to generate unreachable states and we therefore have closure.
2. Associativity - Let us take any cube states S1, S2 and S3: (S1 * S2) * S3 = S1 * (S2 * S3). We can see that this is the case since taking S1, applying the sequence of moves that took C to S2 and then applying the sequence of moves that took C to S3 is exactly the same regardless of the evaluation order. For example, let's use (C * R) * U = C * (R * U). We can see that taking a solved cube and applying the move R and then U is the same as taking a solved cube and then applying the cube state (R * U), which is just R followed by U.
3. Identity - The identity is our solved state, C, since C is the same as applying no moves.
4. Invertibility - All reachable cube states must have an inverse. Let us take a cube state C * S where S is a sequence of moves. We can reverse any sequence by simply undoing all the moves. For example, the cube state C * R3 * U * B has the inverse C * B3 * U3 * R.
Proof: (C * R3 * U * B) * (C * B3 * U3 * R)
= (C * R3 * U * B * B3 * U3 * R) (by def. of identity C and associativity)
= (C * R3 * U * C * U3 * R) (by def. of B inverse)
= (C * R3 * U * U3 * R) (by def. of identity and associativity)
= (C * R3 * C * R) (by def. of U inverse)
= (C * R3 * R) (by def. of identity and associativity)
= (C * C) (by def. of R inverse)
= C
2.2.3.3 Parity
We can define the parity of a permutation as whether the number of swaps required to obtain that permutation is even or odd[12]. An even permutation is a permutation that requires an even number of swaps. An odd permutation is one that requires an odd number of swaps. Notice how this relates to the even swap lemma in section 2.2.1.1: another way of expressing this lemma would be to say that the parity of the combined edge and corner permutation must be even.
Our factorial number can be expressed as 2_4 0_3 1_2 0_1, where x_b denotes the digit x in position b. To convert to base 10:
base10(2_4 0_3 1_2 0_1) = 2 × 4! + 0 × 3! + 1 × 2! + 0 × 1! = 50
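As a sketch of how this scheme can be used to index whole permutations (the way it is later used for corners in section 4.1.2.2), assuming the usual convention that the digit for each position is the number of smaller elements appearing after it, a helper might look like this (the method name is illustrative):

static long factorialIndex(int[] perm) {
    int n = perm.length;
    // Precompute 0! .. (n-1)!
    long[] fact = new long[n];
    fact[0] = 1;
    for (int i = 1; i < n; i++) fact[i] = fact[i - 1] * i;

    long index = 0;
    for (int i = 0; i < n; i++) {
        // Digit for position i: how many later entries are smaller than perm[i].
        int smallerToTheRight = 0;
        for (int j = i + 1; j < n; j++) {
            if (perm[j] < perm[i]) smallerToTheRight++;
        }
        index += smallerToTheRight * fact[n - 1 - i];
    }
    return index;
}

For instance, factorialIndex(new int[]{2, 0, 3, 1, 4}) produces the digits 2, 0, 1, 0 and hence the value 50 computed above.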
5: ((6-1)!/(6-4)!) × 5 = 300
0: ((6-2)!/(6-4)!) × 0 = 0
2: ((6-3)!/(6-4)!) × 1 = 3, since 0 has already been assigned
3: ((6-4)!/(6-4)!) × 1 = 1, since 2 and 0 have already been assigned
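This nPr numbering is the scheme used later for the edge pattern databases (section 4.1.2.2). A sketch in code, following the convention of the worked example, where each chosen element is reduced by the number of smaller elements already assigned and weighted by (n-1-i)!/(n-r)! (names are illustrative):

// Index an ordered choice of r elements out of {0,...,n-1}.
static long nPrIndex(int[] chosen, int n) {
    int r = chosen.length;
    long index = 0;
    for (int i = 0; i < r; i++) {
        int reduced = chosen[i];
        for (int j = 0; j < i; j++) {
            if (chosen[j] < chosen[i]) reduced--;   // already assigned
        }
        index += reduced * arrangements(n - 1 - i, r - 1 - i);
    }
    return index;
}

// Number of ordered arrangements of k items drawn from m, i.e. m!/(m-k)!.
static long arrangements(int m, int k) {
    long result = 1;
    for (int i = 0; i < k; i++) result *= (m - i);
    return result;
}

For the digits above (5, 0, 2, 3 with n = 6) this returns 300 + 0 + 3 + 1 = 304.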
2.3 Existing Optimal Algorithms
This section aims to detail the existing algorithms for finding optimal solutions.
[Diagram: brute-force search tree rooted at the scrambled state, with the 18 branches R, U, F, L, D, B, R2, ..., D3, B3 at every level.]
Exponentially increasing size - With each increasing depth level of our search tree, the branching factor of 18 means that the number of candidate solutions we are required to look at increases by a factor of 18. This explodes very quickly and becomes infeasible to search within a reasonable time.
Exponential increase in memory consumption - A non-recursive implementation of a breadth first search requires a queue to store the child nodes to be explored. In the worst case, a 20 move solution would require searching 18^20 nodes. It is clearly not feasible to maintain a queue of this size.
To show the extent of how the number of nodes increases with depth up to 10:
2.3.2 The First Real Attempt: Thistlethwaite's Algorithm
The first attempt at creating a practical solution finder used group theory. Thistlethwaite's algorithm aims to break down the Rubik's Cube into smaller sub-problems that can be solved within a reasonable time[19]. The algorithm works by splitting the solve into 4 phases in which we increasingly restrict certain moves. This gradually reduces the number of reachable states until there is only one state left: the solved state.
Let us define the following groups:
G0 = ⟨L, R, F, B, U, D⟩ (2.9)
G1 = ⟨L, R, F, B, U2, D2⟩ (2.10)
G2 = ⟨L, R, F2, B2, U2, D2⟩ (2.11)
G3 = ⟨L2, R2, F2, B2, U2, D2⟩ (2.12)
G4 = {C} (2.13)
2.3.2.1 Group G0
G0 is the group of all states reachable using moves L, R, F, B, U and D. Notice how this is just all states reachable using any of the legal moves defined in section 2.1.3, since we can perform any L2, R2, F2, etc. moves by simply performing L * L, R * R, F * F, etc. Similarly, we can perform any L3, R3, F3, etc. by performing L * L * L, R * R * R, F * F * F, etc. Our aim is to move from G0 → G1 → G2 → G3 → G4, where G4 contains only the solved cube state.
2.3.2.2 Group G1
G1 is the group of all states reachable using moves L, R, F, B, U2 and D2. In contrast to group G0, the set of reachable states is smaller: G1 contains only good edges. To see why this is so, let us look back at our edge flip lemma in section 2.2.1.2. To explain why there is always an even number of flips, we proved that using only moves U, R, D and L it is not possible to flip any edges. Let us now prove that the same result holds using moves L, R, F, B, U2 and D2.
Let us perform an X rotation (described in section 2.1.4). Notice that if we rotate the cube, then in order to turn the same faces as in our previous orientation, our previous U moves would now be B moves, D moves would now be F moves, and R and L moves would remain the same. This means that moves L, R, F and B share the same property in that they cannot flip any edges. Now let's look at moves U2 and D2. Since we've performed an X rotation, previous F moves are now U moves and previous B moves are now D moves. Remember that in our edge flip lemma in section 2.2.1.2 we said that quarter turns of these faces would flip 4 edges. However, we don't have quarter turns: in G1 we only have U2 and D2 (180 degree turns). If we imagine these as 2 quarter turns, the first quarter turn would flip the 4 edges of that face, but the next quarter turn would flip the same 4 edges back to their original orientations, so these 180 degree moves cannot possibly flip any edges. This means that if we start from the solved state where all edges are good, all edges will remain good as long as we only use moves L, R, F, B, U2 and D2.
2.3.2.3 Group G2
G2 is the group where we further restrict the reachable states to those reachable using moves L, R, F2, B2, U2 and D2. G2 contains only good corners, where good corners are now defined as the corners which have an R or L sticker facing the R or L direction. To see why this is so, let us take moves L and R: neither of these can change the orientation of the corners. Now let's look at F2, B2, U2 and D2: none of these moves can change the goodness of a corner, since they take any stickers facing left to face right and any stickers facing right to face left.
As well as only containing good corners, we also fix the edges in the centre layer between the R and L faces. This means all edges that belong on this centre layer are on the centre layer, though not necessarily permuted correctly. To see why this is so, consider moves R and L: these cannot affect any edges in this middle layer. The remaining F2, B2, U2 and D2 moves can only change the permutation of the edges on the middle layer; they can never move them out.
2.3.2.4 Group G3
G3 is the group where we restrict all quarter turn moves. G3 contains the states where the edges in the L and R faces are in their correct slices, where a slice is defined as the middle layer between any two opposite faces: the UD slice is the middle layer between U and D, the FB slice is the middle layer between F and B, and the RL slice is the middle layer between the R and L faces. G3 only contains edges in their correct slice because 180 degree turns can only permute edges within their respective slices. In addition, G3 enforces that the parity of the edge permutation is even (i.e. we need an even number of edge swaps). This is easy to see if we start from the solved state and only perform 180 degree turns - each such turn only ever swaps pairs of edges, which keeps the parity even. Using our even swap lemma in section 2.2.1.1, we can also say that the permutation of the corners is even, since the total number of swaps must be even in any cube state.
Below is a table of the search space for each group transition:
Figure 2.13: Thistlethwaite's group transition sizes

Groups    Size
G0 → G1   2,048
G1 → G2   1,082,565
G2 → G3   29,400
G3 → G4   663,552
This gives a worst case of 52 moves to solve a cube, which is far from optimal.
The A* part of the algorithm comes from the fact that it uses heuristics to estimate its distance to our goal. In this case our goal is the solved state and the distance is the number of moves required to solve a given state.
Figure 2.15: Iterative Deepening vs IDA*
The highlighted areas in figure 2.15b and figure 2.15a represent what we actually search for a bound of
8. We can see that in IDA*, we only search a subset of all possible states for a given bound since some
branches have been pruned.
2.3.3.2.1 Heuristics The heuristic used must be admissible, i.e. it never overestimates the distance to the goal. Since there is currently no way to directly estimate the solution length for an arbitrary cube state, Korf's algorithm breaks the heuristic down into three smaller and easier to measure sub-state estimates:
1. Corners - Number of moves to solve corners only
2. 6 of 12 Edges - Number of moves to solve any 6 of the 12 edges only
3. Remaining 6 Edges - Number of moves to solve the other 6 of the 12 edges only
The heuristic is then h(s) = max(sc, se1, se2), where h(s) is the heuristic value of some state s, sc is the number of moves needed to solve the corners, se1 is the number of moves needed to solve the first 6 of the 12 edges and se2 is the number of moves needed to solve the remaining edges. Since the maximum number of moves to solve any of these sub-states is less than or equal to the number of moves required to solve the whole cube, the heuristic is admissible.
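A sketch of the resulting lookup, assuming the three pattern databases are arrays indexed by some encoding of the corner and edge sub-states (the encodings themselves are described in section 4.1.2; names here are illustrative):

// Admissible estimate: the full solution can never be shorter than the
// number of moves needed to solve any one of the three sub-states alone.
static int heuristic(int cornerIndex, int edgesAIndex, int edgesBIndex,
                     byte[] cornerDb, byte[] edgesADb, byte[] edgesBDb) {
    int sc  = cornerDb[cornerIndex];
    int se1 = edgesADb[edgesAIndex];
    int se2 = edgesBDb[edgesBIndex];
    return Math.max(sc, Math.max(se1, se2));
}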
2.3.3.2.2 Using the heuristic We first set up our search tree with the scrambled cube as the root node.
We start searching the tree with the bound fixed to 1, i.e. we try all 1 move solutions using a depth first
search. We then try all 2 move solutions in a depth first fashion. When we wish to expand a cube state
to explore its children, we use the heuristic measure to estimate the number of moves required to solve the
cube from the given state.
Let the initial scrambled state be called m, and let the cube state we are deciding whether to expand be called n. We need two things to estimate the number of moves required to solve the scrambled cube. The first is the number of moves we have already executed to get from m to n; let's call this g(m, n). The second is the estimate of the number of moves required to solve the cube from state n, h(n). We can now estimate the length of the solution that goes through node n: f(m, n) = g(m, n) + h(n). If the current bound of our search is b, we know that if f(m, n) > b then there is no point in exploring any paths involving n, since we are looking for a b move solution, and so we can prune this branch.
As an example, let's assume we are currently searching for a 10 move solution, i.e. we've tried all 9, 8, 7, etc. move solutions and found nothing. Let's now assume that we encounter a node n that we reached via 5 moves from the scrambled state m, i.e. g(m, n) = 5. Additionally, let's say we estimate that from this point we require 7 moves to solve, i.e. h(n) = 7. In this case, since we have a bound of 10, there is no point in expanding this node, because our admissible heuristic tells us that any solution through n needs at least 12 moves. We can prune this branch, which significantly reduces the number of nodes we need to search.
G0 = ⟨U, D, R, L, F, B⟩ (2.15)
2.3.4.1 Group G0
The group G0 is the same as in Thistlethwaite's algorithm (all reachable states).
2.3.4.2 Group G1
The group G1 is equivalent to Thistlethwaite's group G2 = ⟨L, R, F2, B2, U2, D2⟩, except that we have rotated the cube using a Z rotation, which turns the previous L, R, F2, B2, U2, D2 moves into U, D, F2, B2, R2, L2 moves respectively. G0 → G1 here is the same as Thistlethwaite's G0 → G1 → G2, so the same properties hold: good edges are always preserved, and edges that belong on the UD slice (the layer between the U and D faces) are now fixed in the slice but not necessarily permuted into their correct positions.
2.3.4.3 Group G2
The group G2 is just the solved state. To transition directly from G1 → G2 using only moves in G1, we must restore the permutations of all 8 corners, the 8 edges that lie on the U and D faces, and the 4 edges on the UD slice. This is the same as Thistlethwaite's G2 → G3 → G4.
For each group transition we must search for a solution that will take us from one group to the next. The good news is that the maximum depths of these search trees are smaller than in Korf's algorithm: the maximum number of moves to transition between G0 and G1 is 12, and the maximum number of moves to transition between G1 and G2 is 18. Most implementations of Kociemba's algorithm use IDA* for this search.
2.4 Existing Visioning Systems
Our aim is to somehow feed in data about the cube state so that we can begin to find a solution. There are
some requirements we are looking to satisfy:
2.4.1.0.1 RGB Colour Scheme All light is made from 3 component colours: Red, Green and Blue.
We can create any colour from these 3 component colours by varying the intensity of each component. We can create white by having maximum intensity for all components; black, on the other hand, can be made by
having 0 intensity for all components. This is why any grey colour can be represented using a single intensity
value.
Saturation determines the perceived intensity of the colour. It determines how colourful the colour is.
The closer the saturation is to 0, the more dull it will look. A saturation of 0 will just give a grey image.
Below is an image of varying saturation for red2.
1 http://i.imgur.com/PKjgfFXm.jpg
2 http://en.wikipedia.org/wiki/Colorfulness#/media/File:Saturationdemo.png
Figure 2.17: HSV Saturation Demo
Value determines the brightness of the colour. The higher the value, the closer the colour will appear to
white. The lower it is, the closer it will be to black.
2.4.2 Hardware
2.4.2.1 RGB sensors
The most primitive implementations of Rubik's Cube state readers use RGB sensors. The sensor will hover
over each sticker of the cube in a pre-determined order to read the colour. As suggested by the name, the
sensor reads the RGB values of the sticker, we can then manipulate this input to try to identify the colour.
Although simple, the major drawback with this approach is that it is difficult to distinguish between colours
in varying lighting situations. E.g. if a cube is placed in a room with yellow light then the white may be
mistaken for yellow. It is also very slow since we have to read one colour at a time. This is shown below3 :
2.4.2.2 Camera
More advanced implementations of Rubik's Cube state readers use cameras to take pictures of each face.
Attempts at this have previously been made and documented by Yakir Dahan and Iosef Felberbaum[6] in
their CubeSolver4 Android application in which the user is instructed to move the cube in front of the camera
until their algorithms detect where the cube is and what the colours of each sticker are. There are other
implementations of such a vision system out there and assumptions vary widely between them. Here are a
few:
3 http://imageshack.com/f/607/imag0130ql.jpg
4 https://play.google.com/store/apps/details?id=com.rubik.cubesolver
Fixing cube position - We assume that the position where the cube appears in the camera frame is fixed. We can then make assumptions about which coordinates a specific sticker of the cube will lie
in. This is not reliable if the cube does not lie perfectly within the specified boundaries.
Fixed predictable lighting conditions - We assume that the pictures taken of the cube are in
predictable lighting conditions. This is so we can assume that colour values will always lie within
specific boundaries. This is not robust if we take the cube into a different kind of lighting. E.g. if we
assumed we would always have natural white light but we take the cube into a room with yellow light.
Fixed cube distance - As the distance between the camera and cube increases, the cube will appear
smaller and it will be harder to differentiate distinct squares. Similar to fixing the cube position within
the camera frame, if the cube is too far away, then the vision system will be unreliable.
The sections below show potential ways to work around these assumptions.
Suppose we wish to detect the edges of the image above. Let us reduce this problem into a 1 dimensional
problem and first plot the intensities of each pixel within the drawn square.
5 All images and content adapted from: http://docs.opencv.org/doc/tutorials/imgproc/imgtrans/laplace operator/laplace operator.html
Figure 2.20: Getting the first derivative
The graph above on the left shows the plot of intensities of the pixels, f (t). Let us take the first derivative
of the graph. The graph in figure 2.20 shows the first derivative on the right, f 0 (t), i.e. the change in pixel
intensity. When the change in pixel intensity is at its highest (peak in the graph), we assume that this is an
edge. So how do we find peaks in the graph? We know at the peak of a curve, the gradient is 0. So if we
take the second derivative, f 00 (t) and look for values of t where f 00 (t) = 0, we can identify the peaks of f 0 (t).
More strictly, since we are working on a 2 dimensional image, the Laplacian operator in a 2 dimensional space is defined as:

Laplace(f) = ∂²f/∂x² + ∂²f/∂y²   (2.18)
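As a concrete sketch, the OpenCV tutorial these images are adapted from applies a Gaussian blur before the Laplacian so that noise is not mistaken for edges; in the Java bindings used later in this project, the pipeline looks roughly like the following (parameter values are illustrative):

import org.opencv.core.Core;
import org.opencv.core.CvType;
import org.opencv.core.Mat;
import org.opencv.core.Size;
import org.opencv.imgproc.Imgproc;

public class LaplacianEdges {
    // Returns an image whose bright pixels mark strong second-derivative
    // responses, i.e. candidate edges.
    static Mat laplacianEdges(Mat bgrFrame) {
        Mat blurred = new Mat(), gray = new Mat(), laplace = new Mat(), edges = new Mat();
        // Smooth first so sensor noise does not produce spurious edges.
        Imgproc.GaussianBlur(bgrFrame, blurred, new Size(3, 3), 0);
        Imgproc.cvtColor(blurred, gray, Imgproc.COLOR_BGR2GRAY);
        // Second derivative in both x and y; a signed depth avoids clipping
        // negative responses before we take the absolute value.
        Imgproc.Laplacian(gray, laplace, CvType.CV_16S, 3, 1, 0);
        Core.convertScaleAbs(laplace, edges);
        return edges;
    }
}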
2.4.4.1.1 Estimation of illumination This is the estimate of the colour cast by the incoming light. In its simplest form, the Gray World Assumption computes the average of each colour channel of the image. Let us assume we have an N × M pixel image, and let us further assume that pixels are represented using the RGB colour scheme. The average of any given colour channel c can be computed by:

avg_c = ( Σ_{m=0}^{M-1} Σ_{n=0}^{N-1} pixel_c(m, n) ) / (M × N)   (2.19)

where pixel_c(m, n) is the colour channel c value of pixel (m, n).
2.4.4.1.2 Using the illumination estimate Now that we have an average for each of the colour channels, avg_r, avg_g, avg_b, we must work out how much to scale each pixel to bring it back to a more neutral colour. Again, in its simplest form, the Gray World Assumption uses the average of all 3 channels to calculate the coefficient of adjustment for each channel. Let the coefficient of adjustment for channel c be S_c:

avg = (avg_r + avg_g + avg_b) / 3   (2.20)
S_c = avg / avg_c   (2.21)

We can now adjust each channel of each pixel. Let pixel_orig be the original unadjusted pixel and pixel_balanced be the colour balanced pixel; then for each channel c:

pixel_balanced_c(m, n) = S_c × pixel_orig_c(m, n)
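A minimal sketch of this balancing step on a plain array of RGB values (the clamp to the 0-255 range is a practical detail assumed here rather than taken from the text above):

// pixels[i][0..2] hold the R, G and B values (0-255) of pixel i.
static void grayWorldBalance(int[][] pixels) {
    double[] avg = new double[3];
    for (int[] p : pixels)
        for (int c = 0; c < 3; c++) avg[c] += p[c];
    for (int c = 0; c < 3; c++) avg[c] /= pixels.length;       // equation 2.19

    double grey = (avg[0] + avg[1] + avg[2]) / 3.0;            // equation 2.20
    for (int[] p : pixels)
        for (int c = 0; c < 3; c++) {
            double balanced = p[c] * (grey / avg[c]);          // S_c * pixel_c
            p[c] = (int) Math.min(255, Math.round(balanced));
        }
}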
2.4.4.1.3 Variations The standard Gray World Assumption algorithm works well in most cases. There
are variants that can improve the algorithm in some use cases.
S_c = max / avg_c   (2.25)
Normalised Minkowski P-norms Although in its simplest form we use the average of each channel as our illumination estimate, another method is to use p-norms to calculate avg_c. The p-norm version of avg_c is defined as:

avg_c = ( ( Σ_{m=0}^{M-1} Σ_{n=0}^{N-1} pixel_c(m, n)^p ) / (M × N) )^(1/p)   (2.26)

Notice how the 1-norm just gives the average formula in equation 2.19.
2.5 Existing Robots
There are currently a few Rubik's Cube solving robots out there. Each of them has various advantages and disadvantages to its design.
2.5.1 MindCuber
2.5.1.1 Design
MindCuber[2] is a single armed solver by David Gilday. The single arm is responsible for holding the cube
in place whilst the lower platform rotates the D face of the cube. The single arm is also responsible for
performing cube rotations. This robot can be built using an EV3 Lego Mindstorms set which gives it the
advantage of being cheap to build. This can be seen in figure 2.226 . The limitations lie with its design. Since
only a single side can be turned at a time, the cube needs to be rotated every time we wish to change the
face we want to turn. This is extremely time consuming. The MindCuber uses a single RGB sensor to read
each square individually.
2.5.1.1.1 Algorithm MindCuber is powered solely by the EV3 Intelligent Brick. With only 64 MB of RAM and an ARM9 processor, the method used for solving the cube is an undisclosed block-building
method which is far from optimal. Optimal algorithms require significantly more processing power and RAM
in order to find a solution within a reasonable time.
Figure 2.23: CubeSolver
2.5.2.1.1 Algorithm Since the CubeSolver has a PC at its disposal, Kociemba's algorithm was the
algorithm of choice. This is because it will find a solution within a relatively short time and since the
execution of moves is slow, the solution needs to be short.
2.5.3 Cubestormer
CubeStormer was developed by David Gilday and Mike Dobson[24]. CubeStormer uses 4 arms but not much
else is known about its design since there is no official documentation. CubeStormer III took 18 months of
development to improve on their previous CubeStormer II design. The robot is powered by an ARM-based smartphone and, judging by the solutions it generates, uses some variation of Kociemba's algorithm.
2.6 PID Controller
The PID (Proportional-Integral-Derivative)[7] controller is used in a closed feedback loop mechanism.
We can see in the diagram above the difference between an open and a closed loop. The open loop gets a single input and then blindly attempts to reach a goal given this piece of information. On the other hand, a closed loop has a few stages:
K_p e(t) - This proportional term determines how much we scale the error term up or down by adjusting the gain K_p. A higher gain makes the system more sensitive and responsive to change. If the gain is too high, the system can become unstable, oscillate and never reach the goal. If the gain is too low, the system may be unresponsive to small errors.
K_i ∫_0^t e(τ) dτ - This integral term accumulates the offset up until time t. In effect, if we were to plot our error against time, it is summing the area under the curve up to time t. Imagine a case where we use a motor on a sticky floor. As we approach the goal distance, we want to reduce the amount of power we give to the motor to make sure we don't overshoot the goal. If the power becomes too low, we end up standing still and not moving towards the goal; the sum of errors then becomes increasingly large, which gives more power to the motor. We adjust this term by scaling K_i. A high value will accelerate the error towards 0; too high and it may overshoot.
K_d de(t)/dt - This derivative term calculates the change in error over time. It is scaled using K_d. This term helps to improve stability and can reduce the settling time.
We tune each of the K constants for different use cases.
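Putting the three terms together, a discrete-time PID update might be sketched as follows (the class shape, gains and time step are purely illustrative):

// Discrete PID controller: call update() once per control step with the
// current error (goal minus measured value) and the elapsed time.
class PidController {
    private final double kp, ki, kd;
    private double integral = 0, previousError = 0;

    PidController(double kp, double ki, double kd) {
        this.kp = kp; this.ki = ki; this.kd = kd;
    }

    double update(double error, double dt) {
        integral += error * dt;                            // accumulated offset
        double derivative = (error - previousError) / dt;  // change in error
        previousError = error;
        return kp * error + ki * integral + kd * derivative;
    }
}

The returned value would drive, for example, the motor power; tuning kp, ki and kd trades responsiveness against stability as described above.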
Chapter 3
Design
3.1.1.1.1 Which one is better for searching? Although it may initially seem that 1's notation should be favoured since it explodes less quickly than 2's notation (a branching factor of 12 vs 18), if we compare them when searching for long solutions we see that 2's notation produces fewer nodes: 1's notation has 12^26 ≈ 1.14 × 10^28 nodes for a 26 length solution, whereas 2's notation has only 18^20 ≈ 1.27 × 10^25 nodes for a 20 length solution - 1000 times fewer! 1's notation may find short solutions more quickly since it explodes less quickly, but in reality most solutions will require at least 10 moves in 2's notation, which definitely gives 2's notation the edge.
3.1.2 Vision Design
The aim of the vision system is to be able to detect the position of the cube within the camera frame regardless of where the cube is placed and independent of background, as well as to have robust colour detection.
This will be coded in the form of an Android application in Java to make use of an Android Smartphone
camera.
In order to accomplish this, we will use a well known Open Source Computer Vision (OpenCV) library.
OpenCV provides a myriad of tried and tested Computer Vision methods such as Canny Edge Detection,
Laplacian Operators, Gaussian Blur, etc. The OpenCV library is available for Android which is perfect as
we are using an Android Smartphone. This will allow us to quickly build an app that can scan all the sides of
the cube without having to worry about the deep technical details of individual Computer Vision methods.
We can use Android's built-in video API and OpenCV's video frame processing libraries in order to achieve
most of what we want.
Stability - Having more arms allows us to grip the cube more easily, allowing for a more stable design.
Speed - Having more arms means we don't have to rotate the whole cube as many times during a solve. Fewer moves mean faster solve times.
Simplicity - Fewer cube rotations also reduce the complexity of the solve.
We chose to have 4 arms that grasp the R, L, B and F faces, as this means our robot can lie horizontally flat on the ground, which is a stable structure.
3.1.3.2 NXT Intelligent Brick vs BrickPi
A major choice we had to make was whether we wanted to use the classic NXT Intelligent Brick or BrickPi2 .
The BrickPi offers many advantages over the NXT. The BrickPi runs on top of a Raspberry Pi and along
with this gives us a Linux programming environment, WiFi capabilities and capacity to control more motors.
Naturally, BrickPi was our first choice.
There was a major drawback, however. We found the motor controllers on the BrickPi board to be very temperamental: the torque needed to turn a cube face would often burn out the motor controller. Numerous attempts to change the design and power parameters were unsuccessful and would slow down the arm movements.
Another major problem we found was that when we tried to power 4 motors simultaneously, the voltage supplied by a 12V DC battery was not sufficient to power all 4 motors and the Raspberry Pi. The Raspberry Pi would often reboot when the voltage dropped below its operating threshold. Halfway through the project we decided to switch to the NXT instead. The NXT motor controllers are far more robust than their BrickPi counterparts, and we did not suffer from any power issues. The drawback is that we are only able to control at most 3 motors with each Brick and we are forced to use Bluetooth instead of WiFi. The NXT Bluetooth interface can only have 1 inbound connection and 3 outbound connections, which is sufficient for a master-slave configuration but does not give us any room to change to a more complex configuration.
3.1.3.3 Lejos
Lejos3 is an NXT firmware that gives us a Java programming environment. This makes the programming
language consistent with the rest of the system. It also offers a well documented Robotics API which is
great for those who have never tried Lego programming before. The Lejos libraries also give us a lot of other
useful functionality such as an automatic closed feedback PID controller to keep motors in their positions
after moving. This is particularly useful for holding the Rubik's Cube in place. The PID controller also gives us far more accurate movement than the BrickPi's basic API. We can see why this is so in our PID controller
explanation in section 2.6.
The BrickPi API offers a very simple motor controller. In essence, it works as follows: we move our motor for 100ms at a constant speed, and then check if we've reached our goal distance. If not, we repeat the motor movement for another 100ms. This continues in a loop until we hit the goal distance. Since there is no dampening towards the goal, the BrickPi motors are very likely to overshoot the goal distance. That is a disaster for this system, because we need very accurate movements. An error of millimetres off a perfect 90 degree turn could cause chaos!
1. Take pictures of each side of the cube using the smartphone's camera
2. Build the cube state
3. Send the cube state from the Android Smartphone to the PC via WiFi
Figure 3.2: Overview of system through each stage
Chapter 4
Implementation
4.1.1.1 Corners
Since there are 8 corners, we can represent the position of any of the 8 corners with an integer from 0-7, which can easily be represented by 3 bits per corner, giving a total of 24 bits. However, we still need to handle the orientation of the corner. Since a corner can only have three orientation states, we can represent this with an integer between 0-2, which can be represented in 2 bits. This means any corner can be represented in 5 bits. Since the smallest amount of declarable memory in Java is a single byte, we represent each corner using a single byte. This gives us an 8 byte representation of the state of all corners. In our implementation we store an array of 8 bytes. Below is a table of how we've chosen to label our corners:
1 Rule A3a1: https://www.worldcubeassociation.org/regulations/#article-10-solved-state
Figure 4.1: Corner labelling
The byte in each element of the array stores the orientation of the corner in the 2 least significant bits, and the remaining 3 bits give the position where that corner lies on the cube.
4.1.1.2 Edge
Edges are expressed analogously to corners, but since there are 12 edges, we need 4 bits to represent an edge's position. Edges can only have 2 orientations, so only a single bit is required. Just like corners, we can represent any edge with just 5 bits. Again, since Java's smallest unit of declarable memory is a single byte, we require 12 bytes to represent all edges. In our implementation we store an array of 12 bytes. Below is a table of how we've chosen to label our edges:
The byte in each element of the array stores the position where that edge lies in the 4 least significant bits, and its orientation in the final bit.
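A sketch of the bit packing these layouts imply (the helper names are illustrative):

// Corner byte: bits 0-1 orientation (0-2), bits 2-4 position (0-7).
static byte packCorner(int position, int orientation) {
    return (byte) ((position << 2) | orientation);
}
static int cornerPosition(byte corner)    { return (corner >> 2) & 0x7; }
static int cornerOrientation(byte corner) { return corner & 0x3; }

// Edge byte: bits 0-3 position (0-11), bit 4 orientation (0-1).
static byte packEdge(int position, int orientation) {
    return (byte) ((orientation << 4) | position);
}
static int edgePosition(byte edge)    { return edge & 0xF; }
static int edgeOrientation(byte edge) { return (edge >> 4) & 0x1; }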
possible. In order to move as quickly as possible, we precompute move tables that tell us how a corner's or edge's position and orientation will change given a move. No computation is needed during the search, since we have already predetermined where each piece will go. This allows us to apply a move with just 20 reads and 20 writes:
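The project's listing itself is given as a figure. Purely as an illustration of the idea, an unrolled move application could look like the following, assuming each entry of the corner and edge arrays holds the packed byte of one cubie and assuming lookup tables indexed by move and by that packed byte (an assumed layout, not necessarily the one used here):

// Illustrative only: one table read and one array write per cubie,
// i.e. 8 + 12 = 20 reads and 20 writes per move.
static void applyMove(int move, byte[] corners, byte[] edges,
                      byte[][] cornerMove, byte[][] edgeMove) {
    corners[0] = cornerMove[move][corners[0]];
    corners[1] = cornerMove[move][corners[1]];
    corners[2] = cornerMove[move][corners[2]];
    corners[3] = cornerMove[move][corners[3]];
    corners[4] = cornerMove[move][corners[4]];
    corners[5] = cornerMove[move][corners[5]];
    corners[6] = cornerMove[move][corners[6]];
    corners[7] = cornerMove[move][corners[7]];
    edges[0]  = edgeMove[move][edges[0]];
    edges[1]  = edgeMove[move][edges[1]];
    edges[2]  = edgeMove[move][edges[2]];
    edges[3]  = edgeMove[move][edges[3]];
    edges[4]  = edgeMove[move][edges[4]];
    edges[5]  = edgeMove[move][edges[5]];
    edges[6]  = edgeMove[move][edges[6]];
    edges[7]  = edgeMove[move][edges[7]];
    edges[8]  = edgeMove[move][edges[8]];
    edges[9]  = edgeMove[move][edges[9]];
    edges[10] = edgeMove[move][edges[10]];
    edges[11] = edgeMove[move][edges[11]];
}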
You may notice that this method could easily have been written as two for loops. The reason we chose not to use loops is that a loop would translate to jumps in the code. This is not necessary for something so simple, and the sequential code will be faster to execute. In this case we sacrificed code size for a little bit of extra speed. The same technique was used in the God's Number experiment[18].
Least number of moves to solve any corner state. There are 8! × 3^7 = 88,179,840 possible corner states.
Least number of moves to solve 6 of the 12 edges. There are 12P6 × 2^6 = 42,577,920 possible 6 edge states.
Least number of moves to solve the remaining 6 edges. There are 12P6 × 2^6 = 42,577,920 possible remaining 6 edge states.
well as this, it uses 4 × Capacity for each entry array (for when collisions occur). The default load factor is 0.75. We can use this to estimate the size of a HashMap implementation in memory[22]:
We can estimate the capacity using 0.75 × Size. For our corners table, this is: 4 × 0.75 × 88,179,840 + 32 × 88,179,840 ≈ 2.9GB! Likewise, both edge tables would take 2 × (4 × 0.75 × 42,577,920 + 32 × 42,577,920) ≈ 2.8GB. That's almost 6GB of memory on heuristic pattern databases alone! We can do better.
4.1.2.2.1 Corners Ignoring orientations, corners can have 8! permutations. Since we've labelled our corners from 0-7 (ignoring orientations), we can use the Factorial Numbering Scheme in section 2.2.4.1 to number each permutation of our corners.
We can then number the corner orientations[25]. There are 3^8 possible orientations, but the orientations of 7 corners automatically give the orientation of the last corner. Using the corner lemma mentioned in section 2.2.1.3, we can prove that there are actually only 3^7 reachable orientations. Since each orientation is a number between 0 and 2, we can concatenate 7 of the 8 corner orientations, which gives a number in base 3. Since each orientation string in base 3 is unique, converting it to base 10 gives a unique numbering for each orientation.
Now we can combine the permutation number and orientation number using a cartesian product counting scheme:
minimal_hash(cornerState) = perm_number × 3^7 + orientation_number   (4.2)
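In code, this might be sketched as follows, reusing the byte layout from section 4.1.1 and the factorialIndex helper sketched in section 2.2.4.1 (names are illustrative):

// Perfect minimal hash for the corner state, following equation 4.2.
static int encodeCorners(byte[] corners) {
    // Permutation part: the 8 corner positions form a permutation of 0-7.
    int[] positions = new int[8];
    for (int i = 0; i < 8; i++) positions[i] = (corners[i] >> 2) & 0x7;
    int permNumber = (int) factorialIndex(positions);

    // Orientation part: 7 of the 8 orientations read as a base-3 number
    // (the eighth orientation is determined by the other seven).
    int orientationNumber = 0;
    for (int i = 0; i < 7; i++)
        orientationNumber = orientationNumber * 3 + (corners[i] & 0x3);

    return permNumber * 2187 + orientationNumber;   // 2187 = 3^7
}

The result lies in the range 0 to 88,179,839, exactly the index range of the corner pattern database.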
4.1.2.2.2 Edges Ignoring orientations, 6 of the 12 edges have 12P6 = 665,280 permutations. We can use the nPr numbering scheme described in section 2.2.4.2 to give each permutation a unique number.
There are 2^6 possible orientations for 6 of the 12 edges. In this case, since every orientation is a number between 0 and 1, we can concatenate the 6 edge orientations to give a binary value. We can convert this to base 10 to give a unique mapping.
We can now combine the permutation and orientation numbers using a cartesian product counting scheme analogous to the corners:
minimal_hash(edgeState) = perm_number × 2^6 + orientation_number
4.1.2.2.3 Faster Heuristic Lookup Now we have a way of encoding corner and edge states into sequential integers. We can now store a byte array where the index is the encoded state: indexing by index i will give us the minimum number of moves required to solve state i. A similar technique was used in finding God's Number[18].
4.1.2.2.4 NibbleArray We can still do better! After the first round of pattern database generation, we found that the maximum minimal number of turns required to solve any corner state is 11 and the maximum minimal number of moves to solve any 6 of 12 edge state is 10. This means we need just 4 bits to represent each state's move count. Since Java's minimum size for memory declaration is a single byte, we decided to write a custom NibbleArray class that allows us to store 4 bit values. Even for our largest table of 88,179,840 corner states we only use 4 × 88,179,840 bits ≈ 42MB. For both edge arrays we only use 4 × 2 × 42,577,920 bits ≈ 40MB, giving us a total memory consumption of 82MB. That's a massive 75x smaller than the HashMap implementation!
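A minimal sketch of such a NibbleArray, packing two 4 bit entries into each byte (the project's actual class may differ in detail):

class NibbleArray {
    private final byte[] data;

    NibbleArray(int size) {
        data = new byte[(size + 1) / 2];   // two 4-bit entries per byte
    }

    // Read the 4-bit value stored at index i.
    int get(int i) {
        int b = data[i >> 1] & 0xFF;
        return (i & 1) == 0 ? (b & 0x0F) : (b >>> 4);
    }

    // Store a 4-bit value (0-15) at index i.
    void set(int i, int value) {
        int b = data[i >> 1] & 0xFF;
        if ((i & 1) == 0) b = (b & 0xF0) | (value & 0x0F);
        else              b = (b & 0x0F) | ((value & 0x0F) << 4);
        data[i >> 1] = (byte) b;
    }
}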
4.1.2.2.5 Using the Heuristics So how can we generate these tables? In our implementation, we've used a work queue approach. Let's use the corner heuristic generation as an example. Figure 4.4 shows some pseudo Java-like code for corner generation:
while (!workQueue.isEmpty()) {
    cornerState = workQueue.pop();
    moveCount = cornerStates[encodeCorners(cornerState)];

    for (int move = 0; move < NUMMOVES; move++) {
        // Move corners
        newCornerState = moveCorners(move, cornerState);

        // The new corner encoding
        cornerEncoding = encodeCorners(newCornerState);

        // Update when the state is unseen (0) or reachable more cheaply;
        // the solved state (encoding 0) is never overwritten.
        if ((cornerStates[cornerEncoding] == 0
                || cornerStates[cornerEncoding] > (moveCount + moveCost[move]))
                && cornerEncoding != 0) {

            workQueue.add(newCornerState);
            cornerStates[cornerEncoding] = moveCount + moveCost[move];
        }
    }
}
The work queue starts with the solved cube state. On each iteration, we take a corner state from the
work queue and find its corner encoding described by the perfect minimal hash function in section 4.1.2.2.
We look up the current number of moves we have calculated for this corner state. Then, for each of the
18 moves from this state, we calculate the 18 different corner states that it can generate. For each of these
new states we find the new corner encoding and check to see if the moveCount we calculate is less than our
current best guess. If it is, then we add it to the work queue so that we can further expand this
state, as it may change other best guesses we have for the shortest number of moves to solve a state. When
the work queue is empty, it means we've found no states that can be reached in a shorter number of moves,
which gives us the shortest number of moves for any corner state.
4.1.2.2.6 The Search Now that we have the pattern databases generated, the searching is fairly
simple. Our search follows the typical IDA* algorithm described in section 2.3.3 of the background. Firstly,
we load our pattern databases into memory. This can take a few seconds, but we keep the pattern databases
in memory for all subsequent solves so we only have to do this once. We take the cube state and maximum
depth to search as arguments. The maximum depth is 20 by default since this is God's Number: public
static String idaStarKorfs(int maxDepth, Cube cube). We take the cube state and use our
perfect hashing scheme to number it. We can now look up this number as an index into our pattern database
to obtain the estimated moves to being solved. This is our first bound, because we know that the number of
moves required to solve the cube will be at least as much as this estimate.
Some implementations of Korf's algorithm increment the bound by one when we fail to find a solution for a
bound, but we can do better than this. Since the heuristic for Korf's algorithm is admissible, we know that
the real number of moves needed to reach the goal cannot be lower than the number of moves given by our
heuristic. Assuming we've searched every node at a specific bound, we know that the solution length will be
at least as long as the minimum estimate to the goal over every node we pruned. We can recursively track this
minimum bound in our tree whilst we traverse it, so we will not waste time attempting to sort through lots of
heuristic values.
Figure 4.5: Main IDA* loop
We then proceed to try all 18 moves on our current state: R, R2, R3, U, U2, U3, etc. This gives 18
new states, and for each of them we recurse. We first check if any of these states is solved. If
one is solved, we return FOUND. Otherwise, for each of these states, we recursively look up the estimated number
of moves to the goal and check whether the current number of moves we've done, added to the estimated number
of moves to the goal, exceeds our current bound. If it does, we return the bound immediately for this branch,
which stops any further searching of this branch. Otherwise, we continue to recurse. Notice how at the
end of each search, we must undo our last move using cube.move(Cube.INV_MOVES[move]). This is
because we don't want to spawn a new child state in memory every time we move, so we reuse the original
cube object and just undo our previous moves to get back to our original state.
Figure 4.6: Search function
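A condensed sketch of this recursive search is given below; helper names such as heuristic() (a pattern database lookup, taking the maximum over the corner and edge tables) and the use of a Deque for the partial solution are assumptions, while the Cube API follows the description above:

// Simplified sketch of the recursive IDA* search step. FOUND is a sentinel value.
static final int FOUND = -1;

static int search(Cube cube, int movesSoFar, int bound, java.util.Deque<Integer> solution) {
    if (cube.isSolved()) return FOUND;
    int estimate = movesSoFar + heuristic(cube);   // moves done + estimated moves to goal
    if (estimate > bound) return estimate;         // prune and report this branch's minimum
    int nextBound = Integer.MAX_VALUE;
    for (int move = 0; move < 18; move++) {
        cube.move(move);                           // reuse the same cube object
        solution.addLast(move);
        int result = search(cube, movesSoFar + 1, bound, solution);
        if (result == FOUND) return FOUND;
        nextBound = Math.min(nextBound, result);   // smallest estimate that exceeded the bound
        solution.removeLast();
        cube.move(Cube.INV_MOVES[move]);           // undo the move before trying the next one
    }
    return nextBound;
}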
4.1.3 Improvements
When implementing Korf's algorithm, there are only so many design decisions we can make around the base
algorithm before we can get no faster. Eventually, we need to make tweaks to the algorithm itself. The main
drawback of Korf's algorithm is the time it takes to find an optimal solution, which is due to its average branching
factor. Although a reduction from 18 to 13 is significant, a branching factor of 13 is still relatively large.
In this section, we detail our attempts at speeding up Korf's algorithm.
4.1.3.1 Randomisation
Korf's algorithm uses a systematic depth-first search until the branch it is searching reaches the specified
bound, but why should we search systematically when the cube isn't scrambled systematically? Assume R is
always the first branch, and further assume we have a cube state which can be solved in 3 moves. What are the
chances that 2 of those 3 moves are R moves? Additionally, by searching systematically, solutions that
start with moves involving the last branch will always take significantly longer to find than ones starting with
earlier branches.
Instead of searching systematically, we choose to explore branches randomly to eliminate this bias and
give each branch an equal chance of being explored early. Additionally, this speeds up searches on average,
since we reduce the chance of searching unlikely branches that contain cube states reached by performing
many of the same moves.
We can extend this idea further to opposite sides. Imagine we perform the moves R*L*R3. Performing the
moves R*R3*L would give exactly the same cube state, which can be further simplified to just L. This is because
opposite faces affect two disjoint sets of edges and corners, so they can be performed in either order. Searching
these sequences is clearly redundant.
To implement this, we iterate backwards over our solution stack. We look at the latest move and
rule out the relevant moves. We then look at the second-to-last move and check whether it involves
the face opposite to the last move. If it does, we rule out the relevant moves for the opposite face. We only need
to look back 2 moves since the search is recursive, and we can prove that any length of redundant sequence will
be eliminated. Using a simple example: R*L*R3 would never be a branch that we search because R3 would
be ruled out once we see that R*L has already been performed.
Similarly, any longer redundant sequence is caught. E.g. R*L*R*L*L*R would never be a branch
we explore, because R*L*R*L*L would never have been generated in the first place, because R*L*R*L
would never have been generated, because R*L*R would never have been generated, because R would have
been ruled out when we see that R*L has already been performed. A similar argument can be made for
any length of redundant sequence. Notice how this redundant sequence is equivalent to R3*L3, which will
eventually be searched, so we are not missing any branches that would potentially lead to a solution.
4.1.3.3.1 Cache Often, performing the same sequence of moves over and over will give you the same
cube state you started with, for example (R2 U2)^6. It is also possible that 2 different sequences of
moves lead to the same state. Obviously we cannot store every state we've ever seen; the search space
is far too large. This is why we have chosen to implement a fixed-size cache that stores recently seen cube
states. When the cache is full, we remove half of the oldest entries from the cache, so that computation time is
not dominated by removing and replacing cache entries each time we use the
cache. The cache stores two things: a cube state and the number of moves used to get to that cube state. We
add a cube state to the cache if it does not already exist there or if the cube state is reachable via a smaller
number of moves. When we encounter a cube state that is already in the cache and takes more moves to get
to, we choose not to expand this cube state any further, because we know we've already explored
a shorter sequence of moves to get to that state. Although this will not completely eliminate duplicates,
it should noticeably reduce the number of duplicate states we expand. We've found that a cache size of around
1,000,000 nodes works well with 6GB of RAM. We implement this using a Java HashMap to minimise lookup
times for each state.
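A minimal sketch of such a fixed-size cache is shown below; encoding a cube state into a long key is assumed to be available, and the "remove half of the oldest entries" policy is implemented here with an insertion-order queue kept alongside the HashMap:

import java.util.ArrayDeque;
import java.util.HashMap;

// Sketch of the fixed-size cache of recently seen states.
class StateCache {
    private final int maxSize;
    private final HashMap<Long, Integer> movesTo = new HashMap<>();
    private final ArrayDeque<Long> insertionOrder = new ArrayDeque<>();

    StateCache(int maxSize) { this.maxSize = maxSize; }

    // Returns true if the state should be expanded further, updating the cache as it goes.
    boolean shouldExpand(long stateKey, int movesUsed) {
        Integer best = movesTo.get(stateKey);
        if (best != null && best <= movesUsed) {
            return false;                          // already reached via an equal or shorter path
        }
        if (best == null) {
            if (movesTo.size() >= maxSize) {
                evictOldestHalf();                 // amortise the cost of keeping the cache bounded
            }
            insertionOrder.addLast(stateKey);
        }
        movesTo.put(stateKey, movesUsed);
        return true;
    }

    private void evictOldestHalf() {
        int toRemove = movesTo.size() / 2;
        for (int i = 0; i < toRemove && !insertionOrder.isEmpty(); i++) {
            movesTo.remove(insertionOrder.pollFirst());
        }
    }
}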
4.1.3.3.2 Fringe Search The major drawback of the IDA* algorithm's low memory consumption is
that the search is memoryless, i.e. if it finds no solution for a specific bound, it needs to repeat all of its previous
work in order to search the next bound. This is because we pruned branches based on the estimated
moves to the goal using the previous bound, but the same branches may pass the heuristic test for the new bound,
which means they need to be explored. The fringe search improves on this by storing 2 fringe lists[26]:
Current - Stores cube states that we currently need to expand for the current bound.
Future - Stores cube states that we need to expand on the next bound.
In a fringe search, we start the same as IDA*. The only difference is that when we hit a cube state whose
estimate to the goal exceeds the current bound, in addition to pruning the branch, we add that cube state
to the Future list. Once we finish exploring the bound, if we still have not found a solution, we move to the
next bound. But instead of starting our search again, we move the Future list that we collected last round
into our Current list. We iteratively expand nodes in the Current list, which simultaneously populates the
Future list for the next bound. In essence, we are storing the fringe of our search so we can pick up where we
left off in the previous bound without having to repeat all of our previous work.
In our implementation, we only keep track of move sequences in our Current and Future lists. This is
because, as long as we have a sequence of moves and the initial scrambled cube state, we can recover any state
we were previously in by simply performing the sequence of moves we have in our potential solution. Fringe
Search has been shown to be faster than A* and IDA* for pathfinding on game maps [26]. We wanted to
experiment with it on an application with a much larger branching factor, where the size of our lists will
explode much faster. We perform a few tests in the Evaluation chapter to see how far we can take Fringe
Searching.
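A rough sketch of the fringe-search driver under these assumptions is given below; applyMoves and heuristic are placeholder helpers, and a fringe entry is just the move sequence from the scrambled start state:

import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Sketch of the fringe-search driver described above.
static int[] fringeSearch(Cube start, int maxDepth) {
    Deque<int[]> current = new ArrayDeque<>();
    current.add(new int[0]);                       // the empty sequence represents the start state
    int bound = heuristic(start);

    while (bound <= maxDepth) {
        Deque<int[]> future = new ArrayDeque<>();
        while (!current.isEmpty()) {
            int[] moves = current.poll();
            Cube cube = applyMoves(start, moves);  // recover the state by replaying the moves
            if (cube.isSolved()) return moves;
            for (int move = 0; move < 18; move++) {
                int[] next = append(moves, move);
                Cube child = applyMoves(start, next);
                if (next.length + heuristic(child) > bound) {
                    future.add(next);              // over the bound: keep it for the next round
                } else {
                    current.add(next);             // within the bound: expand it this round
                }
            }
        }
        current = future;                          // carry on where we left off
        bound++;
    }
    return null;
}

static int[] append(int[] moves, int move) {
    int[] next = Arrays.copyOf(moves, moves.length + 1);
    next[moves.length] = move;
    return next;
}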
4.1.3.4 Parallelism
4.1.3.4.1 IDA* We can improve the speed of the IDA* search by parallelising the search. Assuming we
have n threads, we can split the tree into n subtrees and search each subtree in parallel. We find that using
one thread per core yields the best results. Figure 4.7 shows how we can split the tree.
Synchronisation We synchronise all n searches so that every search uses the same bound. Even
if one core finishes early, we must wait for the other cores to finish their searches first. This is because, even if
we found a solution at some higher bound, we would need to wait for the other cores to finish their searches
for the lower bounds in order to prove that the solution found is optimal. However, if we do find a solution
at some bound and all cores are searching at that bound, we can terminate the search early!
To do this, we use an array containing n flags, one per thread. A main thread is
responsible for keeping track of whether or not a solution has been found. It sets off each thread to
search its own respective subtree and then sleeps. If a thread finds a solution, it sets its flag to FOUND
before waking up the main thread. The main thread then wakes up, sees that a solution has been
found and terminates the other n-1 threads. On the other hand, if none of the n threads finds a solution, they
all set their flags to MOVE ON and wake up the main thread. The main thread only moves on to the next bound
once ALL of the n threads have finished their search.
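A simplified analogue of this coordination, using java.util.concurrent primitives in place of raw flags and wait/notify, might look like this; searchSubtree is an assumed helper that periodically checks the shared solution reference so sibling threads can stop early:

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Sketch of synchronising n subtree searches per bound.
static String parallelIdaStar(int nThreads, int maxDepth) throws InterruptedException {
    for (int bound = 0; bound <= maxDepth; bound++) {
        AtomicReference<String> solution = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(nThreads);    // the main thread sleeps on this
        for (int i = 0; i < nThreads; i++) {
            final int subtree = i;
            final int b = bound;
            new Thread(() -> {
                String found = searchSubtree(subtree, b, solution);
                if (found != null) solution.compareAndSet(null, found); // FOUND
                done.countDown();                                       // FOUND or MOVE ON either way
            }).start();
        }
        done.await();                              // wait for every thread to finish this bound
        if (solution.get() != null) {
            return solution.get();                 // everyone searched the bound, so this is optimal
        }
    }
    return null;
}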
4.1.3.4.2 Fringe We also experimented with a multithreaded fringe search. In our multithreaded fringe
search, we expanded a node in the fringe just like how we parallelised our IDA* algorithm. That is, we split
the search tree n ways using the fringe node as our root. This means we have to synchronise access to our
Future fringe list, since there will be lots of concurrent writes to this list from the n threads.
We make use of HPPC's ByteArrayDeque collection to keep track of our solutions. This collection becomes
invaluable when we implement fringe search in section 4.1.3.3.2, which must keep track of many solutions in
a list. The Byte wrapper class uses at least 8 bytes of memory per element, so the classic Java alternative would
have cost a lot of memory.
4.2.1.1 Phase 1
Remember that the first step moves us from G0 = ⟨R, U, F, L, D, B⟩ to G1 = ⟨U, D, R2, L2, F2, B2⟩. This
step makes all corners and edges good and puts the edges that belong on the middle slice between faces U
and D (the UD-slice for short) within that middle slice. We can use the numbering system described for Korf's
algorithm (but slightly differently) in order to number these states, just like we did in Korf's to save space.
We can represent any corner orientation using an integer between 0 and 2186 (see section 4.1.2.2.1). Edge
orientations can be numbered between 0 and 2047: an edge orientation is either 1 or 0, so we can
describe any total edge orientation state using a combination of 11 ones and zeros. We only need 11 bits because,
even though we have 12 edges, the edge flip lemma described in section 2.2.1.2 halves the number of reachable
edge orientations to 2^11. Finally, there are 4 edges that belong on the UD-slice. These edges
can appear anywhere within the 12 edge positions, which gives a total number of UD-slice numberings of
12C4 = 495, so we can number each of these UD-slice states with a number from 0 to 494. We can then
combine this triple into a unique encoding for any state in phase 1 using a cartesian product counting scheme.
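One way to realise this triple encoding is sketched below; the ordering of the three factors is an assumption, and the report's own equation may combine them differently:

// Sketch of a phase-1 encoding of the (corner orientation, edge orientation, UD-slice) triple.
static int encodePhase1(int cornerOrientation,   // 0 - 2186
                        int edgeOrientation,     // 0 - 2047
                        int udSliceNumber) {     // 0 - 494
    return (cornerOrientation * 2048 + edgeOrientation) * 495 + udSliceNumber;
}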
4.2.1.2 Phase 2
The phase 2 step moves us from G1 = ⟨U, D, R2, L2, F2, B2⟩ to G2 = {C}, i.e. solve the cube. In this step
we want to permute the 8 remaining edges not in the UD-slice and all of the corners, and we want to permute
our UD-slice edges into their correct positions. We know that all edges and corners are already orientated to
make them good. Again, we can number our edge permutations using the same method as our implementation
of Korf's algorithm, but this time we only have 8 edges and 8 positions for those edges. Since we can only
use the moves U, D, R2, L2, F2 and B2, the 4 positions in the UD-slice are not reachable by these 8 remaining
edges. This gives us 8! = 40320 states, which can be numbered from 0 to 40319 using our factorial numbering
scheme. Permuting the 8 corners also has 8! = 40320 states, since there are 8 corners and 8 positions that
those corners can reach using the moves in G1.
Moving the UD-slice edges into their correct positions only has 4! = 24 states, since there are 4 edges
on the UD-slice and no edges can leave or enter the UD-slice. We can use our factorial numbering
scheme to number these states between 0 and 23. We can now label any state within phase 2 uniquely:
Phase2Encoding = ((8EdgePermNumber × 40320) + CornerPermNumber) × 24 + UDSlicePermNumber    (4.5)
This makes the Phase 2 pattern database a size of 40320 × 40320 × 24 = 39,016,857,600.
4.2.1.3 Two-Phase
Just like in Korf's algorithm, now that we have a way to uniquely number each important state of the cube,
we can generate our pattern databases using the work queue approach from section 4.1.2.2.5. Armed with
a pattern database for each phase, we can perform a search from G0 → G1 and then from G1 → G2 using IDA*.
1. A better solution is found - In this case, we replace the solution found by Kociemba's algorithm
with the one found by Korf's algorithm.
2. Nothing is found - In this case, we just use Kociemba's algorithm's solution.
3. A solution of the same length is found - In this case, we calculate a rough time-to-solve estimate
for both solutions as described in section 4.3.1, and pick the faster one.
4.3.1 Time to Solve Estimation
We can compare two sequences of moves by estimating the amount of time they would take to execute. Although
two sequences may have the same length, the sequence that requires fewer cube rotations is preferred. Since
we've chosen to use a four-armed robot, there are two faces, U and D, that we cannot turn directly without
a cube rotation. We must rotate the whole cube using either an X rotation or a Z rotation. If we perform an X
rotation, any F or B moves will now require a cube rotation. Likewise, if we perform a Z rotation, any R or L
moves will require a cube rotation. We track the orientation of the cube, use profiling estimates to estimate
how long each move will take to perform, and then sum the estimates.
4.3.1.0.1 Profiling We profile by simply measuring how long each move takes to execute in different
orientations and then rounding to give a rough ratio. A quarter move that requires no cube rotation has a cost
of 1. Everything else is then measured relative to this; we are essentially using the quarter move as a unit
of measurement.
Imagine we perform an X rotation in order to perform a U move. All subsequent U moves can now be
executed quickly. However, how much does a B or F move now cost? It costs around the same as a U
move used to, because we now need to perform an X3 rotation in order to perform B or F moves.
This shows that our costs are not static, as we assumed in Korf's algorithm; they change depending on
the orientation of the cube. That means we need to track an extra piece of information: the cube's orientation.
There are three possible orientations that we need to consider: the standard orientation we have been
working with so far, the orientation we obtain from performing an X rotation, and the orientation we obtain
from performing a Z rotation. On top of this, there is now more than one branch for each move: any move
on the U or D face can be performed after either a Z rotation or an X rotation, which leads to different costs for
subsequent moves. This increases our branching factor from 18 to 24. The same reasoning can be applied
to the other faces for other orientations of the cube.
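A simplified sketch of such an orientation-aware cost estimate is given below; the cost constants and the greedy choice of rotation are illustrative assumptions rather than the measured profiling ratios, and the real search branches over both possible rotations:

// Sketch of an orientation-aware cost estimate. The robot starts holding R, L, F and B;
// after an X rotation it can reach R, L, U and D, and after a Z rotation F, B, U and D.
enum Orientation { STANDARD, X_ROTATED, Z_ROTATED }

static boolean reachable(Orientation o, char face) {
    switch (o) {
        case STANDARD:  return "RLFB".indexOf(face) >= 0;
        case X_ROTATED: return "RLUD".indexOf(face) >= 0;
        default:        return "FBUD".indexOf(face) >= 0;
    }
}

static int estimateCost(String[] solution) {
    Orientation o = Orientation.STANDARD;
    int cost = 0;
    for (String move : solution) {
        char face = move.charAt(0);
        if (!reachable(o, face)) {
            cost += 2;                                           // assumed cost of a cube rotation
            o = (o == Orientation.STANDARD) ? Orientation.X_ROTATED  // greedy; the real search branches
                                            : Orientation.STANDARD;  // rotate back to expose the face
        }
        cost += 1;                                               // a quarter turn is the unit of measurement
    }
    return cost;
}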
4.4.3 Search speed
We ran a few early benchmarks for small scramble lengths of 10, 12 and 14 just to test how long searching
would take using this dynamic costing, so that we could measure its viability for use in this project. The
results were unexpected. Searching for solutions to scrambles of length 10 took 40 times longer than with
the normal Korf's algorithm. In addition, the solutions were no different from Korf's algorithm's, meaning that
at depth 10, all of our test scrambles gave an optimal solution that was also the shortest in robot
moves! At a depth of 12, the algorithm took a few hours to find a solution. Again, the solution was exactly
the same as the normal Korf's algorithm's.
Here is a simple proof. Assume we have some cube state S that requires 20 moves to solve optimally.
We know that if we turn any face, the number of moves needed to solve the resulting state must be at least 19.
Why can it not be below 19? If we arrive at a state that needs fewer than 19 moves to solve, then undoing the
move we just performed gives a solution for S of length at most 19, which contradicts the assumption that
we need 20 moves to solve S optimally. This shows that there can be multiple close-to-optimal
solutions at long solution depths.
Let's assume that performing some move M on S gives a resulting state that requires 20 different moves to solve
optimally. As well as those 20 moves, that state has another solution of length 21: undo M and then apply the
20-move optimal solution of S. If the difference between our optimal and suboptimal solutions is just 1, it is easy
for the suboptimal solution to be executed faster than the optimal one if it contains one fewer cube rotation.
At lower scramble depths, the story is very different. Any move we perform can either increase or
decrease the number of moves required to solve the resulting state, so the probability of there being a
solution that is only 1 or 2 moves away from optimal and has fewer cube rotations than the optimal solution
is lower than at longer solution depths.
4.5 Vision System
There is a wide scope of possible approaches to the vision system. We could have taken a simple approach to
extracting the colours from the cube whilst sacrificing robustness and reliability. However, we chose to put more
time into making the vision system more robust and reliable, since this is the start of every cube solve. If we take
a lot of shortcuts with the vision system, we reduce the reliability of the system as a whole, since a cube solve
cannot start if we cannot reliably determine the state of the cube. The vision system performs three steps:
1. Cube recognition - Recognise where the cube is in the frame and identify where the stickers lie.
2. Colour extraction - Accurately determine the colour of the area that we have identified as being a
sticker.
3. Construct the cube state - Construct the cube state and check that it is a valid one
according to the lemmas from section 2.2.1.
4.5.1.2 Gaussian Blur
By applying a Gaussian Blur first, we can remove noise. We've found a Gaussian Blur with these parameters quite
effective at removing noise for a Rubik's Cube:
Imgproc.GaussianBlur(mat2, mat2, new Size(7,7), 0);
We can see that Gaussian Blur removes a lot of noise caused by light reflections. Since the cube is plastic
and the stickers are glossy, this step is crucial to get clean edges for the next step.
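The next step applies the Laplacian operator to the blurred image to pick out edges; a minimal OpenCV call, with the output depth as an assumption, would be:

// Edge detection on the blurred frame; CV_8U output depth is an assumed choice.
Imgproc.Laplacian(mat2, mat2, CvType.CV_8U);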
We can see that the Laplacian operator gives us a vague outline of the cube and each of the stickers, but it also
gives us a few spurious outlines from noise that was not removed during the Gaussian Blur phase, as well as
outlining other objects in the frame.
4.5.1.4 Dilation
Since the Laplacian Operator gives edges that are rather thin, we use a dilation transformation to make
these lines extremely obvious. That way, we can extract square regions of the cube with confidence:
Mat size20 = Imgproc.getStructuringElement(Imgproc.MORPH_RECT, new Size(20, 20));
Imgproc.dilate(mat, mat, size20);
(a) Laplacian Operator (b) Dilation
We can see that dilation merges together some of the noise around the cube edges, eliminating it. However, the
edges of other objects that are not the cube also dilate and remain outlined in the frame. This is not ideal:
we need a way to recognise only the cube in the image.
Difference between the inner area of the contour and its bounding rectangle's area is below a threshold. We've
found that a difference below 10% works well.
We still have another problem: what if there are other squares in the image? We eliminate other squares
in the image by looking only for squares that are the same size as the Rubik's cube stickers. But how do
we know how big the stickers are? We could assume some threshold, but this would overly restrict the distances
we would be allowed to have the camera at. Instead, we assume that the majority of squares
will probably come from the Rubik's cube itself and look at the median square area. If the number of
squares is dominated by the Rubik's cube, the median area should be that of a Rubik's cube sticker.
We only proceed to the colour extraction stage if we have all 9 squares detected in a grid.
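A small sketch of this median-area filter is shown below; the tolerance parameter is an assumption:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.opencv.core.Rect;

// Sketch: keep only detected squares whose bounding-box area is close to the median area.
static List<Rect> filterByMedianArea(List<Rect> squares, double tolerance) {
    List<Double> areas = new ArrayList<>();
    for (Rect r : squares) areas.add(r.area());
    Collections.sort(areas);
    double median = areas.get(areas.size() / 2);
    List<Rect> stickers = new ArrayList<>();
    for (Rect r : squares) {
        if (Math.abs(r.area() - median) <= tolerance * median) {
            stickers.add(r);                     // close enough to the median to be a sticker
        }
    }
    return stickers;
}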
4.5.1.7.1 Prediction We estimate the distance between any two stickers by using the height or width of
the bounding box. We've found that the bounding box's original width plus 20% of that width is a good estimate.
From here, since the stickers are arranged in a grid, we can recover any other
sticker position by simply adding or subtracting this sticker distance along the x or
y axis, and since we have the maximum and minimum (x, y) coordinates, we know exactly where each sticker
lies relative to the others.
4.5.2 Recognising Colour
Now that we know the positions of each of the stickers on the face, we can begin to extract its colour. The
major difficulty with this stage is the thresholds needed to determine colour.
4.5.2.2.1 HSV Colour Wheel The HSV Colour wheel (Figure 2.16) gives us the degree thresholds for
colour when our colour source is white. We used these thresholds to determine the colour of each sticker
and assume we have white light. You may notice that the colour white itself is missing from the wheel.
Remember from section 2.4.1.1 that colour is represented as 3 components in the HSV model. The colour
wheel only gives thresholds for a hue with max saturation. White is just any colour with a high value and
low saturation.
We've found that anything below 30% of the maximum S value in OpenCV, together with anything above 30% of
the maximum V value, identifies white stickers the majority of the time in a moderately lit room. The thresholds
for other colours will vary depending on the specific stickers on the cube.
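As a sketch, the white test on an HSV pixel could be written as follows; OpenCV stores S and V in the range 0-255, so 30% is roughly 77:

// Sketch of the white test: low saturation and enough value.
static boolean isWhite(double[] hsv) {
    double maxSV = 255.0;                        // OpenCV's 8-bit range for S and V
    return hsv[1] < 0.3 * maxSV && hsv[2] > 0.3 * maxSV;
}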
As we can see in Figure 4.16a, a yellow light source can have a profound effect on the hue and shifts
all colours towards yellow. This is particularly prominent on the white sticker, which now looks more like a
yellow sticker.
4.5.2.3.1 P-Norms We experimented with a few different P-Norms as described in section 2.4.4.1.3 and
found varying results between a standard mean and a high P-Norm. In all cases, we found that scaling to
the maximum average channel value mentioned in section 2.4.4.1.3 yields a better balanced image.
Using a normal mean, we can see that the Grey World Assumption algorithm tends to overcompensate for
the yellow light, leaving a blue cast in the image. However, using a high P-Norm of around
6, the Grey World Assumption algorithm balances the colour well enough that it looks like white
light illumination!
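A sketch of this p-norm Grey World balancing with OpenCV's Core functions, scaling each channel towards the largest channel mean, might look like the following; the parameter choices are assumptions:

import java.util.ArrayList;
import java.util.List;
import org.opencv.core.Core;
import org.opencv.core.CvType;
import org.opencv.core.Mat;
import org.opencv.core.Scalar;

// Sketch of Grey World balancing with a p-norm mean per channel (p ≈ 6 as in the text).
static void greyWorldPNorm(Mat bgr, double p) {
    List<Mat> channels = new ArrayList<>();
    Core.split(bgr, channels);
    double[] means = new double[3];
    for (int c = 0; c < 3; c++) {
        Mat f = new Mat();
        channels.get(c).convertTo(f, CvType.CV_32F);
        Core.pow(f, p, f);
        means[c] = Math.pow(Core.mean(f).val[0], 1.0 / p);  // p-norm mean of this channel
    }
    double target = Math.max(means[0], Math.max(means[1], means[2]));
    for (int c = 0; c < 3; c++) {
        Core.multiply(channels.get(c), new Scalar(target / means[c]), channels.get(c));
    }
    Core.merge(channels, bgr);
}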
4.6 Robot
This section describes the final component of the system: the robot. The robot has 4 arms controlled in a
master-slave configuration.
(a) Claw taken out of the robot (b) Claw attached to the robot
One gear is connected directly to the motor whilst the other lies beneath it. Rotating the main gear on
the motor in one direction will rotate the gear beneath it in the opposite direction which allows us to open
and close the claw. Although the arm configuration looks fairly simple, there are a few important aspects
to the arm that need to be considered:
Clearance - Adjacent perpendicular rotations of the cube must be able to proceed freely without hitting
this arm. Figure 4.19 shows this. We can see the outlined turning circle of the cube; the arm
(excluding the claw) must be able to clear this turning circle.
Symmetric - The clamp must close symmetrically so that the cube lies perfectly in the centre of the
claw. This is so the centre of rotation stays consistently around the same spot. Small variations
in the centre of rotation can lead to major shifts in the cube's position, which would throw off all the other
arms. We achieve this by using two gears of exactly the same size, so they both move at
the same angular velocity.
Rigid - The arm needs to be rigid enough to not warp under strain. We have reinforced the mounting
points of the arms with square frames to prevent the arm from sagging when holding the cube. We
also used 4 gears instead of just 2 to minimise gear slippage under high tension. Gear slippage can
throw off the symmetry described above.
Figure 4.19: Turn circle of the cube
In figure 4.20c we can see the impact of having a non-symmetric claw gripping mechanism. The figure
shows where the cube lies before and after a 180 degree rotation for a non-symmetric claw grip - a massive
variation.
(a) Position of cube in normal orientation (b) Position of cube after 180 degree turn of the claw (c) Superimposed
4.6.1.1.1 Performing a face rotation To perform a face rotation, an arm does the following:
1. Clamp down on the face to rotate using the clamping motor
Why even unclamp to rotate back? If we do not rotate the cube back, the wire that connects to
the NXT motor will keep twisting until it becomes tangled. We always twist back in the opposite direction
to reset the wire back to a neutral position.
Figure 4.21: Overview: Birdseye view
4.6.1.2.1 Configuration Choices Now that we have 4 Bricks and 8 motors, we need to decide which
motors should be connected to which Bricks. Although this may seem like a trivial task, the choice makes
a huge difference in how we implement the rest of the robot. We have 4 motors that control clamping (one
for each arm) and 4 motors that control face rotation (one for each arm).
By Arm Each arm uses 2 motors. A natural way to divide the 8 motors is to give each arm an
independent NXT Brick. This gives us fine-grained control over each arm and is easy to control using a
master-slave configuration. We would only need to send a single Bluetooth command to the Brick in charge
of an arm to rotate a face, since this single Brick can perform both clamping and rotation for that face.
However, this configuration does bring a major complication with it: synchronisation. Rotating the entire
cube needs two opposite arms to rotate in the same direction at the same speed. This would require
synchronisation between the two Bricks that control those arms. Remember that we are restricted to a
master-slave configuration. The delay between the master and slave communication is never consistent, and
so synchronising between two slaves is complex.
We did experiment with this configuration and tried the Berkeley clock synchronisation algorithm[23]. We
would attempt to sync the clocks of the slaves with the clock of the master (our smartphone). When the
master sent commands to each of the slaves, it would also send a time for the command to be executed.
Determining the time to execute was the hardest problem. We needed to assume that Bluetooth had some
maximum latency such that after this time, all messages would have been received. In reality, this assumption
just isn't true, so this configuration would fall over in cases where a slave received a message later than
our specified maximum latency. We could keep increasing this maximum latency, but this would slow
down the solve.
By Job A slightly less natural way of segmenting the motors is to separate them by job. By job, we mean
whether a Brick is responsible for clamping motors or for rotating faces. Each Brick can only control 2
clamp or 2 rotation motors. If we think about how we want our arms to operate, the only time we
need 2 arms to rotate or clamp at the same time is when we perform a whole cube rotation. In this
case, the two arms are opposite each other. Therefore, it is logical that the 2 clamp or rotation motors
that we connect to each Brick belong to opposite arms. If a slave is responsible for clamping or rotating
two opposite arms, we do not have to worry about synchronisation over Bluetooth!
This does, however, slightly complicate turning a face. We need to send at least 4 messages (one
for each instruction specified in section 4.6.1.1.1) to 2 Bricks: the Brick that is responsible for clamping the
face we wish to turn and the Brick that is responsible for the rotation of the arm we wish to rotate.
Although segmentation by job makes the face turning protocol more complex, it saves us from having
to run a synchronisation protocol all the time. Most synchronisation protocols delay performing
a job from the master until we are certain that all slaves have received their jobs, so that we can start at the
same time. This would slow down solve times, since we would always need to synchronise and check the other
slaves before we could perform any moves. Therefore a small gain in complexity is a good trade-off
for the gain in speed, and is actually simpler than a complex synchronisation protocol.
4.6.2 Movements
With this design, we can perform any R, L, F or B move with ease, since the arms clamp down on these
faces by default. Let us label each claw ClawR, ClawL, ClawF and ClawB with respect to the default face
it clamps down on. Let us also label the Bricks as follows: RLClamp, RLRotate, FBClamp, FBRotate,
where RL or FB tells us that this Brick is responsible for those 2 faces and Clamp or Rotate tells us its job.
Assuming the robot is already holding the cube (all clamps are closed), to perform a face rotation we do the
following, using R as an example:
1. Tell Brick RLRotate to rotate ClawR clockwise 90 degrees
2. Tell Brick RLClamp to release the clamp of ClawR
This is pretty much in line with our description on performing a face rotation in section 4.6.1.1.1.
4.6.3 Software
Now that we know how the robot is set up, let's dive into the software. As mentioned in section
3.1.3.3, we use the Lejos firmware for our implementation. This brings a few advantages over the previous
BrickPi implementation: Lejos offers a fuller collection of libraries than the standard BrickPi libraries,
including built-in PID controller and PID-like controller interfaces, as well as more online support and more
example code. We've found that the default Lejos PID parameters provided in firmware version 0.9.1beta-3 give
accurate enough motor control for this use case.
The first parameter true tells us to rotate clockwise, the second parameter tells us which arm to rotate.
In this case, since we are performing an R move, we pass Arm.RIGHT.
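The call being described might look like the following; the method name rotate here is hypothetical and the real API may differ:

// Hypothetical call shape, matching the two parameters described above.
arm.rotate(true, Arm.RIGHT);   // clockwise quarter turn of the claw holding the R face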
Connecting Protocol When the Arm is first constructed, we connect to the Bricks responsible for
clamping and rotation. We then send a message to each Brick telling it what its job is: clamping or rotating.
Move Protocol Our move protocol allows us to perform moves in sequence synchronously or allows
us to move multiple faces asynchronously. Moves are defined as follows:
public class Arm {
    public static final int CLAMPONE = 0;
    public static final int CLAMPTWO = 1;
    public static final int UNCLAMPONE = 2;
    public static final int UNCLAMPTWO = 3;
    public static final int CLAMPBOTH = 4;
    public static final int UNCLAMPBOTH = 5;

    public static final int CLOCKONE = 6;
    public static final int CLOCKTWO = 7;
    public static final int ANTIONE = 8;
    public static final int ANTITWO = 9;
    public static final int CLOCKBOTH = 10;
    public static final int ANTIBOTH = 11;
    public static final int CLOCK180ONE = 12;
    public static final int CLOCK180TWO = 13;
    public static final int ANTI180ONE = 14;
    public static final int ANTI180TWO = 15;

    public static final int DONE = -1;
    public static final int END_SEQUENCE = -777;
    ...
}
The suffix ONE tells the Brick to perform the move specified in the prefix using the first motor port.
Analogously, TWO tells the Brick to use the second motor port. Finally, BOTH means perform the prefix
action using both motor ports. We send one of these messages to the corresponding Brick. If the Brick's job
is to clamp and we send it a rotation command, it will simply ignore it. Otherwise, the Brick will perform
the action specified. Once the Brick finishes the movement, it will send a DONE message back. We can
either choose to wait for this message or continue. This is how we achieve synchronous or asynchronous
behaviour. Once we finish transmitting all the moves we need, we send an END_SEQUENCE message to all of
the Bricks to tell them to reset their positions ready for the next solve.
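A sketch of sending one of these commands over an already established Bluetooth connection, optionally waiting for the DONE reply, is shown below; the use of plain data streams is an assumption about how the connection is wrapped:

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

// Sketch of sending one command to a Brick and optionally blocking until it reports DONE.
static void sendCommand(DataOutputStream toBrick, DataInputStream fromBrick,
                        int command, boolean waitForDone) throws IOException {
    toBrick.writeInt(command);                   // e.g. Arm.CLOCKONE to the RLRotate Brick
    toBrick.flush();
    if (waitForDone) {
        int reply = fromBrick.readInt();         // blocks until the Brick sends DONE back
        if (reply != Arm.DONE) {
            throw new IOException("Unexpected reply: " + reply);
        }
    }
}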
4.6.3.3 Motor rotation parameters
Every motor is built slightly differently so one motor might not necessarily have the same behaviour as
another even if they are given the same parameters. Each motor needs to be calibrated individually. In this
section we detail our calibration process and approximate parameters for each command.
4.6.3.3.1 First Approximation We calculate the first approximation using the following formula:
Degrees To Move = (Nsecondary / Nmain) × desired angle    (4.6)
where Nmain is the number of notches on the main gear attached to the motor and Nsecondary is the number
of notches on the secondary gear next to the main gear. Figure 4.26 shows the gears used for controlling arm
rotation. The number of notches on the main gear is 44 and on the secondary gear is 60. We can calculate
how many degrees we require our motor to move for a 90 degree rotation as follows: 60/44 × 90 ≈ 123.
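As a small helper, equation 4.6 can be computed directly; the method below is just a sketch of that arithmetic:

// First approximation of the motor angle needed for a desired claw angle (equation 4.6).
static long degreesToMove(int desiredAngle, int notchesMain, int notchesSecondary) {
    return Math.round((double) notchesSecondary / notchesMain * desiredAngle); // 60/44 * 90 ≈ 123
}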
4.6.3.3.2 Adjustment Although we have calculated how many degrees we wish to turn, this is only an
approximation. In reality, we need to adjust it. To adjust this parameter, we simply perform the moves
on a Rubik's cube, check for under/over rotation and adjust the parameter accordingly. We've found
that our parameters vary by at most 3-4 degrees between motors.
Chapter 5
Evaluation
5.1 Vision
5.1.1 Vision accuracy
We've seen how our vision system is implemented, but how accurate is it? In this case, we measure accuracy
by the number of stickers the vision system correctly recognises. There are two parts to the vision system that
need to be measured here: the cube recognition process and the colour recognition process. It is important
that both of these obtain high scores in accuracy. The cube recognition process is the less important of the
two. After all, if the video camera API is feeding in 6 or 7 frames every second, we will get lots of chances
to try to recognise the cube from each frame. More serious errors occur when we wrongly identify a colour
since the system will not know it has wrongly identified a colour. This could then lead to impossible cube
states or wrong solutions.
Not being able to find the cube in the photograph is penalised the same way as getting every sticker of a
face wrong, so the number of errors moves in increments of 9. The other two items of data move in increments of
1 for each sticker recognised correctly or incorrectly. As well as testing different scenarios, we have compared our
vision system with that of Yakir Dahan and Iosef Felberbaum [6], which is well documented and has a version of
their application on the Android Play Store.
Figure 5.1: Test Results For Vision
We can see that when we have outdoor white light, the cube is easily visible and the contours of the cube
are easily recognisable. In fact, we have 0 errors and 0 failures when we have sufficient light to illuminate
the cube i.e. we recognise it correctly in all photos. Similarly, indoors, we have 0 errors and just a small
percentage of colour recognition failures (less than 1%).
In dim scenarios, we start to see a major drop off in reliability and robustness. We found that most of
the pictures taken by our test device (Nexus 4) in low light scenarios were grainy and out of focus. This
could have contributed to the dramatic decline in our pass count. Colours also become harder to distinguish
when there is less light. However, our pass rate of over 50% is significantly higher than the 25% measured
with the competition. This shows that our cube recognition process is more robust than the competition
under low light.
Moving on to different coloured illuminations, we chose to test the most common indoor lighting
scenarios: yellow light and blue light. We can see that under direct yellow light (in this case we used a lamp
with a fluorescent bulb) our vision system still performs reliably, with almost a 70% pass rate. Our error
count does increase when compared to outdoor or indoor white light, however. This may be caused by heavy
shadows being cast on the cube under direct light which makes it harder to find contours. We also see that
our error count is still quite low which means that our white balancing algorithm is working nicely here.
Under blue light, we see similar results to yellow light. Our lighting conditions are slightly different. Our
light was brighter than our dim test but not as bright as our indoor white light test. Our error percentage
rises slightly due to low light. With the success rate still above 60% and a low colour recognition failure
percentage, the system is still perfectly usable under dim/moderate blue light. Below we have included some
sample images of each lighting scenario:
Figure 5.2: Sample images of each light scenario
Conversely, a dimly lit room can cause an equal amount of inaccuracy. In a dimly lit room, the stick-
ers are difficult to differentiate from the cube. The cube is also difficult to differentiate from the background.
In a dark room, we lose a lot of information on the edges and the sticker colours. With our test device
(Nexus 4), the camera lacked Optical Image Stabilisation and used a lens with a small aperture. Both limit
the amount of light entering the camera, giving the impression of a really dark image. There is not much we
can do in this situation other than attempt to introduce some light into the environment.
an even spread of colours on each face. But what about cases where it doesn't? Take a rare case such as
this one:
In this case, the average colour of the image is red. The Grey World Assumption would mistakenly think
that the image is illuminated by a red light source! Our current workaround is a button
in the application that allows the user to enable or disable white balancing. The user can then decide
whether the image needs white balancing themselves. This obviously isn't the best solution, but for now the user
can continue to use the application until we find a more robust colour balancing algorithm.
5.1.2.4 OpenCV
Although OpenCV gives us a lot of functionality, it also leaves a lot to be desired. In order for the application
to work, the user must also download the OpenCV Manager application to supplement the OpenCV
libraries. The problem with this is that OpenCV Manager lacks compatibility with a long list of Android
devices, making it harder for us to test across multiple devices. Additionally, there is a known bug with the
Nexus 4 camera and OpenCV version 2.4.6 where the default camera only gives a maximum frame rate of
around 10 frames per second at the lowest resolution of 320x240, when other, slower devices are able to get 30
frames per second. It also causes the device to randomly reboot.
Given that this is just one part of the bigger project, we are overall very happy with the way the vision
system performs although there is obviously room for improvement.
5.2 Algorithm
In this section we want to benchmark our algorithm against existing ones. For these benchmarks we use
a dual-core Haswell Intel Core i5-4258U CPU 2.4GHz. We have configured our Java VM using 2 flags:
-Xms2048m, -Xmx6144m to give the VM enough Heap space so that our algorithm speed tests will not be
bottlenecked by swapping.2
Fringe Multi represents Korf's algorithm with all of our improvements, including parallelism and
fringe searching. (Blue dashed line)
Korf's Kociemba Combined represents our combined Korf's and Kociemba's algorithm. (Orange
solid line)
Kociemba represents the original Kociemba's algorithm given by Kociemba's libraries. (Grey dotted
line)
2 Thanks to Rokicki, Romas and Kociemba, Herbert and Davidson, Morley and Dethridge, John @ www.cube20.org for their
We can see that all algorithms cope fairly well until a depth of around 13. Perhaps the most surprising
results are the single- and multithreaded Fringe Searches. We found that anything above a depth of 13 would
make our fringe lists use far too much memory (greater than 6GB) and the whole system would then be
bottlenecked by Java's garbage collector. Additionally, we found that in our multithreaded implementation of
fringe search, the overhead of synchronising locks on our fringe list exceeded any benefit we received from
having multiple parallel searches. Perhaps a better implementation would have been to use n mutually
exclusive fringe lists that each thread can manage on its own. This way, we would not need to synchronise
any access: each thread would be responsible for performing a fringe search for its selected portion of
the fringe.
Amazingly, we see that there is a big improvement from the original stock Korf's algorithm to the improved
Korf's algorithm (with the improvements mentioned in section 4.1.3). Our improvements allowed us to search
an entire extra depth in almost the same time. Unsurprisingly, the biggest speedup came from our multithreaded
IDA* algorithm. We managed to obtain this speedup using only 2 cores; we suspect we would see
even bigger speedups with 4 or even 8 cores. The dashed red line shows the 15 second mark that solvers
are allowed for inspection before attempting a solve. We can see that our multithreaded IDA*
algorithm just about makes the mark for a depth of 14. Anything above depth 14 takes significantly longer
than 15 seconds. At around this mark, we can see that our combined Korf's and Kociemba's algorithm starts
to switch from Korf's to Kociemba's solutions, since Korf's is using all of its allotted 15 seconds.
Predictably, Kociemba's algorithm remains consistently under 15 seconds. In terms of speed, Kociemba's
algorithm clearly wins out for large depths. Although not quite visible in the graph, we did find that Korf's
Improved and Korf's Multithreaded were able to beat Kociemba's algorithm for depths below 12.
On the left we can see that Kociemba's algorithm becomes more consistent the closer it is to solving a
20 move scramble. Solves that require 10 - 14 moves saw massive variation, with a maximum difference of 10
moves from the optimal solution. On average, Kociemba's comes pretty close to optimal once we get
above 17 move solves, requiring at most 5 moves more than optimal. Obviously this is just a sample of solves;
in reality, it has been said that Kociemba's algorithm can give a maximum of 30 moves (although this is rare).
On the right we can see our combined Korf's and Kociemba's algorithm. As expected, up until around a 14
move scramble, we remain optimal; a big improvement over Kociemba's algorithm. The average line begins
to shift towards Kociemba's algorithm's average (red dotted line) at around 14. At this threshold, some solves
could be found within the given 15 seconds and others could not, which brought the average down
slightly compared to just using Kociemba's.
5.3 Robot
5.3.1 Robot Accuracy
In this test, we perform numerous face rotations and physically measure how far the face is away from a
perfect rotation. Any inaccurate face turning that will cause subsequent moves to fail will be deemed a
failure. Once we reach failure, we will stop testing any further. The more moves the robot can make, the
better it will score. Figure 5.6 shows what a perfect alignment of claws and faces looks like.
We also perform cube rotations and measure how far the clamps are off centre for each cube rotation.
Anything that will impede subsequent moves will be deemed a failure.
Figure 5.7: Number of face turns and cube rotations until failure
We can see that the PID controller really helps with achieving the desired accuracy. We were able to
perform 90+ face rotations before drift caused a failure. Since each solution is at most 30 moves
long and around 20 on average, our chance of failing during a solve is quite low. In fact, over 5 trials,
the number of face rotations always exceeded 70 before failure.
Cube rotations, on the other hand, are seemingly less reliable. We found that on average we were able
to perform around 50 cube rotations before failure. Although this is much lower than for face rotations, the
number of cube rotations in a solve is on average significantly smaller than the number of face rotations. If we
look at an average case where we assume each move is equally likely in a 20 move sequence, then 1/3 of the
time we will need a cube rotation. This means on average we require around 6-7 cube rotations per solve.
Over 5 trials, the number of cube rotations always exceeded 30 before a failure.
On average, the cube rotations should be reliable enough for every solve.
We can see that on average, our solve times lie around 74 seconds. Our average turns per second is just
0.28.
5.3.3 Robot Limitations
5.3.3.1 Claw Slippage
Naturally, any robot with no external-facing sensors will experience some sort of drift. Although our cube
rotations are quite reliable, there is the odd occasion where we perform a cube rotation and the cube slips
between the claws. This is because during the rotation, only 2 claws are holding the cube (as described
in section 4.6.2). Even a small amount of slippage can have an adverse effect on the rest of the solve. The
robot will not know this has happened and will continue to try to solve the cube. The cube will no longer be
centred and we will not be able to finish the solve.
5.3.5 System
5.3.5.1 System Limitations
The main limitation of our system is that it isn't a closed system. We require human intervention during
the vision phase and we also require a PC to search for solutions. In a closed system, the vision would be
autonomous because the robot would be able to move the cube to view each side. We also need the user
to place the cube in the correct position for the claws to grab it; if it is off centre, the user must
manually adjust the position of the cube within the claws. Additionally, the system is not bullet-proof: each
component has its own limitations, which in the end brings down the robustness and reliability of the system
as a whole.
Chapter 6
Conclusions
We have built a system capable of solving the Rubik's cube within 70 seconds. Our system uses a reliable
and robust vision system, a fast and close-to-optimal solution finding algorithm and an accurate 4-armed
Lego robot. Our vision system is able to cope well under various lighting environments and can perform
well even with background noise. Our algorithm can find optimal solutions in a reasonable time for solution
depths of up to 14 and also gives solutions that are close to optimal for depths above 14. Our robot can solve
a Rubik's cube quickly and reliably enough to rival intermediate level speedsolvers.
Although we weren't hoping to break any world records, we saw this instead as an opportunity to explore
and compare many different aspects of each component of the system. We have compared and documented
various search algorithms, vision algorithms and robot builds for solving the Rubik's cube, which has never
before been done in this field. There have been Rubik's cube solvers built in the past, but most of them use
a canned stock Kociemba's algorithm and are rarely documented in such detail.
of our system. This is the dynamic aspect: we choose which substate heuristics to use depending on the
systems memory capacity. This would make use of the memory that IDA* saves.
6.1.3.2 Rotation Speed
Currently, our rotation speed is quite slow. We made the decision to sacrifice some rotation speed in favour
of stability and accuracy of turning. The biggest problems caused by high turn speed are vibration and inertia:
vibration can throw off the alignment of the claws and inertia can jolt the cube out of alignment. We would
like to improve our turn speed, and in order to do so we would need to build a bigger and more stable
structure that isn't as susceptible to vibration. To solve our inertia problem, we can look into PID controller
adjustments so that we can dampen the deceleration of the motor more. We should also look into our claw
design and find ways to get more grip on the cube.
Bibliography
[19] Jaap Scherphuis. Thistlethwaite's 52-move algorithm. http://www.jaapsch.net/puzzles/thistle.htm. Accessed: June 1, 2015.
[20] Martin Schönert. Analyzing Rubik's Cube with GAP. http://www.gap-system.org/Doc/Examples/rubik.html, 1993. Accessed: June 1, 2015.
[21] David Singmaster. Notes on Rubik's Magic Cube. Enslow Pub Inc, 1981.
[22] Mikhail Vorontsov. Java performance tuning guide: Memory consumption of popular Java data types - part 2. http://java-performance.info/memory-consumption-of-java-data-types-2/. Accessed: June 1, 2015.
[23] Wikipedia. Berkeley algorithm, Wikipedia, The Free Encyclopedia, 2015. Accessed: June 7, 2015.
[24] Wikipedia. CubeStormer 3, Wikipedia, The Free Encyclopedia, 2015. Accessed: June 7, 2015.
[25] Wikipedia. Factorial number system, Wikipedia, The Free Encyclopedia, 2015. Accessed: June 7, 2015.
[26] Yngvi Björnsson, Markus Enzenberger, Robert C. Holte and Jonathan Schaeffer. Fringe Search: Beating A* at pathfinding on game maps.
Appendix A
A.1 Prerequisites
A.1.1 Hardware requirements
Our application requires an Android Smartphone with a minimum of:
Android 4.0.3 (API 15)
Bluetooth
Camera
WiFi
The smartphone must also support OpenCV manager. So far, we have only tested our application using
OpenCV Manager version 2.18.
A.1.2 Setup
We need to set up a few things before we can use the system. Firstly, we must pair all NXT Bricks with
our Android smartphone via Bluetooth and ensure WiFi and Bluetooth are turned on. Next, we need
to ensure our PC is ready to receive cube states so that it can find solutions. We can start a local server
instance on our PC by using the main function inside our RubiksSolver project in the package
com.rubiks.lehoang.rubikssolver. We then turn on all NXT Bricks and run the program Arm.nxj
on each Brick.
Figure A.1: U Face
If the colours are correct, rotate the cube to the F face using an X rotation and click next. Otherwise,
click next and try the same face again until the colours are correct. You can click next and try the same
face as many times as you need.
Once the app has taken the correct colours from the F face, perform another X rotation to the D face.
Touch next and repeat the same steps again until the colours are correct.
We then perform an X3 rotation followed by a Z rotation. This should get us to the R face. Touch next
and repeat.
Figure A.4: R Face
Perform another Z rotation to move to the B face. Touch next and repeat.
Perform another Z rotation to move to the L face. Touch next and repeat. Once we have the final
face, we can either touch Done if the colours are correct or Try again if the L face was not recognised
correctly.
If we've performed this correctly, we should get a dialog box that says "We have a cube state!". Otherwise,
we will get an error message.
Figure A.7: First Connection Failure
Dismiss this message and enter the local IP address of the machine where your Local Server instance is running.
If you've entered the correct IP, pressing Find Solution again should succeed and give back a "Got
solution!" message.