
Learning to Play Othello Without Human Knowledge

Shantanu Thakoor, Surag Nair, Megha Jhunjhunwala
Stanford University

Abstract

Game playing is a popular area within the field of artificial intelligence. Most agents in the literature have hand-crafted features and are often trained on datasets obtained from expert human play. We implement a self-play based algorithm, using neural networks for policy estimation and Monte Carlo Tree Search for policy improvement, that learns to play Othello with no human knowledge as input. We evaluate our learning algorithm on 6x6 and 8x8 versions of the game of Othello. Our work is compared with random and greedy baselines, as well as a minimax agent that uses a hand-crafted scoring function, and achieves impressive results. Further, our agent for the 6x6 version of Othello easily outperforms humans when tested against them.

1 Introduction

Game playing is a popular area within the field of artificial intelligence. One of the earliest works in this field was a checkers engine developed in (Samuel 2000), which learned through self-play and machine learning rather than through rule-based methods. An early triumph was Deep Blue (Campbell, Hoane, and hsiung Hsu 2002), a computer program capable of superhuman performance in Chess that beat the top human players. These are relatively simple games, where the branching factor for each state is small and it is easy to evaluate how good a non-terminal position is. It was estimated that games like Go, which have a large branching factor and where it is very difficult to determine the likely winner from a non-terminal board position, would not be solved for several decades. However, AlphaGo (Silver et al. 2016), which uses recent deep reinforcement learning and Monte Carlo Tree Search methods, managed to defeat the top human player, through extensive use of domain knowledge and training on games played by top human players.

Many of the existing approaches for designing game-playing systems relied on the availability of expert domain knowledge to train the model and to evaluate non-terminal states. Recently, however, AlphaGo Zero (Silver et al. 2017b) described an approach that used absolutely no expert knowledge and was trained entirely through self-play. This new system even outperforms the earlier AlphaGo model. This is a very exciting result: computers may be capable of superhuman performance entirely through self-learning, without any guidance from humans.

In our work, we draw on ideas from the AlphaGo Zero paper and apply them to the game of Othello. We use board sizes of 6x6 and 8x8, for which learning through self-play is more tractable on the computing resources available to us. For evaluation, we compare our trained agents to random and greedy baselines, as well as a minimax agent with hand-crafted features. We also compared against humans, and found that our 6x6 version achieves superhuman performance very quickly.

2 Related Work

Self-play for learning optimal playing strategies in games has been a widely studied area. For example, 9x9 Go has been studied in (Gelly and Silver 2008). Chess, though widely played using alpha-beta search strategies, has also seen some work on self-play methods in (Heinz 2001). (Wiering 2010) studies the problem of learning to play Backgammon through a combination of self-play and expert knowledge methods.

In particular, (Van Der Ree and Wiering 2013) learn to play Othello through self-play methods, and (Nijssen 2007) applies Monte Carlo methods to Othello. For the 6x6 version, a perfect strategy for player 2 is known to exist.¹

(Silver et al. 2016) and (Silver et al. 2017b) have trained a novel neural network agent to achieve state-of-the-art results in the game of Go. Very recently (just 4 days before submission of this report!), this approach has also been extended to a general game-playing strategy in (Silver et al. 2017a), achieving state of the art in the games of Chess and Shogi.

¹ Solved by Joel F Feinstein.

3 Methods

We provide a high-level overview of the algorithm we employ, which is based on the AlphaGo Zero (Silver et al. 2017b) paper. The algorithm is based on pure self-play and does not use any human knowledge except the rules of the game. At the core, we use a neural network that evaluates the value of a given board state and estimates the optimal policy. The self-play is guided by a Monte Carlo Tree Search (MCTS) that acts as a policy improvement operator. The outcomes of each game of self-play are then used as rewards, which are used to train the neural network along with the improved policy. Hence, the training is performed in an iterative fashion: the current neural network is used to execute self-play games, whose outcomes are then used to retrain the neural network. The following sections describe the different components of our system in more detail.
3.1 Neural Policy and Value Network

We use a neural network f_θ, parametrised by θ, that takes as input the board state s and outputs a continuous value of the board state v_θ(s) ∈ [−1, 1] from the perspective of the current player, together with a probability vector p_θ(s) over all possible actions. p_θ represents a stochastic policy that is used to guide the self-play.

The neural network is initialized randomly. At the end of each iteration of self-play, the neural network is provided training examples of the form (s_t, π_t, z_t). Here π_t is an improved estimate of the policy obtained by performing MCTS starting from s_t (described in Section 3.2), and z_t ∈ {−1, 1} is the final outcome of the game from the perspective of the current player. The neural network is then trained to minimize the following loss function:

l = \sum_t \left[ (v_\theta(s_t) - z_t)^2 - \pi_t \cdot \log p_\theta(s_t) \right]

The network takes the raw board state as input. This is followed by 4 convolutional layers and 2 fully connected feedforward layers, and finally by two output layers: one that outputs v_θ and another that outputs the vector p_θ. Training is performed using the Adam (Kingma and Ba 2014) optimizer with a batch size of 64, with dropout (Srivastava et al. 2014) of 0.3, and batch normalisation (Ioffe and Szegedy 2015). The code is implemented in PyTorch.²

² www.pytorch.org
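As a concrete illustration, the following is a minimal PyTorch sketch of such a policy/value network and its loss; the layer widths, the single-channel board encoding, and the extra "pass" entry in the policy head are illustrative assumptions, not the exact configuration used in our experiments.

import torch
import torch.nn as nn
import torch.nn.functional as F

class OthelloNet(nn.Module):
    """Sketch: 4 conv layers, 2 fully connected layers, then policy and value heads."""
    def __init__(self, board_size=6, channels=64, hidden=256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, channels, 3, padding=1), nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.BatchNorm2d(channels), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Linear(channels * board_size * board_size, hidden), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Dropout(0.3),
        )
        self.pi_head = nn.Linear(hidden, board_size * board_size + 1)  # +1 for a "pass" move
        self.v_head = nn.Linear(hidden, 1)

    def forward(self, boards):
        # boards: (batch, n, n) tensors with entries in {-1, 0, +1}
        x = self.conv(boards.unsqueeze(1).float())
        x = self.fc(x.flatten(start_dim=1))
        log_pi = F.log_softmax(self.pi_head(x), dim=1)   # log p_theta(s)
        v = torch.tanh(self.v_head(x)).squeeze(-1)       # v_theta(s) in [-1, 1]
        return log_pi, v

def loss_fn(log_pi, v, target_pi, z):
    """Mean of (v_theta - z)^2 minus the dot product of the MCTS policy with log p_theta."""
    return ((v - z) ** 2).mean() - (target_pi * log_pi).sum(dim=1).mean()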

3.2 Monte Carlo Tree Search for Policy Improvement

We use Monte Carlo Tree Search (Browne et al. 2012) to improve upon the policy learned by the neural network. MCTS is a policy search algorithm that balances exploration with exploitation to output an improved policy after a number of simulations of the game. MCTS explores a tree in which nodes represent different board configurations, and a directed edge (i → j) exists between two nodes if a valid action can cause the state to transition from state i to state j. For each edge, we maintain a Q value, denoted Q(s, a), which is the expected reward for taking action a from state s, and N(s, a), which is the number of times we took action a from state s across different simulations. We also keep track of P(s, ·) = p_θ(s), the prior probability of taking each action from state s according to the policy returned by our neural network. From these, we calculate U(s, a), an upper confidence bound on the Q value of the edge:

U(s, a) = Q(s, a) + c_{puct} \, P(s, a) \, \frac{\sqrt{\sum_b N(s, b)}}{1 + N(s, a)}

Here, c_puct is a hyperparameter controlling the degree of exploration (set to 1.0 in our experiments).
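As an illustration, the sketch below computes U(s, a) from these per-edge statistics and picks the action that maximizes it; the dictionary-based bookkeeping for Q, N, and P is an assumed implementation detail, not code from our system.

import math

def select_action(Q, N, P, s, valid_actions, c_puct=1.0):
    """Return argmax_a U(s, a) over the valid actions of state s.

    Q[(s, a)]: current mean reward of edge (s, a); N[(s, a)]: its visit count;
    P[s][a]: prior probability of a at s from the neural network policy.
    """
    total_visits = sum(N.get((s, b), 0) for b in valid_actions)
    best_action, best_u = None, -float("inf")
    for a in valid_actions:
        q = Q.get((s, a), 0.0)
        n = N.get((s, a), 0)
        u = q + c_puct * P[s][a] * math.sqrt(total_visits) / (1 + n)
        if u > best_u:
            best_action, best_u = a, u
    return best_action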
When using MCTS to find a policy from a given state s, we start building the MCTS tree with s as the root. At each step of a simulation, we choose the action a that maximizes the upper confidence bound U(s, a). If the resulting next state already exists in the MCTS tree, we continue the simulation from it. If it does not, we create a new node in the tree, initialize its P(s, ·) = p_θ(s) and its expected reward v = v_θ(s) using our neural network, and initialize Q(s, a) and N(s, a) to 0 for all a. We then propagate the value v back up the MCTS tree, updating all the Q(s, a) values seen during the simulation, and start again from the root. On the other hand, if we encounter a terminal state, we propagate the actual reward read off the board and restart the search from the root.

After a number of simulations, the N(s, a) values provide a good approximation of the optimal stochastic policy from each state. Hence, the action we take is sampled from a distribution π_s in which each action has probability proportional to N(s, a)^(1/τ), where τ is a temperature parameter. Setting τ to a high value gives an almost uniform distribution, while setting it to 0 always selects the most-visited action. τ is therefore another hyperparameter controlling the degree of exploration during learning. The training example generated from the MCTS starting at s is (s, π_s, r), where r ∈ {+1, −1} is determined at the end of the game according to whether the current player won or lost. Pseudocode of the MCTS search is provided in Algorithm 1.
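A minimal sketch of turning visit counts into this temperature-controlled policy is shown below; treating τ = 0 as a pure argmax is our reading of the description above.

import numpy as np

def counts_to_policy(counts, tau):
    """Convert visit counts N(s, .) into pi_s with pi_s(a) proportional to N(s, a)^(1/tau).

    counts: 1-D array of visit counts over all actions (0 for unvisited or invalid moves).
    tau: temperature; tau -> 0 collapses the distribution onto the most-visited action.
    """
    counts = np.asarray(counts, dtype=np.float64)
    if tau == 0:
        pi = np.zeros_like(counts)
        pi[np.argmax(counts)] = 1.0
        return pi
    scaled = counts ** (1.0 / tau)
    return scaled / scaled.sum()

# Example: sampling the next self-play move.
# pi = counts_to_policy(visit_counts, tau=1.0)
# action = np.random.choice(len(pi), p=pi)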
3.3 Policy Iteration through Self-Play

We now describe the complete training algorithm. We initialize our neural network with random weights, thus starting with a random policy. In each iteration of our algorithm, we play a number of episodes (100 in our experiments) of self-play using MCTS. This results in a set of training examples of the form (s_t, π_t, z_t). We also exploit the symmetry of the state space to augment our dataset: since Othello is invariant to rotations and flips of the board, we obtain 7 extra training examples per example in our dataset.
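The eight symmetries (four rotations, each optionally mirrored) can be generated with a few array operations. The sketch below assumes the board is a square NumPy array and that the policy vector can be reshaped onto the same grid (ignoring any extra "pass" entry); it is an illustration, not our exact augmentation code.

import numpy as np

def symmetric_examples(board, pi_grid, z):
    """Return the 8 symmetric copies of one training example.

    board:   (n, n) array encoding the position.
    pi_grid: (n, n) array of the MCTS policy reshaped onto the board.
    z:       game outcome, unchanged by the symmetry.
    """
    examples = []
    for k in range(4):                                        # four rotations
        b, p = np.rot90(board, k), np.rot90(pi_grid, k)
        examples.append((b, p, z))
        examples.append((np.fliplr(b), np.fliplr(p), z))      # mirrored copy
    return examples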
Then, we update our neural network using the new training examples, obtaining a new network. We play the old and new networks against each other for a number of games (40 in our experiments). If the new network wins more than a set threshold fraction of these games (60% in our experiments), we adopt the new network and continue with the next iteration, resetting the MCTS tree. Otherwise, we continue with the old network and the old MCTS tree, and conduct another iteration to augment our training examples further. Experimentally, we find that when the new network is not better than the old one, the network obtained after a further iteration of training is far better; hence, in one or two iterations we almost always improve our network. In our experiments, the temperature parameter τ is set to 1 for the first 25 turns of an episode, to encourage early exploration, and then set to 0. It is always set to 0 during evaluation. Pseudocode of the policy iteration algorithm is provided in Algorithms 2 and 3.
Algorithm 1 Monte Carlo Tree Search
1: procedure MCTS(s, θ)
2:   if s is terminal then
3:     return game_result
4:   if s ∉ Tree then
5:     Tree ← Tree ∪ {s}
6:     Q(s, ·) ← 0
7:     N(s, ·) ← 0
8:     P(s, ·) ← p_θ(s)
9:     return v_θ(s)
10:  else
11:    a ← argmax_{a′ ∈ A} U(s, a′)
12:    s′ ← getNextState(s, a)
13:    v ← MCTS(s′, θ)
14:    Q(s, a) ← (N(s, a) · Q(s, a) + v) / (N(s, a) + 1)
15:    N(s, a) ← N(s, a) + 1
16:    return v

Algorithm 2 Policy Iteration through Self-Play
1: procedure PolicyIterationSP
2:   θ ← initNN()
3:   trainExamples ← []
4:   for i in [1, ..., numIters] do
5:     for e in [1, ..., numEpisodes] do
6:       ex ← executeEpisode(θ)
7:       trainExamples.append(ex)
8:     θ_new ← trainNN(trainExamples)
9:     if θ_new beats θ ≥ thresh then
10:      θ ← θ_new
11:  return θ

Algorithm 3 Execute Episode
1: procedure ExecuteEpisode(θ)
2:   examples ← []
3:   s ← gameStartState()
4:   while True do
5:     for i in [1, ..., numSims] do
6:       MCTS(s, θ)
7:     examples.add((s, π_s, _))
8:     a* ∼ π_s
9:     s ← gameNextState(s, a*)
10:    if gameEnded(s) then
11:      // fill _ in examples with the reward
12:      examples ← assignRewards(examples)
13:      return examples
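For readers who prefer code to pseudocode, the following is a minimal Python rendering of Algorithm 3. The game and MCTS interfaces used here (getInitBoard, getCanonicalForm, getNextState, getGameEnded, getActionProb) are hypothetical names chosen for illustration, not the exact API of our implementation.

import numpy as np

def execute_episode(game, mcts, temp_threshold=25):
    """Play one self-play game and return (state, pi, z) training examples."""
    examples = []                 # (canonical board, player, pi); z is filled in at the end
    board = game.getInitBoard()
    player, turn = 1, 0

    while True:
        turn += 1
        tau = 1 if turn <= temp_threshold else 0          # exploration schedule from Section 3.3
        canonical = game.getCanonicalForm(board, player)
        pi = mcts.getActionProb(canonical, temp=tau)      # runs numSims MCTS simulations internally
        examples.append((canonical, player, pi))
        action = np.random.choice(len(pi), p=pi)
        board, player = game.getNextState(board, player, action)

        result = game.getGameEnded(board, player)         # 0 while the game is still running
        if result != 0:
            # assign +1/-1 to each stored position from that position's own perspective
            return [(s, p, result * (+1 if pl == player else -1))
                    for (s, pl, p) in examples]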
4 Experiments

The sections above describe a general approach to game playing. In our experiments, we specifically tackled the problem of learning to play the game of Othello. Othello is traditionally played on an 8x8 board, and the size of the state space is exponential in the size of the board. Experimentally, we found that converging to an optimal policy on the 8x8 board with limited computing resources would take a very long time. To show the effectiveness of our approach, we therefore also ran experiments on a 6x6 version of Othello.³ The 8x8 version was trained with 50 simulations of the MCTS per step, while the 6x6 version was trained with 25. Both were trained on training examples from 100 episodes per training iteration. The 6x6 version completed 78 iterations of training, while the 8x8 version completed 30. Both were trained for over 72 hours on a Google Compute Engine instance with a GPU.

³ Environment adapted from https://github.com/JaimieMurdock/othello
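For convenience, the training settings reported above can be gathered into a single configuration object; the snippet below merely restates those numbers, and the key names are our own rather than taken from our codebase.

# Settings reported in Sections 3.2-4 (key names are illustrative).
TRAINING_CONFIG = {
    "6x6": {"mcts_sims_per_move": 25, "episodes_per_iteration": 100,
            "iterations_completed": 78},
    "8x8": {"mcts_sims_per_move": 50, "episodes_per_iteration": 100,
            "iterations_completed": 30},
    "shared": {"arena_games": 40, "accept_threshold": 0.60,
               "c_puct": 1.0, "temperature_turns": 25, "batch_size": 64},
}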
4.1 Baselines

We implemented two baselines for comparison with our trained AI player. The first is a greedy player, which always chooses the move that flips the maximum number of pieces in the next step of the game. The second is a random player, which chooses uniformly at random from the valid moves at each step of the game.

We also used a minimax agent⁴ as a third baseline. It tries to maximize its worst-case gain, assuming that the opponent plays perfectly at each move, by exploring the game tree up to a certain depth. The results against the different baselines are listed in Table 1.

⁴ From https://github.com/Zolomon/reversi-ai
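The two simple baselines can be written in a few lines; the game interface assumed here (getValidMoves, countFlips) is for illustration only and is not the exact API of the environment we adapted.

import random

def random_move(game, board, player):
    """Random baseline: pick any valid move uniformly at random."""
    return random.choice(game.getValidMoves(board, player))

def greedy_move(game, board, player):
    """Greedy baseline: pick the move that flips the most opponent pieces right now."""
    return max(game.getValidMoves(board, player),
               key=lambda move: game.countFlips(board, player, move))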
4.2 Human Evaluation

We also implemented an interface through which a human player can play against any of our baselines or our learned strategies. For the 6x6 version, we evaluated our bot against a local player who has been playing Othello since childhood. These results are also reported in Table 1. Since the 8x8 version took much longer to train, we did not get a chance to evaluate it against humans.

4.3 Analysis of Experiments

We analyze our performance as a function of training time. In Figure 1 and Figure 2, we plot our performance against the greedy and random baselines as a function of the number of training iterations. As we see, these simple baselines are quickly beaten by the 6x6 version within a few iterations. However, learning a good agent for the 8x8 board is much more difficult. Evaluating performance against a more sophisticated baseline would help us decide when our model has converged and we can stop training.
Figure 1: Performance against random and greedy baselines over 30 iterations (6x6)

Figure 2: Performance against random and greedy baselines over 30 iterations (8x8)

Baseline   6x6 board   8x8 board
Greedy     20/20       20/20
Random     20/20       18/20
Minimax    30/30       29/30
Human      6/6         -

Table 1: Number of games won against various baselines by our final models
Aside from the comparisons against baselines, we follow the approach of (Silver et al. 2016) in analyzing the games played by our agent, to try to understand its strategies. In Figure 3 we examine our agent's early game strategy against the minimax bot. Our agent is black, while the opponent is white, and boards are shown after each move made by our agent. We see that the strategy adopted is to quickly grow towards the walls and corners and capture the pieces there. This is indeed a strong high-level strategy that human players use, since pieces at corners and walls are very difficult for the opponent to flip. It is quite remarkable that our agent is able to display such subtle strategies through self-play, even against a strong minimax opponent.

In Figure 4, we examine some late-game moves of our agent against the minimax strategy. We observe that 4 moves before the end, in terms of the number of pieces, we do not appear to be performing significantly differently from our opponent. However, by the endgame our agent has learned to position its pieces very strategically. Instead of placing a piece where it would maximize the number of flips in one move (as the greedy baseline would do), it places its pieces in such a way that the opponent has no moves left and is forced to pass. Hence, it can quickly cover a larger portion of the board while the opponent cannot move, and thus completely dominate the board by the end of the game.

5 Conclusions

We implement an agent that learns to play Othello through pure self-play, without using any human knowledge. Our agent convincingly beats all baselines, including the greedy, random, and standard alpha-beta minimax AI baselines. Further, the time taken to make a move is much lower for our agent, since it requires only a feedforward pass through a neural network, whereas the minimax algorithm must explore an exponential state space to a large depth to get good results. As seen in Section 3, our framework is very generic in its implementation, and can easily be extended to many other games such as Chess or Go.

The original implementation by DeepMind (Silver et al. 2017b) uses orders of magnitude more raw computational power on industry hardware (4 TPUs, 64 GPUs, and 19 CPUs, for several days). In our work, we show that it is possible to train similar networks on commodity hardware for smaller problems. We plan to release our implementation for the open source community.
Figure 3: Early game play of our agent (B) vs minimax (W), capturing walls and corners

Figure 4: Late game play of our agent (B) vs minimax (W), forcing passes

References

[Browne et al. 2012] Browne, C. B.; Powley, E.; Whitehouse, D.; Lucas, S. M.; Cowling, P. I.; Rohlfshagen, P.; Tavener, S.; Perez, D.; Samothrakis, S.; and Colton, S. 2012. A survey of Monte Carlo tree search methods. IEEE Transactions on Computational Intelligence and AI in Games 4(1):1–43.

[Campbell, Hoane, and hsiung Hsu 2002] Campbell, M.; Hoane, A.; and hsiung Hsu, F. 2002. Deep Blue. Artificial Intelligence 134(1):57–83.

[Gelly and Silver 2008] Gelly, S., and Silver, D. 2008. Achieving master level play in 9 x 9 computer Go. In AAAI, volume 8, 1537–1540.

[Heinz 2001] Heinz, E. A. 2001. New Self-Play Results in Computer Chess. Berlin, Heidelberg: Springer Berlin Heidelberg. 262–276.

[Ioffe and Szegedy 2015] Ioffe, S., and Szegedy, C. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In International Conference on Machine Learning, 448–456.

[Kingma and Ba 2014] Kingma, D., and Ba, J. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.

[Nijssen 2007] Nijssen, J. 2007. Playing Othello using Monte Carlo. Strategies 1–9.

[Samuel 2000] Samuel, A. L. 2000. Some studies in machine learning using the game of checkers. IBM Journal of Research and Development 44(1.2):206–226.

[Silver et al. 2016] Silver, D.; Huang, A.; Maddison, C. J.; Guez, A.; Sifre, L.; van den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; Dieleman, S.; Grewe, D.; Nham, J.; Kalchbrenner, N.; Sutskever, I.; Lillicrap, T.; Leach, M.; Kavukcuoglu, K.; Graepel, T.; and Hassabis, D. 2016. Mastering the game of Go with deep neural networks and tree search. Nature 529(7587):484–489.

[Silver et al. 2017a] Silver, D.; Hubert, T.; Schrittwieser, J.; Antonoglou, I.; Lai, M.; Guez, A.; Lanctot, M.; Sifre, L.; Kumaran, D.; Graepel, T.; Lillicrap, T.; Simonyan, K.; and Hassabis, D. 2017a. Mastering Chess and Shogi by self-play with a general reinforcement learning algorithm. arXiv e-prints.

[Silver et al. 2017b] Silver, D.; Schrittwieser, J.; Simonyan, K.; Antonoglou, I.; Huang, A.; Guez, A.; Hubert, T.; Baker, L.; Lai, M.; Bolton, A.; et al. 2017b. Mastering the game of Go without human knowledge. Nature 550(7676):354–359.

[Srivastava et al. 2014] Srivastava, N.; Hinton, G. E.; Krizhevsky, A.; Sutskever, I.; and Salakhutdinov, R. 2014. Dropout: A simple way to prevent neural networks from overfitting. Journal of Machine Learning Research 15(1):1929–1958.

[Van Der Ree and Wiering 2013] Van Der Ree, M., and Wiering, M. 2013. Reinforcement learning in the game of Othello: Learning against a fixed opponent and learning from self-play. In Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 2013 IEEE Symposium on, 108–115. IEEE.

[Wiering 2010] Wiering, M. A. 2010. Self-play and using an expert to learn to play backgammon with temporal difference learning. Journal of Intelligent Learning Systems and Applications 2(02):57.
