COMPUTER SCIENCE
CA-2
NAME:-SAYAN KAR CHOUDHURY
SUBJECT:-MATHEMATICS
STREAM:-COMPUTER SCIENCE & ENGINEERING
SECTION:-C
UNIVERSITY ROLL NO:-14200122181
COURSE CODE:-BS-M201
APPLICATION OF BAYES THEOREM IN
SCIENCE AND TECHNOLOGY
CONTENTS
1. Abstract
2. Introduction
3. Fields of probability in Computer Science:-
I. Computer games
II. Neural Networks
III. Quantum Computing
4. Methods:-
I. Anatomy of a Computer Chess Program
II. Prior Work on Computer Chess and Shogi
III. Domain Knowledge
IV. Representation
5. Configuration
6. Evaluation
7. Conclusion
8. References
9. Acknowledgement
Abstract
In science, the probability of an event is a number that indicates how likely the event is to occur. It is expressed as a number in the range from 0 to 1 or, using percentage notation, in the range from 0% to 100%. The more likely it is that the event will occur, the higher its probability. The probability of an impossible event is 0; that of an event that is certain to occur is 1. The probabilities of two complementary events A and B (either A occurs or B occurs) add up to 1. A simple example is the tossing of a fair (unbiased) coin. If the coin is fair, the two possible outcomes ("heads" and "tails") are equally likely; since these two outcomes are complementary and the probability of "heads" equals the probability of "tails", the probability of each outcome equals 1/2 (which could also be written as 0.5 or 50%).
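This coin-toss probability can also be checked empirically. The short Python sketch below is an illustration added for this report (all names are made up for the example); it estimates P(heads) by simulation:

import random

def estimate_heads_probability(num_tosses=100_000):
    # Count heads over many simulated tosses of a fair coin.
    heads = sum(1 for _ in range(num_tosses) if random.random() < 0.5)
    return heads / num_tosses

# The estimate converges to the true probability 0.5 as num_tosses grows.
print(estimate_heads_probability())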
Introduction
Probability is used in computer science in many ways. It is used to design algorithms that make decisions, solve problems, and optimize performance, and to model and analyze systems, networks, and communication protocols. Probability also underpins machine learning and artificial intelligence: probabilistic models of data power predictive systems. It is further used to design data structures, to optimize search algorithms, and to analyze and understand the behavior of complex systems.
Fields of probability in Computer Science
Computer Games:
There are many applications, but the more famous ones are related to interpreting data and Machine Learning.
For example, suppose Google wants to show you the best search results when you consult it. It cannot possibly know exactly what you like most, or exactly what you mean when you type something. But using the data you type, it can feed your query into a probability-driven algorithm that infers what you are likely to mean, and get closer and closer to your tastes. That is how it knows how to display the most relevant ads to you, things that you might be interested in, and how it makes your search experience more personal.
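This kind of reasoning is formalised by Bayes' theorem, P(H|E) = P(E|H) · P(H) / P(E), which updates a prior belief H in the light of evidence E. A toy Python sketch follows; the numbers are invented for illustration and do not come from any real search engine:

def bayes_posterior(prior, likelihood, evidence):
    # Bayes' theorem: P(H|E) = P(E|H) * P(H) / P(E)
    return likelihood * prior / evidence

# H = "the user wants programming results"; E = the query contains "python".
p_h = 0.30          # prior: share of users searching programming topics
p_e_given_h = 0.20  # likelihood of the word given programming intent
p_e = 0.08          # overall probability of the word appearing in a query
print(bayes_posterior(p_h, p_e_given_h, p_e))  # posterior = 0.75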
A long-standing ambition of artificial intelligence has been to create programs that can instead learn for themselves from first principles. Recently, the AlphaGo Zero algorithm achieved superhuman performance in the game of Go, by representing Go knowledge using deep convolutional neural networks trained solely by reinforcement learning from games of self-play (29). In this paper, we apply a similar but fully generic algorithm, which we call AlphaZero, to the games of chess and shogi as well as Go, without any additional domain knowledge except the rules of the game, demonstrating that a general-purpose reinforcement learning algorithm can achieve, tabula rasa, superhuman performance across many challenging domains.
A landmark for artificial intelligence was achieved in 1997 when Deep Blue defeated the human world chess champion. Computer chess programs continued to progress steadily beyond human level in the following two decades. These programs evaluate positions using features hand-crafted by human grandmasters and carefully tuned weights, combined with a high-performance alpha-beta search that expands a vast search tree using a large number of clever heuristics and domain-specific adaptations. In the Methods we describe these augmentations, focusing on the 2016 Top Chess Engine Championship (TCEC) world champion Stockfish; other strong chess programs, including Deep Blue, use very similar architectures.
Neural Networks
Go is well suited to the neural network architecture used in AlphaGo because the rules of the game are translationally invariant (matching the weight sharing structure of convolutional networks), are defined in terms of liberties corresponding to the adjacencies between points on the board (matching the local structure of convolutional networks), and are rotationally and reflectionally symmetric (allowing for data augmentation and ensembling). Furthermore, the action space is simple (a stone may be placed at each possible location), and the game outcomes are restricted to binary wins or losses, both of which may help neural network training.
Instead of a handcrafted evaluation function and move ordering heuristics, AlphaZero utilises a deep neural network (p, v) = fθ(s) with parameters θ. This neural network takes the board position s as an input and outputs a vector of move probabilities p with components pa = Pr(a|s) for each action a, and a scalar value v estimating the expected outcome z from position s, v ≈ E[z|s]. AlphaZero learns these move probabilities and value estimates entirely from self-play; these are then used to guide its search.
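A minimal PyTorch sketch of such a two-headed network appears below. The flat board encoding and layer sizes are assumptions chosen for illustration; the actual AlphaZero network is a deep residual convolutional architecture:

import torch
import torch.nn as nn

class PolicyValueNet(nn.Module):
    # Sketch of (p, v) = f_theta(s): a shared trunk with two output heads.
    def __init__(self, input_size=64, num_moves=4096):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(input_size, 256), nn.ReLU())
        self.policy_head = nn.Linear(256, num_moves)                   # move logits
        self.value_head = nn.Sequential(nn.Linear(256, 1), nn.Tanh())  # v in [-1, 1]

    def forward(self, s):
        h = self.trunk(s)
        p = torch.softmax(self.policy_head(h), dim=-1)  # p_a = Pr(a|s)
        v = self.value_head(h).squeeze(-1)              # v ~ E[z|s]
        return p, v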
Instead of an alpha-beta search with domain-specific enhancements, AlphaZero uses a general-purpose Monte-Carlo tree search (MCTS) algorithm. Each search consists of a series of simulated games of self-play that traverse a tree from the root state to a leaf. Each simulation proceeds by selecting in each state s a move a with low visit count, high move probability and high value (averaged over the leaf states of simulations that selected a from s) according to the current neural network fθ. The search returns a vector π representing a probability distribution over moves, either proportionally or greedily with respect to the visit counts at the root state.
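The selection step can be sketched with the PUCT rule used by AlphaGo Zero-style MCTS, which trades the averaged value of a move against an exploration bonus favouring high prior probability and low visit count. The dictionaries N (visit counts), W (total values) and P (priors), and the constant c_puct, are illustrative assumptions:

import math

def puct_select(actions, N, W, P, c_puct=1.0):
    # Choose the action maximising Q(s,a) + U(s,a).
    total_visits = sum(N[a] for a in actions)

    def score(a):
        q = W[a] / N[a] if N[a] > 0 else 0.0                      # mean value so far
        u = c_puct * P[a] * math.sqrt(total_visits) / (1 + N[a])  # exploration bonus
        return q + u

    return max(actions, key=score)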
Quantum Computing
Quantum computing is inherently probabilistic: measuring a qubit yields each possible outcome with a probability equal to the squared magnitude of the corresponding amplitude, so quantum algorithms are designed and analysed in terms of the probability distributions over their outputs.
Methods
The efficiency of alpha-beta search depends critically upon the order in which moves are considered. Moves are therefore ordered by iterative deepening (using a shallower search to order moves for a deeper search). In addition, a combination of domain-independent move ordering heuristics is used, such as the killer heuristic, history heuristic and counter-move heuristic, together with domain-dependent knowledge based on captures (SEE) and potential captures (MVV/LVA).
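A bare-bones negamax alpha-beta search is sketched below; the order_key argument stands in for the move-ordering heuristics just described, and every name is a placeholder rather than real engine code:

def alpha_beta(node, depth, alpha, beta, evaluate, children, order_key):
    # Negamax form: scores are always from the side to move's point of view.
    moves = children(node)
    if depth == 0 or not moves:
        return evaluate(node)
    # Searching the most promising moves first tightens the (alpha, beta)
    # window sooner, so more of the tree is pruned.
    for child in sorted(moves, key=order_key, reverse=True):
        score = -alpha_beta(child, depth - 1, -beta, -alpha,
                            evaluate, children, order_key)
        alpha = max(alpha, score)
        if alpha >= beta:
            break  # beta cut-off: the opponent would never allow this line
    return alpha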
A transposition table facilitates the reuse of values and move orders when the same
position is reached by multiple paths. A carefully tuned opening book is used to select moves
at the start of the game. An endgame tablebase, precalculated by exhaustive retrograde
analysis of endgame positions, provides the optimal move in all positions with six and sometimes seven pieces or fewer.
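The core idea of a transposition table fits in a few lines: cache search results keyed by a position hash, so a position reached by different move orders is evaluated only once. Real engines use Zobrist hashing and also store depth, bounds and the best move; this simplified version is an assumption for illustration:

transposition_table = {}

def cached_search(position_hash, compute_value):
    # Reuse the stored result if this position has been searched before.
    if position_hash not in transposition_table:
        transposition_table[position_hash] = compute_value()
    return transposition_table[position_hash]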
Other strong chess programs, and also earlier programs such as Deep Blue, have used
very similar architectures (9,23) including the majority of the components described above,
although important details vary considerably. None of the techniques described in this section
are used by AlphaZero. It is likely that some of these techniques could further improve the
performance of AlphaZero; however, we have focused on a pure self-play reinforcement
learning approach and leave these extensions for future research.
In this section we discuss some notable prior work on reinforcement learning in computer
chess. NeuroChess evaluated positions by a neural network that used 175 handcrafted input
features. It was trained by temporal-difference learning to predict the final game outcome, and also the expected features after two moves. NeuroChess won 13% of games against GnuChess using a fixed depth 2 search.
Beal and Smith applied temporal-difference learning to estimate the piece values in
chess and shogi starting from random values and learning solely by self-play.
KnightCap evaluated positions by a neural network that used an attack-table based on
knowledge of which squares are attacked or defended by which pieces. It was trained by a
variant of temporal-difference learning, known as TD(leaf), that updates the leaf value of the
principal variation of an alpha-beta search. KnightCap achieved human master level after
training against a strong computer opponent with hand-initialised piece-value weights.
Meep evaluated positions by a linear evaluation function based on handcrafted features.
It was trained by another variant of temporal-difference learning, known as TreeStrap, that
updated all nodes of an alpha-beta search. Meep defeated human international master players
in 13 out of 15 games, after training by self-play with randomly initialised weights.
Kaneko and Hoki trained the weights of a shogi evaluation function comprising a million features, by learning to select expert human moves during alpha-beta search. They also performed a large-scale optimization based on minimax search regulated by expert game logs; this formed part of the Bonanza engine that won the 2013 World Computer Shogi Championship.
Giraffe (19) evaluated positions by a neural network that included mobility maps and
attack and defend maps describing the lowest valued attacker and defender of each square. It
was trained by self-play using TD(leaf), also reaching a standard of play comparable to
international masters.
DeepChess trained a neural network to perform pair-wise evaluations of positions. It was trained by supervised learning from a database of human expert games that was pre-filtered to avoid capture moves and drawn games. DeepChess reached a strong grandmaster level of play.
All of these programs combined their learned evaluation functions with an alpha-beta
search enhanced by a variety of extensions. An approach based on training dual policy and
value networks using AlphaZero-like policy iteration was successfully applied to improve on
the state-of-the-art in Hex.
Domain Knowledge
1. The input features describing the position, and the output features describing the move,
are structured as a set of planes; i.e. the neural network architecture is matched to the
grid-structure of the board.
2. AlphaZero is provided with perfect knowledge of the game rules. These are used during
MCTS, to simulate the positions resulting from a sequence of moves, to determine game
termination, and to score any simulations that reach a terminal state.
3. Knowledge of the rules is also used to encode the input planes (i.e. castling, repetition,
no-progress) and output planes (how pieces move, promotions, and piece drops in shogi).
4. The typical number of legal moves is used to scale the exploration noise (see below).
5. Chess and shogi games exceeding a maximum number of steps (determined by typical
game length) were terminated and assigned a drawn outcome; Go games were terminated
and scored with Tromp-Taylor rules, similarly to previous work.
AlphaZero did not use any form of domain knowledge beyond the points listed above.
Representation
In this section we describe the representation of the board inputs, and the representation of the
action outputs, used by the neural network in AlphaZero. Other representations could have
been used; in our experiments the training algorithm worked robustly for many reasonable
choices.
Table S1: Input features used by AlphaZero in Go, Chess and Shogi respectively.
The first set of features is repeated for each position in a T = 8-step history. Counts are represented by a single real-valued input; other input features are represented by a one-hot encoding using the specified number of binary input planes. The current player is denoted
by P1 and the opponent by P2.
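As a toy illustration of plane-based features, the sketch below one-hot encodes a board of integer piece codes into binary planes. The shapes and the encoding are assumptions made for the example, not the actual layout of Table S1:

import numpy as np

def piece_planes(board, num_piece_types):
    # board: a square array of integer piece codes (0 = empty, 1..K = piece types).
    size = board.shape[0]
    planes = np.zeros((num_piece_types, size, size), dtype=np.float32)
    for p in range(num_piece_types):
        planes[p] = (board == p + 1)  # plane p marks squares holding piece type p+1
    return planes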
Table S2: Action representation used by AlphaZero in Chess and Shogi respectively.
The policy is represented by a stack of planes encoding a probability distribution over legal
moves; planes correspond to the entries in the table.
The policy in Go is represented identically to AlphaGo Zero (29), using a flat distribution over 19 × 19 + 1 moves representing possible stone placements and the pass move. We also tried using a flat distribution over moves for chess and shogi; the final result was almost identical although training was slightly slower. The action representations are summarised in Table S2. Illegal moves are masked out by setting their probabilities to zero, and re-normalising the probabilities for remaining moves.
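Masking and renormalisation translate directly into code; the sketch below is a generic illustration with assumed array shapes:

import numpy as np

def mask_illegal(policy, legal):
    # legal is a 0/1 indicator array over the same moves as policy.
    masked = policy * legal
    return masked / masked.sum()  # renormalise so legal probabilities sum to 1

policy = np.array([0.5, 0.3, 0.2])
legal = np.array([1.0, 0.0, 1.0])  # the second move is illegal
print(mask_illegal(policy, legal))  # -> [0.714..., 0.0, 0.285...]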
Configuration
During training, each MCTS used 800 simulations. The number of games, positions, and
thinking time varied per game due largely to different board sizes and game lengths, and are
shown in Table S3. The learning rate was set to 0.2 for each game, and was dropped three
times (to 0.02, 0.002 and 0.0002 respectively) during the course of training. Moves are
selected in proportion to the root visit count. Dirichlet noise Dir(α) was added to the prior
probabilities in the root node; this was scaled in inverse proportion to the approximate
number of legal moves in a typical position, to a value of α = {0.3, 0.15, 0.03} for chess, shogi
and Go respectively. Unless otherwise specified, the training and search algorithm and
parameters are identical to AlphaGo Zero (29).
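The noise injection can be sketched as follows; the mixing weight epsilon = 0.25 follows AlphaGo Zero and is an assumption here, since the text does not state it:

import numpy as np

def noisy_root_priors(priors, alpha, epsilon=0.25):
    # Mix Dirichlet noise into the root node's prior move probabilities.
    noise = np.random.dirichlet([alpha] * len(priors))
    return (1 - epsilon) * priors + epsilon * noise

# alpha = 0.3 for chess, 0.15 for shogi, 0.03 for Go, as stated above.
print(noisy_root_priors(np.array([0.5, 0.3, 0.2]), alpha=0.3))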
During evaluation, AlphaZero selects moves greedily with respect to the root visit count.
Each MCTS was executed on a single machine with 4 TPUs.
Evaluation
We evaluated the relative strength of AlphaZero (Figure 1) by measuring the Elo rating of each player. We estimate the probability that player a will defeat player b by a logistic function p(a defeats b) ≈ 1 / (1 + 10^(c_elo · (e(b) − e(a)))), and estimate the ratings e(·) by Bayesian logistic regression, computed by the BayesElo program (10) using the standard constant c_elo = 1/400. Elo ratings were computed from the results of a 1 second per move tournament between iterations of AlphaZero during training, and also a baseline player: either Stockfish, Elmo or AlphaGo Lee respectively. The Elo rating of the baseline players was anchored to publicly available values (29).
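The logistic model above transcribes directly into code; the sketch below is just that formula, with illustrative ratings:

def elo_win_probability(rating_a, rating_b, c_elo=1/400):
    # p(a defeats b) = 1 / (1 + 10 ** (c_elo * (e(b) - e(a))))
    return 1.0 / (1.0 + 10 ** (c_elo * (rating_b - rating_a)))

# A 200-point rating advantage wins about 76% of games under this model.
print(elo_win_probability(1600, 1400))  # -> 0.7597...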
Conclusion
In conclusion, probability is woven through computer science: it is used to design algorithms that make decisions, solve problems and optimize performance; to model and analyze systems, networks and communication protocols; to build machine learning and artificial intelligence systems such as the ones described above; to create models of data and predictive systems; and to design data structures, optimize search algorithms, and understand the behavior of complex systems.
Probability theory is a powerful tool that is used in many areas of Computer Science
and Engineering to model and analyze uncertainty, risk, and random phenomena. The
applications of probability theory in CSE are diverse and wide-ranging, including randomized
algorithms, machine learning, cryptography, queuing theory, information theory, and control
theory. A solid understanding of probability theory is essential for any student or practitioner
of CSE who wishes to apply probabilistic methods to real-world problems.
REFERENCES:
https://u-next.com/
https://www.unacademy.com/
https://www.upgrad.com/
https://www.byjus.com/
https://plato.stanford.edu/
ACKNOWLEDGEMENT
There are a few people who deserve special mention, without whom this assignment would have been incomplete. I want to express my gratitude towards the teachers of the Mathematics Department, Pranab Das Choudhury sir and Sarmee Bose ma'am. Their help, guidance and valuable suggestions really helped me complete the assignment on time without any problem. I would also like to sincerely thank my parents, whose support and help were also needed to make this assignment possible. Last but not least, my friends also deserve a special mention, as they always supported and encouraged me during this course, which was a great help.