
APPLICATION OF PROBABILITY IN

COMPUTER SCIENCE
CA-2
NAME: SAYAN KAR CHOUDHURY
SUBJECT: MATHEMATICS
STREAM: COMPUTER SCIENCE & ENGINEERING
SECTION: C
UNIVERSITY ROLL NO: 14200122181
COURSE CODE: BS-M201

APPLICATION OF BAYES' THEOREM IN
SCIENCE AND TECHNOLOGY

THIS TECHNICAL REPORT IS SUBMITTED AS PART OF CA-2 IN B.Tech 2nd SEMESTER 2022-23

CONTENTS

1. Abstract
2. Introduction
3. Fields of probability in Computer Science:

I. Computer games
II. Neural Networks
III. Quantum Computing

4. Methods:
I. Anatomy of a Computer Chess Program
II. Prior Work on Computer Chess and Shogi
III. Domain Knowledge
IV. Representation

5. Configuration
6. Evaluation
7. Conclusion
8. References
9. Acknowledgement

Abstract

In science, the probability of an event is a number that indicates how likely the event is to occur.
It is expressed as a number in the range from 0 to 1, or, using percentage notation, in the range
from 0% to 100%. The more likely it is that the event will occur, the higher its probability. The
probability of an impossible event is 0; that of an event that is certain to occur is 1. The
probabilities of two complementary events A and B – either A occurs or B occurs – add up to 1. A
simple example is the tossing of a fair (unbiased) coin. If a coin is fair, the two possible outcomes
("heads" and "tails") are equally likely; since these two outcomes are complementary and the
probability of "heads" equals the probability of "tails", the probability of each of the two outcomes
equals 1/2 (which could also be written as 0.5 or 50%).

These concepts have been given an axiomatic mathematical formalization in probability
theory, a branch of mathematics that is used in areas of study such
as statistics, mathematics, science, finance, gambling, artificial intelligence, machine
learning, computer science and game theory to, for example, draw inferences about the expected
frequency of events. Probability theory is also used to describe the underlying mechanics and
regularities of complex systems.

Introduction

Probability is used in computer science in many ways. It is used to design algorithms
that make decisions, solve problems, and optimize performance; to model and analyze
systems, networks, and communication protocols; and to develop machine learning and
artificial intelligence systems. Probability also underlies models of data and predictive
systems, the design of data structures, and the optimization of search algorithms.
Finally, it is used to analyze and understand the behavior of complex systems.

Fields of probability in Computer Science

 Computer Games:

There are many applications, but the more famous ones are related to interpreting
data and Machine Learning.

For example, let’s say that Google wants to show you the best search results when
you are consulting it. It cannot possibly know exactly what you like the most, or exactly what
you mean when you type something. But using the data you type, it can feed your query into a
probability-driven algorithm that infers what you actually mean, and get closer and closer to
what your tastes are. That is how they know how to display the best ads to you, things that you
might be interested in, and make your search experience more personal.

When it comes to Machine Learning, they use probability distributions to “calibrate”
algorithms, and make the computer actually approach a desired performance.
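
As a toy illustration of this idea, the following Python sketch uses Bayes' theorem to update beliefs about a user's intent from a single typed word. All of the intents, words and probabilities are invented for illustration; a real search engine's model is vastly more complex.

    # Minimal sketch of Bayes' theorem applied to guessing user intent
    # from a typed word. All intents, words and probabilities are invented.
    priors = {"shopping": 0.3, "news": 0.3, "travel": 0.4}   # P(intent)
    likelihood = {                                           # P(word | intent)
        "shopping": {"cheap": 0.40, "flight": 0.10},
        "news":     {"cheap": 0.05, "flight": 0.10},
        "travel":   {"cheap": 0.20, "flight": 0.50},
    }

    def posterior(word):
        """Return P(intent | word) for every intent via Bayes' theorem."""
        joint = {i: likelihood[i].get(word, 1e-6) * p for i, p in priors.items()}
        evidence = sum(joint.values())                       # P(word)
        return {i: v / evidence for i, v in joint.items()}

    print(posterior("flight"))   # "travel" becomes the most probable intent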

A long-standing ambition of artificial intelligence has been to create programs that can
instead learn for themselves from first principles. Recently, the AlphaGo Zero algorithm
achieved superhuman performance in the game of Go, by representing Go knowledge using
deep convolutional neural networks trained solely by reinforcement learning from
games of self-play (29). In this paper, we apply a similar but fully generic algorithm, which we
call AlphaZero, to the games of chess and shogi as well as Go, without any additional domain
knowledge except the rules of the game, demonstrating that a general-purpose reinforcement
learning algorithm can achieve, tabula rasa, superhuman performance across many
challenging domains.

A landmark for artificial intelligence was achieved in 1997 when Deep Blue defeated
the human world champion. Computer chess programs continued to progress steadily beyond
human level in the following two decades. These programs evaluate positions using features
handcrafted by human grandmasters and carefully tuned weights, combined with a high-
performance alpha-beta search that expands a vast search tree using a large number of
clever heuristics and domain-specific adaptations. In the Methods we describe these
augmentations, focusing on the 2016 Top Chess Engine Championship (TCEC) world-
champion Stockfish; other strong chess programs, including Deep Blue, use very similar
architectures.

Shogi is a significantly harder game, in terms of computational complexity, than
chess: it is played on a larger board, and any captured opponent piece changes sides and
may subsequently be dropped anywhere on the board. The strongest shogi programs, such
as Computer Shogi Association (CSA) world-champion Elmo, have only recently defeated
human champions. These programs use a similar algorithm to computer chess programs,
again based on a highly optimised alpha-beta search engine with many domain-specific
adaptations.

 Neural Networks

Go is well suited to the neural network architecture used in AlphaGo because the rules of
the game are translationally invariant (matching the weight sharing structure of convolutional
networks), are defined in terms of liberties corresponding to the adjacencies between points
on the board (matching the local structure of convolutional networks), and are rotationally and
reflectionally symmetric (allowing for data augmentation and ensembling). Furthermore, the
action space is simple (a stone may be placed at each possible location), and the game outcomes
are restricted to binary wins or losses, both of which may help neural network training.

Instead of a handcrafted evaluation function and move ordering heuristics, AlphaZero utilises
a deep neural network (p, v) = fθ(s) with parameters θ. This neural network takes the board
position s as an input and outputs a vector of move probabilities p with components pa = Pr(a|s)
for each action a, and a scalar value v estimating the expected outcome z from position s,
v ≈ E[z|s]. AlphaZero learns these move probabilities and value estimates entirely from self-
play; these are then used to guide its search.

Instead of an alpha-beta search with domain-specific enhancements, AlphaZero uses a
general-purpose Monte-Carlo tree search (MCTS) algorithm. Each search consists of a series of
simulated games of self-play that traverse a tree from root s_root to leaf. Each simulation proceeds by
selecting in each state s a move a with low visit count, high move probability and high value
(averaged over the leaf states of simulations that selected a from s) according to the current
neural network fθ. The search returns a vector π representing a probability distribution over
moves, either proportionally or greedily with respect to the visit counts at the root state.
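
The selection step just described can be sketched in a few lines of Python. This follows the PUCT-style rule from AlphaGo Zero; the constant c_puct and the Node layout are simplifying assumptions for illustration, not the exact AlphaZero implementation.

    import math

    # Simplified sketch of move selection inside MCTS: prefer moves with
    # high prior probability and high average value but low visit count.
    class Node:
        def __init__(self, prior):
            self.prior = prior          # p_a = Pr(a | s) from the network
            self.visit_count = 0        # N(s, a)
            self.value_sum = 0.0        # accumulated leaf values

        def mean_value(self):           # Q(s, a), averaged over simulations
            return self.value_sum / self.visit_count if self.visit_count else 0.0

    def select_move(children, c_puct=1.5):
        """Pick the child move maximising Q(s, a) + U(s, a)."""
        total = sum(child.visit_count for child in children.values())
        def score(child):
            u = c_puct * child.prior * math.sqrt(total) / (1 + child.visit_count)
            return child.mean_value() + u
        return max(children, key=lambda move: score(children[move]))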

 Quantum Computing

In a deterministic universe, based on Newtonian concepts, there would be no probability if all
conditions were known (Laplace's demon) (but there are situations in which sensitivity to initial
conditions exceeds our ability to measure them, i.e. know them). In the case of a roulette wheel, if the
force of the hand and the period of that force are known, the number on which the ball will stop would
be a certainty (though as a practical matter, this would likely be true only of a roulette wheel that had
not been exactly levelled – as Thomas A. Bass' Newtonian Casino revealed). This also assumes
knowledge of inertia and friction of the wheel, weight, smoothness, and roundness of the ball,
variations in hand speed during the turning, and so forth. A probabilistic description can thus be more
useful than Newtonian mechanics for analyzing the pattern of outcomes of repeated rolls of a roulette
wheel. Physicists face the same situation in the kinetic theory of gases, where the system, while
deterministic in principle, is so complex (with the number of molecules typically on the order of
magnitude of the Avogadro constant, 6.02×10^23) that only a statistical description of its properties is
feasible.

Methods

 Anatomy of a Computer Chess Program

In this section we describe the components of a typical computer chess program,
focusing specifically on Stockfish (25), an open source program that won the 2016 TCEC
computer chess championship.
Each position s is described by a sparse vector of handcrafted features φ(s), including
midgame/endgame-specific material point values, material imbalance tables, piece-square
tables, mobility and trapped pieces, pawn structure, king safety, outposts, bishop pair, and
other miscellaneous evaluation patterns. Each feature φi is assigned, by a combination of
manual and automatic tuning, a corresponding weight wi, and the position is evaluated by a
linear combination v(s, w) = φ(s)ᵀw. However, this raw evaluation is only considered
accurate for positions that are “quiet”, with no unresolved captures or checks. A domain-
specialised quiescence search is used to resolve ongoing tactical situations before the
evaluation function is applied.
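
A toy sketch of this linear evaluation follows, assuming invented feature names, values and weights; a real engine such as Stockfish tunes thousands of features.

    import numpy as np

    # Toy sketch of the linear evaluation v(s, w) = phi(s)^T w. Feature
    # values and weights are invented for illustration only.
    w = np.array([1.0, 0.1, 0.5, 0.3])       # weights for the features below
    # features: material balance, mobility, king safety, bishop pair
    phi_s = np.array([2.0, 5.0, -1.0, 1.0])  # handcrafted features phi(s)

    def evaluate(phi, weights=w):
        return float(phi @ weights)          # linear combination phi(s)^T w

    print(evaluate(phi_s))                   # 2.3, on an arbitrary scale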
The final evaluation of a position s is computed by a minimax search that evaluates
each leaf using a quiescence search. Alpha-beta pruning is used to safely cut any branch that
is provably dominated by another variation. Additional cuts are achieved using aspiration
windows and principal variation search. Other pruning strategies include null move pruning
(which assumes a pass move should be worse than any variation, in positions that are unlikely
to be in zugzwang, as determined by simple heuristics), futility pruning (which assumes
knowledge of the maximum possible change in evaluation), and other domain-dependent
pruning rules (which assume knowledge of the value of captured pieces).
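
A minimal, self-contained sketch of alpha-beta pruning follows; the helpers legal_moves(state), apply(state, move) and evaluate(state) are assumed placeholders for a real move generator and evaluation function, and a real engine adds the quiescence search and pruning heuristics described above.

    # Minimal sketch of minimax search with alpha-beta pruning.
    def alpha_beta(state, depth, alpha, beta, maximizing):
        if depth == 0:
            return evaluate(state)      # a quiescence search would go here
        if maximizing:
            value = float("-inf")
            for move in legal_moves(state):
                value = max(value, alpha_beta(apply(state, move),
                                              depth - 1, alpha, beta, False))
                alpha = max(alpha, value)
                if alpha >= beta:       # beta cut-off: branch provably dominated
                    break
            return value
        value = float("inf")
        for move in legal_moves(state):
            value = min(value, alpha_beta(apply(state, move),
                                          depth - 1, alpha, beta, True))
            beta = min(beta, value)
            if alpha >= beta:           # alpha cut-off
                break
        return value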
The search is focused on promising variations both by extending the search depth of
promising variations, and by reducing the search depth of unpromising variations based on heuristics
like history, static-exchange evaluation (SEE), and moving piece type. Extensions are based
on domain-independent rules that identify singular moves with no sensible alternative, and
domain-dependent rules, such as extending check moves. Reductions, such as late move
reductions, are based heavily on domain knowledge.

The efficiency of alpha-beta search depends critically upon the order in which
moves are considered. Moves are therefore ordered by iterative deepening (using a shallower
search to order moves for a deeper search). In addition, a combination of domain-independent
move ordering heuristics is used, such as the killer heuristic, history heuristic and counter-move
heuristic, along with domain-dependent knowledge based on captures (SEE) and potential
captures (MVV/LVA).
A transposition table facilitates the reuse of values and move orders when the same
position is reached by multiple paths. A carefully tuned opening book is used to select moves
at the start of the game. An endgame tablebase, precalculated by exhaustive retrograde
analysis of endgame positions, provides the optimal move in all positions with six, and
sometimes seven, pieces or fewer.
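
A sketch of the transposition-table idea, reusing the alpha_beta function from the earlier sketch; zobrist_hash(state) is an assumed placeholder that maps a position to a hash key, and a real engine would also store bound types and best moves.

    # Cache search results keyed by a hash of the position, so a position
    # reached by different move orders is searched only once.
    transposition_table = {}

    def search_with_table(state, depth, alpha, beta, maximizing):
        key = (zobrist_hash(state), depth)
        if key in transposition_table:
            return transposition_table[key]    # reuse the earlier result
        value = alpha_beta(state, depth, alpha, beta, maximizing)
        transposition_table[key] = value
        return value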

Other strong chess programs, and also earlier programs such as Deep Blue, have used
very similar architectures (9,23) including the majority of the components described above,
although important details vary considerably. None of the techniques described in this section
are used by AlphaZero. It is likely that some of these techniques could further improve the
performance of AlphaZero; however, we have focused on a pure self-play reinforcement
learning approach and leave these extensions for future research.

 Prior Work on Computer Chess and Shogi

In this section we discuss some notable prior work on reinforcement learning in computer
chess. NeuroChess evaluated positions by a neural network that used 175 handcrafted input
features. It was trained by temporal-difference learning to predict the final game outcome,
and also the expected features after two moves. NeuroChess won 13% of games against
GnuChess using a fixed depth 2 search.

Beal and Smith applied temporal-difference learning to estimate the piece values in
chess and shogi starting from random values and learning solely by self-play.
KnightCap evaluated positions by a neural network that used an attack-table based on
knowledge of which squares are attacked or defended by which pieces. It was trained by a
variant of temporal-difference learning, known as TD(leaf), that updates the leaf value of the
principal variation of an alpha-beta search. KnightCap achieved human master level after
training against a strong computer opponent with hand-initialised piece-value weights.
Meep evaluated positions by a linear evaluation function based on handcrafted features.
It was trained by another variant of temporal-difference learning, known as TreeStrap, that
updated all nodes of an alpha-beta search. Meep defeated human international master players
in 13 out of 15 games, after training by self-play with randomly initialised weights.
Kaneko and Hoki trained the weights of a shogi evaluation function comprising a
million features, by learning to select expert human moves during alpha-beta search. They also
performed a large-scale optimization based on minimax search regulated by expert game logs;
this formed part of the Bonanza engine that won the 2013 World Computer Shogi
Championship.
Giraffe (19) evaluated positions by a neural network that included mobility maps and
attack and defend maps describing the lowest valued attacker and defender of each square. It
was trained by self-play using TD(leaf), also reaching a standard of play comparable to
international masters.

DeepChess trained a neural network to perform pair-wise evaluations of positions.
It was trained by supervised learning from a database of human expert games that was pre-
filtered to avoid capture moves and drawn games. DeepChess reached a strong grandmaster
level of play.

All of these programs combined their learned evaluation functions with an alpha-beta
search enhanced by a variety of extensions. An approach based on training dual policy and
value networks using AlphaZero-like policy iteration was successfully applied to improve on
the state-of-the-art in Hex.

 Domain Knowledge

1. The input features describing the position, and the output features describing the move,
are structured as a set of planes; i.e. the neural network architecture is matched to the
grid-structure of the board.

2. AlphaZero is provided with perfect knowledge of the game rules. These are used during
MCTS, to simulate the positions resulting from a sequence of moves, to determine game
termination, and to score any simulations that reach a terminal state.

3. Knowledge of the rules is also used to encode the input planes (i.e. castling, repetition,
no-progress) and output planes (how pieces move, promotions, and piece drops in shogi).

4. The typical number of legal moves is used to scale the exploration noise (see below).

5. Chess and shogi games exceeding a maximum number of steps (determined by typical
game length) were terminated and assigned a drawn outcome; Go games were terminated
and scored with Tromp-Taylor rules, similarly to previous work.

AlphaZero did not use any form of domain knowledge beyond the points listed above.

Representation

In this section we describe the representation of the board inputs, and the representation of the
action outputs, used by the neural network in AlphaZero. Other representations could have
been used; in our experiments the training algorithm worked robustly for many reasonable
choices.

Table S1: Input features used by AlphaZero in Go, Chess and Shogi respectively.
The first set of features are repeated for each position in a T = 8-step history. Counts are
represented by a single real-valued input; other input features are represented by a one-hot
encoding using the specified number of binary input planes. The current player is denoted
by P1 and the opponent by P2.

The input to the neural network is an N × N × (MT + L) image stack
that represents state using a concatenation of T sets of M planes of size N × N. Each set of
planes represents the board position at a time-step t − T + 1, ..., t, and is set to zero for time-
steps less than 1. The board is oriented to the perspective of the current player. The M feature
planes are composed of binary feature planes indicating the presence of the player’s pieces,
with one plane for each piece type, and a second set of planes indicating the presence of the
opponent’s pieces. For shogi there are additional planes indicating the number of captured
prisoners of each type. There are an additional L constant-valued input planes denoting the
player’s colour, the total move count, and the state of special rules: the legality of castling in
chess (kingside or queenside); the repetition count for that position (3 repetitions is an
automatic draw in chess; 4 in shogi); and the number of moves without progress in chess (50
moves without progress is an automatic draw). Input features are summarised in Table S1.
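
A sketch of building this stack for chess, under the sizes described above (N = 8, T = 8, M = 14 piece and repetition planes, L = 7 constant planes); the exact plane ordering is an assumption for illustration.

    import numpy as np

    # Build the (MT + L) x N x N chess input stack. L = 7 covers colour,
    # move count, 4 castling rights and the no-progress (50-move) counter.
    N, T, M, L = 8, 8, 14, 7

    def encode(history, colour, move_count, castling, no_progress):
        planes = np.zeros((M * T + L, N, N), dtype=np.float32)
        for t, board_planes in enumerate(history[-T:]):
            planes[t * M : (t + 1) * M] = board_planes   # zero if history < T
        planes[M * T + 0] = colour                       # constant-valued planes
        planes[M * T + 1] = move_count
        for i, right in enumerate(castling):             # 4 castling-right planes
            planes[M * T + 2 + i] = float(right)
        planes[M * T + 6] = no_progress                  # 50-move rule counter
        return planes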
A move in chess may be described in two parts: selecting the piece to
move, and then selecting among the legal moves for that piece. We represent the policy π(a|s)
by an 8 × 8 × 73 stack of planes encoding a probability distribution over 4,672 possible
moves. Each of the 8 × 8 positions identifies the square from which to “pick up” a piece. The
first 56 planes encode possible ‘queen moves’ for any piece: a number of squares [1..7] in
which the piece will be moved, along one of eight relative compass directions {N, NE, E, SE,
S, SW, W, NW}. The next 8 planes encode possible knight moves for that piece. The final 9
planes encode possible underpromotions for pawn moves or captures in two possible
diagonals, to knight, bishop or rook respectively. Other pawn moves or captures from the
seventh rank are promoted to a queen.
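
A sketch of how a chess move might be mapped to an index in this 8 × 8 × 73 stack; the ordering of directions, knight patterns and promotion pieces here is an assumption for illustration.

    # Map a move to one of the 8 x 8 x 73 = 4,672 policy indices:
    # 56 'queen move' planes, 8 knight planes, 9 underpromotion planes.
    DIRECTIONS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]

    def queen_move_plane(direction, distance):
        """Planes 0-55: direction (8) x distance 1..7."""
        return DIRECTIONS.index(direction) * 7 + (distance - 1)

    def knight_move_plane(knight_index):
        """Planes 56-63: one plane per knight move pattern."""
        return 56 + knight_index

    def underpromotion_plane(move_kind, piece):
        """Planes 64-72: 3 move kinds x promotion to knight/bishop/rook."""
        return 64 + move_kind * 3 + ["knight", "bishop", "rook"].index(piece)

    def policy_index(from_row, from_col, plane):
        """Flatten (square, plane) into a single index."""
        return (from_row * 8 + from_col) * 73 + plane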

Table S2: Action representation used by AlphaZero in Chess and Shogi respectively.
The policy is represented by a stack of planes encoding a probability distribution over legal
moves; planes correspond to the entries in the table.

The policy in shogi is represented by a 9 × 9 × 139 stack of planes similarly encoding a
probability distribution over 11,259 possible moves. The first 64 planes encode ‘queen
moves’ and the next 2 planes encode knight moves. An additional 64 + 2 planes encode
promoting queen moves and promoting knight moves respectively. The last 7 planes encode a
captured piece dropped back into the board at that location.

The policy in Go is represented identically to AlphaGo Zero (29), using a flat distribution
over 19 × 19 + 1 moves representing possible stone placements and the pass move. We also
tried using a flat distribution over moves for chess and shogi; the final result was almost
identical although training was slightly slower. The action representations are summarised in
Table S2. Illegal moves are masked out by setting their probabilities to zero, and re-
normalising the probabilities for remaining moves.
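
The masking step is simple to sketch in Python with NumPy; the probabilities below are invented for illustration.

    import numpy as np

    # Zero out illegal moves and re-normalise over the remaining legal moves.
    def mask_illegal(policy, legal_mask):
        masked = policy * legal_mask        # illegal moves get probability zero
        return masked / masked.sum()        # re-normalise over legal moves

    policy = np.array([0.5, 0.3, 0.2])      # network output p
    legal = np.array([1.0, 0.0, 1.0])       # the second move is illegal
    print(mask_illegal(policy, legal))      # [0.714..., 0.0, 0.285...]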

Configuration

During training, each MCTS used 800 simulations. The number of games, positions, and
thinking time varied per game due largely to different board sizes and game lengths, and are
shown in Table S3. The learning rate was set to 0.2 for each game, and was dropped three
times (to 0.02, 0.002 and 0.0002 respectively) during the course of training. Moves are
selected in proportion to the root visit count. Dirichlet noise Dir(α) was added to the prior
probabilities in the root node; this was scaled in inverse proportion to the approximate
number of legal moves in a typical position, to a value of α = {0.3, 0.15, 0.03} for chess, shogi
and Go respectively. Unless otherwise specified, the training and search algorithm and
parameters are identical to AlphaGo Zero (29).
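
A sketch of this root exploration noise, assuming the mixing fraction ε = 0.25 used by AlphaGo Zero.

    import numpy as np

    # Mix Dirichlet noise into the prior probabilities at the root node.
    def add_root_noise(priors, alpha, eps=0.25, rng=None):
        rng = rng or np.random.default_rng()
        noise = rng.dirichlet([alpha] * len(priors))
        return (1 - eps) * priors + eps * noise

    priors = np.array([0.4, 0.3, 0.2, 0.1])     # network priors at the root
    print(add_root_noise(priors, alpha=0.3))    # alpha = 0.3 for chess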

During evaluation, AlphaZero selects moves greedily with respect to the root visit count.
Each MCTS was executed on a single machine with 4 TPUs.

Evaluation

To evaluate performance in chess, we used Stockfish version 8 (official Linux release) as a
baseline program, using 64 CPU threads and a hash size of 1GB.

To evaluate performance in shogi, we used Elmo version WCSC27 in combination
with YaneuraOu 2017 Early KPPT 4.73 64AVX2 with 64 CPU threads and a hash size of
1GB, with the usi option EnteringKingRule set to NoEnteringKing.

We evaluated the relative strength of AlphaZero (Figure 1) by measuring the Elo rating of
each player. We estimate the probability that player a will defeat player b by a logistic
function p(a defeats b) = 1 / (1 + 10^(celo(e(b) − e(a)))), and estimate the ratings e(·) by
Bayesian logistic regression, computed by the BayesElo program (10) using the standard
constant celo = 1/400. Elo ratings were computed from the results of a 1 second per move
tournament between iterations of AlphaZero during training, and also a baseline player: either
Stockfish, Elmo or AlphaGo Lee respectively. The Elo rating of the baseline players was
anchored to publicly available values (29).
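
This win-probability model is easy to sketch; the ratings below are invented for illustration.

    # Elo win-probability model from above:
    # p(a defeats b) = 1 / (1 + 10^(celo * (e(b) - e(a)))), celo = 1/400.
    def win_probability(rating_a, rating_b, celo=1 / 400):
        return 1.0 / (1.0 + 10 ** (celo * (rating_b - rating_a)))

    print(win_probability(3400, 3200))   # ~0.76 for a 200-point rating gap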

We also measured the head-to-head performance of AlphaZero against each
baseline player. Settings were chosen to correspond with computer chess tournament
conditions: each player was allowed 1 minute per move, and resignation was enabled for all
players (-900 centipawns for 10 consecutive moves for Stockfish and Elmo, 5% winrate for
AlphaZero). Pondering was disabled for all players.

Conclusion

In other words, probability pervades computer science: it underpins algorithms that make
decisions, solve problems, and optimize performance; the modelling and analysis of systems,
networks, and communication protocols; machine learning and artificial intelligence; models
of data and predictive systems; the design of data structures and search algorithms; and the
analysis of complex system behavior, as the AlphaZero case study above illustrates.
Probability theory is a powerful tool that is used in many areas of Computer Science
and Engineering to model and analyze uncertainty, risk, and random phenomena. The
applications of probability theory in CSE are diverse and wide-ranging, including randomized
algorithms, machine learning, cryptography, queuing theory, information theory, and control
theory. A solid understanding of probability theory is essential for any student or practitioner
of CSE who wishes to apply probabilistic methods to real-world problems.

REFERENCES:

 https://u-next.com/
 https://www.unacademy.com/
 https://www.upgrad.com/
 https://www.byjus.com/
 https://plato.stanford.edu/

ACKNOWLEDGEMENT
There are a few people who deserve a special mention, without whom this assignment
would have been incomplete. I want to express my gratitude towards the teachers of the
Mathematics Department, Pranab Das Choudhury sir and Sarmee Bose ma’am. Their help,
guidance, and valuable suggestions really helped me complete the assignment on time without
any problem. I would also like to sincerely thank my parents, whose support and help were
also needed to make this assignment possible. Last but not the least, my friends also deserve
a special mention, as they always supported and encouraged me during this course, which
was a great help.
