Report
Hangman Challenge
Success rate in recorded session = 0.675
Algorithm Part 1
Base Algorithm (CatBoost)
1. Ideation
1.1 The default algorithm for solving word puzzles or games begins by making
initial guesses. These guesses are based on two factors: the length of the
unknown word and the frequency of letters in commonly used words.
Instead of considering where letters appear in the word, the algorithm focuses
on the nature of the dictionary you're using and how often individual letters tend
to appear in words. In other words, it prioritizes which letters are more likely to
be in the word based on their overall usage rather than their specific position.
1.2 It can be observed that long words often share common prefixes and
suffixes, such as 'tion', 'ous', 'ing', 'dis', 'pre', and so on. These common
prefixes and suffixes should be used strategically in the early stages of guessing
words.
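To make the default strategy of 1.1 concrete, here is a minimal sketch of a frequency-based guesser. It assumes the dictionary is available as a plain Python list of words; the function names are illustrative and are not part of the submitted code.

    from collections import Counter

    def frequency_order(dictionary, word_length):
        # Rank letters by how many dictionary words of this length contain them.
        counts = Counter()
        for word in dictionary:
            if len(word) == word_length:
                counts.update(set(word))        # count each letter once per word
        return [letter for letter, _ in counts.most_common()]

    def next_guess(dictionary, masked_word, guessed):
        # Guess the most frequent letter (for this word length) not tried yet.
        for letter in frequency_order(dictionary, len(masked_word)):
            if letter not in guessed:
                return letter

For example, next_guess(["apple", "angle", "mango"], "_____", {"a"}) skips the already guessed 'a' and returns the next most common letter among the five-letter words.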
2. Dataset Creation
2.1 Simulated Hangman Instances
A unique set of letters is formed for each word in the dataset. For example, for
the word "tree," the set would be {‘t’, ’r’, ’e’}.
Then, for each combination of letters in this set, a hangman problem instance is
generated.
_ree  { 't' }
t_ee  { 'r' }
tr__  { 'e' }
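The instances above can be generated mechanically from each dictionary word. The sketch below, using only the Python standard library, enumerates every subset of the word's unique letters and masks those letters out; the names are illustrative.

    from itertools import combinations

    def hangman_instances(word):
        # Yield (masked_word, hidden_letters) for every subset of the word's letters.
        letters = sorted(set(word))                  # e.g. ['e', 'r', 't'] for "tree"
        for k in range(1, len(letters) + 1):
            for hidden in combinations(letters, k):
                masked = "".join("_" if c in hidden else c for c in word)
                yield masked, set(hidden)

    # list(hangman_instances("tree")) contains ('_ree', {'t'}), ('t_ee', {'r'}),
    # ('tr__', {'e'}), as well as the instances with two and three hidden letters.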
We need to create a dataset that utilizes these observations for initial guesses and
gives the final model an advantage in correctly identifying suffixes and prefixes.
Each hangman word is represented with a fixed-length encoding: positions still to be
guessed are marked '0', revealed letters are given their numeric codes, and padding
positions outside the word are marked '-1'.
Example 1 - masked word "_pp_e" (row of size 20 for illustration):

Position:  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
Letter:    _   p   p   _   e                                       _   p   p   _   e
Encoding:  0  16  16   0   5  -1  -1  -1  -1  -1  -1  -1  -1  -1  -1   0  16  16   0   5

Example 2 - masked word "tr__qu_nt":

Position:  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19
Letter:    t   r   _   _   q   u   _   n   t           t   r   _   _   q   u   _   n   t
Encoding: 20  18   0   0  17  21   0  14  20  -1  -1  20  18   0   0  17  21   0  14  20
A row of size 80 is used (20 in the examples above). The letter values are inserted at
both the front and the back of the row, according to the word's length, so that the word
fits; underscore positions are filled with '0', and the remaining middle positions are
filled with '-1'. This encoding gives the model the positions of letters in prefixes and
suffixes relative to each other.
The size 80 was chosen because the longest word in the given dictionary had 26 letters,
and the longest word in the testing dictionary was assumed to be at most 40 letters.
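A minimal sketch of this encoding, assuming letters are coded a = 1 ... z = 26 and '_' = 0, consistent with the examples above (the function name is illustrative):

    ROW_SIZE = 80      # 20 is used in the worked examples above

    def encode(masked_word, row_size=ROW_SIZE):
        # Letter codes are written at both the front and the back of the row;
        # '_' becomes 0 and unused middle positions are -1.
        codes = [0 if c == "_" else ord(c) - ord("a") + 1 for c in masked_word]
        row = [-1] * row_size
        row[:len(codes)] = codes           # word aligned to the front
        row[-len(codes):] = codes          # word aligned to the back
        return row

    # encode("_pp_e", row_size=20)
    # -> [0, 16, 16, 0, 5, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0, 16, 16, 0, 5]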
This dataset was created from the given dictionary using the attached file
"dataset_creation.ipynb", and the resulting dataset was saved in Parquet format as
"fd.parquet".
3.1 For each letter, a classification model is trained that identifies whether the letter
should be guessed (1) or not guessed (0) for a given instance of the hangman game.
Since the dataset only contains categorical features ('0' for '_', '3' for 'c', etc.), the
classification model chosen was the CatBoost Classifier, due to its proficiency on
categorical datasets.
A model class is created that includes 26 models, one for each letter of the alphabet.
This class has a 'predict' function that returns a guessing probability for each letter,
based on each model's predicted probability for class '1' (i.e., guess).
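A minimal sketch of such a model class, assuming the catboost package and a feature matrix X built from the encoding above; the class name, the structure of the labels Y, and the training details are illustrative rather than a copy of the submitted code.

    import string
    import numpy as np
    from catboost import CatBoostClassifier

    class LetterModels:
        # One binary classifier per letter: should this letter be guessed (1) or not (0)?
        def __init__(self):
            self.models = {c: CatBoostClassifier(verbose=0)
                           for c in string.ascii_lowercase}

        def fit(self, X, Y):
            # Y holds one 0/1 target column per letter (e.g. a DataFrame or dict).
            for c, model in self.models.items():
                model.fit(X, Y[c])

        def predict(self, x):
            # Probability of class '1' (guess) for each of the 26 letters.
            x = np.asarray(x).reshape(1, -1)
            return {c: model.predict_proba(x)[0][1]
                    for c, model in self.models.items()}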
The model achieves an average balanced accuracy of 0.89 for all letters on the
test split. This accuracy measures how accurately the model determines
whether a letter should be guessed in a given instance.
(Balanced accuracy is the average of the sensitivity (true positive rate) and specificity
(true negative rate) of a binary classifier.
See: sklearn.metrics.balanced_accuracy_score)
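For reference, the metric can be computed with scikit-learn; the toy labels below are only there to illustrate its behaviour.

    from sklearn.metrics import balanced_accuracy_score

    y_true = [1, 1, 0, 0, 0, 1]   # letter actually belongs in the hidden positions
    y_pred = [1, 0, 0, 0, 1, 1]   # model's guess / no-guess decision
    print(balanced_accuracy_score(y_true, y_pred))   # (2/3 + 2/3) / 2 ≈ 0.667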
3.2 Some imbalance is observed in the prediction probabilities for letters with
lower occurrence frequencies, such as 'z,' due to its infrequent appearance in
the dictionary.
However, the model performs well in instances where words contain suffixes like
"..._ized" because of how the dataset was created to account for letter positions in
suffixes.
While the model excels at providing accurate initial guesses and identifying prefixes
and suffixes, it encounters challenges when the word contains less common letters.
This is because the relative positioning of letters makes sense for the beginning and
end of the word but not for the middle.
(The model trained from this approach was stored as a pickle file named
"model_trained/multilabel_catboost_model.pkl")
Algorithm Part 2
Fine-tuning algorithm
The base model often leaves less common letters unguessed, as these cannot be
predicted reliably from the training dictionary. These unguessed letters often have
their neighboring letters already guessed, which the fine-tuning algorithm exploits by
searching the dictionary.
6. Dictionary Creation
6.1 For length five and, for instance, the masked word "une_ualizer", the following
substrings containing exactly one missing letter position were created:
Substrings of length 5
une_u
ne_ua
e_ual
_uali
6.2 All substrings of length 5 for each word in the dictionary are created and stored in
a new_dictionary.
This is then used to create a frequency_array for each substring, containing the
number of occurrences of each letter at the missing position whenever the remaining
letters match a substring in new_dictionary (see the sketch after the table below).
Substrings A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
une_u 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0
ne_ua 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0
e_ual 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0 0 7 0 0 4 0 0 0 0 0 0
_uali 0 0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 9 0 0 0 0 4 0 0 0 0
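A minimal sketch of this construction, assuming plain Python and a dictionary given as a list of words; new_dictionary here is a simple list of substrings, and the names mirror the report's terminology rather than the actual submitted code.

    import string

    def build_new_dictionary(dictionary, length=5):
        # All substrings of the given length taken from every dictionary word.
        subs = []
        for word in dictionary:
            subs.extend(word[i:i + length] for i in range(len(word) - length + 1))
        return subs

    def substrings_with_blank(masked_word, length=5):
        # Windows of the masked word that contain exactly one missing position.
        return [masked_word[i:i + length]
                for i in range(len(masked_word) - length + 1)
                if masked_word[i:i + length].count("_") == 1]

    def frequency_array(pattern, new_dictionary):
        # Count which letter fills the blank in every stored substring that
        # matches the pattern at all other positions.
        counts = dict.fromkeys(string.ascii_lowercase, 0)
        blank = pattern.index("_")
        for sub in new_dictionary:
            if len(sub) == len(pattern) and all(
                    p == "_" or p == s for p, s in zip(pattern, sub)):
                counts[sub[blank]] += 1
        return counts

    # substrings_with_blank("une_ualizer") -> ['une_u', 'ne_ua', 'e_ual', '_uali']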
6.3 The frequencies are then divided by the total frequency of each substring, so that
each frequency_array is normalized to sum to 1. These normalized frequency_arrays
are then added across all substrings of the instance.
Note that this ensures that if a substring's frequency_array has a single letter with
non-zero frequency, or a letter with a very high frequency ratio, the resulting
confidence value for that letter can exceed 1, prioritizing a uniquely positioned
substring over substrings with higher raw frequencies.
This final array represents the confidence level with which each letter should be
guessed. For each length value, the single letter with the highest confidence level is
returned.
If, at any comparison level, the confidence level of a letter exceeds the probability of
the letter predicted by the base model, it is returned as the guessed letter for that
instance of the hangman game. Otherwise, the model's predicted letter is returned.
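A minimal sketch of the confidence computation and the comparison against the base model, reusing the helpers from the previous sketch; it shows a single substring length only, whereas the actual algorithm repeats the comparison over several lengths.

    import string

    def dictionary_confidence(masked_word, new_dictionary, guessed, length=5):
        # Normalize each frequency_array and sum them over all substrings of the instance.
        confidence = dict.fromkeys(string.ascii_lowercase, 0.0)
        for pattern in substrings_with_blank(masked_word, length):
            counts = frequency_array(pattern, new_dictionary)
            total = sum(counts.values())
            if total:
                for letter, count in counts.items():
                    confidence[letter] += count / total
        for letter in guessed:               # never propose an already guessed letter
            confidence[letter] = 0.0
        return confidence

    def choose_guess(model_probs, confidence):
        # Prefer the dictionary letter when its confidence exceeds the base model's
        # probability for its own best letter; otherwise keep the model's prediction.
        dict_letter = max(confidence, key=confidence.get)
        model_letter = max(model_probs, key=model_probs.get)
        if confidence[dict_letter] > model_probs[model_letter]:
            return dict_letter
        return model_letter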
8. Modification
For substrings of length 10, 9, and 8, the number of missing values allowed in the
selected substring was increased to 2, because two or more low-frequency letters can
appear in words of such considerable length.
9. Results
The base algorithm showed promising performance in initial tests, but it achieved
significantly better results when integrated with the fine-tuning algorithm. The second
algorithm, specifically designed to address the limitations of the base algorithm,
proved to be an ideal solution. After conducting 1000 recorded sessions, the final
success_rate reached an impressive 0.625.