Multi-Agent Systems project

Symmetric Hangman

Jefta Saija & Ruud Henken
April 2011

Contents

  1. Introduction
  2. Problem description
  3. Implementation
  4. Output and results
  5. Documents
  6. References

1. Introduction

The original game of Hangman is a two-player game. One player (player 1) thinks of a word and the other player (player 2) has to guess it. The word is represented by underscores, one for each letter. Player 2 guesses the word by asking single letters. If the letter occurs in the word, player 2 gets to know the positions of its occurrences. If the letter does not occur in the word, player 1 draws an element of the hangman diagram (see figure below). The game ends if:
  • Player 1 completes the hangman diagram: player 1 wins the game
  • Player 2 completes the word: player 2 wins the game
This is not a trivial game since the alphabet contains more letters than the diagram contains elements. Therefore player 2 needs to choose the letters carefully.

Figure: hangman diagram

2. Problem description

We want to model a symmetric version of Hangman. That is, both players need to complete a word while preventing their own hangman diagram from being completed. The symmetric version is more complicated than the non-symmetric version: when one player asks a letter, both players reveal the occurrences of that letter in their word. This means that both players have to be even more careful in asking letters. You do not want to help your opponent win the game by asking a specific letter unless it is strictly necessary.

This requires knowledge of the other player's word. As in the non-symmetric version of the game, each player knows the word the opponent has to guess. So both players only need to reason about the letter occurrences in their own word, since they simply know the letter occurrences in the other player's word.

Knowledge and possible worlds
Both players use the same lexical database. That is, the set of all possible words is common knowledge.

Each turn, a player needs to select a letter based on the following knowledge (a sketch of how this knowledge prunes the possible worlds follows the list):
  • Lexical database
  • Word length
  • Guessed letters
    • Letters at specific positions
    • Letters NOT occurring
  • Probability of occurrence in the opponent's word
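A minimal sketch of how the first three kinds of knowledge prune the possible worlds; all names here are illustrative and not taken from Hangman.java. The probability of occurrence in the opponent's word comes into play later, when a letter is selected.

  import java.util.ArrayList;
  import java.util.List;
  import java.util.Map;
  import java.util.Set;

  // Sketch: keep only the words that are consistent with the knowledge above.
  class PossibleWorlds {

      // lexicon  : the shared lexical database
      // length   : known length of the word to guess
      // revealed : position -> letter, for every occurrence revealed so far
      // absent   : guessed letters known NOT to occur in the word
      static List<String> filter(List<String> lexicon, int length,
                                 Map<Integer, Character> revealed,
                                 Set<Character> absent) {
          List<String> worlds = new ArrayList<>();
          for (String word : lexicon) {
              if (word.length() == length && consistent(word, revealed, absent)) {
                  worlds.add(word);
              }
          }
          return worlds;
      }

      static boolean consistent(String word, Map<Integer, Character> revealed,
                                Set<Character> absent) {
          for (int i = 0; i < word.length(); i++) {
              char c = word.charAt(i);
              Character known = revealed.get(i);
              if (known != null && known != c) return false; // wrong letter at a revealed position
              // a guess reveals ALL occurrences of a letter, so a revealed
              // letter cannot also occur at a position that stayed blank
              if (known == null && revealed.containsValue(c)) return false;
              if (absent.contains(c)) return false;          // letter known to be absent
          }
          return true;
      }
  }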
Word selection
A great deal of effort goes into choosing the right word for your opponent: choosing a difficult word increases your chance of winning the game. But what is a 'difficult' word? Neither player knows which strategy the other uses to select a word. Therefore each player has to take the whole database into account when reasoning about possible worlds.

Stopping criteria
The symmetric version of the game ends if:
  • One of the players completes his word: that player wins
  • One of the players' hangman diagrams is completed: that player loses
In this variant of the game we also need to consider a draw: it might be the case that one player asks a letter and, as a result, both words are completed at once. In that case the game ends in a draw.

                    Player 1                                    Player 2
  Hangman diagram   player 1 tries not to complete this one    player 2 tries not to complete this one
  Word              player 1 tries to complete this one        player 2 tries to complete this one
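Summarising, a minimal sketch of the end-of-game check (illustrative names, not from Hangman.java), including the draw case:

  // Sketch: determine the game status after a guess. A single guess may
  // complete both words at once, which is the draw described above.
  class Outcome {
      enum Status { ONGOING, PLAYER1_WINS, PLAYER2_WINS, DRAW }

      static Status check(boolean word1Complete, boolean word2Complete,
                          boolean diagram1Complete, boolean diagram2Complete) {
          if (word1Complete && word2Complete) return Status.DRAW;
          if (word1Complete || diagram2Complete) return Status.PLAYER1_WINS; // a full diagram makes its owner lose
          if (word2Complete || diagram1Complete) return Status.PLAYER2_WINS;
          return Status.ONGOING;
      }
  }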

3. Implementation

For the winning strategy we will assume that both players are perfect logicians.

Choosing a word for your opponent
The game starts with both players choosing a word that the opposing player has to guess. Selecting a 'difficult' word is not trivial: how good a word is depends strongly on the opponent. For example, while playing against another perfect logician, your opponent might reason that you selected your word from a certain subset of the whole lexical database. This would greatly reduce the opponent's possible worlds. Would it then be more effective not to choose a word from this subset? To overcome this problem we defined four different selection criteria, each of which carves a certain subset out of the lexical database. Both players use a combination of these four criteria to generate a subset of the whole lexical database, and a word is then selected at random from this subset. In this way a player can never know which strategy the opponent used to select the word the player has to guess. Both players therefore need to consider the whole lexical database as possible worlds. The four selection criteria are (a sketch follows the list):
  • Dissimilarity. Each letter has at most one occurrence in the whole word.
  • Letter frequency. Choose a word containing only letters with a very low letter frequency.
  • Prefixes and subwords. The chosen word may not contain or form any subword of another word in the lexical database, where subwords with a different letter order also count. For example, with this constraint it is not possible to select the word 'bal' when the lexical database also contains the word 'labs'.
  • Word length. This method prefers long words over short words. Intuitively, a long word is harder to guess than a short one.
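As an illustration, a minimal sketch (names are ours, not from Hangman.java) of the dissimilarity criterion combined with the random pick from the resulting subset:

  import java.util.List;
  import java.util.Random;
  import java.util.stream.Collectors;

  // Sketch: filter the lexicon with one criterion (dissimilarity) and pick a
  // word at random, so the opponent cannot pin down the selection strategy.
  class WordSelector {

      // dissimilarity: every letter occurs at most once in the word
      static boolean isDissimilar(String word) {
          return word.chars().distinct().count() == word.length();
      }

      static String selectFor(List<String> lexicon, Random rng) {
          List<String> subset = lexicon.stream()
                                       .filter(WordSelector::isDissimilar)
                                       .collect(Collectors.toList());
          return subset.get(rng.nextInt(subset.size()));
      }
  }

In the full implementation the other three criteria would be additional filters of the same form, applied before the random pick.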
Guessing the word
Once both players have selected a word for each other, they can start guessing their own word. Based on all the available knowledge they need to select a letter. Since a player does not know the word he is looking for, we need a method that selects a letter which will help win the game. At this point it is good to realise that letter selection can be represented as a search tree. A path in the tree represents the guesses of a player (NOT the word itself), and at each node a letter must be guessed based on the available knowledge. We want to optimise this guess. The possible worlds are represented by the possible words, and the optimal letter is the one that reduces the possible worlds the most. We will use two tools for this: the information gain (entropy) and the letter frequency. Entropy is a measure of disorder; the letter frequency is the frequency of occurrence of a specific letter in the list of possible words.

Entropy
The entropy is calculated as follows [1]:
Entropy = - P(+) * log2(P(+)) - P(-) * log2(P(-))

with P(+) being the fraction of positive examples (words containing the letter) and P(-) the fraction of negative examples (words not containing the letter).

Entropy is a measure of impurity and always lies between 0 and 1, with 0 indicating a pure set (no disorder) and 1 the state of most disorder. Entropy is a very suitable tool for separating two classes. Ideally, a property (in our case: whether a word contains a specific letter) splits the set of possible words into two halves of 50% each; the entropy value is then 1. Logically you would want to select the letter with the highest entropy, since the letter with the highest entropy has the highest information gain. In our implementation entropy is used as the main decision criterion (see the section 'Selection strategy' below for the full strategy).
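A minimal sketch of this computation (illustrative, not taken verbatim from Hangman.java):

  import java.util.List;

  // Sketch: entropy of a letter over the current list of possible words,
  // following the formula above; 0 * log2(0) is taken to be 0.
  class Entropy {

      static double of(char letter, List<String> possibleWords) {
          int positive = 0;                          // words containing the letter
          for (String word : possibleWords) {
              if (word.indexOf(letter) >= 0) positive++;
          }
          double pPlus = (double) positive / possibleWords.size();
          return term(pPlus) + term(1.0 - pPlus);
      }

      static double term(double p) {                 // -p * log2(p)
          return p == 0.0 ? 0.0 : -p * (Math.log(p) / Math.log(2));
      }
  }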

Frequency list
The frequency list is generated by counting every occurrence of each letter over all remaining possible worlds. The list is sorted so that the letter with the highest count is at the top and the letter(s) with the lowest (or zero) count are at the bottom.
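A minimal sketch (illustrative; the real code in Hangman.java may differ):

  import java.util.ArrayList;
  import java.util.Comparator;
  import java.util.List;

  // Sketch: count every letter occurrence over the possible worlds and list
  // the letters by descending count. Assumes lower-case a-z words.
  class FrequencyList {

      static int[] counts(List<String> possibleWords) {
          int[] counts = new int[26];
          for (String word : possibleWords)
              for (char c : word.toCharArray())
                  counts[c - 'a']++;
          return counts;
      }

      static List<Character> sorted(List<String> possibleWords) {
          int[] counts = counts(possibleWords);
          List<Character> letters = new ArrayList<>();
          for (char c = 'a'; c <= 'z'; c++) letters.add(c);
          // highest occurrence first, lowest (or zero) last
          letters.sort(Comparator.comparingInt((Character c) -> counts[c - 'a']).reversed());
          return letters;
      }
  }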

Selection strategy
The player starts his turn by computing the entropy of each letter, both for himself and for his opponent. Given these entropies and the frequency list, the player makes a decision. The player always tries to select a letter with an information gain as high as possible, while also taking the information gain for the other player into account. If the letter with the highest information gain for the current player has a higher gain for him than for the opponent, the current player asks this letter. If the letter happens to be of more value to the opponent, the current player skips to the next letter, until he either finds a letter with higher entropy than the opponent's or runs out of possible letters.

It might happen that for every letter the opponent's information gain is higher than or equal to the player's own (the player runs out of possible letters). That is, the player knows that whichever letter he asks, it will help the opponent at least as much as himself. What we can do then is select a letter based solely on the player's own entropy and frequency list: the player selects the letter with the highest entropy, and if several letters share the highest information gain, the one with the highest letter frequency is selected.

There is one exception to this, namely when the current player has only one world left (the current player knows the word he is guessing). The entropy is then 0 for all remaining letters and therefore of no value. The current player simply asks all the remaining letters of his word, starting with the letters that do not occur in the opponent's word. If every remaining letter is contained in the opponent's word, it does not matter which letter is chosen first.

Below is the winning strategy in pseudo-code (a Java sketch follows the list):
  1. Check # of remaining possible worlds. If possible worlds > 1, continue with step 2. Otherwise select one of the remaining letters, starting with the ones not occurring in your opponent's word and go to step 5.
  2. Compute the entropy and frequency list for both yourself and your opponent. Iterate through the list of entropies and store the letter with the highest value. Go to step 3.
  3. If your entropy for the stored letter is higher than the opponent's entropy for that letter, select it and go to step 5. Otherwise skip to the next letter, until you either find a letter with higher entropy than your opponent's or run out of possible letters. If you run out of letters, go to step 4. Otherwise select the letter and go to step 5.
  4. Iterate through all remaining letters and select the letter with the highest entropy, breaking ties by letter frequency. Go to step 5.
  5. Check if there is a winner and loser or a draw. If there is a winner and loser or a draw, the game ends. Otherwise go back to step 1.
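A minimal Java sketch of steps 1-4 (illustrative names; Entropy.of and FrequencyList.counts are the sketches from the previous sections, and the possible-worlds lists are assumed to be kept up to date elsewhere). Step 5 happens in the surrounding game loop.

  import java.util.List;
  import java.util.Set;

  // Sketch of the letter-selection strategy described above.
  class LetterSelector {

      static char select(List<String> ownWorlds, List<String> opponentWorlds,
                         Set<Character> unguessed, String opponentWord) {
          // step 1: one world left -> we know our word; ask its remaining
          // letters, preferring letters absent from the opponent's word
          if (ownWorlds.size() == 1) {
              String word = ownWorlds.get(0);
              char fallback = 0;
              for (char c : unguessed) {
                  if (word.indexOf(c) < 0) continue;          // not in our word
                  if (opponentWord.indexOf(c) < 0) return c;  // harmless to the opponent
                  fallback = c;                               // occurs in both words
              }
              return fallback; // the word is not complete yet, so fallback was set
          }
          // steps 2-3: trying letters in descending order of own entropy and
          // taking the first one that beats the opponent's entropy is the same
          // as taking the highest-entropy letter among those that beat it
          char best = 0;
          double bestEntropy = -1;
          for (char c : unguessed) {
              double own = Entropy.of(c, ownWorlds);
              if (own > Entropy.of(c, opponentWorlds) && own > bestEntropy) {
                  best = c;
                  bestEntropy = own;
              }
          }
          if (best != 0) return best;
          // step 4: every letter helps the opponent at least as much; fall
          // back to own entropy, breaking ties by own letter frequency
          int[] counts = FrequencyList.counts(ownWorlds);
          int bestCount = -1;
          for (char c : unguessed) {
              double e = Entropy.of(c, ownWorlds);
              if (e > bestEntropy || (e == bestEntropy && counts[c - 'a'] > bestCount)) {
                  best = c;
                  bestEntropy = e;
                  bestCount = counts[c - 'a'];
              }
          }
          return best;
      }
  }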
Other strategies (not implemented)
  • Random: randomly select a letter each turn
  • Vowels first: first ask vowels (almost any real word contains vowels)
  • Highest letter frequency (ignorant): regardless of the words the word list contains, always use the letter frequency list of 'all' English words
  • Highest letter frequency (smart): each turn generate a letter frequency list based on the possible worlds.

4. Output and results

Assume the following configuration:
  • Word to guess by player 1: korst
  • Word to guess by player 2: vorst
  • Using the tiny lexical database
A game of symmetric Hangman
Player 1 starts the game
The only constraint at the start of the game is the length of the word (no letters have been guessed yet). Based on this information and using the lexical database, player 1 calculates the entropy of each letter and generates a frequency list. The entropies of the letters 'b', 'd', 'h', 'k', 'm' and 'v' are equal to 0.650; all the other letters have an entropy value of 0. Player 1 knows that this holds for player 2 too. Player 1 therefore needs to choose between 'b', 'd', 'h', 'k', 'm' and 'v'. Player 1 will then guess 'b'.
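To check this value: the tiny database contains six words (see section 5), and each of these letters apparently occurs in exactly one of them, so P(+) = 1/6 and P(-) = 5/6:

  Entropy = - (1/6) * log2(1/6) - (5/6) * log2(5/6) ≈ 0.431 + 0.219 = 0.650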

The letter 'b' is in neither player's word.
p1: _ _ _ _ _
p2: _ _ _ _ _
Guessed: b

Turn player 2
The only extra information player 2 has is that the letter 'b' does not occur in either player's word. There are no further constraints. Based on this information player 2 calculates the entropy of all remaining letters and generates a frequency list. The entropies of the letters 'd', 'h', 'k', 'm' and 'v' are equal to 0.722; all the other letters have an entropy value of 0. Player 2 knows this holds for player 1 too. Player 2 therefore needs to choose between 'd', 'h', 'k', 'm' and 'v'. Player 2 will then guess 'd'.

The letter 'd' is in neither player's word.
p1: _ _ _ _ _
p2: _ _ _ _ _
Guessed: b, d

Turn player 1
Again the guessed letter did not occur in either word. Player 1 calculates the entropy and frequency list based on this information. The entropies of the letters 'h', 'k', 'm' and 'v' are equal to 0.811; all the other letters have an entropy value of 0. Player 1 knows that the information gain of every remaining letter is at least as high for player 2 as for himself. Player 1 therefore needs to choose between 'h', 'k', 'm' and 'v'. Player 1 will then guess 'h'.

The letter 'h' is in neither player's word.
p1: _ _ _ _ _
p2: _ _ _ _ _
Guessed: b, d, h

Turn player 2
Again the guessed letter did not occur in either word. Player 2 now calculates the entropy of the remaining letters and generates a frequency list based on this information. The entropies of the letters 'k', 'm' and 'v' are equal to 0.918; all the other letters have an entropy value of 0. Player 2 knows that the information gain of every remaining letter is at least as high for player 1 as for himself. Player 2 therefore needs to choose between 'k', 'm' and 'v'. Player 2 will then guess 'k'.

The letter 'k' is in player 1's word.
p1: k _ _ _ _
p2: _ _ _ _ _
Guessed: b, d, h, k

Turn player 1
It turns out that player 1 is rather lucky. After the guess of player 2, player 1 now has only one possible world! Player 1 knows the word to guess is 'korst'. Player 1 also knows that all the remaining letters are contained in the word of player 2. And so it does not matter which of the remaining letters player 1 will choose first. The remaining letters are 'o', 'r', 's' and 't'. Player 1 will then guess 'o'.

The letter 'o' is in both players' words.
p1: k o _ _ _
p2: _ o _ _ _
Guessed: b, d, h, k, o

Turn player 2
Player 2 still does not know his own word, though after player 1's turn he knows his word contains the letter 'o' at the second position. Based on this new information, player 2 calculates the entropy of the remaining letters and generates a frequency list. The letter 'm' has an entropy value of 1: it splits the remaining possible words exactly in half. Player 2 knows the information gain of asking the letter 'm' is higher for himself than for player 1, who already knows his word. Player 2 will therefore guess the letter 'm'.

The letter 'm' is in neither player's word.
p1: k o _ _ _
p2: _ o _ _ _
Guessed: b, d, h, k, m, o

Turn player 1
Still, player 1 knows his word is 'korst'. Player 1 also knows all remaining letters are in the word of player 2 and therefore it does not matter which remaining letter to choose. The remaining letters for player 1 are 'r', 's' and 't'. Player 1 will then guess letter 'r'.

The letter 'r' is in both players' words.
p1: k o r _ _
p2: _ o r _ _
Guessed: b, d, h, k, m, o, r

Turn player 2
Since there are no other words left, player 2 knows his word to guess is 'vorst'. The remaining letters for player 2 are 'v', 's', and 't'. Player 2 knows his word contains the letter 'v' and the word of player 1 does not. Player 2 will therefore guess the letter 'v'.

The letter 'v' is in player 2's word.
p1: k o r _ _
p2: v o r _ _
Guessed: b, d, h, k, m, o, r, v

Turn player 1
Still, player 1 knows his word is 'korst'. Player 1 also knows all remaining letters are in the word of player 2 and therefore it does not matter which remaining letter to choose. The remaining letters for player 1 are 's' and 't'. Player 1 will then guess letter 's'.

The letter 's' is in both players' words.
p1: k o r s _
p2: v o r s _
Guessed: b, d, h, k, m, o, r, s, v

Turn player 2
Player 2 knows his word to guess is 'vorst'. The only remaining letter for player 2 is 't', and player 2 knows this letter is also contained in the word of player 1. Guessing 't' is nevertheless the only option, so player 2 guesses letter 't'.

The letter 't' is in both players' words.
p1: k o r s t
p2: v o r s t
Guessed: b, d, h, k, m, o, r, s, t, v

Now both players have completed their word.
The game ends in a draw!

5. Documents

Hangman.zip
The file above contains the source file and data files needed to compile and run the implementation. It contains a folder called src which needs to be extracted. The folder src contains the source file Hangman.java, which can be compiled by typing:

javac Hangman.java

Before running the code, check that the folders data and img have also been extracted from Hangman.zip and are in the same folder as Hangman.java. Then run the code by typing:

java Hangman

Lexical database
In order to change the lexical database, please change the variable dataFileName at line 23 of Hangman.java. The wordlist files are currently stored in the folder data.

About the GUI
On the far left of the GUI the frequency list of player 1 is shown; on the far right, that of player 2. Both lists are sorted: the letter at the top has the highest probability of being in the word and the letter at the bottom the lowest. A green letter indicates the letter the player wants to select on his turn; all other letters are black.

At the start of the game an option screen is shown. Please select the game mode and word length before clicking 'ok'. The game has three game modes:
  • Human vs. Human
  • Human vs. CPU
  • CPU vs. CPU
Please be patient at the start of the game, since it might take some time for the frequency lists and information gains to be computed.

6. References

Entropy
[1] Alpaydin, E. (2010). Introduction to Machine Learning (2nd ed., p. 189). Cambridge, MA: MIT Press. ISBN 978-0-262-01243-0

Wordlists
Tiny (tiny -- 6 words)
Normal (normal -- 109,582 words)