Bluff poker

Final Project for the Multi Agent Systems Course

By Marten Schutten (s1777475) and Michiel van de Steeg (s1793489)

Simplifications and assumptions

In order to make the game manageable for our agent, a number of simplifications had to be made within the game. Here follows a list of the assumptions we made, with an explanation of why we made them.

Agent's knowledge

The agent has some specific knowledge that it can access and use to determine the possible outcomes of other players' turns. Here we give a short explanation of the small amount of knowledge that is stored in order to create a model of these outcomes. Note, however, that a lot of knowledge is used implicitly in the way we determine our agent's choices; this knowledge is not formally stored.

First we will discuss the knowledge that is stored by the agents, and the way we store this. This is followed by an explanation of the model the agent creates from this.

Knowledge

In order to store her knowledge, an agent has two sets of facts: her own knowledge and the common knowledge. Both are sets of facts, which can be added, removed or edited. The common knowledge is a singleton instance. This way the common knowledge is always the same for every agent: when one agent updates the common knowledge, it is also updated for all other agents.
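The shared common knowledge can be illustrated with a minimal sketch. The names KnowledgeBase, COMMON_KNOWLEDGE and Agent, as well as the string encoding of facts, are assumptions made for this example and are not taken from the project code.

```python
class KnowledgeBase:
    """A mutable set of facts, stored here as plain strings (an assumption)."""

    def __init__(self):
        self.facts = set()

    def add(self, fact):
        self.facts.add(fact)

    def remove(self, fact):
        self.facts.discard(fact)


# One module-level instance plays the role of the singleton common knowledge.
COMMON_KNOWLEDGE = KnowledgeBase()


class Agent:
    def __init__(self, name):
        self.name = name
        self.own = KnowledgeBase()       # private facts of this agent
        self.common = COMMON_KNOWLEDGE   # the same object for every agent


alice, bob = Agent("alice"), Agent("bob")
alice.common.add("die1=4")               # hypothetical fact
shared = "die1=4" in bob.common.facts    # bob sees the update as well
```

Because every agent holds a reference to the same object, no explicit synchronisation between agents is needed.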

In order to store formulas we constructed a special grammar, so that we could interpret any kind of fact, as well as any composition (concatenation) of statements. Our grammar knows propositions and formulas. The definition of a proposition speaks for itself; we define formulas as follows:

  1. proposition
  2. ! formula
  3. ( formula )
  4. formula ∧ formula
  5. formula → formula
  6. Ki formula

We use this grammar to store information on the current value of each die, and the knowledge of each player concerning this value. When a die is laid open, the value of this die is considered common knowledge. When a player doesn't know the value of a die, this is stored as if the die had a value of 0. When a die is thrown closed, only the current player knows its value, while the others don't. Once an open die is put under the cup, players will remember its value until the die is thrown. Finally, all players know which players know the value of a die (regardless of whether they know the value of that die themselves).

Model

We use the known values of the dice to construct a model of the outcomes that are currently considered possible. This model is simply represented as an array of booleans, one for each of the 56 possible outcomes (the number of unordered results of three six-sided dice). Every agent has her own model. However, agents are also able to reconstruct the models of other agents, as far as their knowledge allows them to. The agents use their own possibilities and those of others in order to make decisions in the game. How this is done is explained below.
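The following sketch shows one way such a boolean model could be built. It assumes three six-sided dice and reuses the convention that an unknown die is stored as 0; the names OUTCOMES and build_model are made up for this example.

```python
from itertools import combinations_with_replacement

# All unordered results of three six-sided dice: 56 sorted triples.
OUTCOMES = list(combinations_with_replacement(range(1, 7), 3))


def build_model(known):
    """Return one boolean per outcome: is it consistent with the known dice?

    known is a list of three die values, where 0 means "value unknown".
    """
    known_values = [v for v in known if v != 0]
    model = []
    for outcome in OUTCOMES:
        remaining = list(outcome)
        consistent = True
        for value in known_values:
            if value in remaining:
                remaining.remove(value)   # each known die uses up one slot
            else:
                consistent = False
                break
        model.append(consistent)
    return model


nothing_known = build_model([0, 0, 0])
one_six_known = build_model([6, 0, 0])
```

An agent can reconstruct the model of another player by calling the same function with the dice values she knows that player to know.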

Decision making

Bidding

If we believe the last bid, we have to make a new bid that is higher than the last. First, we will attempt to roll the dice in such a way that we have the highest possible probability of getting a value higher than the last bid. After this, we have to decide if we want to bluff or speak the truth. First off, if our roll is not actually higher than the last bid, we will be forced to bluff. If our roll is higher than the last bid, though, we may still want to bluff to reduce the odds of getting another turn this round. To determine if we want to bluff, we first look at how high our roll was. If we have a very high roll, we want to bluff less often, because it is more likely we will be fine playing it safe. For this, we look at the number of values above our roll that are consistent with the knowledge of the next player, and divide this by the total number of possibilities. Furthermore, the more players there are in the game, the safer you can play, because the odds of getting another turn will be lower, so we are less inclined to bluff. Finally, there's a constant value to keep the bluffing chance in check.

If we don't bluff, it means we speak the truth. Speaking the truth also includes underbidding (bidding a value lower than your actual roll).

Bluffing and Underbidding

If we decide to bluff, the value with which we bluff is chosen randomly from all values that are consistent with the knowledge of the next player, higher than the last bid, and higher than our roll (or it wouldn't be a bluff). These possibilities are weighted, with lower values having higher odds than higher values. The highest value gets a weight of 1, the second highest a weight of 2, etcetera. A value's probability of being chosen is its weight divided by the sum of all weights.
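The weighting scheme for bluffs can be sketched as follows. Bids are represented by hypothetical integer scores (a higher score is a higher bid); the function names are assumptions for this example.

```python
import random


def bluff_weights(candidates):
    """Map each candidate bid to its probability of being chosen.

    The highest candidate gets weight 1, the second highest weight 2,
    and so on; a probability is the weight divided by the sum of weights.
    """
    ordered = sorted(candidates, reverse=True)   # highest value first
    weights = {value: rank for rank, value in enumerate(ordered, start=1)}
    total = sum(weights.values())
    return {value: weight / total for value, weight in weights.items()}


def choose_bluff(candidates, rng=random):
    """Draw one bluff value according to the weights above."""
    probs = bluff_weights(candidates)
    values = list(probs)
    return rng.choices(values, weights=[probs[v] for v in values], k=1)[0]


# Three candidate bids: the lowest one is three times as likely as the highest.
probs = bluff_weights([641, 652, 663])
```

With these three candidates the weights are 3, 2 and 1, so the probabilities are 3/6, 2/6 and 1/6.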

If we decide to speak the truth, we can also underbid. This is done in a similar fashion to the bluffs, except high values now have a higher chance, with the exact bid having the highest weight.

Throwing dice

Once an agent has decided to believe a bid, she will have to throw the dice. The purpose of throwing the dice is to obtain a higher value than the previous player's bid. However, we have to think about two things. First, we have to determine how to optimize our chances of obtaining a higher result. Secondly, we have to determine whether we want to throw the dice open or closed.

Optimize chances

When throwing the dice, we have three options: we can throw all dice, only two of them, or only one. When not rerolling all the dice, we assume we reroll the lowest one(s), since this always gives a higher chance of a better result (with the exception of possibilities for pokers). Once we have rolled the dice, we see whether we have exceeded the previous bid. If we haven't, we check whether we can still roll some dice in the hope of improving our result.

In order to determine whether we wish to roll one, two or three dice, we look at the chance that each option increases our result. We do this by taking the lowest n dice (one, two or three) and counting, over all possible outcomes of rerolling them, how many times our result would improve. The option with the best chances is selected.
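A possible way to carry out this comparison is to enumerate every outcome of rerolling the lowest n dice. The sketch below makes two assumptions that are not stated in the text: "improved" is read as "beats the last bid", and hands are ranked by a hypothetical score function (dice sorted from high to low and read as a number, with a poker, i.e. three of a kind, ranked above everything else).

```python
from itertools import product


def score(dice):
    """Hypothetical ranking of a three-dice hand (an assumption)."""
    a, b, c = sorted(dice, reverse=True)
    value = 100 * a + 10 * b + c
    return value + 1000 if a == b == c else value


def improvement_chance(dice, bid_score, n):
    """Chance of beating bid_score when rerolling the lowest n dice."""
    kept = sorted(dice, reverse=True)[: 3 - n]    # keep the highest dice
    rolls = list(product(range(1, 7), repeat=n))  # all 6**n reroll outcomes
    wins = sum(1 for roll in rolls if score(kept + list(roll)) > bid_score)
    return wins / len(rolls)


def best_reroll(dice, bid_score):
    """Return the best number of dice to reroll and the chance per option."""
    chances = {n: improvement_chance(dice, bid_score, n) for n in (1, 2, 3)}
    best = max(chances, key=chances.get)  # on a tie, the fewest dice win
    return best, chances


# Holding 6-5-1 against a bid of 6-5-2: rerolling only the 1 is best.
best, chances = best_reroll([6, 5, 1], score([6, 5, 2]))
```

Because the lowest dice are always the ones rerolled, this sketch shares the limitation mentioned above: it does not consider breaking up a hand to chase a poker.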

Determine openness

Once an agent has decided which dice she wants to throw, she'll have to determine whether she wants to throw them open or closed. She decides this by examining the knowledge of the next player, and she tries to minimize the number of dice whose values are known to the next player. When this number is the same for an open and a closed throw, she throws the dice closed.

Calling or believing a bid

In determining whether or not to believe a bid, we start by estimating the probability with which the previous player bluffed. A player has two possible reasons for bluffing. First off, he might be forced to bluff because his roll was not high enough. We estimate this by the proportion of consistent values that are lower than the bid of the previous player. In addition to this, if the player did roll high enough, he might choose to bluff in order to reduce the odds of getting another turn this round. For this, we use the same calculation as described in the bidding section. However, we do not have full knowledge of the roll of the previous player, so we use the median value consistent with our knowledge instead.
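The two ingredients of this estimate can be sketched as below. Outcomes are again represented by hypothetical integer scores, and the choice of the lower median for an even number of values is an assumption; the text only says "the median value".

```python
from statistics import median_low


def forced_bluff_estimate(consistent_scores, bid_score):
    """Share of the consistent outcomes that score below the bid."""
    if not consistent_scores:
        return 0.0
    lower = sum(1 for s in consistent_scores if s < bid_score)
    return lower / len(consistent_scores)


def assumed_roll(consistent_scores):
    """Stand-in for the unknown roll of the previous player.

    median_low always returns one of the consistent values.
    """
    return median_low(consistent_scores)


consistent = [611, 621, 631, 641]   # hypothetical consistent outcomes
p_forced = forced_bluff_estimate(consistent, 631)
stand_in = assumed_roll(consistent)
```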

Because we now have an estimated bluffing probability, we also have an estimate of the probability with which we would be safe if we were to call his bid a bluff, which is the complement of the bluffing probability.

Now that we know the probability of being safe if we call bluff, we want to know the probability of being safe if we choose to believe the bid. This is determined by the estimated probability of rolling higher than the last bid, and added to that the probability of getting away with rolling lower, because the next player may believe us. The first part is done by calculating the probability of rolling higher given the last bid is true, multiplied by the probability of this being the case (the complement of the bluff probability), and then calculating the probability of rolling higher if the last bid wasn't true. For this we assume that the bid before that was true, lacking a better estimate. We did not quite find the time for the second part (estimated probability of getting away with rolling lower), so we ended up using a constant estimation.
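The calculation described above can be written out as a short sketch. The way the constant is added (only to the cases in which we fail to roll higher) and its value of 0.25 are assumptions for illustration; the text does not give the constant or the exact formula.

```python
def safe_if_believe(p_bluff, p_higher_if_true, p_higher_if_bluff,
                    p_get_away=0.25):
    """Estimated probability of being safe after believing the last bid.

    p_bluff            estimated probability that the last bid was a bluff
    p_higher_if_true   chance of rolling higher if the last bid was true
    p_higher_if_bluff  chance of rolling higher if it was a bluff (computed
                       as if the bid before it was true)
    p_get_away         constant chance of getting away with a lower roll
                       (hypothetical value)
    """
    p_higher = (1.0 - p_bluff) * p_higher_if_true + p_bluff * p_higher_if_bluff
    return p_higher + (1.0 - p_higher) * p_get_away


p_safe = safe_if_believe(p_bluff=0.2, p_higher_if_true=0.5,
                         p_higher_if_bluff=0.75)
```

In this example the chance of rolling higher is 0.8 * 0.5 + 0.2 * 0.75 = 0.55, and the constant adds 0.45 * 0.25, giving 0.6625.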

We now know the probabilities of being safe if we call, and of being safe if we believe. These two are combined to form a final probability of believing. We don't want this to be deterministic because there is a risk of this being abused. We won't go too deep into the formula combining these two, but it should suffice to say that if both probabilities are equal, we will believe half of the time, and if a probability would be 0 or 1, there is no uncertainty in the probability of believing either. Other values are scaled appropriately in between.
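Since the formula itself is not given, the sketch below shows one function that has the stated properties; it is not necessarily the formula used by the agent. It compares the odds of the two options, so equal probabilities give 0.5, and a probability of 0 or 1 on either side gives a certain decision.

```python
import random


def believe_probability(p_safe_believe, p_safe_call):
    """Probability of believing, from the two safety estimates (a sketch)."""
    num = p_safe_believe * (1.0 - p_safe_call)
    den = num + p_safe_call * (1.0 - p_safe_believe)
    if den == 0.0:
        return 0.5   # both estimates are 0, or both are 1: no preference
    return num / den


def decide_to_believe(p_safe_believe, p_safe_call, rng=random):
    """Randomised decision, so that the agent cannot easily be exploited."""
    return rng.random() < believe_probability(p_safe_believe, p_safe_call)


equal_case = believe_probability(0.4, 0.4)    # no preference either way
certain_case = believe_probability(1.0, 0.3)  # believing is surely safe
```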